| Newsgroups | comp.lang.python |
|---|---|
| Date | 2013-07-07 17:22 -0700 |
| Message-ID | <a35609c1-e56f-4180-8176-4405264da0a2@googlegroups.com> (permalink) |
| Subject | hex dump w/ or w/out utf-8 chars |
| From | blatt <ferdy.blatsco@gmail.com> |
Hi all,
but a particular hello to Chris Angelico, whose criticism and
suggestions pushed me to make a full revision of my application for
hex dumps in the presence of utf-8 chars.
If you are not using python 3, the utf-8 codec can add further programming
problems, especially if you are not a guru....
The script looks very long, but only because I commented too much ... sorry.
It is very useful (at least IMHO...)
It works under Linux, but there is still a little problem which I didn't
solve (at least programmatically...).
# -*- coding: utf-8 -*-
# px.py vers. 11 (pxb.py) # python 2.6.6
# hex-dump w/ or w/out utf-8 chars
# Using spaces as separators, this script shows
# (better than tabnanny) incorrect indentation.
# to save output > python pxb.py hex.txt > px9_out_hex.txt
nLenN=3 # n. of digits for lines
# version almost thoroughly rewritten on the basis of
# the criticism and modifications suggested by Chris Angelico
# in the first version the utf-8 conversion to hex was shown horizontally:
# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a
# ... but I had to insert additional chars to keep the
# synchronization between the literal and the hex part
# 005 # qwerty: non è. unicode bensì. ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a
# in the second version I followed Chris's suggestion:
# "to show the hex utf-8 vertically"
# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 c 7666666 6667c 676660
#     3 175249a efe 3 5e93f45 25e33 13399a
#                   a             a
#                   8             c
# between the two solutions, I selected the first one + synchronization,
# which seems more compact and easier to program (... I'm lazy...)
# various run options:
# std : python px.py file
# bash cat : cat file | python px.py (alias hex)
# bash echo: echo line | python px.py " "
# works on any n. of bytes for utf-8
# For the user: it is helpful to have in a separate file
# all special characters of interest, together with their names.
# error:
# echo '345"789"'|hex > 345"789" 345"789"
# 33323332 instead of 333233320
# 3452789 a " " 34527892a
# ... correction: avoiding "\n at end of test-line
# echo "345'789'"|hex > 345'789'
# 333233320
# 34577897a
# same error in every run option
# If someone can solve this bug...
###################
import fileinput
import sys, commands
lF=[] # input file as list
for line in fileinput.input(): # handles all the details of args-or-stdin
    lF.append(line)
sSpacesXLN = ' ' * (nLenN+1)
for n in xrange(len(lF)):
    sLineHexND=lF[n].encode('hex') # ND = no delimiter (space)
    sLineHex =lF[n].encode('hex').replace('20',' ')
    sLineHexH =sLineHex[::2]
    sLineHexL =sLineHex[1::2]
    sSynchro=''
    for k in xrange(0,len(sLineHexND),2):
        if sLineHexND[k]<'8':
            sSynchro+= sLineHexND[k]+sLineHexND[k+1]
            k+=1
        elif sLineHexND[k]=='c':
            sSynchro+='c'+sLineHexND[k+1]+sLineHexND[k+2]+sLineHexND[k+3]+'2e'
            k+=3
        elif sLineHexND[k]=='e':
            sSynchro+='e'+sLineHexND[k+1]+sLineHexND[k+2]+sLineHexND[k+3]+\
                      sLineHexND[k+4]+sLineHexND[k+5]+'2e2e'
            k+=5
    # text output (synchronized)
    print str(n+1).zfill(nLenN)+' '+sSynchro.decode('hex'),
    print sSpacesXLN + sLineHexH
    print sSpacesXLN + sLineHexL+ '\n'
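A possible cause of the misalignment described in the "error:" comments, offered as a guess rather than a confirmed diagnosis: `.replace('20',' ')` works on the whole hex string, so it also matches a '20' that straddles two bytes. In `345"789"` followed by a newline the last two bytes are 22 0a, whose digits read "220a", and the middle "20" is turned into a single space, which shifts everything after it. With `'` (27) before the newline no such pair appears. The sketch below is in Python 3, not the Python 2.6 of the script, and does the substitution one byte at a time. The names `nibble_rows` and `padded_text` are made up for this illustration. `padded_text` decodes the line instead of testing the lead digit, because the posted loop only branches on lead bytes starting with c or e and so seems to skip sequences whose lead byte starts with d or f.

```python
def nibble_rows(data, blank=0x20):
    """Return (high, low) rows of hex digits, one column per byte.

    A byte equal to `blank` (space by default) becomes a blank column.
    The test is made per byte, so it cannot match across a byte boundary.
    """
    high, low = [], []
    for b in data:  # iterating over bytes yields ints in Python 3
        if b == blank:
            high.append(' ')
            low.append(' ')
        else:
            digits = format(b, '02x')
            high.append(digits[0])
            low.append(digits[1])
    return ''.join(high), ''.join(low)


def padded_text(data, filler='.'):
    """Decode UTF-8 and add one filler per extra byte of each character,
    so the text row has as many columns as there are bytes."""
    out = []
    for ch in data.decode('utf-8'):
        out.append(ch)
        out.append(filler * (len(ch.encode('utf-8')) - 1))
    return ''.join(out)


if __name__ == '__main__':
    line = '345"789"\n'.encode('utf-8')

    # Whole-string replace: 18 hex digits shrink to 17 characters.
    print(len(line.hex()), len(line.hex().replace('20', ' ')))  # 18 17

    high, low = nibble_rows(line)
    print(padded_text(line).rstrip('\n'))
    print(high)  # 333233320
    print(low)   # 34527892a
```

Run on the `# qwerty: non è unicode bensì ascii` line from the comments, the two functions give the same three rows as the synchronized example there, with one `.` after each two-byte character.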
If there are problems of understanding, probably due to fonts, the best
thing is to import it into an editor with "mono" fonts...
As I already told Chris... criticism is welcome!
Bye, Blatt.