hex dump w/ or w/out utf-8 chars

Newsgroups: comp.lang.python
Date: 2013-07-07 17:22 -0700
Message-ID: <a35609c1-e56f-4180-8176-4405264da0a2@googlegroups.com>
Subject: hex dump w/ or w/out utf-8 chars
From: blatt <ferdy.blatsco@gmail.com>



Hi all,
and a particular hello to Chris Angelico, whose criticism and
suggestions pushed me to fully revise my application for
hex dumps of text containing utf-8 chars.
If you are not using Python 3, the utf-8 codec can add further programming
problems, especially if you are not a guru...
The script looks very long, but that's because I commented it too much... sorry.
It is very useful (at least IMHO...).
It works under Linux, but there is still a small problem which I didn't
solve (at least not programmatically...).
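
For comparison, a minimal Python 3 sketch (not part of the script below, which targets Python 2.6): in Python 3, text (str) and encoded data (bytes) are separate types, so a line is turned into its utf-8 bytes explicitly with str.encode, and bytes.hex() (Python 3.5+) gives two hex digits per byte. Taking every other digit then yields the same "high" and "low" nibble rows the script prints. The helper name nibble_rows is hypothetical, chosen for illustration.

```python
# Python 3: text (str) and encoded data (bytes) are separate types.
def nibble_rows(text, encoding='utf-8'):
    """Return the high- and low-nibble hex rows of the encoded text.

    nibble_rows is an illustrative helper, not part of px.py.
    """
    h = text.encode(encoding).hex()  # two hex digits per byte (Python 3.5+)
    return h[::2], h[1::2]           # first digits, second digits

line = 'non è ascii'
print(len(line), len(line.encode('utf-8')))  # 11 12: 'è' takes two bytes
print(nibble_rows(line))
```

For example, nibble_rows('è') returns ('ca', '38'), the same two columns shown under 'è' in the first-version output in the script comments.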


# -*- coding: utf-8 -*-
# px.py vers. 11 (pxb.py)   # python 2.6.6
# hex-dump w/ or w/out utf-8 chars
# Since spaces are shown as blanks in the hex rows, this script
# reveals incorrect indentation (better than tabnanny).

# to save the output: python pxb.py hex.txt > px9_out_hex.txt

nLenN=3          # number of digits for line numbers

# version almost completely rewritten on the basis of
# the criticism and modifications suggested by Chris Angelico

# in the first version the utf-8 conversion to hex was shown horizontally:

# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a

# ... but I had to insert additional '.' chars to keep the
#     synchronization between the text and the hex rows

# 005 # qwerty: non è. unicode bensì. ascii
#     2 7767773 666 ca 7666666 6667ca 676660
#     3 175249a efe 38 5e93f45 25e33c 13399a

# in the second version I followed Chris's suggestion:
# "to show the hex utf-8 vertically"

# 005 # qwerty: non è unicode bensì ascii
#     2 7767773 666 c 7666666 6667c 676660
#     3 175249a efe 3 5e93f45 25e33 13399a
#                   a             a
#                   8             c

# of the two solutions, I chose the first one + synchronization,
#     which seems more compact and easier to program (... I'm lazy...)

# various run options:
# std      :             python px.py file
# bash cat : cat  file | python px.py (alias hex)
# bash echo: echo line | python px.py    "    "

# works on utf-8 chars of any number of bytes

# Tip for the user: it helps to keep all the special characters
# of interest, together with their names, in a separate file.

# error:

# echo '345"789"'|hex    > 345"789"              345"789"
#                          33323332  instead of  333233320
#                          3452789 a    "    "   34527892a

# ... workaround: avoid a " right before the \n at the end of the test line
# echo "345'789'"|hex   >  345'789'
#                          333233320
#                          34577897a

# same error in every run option

# If someone can solve this bug...

###################


import fileinput

lF = []                          # input file as list of lines
for line in fileinput.input():   # handles all the details of args-or-stdin
    lF.append(line)
sSpacesXLN = ' ' * (nLenN+1)


for n in xrange(len(lF)):
    sLineHexND = lF[n].encode('hex')          # ND = no delimiter (space)
    sLineHex   = sLineHexND.replace('20', '  ')  # every '20' in the hex string -> two blanks
    sLineHexH  = sLineHex[::2]                # high nibbles
    sLineHexL  = sLineHex[1::2]               # low nibbles

    # Rebuild the line, appending one '.' (hex 2e) per extra byte of each
    # multi-byte utf-8 char, so that the text stays aligned with the hex rows.
    # Continuation bytes (0x80-0xbf, first hex digit 8, 9, a or b) match no
    # branch and are skipped: the branch for their lead byte already copied them.
    sSynchro = ''
    for k in xrange(0, len(sLineHexND), 2):
        c = sLineHexND[k]
        if c < '8':                    # 1-byte (ASCII) char: 0x00-0x7f
            sSynchro += sLineHexND[k:k+2]
        elif c in 'cd':                # 2-byte char: lead byte 0xc_ or 0xd_
            sSynchro += sLineHexND[k:k+4] + '2e'
        elif c == 'e':                 # 3-byte char: lead byte 0xe_
            sSynchro += sLineHexND[k:k+6] + '2e2e'
        elif c == 'f':                 # 4-byte char: lead byte 0xf_
            sSynchro += sLineHexND[k:k+8] + '2e2e2e'

    # text output (synchronized)
    print str(n+1).zfill(nLenN) + ' ' + sSynchro.decode('hex'),
    print sSpacesXLN + sLineHexH
    print sSpacesXLN + sLineHexL + '\n'
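
A likely cause of the misalignment described in the "error" comment: .replace('20', '  ') searches the whole hex string, so it can match across a byte boundary. In '345"789"\n' the last two bytes are 22 and 0a; their hex "220a" contains "20" starting at the second digit of 22, so one digit is swallowed and every following digit shifts by one column. The minimal Python 3 sketch below reproduces the problem and shows a byte-wise alternative; both function names are hypothetical, not taken from the script.

```python
def rows_buggy(data):
    """Mimic the script: replace '20' anywhere in the hex string."""
    h = data.hex().replace('20', '  ')  # may match across two bytes
    return h[::2], h[1::2]

def rows_bytewise(data):
    """Blank out only whole bytes equal to 0x20 (space)."""
    h = ''.join('  ' if b == 0x20 else format(b, '02x') for b in data)
    return h[::2], h[1::2]

data = b'345"789"\n'
print(rows_buggy(data))     # ('33323332 ', '3452789 a')
print(rows_bytewise(data))  # ('333233320', '34527892a')
```

In the Python 2 script, the same byte-wise idea should correspond to ''.join('  ' if ch == ' ' else ch.encode('hex') for ch in lF[n]), since iterating a Python 2 str yields single bytes.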


If the output is hard to read, probably because of the font, the best
thing is to open it in an editor with a monospaced font...

As I already told Chris... criticism is welcome!
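
One possible Python 3 sketch of the same synchronized layout, assuming the input lines are already decoded, valid utf-8 text. Instead of testing the first hex digit of each lead byte, it takes the byte count of every character from len(ch.encode('utf-8')), so 2-, 3- and 4-byte chars all get the right number of '.' pads. The generator name synchro_dump and its width parameter are hypothetical; unlike the script, it drops the trailing newline from the rows.

```python
def synchro_dump(lines, width=3):
    """Yield (text, high, low) rows for each line of text.

    Each multi-byte char is followed by one '.' per extra byte, so the
    text row stays aligned with the two hex rows; spaces (0x20) are
    shown as blanks in the hex rows.
    """
    pad = ' ' * (width + 1)
    for n, line in enumerate(lines, 1):
        text = ''
        hexrow = ''
        for ch in line.rstrip('\n'):
            b = ch.encode('utf-8')              # 1 to 4 bytes
            text += ch + '.' * (len(b) - 1)     # one pad per extra byte
            hexrow += '  ' if ch == ' ' else b.hex()
        yield (str(n).zfill(width) + ' ' + text,
               pad + hexrow[::2],
               pad + hexrow[1::2])

for row in synchro_dump(['non è ascii\n']):
    print('\n'.join(row) + '\n')
# 001 non è. ascii
#     666 ca 67666
#     efe 38 13399
```

Characters that a terminal draws two columns wide (CJK, many emoji) will still misalign, since the padding counts bytes, not display columns.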

Bye, Blatt.








