Date: Mon, 8 Jul 2013 11:17:17 +1000
Subject: Re: hex dump w/ or w/out utf-8 chars
From: Chris Angelico <rosuav@gmail.com>
To: python-list@python.org
Newsgroups: comp.lang.python

On Mon, Jul 8, 2013 at 10:22 AM, blatt <ferdy.blatsco@gmail.com> wrote:
> Hi all,
> but a particular hello to Chris Angelino which with their critics and
> suggestions pushed me to make a full revision of my application on
> hex dump in presence of utf-8 chars.

Hiya! Glad to have been of assistance :)

> As I already told to Chris... critics are welcome!

No problem.

> # -*- coding: utf-8 -*-
> # px.py vers. 11 (pxb.py)   # python 2.6.6
> # hex-dump w/ or w/out utf-8 chars
> # Using spaces as separators, this script shows
> # (better than tabnanny)  uncorrect  indentations.
>
> # to save output > python pxb.py hex.txt > px9_out_hex.txt
>
> nLenN=3          # n. of digits for lines
>
> # chomp heaps and heaps of comments

Little nitpick, since you did invite criticism :) When I went to copy
and paste your code, I skipped all the comments and started at the
line of hashes... and then didn't have the nLenN definition. Posting
code to a forum like this is a huge invitation to try the code (it's
the very easiest way to know what it does), so I would recommend
having all your comments at the top, and all the code in a block
underneath. It'd be that bit easier for us to help you. Not a big
deal, though, I did figure out what was going on :)

>     sLineHex  =lF[n].encode('hex').replace('20','  ')

Here's the problem. Your hex string ends with "220a", and the
replace() method doesn't concern itself with the divisions between
bytes. It finds the second 2 of 22 and the leading 0 of 0a, takes
them for a "20", and replaces them with spaces.
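
To see it in isolation, here's a tiny reproduction. The sample line is
made up (an 'a', a space, a double quote, and a newline), and I've used
binascii.hexlify() in place of .encode('hex') so that it runs on Python 3
as well; treat it as a sketch, not as your actual data.

```python
import binascii

# Hypothetical sample line: 'a', space, double quote, newline.
line = b'a "\n'

# hexlify() stands in for the Python 2-only .encode('hex').
hex_str = binascii.hexlify(line).decode('ascii')

# replace() works on characters, not on byte pairs, so it also matches
# the "20" that straddles the boundary between the bytes 22 and 0a.
naive = hex_str.replace('20', '  ')

print(repr(hex_str))
print(repr(naive))
```

Only the first "20" is a real space byte; the second replacement eats half
of the quote and half of the newline.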

I think the best solution may be to avoid the .encode('hex') part,
since it's not available in Python 3 anyway. Alternatively (if Py3
migration isn't a concern), you could do something like this:

    sLineHexND=lF[n].encode('hex')     # ND = no delimiter (space)
    sLineHex  =sLineHexND # No reason to redo the encoding
    twentypos=0
    while True:
        twentypos=sLineHex.find("20",twentypos)
        if twentypos==-1: break # We've reached the end of the string
        if not twentypos%2: # It's at an even-numbered position, replace it
            sLineHex=sLineHex[:twentypos]+'  '+sLineHex[twentypos+2:]
        twentypos+=1
    # then continue on as before
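
As for the first option (skipping .encode('hex') entirely), one way might
be to build the string a byte at a time, so a space can only ever be
matched on a byte boundary. This is just a sketch: the function name is my
own invention, and it assumes the line is a byte string (str in Python 2,
bytes in Python 3).

```python
def hex_with_blank_spaces(line):
    """Hex-encode a byte string, showing each space byte (0x20) as two blanks.

    Hypothetical helper, not part of the original script.
    """
    # bytearray() yields integers on both Python 2 and Python 3, so each
    # byte is examined on its own and nothing can straddle two bytes.
    return ''.join('  ' if b == 0x20 else '%02x' % b
                   for b in bytearray(line))


# Same made-up sample as a line ending in a double quote and a newline.
print(repr(hex_with_blank_spaces(b'a "\n')))
```

The result should be a drop-in value for sLineHex, since every byte still
takes up exactly two characters.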

>     sLineHexH =sLineHex[::2]
>     sLineHexL =sLineHex[1::2]
> [ code continues ]

Hope that helps!

ChrisA