From: Chris Angelico
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: hex dump w/ or w/out utf-8 chars
Date: Mon, 8 Jul 2013 11:17:17 +1000

On Mon, Jul 8, 2013 at 10:22 AM, blatt wrote:
> Hi all,
> but a particular hello to Chris Angelino which with their critics and
> suggestions pushed me to make a full revision of my application on
> hex dump in presence of utf-8 chars.

Hiya!
Glad to have been of assistance :)

> As I already told to Chris... critics are welcome!

No problem.

> # -*- coding: utf-8 -*-
> # px.py vers. 11 (pxb.py) # python 2.6.6
> # hex-dump w/ or w/out utf-8 chars
> # Using spaces as separators, this script shows
> # (better than tabnanny) uncorrect indentations.
>
> # to save output > python pxb.py hex.txt > px9_out_hex.txt
>
> nLenN=3 # n. of digits for lines
>
> # chomp heaps and heaps of comments

Little nitpick, since you did invite criticism :) When I went to copy
and paste your code, I skipped all the comments and started at the
line of hashes... and then didn't have the nLenN definition. Posting
code to a forum like this is a huge invitation to try the code (it's
the very easiest way to know what it does), so I would recommend
having all your comments at the top, and all the code in a block
underneath. It'd be that bit easier for us to help you. Not a big
deal, though, I did figure out what was going on :)

> sLineHex =lF[n].encode('hex').replace('20','  ')

Here's the problem. Your hex string ends with "220a", and the
replace() method doesn't concern itself with the divisions between
bytes. It finds the second 2 of 22 and the leading 0 of 0a and
replaces them.

I think the best solution may be to avoid the .encode('hex') part,
since it's not available in Python 3 anyway. Alternatively (if Py3
migration isn't a concern), you could do something like this:

sLineHexND=lF[n].encode('hex') # ND = no delimiter (space)
sLineHex  =sLineHexND # No reason to redo the encoding
twentypos=0
while True:
    twentypos=sLineHex.find("20",twentypos)
    if twentypos==-1: break # We've reached the end of the string
    if not twentypos%2: # It's at an even-numbered position, replace it
        # Two spaces, so the string length (and byte alignment) is unchanged
        sLineHex=sLineHex[:twentypos]+'  '+sLineHex[twentypos+2:]
    twentypos+=1
# then continue on as before

> sLineHexH =sLineHex[::2]
> sLineHexL =sLineHex[1::2]
> [ code continues ]

Hope that helps!

ChrisA
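
For the first option mentioned above (avoiding .encode('hex')
altogether), here is a minimal sketch of one way it might look. It is
not taken from px.py: the function name hex_with_spaces and the sample
line are made up for illustration, and it assumes the goal is still
"show every 0x20 byte as two spaces". It uses binascii.hexlify(), which
exists in both Python 2 and 3, and works one byte (two hex digits) at a
time, so a "20" straddling two bytes is never matched.

```python
import binascii

def hex_with_spaces(data):
    """Hex-dump `data` (a bytes object), showing each 0x20 byte as two spaces.

    Hypothetical helper, not part of the original script.
    """
    hexed = binascii.hexlify(data).decode('ascii')
    # Split into two-digit pairs so comparisons never cross a byte boundary.
    pairs = [hexed[i:i + 2] for i in range(0, len(hexed), 2)]
    return ''.join('  ' if pair == '20' else pair for pair in pairs)

line = b'a "\n'  # bytes 61 20 22 0a, so the hex string ends with "220a"

# The naive approach: replace() also hits the "20" inside "220a".
naive = binascii.hexlify(line).decode('ascii').replace('20', '  ')
fixed = hex_with_spaces(line)

print(repr(naive))  # '61  2  a'
print(repr(fixed))  # '61  220a'
```

Because every byte still occupies exactly two characters, the later
[::2] and [1::2] slices should keep lining up as before.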