Path: csiph.com!usenet.pasdenom.info!dedibox.gegeweb.org!gegeweb.eu!nntpfeed.proxad.net!proxad.net!feeder1-2.proxad.net!usenet-fr.net!nerim.net!novso.com!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <roy-13C7CE.23494705012014@news.panix.com>
References: <lablra$1mc$2@ger.gmane.org> <labmaj$8u2$1@ger.gmane.org> <lad05k$gf6$1@ger.gmane.org> <CAPTjJmqBeoTLxXiKVcsvk395qgKt+Qv+jF_sOpzi7CgZmBjQcw@mail.gmail.com> <52CA13BD.4050708@stoneleaf.us> <mailman.5001.1388976943.18130.python-list@python.org> <roy-7ED5DF.23241105012014@news.panix.com> <mailman.5004.1388983234.18130.python-list@python.org> <roy-13C7CE.23494705012014@news.panix.com>
Date: Mon, 6 Jan 2014 15:59:34 +1100
Subject: Re: "More About Unicode in Python 2 and 3"
From: Chris Angelico <rosuav@gmail.com>
Cc: "python-list@python.org" <python-list@python.org>
Content-Type: text/plain; charset=UTF-8
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.5006.1388984378.18130.python-list@python.org>
Lines: 30
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:63270

On Mon, Jan 6, 2014 at 3:49 PM, Roy Smith <roy@panix.com> wrote:
> Thanks.  But, I see I didn't formulate my problem statement well.  I was
> (naively) assuming there wouldn't be a built-in codec for rot-13.  Let's
> assume there isn't; I was trying to find a case where you had to treat
> the data as integers in one place and text in another.  How would you do
> that?

I assumed that you would have checked that one, and answered
accordingly :) Though I did dig into the EBCDIC part of the question.

My thinking is that, if you're working with integers, you probably
mean either bytes (so encode it before you do stuff - typical for
crypto) or codepoints / Unicode ordinals (so use ord()/chr()). In
other languages there are ways to treat strings as though they were
arrays of integers (lots of C-derived languages treat 'a' as 97 and
"a"[0] as 97 also; some extend this to the full Unicode range), and
even there, I almost never actually use that identity. There's
only one case that I can think of where I did a lot of
string<->integer-array transmutation, and that was using a diff
function that expected an integer array - if the transformation to and
from strings hadn't been really easy, that function would probably
have been written to take strings.
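
Here's a rough sketch of both options in Py3, pretending there's no
built-in rot-13 codec. The function names (rot13_char, rot13,
xor_bytes) and the single-byte XOR "cipher" are made up purely for
illustration; they're not from any library.

```python
def rot13_char(ch):
    """Rotate one ASCII letter by 13 places; leave anything else alone."""
    cp = ord(ch)  # text -> integer (Unicode codepoint)
    if ord("a") <= cp <= ord("z"):
        return chr((cp - ord("a") + 13) % 26 + ord("a"))  # integer -> text
    if ord("A") <= cp <= ord("Z"):
        return chr((cp - ord("A") + 13) % 26 + ord("A"))
    return ch


def rot13(text):
    """Codepoint route: do the arithmetic on ord() values, rebuild with chr()."""
    return "".join(rot13_char(ch) for ch in text)


def xor_bytes(data, key):
    """Bytes route: iterating over a bytes object yields integers 0-255."""
    return bytes(b ^ key for b in data)


print(rot13("Hello, World!"))

# For the bytes route, encode first, work on the integers, decode at the end.
scrambled = xor_bytes("héllo".encode("utf-8"), 0x5A)
print(xor_bytes(scrambled, 0x5A).decode("utf-8"))
```

Either way the text/integer boundary is explicit: ord()/chr() for
codepoints, encode()/decode() for bytes.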

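The Py2 str.translate() method was a little clunky to use, but
presumably fast to execute - you build up a 256-character lookup table
(string.maketrans() helps) and translate through that. The Py3
equivalent takes a dict mapping codepoints to codepoints, strings, or
None (to delete the character). Pretty easy to use. And
str.maketrans() will build that dict from characters for you, so you
can work with codepoints or strings, as you please.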
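
Something like this, as a sketch for Py3 (the table names are just
ones I picked for the example):

```python
import string

lower = string.ascii_lowercase
upper = string.ascii_uppercase
plain = lower + upper
rotated = lower[13:] + lower[:13] + upper[13:] + upper[:13]

# Working with codepoints: keys and values are integers.
rot13_by_codepoint = {ord(src): ord(dst) for src, dst in zip(plain, rotated)}

# Working with strings: str.maketrans() builds the same kind of dict.
rot13_by_string = str.maketrans(plain, rotated)

print("Hello, World!".translate(rot13_by_codepoint))
print("Hello, World!".translate(rot13_by_string))

# Values can also be whole strings, or None to drop the character.
cleanup = {ord("ß"): "ss", ord("\t"): " ", ord("\r"): None}
print("Straße\r\n".translate(cleanup), end="")
```

Note that the dict handed to translate() has to be keyed by codepoint;
it's str.maketrans() that accepts characters and converts them.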

ChrisA