Path: csiph.com!usenet.pasdenom.info!news.albasani.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <4853fddf-5e4d-4c11-9a19-5a1dbe4cbc20@googlegroups.com>
References: <a81cd504-d889-4aa1-9daa-6df3448b4da8@googlegroups.com> <1874857c-68ef-4c1b-b15a-46ef47df9445@googlegroups.com> <mailman.3784.1345854291.4697.python-list@python.org> <1cb3f062-eb45-4b0c-977b-76afb099923c@googlegroups.com> <k1a40u$r47$2@ger.gmane.org> <mailman.3793.1345888006.4697.python-list@python.org> <f6266544-d67c-4589-a3ed-c14428ead237@googlegroups.com> <mailman.3816.1345933655.4697.python-list@python.org> <4853fddf-5e4d-4c11-9a19-5a1dbe4cbc20@googlegroups.com>
From: Ian Kelly <ian.g.kelly@gmail.com>
Date: Sun, 26 Aug 2012 09:50:41 -0600
Subject: Re: Flexible string representation, unicode, typography, ...
To: Python <python-list@python.org>
Content-Type: text/plain; charset=ISO-8859-1
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3842.1345996272.4697.python-list@python.org>
Lines: 32
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:27932

On Sun, Aug 26, 2012 at 12:59 AM,  <wxjmfauth@gmail.com> wrote:
> Sorry, you do not get it.
>
> The rune is an alias for int32. A sequence of runes is a
> sequence of int32's. Go does not spend its time using
> machinery to work with, to differentiate, to keep in memory
> this sequence according to the *characters* composing this
> "array of code points".
>
> The message is even stronger. Use runes to work comfortably [*]
> with unicode:
> rune -> int32 -> utf32 -> unicode (the perfect scheme, can't be
> better)

I understand what a rune is.  I think you've missed my complaint, which
is that although the rune is the basic building block of Unicode strings
-- it represents a single Unicode code point -- strings in Go are not
built from runes but from bytes.  If you want to do any actual work
with Unicode strings, then you have to first convert them to runes or
arrays of runes.  The conceptual cost of this is that the object
you're working with is no longer a string.
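
The byte-versus-code-point gap can be sketched in Python 3.  This is
only an analogy, not Go code: a UTF-8 bytes object stands in for a
byte-indexed string, and the sample text and the helper cut_is_valid
are made up for the illustration.

```python
# Illustration only: `raw` stands in for a byte-indexed string.
text = "na\u00efve caf\u00e9"      # 10 code points, two of them non-ASCII
raw = text.encode("utf-8")         # the same text as UTF-8 bytes

chars = len(text)                  # counts code points
nbytes = len(raw)                  # counts bytes

third_char = text[2]               # a whole character
third_byte = raw[2]                # only the lead byte of that character


def cut_is_valid(data, n):
    """Hypothetical helper: True if the first n bytes decode as UTF-8."""
    try:
        data[:n].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(chars, nbytes)
print(repr(third_char), hex(third_byte))
print(cut_is_valid(raw, 2), cut_is_valid(raw, 3))
```

Indexing or slicing the byte view works as long as the text is ASCII
and silently splits a character as soon as it is not, which is the
kind of bug the byte-based default invites.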

You call this the "perfect scheme" for working with Unicode.  Why does
the "perfect scheme" for Unicode make it *easier* to write buggy code
that only works for ASCII than to write correct code that works for
all characters?  This is IMO where Python 3 gets it right.  When you
want to work with Unicode strings, you just work with Unicode strings
-- none of this nonsense of first explicitly converting the string to
an array of ints that looks nothing like a string at a high level.
The only place Python 3 makes you worry about converting strings is at
the boundaries of your program, where decoding from bytes to strings
and back is necessary.
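
A minimal sketch of that boundary pattern, assuming UTF-8 input and
output.  The function names reverse_text and process and the
in-memory streams are hypothetical, chosen only for the example.

```python
import io


def reverse_text(text):
    """Text-level logic: takes a str, returns a str, never sees bytes."""
    return text[::-1]


def process(stream_in, stream_out, encoding="utf-8"):
    """Decode on the way in, work on str, encode on the way out."""
    data = stream_in.read()                     # bytes from outside
    text = data.decode(encoding)                # boundary: bytes -> str
    result = reverse_text(text)                 # everything here is str
    stream_out.write(result.encode(encoding))   # boundary: str -> bytes


# In-memory byte streams stand in for a file or socket.
src = io.BytesIO("caf\u00e9".encode("utf-8"))
dst = io.BytesIO()
process(src, dst)
print(dst.getvalue())
```

Reversing the raw bytes instead would scramble the two-byte encoding
of the accented letter; reversing the decoded str does not, and the
code in the middle needs no conversion step of its own.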