Re: python 3.3 repr

References (2 earlier) <roy-66E351.09004515112013@news.panix.com> <BD21979F-E8CB-41EA-9136-6C052D65DEE0@panix.com> <mailman.2660.1384526610.18130.python-list@python.org> <0d383a3c-247f-4b6a-9a18-7e7fadeb6047@googlegroups.com> <52864018.9020205@chamonix.reportlab.co.uk>
Date 2013-11-16 03:01 +1100
Subject Re: python 3.3 repr
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.2674.1384531302.18130.python-list@python.org>

On Sat, Nov 16, 2013 at 2:39 AM, Robin Becker <robin@reportlab.com> wrote:
>> Dealing with bytes and Unicode is complicated, and the 2->3 transition is
>> not easy, but let's please not spread the misunderstanding that somehow the
>> Flexible String Representation is at fault.  However you store Unicode code
>> points, they are different than bytes, and it is complex having to deal with
>> both.  You can't somehow make the dichotomy go away, you can only choose
>> where you want to think about it.
>>
>> --Ned.
>
> .......
> I don't think that's what I said; the flexible representation is just an
> added complexity that has come about because of the wish to store strings in
> a compact way. The requirement for such complexity is the unicode type
> itself (especially the storage requirements) which necessitated some
> remedial action.
>
> There's no point in fighting the change to using unicode. The type wasn't
> required for any technical reason as other languages didn't go this route
> and are reasonably ok, but there's no doubt the change made things more
> difficult.

There's no perceptible difference between a 3.2 wide build and the 3.3
flexible representation. (Where 3.3 differs from a narrow build, the
narrow build's behaviour was a bug, and it has now been fixed.) As far
as your script's concerned, Python 3.3 behaves as if it always stores
strings in UTF-32, four bytes per character. It just happens to use
far less memory, most of the time.
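
A rough way to see this for yourself, as a sketch only: the helper
bytes_per_char below is a made-up name, and it leans on sys.getsizeof,
whose numbers are a CPython implementation detail (3.3 or later is
assumed). It grows a string by a fixed number of characters and divides
the growth in size by that number. The len() results stay the same
whatever the storage width is.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by comparing two string sizes.

    Hypothetical helper for illustration; relies on CPython's
    sys.getsizeof, so treat the numbers as an implementation detail.
    """
    small = ch * 100
    large = ch * (100 + n)
    return (sys.getsizeof(large) - sys.getsizeof(small)) // n

samples = {
    "ASCII   (U+0061)": "a",
    "Latin-1 (U+00E9)": "\u00e9",
    "BMP     (U+20AC)": "\u20ac",
    "astral  (U+12345)": "\U00012345",
}

for label, ch in samples.items():
    # len() counts code points in every case; only the storage differs.
    print(label, "len =", len(ch * 3), "bytes/char ~", bytes_per_char(ch))
```

On a CPython 3.3+ interpreter the per-character figures should come out
as 1, 1, 2 and 4, while len() reports 3 for every sample.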

Other languages _have_ gone for at least some sort of Unicode support.
Unfortunately quite a few have done a half-way job and use UTF-16 as
their internal representation. That means there's no difference
between U+0012, U+0123, and U+1234, but U+12345 suddenly gets handled
differently. ECMAScript actually specifies the perverse behaviour of
treating codepoints >U+FFFF as two elements in a string, because it's
just too costly to change.
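
To make the UTF-16 point concrete, here is a small sketch in Python 3.3+
(utf16_units is a name invented for this example). It encodes each
character as UTF-16 and lists the 16-bit code units, which is what a
language with UTF-16 strings would count as string elements.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints.

    Hypothetical helper for illustration: encodes to little-endian
    UTF-16 (no BOM) and reads the result back two bytes at a time.
    """
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]

for cp in (0x12, 0x123, 0x1234, 0x12345):
    ch = chr(cp)
    units = utf16_units(ch)
    # len(ch) is always 1 in Python 3.3+; the UTF-16 unit count is not.
    print("U+%04X: len() = %d, UTF-16 units = %s"
          % (cp, len(ch), [hex(u) for u in units]))
```

The first three code points take one unit each. U+12345 becomes the
surrogate pair 0xD808 0xDF45, so a UTF-16-based string type reports a
length of 2 for what is a single character.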

There are a small number of languages that guarantee correct Unicode
handling. I believe bash scripts get this right (though I haven't
tested; string manipulation in bash isn't nearly as rich as in a proper
text parsing language, so I don't dig into it much). Pike is a very
Python-like language, and PEP 393 made Python even more Pike-like,
because Pike's strings have been variable width for as long as I've
known it. A handful of other languages also guarantee UTF-32
semantics. All of them are really easy to work with; instead of
writing your code and then going "Oh, I wonder what'll happen if I
give this thing weird characters?", you just write your code, safe in
the knowledge that there is no such thing as a "weird character"
(except for a few in the ASCII set... you may find that code breaks if
given a newline in the middle of something, or maybe the slash
confuses you).

Definitely don't fight the change to Unicode, because it's not a
change at all... it's just fixing what was buggy. You already had a
difference between bytes and characters, you just thought you could
ignore it.
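
A minimal illustration of that difference, assuming Python 3 (the
sample text is arbitrary): the same string has one length in
characters and a different length in bytes depending on the encoding,
and decoding with the wrong encoding silently gives you different text.

```python
text = "na\u00efve caf\u00e9"      # 10 characters, two of them non-ASCII

utf8 = text.encode("utf-8")       # the non-ASCII ones take 2 bytes each
latin1 = text.encode("latin-1")   # here every character takes 1 byte

print(len(text), len(utf8), len(latin1))

# Round-tripping with the right encoding gets the text back...
print(utf8.decode("utf-8") == text)

# ...but treating UTF-8 bytes as Latin-1 yields 12 characters of mojibake.
wrong = utf8.decode("latin-1")
print(len(wrong), wrong == text)
```

With pure ASCII data all three lengths agree, which is why the
distinction was easy to overlook in Python 2.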

ChrisA
