Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #35134

Re: Py 3.3, unicode / upper()

Path csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Return-Path <rosuav@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.001
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'broken': 0.03; 'python': 0.09; 'encode': 0.09; 'pep': 0.09; 'pointless': 0.09; 'sane': 0.09; 'semantics': 0.09; 'subject:()': 0.09; '\xe2\x80\x94': 0.09; 'stored': 0.10; 'dec': 0.15; 'all?': 0.16; 'bug,': 0.16; 'buggy': 0.16; 'character).': 0.16; 'codec': 0.16; 'contrived': 0.16; 'design?': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'illustrate': 0.16; 'storing': 0.16; 'subject:3.3': 0.16; 'subject:unicode': 0.16; 'translation': 0.16; 'string': 0.17; 'wrote:': 0.17; 'byte': 0.17; 'bytes': 0.17; 'example.': 0.17; 'thu,': 0.17; 'unicode': 0.17; '>>>': 0.18; 'input': 0.18; 'memory': 0.18; 'platforms': 0.18; 'windows': 0.19; '(not': 0.20; '"",': 0.22; '3.2': 0.22; 'runs': 0.22; 'example': 0.23; 'linux': 0.24; 'header:In-Reply-To:1': 0.25; 'supported': 0.26; '(most': 0.27; 'am,': 0.27; 'handling': 0.27; 'newer': 0.27; 'message-id:@mail.gmail.com': 0.27; 'strings,': 0.29; 'character': 0.29; 'points': 0.29; 'code': 0.31; 'point': 0.31; 'gets': 0.32; 'file': 0.32; 'german': 0.32; 'achieving': 0.33; 'builds': 0.33; "he's": 0.33; 'idle': 0.33; 'traceback': 0.33; 'handle': 0.33; 'to:addr:python-list': 0.33; '(with': 0.33; 'likely': 0.33; "can't": 0.34; 'received:google.com': 0.34; 'faster': 0.35; 'especially': 0.35; 'received:209.85.220': 0.35; 'received:209.85': 0.35; 'but': 0.36; 'characters': 0.36; 'anything': 0.36; 'does': 0.37; 'why': 0.37; 'rather': 0.37; 'received:209': 0.37; 'subject:: ': 0.38; 'some': 0.38; 'things': 0.38; 'gives': 0.39; 'instead': 0.39; 'to:addr:python.org': 0.39; 'build': 0.39; 'space': 0.39; 'header:Received:5': 0.40; 'think': 0.40; 'your': 0.60; 'skip:u 10': 0.60; 'most': 0.61; 'subject:, ': 0.61; 'wide': 0.62; 'different': 0.63; 'email addr:gmail.com': 0.63; 'more': 0.63; 'show': 0.63; 'our': 0.65; '20,': 0.65; 'savings': 0.75; 'counts': 0.81; 'catastrophic': 0.84; 'different.': 0.84; 'ridiculously': 0.84; 'terrible': 0.84; 'beloved': 0.91; 'yours.': 0.93
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; bh=78sLgvFKInyIZHimO4u8u9tfwN4VTjNUUop2qzo//YY=; b=jrHbYIxNmp0GMGueHUmND4VchOj8UBqr1NpVlsy38LNG/5cGCgmAy9/fX+j8LVNwLz CZbp+gBX9hDI5oAFmuKqy/9jkokmNHcclt87jh1TpXAt2p/et/9OprKLAqc+xMLgvdp0 ntTYnSXU3nvHrb2ZBpOLKg2VxIHE9ZknpwM0RBCBj6OhnHxrZ/cOcOfMyJ2as9G0L3CO kSdsmqwBWwBn3b3vpXYH1Bl2VLwX5foVlHNNcuR2RaJQ3S8L6qMl050+0V+7lWAe+/KF jVSlkhb6NZf+TSavGRj+KR68YB6RtIy3bBhJpgJDham52wtiOLdQqW1vmokPS7yy/pRp HClw==
MIME-Version 1.0
In-Reply-To <kaslsb$iue$1@news.albasani.net>
References <2adb4a25-8ea3-441f-b8c0-ee6c87e4b19f@googlegroups.com> <kaslsb$iue$1@news.albasani.net>
Date Thu, 20 Dec 2012 02:40:43 +1100
Subject Re: Py 3.3, unicode / upper()
From Chris Angelico <rosuav@gmail.com>
To python-list@python.org
Content-Type text/plain; charset=UTF-8
Content-Transfer-Encoding quoted-printable
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.1057.1355931653.29569.python-list@python.org> (permalink)
Lines 61
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1355931653 news.xs4all.nl 6934 [2001:888:2000:d::a6]:53335
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:35134

Show key headers only | View raw


On Thu, Dec 20, 2012 at 2:18 AM, Johannes Bauer <dfnsonfsduifb@gmx.de> wrote:
> On 19.12.2012 15:23, wxjmfauth@gmail.com wrote:
>> I was using the German word "Straße" (Strasse) — German
>> translation from "street" — to illustrate the catastrophic and
>> completely wrong-by-design Unicode handling in Py3.3, this
>> time from a memory point of view (not speed):
>>
>>>>> sys.getsizeof('Straße')
>> 43
>>>>> sys.getsizeof('STRAẞE')
>> 50
>>
>> instead of a sane (Py3.2)
>>
>>>>> sys.getsizeof('Straße')
>> 42
>>>>> sys.getsizeof('STRAẞE')
>> 42
>
> How do those arbitrary numbers prove anything at all? Why do you draw
> the conclusion that it's broken by design? What do you expect? You're
> very vague here. Just to show how ridiculously pointless your numers
> are, your example gives 84 on Python3.2 for any input of yours.

You may not be familiar with jmf. He's one of our resident trolls, and
he has a bee in his bonnet about PEP 393 strings, on the basis that
they take up more space in memory than a narrow build of Python 3.2
would, for a string with lots of BMP characters and one non-BMP. In
3.2 narrow builds, strings were stored in UTF-16, with *surrogate
pairs* for non-BMP characters. This means that len() counts them
twice, as does string indexing/slicing. That's a major bug, especially
as your Python code will do different things on different platforms -
most Linux builds of 3.2 are "wide" builds, storing characters in four
bytes each.

PEP 393 brings wide build semantics to all Pythons, while achieving
memory savings better than a narrow build can (with PEP 393 strings,
any all-ASCII or all-Latin-1 strings will be stored one byte per
character). Every now and then, though, jmf points out *yet again*
that his beloved and buggy narrow build consumes less memory and runs
faster than the oh so terrible 3.3 on some contrived example. It gets
rather tiresome.

Interestingly, IDLE on my Windows box can't handle the bolded
characters very well...

>>> s="\U0001d407\U0001d41e\U0001d425\U0001d425\U0001d428, \U0001d430\U0001d428\U0001d42b\U0001d425\U0001d41d!"
>>> print(s)
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    print(s)
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001d407'
in position 0: Non-BMP character not supported in Tk

I think this is most likely a case of "yeah, Windows XP just sucks".
But I have no reason or inclination to get myself a newer Windows to
find out if it's any different.

ChrisA

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 06:23 -0800
  Re: Py 3.3, unicode / upper() Thomas Bach <thbach@students.uni-mainz.de> - 2012-12-19 15:43 +0100
  Re: Py 3.3, unicode / upper() Christian Heimes <christian@python.org> - 2012-12-19 15:52 +0100
    Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 12:55 -0800
      Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 14:23 -0700
        Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:42 -0800
        Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:42 -0800
      Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 13:01 +1100
      Re: Py 3.3, unicode / upper() Westley Martínez <anikom15@gmail.com> - 2012-12-19 18:53 -0800
    Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 12:55 -0800
  Re: Py 3.3, unicode / upper() Stefan Krah <stefan-usenet@bytereef.org> - 2012-12-19 16:01 +0100
  Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 02:17 +1100
  Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-19 16:18 +0100
    Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-19 16:22 +0100
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 02:40 +1100
      Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-20 15:57 +0100
    Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 11:27 -0700
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 13:18 -0800
        Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 14:31 -0700
          Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:40 -0800
            Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:48 -0500
            Re: Py 3.3, unicode / upper() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-20 22:51 +0000
          Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:40 -0800
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 13:18 -0800
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-19 19:39 -0500
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 13:03 +1100
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-19 21:54 -0500
    Re: Py 3.3, unicode / upper() Westley Martínez <anikom15@gmail.com> - 2012-12-19 19:12 -0800
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 14:22 +1100
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 00:32 -0500
      Re: Py 3.3, unicode / upper() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-20 05:51 +0000
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:57 -0800
        Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:30 -0500
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:57 -0800
    Re: Py 3.3, unicode / upper() Serhiy Storchaka <storchaka@gmail.com> - 2012-12-27 21:00 +0200
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-27 11:36 -0800
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-27 11:36 -0800
  Re: Py 3.3, unicode / upper() Christian Heimes <christian@python.org> - 2012-12-19 16:33 +0100
    Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-29 11:16 -0800
    Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-29 11:16 -0800
  Re: Py 3.3, unicode / upper() Benjamin Peterson <benjamin@python.org> - 2012-12-19 20:25 +0000
  Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:19 -0800
    Re: Py 3.3, unicode / upper() MRAB <python@mrabarnett.plus.com> - 2012-12-20 20:20 +0000
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-21 08:19 +1100
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:12 -0500
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:59 -0500
    Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-20 17:34 -0700

csiph-web