


Re: Py 3.3, unicode / upper()

Path csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Return-Path <wxjmfauth@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.003
X-Spam-Evidence '*H*': 0.99; '*S*': 0.00; 'context': 0.05; 'python': 0.09; 'forcing': 0.09; 'pep': 0.09; 'subject:()': 0.09; 'to:addr:comp.lang.python': 0.09; 'cc:addr:python-list': 0.10; 'stored': 0.10; "wouldn't": 0.11; 'dec': 0.15; '8:40': 0.16; 'bug,': 0.16; 'cares': 0.16; 'dump': 0.16; 'non-english': 0.16; 'storing': 0.16; 'subject:3.3': 0.16; 'subject:unicode': 0.16; 'unfair': 0.16; 'wider': 0.16; 'wed,': 0.16; 'string': 0.17; 'wrote:': 0.17; 'bytes': 0.17; 'unicode': 0.17; '>>>': 0.18; 'memory': 0.18; 'platforms': 0.18; 'trying': 0.21; '3.2': 0.22; 'cc:2**0': 0.23; "i've": 0.23; 'linux': 0.24; 'least': 0.25; 'cc:addr:python.org': 0.25; 'header:In-Reply-To:1': 0.25; 'header :User-Agent:1': 0.26; '----': 0.27; 'am,': 0.27; 'fixed': 0.28; 'actual': 0.28; 'chris': 0.28; 'character.': 0.29; 'represented': 0.29; 'strings,': 0.29; 'thinks': 0.29; '8bit%:5': 0.29; 'code': 0.31; 'anybody': 0.32; 'builds': 0.33; "he's": 0.33; 'problem': 0.33; 'everyone': 0.33; 'received:google.com': 0.34; 'compared': 0.35; 'especially': 0.35; 'doing': 0.35; 'received:209.85': 0.35; 'alone': 0.36; 'characters': 0.36; 'enough': 0.36; 'optimization': 0.37; 'does': 0.37; 'rather': 0.37; 'received:209': 0.37; 'subject:: ': 0.38; 'things': 0.38; 'sure': 0.38; 'build': 0.39; 'space': 0.39; 'think': 0.40; 'your': 0.60; 'from:no real name:2**0': 0.60; 'skip:u 10': 0.60; 'most': 0.61; 'subject:, ': 0.61; 'containing': 0.61; 'solve': 0.62; 'different': 0.63; 'more': 0.63; 'our': 0.65; 'him,': 0.66; '>from': 0.75; 'counts': 0.81; 'all;': 0.84; 'complaint': 0.84; 'moral': 0.84; 'ocean.': 0.84
Newsgroups comp.lang.python
Date Wed, 19 Dec 2012 13:18:05 -0800 (PST)
In-Reply-To <mailman.1068.1355941696.29569.python-list@python.org>
Complaints-To groups-abuse@google.com
Injection-Info glegroupsg2000goo.googlegroups.com; posting-host=178.198.163.217; posting-account=ung4FAoAAAC46zhHJ0Nsnuox7M5gDvs_
References <2adb4a25-8ea3-441f-b8c0-ee6c87e4b19f@googlegroups.com> <kaslsb$iue$1@news.albasani.net> <CAPTjJmrLAe0i9rW6sCYkYBvpiPk2O=FHB0PgSq1dqNqh9Y7Zqg@mail.gmail.com> <mailman.1068.1355941696.29569.python-list@python.org>
User-Agent G2/1.0
X-Google-Web-Client true
X-Google-IP 178.198.163.217
MIME-Version 1.0
Subject Re: Py 3.3, unicode / upper()
From wxjmfauth@gmail.com
To comp.lang.python@googlegroups.com
Content-Type text/plain; charset=ISO-8859-1
Content-Transfer-Encoding quoted-printable
Cc Python <python-list@python.org>
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Message-ID <mailman.1073.1355951888.29569.python-list@python.org> (permalink)
Lines 72
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1355951888 news.xs4all.nl 6851 [2001:888:2000:d::a6]:40267
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:35158



On Wednesday, 19 December 2012 19:27:38 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 8:40 AM, Chris Angelico <rosuav@gmail.com> wrote:
> > You may not be familiar with jmf. He's one of our resident trolls, and
> > he has a bee in his bonnet about PEP 393 strings, on the basis that
> > they take up more space in memory than a narrow build of Python 3.2
> > would, for a string with lots of BMP characters and one non-BMP. In
> > 3.2 narrow builds, strings were stored in UTF-16, with *surrogate
> > pairs* for non-BMP characters. This means that len() counts them
> > twice, as does string indexing/slicing. That's a major bug, especially
> > as your Python code will do different things on different platforms -
> > most Linux builds of 3.2 are "wide" builds, storing characters in four
> > bytes each.
>
> From what I've been able to discern, his actual complaint about PEP
> 393 stems from misguided moral concerns.  With PEP-393, strings that
> can be fully represented in Latin-1 can be stored in half the space
> (ignoring fixed overhead) compared to strings containing at least one
> non-Latin-1 character.  jmf thinks this optimization is unfair to
> non-English users and immoral; he wants Latin-1 strings to be treated
> exactly like non-Latin-1 strings (I don't think he actually cares
> about non-BMP strings at all; if narrow-build Unicode is good enough
> for him, then it must be good enough for everybody).  Unfortunately
> for him, the Latin-1 optimization is rather trivial in the wider
> context of PEP-393, and simply removing that part alone clearly
> wouldn't be doing anybody any favors.  So for him to get what he
> wants, the entire PEP has to go.
>
> It's rather like trying to solve the problem of wealth disparity by
> forcing everyone to dump their excess wealth into the ocean.
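The narrow-build behaviour described in the quoted text (non-BMP characters stored as UTF-16 surrogate pairs, so len() counts them twice) can be emulated on any modern Python. This is a minimal sketch: the helper names `utf16_code_units` and `surrogate_pair` are hypothetical, and the code-unit count is derived from the standard `utf-16-le` codec rather than from a real 3.2 narrow build.

```python
def utf16_code_units(s):
    """Count the UTF-16 code units needed to store s.

    This mirrors what len() reported on a 3.2 "narrow" build: every code unit
    is 2 bytes, and each non-BMP character becomes a surrogate pair (2 units).
    """
    return len(s.encode("utf-16-le")) // 2  # "-le" avoids a BOM being counted


def surrogate_pair(ch):
    """Return the (high, low) UTF-16 surrogates for a single non-BMP character."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("BMP character: no surrogate pair needed")
    offset = cp - 0x10000            # 20-bit value split across the pair
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


# Three BMP characters plus U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP)
s = "abc\U0001D11E"
print(len(s))                   # 4: code points, as on wide builds and 3.3+
print(utf16_code_units(s))      # 5: the non-BMP character counts twice
print([hex(u) for u in surrogate_pair("\U0001D11E")])  # ['0xd834', '0xdd1e']
```

On a 3.2 narrow build `sys.maxunicode` was 0xFFFF; wide builds and every 3.3+ build report 0x10FFFF, which is one way to tell the build types apart.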

----

Latin-1 (ISO-8859-1)? Are you sure?

>>> import sys
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ab')
27
>>> sys.getsizeof('aé')
39
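Under PEP 393, CPython picks 1, 2 or 4 bytes per character depending on the widest code point in the string, and adds a fixed-size object header. The sketch below (helper name `bytes_per_char` is hypothetical) separates the two by comparing strings that differ by one character, so the header cancels out. It relies on `sys.getsizeof`, whose values are CPython implementation details; as far as I understand CPython's layout, pure-ASCII strings also use a smaller header than other compact strings, which would account for the jump from 'ab' to 'aé' in the session above.

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate the PEP 393 storage width for strings built from ch.

    Two freshly built strings that differ by exactly one character are
    compared, so the fixed object header cancels and only the
    per-character cost remains (CPython-specific).
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


samples = [
    ("U+0061 (ASCII)", "a"),
    ("U+00E9 (Latin-1)", "\u00e9"),
    ("U+0101 (BMP, beyond Latin-1)", "\u0101"),
    ("U+1D11E (non-BMP)", "\U0001D11E"),
]
for label, ch in samples:
    print(label, bytes_per_char(ch))

# Fixed overhead: compare a pure-ASCII string with a Latin-1 one of equal length.
print(sys.getsizeof("ab"), sys.getsizeof("a\u00e9"))
```

On CPython 3.3 and later the loop should report 1, 1, 2 and 4 bytes per character. The absolute numbers on the last line vary with the build (the figures in the session above appear consistent with a 32-bit build), which is why the sketch only compares differences.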

Time to go to bed. A more complete answer tomorrow.

jmf



Thread

Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 06:23 -0800
  Re: Py 3.3, unicode / upper() Thomas Bach <thbach@students.uni-mainz.de> - 2012-12-19 15:43 +0100
  Re: Py 3.3, unicode / upper() Christian Heimes <christian@python.org> - 2012-12-19 15:52 +0100
    Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 12:55 -0800
      Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 14:23 -0700
        Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:42 -0800
      Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 13:01 +1100
      Re: Py 3.3, unicode / upper() Westley Martínez <anikom15@gmail.com> - 2012-12-19 18:53 -0800
  Re: Py 3.3, unicode / upper() Stefan Krah <stefan-usenet@bytereef.org> - 2012-12-19 16:01 +0100
  Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 02:17 +1100
  Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-19 16:18 +0100
    Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-19 16:22 +0100
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 02:40 +1100
      Re: Py 3.3, unicode / upper() Johannes Bauer <dfnsonfsduifb@gmx.de> - 2012-12-20 15:57 +0100
    Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 11:27 -0700
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-19 13:18 -0800
        Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-19 14:31 -0700
          Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:40 -0800
            Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:48 -0500
            Re: Py 3.3, unicode / upper() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-20 22:51 +0000
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-19 19:39 -0500
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 13:03 +1100
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-19 21:54 -0500
    Re: Py 3.3, unicode / upper() Westley Martínez <anikom15@gmail.com> - 2012-12-19 19:12 -0800
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-20 14:22 +1100
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 00:32 -0500
      Re: Py 3.3, unicode / upper() Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-20 05:51 +0000
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:57 -0800
        Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:30 -0500
    Re: Py 3.3, unicode / upper() Serhiy Storchaka <storchaka@gmail.com> - 2012-12-27 21:00 +0200
      Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-27 11:36 -0800
  Re: Py 3.3, unicode / upper() Christian Heimes <christian@python.org> - 2012-12-19 16:33 +0100
    Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-29 11:16 -0800
  Re: Py 3.3, unicode / upper() Benjamin Peterson <benjamin@python.org> - 2012-12-19 20:25 +0000
  Re: Py 3.3, unicode / upper() wxjmfauth@gmail.com - 2012-12-20 11:19 -0800
    Re: Py 3.3, unicode / upper() MRAB <python@mrabarnett.plus.com> - 2012-12-20 20:20 +0000
    Re: Py 3.3, unicode / upper() Chris Angelico <rosuav@gmail.com> - 2012-12-21 08:19 +1100
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:12 -0500
    Re: Py 3.3, unicode / upper() Terry Reedy <tjreedy@udel.edu> - 2012-12-20 17:59 -0500
    Re: Py 3.3, unicode / upper() Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-20 17:34 -0700
