Newsgroups: comp.lang.python
Date: Thu, 20 Dec 2012 11:40:21 -0800 (PST)
From: wxjmfauth@gmail.com
To: comp.lang.python@googlegroups.com
Subject: Re: Py 3.3, unicode / upper()
References: <2adb4a25-8ea3-441f-b8c0-ee6c87e4b19f@googlegroups.com> <1fb2010e-73e4-4025-bb93-12ce7992ddab@googlegroups.com>

On Wednesday, 19 December 2012 22:31:42 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 2:18 PM, wrote:
> > latin-1 (iso-8859-1) ? are you sure ?
>
> Yes.
>
> > >>> sys.getsizeof('a')
> > 26
> > >>> sys.getsizeof('ab')
> > 27
> > >>> sys.getsizeof('aé')
> > 39
>
> Compare to:
>
> >>> sys.getsizeof('a\u0100')
> 42
>
> The reason for the difference you posted is that pure ASCII strings
> have a further optimization, which I glossed over and which is purely
> a savings in overhead:
>
> >>> sys.getsizeof('abcde') - sys.getsizeof('a')
> 4
> >>> sys.getsizeof('ábçdê') - sys.getsizeof('á')
> 4

-----

I know all of this, and it is exactly what I explained.

I do not care about this optimization. I am not an ASCII user, and
for a non-ASCII user this optimization is simply irrelevant.

What should a Python user think when he sees his strings consuming
more memory just because he uses non-ASCII characters, or sees his
strings changing just because he "uppercases" them?

Unicode is here to serve everybody.

jmf
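For readers who want to reproduce the two effects discussed here, the following is a minimal sketch, assuming CPython 3.3 or later. The helper `bytes_per_char` and the sample strings (including the "straße" example) are illustrative choices, not taken from the thread. It separates the fixed per-object overhead from the per-character storage by differencing two string sizes, then shows a case where `upper()` changes the length of a string. Absolute `sys.getsizeof` values differ between Python versions and builds, so none are stated.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for strings made only of `ch`.

    Differencing two sizes cancels the fixed object header, so only the
    payload growth remains. (Hypothetical helper, for illustration only.)
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


# One sample character per storage class (assumed CPython 3.3+ behaviour).
samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+0100": "\u0100",
    "astral U+1F600": "\U0001F600",
}

widths = {label: bytes_per_char(ch) for label, ch in samples.items()}
for label, ch in samples.items():
    # Total size of a one-character string, then the per-character width.
    print(label, sys.getsizeof(ch), widths[label])

# Uppercasing can change the length of a string: U+00DF (sharp s) has no
# single-character uppercase form in the default full case mapping.
lower = "stra\u00dfe"
upper = lower.upper()
print(lower, len(lower), "->", upper, len(upper))
```

On such a build the ASCII and Latin-1 strings both grow by one byte per character, which matches the two differences of 4 quoted above; what differs between them is only the fixed overhead of the object.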