To: python-list@python.org
From: Terry Reedy
Subject: Re: How do I display unicode value stored in a string variable using ord()
Date: Sun, 19 Aug 2012 17:03:46 -0400
Newsgroups: comp.lang.python

On 8/19/2012 1:03 PM, Blind Anagram wrote:

> Running Python from a Windows command prompt, I got the following on
> Python 3.2.3 and 3.3 beta 2:
>
> python33\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 39.3 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 51.8 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 52 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 50.3 usec per loop
> python33\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 51.6 usec per loop
> python33\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 38.3 usec per loop
> python33\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 50.3 usec per loop
>
> python32\python" -m timeit "('abc' * 1000).replace('c', 'de')"
> 10000 loops, best of 3: 24.5 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', '……')"
> 10000 loops, best of 3: 24.7 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', 'x…')"
> 10000 loops, best of 3: 24.8 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', 'œ…')"
> 10000 loops, best of 3: 24 usec per loop
> python32\python" -m timeit "('ab…' * 1000).replace('…', '€…')"
> 10000 loops, best of 3: 24.1 usec per loop
> python32\python" -m timeit "('XYZ' * 1000).replace('X', 'éç')"
> 10000 loops, best of 3: 24.4 usec per loop
> python32\python" -m timeit "('XYZ' * 1000).replace('Y', 'p?')"
> 10000 loops, best of 3: 24.3 usec per loop

This is one test repeated 7 times with essentially irrelevant
variations. The difference is less on my system (50%). Others report
seeing 3.3 as faster. When I asked on pydev, the answer was don't bother
making a tracker issue unless I was personally interested in
investigating why search is relatively slow in 3.3 on Windows. Any
change would have to not slow other operations or severely impact search
on other systems. I suggest the same answer to you.

If you seriously want to compare old and new unicode, go to
http://hg.python.org/cpython/file/tip/Tools/stringbench/stringbench.py
and click raw to download. Run on 3.2 and 3.3, ignoring the bytes times.
Here is a version of the first comparison from stringbench:

    from timeit import timeit
    print(timeit('''('NOW IS THE TIME FOR ALL GOOD PEOPLE TO COME TO THE AID OF PYTHON' * 10).lower()'''))

Results are 5.6 for 3.2 and .8 for 3.3. WOW! 3.3 is 7 times faster!

OK, not fair. I cherry picked. The 7 times speedup in 3.3 likely is at
least partly independent of the PEP 393 unicode change. The same test in
stringbench for bytes is twice as fast in 3.3 as 3.2, but only 2x, not
7x. In fact, it may have been the bytes/unicode comparison in 3.2 that
suggested that unicode case conversion of ASCII chars might be made
faster.

The sum of the 3.3 unicode times is 109 versus 110 for 3.3 bytes and 125
for 3.2 unicode. This unweighted sum is not really fair since the raw
times vary by a factor of at least 100. But it does suggest that anyone
claiming that 3.3 unicode is overall 'slower' than 3.2 unicode has some
work to do.

There is also this. On my machine, the lowest bytes-time/unicode-time
for 3.3 is .71. This suggests that there is not a lot of fluff left in
the unicode code, and that not much is lost by the bytes to unicode
switch for strings.

-- 
Terry Jan Reedy
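
A note on the bytes-time/unicode-time ratio and the unweighted sum
discussed above: since raw times differ by a factor of 100 or more, one
way to summarize them without letting the slowest tests dominate is to
compare per-test ratios. The sketch below is only an illustration under
stated assumptions: the three tests are an arbitrary selection, not the
stringbench list, the names TESTS, ratios and geometric_mean are made up
here, and the globals argument to timeit needs Python 3.5 or later.

```python
import math
from timeit import timeit

TEXT = "NOW IS THE TIME FOR ALL GOOD PEOPLE TO COME TO THE AID OF PYTHON" * 10

# (label, str statement, bytes statement): an illustrative selection,
# not the tests that stringbench actually runs.
TESTS = [
    ("lower", "s.lower()", "b.lower()"),
    ("replace", "s.replace('O', 'oo')", "b.replace(b'O', b'oo')"),
    ("find", "s.find('PYTHON!')", "b.find(b'PYTHON!')"),
]


def ratios(number=20000):
    """Return {label: bytes_time / str_time} for each test."""
    env = {"s": TEXT, "b": TEXT.encode("ascii")}
    result = {}
    for label, str_stmt, bytes_stmt in TESTS:
        str_time = timeit(str_stmt, globals=env, number=number)
        bytes_time = timeit(bytes_stmt, globals=env, number=number)
        result[label] = bytes_time / str_time
    return result


def geometric_mean(values):
    """Geometric mean, so each ratio counts equally whatever its size."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


measured = ratios()
for label, ratio in measured.items():
    print("%-8s bytes/str = %.2f" % (label, ratio))
print("lowest ratio   = %.2f" % min(measured.values()))
print("geometric mean = %.2f" % geometric_mean(measured.values()))
```

A ratio below 1 means the bytes version was faster for that test. The
values printed depend on the machine and the Python build, so they are
not expected to match the .71 reported above.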