From: wxjmfauth@gmail.com
Newsgroups: comp.lang.python
Subject: Re: Abuse of subject, was Re: Abuse of Big Oh notation
Date: Tue, 21 Aug 2012 10:16:06 -0700 (PDT)

On Tuesday, 21 August 2012 09:52:09 UTC+2, Peter Otten wrote:
> wxjmfauth@gmail.com wrote:
>
> > By chance and luckily, first attempt.
>
> > c:\python32\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 1000000 loops, best of 3: 1.48 usec per loop
> > c:\python33\python -m timeit "('€'*100+'€'*100).replace('€', 'œ')"
> > 100000 loops, best of 3: 7.62 usec per loop
>
> OK, that is roughly factor 5.
> Let's see what I get:
>
> $ python3.2 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 100000 loops, best of 3: 1.8 usec per loop
> $ python3.3 -m timeit '("€"*100+"€"*100).replace("€", "œ")'
> 10000 loops, best of 3: 9.11 usec per loop
>
> That is factor 5, too. So I can replicate your measurement on an AMD64 Linux
> system with self-built 3.3 versus system 3.2.
>
> > Note
> > The used characters are not members of the latin-1 coding
> > scheme (btw an *unusable* coding).
> > They are however charaters in cp1252 and mac-roman.
>
> You seem to imply that the slowdown is connected to the inability of latin-1
> to encode "œ" and "€" (to take the examples relevant to the above
> microbench). So let's repeat with latin-1 characters:
>
> $ python3.2 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 100000 loops, best of 3: 1.76 usec per loop
> $ python3.3 -m timeit '("ä"*100+"ä"*100).replace("ä", "ß")'
> 10000 loops, best of 3: 10.3 usec per loop
>
> Hm, the slowdown is even a tad bigger. So we can safely dismiss your theory
> that an unfortunate choice of the 8 bit encoding is causing it. Do you
> agree?

- I do not care too much about the numbers. It is an attempt to show
the principles.

- I consider latin-1 a bad coding because it is simply unusable for
some scripts / languages. That mainly concerns the coding of
source/text files and is not really the point here.

- Now, the technical aspect. This "coding" (latin-1) may be considered
a kind of pseudo-coding covering the Unicode code point range
128..255. Unfortunately, this "coding" is not optimal (or can be seen
as such) when you work with the full range of Unicode, but it is fine
when one works only in pure latin-1, with only 256 characters.
This range 128..255 is always the critical part (all codings
considered), and it probably represents the most used characters.

I hope this was not too confusing.

I have no proof for my theory. From my experience in this field, I
strongly suspect this is the bottleneck.

Same OS as before.

Py 3.2.3
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[1.5384088242603358, 1.532421642233382, 1.5327445924545433]
>>> timeit.repeat("('ä'*100+'ä'*100).replace('ä', 'ß')")
[1.561762063667686, 1.5443503206462594, 1.5458670051605168]

3.3.0b2
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[7.701523104134512, 7.720358191179441, 7.614549852683501]
>>> timeit.repeat("('ä'*100+'ä'*100).replace('ä', 'ß')")
[4.887939423990709, 4.868787294350611, 4.865697999795991]

Quite mysterious!

In any case, it is a regression.

jmf
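
A minimal sketch for rerunning the two measurements above under whichever interpreter is at hand, so that both character pairs are timed the same way. The helper names in_latin1 and bench are made up for illustration and are not part of timeit. Escapes are used instead of literal characters so the script does not depend on the coding of the source file. No timings are shown here because they depend on the machine and the Python version.

```python
import sys
import timeit

# The two character pairs used in the measurements above.
# U+20AC and U+0153 are in cp1252 but not in latin-1;
# U+00E4 and U+00DF are in both.
PAIRS = [("\u20ac", "\u0153"), ("\u00e4", "\u00df")]


def in_latin1(ch):
    """Hypothetical helper: True if ch can be encoded as latin-1."""
    try:
        ch.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


def bench(old, new, number=1000000, repeat=3):
    """Hypothetical helper: best total time in seconds over `repeat` runs."""
    # Same statement shape as in the post. With number=1000000 the
    # result in seconds can be read as usec per loop.
    stmt = "({0!r}*100+{0!r}*100).replace({0!r}, {1!r})".format(old, new)
    return min(timeit.repeat(stmt, number=number, repeat=repeat))


if __name__ == "__main__":
    print(sys.version)
    for old, new in PAIRS:
        both = in_latin1(old) and in_latin1(new)
        print("U+%04X -> U+%04X  latin-1: %s  best: %.3f s"
              % (ord(old), ord(new), both, bench(old, new)))
```

Run it once per interpreter (for example with python3.2 and then python3.3) and compare the two output lines.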