From: Terry Reedy
Newsgroups: comp.lang.python
To: python-list@python.org
Subject: Re: String concatenation benchmarking weirdness
Date: Sat, 12 Jan 2013 06:31:09 -0500

On 1/12/2013 3:38 AM, wxjmfauth@gmail.com wrote:
> from timeit import timeit, repeat
>
> size = 1000
>
> r = repeat("y = x + 'a'", setup = "x = 'a' * %i" % size)
> print('1:', r)
> r = repeat("y = x + 'é'", setup = "x = 'a' * %i" % size)
> print('2:', r)
> r = repeat("y = x + 'œ'", setup = "x = 'a' * %i" % size)
> print('3:', r)
> r = repeat("y = x + '€'", setup = "x = 'a' * %i" % size)
> print('4:', r)
> r = repeat("y = x + '€'", setup = "x = '€' * %i" % size)
> print('5:', r)
> r = repeat("y = x + 'œ'", setup = "x = 'œ' * %i" % size)
> print('6:', r)
> r = repeat("y = é + 'œ'", setup = "é = 'œ' * %i" % size)
> print('7:', r)
> r = repeat("y = é + 'œ'", setup = "é = '€' * %i" % size)
> print('8:', r)
>
>
>
>> c:\python32\pythonw -u "vitesse3.py"
> 1: [0.3603178435286996, 0.42901157137281515, 0.35459694357592086]
> 2: [0.3576409223543202, 0.4272010951864649, 0.3590055732104662]
> 3: [0.3552022735516487, 0.4256544908828328, 0.35824546465278573]
> 4: [0.35488168890607774, 0.4271707696118834, 0.36109528098614074]
> 5: [0.3560675370237849, 0.4261538782668417, 0.36138160167082134]
> 6: [0.3570182634788317, 0.4270155971913008, 0.35770629956705324]
> 7: [0.3556977225493485, 0.4264969117143753, 0.3645634239700426]
> 8: [0.35511247834379844, 0.4259628665308437, 0.3580737510097034]
>> Exit code: 0
>> c:\Python33\pythonw -u "vitesse3.py"
> 1: [0.3053600256152646, 0.3306491917840535, 0.3044963374976518]
> 2: [0.36252767208680514, 0.36937298133086727, 0.3685573415262271]
> 3: [0.7666293438924097, 0.7653473991487574, 0.7630926729867262]
> 4: [0.7636680712265038, 0.7647586103955284, 0.7631395397838059]
> 5: [0.44721085450773934, 0.3863234021671369, 0.45664368355696094]
> 6: [0.44699700013114807, 0.3873974001136613, 0.45167383387335036]
> 7: [0.4465200615491014, 0.387050034441188, 0.45459690419205856]
> 8: [0.44760587465455437, 0.3875261853459726, 0.45421212384964704]
>> Exit code: 0
>
>
> The difference between a correct (coherent) unicode handling and ...

By 'correct' Jim means 'speedy' for a subset of string operations*,
rather than 'accurate'. In 3.2 and before, CPython does not handle
extended plane characters correctly on Windows and other narrow builds.
This is, by the way, true of many other languages. For instance, Tcl 8.5
and before (not sure about the new 8.6) does not handle them at all. The
same is true of Microsoft command windows.

* Let's try another comparison:

from timeit import timeit
print(timeit("a.encode()", "a = 'a'*10000"))

3.2: 12.1 seconds
3.3:  0.7 seconds

3.3 is about 15 times faster!!! (The factor increases with the length of a.)

A fairer comparison is the approximately 120 microbenchmarks in
Tools/stringbench/stringbench.py. Here they are, uncensored, for 3.3.0
and 3.2.3. Stringbench is in the Tools directory of some distributions
but not all (the Windows installer, for one, does not include it). It can
be downloaded from
http://hg.python.org/cpython/file/6fe28afa6611/Tools/stringbench
In Firefox, right-click on the stringbench.py link and choose 'Save link
as...' to put it somewhere you can run it from.

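For anyone who wants to check the two points made earlier in this reply on their own machine, here is a rough sketch. It is mine, not part of stringbench, and the function names are made up for illustration. It reports whether a single extended plane character counts as one code point (which is what narrow builds get wrong), and it times a.encode() for a few lengths. Absolute timings will differ from the figures quoted above.

```python
import sys
from timeit import timeit


def handles_astral_correctly():
    """Return True if one non-BMP character is seen as one code point."""
    ch = "\U00010400"  # a character outside the Basic Multilingual Plane
    # Narrow builds (3.2 and earlier on Windows) store it as a surrogate
    # pair, so len(ch) is 2 and sys.maxunicode is 0xFFFF there.
    return len(ch) == 1 and sys.maxunicode == 0x10FFFF


def encode_seconds(length, number=10000):
    """Time str.encode() on an all-ASCII string of the given length."""
    return timeit("a.encode()", "a = 'a' * %d" % length, number=number)


if __name__ == "__main__":
    print("extended plane handled correctly:", handles_astral_correctly())
    for length in (100, 10000, 1000000):
        # Fewer repetitions than timeit's default, to keep the run short.
        print(length, encode_seconds(length, number=1000))
```

On 3.3 or later the first function should return True; on a narrow 3.2 build it should return False. Running the timing loop under both versions gives a rough idea of how the encode() ratio changes with the length of the string.
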
>>>
stringbench v2.0
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
2013-01-12 06:17:51.685781
bytes   unicode
(in ms) (in ms) %       comment
========== case conversion -- dense
0.41    0.43    95.2    ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.42    0.43    95.8    ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.41    0.43    95.8    ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.42    0.43    96.3    ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.83    1.95    94.1    s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.10    0.10    98.7    "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.46    2.44    100.9   dna.count("AACT") (*10)
========== count newlines
0.77    0.75    103.6   ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.30    0.27    110.5   ("A"*1000).find("A") (*1000)
0.45    0.06    750.5   "A" in "A"*1000 (*1000)
0.30    0.27    110.4   ("A"*1000).index("A") (*1000)
0.24    0.22    107.2   ("A"*1000).partition("A") (*1000)
0.33    0.29    116.6   ("A"*1000).rfind("A") (*1000)
0.32    0.29    107.9   ("A"*1000).rindex("A") (*1000)
0.20    0.21    94.1    ("A"*1000).rpartition("A") (*1000)
0.42    0.45    93.4    ("A"*1000).rsplit("A", 1) (*1000)
0.39    0.41    95.9    ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.32    0.27    121.1   ("AB"*1000).find("AB") (*1000)
0.45    0.06    729.5   "AB" in "AB"*1000 (*1000)
0.30    0.27    111.2   ("AB"*1000).index("AB") (*1000)
0.23    0.28    85.0    ("AB"*1000).partition("AB") (*1000)
0.33    0.30    110.6   ("AB"*1000).rfind("AB") (*1000)
0.33    0.30    110.5   ("AB"*1000).rindex("AB") (*1000)
0.22    0.27    83.1    ("AB"*1000).rpartition("AB") (*1000)
0.46    0.47    96.7    ("AB"*1000).rsplit("AB", 1) (*1000)
0.44    0.48    90.9    ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.24    0.29    84.0    "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.26    0.28    92.9    "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.25    0.28    90.0    "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A     0.67    0.0     "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A     0.06    0.0     "A".join("") (*100)
========== join empty string, with 5 character sep
N/A     0.06    0.0     "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
0.87    1.27    68.8    "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.14    1.54    74.0    "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.27    0.37    72.0    "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.32    0.43    75.7    "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A     1.30    0.0     "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A     1.37    0.0     "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
3.25    3.23    100.5   s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.79    2.78    100.4   s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.98    1.94    102.3   s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
3.24    3.23    100.3   s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
4.26    3.62    117.7   s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23    3.23    100.1   s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.32    2.32    100.1   s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23    3.21    100.8   s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58    3.57    100.4   s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.60    3.60    100.0   s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
3.60    3.56    101.2   s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.62    0.58    106.3   ("AB"*300+"C").find("BC") (*1000)
0.92    0.82    111.8   ("AB"*300+"CA").find("CA") (*1000)
0.73    0.33    218.8   "BC" in ("AB"*300+"C") (*1000)
0.61    0.60    101.0   ("AB"*300+"C").index("BC") (*1000)
0.54    0.82    66.4    ("AB"*300+"C").partition("BC") (*1000)
0.66    0.63    104.6   ("C"+"AB"*300).rfind("CA") (*1000)
0.91    0.88    102.3   ("BC"+"AB"*300).rfind("BC") (*1000)
0.65    0.62    105.1   ("C"+"AB"*300).rindex("CA") (*1000)
0.53    0.56    94.5    ("C"+"AB"*300).rpartition("CA") (*1000)
0.75    0.77    96.6    ("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65    0.67    97.0    ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.89    0.87    102.3   ("A"*1000).find("B") (*1000)
1.03    0.64    159.1   "B" in "A"*1000 (*1000)
0.67    0.68    98.7    ("A"*1000).partition("B") (*1000)
0.87    0.85    102.8   ("A"*1000).rfind("B") (*1000)
0.67    0.68    98.5    ("A"*1000).rpartition("B") (*1000)
0.87    0.87    99.2    ("A"*1000).rsplit("B", 1) (*1000)
0.86    0.85    101.5   ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.22    1.16    104.9   ("AB"*1000).find("BC") (*1000)
1.93    2.02    95.2    ("AB"*1000).find("CA") (*1000)
1.37    0.94    145.3   "BC" in "AB"*1000 (*1000)
1.39    2.14    65.1    ("AB"*1000).partition("BC") (*1000)
2.32    2.31    100.7   ("AB"*1000).rfind("BC") (*1000)
1.47    1.44    102.1   ("AB"*1000).rfind("CA") (*1000)
2.26    2.27    99.7    ("AB"*1000).rpartition("BC") (*1000)
2.46    2.45    100.2   ("AB"*1000).rsplit("BC", 1) (*1000)
1.15    1.16    99.1    ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.13    0.12    105.0   ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12    0.12    105.2   ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08    0.10    80.6    "A"*10 (*1000)
========== repeat 1 character 1000 times
0.16    0.18    93.1    "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11    0.13    84.4    "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.39    0.41    94.8    "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
2.02    2.36    85.6    "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
3.12    3.23    96.6    dna.replace("ATC", "ATT") (*10)
========== replace single character
0.33    0.40    82.4    "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.75    0.86    87.4    "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.41    0.48    86.1    "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.14    0.18    79.3    ("Here are some words. "*2).partition(" ") (*1000)
0.11    0.14    75.1    ("Here are some words. "*2).rpartition(" ") (*1000)
0.35    0.39    90.3    ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32    0.38    83.9    ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.74    2.02    86.3    "...text...".rsplit("\n") (*10)
1.69    1.97    85.5    "...text...".split("\n") (*10)
1.89    2.55    74.0    "...text...".splitlines() (*10)
========== split newlines
0.35    0.39    88.9    "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.34    0.40    86.4    "this\nis\na\ntest\n".split("\n") (*1000)
0.32    0.40    80.7    "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.28    2.30    99.1    dna.rsplit("ACTAT") (*10)
2.63    2.66    98.9    dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.55    0.69    79.0    "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.58    0.70    82.9    "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.51    2.12    71.4    human_text.rsplit() (*10)
1.51    2.05    73.6    human_text.split() (*10)
========== split whitespace (small)
0.48    0.68    70.1    ("Here are some words. "*2).rsplit() (*1000)
0.48    0.64    74.9    ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.24    0.25    95.9    "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.24    0.25    95.7    "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.23    0.25    95.4    "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.09    0.21    44.1    s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.09    0.12    74.0    "\nHello!".rstrip() (*1000)
0.09    0.12    74.0    "Hello!\n".rstrip() (*1000)
0.09    0.12    71.6    "\nHello!\n".strip() (*1000)
0.09    0.12    73.2    "\nHello!".strip() (*1000)
0.09    0.12    72.9    "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.09    0.13    69.6    "\t \tHello".rstrip() (*1000)
0.09    0.13    72.3    "Hello\t \t".rstrip() (*1000)
0.07    0.08    86.8    "Hello\t \t".strip() (*1000)
========== tab split
0.59    0.65    90.9    GFF3_example.rsplit("\t", 8) (*1000)
0.55    0.59    94.2    GFF3_example.rsplit("\t") (*1000)
0.52    0.57    90.7    GFF3_example.split("\t", 8) (*1000)
0.52    0.57    90.1    GFF3_example.split("\t") (*1000)
108.87  116.31  93.6    TOTAL

>>>
stringbench v2.0
3.2.3 (default, Apr 11 2012, 07:12:16) [MSC v.1500 64 bit (AMD64)]
2013-01-12 06:23:05.994000
bytes   unicode
(in ms) (in ms) %       comment
========== case conversion -- dense
0.63    3.01    21.0    ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.63    2.90    21.5    ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.84    2.83    29.8    ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.50    3.47    14.3    ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.82    1.75    103.9   s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.09    0.08    115.5   "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.40    2.64    91.1    dna.count("AACT") (*10)
========== count newlines
0.77    0.75    101.6   ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.19    0.18    101.9   ("A"*1000).find("A") (*1000)
0.39    0.05    824.7   "A" in "A"*1000 (*1000)
0.19    0.19    96.3    ("A"*1000).index("A") (*1000)
0.20    0.22    87.5    ("A"*1000).partition("A") (*1000)
0.20    0.20    101.8   ("A"*1000).rfind("A") (*1000)
0.20    0.20    101.2   ("A"*1000).rindex("A") (*1000)
0.18    0.22    82.5    ("A"*1000).rpartition("A") (*1000)
0.41    0.45    91.7    ("A"*1000).rsplit("A", 1) (*1000)
0.42    0.43    99.0    ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.19    0.19    102.3   ("AB"*1000).find("AB") (*1000)
0.39    0.05    781.6   "AB" in "AB"*1000 (*1000)
0.19    0.20    97.9    ("AB"*1000).index("AB") (*1000)
0.23    0.33    71.1    ("AB"*1000).partition("AB") (*1000)
0.20    0.20    101.6   ("AB"*1000).rfind("AB") (*1000)
0.20    0.20    100.1   ("AB"*1000).rindex("AB") (*1000)
0.22    0.31    70.4    ("AB"*1000).rpartition("AB") (*1000)
0.47    0.53    90.0    ("AB"*1000).rsplit("AB", 1) (*1000)
0.45    0.52    85.0    ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.18    0.18    97.6    "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.18    0.18    100.4   "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.18    0.18    97.1    "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A     0.53    0.0     "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A     0.05    0.0     "A".join("") (*100)
========== join empty string, with 5 character sep
N/A     0.05    0.0     "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
1.02    1.02    99.6    "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.25    1.48    84.4    "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.31    0.25    122.9   "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.36    0.41    88.4    "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A     1.06    0.0     "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A     1.22    0.0     "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
2.52    2.68    94.0    s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.35    3.06    76.9    s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.55    1.61    96.2    s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
2.51    2.68    94.0    s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
3.57    4.66    76.7    s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23    3.24    99.8    s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.35    2.56    91.7    s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23    3.24    99.8    s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58    3.92    91.4    s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.62    3.96    91.4    s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
2.89    3.38    85.4    s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.52    0.52    99.5    ("AB"*300+"C").find("BC") (*1000)
0.69    0.90    76.5    ("AB"*300+"CA").find("CA") (*1000)
0.67    0.37    179.2   "BC" in ("AB"*300+"C") (*1000)
0.51    0.53    96.8    ("AB"*300+"C").index("BC") (*1000)
0.48    0.81    59.3    ("AB"*300+"C").partition("BC") (*1000)
0.55    0.55    101.5   ("C"+"AB"*300).rfind("CA") (*1000)
0.85    0.85    100.0   ("BC"+"AB"*300).rfind("BC") (*1000)
0.55    0.55    100.3   ("C"+"AB"*300).rindex("CA") (*1000)
0.52    0.60    87.1    ("C"+"AB"*300).rpartition("CA") (*1000)
0.78    0.82    95.4    ("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65    0.72    91.2    ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.77    0.77    100.6   ("A"*1000).find("B") (*1000)
0.98    0.63    155.1   "B" in "A"*1000 (*1000)
0.66    0.66    99.7    ("A"*1000).partition("B") (*1000)
0.77    0.77    100.4   ("A"*1000).rfind("B") (*1000)
0.66    0.66    99.7    ("A"*1000).rpartition("B") (*1000)
0.88    0.88    100.4   ("A"*1000).rsplit("B", 1) (*1000)
0.88    0.87    101.2   ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.19    1.21    98.1    ("AB"*1000).find("BC") (*1000)
1.79    2.51    71.2    ("AB"*1000).find("CA") (*1000)
1.28    1.08    119.1   "BC" in "AB"*1000 (*1000)
1.10    2.11    52.1    ("AB"*1000).partition("BC") (*1000)
2.37    2.37    100.0   ("AB"*1000).rfind("BC") (*1000)
1.36    1.36    100.5   ("AB"*1000).rfind("CA") (*1000)
2.25    2.26    99.9    ("AB"*1000).rpartition("BC") (*1000)
2.38    2.62    90.7    ("AB"*1000).rsplit("BC", 1) (*1000)
1.18    1.30    90.1    ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.12    0.32    37.1    ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12    0.30    37.9    ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08    0.09    90.3    "A"*10 (*1000)
========== repeat 1 character 1000 times
0.16    0.19    82.2    "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11    0.12    98.3    "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.40    0.58    67.9    "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
1.95    2.13    91.7    "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
2.93    3.25    90.3    dna.replace("ATC", "ATT") (*10)
========== replace single character
0.25    0.26    96.6    "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.73    1.01    72.0    "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.30    0.34    89.0    "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.12    0.13    93.3    ("Here are some words. "*2).partition(" ") (*1000)
0.11    0.11    98.8    ("Here are some words. "*2).rpartition(" ") (*1000)
0.32    0.37    86.5    ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32    0.33    96.9    ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.76    2.19    80.5    "...text...".rsplit("\n") (*10)
1.72    2.10    81.9    "...text...".split("\n") (*10)
1.87    2.58    72.4    "...text...".splitlines() (*10)
========== split newlines
0.36    0.34    103.9   "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.35    0.33    105.9   "this\nis\na\ntest\n".split("\n") (*1000)
0.31    0.34    89.7    "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.18    2.34    93.4    dna.rsplit("ACTAT") (*10)
2.50    2.64    94.5    dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.59    0.62    95.3    "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.55    0.59    93.1    "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.54    2.34    65.5    human_text.rsplit() (*10)
1.51    2.22    68.3    human_text.split() (*10)
========== split whitespace (small)
0.46    0.60    76.5    ("Here are some words. "*2).rsplit() (*1000)
0.45    0.51    87.6    ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.18    0.18    97.3    "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.18    0.18    100.1   "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.17    0.18    96.8    "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.11    0.21    52.0    s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.06    0.07    92.1    "\nHello!".rstrip() (*1000)
0.06    0.07    92.2    "Hello!\n".rstrip() (*1000)
0.06    0.07    91.2    "\nHello!\n".strip() (*1000)
0.06    0.07    91.1    "\nHello!".strip() (*1000)
0.06    0.07    91.1    "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.07    0.07    89.4    "\t \tHello".rstrip() (*1000)
0.07    0.07    91.4    "Hello\t \t".rstrip() (*1000)
0.04    0.05    88.7    "Hello\t \t".strip() (*1000)
========== tab split
0.57    0.56    100.8   GFF3_example.rsplit("\t", 8) (*1000)
0.53    0.53    100.7   GFF3_example.rsplit("\t") (*1000)
0.49    0.49    101.2   GFF3_example.split("\t", 8) (*1000)
0.51    0.49    103.5   GFF3_example.split("\t") (*1000)
102.13  125.57  81.3    TOTAL

-- 
Terry Jan Reedy