


Re: Find and Replace Simplification

From Dave Angel <davea@davea.name>
Subject Re: Find and Replace Simplification
Date Sat, 20 Jul 2013 14:04:51 -0400
Newsgroups comp.lang.python
Message-ID <mailman.4927.1374343508.3114.python-list@python.org>



On 07/20/2013 01:03 PM, Joshua Landau wrote:
> On 20 July 2013 12:57, Serhiy Storchaka <storchaka@gmail.com> wrote:
>> 20.07.13 14:16, Joshua Landau wrote:
>>>

     <snip>

>>> However, some quick timing shows that translate has a very
>>> high penalty for missing characters and is a tad slower anyway.
>>>
>>> Really, though, there should be no reason for .translate() to be
>>> slower than replace -- at worst it should just be "reduce(lambda s,
>>> ab: s.replace(*ab), mapping.items()¹, original_str)" and end up the
>>> *same* speed as iterated replace.
>>
>>
>> It doesn't work that way. Consider
>> 'ab'.translate({ord('a'):'b',ord('b'):'a'}).
>
> *sad*
>
> Still, it seems to me that it should be optimizable for sensible
> builtin types such that .translate is significantly faster, as there's
> no theoretical extra work that .translate *has* to do that .replace
> does not, and .replace also has to rebuild the string a lot of times.
>

translate is going to be faster (than replace) for Unicode if it has a
"large" table, for example when translating from ASCII to EBCDIC, where
every character in the string is replaced by a new one.  I have no idea
what the cutoff is.  But of course, for a case like ASCII to EBCDIC, it
would be very tricky to do it with replaces: an earlier pass can produce
a character that a later pass then replaces again, so it would probably
take much more than the expected 96 passes.
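
To make the ordering problem concrete, here is a minimal Python 3 sketch using the two-character swap table quoted above. The helper name iterated_replace is made up for this illustration; it is just the naive one-replace-per-entry loop, not anything from the standard library.

```python
def iterated_replace(text, mapping):
    """Apply str.replace once per mapping entry, in dict order (naive approach)."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

swap = {"a": "b", "b": "a"}

# translate looks at each original character exactly once
translated = "ab".translate(str.maketrans(swap))

# iterated replace feeds the output of one pass into the next
replaced = iterated_replace("ab", swap)

print(translated)  # ba
print(replaced)    # aa
```

The first pass turns "ab" into "bb", and the second pass then turns both b's into a's, which is why a chain of replaces needs temporary placeholder characters (and so extra passes) whenever the source and target alphabets overlap.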

translate for byte strings is undoubtedly tons faster.  For byte
strings, the translation table is 256 bytes, and the inner loop is a
simple lookup.  But for Unicode, the table is a dict (or something very
like it; I looked at the C code, not the Python code).
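
The difference in table shape is visible from Python 3 itself. This is only a small illustration using the standard bytes.maketrans and str.maketrans helpers; the variable names are arbitrary.

```python
# bytes: the table is a flat 256-byte object, indexed directly by byte value
byte_table = bytes.maketrans(b"abc", b"xyz")
print(type(byte_table).__name__, len(byte_table))   # bytes 256
print(b"aabbcc--".translate(byte_table))            # b'xxyyzz--'

# str: the table is a mapping from code point (int) to replacement
str_table = str.maketrans("abc", "xyz")
print(type(str_table).__name__, len(str_table))     # dict 3
print("aabbcc--".translate(str_table))              # xxyyzz--
```

Note that the bytes table has an entry for every possible byte, changed or not, while the str table only holds the code points that were explicitly mapped.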

So for every character in the input string, it does a dict-type lookup, 
before it can even decide if the character is going to change.
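
As a rough pure-Python model of that per-character work (a sketch only: the real loop is C code in unicodeobject.c and differs in detail, and the name model_translate is made up here):

```python
def model_translate(text, table):
    """Rough model of str.translate: one mapping lookup per input character."""
    out = []
    for ch in text:
        try:
            # the lookup happens even when the character is not in the table
            repl = table[ord(ch)]
        except LookupError:
            out.append(ch)          # unmapped: keep the character as-is
            continue
        if repl is None:
            continue                # mapped to None: delete the character
        out.append(chr(repl) if isinstance(repl, int) else repl)
    return "".join(out)

table = str.maketrans({"a": "b", "b": "a", "x": None})
print(model_translate("abxc", table))   # bac
assert model_translate("abxc", table) == "abxc".translate(table)
```

In this model a character that is missing from the table still costs a full lookup, plus the handling of the failed lookup, before it is copied through unchanged.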

Just for reference, the two files I was looking at were:

Objects/unicodeobject.c
Objects/bytesobject.c

Extracted from the bz2 downloaded from the page:
     http://hg.python.org/cpython


-- 
DaveA


