Re: Find and Replace Simplification

From Dave Angel <davea@davea.name>
Subject Re: Find and Replace Simplification
Date 2013-07-20 14:04 -0400
References (1 earlier) <51e967bb$0$29971$c3e8da3$5496439d@news.astraweb.com> <ksbt1a$5q1$1@ger.gmane.org> <CAN1F8qXttLWtMFDED-+gEdOR_5tZmDKtqDsF2kdojCRSMAn7eg@mail.gmail.com> <ksdtu7$ctm$1@ger.gmane.org> <CAN1F8qVv0N1D=JEkvWeQePhZg7fT7ULPPXp+ZTOmu=wQCzXiHQ@mail.gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.4927.1374343508.3114.python-list@python.org>


On 07/20/2013 01:03 PM, Joshua Landau wrote:
> On 20 July 2013 12:57, Serhiy Storchaka <storchaka@gmail.com> wrote:
>> 20.07.13 14:16, Joshua Landau wrote:
>>>

     <snip>

>>> However, some quick timing shows that translate has a very
>>> high penalty for missing characters and is a tad slower any way.
>>>
>>> Really, though, there should be no reason for .translate() to be
>>> slower than replace -- at worst it should just be "reduce(lambda s,
>>> ab: s.replace(*ab), mapping.items()¹, original_str)" and end up the
>>> *same* speed as iterated replace.
>>
>>
>> It doesn't work such way. Consider
>> 'ab'.translate({ord('a'):'b',ord('b'):'a'}).
>
> *sad*
>
> Still, it seems to me that it should be optimizable for sensible
> builtin types such that .translate is significantly faster, as there's
> no theoretical extra work that .translate *has* to do that .replace
> does not, and .replace also has to rebuild the string a lot of times.
>

translate is going to be faster than replace for Unicode if it has a
"large" table, for example when translating from ASCII to EBCDIC, where
every character in the string is replaced by a new one.  I have no idea
what the cutoff is.  But for a case like ASCII to EBCDIC, it would be
very tricky to do it with replaces at all, and it would probably take
many more than the expected 96 passes.
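
To see why a chain of replaces is tricky as soon as a replacement character is also a key, here is a minimal sketch using Serhiy's two-character swap. The names mapping, original, translated and chained are only for illustration, and the reduce line is Joshua's expression with functools.reduce spelled out for Python 3.

```python
from functools import reduce

mapping = {"a": "b", "b": "a"}   # swap the two characters
original = "ab"

# translate makes one pass: each input character is looked up once,
# and the output of one substitution is never looked up again.
translated = original.translate(str.maketrans(mapping))

# Iterated replace: every pass works on the result of the previous pass,
# so the 'b' produced by the first replace is turned back into 'a'.
chained = reduce(lambda s, ab: s.replace(*ab), mapping.items(), original)

print(translated)  # ba
print(chained)     # aa
```

Getting the swap right with replaces needs at least one temporary placeholder character, which is where the extra passes come from.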

translate for byte strings is undoubtedly tons faster.  For byte
strings, the translation table is 256 bytes, and the inner loop is a
simple lookup.  But for Unicode, the table is a dict (or something very
like it; I looked at the C code, not the Python code).

So for every character in the input string, it does a dict-type lookup, 
before it can even decide if the character is going to change.
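
Since the cutoff is unknown, the only honest answer is to measure. The following is a rough timing sketch, not a benchmark: the test string, the three-entry mapping and the repeat count are arbitrary assumptions, and the result will depend on the Python version, the string contents and the table size.

```python
import timeit

# Hypothetical test data: a longish ASCII string and a small mapping.
text = "the quick brown fox jumps over the lazy dog " * 1000
pairs = {"a": "1", "e": "2", "o": "3"}
table = str.maketrans(pairs)

def with_translate():
    # One pass, one table lookup per character.
    return text.translate(table)

def with_replace():
    # One full pass over the string per mapping entry.
    s = text
    for old, new in pairs.items():
        s = s.replace(old, new)
    return s

# No replacement character is itself a key, so both agree on this input.
assert with_translate() == with_replace()

t_translate = timeit.timeit(with_translate, number=200)
t_replace = timeit.timeit(with_replace, number=200)
print(f"translate: {t_translate:.4f}s  replace: {t_replace:.4f}s")
```

Growing pairs toward a full character-set mapping while keeping text fixed is one way to look for the crossover point on a given interpreter.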

Just for reference, the two files I was looking at were:

objects/unicodeobject.c
objects/bytesobject.c

Extracted from the bz2 downloaded from the page:
     http://hg.python.org/cpython


-- 
DaveA
