Re: Find and Replace Simplification

From Joshua Landau <joshua@landau.ws>
Date Sat, 20 Jul 2013 19:37:28 +0100
Subject Re: Find and Replace Simplification
To Dave Angel <davea@davea.name>
Newsgroups comp.lang.python



On 20 July 2013 19:04, Dave Angel <davea@davea.name> wrote:
> On 07/20/2013 01:03 PM, Joshua Landau wrote:
>>
>> Still, it seems to me that it should be optimizable for sensible
>> builtin types such that .translate is significantly faster, as there's
>> no theoretical extra work that .translate *has* to do that .replace
>> does not, and .replace also has to rebuild the string a lot of times.
>>
>
> translate is going to be faster (than replace) for Unicode if it has a
> "large" table.  For example, to translate from ASCII to EBCDIC, where every
> character in the string is replaced by a new one.  I have no idea what the
> cutoff is.  But of course, for a case like ASCII to EBCDIC, it would be very
> tricky to do it with replaces, probably taking much more than the expected
> 96 passes.

My timings showed that when emulating ".upper()", with .replace doing
the full 26 passes ("a" -> "A" and so on), .translate was *way* slower
than .replace unless the table was a list or equivalent with much
faster lookup. Even then, .translate was still slower.

I agree that for large tables it's obviously going to swing the other
way, but by the time you're running .replace 26 times you wouldn't (at
least I wouldn't) expect it still to be screamingly faster than
.translate.
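
A minimal sketch of how a comparison like the one described above could
be set up. The sample text, the helper names and the repeat count are
assumptions made for illustration, not the original test, and the
relative timings will depend on the Python version and the input.

```python
import string
import timeit

# Hypothetical sample input: ASCII-only text, repeated to give the loops some work.
text = "the quick brown fox jumps over the lazy dog " * 1000

def upper_by_replace(s):
    # One full pass over the string per letter: 26 passes, 26 new strings.
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
        s = s.replace(lower, upper)
    return s

# Dict table: maps code point -> code point (what str.maketrans returns).
dict_table = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)

# List table: indexed by code point, covering 0..127. Characters beyond the
# end of the list raise IndexError inside translate and are left unchanged.
list_table = [dict_table.get(i, i) for i in range(128)]

def upper_by_translate(s, table):
    # A single pass over the string, with one table lookup per character.
    return s.translate(table)

# All three approaches should agree before their timings are compared.
expected = text.upper()
assert upper_by_replace(text) == expected
assert upper_by_translate(text, dict_table) == expected
assert upper_by_translate(text, list_table) == expected

for label, func in [
    ("26 x .replace", lambda: upper_by_replace(text)),
    (".translate (dict)", lambda: upper_by_translate(text, dict_table)),
    (".translate (list)", lambda: upper_by_translate(text, list_table)),
]:
    seconds = timeit.timeit(func, number=20)
    print(f"{label:20s} {seconds:.4f} s")
```

The equality checks matter more than the printed numbers: a benchmark
only means something if every variant produces the same string.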

> translate for byte strings is undoubtedly tons faster.  For byte strings,
> the translation table is 256 bytes, and the inner loop is a simple lookup.

For byte strings, running my above test, .translate is about 10x faster
than iterated .replace.
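
A sketch of the same kind of comparison on byte strings, assuming that
is the case the figure above refers to. The sample data and helper names
are made up for illustration; bytes.maketrans and bytes.translate are
the standard methods, and the printed timings are not meant to reproduce
the 10x figure.

```python
import string
import timeit

# Hypothetical sample input, as bytes this time.
data = b"the quick brown fox jumps over the lazy dog " * 1000

lower = string.ascii_lowercase.encode("ascii")
upper = string.ascii_uppercase.encode("ascii")

# bytes.maketrans builds a 256-byte table: one output byte per input byte value.
byte_table = bytes.maketrans(lower, upper)
assert len(byte_table) == 256

def upper_by_replace_bytes(b):
    # Iterating over bytes yields ints, so wrap each one back into a 1-byte string.
    for lo, up in zip(lower, upper):
        b = b.replace(bytes([lo]), bytes([up]))
    return b

def upper_by_translate_bytes(b):
    # One pass; each byte is used directly as an index into the 256-byte table.
    return b.translate(byte_table)

# Both approaches should agree with the built-in before being timed.
assert upper_by_replace_bytes(data) == upper_by_translate_bytes(data) == data.upper()

for label, func in [
    ("26 x .replace", lambda: upper_by_replace_bytes(data)),
    (".translate", lambda: upper_by_translate_bytes(data)),
]:
    seconds = timeit.timeit(func, number=20)
    print(f"{label:16s} {seconds:.4f} s")
```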

> But for Unicode, the table is a dict (or something very like it, I looked at
> the C code, not the Python code).
>
> So for every character in the input string, it does a dict-type lookup,
> before it can even decide if the character is going to change.

The problem can be solved, I'd imagine, for builtin types. Just build
an internal representation upon calling .translate that's faster. It's
especially easy in the list case -- just build a C array¹ at the start
mapping int -> int and then have really fast C mapping speeds.

For dictionaries, you can do the same thing -- you just have to make
sure you're not breaking any memory barriers.

¹ I don't do C or other low level languages, so my knowledge in this
area is embarrassingly bad
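
To make the idea above concrete, here is a pure-Python sketch of
flattening a table into an int -> int array once and then doing plain
index lookups. It only illustrates the shape of the idea and says
nothing about how CPython implements .translate; the function names and
the 128-entry size are hypothetical, and it assumes the table maps code
points only to code points (no None deletions, no string replacements).

```python
from array import array

def build_flat_table(table, size=128):
    """Flatten a dict or list translation table into an array of code points.

    Hypothetical helper: entries missing from `table` keep the identity mapping.
    """
    flat = array("L", range(size))  # start from identity: i -> i
    for code_point in range(size):
        try:
            flat[code_point] = table[code_point]
        except LookupError:
            pass  # KeyError (dict) or IndexError (short list): leave unchanged
    return flat

def translate_with_flat_table(s, flat):
    """Translate `s` using the flat array; characters beyond it pass through."""
    size = len(flat)
    out = []
    for ch in s:
        cp = ord(ch)
        out.append(chr(flat[cp]) if cp < size else ch)
    return "".join(out)

# Usage with a dict table, checked against str.translate itself.
table = str.maketrans("abc", "xyz")
flat = build_flat_table(table)
sample = "aabbcc é"
assert translate_with_flat_table(sample, flat) == sample.translate(table)
```

In Python this is of course slower than the built-in; the point is only
that the per-character work becomes an array index instead of a
dict-style lookup, which is what a C-level version would exploit.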

> Just for reference, the two files I was looking at were:
>
> objects/unicodeobject.c
> objects/bytesobject.c
>
> Extracted from the bz2 downloaded from the page:
>     http://hg.python.org/cpython

I didn't look at bytes the first time; I might take a look later.
