
Re: Unicode and Python - how often do you index strings?

Date Wed, 4 Jun 2014 12:13:45 +1000
Subject Re: Unicode and Python - how often do you index strings?
From Chris Angelico <rosuav@gmail.com>


On Wed, Jun 4, 2014 at 11:18 AM, Roy Smith <roy@panix.com> wrote:
> In article <mailman.10656.1401842403.18130.python-list@python.org>,
>  Chris Angelico <rosuav@gmail.com> wrote:
>
>> A current discussion regarding Python's Unicode support centres (or
>> centers, depending on how close you are to the cent[er]{2} of the
>> universe)
>
> <sarcasm style="regex-pedant">Um, you mean cent(er|re), don't you?  The
> pattern you wrote also matches centee and centrr.</sarcasm>

Maybe there's someone who spells it that way! Let's not be excluding
people. That'd be rude.
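
For anyone who wants to see the difference between the two patterns, here is a quick check with Python's re module. The variable names (loose, strict) and the word list are just for illustration; the patterns themselves are the ones quoted above.

```python
import re

# The pattern from the original post: "cent" followed by any two of e/r.
loose = re.compile(r"cent[er]{2}")
# The pattern Roy suggests: "cent" followed by exactly "er" or "re".
strict = re.compile(r"cent(?:er|re)")

for word in ["center", "centre", "centee", "centrr"]:
    print(word, bool(loose.fullmatch(word)), bool(strict.fullmatch(word)))
```

The character class accepts all four words, while the alternation accepts only the first two.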

>> around one critical question: Is string indexing common?
>
> Not in our code.  I've got 80008 non-blank lines of Python (2.7) source
> handy.  I tried a few heuristics to find patterns which might be string
> indexing.
>
> $ find . -name '*.py' | xargs egrep '\[[^]][0-9]+\]'
>
> and then looked them over manually.  I see this pattern a bunch of times
> (in a single-use script):
>
> data['shard_key'] = hashlib.md5(str(id)).hexdigest()[:4]

Slicing is a form of indexing too, although in this case (slicing from
the front) it could be implemented on top of UTF-8 without much
problem.
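
A rough sketch of what "slicing from the front" could look like on a UTF-8 buffer, assuming the data is already valid UTF-8. The function name utf8_prefix is made up for this example, and this is not how CPython stores or slices strings; it only shows that a prefix slice needs a single forward scan that stops early.

```python
def utf8_prefix(data: bytes, n: int) -> bytes:
    """Return the bytes encoding the first n code points of valid UTF-8 data."""
    count = 0
    for i, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx are continuation bytes;
        # any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            if count == n:
                # We have reached the start of code point n: cut here.
                return data[:i]
            count += 1
    # Fewer than n + 1 code points: the whole buffer is the prefix.
    return data


encoded = "héllo wörld".encode("utf-8")
print(utf8_prefix(encoded, 4).decode("utf-8"))
```

The scan touches only as many bytes as the prefix needs, so a short slice like [:4] stays cheap however long the string is.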

> withhyphen = number if '-' in number else (number[:-2] + '-' +
> number[-2:]) # big assumption here

This *definitely* counts; if strings were represented internally in
UTF-8, this would involve two scans (although a smart implementation
could probably count backward rather than forward). By the way, any
time you slice up to the third from the end, you win two extra awesome
points, just for putting [:-3] into your code and having it mean
something. But I digress.
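
To make the "count backward" idea concrete, here is a minimal sketch, again assuming valid UTF-8 input. The helper name utf8_tail_offset and the sample value are invented for illustration. One backward scan finds the byte offset of the split point, and both halves of the number[:-2] / number[-2:] expression can then reuse it.

```python
def utf8_tail_offset(data: bytes, n: int) -> int:
    """Byte offset where the last n code points of valid UTF-8 data begin."""
    if n <= 0:
        return len(data)
    count = 0
    # Walk from the last byte toward the front.
    for i in range(len(data) - 1, -1, -1):
        # Skip continuation bytes (0b10xxxxxx); count only lead bytes.
        if data[i] & 0xC0 != 0x80:
            count += 1
            if count == n:
                return i
    # Fewer than n code points: the tail is the whole buffer.
    return 0


number = "12345é7".encode("utf-8")
cut = utf8_tail_offset(number, 2)
with_hyphen = number[:cut] + b"-" + number[cut:]
print(with_hyphen.decode("utf-8"))
```

A naive implementation would instead scan forward from the start once for each slice, which is the "two scans" cost mentioned above.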

> Anyway, there's a bunch more, but the bottom line is that in our code,
> indexing into a string (at least explicitly in application source code)
> is a pretty rare thing.

Thanks. Of course, the pattern you searched for is looking only for
literals; it's a bit harder to find cases where the index (or slice
position) comes from a variable or expression, and those situations
are also rather harder to optimize (the MD5 prefix is clearly better
scanned from the front, the number tail is clearly better scanned from
the back - but with a variable?).
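
One possible way to hunt for the non-literal cases is to walk the syntax tree rather than grep. The sketch below assumes Python 3.9 or later (where ast.Subscript.slice holds the index expression directly), and the function name variable_subscripts is hypothetical. It cannot tell strings from lists or dicts, so the hits would still need looking over manually.

```python
import ast


def variable_subscripts(source: str):
    """Yield (line, text) for subscripts whose index or slice bound is not a literal."""

    def is_literal(node):
        if node is None:  # omitted slice bound, as in s[:4]
            return True
        if isinstance(node, ast.Constant):
            return True
        # Negative literals such as -2 parse as UnaryOp(USub, Constant).
        return isinstance(node, ast.UnaryOp) and isinstance(node.operand, ast.Constant)

    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Subscript):
            continue
        sl = node.slice
        parts = [sl.lower, sl.upper, sl.step] if isinstance(sl, ast.Slice) else [sl]
        if not all(is_literal(p) for p in parts):
            yield node.lineno, ast.get_source_segment(source, node)


sample = "a = s[:4]\nb = s[i]\nc = s[-2:]\nd = s[n - 1:]\n"
for lineno, text in variable_subscripts(sample):
    print(lineno, text)
```

On the sample, only s[i] and s[n - 1:] are reported; the literal slices are skipped.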

ChrisA



