Re: Unicode and Python - how often do you index strings?

References <mailman.10656.1401842403.18130.python-list@python.org> <roy-9D3770.21181203062014@news.panix.com>
Date 2014-06-04 12:13 +1000
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.10664.1401848034.18130.python-list@python.org>


On Wed, Jun 4, 2014 at 11:18 AM, Roy Smith <roy@panix.com> wrote:
> In article <mailman.10656.1401842403.18130.python-list@python.org>,
>  Chris Angelico <rosuav@gmail.com> wrote:
>
>> A current discussion regarding Python's Unicode support centres (or
>> centers, depending on how close you are to the cent[er]{2} of the
>> universe)
>
> <sarcasm style="regex-pedant">Um, you mean cent(er|re), don't you?  The
> pattern you wrote also matches centee and centrr.</sarcasm>

Maybe there's someone who spells it that way! Let's not be excluding
people. That'd be rude.
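
For anyone who wants to check the pedantry, here is a minimal sketch comparing the two patterns quoted above. The names `loose` and `strict` and the word list are only illustrative; the patterns themselves are taken verbatim from the exchange.

```python
import re

# Pattern from the original post: any two characters drawn from {e, r}.
loose = re.compile(r"cent[er]{2}")
# Roy's suggested correction: exactly "er" or "re".
strict = re.compile(r"cent(er|re)")

for word in ["center", "centre", "centee", "centrr"]:
    # fullmatch() requires the whole word to match, not just a prefix.
    print(word, bool(loose.fullmatch(word)), bool(strict.fullmatch(word)))
```

The character class accepts all four spellings, while the alternation accepts only the first two.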

>> around one critical question: Is string indexing common?
>
> Not in our code.  I've got 80008 non-blank lines of Python (2.7) source
> handy.  I tried a few heuristics to find patterns which might be string
> indexing.
>
> $ find . -name '*.py' | xargs egrep '\[[^]][0-9]+\]'
>
> and then looked them over manually.  I see this pattern a bunch of times
> (in a single-use script):
>
> data['shard_key'] = hashlib.md5(str(id)).hexdigest()[:4]

Slicing is a form of indexing too, although in this case (slicing from
the front) it could be implemented on top of UTF-8 without much
problem.
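
As a rough illustration of why a front slice is cheap on UTF-8, the sketch below takes the first n code points from a byte string with a single forward scan. The function name `utf8_prefix` is hypothetical, and the code assumes valid UTF-8 input and n >= 0; it is not how any Python implementation actually stores strings.

```python
def utf8_prefix(data: bytes, n: int) -> bytes:
    """Return the first n code points of UTF-8 encoded data, as bytes."""
    count = 0
    for pos, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else starts a
        # new code point.
        if byte & 0xC0 != 0x80:
            if count == n:
                # pos is the start of code point number n, so stop here.
                return data[:pos]
            count += 1
    # Fewer than n + 1 code points: the whole input is the prefix.
    return data


digest = "d41d8cd98f00b204e9800998ecf8427e".encode("utf-8")
print(utf8_prefix(digest, 4).decode("utf-8"))
```

The scan stops as soon as it reaches the cut point, so the cost depends on n rather than on the length of the string.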

> withhyphen = number if '-' in number else (number[:-2] + '-' +
> number[-2:]) # big assumption here

This *definitely* counts; if strings were represented internally in
UTF-8, this would involve two scans (although a smart implementation
could probably count backward rather than forward). By the way, any
time you slice up to the third from the end, you win two extra awesome
points, just for putting [:-3] into your code and having it mean
something. But I digress.
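
The "count backward" idea can be sketched as follows, again on raw UTF-8 bytes. `utf8_tail_offset` is a hypothetical helper, valid UTF-8 is assumed, and the sample number is made up. Finding the cut point once lets both halves of the hyphenation share a single scan.

```python
def utf8_tail_offset(data: bytes, n: int) -> int:
    """Byte offset at which the last n code points of UTF-8 data begin."""
    if n <= 0:
        return len(data)
    count = 0
    # Walk from the end; each non-continuation byte starts one code point.
    for pos in range(len(data) - 1, -1, -1):
        if data[pos] & 0xC0 != 0x80:
            count += 1
            if count == n:
                return pos
    # Fewer than n code points: the tail is the whole input.
    return 0


number = "5551234".encode("utf-8")
cut = utf8_tail_offset(number, 2)
# Equivalent of number[:-2] + '-' + number[-2:], with one backward scan.
with_hyphen = number[:cut] + b"-" + number[cut:]
print(with_hyphen.decode("utf-8"))  # prints 55512-34
```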

> Anyway, there's a bunch more, but the bottom line is that in our code,
> indexing into a string (at least explicitly in application source code)
> is a pretty rare thing.

Thanks. Of course, the pattern you searched for is looking only for
literals; it's a bit harder to find cases where the index (or slice
position) comes from a variable or expression, and those situations
are also rather harder to optimize (the MD5 prefix is clearly better
scanned from the front, the number tail is clearly better scanned from
the back - but with a variable?).
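
One way to look beyond literal indices is to walk the syntax tree instead of grepping. The sketch below is a heuristic only: `classify_subscripts` and `_is_int_literal` are hypothetical names, it assumes Python 3.8 or later, and it cannot tell whether the subscripted object is actually a string. Subscripts with a string-literal key are skipped on the assumption that they are mapping lookups.

```python
import ast


def _is_int_literal(node):
    """True for integer literals such as 4 or -2."""
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        node = node.operand
    return isinstance(node, ast.Constant) and isinstance(node.value, int)


def classify_subscripts(source):
    """Yield (line_number, kind) pairs; kind is 'literal' or 'computed'."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Subscript):
            continue
        index = node.slice
        if type(index).__name__ == "Index":  # wrapper node used by Python 3.8
            index = index.value
        if isinstance(index, ast.Constant) and isinstance(index.value, str):
            continue  # probably a dict lookup like data['shard_key']
        if isinstance(index, ast.Slice):
            parts = [index.lower, index.upper, index.step]
        else:
            parts = [index]
        literal = all(p is None or _is_int_literal(p) for p in parts)
        yield node.lineno, "literal" if literal else "computed"


sample = "a = s[:4]\nb = s[-2:]\nc = s[i]\nd = s[i + 1:]\ne = cfg['key']\n"
for lineno, kind in sorted(classify_subscripts(sample)):
    print(lineno, kind)
```

The 'computed' entries are the cases a literal-only grep would miss, and also the ones where the scan direction cannot be chosen ahead of time.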

ChrisA
