| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <rosuav@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| MIME-Version | 1.0 |
| X-Received | by 10.52.14.130 with SMTP id p2mr11471916vdc.39.1401848025975; Tue, 03 Jun 2014 19:13:45 -0700 (PDT) |
| In-Reply-To | <roy-9D3770.21181203062014@news.panix.com> |
| References | <mailman.10656.1401842403.18130.python-list@python.org> <roy-9D3770.21181203062014@news.panix.com> |
| Date | Wed, 4 Jun 2014 12:13:45 +1000 |
| Subject | Re: Unicode and Python - how often do you index strings? |
| From | Chris Angelico <rosuav@gmail.com> |
| Cc | "python-list@python.org" <python-list@python.org> |
| Content-Type | text/plain; charset=UTF-8 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.10664.1401848034.18130.python-list@python.org> (permalink) |
| Lines | 53 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1401848034 news.xs4all.nl 2882 [2001:888:2000:d::a6]:45930 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:72573 |
On Wed, Jun 4, 2014 at 11:18 AM, Roy Smith <roy@panix.com> wrote:
> In article <mailman.10656.1401842403.18130.python-list@python.org>,
> Chris Angelico <rosuav@gmail.com> wrote:
>
>> A current discussion regarding Python's Unicode support centres (or
>> centers, depending on how close you are to the cent[er]{2} of the
>> universe)
>
> <sarcasm style="regex-pedant">Um, you mean cent(er|re), don't you? The
> pattern you wrote also matches centee and centrr.</sarcasm>
Maybe there's someone who spells it that way! Let's not be excluding
people. That'd be rude.
>> around one critical question: Is string indexing common?
>
> Not in our code. I've got 80008 non-blank lines of Python (2.7) source
> handy. I tried a few heuristics to find patterns which might be string
> indexing.
>
> $ find . -name '*.py' | xargs egrep '\[[^]][0-9]+\]'
>
> and then looked them over manually. I see this pattern a bunch of times
> (in a single-use script):
>
> data['shard_key'] = hashlib.md5(str(id)).hexdigest()[:4]
Slicing is a form of indexing too, although in this case (slicing from
the front) it could be implemented on top of UTF-8 without much
problem.
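To make the "slicing from the front" point concrete, here is a minimal sketch that assumes, hypothetically, that a string is stored as raw UTF-8 bytes. The helper `utf8_prefix` is an illustrative name, not an existing API. It finds the first n code points by walking forward and counting lead bytes, so something like `digest[:4]` only touches the bytes it keeps, not the whole string.

```python
def utf8_prefix(data: bytes, n: int) -> bytes:
    """Return the UTF-8 bytes for the first n code points of `data`.

    Hypothetical helper: walks forward from the start and stops as soon as
    n code points have been seen. Mirrors s[:n] for n >= 0, including an
    n that runs past the end of the string.
    """
    if n < 0:
        raise ValueError("this sketch only handles n >= 0")
    seen = 0
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a code point.
        if (byte & 0xC0) != 0x80:
            if seen == n:
                return data[:pos]  # stop at the lead byte of code point number n
            seen += 1
    return data  # fewer than n code points: keep everything, like str slicing


text = "h\u00e9xdig\u00e8st"
print(utf8_prefix(text.encode("utf-8"), 4).decode("utf-8"))  # héxd
```

The loop never needs the total length, which is why a prefix slice is cheap under this representation even for non-ASCII text.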
> withhyphen = number if '-' in number else (number[:-2] + '-' +
> number[-2:]) # big assumption here
This *definitely* counts; if strings were represented internally in
UTF-8, this would involve two scans (although a smart implementation
could probably count backward rather than forward). By the way, any
time you slice up to the third from the end, you win two extra awesome
points, just for putting [:-3] into your code and having it mean
something. But I digress.
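The aside about counting backward can be sketched the same way, again assuming hypothetical raw UTF-8 storage. The illustrative helper `utf8_offset_from_end` walks back from the end to find the byte offset where the last n code points begin. That single offset serves both halves of `number[:-2] + '-' + number[-2:]`, so one short backward scan stands in for the two scans. `add_hyphen` is likewise a made-up name that reproduces the quoted expression.

```python
def utf8_offset_from_end(data: bytes, n: int) -> int:
    """Byte offset at which the last n code points of UTF-8 `data` begin.

    Walks backward from the end, so the work depends on n rather than on
    len(data). If `data` holds fewer than n code points, returns 0.
    """
    if n < 0:
        raise ValueError("this sketch only handles n >= 0")
    if n == 0:
        return len(data)
    seen = 0
    for pos in range(len(data) - 1, -1, -1):
        # A byte that is not 0b10xxxxxx is the first byte of a code point.
        if (data[pos] & 0xC0) != 0x80:
            seen += 1
            if seen == n:
                return pos
    return 0


def add_hyphen(number: str) -> str:
    """Same result as the quoted withhyphen expression, computed on UTF-8 bytes."""
    if "-" in number:
        return number
    data = number.encode("utf-8")
    cut = utf8_offset_from_end(data, 2)  # one backward scan, shared by both slices
    return (data[:cut] + b"-" + data[cut:]).decode("utf-8")


print(add_hyphen("1234567"))  # 12345-67
```

Short inputs behave like Python's slicing: with fewer than two code points the cut lands at offset 0, giving `"-5"` for `"5"`, just as `"5"[:-2] + '-' + "5"[-2:]` does.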
> Anyway, there's a bunch more, but the bottom line is that in our code,
> indexing into a string (at least explicitly in application source code)
> is a pretty rare thing.
Thanks. Of course, the pattern you searched for is looking only for
literals; it's a bit harder to find cases where the index (or slice
position) comes from a variable or expression, and those situations
are also rather harder to optimize (the MD5 prefix is clearly better
scanned from the front, the number tail is clearly better scanned from
the back - but with a variable?).
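The point about literals is easy to check by running the same regular expression over a few sample lines. Below is a rough Python translation of the egrep pattern quoted above, assuming GNU egrep reads the bracket expression `[^]]` the same way Python's `re` reads `[^\]]`. The function name `literal_subscript_lines` and the sample lines are invented for illustration. Only subscripts that spell out digits are flagged, so `s[:n]` goes unnoticed. Because the pattern also demands exactly one non-`]` character before the digits, `s[0]` and `number[-2:]` are not flagged either.

```python
import re
from typing import List

# Python rendering of the egrep pattern: '[', exactly one character
# other than ']', one or more digits, then ']'.
LITERAL_SUBSCRIPT = re.compile(r"\[[^\]][0-9]+\]")


def literal_subscript_lines(source: str) -> List[str]:
    """Return the lines of `source` that the heuristic would flag."""
    return [line for line in source.splitlines() if LITERAL_SUBSCRIPT.search(line)]


sample = """\
key = digest[:4]
big = table[10]
first = s[0]
tail = number[-2:]
head = s[:n]
"""

for line in literal_subscript_lines(sample):
    print(line)
```

Only the `digest[:4]` and `table[10]` lines are reported; the variable-driven slice never can be, whatever the digits.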
ChrisA
Unicode and Python - how often do you index strings? Chris Angelico <rosuav@gmail.com> - 2014-06-04 10:39 +1000
Re: Unicode and Python - how often do you index strings? Roy Smith <roy@panix.com> - 2014-06-03 21:18 -0400
Re: Unicode and Python - how often do you index strings? Chris Angelico <rosuav@gmail.com> - 2014-06-04 12:13 +1000
Re: Unicode and Python - how often do you index strings? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-06-04 18:48 +1200
Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-04 10:57 +0000
Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-04 10:50 +0000
Re: Unicode and Python - how often do you index strings? Rustom Mody <rustompmody@gmail.com> - 2014-06-04 05:52 -0700
Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-04 13:36 +0000
Re: Unicode and Python - how often do you index strings? wxjmfauth@gmail.com - 2014-06-03 23:50 -0700
Re: Unicode and Python - how often do you index strings? Michael Torrie <torriem@gmail.com> - 2014-06-04 08:50 -0600
Re: Unicode and Python - how often do you index strings? wxjmfauth@gmail.com - 2014-06-05 00:06 -0700
Re: Unicode and Python - how often do you index strings? Marko Rauhamaa <marko@pacujo.net> - 2014-06-05 10:20 +0300
Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-05 15:39 +0000
Re: Unicode and Python - how often do you index strings? Mark H Harris <harrismh777@gmail.com> - 2014-06-05 10:57 -0500
Re: Unicode and Python - how often do you index strings? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-06-05 18:15 +0100
Re: Unicode and Python - how often do you index strings? alister <alister.nospam.ware@ntlworld.com> - 2014-06-05 17:33 +0000
Re: Unicode and Python - how often do you index strings? Joshua Landau <joshua@landau.ws> - 2014-06-05 18:18 +0100
Re: Unicode and Python Rustom Mody <rustompmody@gmail.com> - 2014-06-04 21:25 -0700
Re: Unicode and Python wxjmfauth@gmail.com - 2014-06-05 00:23 -0700
Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-05 18:09 +0200
Re: Unicode and Python - how often do you index strings? Paul Rubin <no.email@nospam.invalid> - 2014-06-05 11:16 -0700
Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-05 20:42 +0200
Re: Unicode and Python - how often do you index strings? Ryan Hiebert <ryan@ryanhiebert.com> - 2014-06-05 13:52 -0500
Re: Unicode and Python - how often do you index strings? Paul Rubin <no.email@nospam.invalid> - 2014-06-05 12:58 -0700
Re: Unicode and Python - how often do you index strings? Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 14:18 -0600
Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-06 10:47 +0200
Re: Unicode and Python - how often do you index strings? Tim Chase <python.list@tim.thechases.com> - 2014-06-06 05:37 -0500
Re: Unicode and Python - how often do you index strings? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-06-06 11:52 +0000
Re: Unicode and Python - how often do you index strings? Albert-Jan Roskam <fomcl@yahoo.com> - 2014-06-05 13:34 -0700
Re: Unicode and Python - how often do you index strings? Roy Smith <roy@panix.com> - 2014-06-05 17:00 -0400
Re: Unicode and Python - how often do you index strings? Rustom Mody <rustompmody@gmail.com> - 2014-06-05 15:24 -0700
Re: Unicode and Python - how often do you index strings? Ned Deily <nad@acm.org> - 2014-06-05 15:57 -0700
Re: Unicode and Python - how often do you index strings? Roy Smith <roy@panix.com> - 2014-06-05 20:10 -0400
Re: Unicode and Python - how often do you index strings? Ned Deily <nad@acm.org> - 2014-06-05 17:43 -0700
Re: Unicode and Python - how often do you index strings? Grant Edwards <invalid@invalid.invalid> - 2014-06-06 14:20 +0000
Re: Unicode and Python - how often do you index strings? Ian Kelly <ian.g.kelly@gmail.com> - 2014-06-05 18:05 -0600
Re: Unicode and Python - how often do you index strings? Johannes Bauer <dfnsonfsduifb@gmx.de> - 2014-06-06 10:42 +0200
Re: Unicode and Python - how often do you index strings? Larry Hudson <orgnut@yahoo.com> - 2014-06-06 20:24 -0700
Re: Unicode and Python - how often do you index strings? Chris Angelico <rosuav@gmail.com> - 2014-06-06 05:59 +1000
Re: Unicode and Python - how often do you index strings? Ryan Hiebert <ryan@ryanhiebert.com> - 2014-06-05 15:05 -0500