


Re: Unicode and Python - how often do you index strings?

Date Wed, 4 Jun 2014 19:30:11 +1000
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.10693.1401874213.18130.python-list@python.org>



On Wed, Jun 4, 2014 at 6:22 PM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
> Single characters quite often, iteration rarely if ever, slicing all the
> time, but does that last one count?

Yes, slicing counts. What matters here is the potential impact of
internally representing strings as UTF-8 streams: when you ask for the
Nth character, it would have to scan from either the beginning or end
(more likely beginning) of the string and count. CPython 3.3+ instead
simply looks up the header to find out the kind (1, 2, or 4 bytes per
character), multiplies the index by that width, and uses the result as
an offset into the string's memory.
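
To make the difference concrete, here is a rough pure-Python sketch of
the two lookup strategies. It is only an illustration, not CPython's
actual C code: the function names (utf8_index, to_fixed_width,
fixed_width_index) and the little-endian byte layout are assumptions
made up for this example.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the start: O(n)."""
    if n < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    count = -1      # index of the code point we are currently inside
    start = None    # byte offset where the n-th code point begins
    for pos, byte in enumerate(data):
        # Bytes of the form 0b10xxxxxx are continuation bytes;
        # any other byte starts a new code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    return data[start:].decode("utf-8")   # n-th code point was the last one


def to_fixed_width(text: str) -> tuple[bytes, int]:
    """Pack text into a buffer using 1, 2, or 4 bytes per code point.

    The width ("kind") is chosen from the widest code point, in the
    spirit of PEP 393. The little-endian layout is an assumption.
    """
    widest = max(map(ord, text), default=0)
    kind = 1 if widest < 0x100 else 2 if widest < 0x10000 else 4
    buf = b"".join(ord(ch).to_bytes(kind, "little") for ch in text)
    return buf, kind


def fixed_width_index(buf: bytes, kind: int, n: int) -> str:
    """Return the n-th code point from a fixed-width buffer: O(1)."""
    offset = n * kind   # the whole "search" is one multiplication
    if n < 0 or offset + kind > len(buf):
        raise IndexError("string index out of range")
    return chr(int.from_bytes(buf[offset:offset + kind], "little"))
```

With text = "naïve €uro 🐍", both utf8_index(text.encode("utf-8"), i) and
fixed_width_index(*to_fixed_width(text), i) should agree with text[i],
but the first has to walk over every byte before position i while the
second jumps straight to it.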

ChrisA
