Date: Wed, 4 Jun 2014 19:30:11 +1000
Subject: Re: Unicode and Python - how often do you index strings?
From: Chris Angelico
Newsgroups: comp.lang.python

On Wed, Jun 4, 2014 at 6:22 PM, Mark Lawrence wrote:
> Single characters quite often, iteration rarely if ever, slicing all the
> time, but does that last one count?

Yes, slicing counts. What matters here is the potential impact of
internally representing strings as UTF-8 streams: when you ask for the
Nth character, it would have to scan from either the beginning or the
end (more likely the beginning) of the string and count. CPython 3.3+
instead simply looks up the header to find out the kind (1, 2, or 4
bytes per character), multiplies the index by that, and uses the result
as an offset into the string's data.

ChrisA
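
To make the difference concrete, here is a rough pure-Python sketch of the two lookup strategies described above. It is only an illustration, not CPython's actual C code: the function names (utf8_index, fixed_width_index, to_fixed_width) are made up for this example, and the little-endian byte layout is an assumption chosen for simplicity.

```python
def utf8_index(data: bytes, n: int) -> str:
    """Return the n-th code point of UTF-8 data by scanning from the start: O(n)."""
    if n < 0:
        raise IndexError("negative indexes not handled in this sketch")
    count = -1      # index of the code point most recently started
    start = None    # byte offset where the n-th code point begins
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:      # not a continuation byte: a new code point starts
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:     # reached the next code point, so we are done
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    return data[start:].decode("utf-8")


def to_fixed_width(s: str):
    """Pack s with 1, 2, or 4 bytes per code point (hypothetical layout)."""
    widest = max(map(ord, s), default=0)
    kind = 1 if widest < 0x100 else 2 if widest < 0x10000 else 4
    buf = b"".join(ord(c).to_bytes(kind, "little") for c in s)
    return buf, kind


def fixed_width_index(buf: bytes, kind: int, n: int) -> str:
    """Return the n-th code point by computing its offset directly: O(1)."""
    offset = n * kind                # scale the index by the per-character width
    if n < 0 or offset + kind > len(buf):
        raise IndexError("string index out of range")
    return chr(int.from_bytes(buf[offset:offset + kind], "little"))
```

For example, with s = "naïve café", utf8_index(s.encode("utf-8"), 8) has to walk past every earlier byte before it can return "é", while fixed_width_index(*to_fixed_width(s), 8) jumps straight to the right offset.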