Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!news.stack.nl!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Date: Tue, 3 Jun 2014 21:37:17 -0500
From: Tim Chase <python.list@tim.thechases.com>
To: Chris Angelico <rosuav@gmail.com>
Subject: Re: Unicode and Python - how often do you index strings?
In-Reply-To: <CAPTjJmp0KVo_3xxCThkDsUkuQQA14WvfHpvNBbqgEDRdfNGFJg@mail.gmail.com>
References: <CAPTjJmr4iHdaCy61w2rz-oL6FcarRzzTeEU44Fxn2Z=gS0fh-Q@mail.gmail.com> <20140603201154.38b47afb@bigbox.christie.dr> <CAPTjJmp0KVo_3xxCThkDsUkuQQA14WvfHpvNBbqgEDRdfNGFJg@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: "python-list@python.org" <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.10672.1401849477.18130.python-list@python.org>
Lines: 30
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:72581

On 2014-06-04 12:16, Chris Angelico wrote:
> On Wed, Jun 4, 2014 at 11:11 AM, Tim Chase
> <python.list@tim.thechases.com> wrote:
> > I then take row 2 and use it to make a mapping of header-name to a
> > slice-object for slicing the subsequent strings:
> >
> >       slice(i.start(), i.end())
> >
> >     print("EmpID = %s" % row[header_map["EMPID"]].strip())
> >     print("Name = %s" % row[header_map["NAME"]].strip())
> >
> > which I presume uses string indexing under the hood.
> 
> Yes, it's definitely going to be indexing. If strings were
> represented internally in UTF-8, each of those calls would need to
> scan from the beginning of the string, counting and discarding
> characters until it finds the place to start, then counting and
> retaining characters until it finds the place to stop. Definite
> example of what I'm looking for, thanks!
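
To make the scanning cost concrete, here is a minimal sketch of
slicing by code point on a UTF-8 byte string. The helper name
utf8_slice is hypothetical and this is not how CPython stores or
indexes strings; it only illustrates the count-and-discard walk
described above, assuming valid UTF-8 and non-negative indices.

```python
def utf8_slice(data: bytes, start: int, stop: int) -> bytes:
    """Return the bytes holding code points [start, stop) of UTF-8 data.

    Hypothetical helper for illustration: every call walks from the
    first byte, so the cost grows with the position being sliced.
    """
    byte_start = len(data)   # used if start is past the end
    byte_stop = len(data)    # used if stop is past the end
    count = 0                # code points seen so far
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else
        # begins a new code point.
        if (byte & 0xC0) != 0x80:
            if count == start:
                byte_start = pos
            if count == stop:
                byte_stop = pos
                break
            count += 1
    return data[byte_start:byte_stop]


text = "na\u00efve caf\u00e9"          # two non-ASCII characters
data = text.encode("utf-8")
print(utf8_slice(data, 2, 5).decode("utf-8"))
```

The result should match text[2:5] on the decoded string; the
difference is only in how much of the buffer has to be walked to
find the byte offsets.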

For what it's worth, most of the lines in each file are shorter than
about 2k, so even O(N) or O(log N) indexing (instead of O(1)) wouldn't
be grievous.  Noticeable, but not grievous.

Glad my example could give you some fodder.
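
For reference, the header-map idea quoted above can be sketched
roughly as follows. This is an illustration under assumptions, not
the original code: the function name build_header_map, the regex,
and the sample header and data rows are all made up, and each column
is assumed to run from the start of its header name through the
whitespace that follows it.

```python
import re


def build_header_map(header_line):
    """Map each upper-cased header name to a slice for its column.

    Assumption: a column spans the header text plus trailing spaces.
    """
    header_map = {}
    for i in re.finditer(r"\S+\s*", header_line):
        name = i.group().strip().upper()
        header_map[name] = slice(i.start(), i.end())
    return header_map


# Hypothetical fixed-width data: widths are 7, 12 and 4 characters.
HEADER = "EmpID  Name        Dept"
ROW = "1001   Alice Smith Eng"

header_map = build_header_map(HEADER)
print("EmpID = %s" % ROW[header_map["EMPID"]].strip())
print("Name = %s" % ROW[header_map["NAME"]].strip())
```

One limitation of this sketch: the last column's slice ends where
the header text ends, so data wider than the final header name would
be truncated unless that slice is left open-ended.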

-tkc