| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: How to read from a file to an arbitrary delimiter efficiently? |
| Date | 2016-02-28 20:28 +0000 |
| Message-ID | <mailman.24.1456691337.9760.python-list@python.org> |
| References | <56cea44e$0$11128$c3e8da3@news.astraweb.com> |
On 25 February 2016 at 06:50, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>
> I have a need to read to an arbitrary delimiter, which might be any of a
> (small) set of characters. For the sake of the exercise, let's say it is
> either ! or ? (for example).
>
> I want to read from files reasonably efficiently. I don't mind if there is a
> little overhead, but my first attempt is 100 times slower than the built-in
> "read to the end of the line" method.
You can get something much faster using mmap and searching for a
single delimiter:
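
    import mmap

    def readuntil(m, delim):
        """Read from the current position up to and including delim."""
        start = m.tell()
        index = m.find(delim, start)
        if index == -1:
            # No more delimiters: return whatever is left (b'' at EOF).
            return m.read()
        else:
            # Consume the delimiter as well; otherwise the next call finds
            # it at `start` and returns b'', which ends the loop early.
            return m.read(index - start + len(delim))

    def readmmap(f):
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        f.seek(0)
        while True:
            chunk = readuntil(m, b'!')  # Note byte-string
            if not chunk:
                return
            # Do stuff with chunk
            pass
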
My timing makes that ~7x slower than iterating over the lines of the
file but still around 100x faster than reading individual characters.
I'm not sure how to generalise it to looking for multiple delimiters
without dropping back to reading individual characters though.
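
One possible way to handle several delimiters, offered only as an untimed sketch and not something measured in this thread, is to let the re module scan the mmap directly, since a bytes pattern can search any bytes-like object. The names iter_chunks and delims are hypothetical, and each chunk is assumed to keep its trailing delimiter, much as a line keeps its newline:

```python
import mmap
import re
import tempfile

def iter_chunks(m, delims=b"!?"):
    """Yield successive chunks of m, each ending with its delimiter (if any)."""
    # Character class matching any single delimiter byte (hypothetical helper).
    pattern = re.compile(b"[" + re.escape(delims) + b"]")
    pos = 0
    size = len(m)
    while pos < size:
        match = pattern.search(m, pos)
        # No further delimiter: the final chunk runs to the end of the data.
        end = match.end() if match else size
        yield m[pos:end]
        pos = end

# Usage on a small temporary file.
with tempfile.TemporaryFile() as f:
    f.write(b"Hello! How are you? Fine")
    f.flush()
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        chunks = list(iter_chunks(m))

print(chunks)  # [b'Hello!', b' How are you?', b' Fine']
```

The search runs inside the regex engine instead of a Python-level loop over characters, so it may stay closer to the single-delimiter speed, but that would need to be timed on real data.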
--
Oscar