Re: How to read from a file to an arbitrary delimiter efficiently?

From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Subject Re: How to read from a file to an arbitrary delimiter efficiently?
Date 2016-02-27 23:17 +1100
Message-ID <mailman.173.1456575458.20994.python-list@python.org>
References <56cea44e$0$11128$c3e8da3@news.astraweb.com> <mailman.116.1456385901.20994.python-list@python.org> <56d17138$0$1605$c3e8da3$5496439d@news.astraweb.com>

On Sat, Feb 27, 2016 at 8:49 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 25 Feb 2016 06:30 pm, Chris Angelico wrote:
>
>> On Thu, Feb 25, 2016 at 5:50 PM, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>>>
>>> # Read a chunk of bytes/characters from an open file.
>>> def chunkiter(f, delim):
>>>     buffer = []
>>>     b = f.read(1)
>>>     while b:
>>>         buffer.append(b)
>>>         if b in delim:
>>>             yield ''.join(buffer)
>>>             buffer = []
>>>         b = f.read(1)
>>>     if buffer:
>>>         yield ''.join(buffer)
>>
>> How bad is it if you over-read?
>
> Pretty bad :-)
>
> Ideally, I'd rather not over-read at all. I'd like the user to be able to
> swap from "read N bytes" to "read to the next delimiter" (and possibly
> even "read the next line") without losing anything.

If those are the *only* two operations, you should be able to maintain
your own buffer. Something like this:

import re

class ChunkIter:
    def __init__(self, f, delim):
        self.f = f
        # Escape delim so characters such as "]" or "^" match literally.
        self.delim = re.compile("[" + re.escape(delim) + "]")
        self.buffer = ""
    def read_to_delim(self):
        """Return characters up to the next delim, or remaining chars,
        or "" if at EOF"""
        while "delimiter not found":
            *parts, self.buffer = self.delim.split(self.buffer, 1)
            if parts: return parts[0]
            b = self.f.read(256)
            if not b:
                # EOF: hand back what is left and empty the buffer,
                # so the next call returns "".
                ret, self.buffer = self.buffer, ""
                return ret
            self.buffer += b
    def read(self, nbytes):
        need = nbytes - len(self.buffer)
        if need > 0: self.buffer += self.f.read(need)
        # Return exactly the first nbytes chars; keep the rest buffered.
        ret, self.buffer = self.buffer[:nbytes], self.buffer[nbytes:]
        return ret

It still might over-read from the underlying file, but those extra
chars will be available to the read(N) function.
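
As a rough sketch of how switching between the two calls plays out: the version below is self-contained and uses io.StringIO as a stand-in for a real file. The class name DelimReader, the chunk_size parameter and the sample data are all made up for illustration, and the chunk size is kept tiny so the over-read is easy to see.

```python
import io
import re


class DelimReader:
    """Hypothetical helper: one shared buffer serves both kinds of read."""

    def __init__(self, f, delims, chunk_size=256):
        self.f = f
        # Character class matching any single delimiter character.
        self.delim = re.compile("[" + re.escape(delims) + "]")
        self.chunk_size = chunk_size
        self.buffer = ""

    def read_to_delim(self):
        """Return text up to (not including) the next delimiter.

        At EOF, return whatever is left, then "" on later calls.
        """
        while True:
            parts = self.delim.split(self.buffer, maxsplit=1)
            if len(parts) == 2:
                # Delimiter found: keep everything after it buffered.
                self.buffer = parts[1]
                return parts[0]
            chunk = self.f.read(self.chunk_size)
            if not chunk:
                rest, self.buffer = self.buffer, ""
                return rest
            self.buffer += chunk

    def read(self, n):
        """Return up to n characters, draining the buffer first."""
        missing = n - len(self.buffer)
        if missing > 0:
            self.buffer += self.f.read(missing)
        out, self.buffer = self.buffer[:n], self.buffer[n:]
        return out


src = io.StringIO("alpha,beta;gamma delta")
r = DelimReader(src, ",;", chunk_size=8)

first = r.read_to_delim()   # "alpha"; the file was over-read by "be"
over_read = r.buffer        # "be" is still held in the buffer
three = r.read(3)           # "bet": two buffered chars plus one fresh one
second = r.read_to_delim()  # "a", the rest of "beta"
last = r.read_to_delim()    # "gamma delta": no delimiter before EOF
at_eof = r.read_to_delim()  # ""
```

The point is that nothing is lost as long as every read goes through the wrapper. Anything that reads from the underlying file directly will miss whatever is sitting in the buffer.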

ChrisA
