Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Chris Angelico <rosuav@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: How to read from a file to an arbitrary delimiter efficiently?
Date: Thu, 25 Feb 2016 18:30:25 +1100
Lines: 35
Message-ID: <mailman.116.1456385901.20994.python-list@python.org>
References: <56cea44e$0$11128$c3e8da3@news.astraweb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
In-Reply-To: <56cea44e$0$11128$c3e8da3@news.astraweb.com>
Precedence: list
Xref: csiph.com comp.lang.python:103483

On Thu, Feb 25, 2016 at 5:50 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>
> # Read a chunk of bytes/characters from an open file.
> def chunkiter(f, delim):
>     buffer = []
>     b = f.read(1)
>     while b:
>         buffer.append(b)
>         if b in delim:
>             yield ''.join(buffer)
>             buffer = []
>         b = f.read(1)
>     if buffer:
>         yield ''.join(buffer)

How bad is it if you over-read? If it's absolutely critical that you
not consume anything from the file past the delimiter, then yeah, it's
going to be slow. But if you're never going to read the file using
anything other than this iterator, the best thing to do is to read
more at a time. Simple and naive method:

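import re

def chunkiter(f, delim):
    """Don't use [ or ] as the delimiter, kthx"""
    buffer = ""
    b = f.read(256)
    while b:
        buffer += b
        # Everything before the last delimiter is complete; keep the tail.
        *parts, buffer = re.split("["+delim+"]", buffer)
        yield from parts
        b = f.read(256)  # read the next block, or the loop never ends
    if buffer: yield buffer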
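
One thing to watch: re.split throws the delimiters away, whereas the
read(1) version keeps each delimiter on the end of its chunk. If that
matters, something along these lines might do. This is only a rough
sketch, assuming a text-mode file and a non-empty delim; the name
chunkiter_keep and the bufsize parameter are made up for illustration.

```python
import io
import re

def chunkiter_keep(f, delim, bufsize=256):
    """Yield chunks from text file f, each ending with its delimiter.

    delim is a non-empty string of single-character delimiters.
    bufsize is a hypothetical tuning knob, not part of the original.
    """
    # re.escape makes characters such as ] or ^ safe inside the class.
    pattern = re.compile("[" + re.escape(delim) + "]")
    buffer = ""
    while True:
        b = f.read(bufsize)
        if not b:
            break
        buffer += b
        start = 0
        # Emit every complete chunk, delimiter included.
        for m in pattern.finditer(buffer):
            yield buffer[start:m.end()]
            start = m.end()
        # Keep the unterminated tail for the next read.
        buffer = buffer[start:]
    if buffer:
        yield buffer
```

Given io.StringIO("a,b;c") and a delim of ",;", this yields "a,", "b;"
and then "c". It still over-reads the file, same as the version above.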

How well does that perform?

ChrisA