| Path | csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail |
|---|---|
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Subject | Re: How to read from a file to an arbitrary delimiter efficiently? |
| Date | Sat, 27 Feb 2016 23:17:36 +1100 |
| Lines | 54 |
| Message-ID | <mailman.173.1456575458.20994.python-list@python.org> (permalink) |
| References | <56cea44e$0$11128$c3e8da3@news.astraweb.com> <mailman.116.1456385901.20994.python-list@python.org> <56d17138$0$1605$c3e8da3$5496439d@news.astraweb.com> |
| In-Reply-To | <56d17138$0$1605$c3e8da3$5496439d@news.astraweb.com> |
On Sat, Feb 27, 2016 at 8:49 PM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Thu, 25 Feb 2016 06:30 pm, Chris Angelico wrote:
>
>> On Thu, Feb 25, 2016 at 5:50 PM, Steven D'Aprano
>> <steve+comp.lang.python@pearwood.info> wrote:
>>>
>>> # Read a chunk of bytes/characters from an open file.
>>> def chunkiter(f, delim):
>>>     buffer = []
>>>     b = f.read(1)
>>>     while b:
>>>         buffer.append(b)
>>>         if b in delim:
>>>             yield ''.join(buffer)
>>>             buffer = []
>>>         b = f.read(1)
>>>     if buffer:
>>>         yield ''.join(buffer)
>>
>> How bad is it if you over-read?
>
> Pretty bad :-)
>
> Ideally, I'd rather not over-read at all. I'd like the user to be able to
> swap from "read N bytes" to "read to the next delimiter" (and possibly
> even "read the next line") without losing anything.
If those are the *only* two operations, you should be able to maintain
your own buffer. Something like this:

import re

class ChunkIter:
    def __init__(self, f, delim):
        self.f = f
        # Escape delim so characters such as "]" or "^" are matched literally
        self.delim = re.compile("[" + re.escape(delim) + "]")
        self.buffer = ""
    def read_to_delim(self):
        """Return characters up to the next delim, or remaining chars,
        or "" if at EOF"""
        while "delimiter not found":
            *parts, self.buffer = self.delim.split(self.buffer, 1)
            if parts: return parts[0]
            b = self.f.read(256)
            if not b:
                # EOF: return whatever is left and empty the buffer
                ret, self.buffer = self.buffer, ""
                return ret
            self.buffer += b
    def read(self, nbytes):
        need = nbytes - len(self.buffer)
        if need > 0: self.buffer += self.f.read(need)
        # Return at most nbytes characters; keep the rest buffered
        ret, self.buffer = self.buffer[:nbytes], self.buffer[nbytes:]
        return ret

It still might over-read from the underlying file, but those extra
chars will be available to the read(N) function.
ChrisA
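
The buffering idea in the post can be tried out with a short, self-contained sketch. It is an adaptation for illustration, not the author's exact code: it assumes a text-mode file (here an `io.StringIO`), escapes the delimiter characters with `re.escape`, slices `read()` by the requested count, empties the buffer once the file is exhausted, and adds a hypothetical `chunk_size` parameter that does not appear in the original.

```python
import io
import re


class ChunkIter:
    """Buffered reader offering read-to-delimiter and read-N-characters."""

    def __init__(self, f, delim, chunk_size=256):
        self.f = f
        # Character class built from the delimiter characters, escaped
        # so that regex metacharacters are matched literally.
        self.delim = re.compile("[" + re.escape(delim) + "]")
        self.chunk_size = chunk_size  # hypothetical knob, not in the post
        self.buffer = ""

    def read_to_delim(self):
        """Return text up to the next delimiter (delimiter dropped),
        the remaining text at EOF, or "" when nothing is left."""
        while True:
            *parts, self.buffer = self.delim.split(self.buffer, 1)
            if parts:
                return parts[0]
            b = self.f.read(self.chunk_size)
            if not b:
                # EOF: hand back the leftovers and empty the buffer.
                ret, self.buffer = self.buffer, ""
                return ret
            self.buffer += b

    def read(self, nchars):
        """Return up to nchars characters, served from the buffer first."""
        need = nchars - len(self.buffer)
        if need > 0:
            self.buffer += self.f.read(need)
        ret, self.buffer = self.buffer[:nchars], self.buffer[nchars:]
        return ret


f = io.StringIO("alpha,beta;gamma delta")
reader = ChunkIter(f, ",;")

first = reader.read_to_delim()   # "alpha"; the rest of the input is buffered
position = f.tell()              # underlying file has been over-read
three = reader.read(3)           # "bet", served from the buffer
second = reader.read_to_delim()  # "a"
rest = reader.read(100)          # "gamma delta"
tail = reader.read_to_delim()    # "" at EOF
```

After the first call the underlying file has already been read to its end, yet nothing is lost: the later `read(3)` and `read_to_delim()` calls are answered from the buffer. The over-read only matters if some other code reads from `f` directly, bypassing the wrapper.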