From: Paul Rubin
Newsgroups: comp.lang.python
Subject: Re: How to read from a file to an arbitrary delimiter efficiently?
Date: Wed, 24 Feb 2016 23:48:59 -0800
Message-ID: <871t81w7pw.fsf@jester.gateway.pace.com>
References: <56cea44e$0$11128$c3e8da3@news.astraweb.com>

Steven D'Aprano writes:
> while b:
>     buffer.append(b)

This looks slow because of the per-element overhead of a list, and
also because it reads one character at a time.  If it's bytes that
you're reading, try using a bytearray instead of a list:

    def chunkiter(f, delim):
        """Yield chunks of binary file f, each ending in a byte from delim."""
        buf = bytearray()
        bufextend = buf.extend  # avoid an attribute lookup when calling
        fread = f.read          # similar
        while True:
            c = fread(1)
            if not c:
                # EOF: read() returns b"", which is "in" any delim, so
                # stop here instead of looping forever; flush the leftover.
                if buf:
                    yield bytes(buf)
                return
            bufextend(c)
            if c in delim:
                yield bytes(buf)
                del buf[:]

If that's still not fast enough, a hackier approach is to read the
input in large blocks (f.read(4096) or whatever), split each block on
the delimiter set with re.split, yield the complete pieces, and carry
the trailing partial piece over, refilling the buffer when no more
delimiters turn up.

That doesn't tell you which delimiter actually matched: do you need
that?  There may be a nicer way to get at it than adding up the
lengths of the chunks to index into the buffer.  How large do you
expect the chunks to be?
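A minimal sketch of the block-reading idea, assuming binary input and a delimiter set given as a bytes string of single-byte delimiters. The name chunkiter_re and the bufsize parameter are hypothetical, not from the thread. It also tries one answer to the "which delimiter matched" question: when the re.split pattern contains a capturing group, the matched delimiters come back in the result list, so each yielded chunk can end with its own delimiter, just as chunkiter's chunks do.

```python
import re


def chunkiter_re(f, delims=b"\n", bufsize=4096):
    """Yield delimiter-terminated chunks of bytes from binary file f.

    Hypothetical sketch: reads bufsize bytes at a time and splits with
    re.split. Each chunk ends with the delimiter byte that terminated it;
    the last chunk may lack one if the input doesn't end with a delimiter.
    Assumes delims is a non-empty bytes string of single-byte delimiters.
    """
    # Character class matching any one delimiter byte. The capturing group
    # makes re.split include the matched delimiter in its output list.
    pattern = re.compile(b"([" + re.escape(delims) + b"])")
    tail = b""  # partial chunk carried over from the previous block
    while True:
        block = f.read(bufsize)
        if not block:  # EOF
            break
        parts = pattern.split(tail + block)
        # parts alternates: text, delim, text, delim, ..., trailing text
        for i in range(0, len(parts) - 1, 2):
            yield parts[i] + parts[i + 1]
        tail = parts[-1]  # incomplete piece; wait for more input
    if tail:
        yield tail
```

For example, list(chunkiter_re(io.BytesIO(b"alpha;beta\ngamma;tail"), b";\n", bufsize=4)) gives [b"alpha;", b"beta\n", b"gamma;", b"tail"], even though several chunks straddle block boundaries. One design caveat: tail + block copies the leftover piece on every read, so if single chunks run to many times bufsize the cost grows roughly quadratically, while for short chunks it stays cheap. That is one reason the expected chunk size matters here.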