


Re: How to read from a file to an arbitrary delimiter efficiently?

From Oscar Benjamin <oscar.j.benjamin@gmail.com>
Newsgroups comp.lang.python
Subject Re: How to read from a file to an arbitrary delimiter efficiently?
Date Sun, 28 Feb 2016 20:28:29 +0000
Message-ID <mailman.24.1456691337.9760.python-list@python.org>
References <56cea44e$0$11128$c3e8da3@news.astraweb.com>
In-Reply-To <56cea44e$0$11128$c3e8da3@news.astraweb.com>



On 25 February 2016 at 06:50, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>
> I have a need to read to an arbitrary delimiter, which might be any of a
> (small) set of characters. For the sake of the exercise, let's say it is
> either ! or ? (for example).
>
> I want to read from files reasonably efficiently. I don't mind if there is a
> little overhead, but my first attempt is 100 times slower than the built-in
> "read to the end of the line" method.

You can get something much faster using mmap and searching for a
single delimiter:

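import mmap

def readuntil(m, delim):
    # Return the bytes up to and including the next delim, like readline()
    start = m.tell()
    index = m.find(delim, start)
    if index == -1:
        # No delimiter left: return the remainder (b'' at end of file)
        return m.read()
    else:
        # Consume the delimiter too, otherwise the next call would find it
        # at the current position and return an empty chunk
        return m.read(index - start + len(delim))

def readmmap(f):
    # The mmap object keeps its own position, which starts at 0
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    while True:
        chunk = readuntil(m, b'!') # Note byte-string
        if not chunk:
            return
        # Do stuff with chunk
        pass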

My timing makes that ~7x slower than iterating over the lines of the
file but still around 100x faster than reading individual characters.
I'm not sure how to generalise it to looking for multiple delimiters
without dropping back to reading individual characters though.
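
One possible way to handle several delimiters, offered only as a sketch and not timed: the re module accepts an mmap object when the pattern is a bytes pattern, so a character class can do the search for "any of these bytes". The names iter_chunks and demo below are made up for illustration, and the sample data is invented.

```python
import mmap
import re
import tempfile

def iter_chunks(m, delims=b'!?'):
    """Yield successive chunks of m, each ending with one delimiter byte.

    m is an mmap (or any bytes-like object); delims is a bytes string
    whose individual bytes are all treated as delimiters.
    """
    # Character class matching any single delimiter byte
    pattern = re.compile(b'[' + re.escape(delims) + b']')
    pos = 0
    size = len(m)
    while pos < size:
        match = pattern.search(m, pos)
        # If no delimiter remains, the last chunk runs to the end
        end = match.end() if match else size
        yield m[pos:end]
        pos = end

def demo():
    # Hypothetical sample data written to a temporary file
    with tempfile.TemporaryFile() as f:
        f.write(b'Hello! How are you? Fine')
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return list(iter_chunks(m))

print(demo())  # [b'Hello!', b' How are you?', b' Fine']
```

This keeps the search inside C code, as m.find does, rather than looping over characters in Python. Whether it is actually competitive with the single-delimiter version would need measuring.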

--
Oscar



Thread

How to read from a file to an arbitrary delimiter efficiently? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-02-25 17:50 +1100
  Re: How to read from a file to an arbitrary delimiter efficiently? Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2016-02-25 08:37 +0100
    Re: How to read from a file to an arbitrary delimiter efficiently? Steven D'Aprano <steve@pearwood.info> - 2016-02-27 21:40 +1100
      Re: How to read from a file to an arbitrary delimiter efficiently? Dan Sommers <dan@tombstonezero.net> - 2016-02-27 14:40 +0000
      Re: How to read from a file to an arbitrary delimiter efficiently? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-02-27 12:03 -0500
        Re: How to read from a file to an arbitrary delimiter efficiently? Marko Rauhamaa <marko@pacujo.net> - 2016-02-27 19:47 +0200
  Re: How to read from a file to an arbitrary delimiter efficiently? Chris Angelico <rosuav@gmail.com> - 2016-02-25 18:30 +1100
    Re: How to read from a file to an arbitrary delimiter efficiently? Steven D'Aprano <steve@pearwood.info> - 2016-02-27 20:49 +1100
      Re: How to read from a file to an arbitrary delimiter efficiently? Chris Angelico <rosuav@gmail.com> - 2016-02-27 23:17 +1100
      Re: How to read from a file to an arbitrary delimiter efficiently? Chris Angelico <rosuav@gmail.com> - 2016-02-27 23:18 +1100
      Re: How to read from a file to an arbitrary delimiter efficiently? Serhiy Storchaka <storchaka@gmail.com> - 2016-02-27 17:23 +0200
  Re: How to read from a file to an arbitrary delimiter efficiently? Paul Rubin <no.email@nospam.invalid> - 2016-02-24 23:48 -0800
    Re: How to read from a file to an arbitrary delimiter efficiently? wxjmfauth@gmail.com - 2016-02-25 06:37 -0800
    Re: How to read from a file to an arbitrary delimiter efficiently? wxjmfauth@gmail.com - 2016-02-25 06:38 -0800
  Re: How to read from a file to an arbitrary delimiter efficiently? BartC <bc@freeuk.com> - 2016-02-27 16:35 +0000
    Re: How to read from a file to an arbitrary delimiter efficiently? BartC <bc@freeuk.com> - 2016-02-27 20:03 +0000
      Re: How to read from a file to an arbitrary delimiter efficiently? BartC <bc@freeuk.com> - 2016-02-27 20:28 +0000
  Re: How to read from a file to an arbitrary delimiter efficiently? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-02-28 20:28 +0000
  Re: How to read from a file to an arbitrary delimiter efficiently? Tim Delaney <timothy.c.delaney@gmail.com> - 2016-02-29 08:00 +1100
