Re: How to read from a file to an arbitrary delimiter efficiently?

From BartC <bc@freeuk.com>
Newsgroups comp.lang.python
Subject Re: How to read from a file to an arbitrary delimiter efficiently?
Date 2016-02-27 20:03 +0000
Organization A noiseless patient Spider
Message-ID <nasv9k$ij8$1@dont-email.me>
References <56cea44e$0$11128$c3e8da3@news.astraweb.com> <nasj2p$hec$1@dont-email.me>

On 27/02/2016 16:35, BartC wrote:
> On 25/02/2016 06:50, Steven D'Aprano wrote:
>> I have a need to read to an arbitrary delimiter, which might be any of a
>> (small) set of characters. For the sake of the exercise, lets say it is
>> either ! or ? (for example).

> However those aren't the main reasons for the poor speed. The limiting
> factor here is reading one byte at a time. Just a loop like this:
>
>     while f.read(1):
>        pass
>
> without doing anything else, seems to take most of the time. (3.6
> seconds, compared with 5.6 seconds of your readchunks() on a 6MB version
> of your test file, on Python 2.7. readlines() took about 0.2 seconds.)
>
> Any faster solutions would need to read more than one byte at a time.
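
A minimal sketch of that comparison, for anyone who wants to reproduce it. The function names and the in-memory test data are my own assumptions, not the code that was timed above, and absolute timings will vary by machine and Python version:

```python
import io
import time

def count_bytes_one_at_a_time(f):
    """Consume the file with read(1), as in the loop quoted above."""
    n = 0
    while f.read(1):
        n += 1
    return n

def count_bytes_in_blocks(f, blocksize=65536):
    """Consume the file in larger blocks (blocksize is an arbitrary choice)."""
    n = 0
    while True:
        block = f.read(blocksize)
        if not block:
            return n
        n += len(block)

# Assumption: 1 MB of in-memory data stands in for the real test file.
data = b"x" * 1_000_000
for func in (count_bytes_one_at_a_time, count_bytes_in_blocks):
    start = time.perf_counter()
    total = func(io.BytesIO(data))
    print(func.__name__, total, "bytes,", time.perf_counter() - start, "seconds")
```

Both functions see the same bytes; only the number of read() calls differs.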

I've done some more tests using Python 3.4, with the same 200,000-line
6MB test file:

0.25 seconds       Scan the file with 'for line in f'
2.25 seconds       Scan the file with your readlines() routine
4.0  seconds       Scan the file with your readchunks() routine
0.65 seconds       Scan the file using a buffer

This last test uses a 64-byte buffer, reading not more than an extra
63 bytes, but resetting the file position to just past the end of
each identified chunk so that any subsequent read works as expected.
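
Roughly, the idea looks like the sketch below. This is not the code that was timed (that isn't posted); it is a reconstruction under assumptions: the function name, the default delimiters and the use of a binary-mode, seekable file are all mine.

```python
import io

def read_to_delimiter(f, delimiters=b"!?", bufsize=64):
    """Return bytes up to and including the next delimiter (or up to EOF).

    Assumes f is a seekable file opened in binary mode; text-mode files
    do not allow nonzero seeks relative to the current position.
    """
    parts = []
    while True:
        buf = f.read(bufsize)
        if not buf:
            break  # EOF: return whatever has been collected so far
        # Earliest position of any delimiter in this buffer, if there is one.
        positions = [p for p in (buf.find(bytes([d])) for d in delimiters)
                     if p != -1]
        if positions:
            end = min(positions) + 1
            parts.append(buf[:end])
            # Step back over the bytes read past the delimiter so that the
            # next read starts exactly where this chunk ended.
            f.seek(end - len(buf), io.SEEK_CUR)
            break
        parts.append(buf)
    return b"".join(parts)

f = io.BytesIO(b"hello!world?tail")
print(read_to_delimiter(f), f.tell())
```

Calling it repeatedly walks through the file one chunk at a time; an empty result means EOF. Unlike the timed test, the sketch does not count EOF as a delimiter, it just returns the trailing bytes.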

This test (the code is too untidy to post) only checks for two specific
delimiters (not an arbitrary string full of them). (It also counts EOF
as a valid delimiter, so it counts one more chunk.)

Increasing the buffer size doesn't help, and going beyond 256 bytes slows
things down (for this input), as it spends too long rereading data.
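
A back-of-envelope way to see the rereading cost, assuming (my simplification, not a measurement) that every chunk is about 30 bytes, which is 6MB divided by 200,000 lines, and that each chunk is found by reading whole buffers and then seeking back:

```python
import math

def bytes_read(chunk_lengths, bufsize):
    """Total bytes requested when each chunk is located by reading whole
    buffers of bufsize bytes and then seeking back to the chunk's end.
    Ignores the short final read near EOF."""
    return sum(math.ceil(n / bufsize) * bufsize for n in chunk_lengths)

# Assumption: uniform 30-byte chunks; the real file will vary.
chunks = [30] * 200_000
file_size = sum(chunks)
for bufsize in (16, 64, 256, 1024):
    ratio = bytes_read(chunks, bufsize) / file_size
    print(bufsize, "byte buffer reads", round(ratio, 2), "x the file size")
```

With these numbers a 64-byte buffer handles a little over twice the file's bytes, and a 256-byte buffer more than eight times, which fits the slowdown seen above. Python and the OS do their own buffering underneath, so this counts bytes handled at the Python level rather than actual disk reads.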

-- 
Bartc
