


Re: Record seperator

Path csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!news.glorb.com!solaris.cc.vt.edu!news.vt.edu!newsfeed-00.mathworks.com!panix!roy
From Roy Smith <roy@panix.com>
Newsgroups comp.lang.python
Subject Re: Record seperator
Date Sat, 27 Aug 2011 13:45:31 -0400
Organization PANIX Public Access Internet and UNIX, NYC
Lines 30
Message-ID <roy-F7BDDC.13453127082011@news.panix.com>
References <slrnj5fo7u.4ra.greymausg@hmaus.org> <mailman.451.1314385354.27778.python-list@python.org> <slrnj5i1g9.581.greymausg@hmaus.org> <4e592852$0$29965$c3e8da3$5496439d@news.astraweb.com>
NNTP-Posting-Host localhost
X-Trace reader1.panix.com 1314467133 23280 127.0.0.1 (27 Aug 2011 17:45:33 GMT)
X-Complaints-To abuse@panix.com
NNTP-Posting-Date Sat, 27 Aug 2011 17:45:33 +0000 (UTC)
User-Agent MT-NewsWatcher/3.5.3b3 (Intel Mac OS X)
Xref x330-a1.tempe.blueboxinc.net comp.lang.python:12283



In article <4e592852$0$29965$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> open("file.txt")   # opens the file
>  .read()           # reads the contents of the file
>  .split("\n\n")    # splits the text on double-newlines.

The biggest problem with this code is that read() slurps the entire file 
into a string.  That's fine for moderately sized files, but will fail 
(or at least be grossly inefficient) for very large files.
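For blank-line-separated records like the ones in the quoted example, one way to avoid reading the whole file is to walk it line by line and group consecutive non-blank lines with itertools.groupby. This is a minimal sketch, not the thread's solution, and `read_records` is a hypothetical helper name. Its output differs slightly from split("\n\n"): each record keeps its trailing newline, and a run of several blank (or whitespace-only) lines counts as a single separator instead of leaking extra newlines into the next record.

```python
import io
from itertools import groupby

def read_records(f):
    """Yield blank-line-separated records from file object f, one at a time.

    Only the current record is held in memory, unlike f.read().
    Runs of blank or whitespace-only lines act as one separator.
    """
    # groupby splits the line stream into alternating runs of
    # blank lines and non-blank lines.
    for is_blank, lines in groupby(f, key=lambda line: line.strip() == ""):
        if not is_blank:
            yield "".join(lines)

# Usage with an in-memory file; a file from open() works the same way.
sample = io.StringIO("a 1\nb 2\n\nc 3\n\n\nd 4\n")
for record in read_records(sample):
    print(repr(record))
```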

It's always annoyed me a little that while it's easy to iterate over the 
lines of a file, it's more complicated to iterate over a file character 
by character.  You could write your own generator to do that:

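def getchar(f):
    # Yield the characters of f one at a time, one line at a time.
    for line in f:
        for c in line:
            yield c

with open("file.txt") as f:
    for c in getchar(f):
        pass  # process each character c here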

but that's annoyingly verbose (and probably not hugely efficient).
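The same generator can be written more tersely with standard-library tools. This sketch shows two drop-in alternatives. itertools.chain.from_iterable flattens the file's lines into their characters. The two-argument form of iter() calls f.read(1) repeatedly until it returns "" at end of file. Both are lazy. The chain version may be somewhat cheaper than nested Python loops because the flattening happens inside itertools, but that has not been measured here. The read(1) version makes one method call per character and is likely the slowest of the three.

```python
import io
from itertools import chain

def getchar(f):
    """Lazily yield the characters of file object f."""
    # Iterating f yields lines; chain.from_iterable flattens each line
    # into its individual characters.
    return chain.from_iterable(f)

def getchar_read1(f):
    """Lazily yield characters by calling f.read(1) until EOF ("")."""
    return iter(lambda: f.read(1), "")

print("".join(getchar(io.StringIO("ab\ncd\n"))))
print("".join(getchar_read1(io.StringIO("ab\ncd\n"))))
```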

Of course, the next obstacle for the specific task at hand is that 
even with an iterator over the characters of a file, str.split() only 
works on strings.  It would be nice to have a version of split that took 
an iterable and returned an iterator over the split components.  Maybe 
there is such a thing and I'm just missing it?
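As far as I know, the standard library has no lazy split of this kind, but one is short to write. The sketch below uses the hypothetical name `isplit`. It buffers characters until the buffer ends with the separator, then yields everything before it. Because every occurrence of a fixed separator has the same length, detecting the earliest-ending match is equivalent to str.split's leftmost, non-overlapping scan, so the pieces should match "".join(chars).split(sep). Only the current piece is held in memory.

```python
import io
from itertools import chain

def isplit(chars, sep):
    """Lazily split an iterable of characters on the non-empty string sep.

    Yields the same pieces as "".join(chars).split(sep) while buffering
    only the current piece.
    """
    if not sep:
        raise ValueError("empty separator")
    n = len(sep)
    buf = []
    for c in chars:
        buf.append(c)
        # When the buffer ends with the separator, emit what precedes it.
        if len(buf) >= n and "".join(buf[-n:]) == sep:
            yield "".join(buf[:-n])
            buf = []
    # Whatever follows the last separator is the final piece.
    yield "".join(buf)

# Usage: split a file on double newlines without reading it all at once.
sample = io.StringIO("rec1 line1\nrec1 line2\n\nrec2\n")
for record in isplit(chain.from_iterable(sample), "\n\n"):
    print(repr(record))
```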



Thread

Record seperator greymaus <greymausg@mail.com> - 2011-08-26 18:39 +0000
  Re: Record seperator "D'Arcy J.M. Cain" <darcy@druid.net> - 2011-08-26 15:02 -0400
    Re: Record seperator greymaus <greymausg@mail.com> - 2011-08-27 16:59 +0000
      Re: Record seperator Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-08-28 03:24 +1000
        Re: Record seperator Roy Smith <roy@panix.com> - 2011-08-27 13:45 -0400
          Re: Record seperator ChasBrown <cbrown@cbrownsystems.com> - 2011-08-27 11:40 -0700
          Re: Record seperator Terry Reedy <tjreedy@udel.edu> - 2011-08-27 16:03 -0400
            Re: Record seperator Roy Smith <roy@panix.com> - 2011-08-27 17:07 -0400
              Re: Record seperator Terry Reedy <tjreedy@udel.edu> - 2011-08-27 20:55 -0400
          Re: Record seperator Chris Angelico <rosuav@gmail.com> - 2011-08-28 06:07 +1000
        Re: Record seperator greymaus <greymausg@mail.com> - 2011-08-28 10:03 +0000
