Path: csiph.com!usenet.pasdenom.info!news.etla.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <atkvgbFto6uU1@mid.individual.net>
References: <mailman.855.1366477790.3114.python-list@python.org> <atkvgbFto6uU1@mid.individual.net>
From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: Mon, 22 Apr 2013 15:49:49 +0100
Subject: Re: itertools.groupby
To: Neil Cerutti <neilc@norwich.edu>
Content-Type: text/plain; charset=ISO-8859-1
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.919.1366642212.3114.python-list@python.org>
Lines: 38
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:44080

On 22 April 2013 15:24, Neil Cerutti <neilc@norwich.edu> wrote:
>
> Hrmmm, hoomm. Nobody cares for slicing any more.
>
> def headered_groups(lst, header):
>     b = lst.index(header) + 1
>     while True:
>         try:
>             e = lst.index(header, b)
>         except ValueError:
>             yield lst[b:]
>             break
>         yield lst[b:e]
>         b = e+1

This requires the whole file to be read into memory. Iterators are
typically preferred over list slicing for sequential text file access
since you can avoid loading the whole file at once. This means that
you can process a large file while holding only the current line (or
the current group) in memory, rather than an amount that grows with
the size of the file.

>
> for group in headered_groups([line.strip() for line in open('data.txt')],
>         "Starting a new group"):
>     print(group)

The list comprehension above loads the entire file into memory.
Assuming that .strip() is just being used to remove the newline at the
end, it would be simpler to use inputfile.read().splitlines(), since
that also loads everything into memory and removes the newlines
(readlines() keeps them). To remove them without reading everything
you can use map (or itertools.imap in Python 2), which yields the
stripped lines lazily:

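with open('data.txt') as inputfile:
    # map() is lazy in Python 3, so this needs a headered_groups that
    # accepts any iterable; the list version quoted above calls lst.index().
    lines = map(str.strip, inputfile)
    for group in headered_groups(lines, "Starting a new group"):
        print(group)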
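
For what it's worth, here is a minimal sketch of a headered_groups
that works on any iterable of lines, so that it can be fed the lazy
map object above. It is only an illustration, not a drop-in
replacement: it reuses the name and arguments from the quoted
function, but if the header never appears it yields nothing instead of
raising ValueError as lst.index() would.

```python
def headered_groups(lines, header):
    """Yield lists of the lines found between successive header lines.

    Accepts any iterable, so only one group is held in memory at a time.
    """
    lines = iter(lines)

    # Skip everything up to and including the first header.
    for line in lines:
        if line == header:
            break
    else:
        # Assumption: with no header at all, yield nothing (the list
        # version would raise ValueError here instead).
        return

    group = []
    for line in lines:
        if line == header:
            # A new header closes the current group.
            yield group
            group = []
        else:
            group.append(line)

    # Whatever follows the last header forms the final group
    # (possibly empty, as with lst[b:] in the list version).
    yield group
```

Each group is still built as a list, so memory use is bounded by the
largest group rather than by the whole file.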


Oscar