
Re: itertools.groupby

From Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date Mon, 22 Apr 2013 15:49:49 +0100
Subject Re: itertools.groupby
Newsgroups comp.lang.python


On 22 April 2013 15:24, Neil Cerutti <neilc@norwich.edu> wrote:
>
> Hrmmm, hoomm. Nobody cares for slicing any more.
>
> def headered_groups(lst, header):
>     b = lst.index(header) + 1
>     while True:
>         try:
>             e = lst.index(header, b)
>         except ValueError:
>             yield lst[b:]
>             break
>         yield lst[b:e]
>         b = e+1

This requires the whole file to be read into memory. Iterators are
typically preferred over list slicing for sequential text file access
since you can avoid loading the whole file at once. This means that
you can process a large file while only using a constant amount of
memory.
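
As a rough sketch of the iterator approach (my own rewrite for
illustration, not code from earlier in the thread), the generator below
takes any iterable of lines and keeps only the current group in memory.
It reuses the name headered_groups and the same two parameters. One
difference from the slicing version, which raises ValueError when the
header never appears: this sketch simply yields nothing in that case.

```python
def headered_groups(lines, header):
    """Yield lists of lines found between successive header lines.

    Accepts any iterable, so only the current group is held in memory.
    Lines before the first header are skipped, as in the slicing version.
    """
    group = None  # stays None until the first header has been seen
    for line in lines:
        if line == header:
            if group is not None:
                yield group  # a new header closes the previous group
            group = []
        elif group is not None:
            group.append(line)
    if group is not None:
        yield group  # the last group runs to the end of the input


if __name__ == "__main__":
    sample = iter(["preamble", "H", "a", "b", "H", "c"])
    for group in headered_groups(sample, "H"):
        print(group)
```

Memory use is then bounded by the largest single group rather than by
the size of the whole file.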

>
> for group in headered_groups([line.strip() for line in open('data.txt')],
>         "Starting a new group"):
>     print(group)

The list comprehension above loads the entire file into memory.
Assuming that .strip() is just being used to remove the newline at the
end, it would be simpler to use read().splitlines(), since that also
loads everything into memory and removes the newlines (readlines()
keeps them). To remove them without reading everything you can use map
(or itertools.imap in Python 2):

with open('data.txt') as inputfile:
    # needs a headered_groups that accepts any iterable, not only a list
    for group in headered_groups(map(str.strip, inputfile),
            "Starting a new group"):
        print(group)
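
A small check of how these calls treat newlines and laziness may help
here. It uses io.StringIO as a stand-in for the open file (an assumption
made only so the example is self-contained); the sample text is made up.

```python
import io

text = "Starting a new group\na\nb\n"

# readlines() returns every line at once, with the newlines kept.
kept = io.StringIO(text).readlines()

# read().splitlines() also reads everything, but drops the newlines.
dropped = io.StringIO(text).read().splitlines()

# In Python 3, map() returns a lazy iterator: nothing is read until a
# value is requested.
stream = io.StringIO(text)
lazy = map(str.strip, stream)
first = next(lazy)    # consumes only the first line
rest = stream.read()  # the remaining lines were still unread

if __name__ == "__main__":
    print(kept)
    print(dropped)
    print(repr(first), repr(rest))
```

In Python 2 the built-in map builds a full list, which is why
itertools.imap is the lazy equivalent there.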


Oscar



Thread

itertools.groupby Jason Friedman <jsf80238@gmail.com> - 2013-04-20 11:09 -0600
  Re: itertools.groupby Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-21 00:13 +0000
    Re: itertools.groupby Joshua Landau <joshua.landau.ws@gmail.com> - 2013-04-22 04:09 +0100
  Re: itertools.groupby Neil Cerutti <neilc@norwich.edu> - 2013-04-22 14:24 +0000
    Re: itertools.groupby Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 15:49 +0100
      Re: itertools.groupby Neil Cerutti <neilc@norwich.edu> - 2013-04-22 15:04 +0000
    Re: itertools.groupby Chris Angelico <rosuav@gmail.com> - 2013-04-23 01:14 +1000
