From: Peter Otten <__peter__@web.de>
To: python-list@python.org
Subject: Re: itertools.groupby
Date: Sun, 21 Apr 2013 10:28:12 +0200
Newsgroups: comp.lang.python

Jason Friedman wrote:

> I have a file such as:
>
> $ cat my_data
> Starting a new group
> a
> b
> c
> Starting a new group
> 1
> 2
> 3
> 4
> Starting a new group
> X
> Y
> Z
> Starting a new group
>
> I am wanting a list of lists:
> ['a', 'b', 'c']
> ['1', '2', '3', '4']
> ['X', 'Y', 'Z']
> []
>
> I wrote this:
> ------------------------------------
> #!/usr/bin/python3
> from itertools import groupby
>
> def get_lines_from_file(file_name):
>     with open(file_name) as reader:
>         for line in reader.readlines():

readlines() slurps the whole file into memory! Don't do that; iterate over
the file directly instead:

            for line in reader:

>             yield(line.strip())
>
> counter = 0
> def key_func(x):
>     if x.startswith("Starting a new group"):
>         global counter
>         counter += 1
>     return counter
>
> for key, group in groupby(get_lines_from_file("my_data"), key_func):
>     print(list(group)[1:])
> ------------------------------------
>
> I get the output I desire, but I'm wondering if there is a solution
> without the global counter.

If you were to drop the empty groups you could simplify it to

    import itertools

    def is_header(line):
        return line.startswith("Starting a new group")

    with open("my_data") as lines:
        stripped_lines = (line.strip() for line in lines)
        # Adjacent headers fall into the same groupby() run, which is
        # why empty groups disappear with this approach.
        for header, group in itertools.groupby(stripped_lines, key=is_header):
            if not header:
                print(list(group))

And here's a refactoring for your initial code.
The main point is the use of nonlocal instead of global state to make the
function reentrant.

    import itertools

    def split_groups(items, header):
        odd = True
        def group_key(item):
            # Flip the key on every header so each header starts a new run.
            nonlocal odd
            if header(item):
                odd = not odd
            return odd

        for _key, group in itertools.groupby(items, key=group_key):
            # Skip the first item of each run (normally the header line).
            yield itertools.islice(group, 1, None)

    def is_header(line):
        return line.startswith("Starting a new group")

    with open("my_data") as lines:
        stripped_lines = map(str.strip, lines)
        for group in split_groups(stripped_lines, header=is_header):
            print(list(group))

One remaining problem with that code is that it will silently drop the first
line of the file if the file doesn't start with a header:

$ cat my_data
alpha
beta
gamma
Starting a new group
a
b
c
Starting a new group
Starting a new group
1
2
3
4
Starting a new group
X
Y
Z
Starting a new group
$ python3 group.py
['beta', 'gamma'] # where's alpha?
['a', 'b', 'c']
[]
['1', '2', '3', '4']
['X', 'Y', 'Z']
[]

How do you want to handle that case?
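
One possible answer, as a rough sketch only: treat any lines before the first header as a group of their own. The version below drops groupby() in favour of a plain generator, which is a different technique from the refactoring above. The name split_groups_keep_leading, the in-memory sample list, and the policy of keeping the leading lines are all assumptions made for illustration; raising an error for a headerless start would be just as defensible.

```python
def is_header(line):
    return line.startswith("Starting a new group")

def split_groups_keep_leading(items, header):
    """Yield one list per group; header items are not included.

    Assumption: items seen before the first header form a leading group
    instead of being dropped.
    """
    group = None  # None means no group has been opened yet
    for item in items:
        if header(item):
            if group is not None:
                yield group
            group = []  # every header opens a fresh, possibly empty, group
        else:
            if group is None:
                group = []  # headerless leading items open an implicit group
            group.append(item)
    if group is not None:
        yield group

# Hypothetical sample mirroring the my_data file shown above.
sample = [
    "alpha", "beta", "gamma",
    "Starting a new group", "a", "b", "c",
    "Starting a new group",
    "Starting a new group", "1", "2", "3", "4",
    "Starting a new group", "X", "Y", "Z",
    "Starting a new group",
]

result = list(split_groups_keep_leading(sample, header=is_header))
for group in result:
    print(group)
```

Unlike the groupby() version, this builds each group as a list in memory rather than yielding lazy iterators, so it keeps empty groups and the leading lines at the cost of holding one group at a time.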