


Re: iterating over a file with two pointers

Subject Re: iterating over a file with two pointers
From Roy Smith <roy@panix.com>
In-Reply-To <CAHVvXxQa6rsrD669kL-EeqCQFn3jKH-k=eWY5iey4RwVBD2RiA@mail.gmail.com>
Date Wed, 18 Sep 2013 10:36:43 -0400
References <3018b3d4-f914-4c89-9f26-cd4b2af32e73@googlegroups.com> <CAPTjJmoyrJqVR29MeDzcfA9K=gGgHSuqO3uCNXGLQs7APLJByA@mail.gmail.com> <mailman.115.1379504419.18130.python-list@python.org> <roy-B13238.08561818092013@news.panix.com> <CAHVvXxQa6rsrD669kL-EeqCQFn3jKH-k=eWY5iey4RwVBD2RiA@mail.gmail.com>
To Oscar Benjamin <oscar.j.benjamin@gmail.com>
Cc Python List <python-list@python.org>
Newsgroups comp.lang.python
Message-ID <mailman.120.1379515006.18130.python-list@python.org>


> Dave Angel <davea@davea.name> wrote (and I agreed with):
>> I'd suggest you open the file twice, and get two file objects.  Then you
>> can iterate over them independently.


On Sep 18, 2013, at 9:09 AM, Oscar Benjamin wrote:
> There's no need to use OS resources by opening the file twice or to
> screw up the IO caching with seek().

There's no reason NOT to use OS resources.  That's what the OS is there for: to make life easier for application programmers.  Opening a file twice costs almost nothing.  File descriptors are almost as cheap as whitespace.
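
For concreteness, here is a minimal sketch of the two-file-object approach, applied to the same task as the tee() version quoted further down: emit each line, and after any line containing "*" also emit the next three lines, indented. It is written for Python 3 rather than the 2.7 used elsewhere in this thread. The function name lines_with_lookahead and its parameters are made up for illustration, and opening the file in binary mode (so that tell() and seek() deal in plain byte offsets) is an assumption of this sketch, not something proposed in the thread.

```python
from itertools import islice


def lines_with_lookahead(path, marker="*", lookahead=3, encoding="utf-8"):
    """Return every line of the file; after each line containing `marker`,
    also return up to `lookahead` following lines, indented.

    Hypothetical helper: two independent file objects on the same path.
    """
    marker_bytes = marker.encode(encoding)
    result = []
    with open(path, "rb") as outer_f, open(path, "rb") as inner_f:
        # readline() is used instead of "for line in f" so that tell()
        # always reports the offset just past the line that was returned.
        for outer in iter(outer_f.readline, b""):
            result.append(outer.decode(encoding))
            if marker_bytes in outer:
                # Move only the second handle; the outer handle keeps
                # reading sequentially and is never repositioned.
                inner_f.seek(outer_f.tell())
                for inner in islice(iter(inner_f.readline, b""), lookahead):
                    result.append("    " + inner.decode(encoding))
    return result
```

Only the lookahead handle is ever repositioned, so the outer loop stays a plain sequential read. Whether that fully answers the seek() objection depends on the platform's buffering, which this sketch does not measure.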

> Peter's version holds just as many lines as is necessary in an
> internal Python buffer and performs the minimum possible
> amount of IO.

I believe by "Peter's version", you're talking about:

> from itertools import islice, tee 
> 
> with open("tmp.txt") as f: 
>     while True: 
>         for outer in f: 
>             print outer, 
>             if "*" in outer: 
>                 f, g = tee(f) 
>                 for inner in islice(g, 3): 
>                     print "   ", inner, 
>                 break 
>         else: 
>             break 


There's this note from http://docs.python.org/2.7/library/itertools.html#itertools.tee:

> This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
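
To make the "auxiliary storage" in that note concrete, here is a small sketch (Python 3, with toy data invented for the purpose) of what tee() has to remember when one of its iterators runs ahead of the other. The variable names are illustrative only.

```python
from itertools import islice, tee

source = iter(["a", "b", "c", "d"])
behind, ahead = tee(source)

# Reading three items through `ahead` pulls them out of `source`.
# tee() must keep them somewhere, because `behind` has not seen them yet.
looked_ahead = list(islice(ahead, 3))

# `behind` now replays the three stored items, then continues with
# whatever is still left in `source`.
replayed = list(behind)

# The underlying iterator has been fully drained by the two tee objects.
leftover = list(source)
```

The further apart the two iterators drift, the more items have to be held, which is the cost the documentation is warning about.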


I have no idea how that interacts with the pattern above, where you call tee() serially.  You're basically doing:

    from itertools import tee

    with open("my_file") as f:
        while True:
            f, g = tee(f)

Are all of those g's just hanging around, eating up memory, while waiting to be garbage collected?  I have no idea.  But I do know that no such problems exist with the two-file-descriptor version.
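
One way to probe the question empirically, rather than guess, is to keep a weak reference to every g and count how many are still alive at the end. The following is a rough sketch in Python 3. The function name count_live_lookahead_iterators is hypothetical, it relies on tee objects being weak-referenceable (which they are in CPython, as far as I know), and it only checks the iterator objects themselves, not the size of tee()'s internal buffer.

```python
import gc
import weakref
from itertools import islice, tee


def count_live_lookahead_iterators(lines, rounds=100):
    """Repeat the serial `f, g = tee(f)` pattern and report how many of
    the discarded `g` iterators are still alive afterwards.

    Hypothetical probe for illustration; not a proof about memory use.
    """
    f = iter(lines)
    refs = []
    g = None
    for _ in range(rounds):
        f, g = tee(f)               # the previous g is rebound here
        refs.append(weakref.ref(g))
        for _ in islice(g, 3):      # look ahead a few items, as in the thread
            pass
    del g
    gc.collect()                    # give non-refcounting collectors a chance
    return sum(1 for ref in refs if ref() is not None)
```

A result of zero would suggest the old g's are reclaimed promptly. It says nothing about other Python implementations, or about how much data the surviving iterator keeps buffered.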






> I would expect this to be more
> efficient as well as less error-prone on Windows.
> 
> 
> Oscar
> 


---
Roy Smith
roy@panix.com





