| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: iterating over a file with two pointers |
| Date | 2013-09-19 09:23 +0200 |
| Organization | None |
| References | (1 earlier) <CAPTjJmoyrJqVR29MeDzcfA9K=gGgHSuqO3uCNXGLQs7APLJByA@mail.gmail.com> <mailman.115.1379504419.18130.python-list@python.org> <roy-B13238.08561818092013@news.panix.com> <CAHVvXxQa6rsrD669kL-EeqCQFn3jKH-k=eWY5iey4RwVBD2RiA@mail.gmail.com> <52B7F7EA-C7C4-4DB6-A93C-25F4C058EB58@panix.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.146.1379575403.18130.python-list@python.org> (permalink) |
Roy Smith wrote:
>> Dave Angel <davea@davea.name> wrote (and I agreed with):
>>> I'd suggest you open the file twice, and get two file objects. Then you
>>> can iterate over them independently.
>
>
> On Sep 18, 2013, at 9:09 AM, Oscar Benjamin wrote:
>> There's no need to use OS resources by opening the file twice or to
>> screw up the IO caching with seek().
>
> There's no reason NOT to use OS resources. That's what the OS is there
> for; to make life easier on application programmers. Opening a file twice
> costs almost nothing. File descriptors are almost as cheap as whitespace.
>
>> Peter's version holds just as many lines as is necessary in an
>> internal Python buffer and performs the minimum possible
>> amount of IO.
>
> I believe by "Peter's version", you're talking about:
>
>> from itertools import islice, tee
>>
>> with open("tmp.txt") as f:
>>     while True:
>>         for outer in f:
>>             print outer,
>>             if "*" in outer:
>>                 f, g = tee(f)
>>                 for inner in islice(g, 3):
>>                     print " ", inner,
                   del g # a good idea in the general case
>>                 break
>>         else:
>>             break
>
>
> There's this note from
> http://docs.python.org/2.7/library/itertools.html#itertools.tee:
>
>> This itertool may require significant auxiliary storage (depending on how
>> much temporary data needs to be stored). In general, if one iterator uses
>> most or all of the data before another iterator starts, it is faster to
>> use list() instead of tee().
>
>
> I have no idea how that interacts with the pattern above where you call
> tee() serially.

As I understand it the above says that

    items = infinite()
    a, b = tee(items)
    for item in islice(a, 1000):
        pass
    for pair in izip(a, b):
        pass

stores 1000 items and can go on forever, but

    items = infinite()
    a, b = tee(items)
    for item in a:
        pass

will consume unbounded memory and that if items is finite using a list
instead of tee is more efficient. The documentation says nothing about

    items = infinite()
    a, b = tee(items)
    del a
    for item in b:
        pass

so you have to trust Mr Hettinger or come up with a test case...

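
One possible test case for the `del a` question, as a rough sketch rather than a definitive answer: it uses `tracemalloc` to compare peak memory when the unused branch is kept alive against when it is deleted. It is written for Python 3 and assumes CPython's reference counting; `itertools.count()` stands in for `infinite()`, and both function names are made up for this illustration. Whatever it shows is an implementation detail, not something the documentation promises.

```python
import tracemalloc
from itertools import count, islice, tee


def peak_keeping_branch(n_items):
    """Peak traced bytes while consuming b, with a alive but never advanced."""
    items = count()                  # stand-in for infinite() (assumption)
    a, b = tee(items)
    tracemalloc.start()
    for item in islice(b, n_items):  # a lags behind, so tee must buffer for it
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def peak_after_dropping_branch(n_items):
    """Peak traced bytes while consuming b after a has been deleted."""
    items = count()
    a, b = tee(items)
    del a                            # nothing is left that could need old items
    tracemalloc.start()
    for item in islice(b, n_items):
        pass
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    n = 100000
    print("a kept:   ", peak_keeping_branch(n))
    print("a deleted:", peak_after_dropping_branch(n))
```

If the second figure stays small while the first grows with `n`, that suggests the buffer is released once the lagging branch is gone, at least on the interpreter the test ran on.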
> You're basically doing
>
> with open("my_file") as f:
>     while True:
>         f, g = tee(f)
>
> Are all of those g's just hanging around, eating up memory, while waiting
> to be garbage collected? I have no idea.
I'd say you've just devised a nice test to find out ;)
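
A sketch of what such a test might look like, assuming Python 3 and CPython: it runs the serial `f, g = tee(f)` pattern once per line over a generated stream and reports peak traced memory. The helpers `lines()` and `peak_with_serial_tee()` are hypothetical names invented here, and the generator replaces a real file so the test is self-contained.

```python
import tracemalloc
from itertools import islice, tee


def lines(n):
    """Generate n fake lines; stands in for a file object (assumption)."""
    for i in range(n):
        yield "line %d\n" % i


def peak_with_serial_tee(n_lines, lookahead=3):
    """Call tee() after every line, peek ahead, and return peak traced bytes."""
    f = lines(n_lines)
    tracemalloc.start()
    while True:
        for outer in f:
            f, g = tee(f)                     # rebinding g releases the old g
            for inner in islice(g, lookahead):
                pass                          # look ahead without using the data
            break
        else:
            break                             # f is exhausted
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


if __name__ == "__main__":
    for n in (1000, 10000, 50000):
        print(n, peak_with_serial_tee(n))
```

If the old `g` objects were piling up, the peak should grow roughly in step with the number of lines; a roughly flat peak would suggest they are collected as soon as the name is rebound.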
> But I do know that no such
> problems exist with the two file descriptor versions.
The trade-offs are different. My version works with arbitrary iterators
(think stdin), but will consume unbounded amounts of memory when the inner
loop doesn't stop.
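
To illustrate the "arbitrary iterators" point, here is the same loop recast as a Python 3 generator that accepts any iterable of strings instead of an open file. This is only a sketch: the function name and its `marker` and `lookahead` parameters are invented for the example and are not part of the original code.

```python
from itertools import islice, tee


def with_lookahead(lines, marker="*", lookahead=3):
    """Yield every line; after a line containing marker, also yield an
    indented preview of the next few lines without consuming them."""
    f = iter(lines)
    while True:
        for outer in f:
            yield outer
            if marker in outer:
                f, g = tee(f)              # g peeks ahead, f keeps its position
                for inner in islice(g, lookahead):
                    yield "    " + inner
                del g                      # drop the lookahead branch
                break                      # restart the outer loop on the new f
        else:
            break                          # input exhausted


if __name__ == "__main__":
    import sys
    for line in with_lookahead(sys.stdin):
        print(line, end="")
```

Because nothing here calls `seek()` or reopens anything, the input can be a pipe, a socket wrapper, or a plain list. The memory caveat is unchanged: items the lookahead has read stay buffered until the main iteration reaches them.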