Re: Fast forward-backward (write-read)

Started by: Dennis Lee Bieber <wlfraed@ix.netcom.com>
First post: 2012-10-24 01:23 -0400
Last post: 2012-10-24 14:36 -0400
Articles: 3, 2 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: Fast forward-backward (write-read) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-24 01:23 -0400
    Re: Fast forward-backward (write-read) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-24 08:05 +0000
      Re: Fast forward-backward (write-read) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-24 14:36 -0400

#31993 — Re: Fast forward-backward (write-read)

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2012-10-24 01:23 -0400
Subject: Re: Fast forward-backward (write-read)
Message-ID: <mailman.2717.1351056298.27098.python-list@python.org>

On Tue, 23 Oct 2012 16:35:40 -0700, emile <emile@fenx.com> declaimed the
following in gmane.comp.python.general:

> On 10/23/2012 04:19 PM, David Hutto wrote:
> > forward =  [line.rstrip('\n') for line in f.readlines()]
> 
> f.readlines() will be big(!) and have overhead... and forward results in 
> something again as big.
>
	Well, since file objects are iterable, could one just drop the
.readlines()? (i.e. ... for line in f)

> > backward =  [line.rstrip('\n') for line in reversed(forward)]
> 
> and defining backward looks to me to require space to build backward and 
> hold reversed(forward)
>
	And since the line-ends have already been stripped from forward,
backward should just be:

	backward = reversed(forward)
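
The two suggestions above can be sketched together as follows. This is only a minimal illustration, assuming a small text file; the helper name read_forward and the temporary sample file are made up for the example and are not from the original posts.

```python
import os
import tempfile

def read_forward(path):
    # Iterate over the file object directly, so no intermediate
    # readlines() list is built before the comprehension runs.
    with open(path) as f:
        return [line.rstrip('\n') for line in f]

# Hypothetical sample data written to a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    sample_path = tmp.name

forward = read_forward(sample_path)

# The line ends are already stripped, so no second rstrip pass is needed.
# reversed() returns a lazy iterator over the existing list.
backward = reversed(forward)
backward_lines = list(backward)

os.remove(sample_path)
```

Note that forward is still a complete list in memory; only the extra readlines() copy and the second stripped copy are avoided.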
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/


#32019

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-10-24 08:05 +0000
Message-ID: <5087a12e$0$29882$c3e8da3$5496439d@news.astraweb.com>
In reply to: #31993

On Wed, 24 Oct 2012 01:23:58 -0400, Dennis Lee Bieber wrote:

> On Tue, 23 Oct 2012 16:35:40 -0700, emile <emile@fenx.com> declaimed the
> following in gmane.comp.python.general:
> 
>> On 10/23/2012 04:19 PM, David Hutto wrote:
>> > forward =  [line.rstrip('\n') for line in f.readlines()]
>> 
>> f.readlines() will be big(!) and have overhead... and forward results
>> in something again as big.
>>
> 	Well, since file objects are iterable, could one just drop the
> .readlines() ? ( ... line in f )

Yes, but the bottleneck is still that the list comprehension will run to 
completion, trying to process the entire 100+ GB file in one go.

[...]
> 	And since the line-ends have already been stripped from forward,
> backward should just be:
> 
> 	backward = reversed(forward)

reversed returns a lazy iterator, but it requires that forward is a non-
lazy (eager) sequence. So again you're stuck trying to read the entire 
file into RAM.
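
To illustrate the point: reversed() rejects a generator or other lazy iterable, so it cannot be applied to a lazy forward pass. One possible way around holding the whole file in RAM is to read it from the end in fixed-size chunks. The sketch below is not something proposed in this thread; the names is_reversible and reversed_lines, the chunk size, the UTF-8 encoding and the '\n' line endings are all assumptions made for the example.

```python
import os
import tempfile

def is_reversible(obj):
    # reversed() needs a sequence (or an object with __reversed__);
    # it raises TypeError for generators and other lazy iterables.
    try:
        reversed(obj)
    except TypeError:
        return False
    return True

def reversed_lines(path, chunk_size=64 * 1024, encoding='utf-8'):
    """Yield the lines of a file last-to-first, without line endings.

    Only about one chunk plus one partial line is held in memory.
    Assumes '\n' line endings and an encoding in which the byte
    b'\n' cannot occur inside another character (true for UTF-8).
    """
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        if position == 0:
            return  # an empty file has no lines
        # Ignore a single trailing newline so it does not produce a
        # spurious empty "last line".
        f.seek(position - 1)
        if f.read(1) == b'\n':
            position -= 1
        buffer = b''
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
            lines = buffer.split(b'\n')
            # The first piece may be an incomplete line; keep it and
            # prepend the next (earlier) chunk to it.
            buffer = lines.pop(0)
            for line in reversed(lines):
                yield line.decode(encoding)
        yield buffer.decode(encoding)

# Hypothetical sample data; a tiny chunk size forces several reads.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    sample_path = tmp.name

backward_lines = list(reversed_lines(sample_path, chunk_size=4))
generator_ok = is_reversible(line for line in ['a', 'b'])
list_ok = is_reversible(['a', 'b'])

os.remove(sample_path)
```

A very long single line still has to fit in memory, since the partial-line buffer grows until a newline is found.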

-- 
Steven


#32060

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2012-10-24 14:36 -0400
Message-ID: <mailman.2790.1351103785.27098.python-list@python.org>
In reply to: #32019

On 24 Oct 2012 08:05:02 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following in
gmane.comp.python.general:

> 
> Yes, but the bottleneck is still that the list comprehension will run to 
> completion, trying to process the entire 100+ GB file in one go.
>
	Conceded, but 100GB once still has to be better than 100GB twice <G>
[or, as an algorithm used for smaller data sets, the non-readlines
version may fit in memory when the readlines version fails]
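
The "once versus twice" claim can be checked roughly with the standard tracemalloc module. This is a sketch on a small, made-up test file, not a measurement of the 100GB case; the function names and the line count are assumptions for illustration, and exact byte counts will vary between Python versions.

```python
import os
import tempfile
import tracemalloc

def peak_bytes(func, path):
    """Return the peak traced allocation, in bytes, while func(path) runs."""
    tracemalloc.start()
    result = func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak

def with_readlines(path):
    # readlines() builds a full list of raw lines, and the
    # comprehension then builds a second list of stripped lines.
    with open(path) as f:
        return [line.rstrip('\n') for line in f.readlines()]

def without_readlines(path):
    # Iterating the file holds one raw line at a time, plus the
    # growing list of stripped lines.
    with open(path) as f:
        return [line.rstrip('\n') for line in f]

# Hypothetical test data: 50,000 short lines in a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.writelines('line number %d\n' % i for i in range(50000))
    sample_path = tmp.name

peak_with = peak_bytes(with_readlines, sample_path)
peak_without = peak_bytes(without_readlines, sample_path)
os.remove(sample_path)

print('peak with readlines:   ', peak_with)
print('peak without readlines:', peak_without)
```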

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
