| Started by | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| First post | 2012-10-24 01:23 -0400 |
| Last post | 2012-10-24 14:36 -0400 |
| Articles | 3 — 2 participants |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Fast forward-backward (write-read) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-24 01:23 -0400
Re: Fast forward-backward (write-read) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-24 08:05 +0000
Re: Fast forward-backward (write-read) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-24 14:36 -0400
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2012-10-24 01:23 -0400 |
| Subject | Re: Fast forward-backward (write-read) |
| Message-ID | <mailman.2717.1351056298.27098.python-list@python.org> |
On Tue, 23 Oct 2012 16:35:40 -0700, emile <emile@fenx.com> declaimed the
following in gmane.comp.python.general:
> On 10/23/2012 04:19 PM, David Hutto wrote:
> > forward = [line.rstrip('\n') for line in f.readlines()]
>
> f.readlines() will be big(!) and have overhead... and forward results in
> something again as big.
>
Well, since file objects are iterable, couldn't one just drop the
.readlines() and iterate over the file directly? ( ... for line in f )
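
A minimal sketch of that suggestion, assuming a small in-memory file (`io.StringIO`) stands in for the real one; the helper name `strip_lines` is made up for illustration. Iterating over the file object yields one line at a time, so the intermediate list that `.readlines()` would build never exists. The resulting list is still as large as the file's contents.

```python
import io

def strip_lines(f):
    """Return the lines of an open text file with the trailing newline removed."""
    # Iterating over f reads line by line; no f.readlines() list is built first.
    return [line.rstrip('\n') for line in f]

# Stand-in for the real file (an assumption for this example).
sample = io.StringIO("alpha\nbeta\ngamma\n")
forward = strip_lines(sample)
print(forward)
```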
> > backward = [line.rstrip('\n') for line in reversed(forward)]
>
> and defining backward looks to me to require space to build backward and
> hold reversed(forward)
>
And since the line-ends have already been stripped from forward,
backward should just be:
backward = reversed(forward)
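
A small illustration of that line, with a three-item list standing in for the real data (an assumption for the example). `reversed()` on a list returns an iterator over the existing list, so no second copy of the lines is made, but the iterator can only be walked once.

```python
forward = ['alpha', 'beta', 'gamma']

# reversed() does not copy the list; it walks the existing one from the end.
backward = reversed(forward)

# Materialise it only to show the order; this consumes the iterator.
backward_list = list(backward)
print(backward_list)
```

If the reversed order is needed more than once, call `reversed(forward)` again rather than reusing the exhausted iterator.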
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-10-24 08:05 +0000 |
| Message-ID | <5087a12e$0$29882$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #31993 |
On Wed, 24 Oct 2012 01:23:58 -0400, Dennis Lee Bieber wrote:
> On Tue, 23 Oct 2012 16:35:40 -0700, emile <emile@fenx.com> declaimed the
> following in gmane.comp.python.general:
>
>> On 10/23/2012 04:19 PM, David Hutto wrote:
>> > forward = [line.rstrip('\n') for line in f.readlines()]
>>
>> f.readlines() will be big(!) and have overhead... and forward results
>> in something again as big.
>>
> Well, since file objects are iterable, could one just drop the
> .readlines() ? ( ... line in f )
Yes, but the bottleneck is still that the list comprehension will run to
completion, trying to process the entire 100+ GB file in one go.
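
For contrast, a sketch of a lazy alternative, assuming the forward pass can consume lines one at a time; the generator name `stripped` is hypothetical. Unlike the list comprehension, a generator produces each stripped line on demand and never holds the whole file.

```python
import io

def stripped(f):
    """Yield each line of f without its trailing newline, one at a time."""
    for line in f:
        yield line.rstrip('\n')

# Stand-in for the real file (an assumption for this example).
lines = stripped(io.StringIO("alpha\nbeta\ngamma\n"))
first = next(lines)   # only the first line has been read so far
rest = list(lines)    # the remaining lines, pulled on demand
```

This only helps the forward direction; a generator cannot be passed to `reversed()`.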
[...]
> And since the line-ends have already been stripped from forward,
> backward should just be:
>
> backward = reversed(forward)
reversed returns a lazy iterator, but it requires that forward is a
non-lazy (eager) sequence. So again you're stuck trying to read the entire
file into RAM.
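
One common way around that limit, which nobody in the visible part of this thread proposes, is to read the file backward in fixed-size blocks and split each block into lines. The sketch below assumes a seekable file opened in binary mode with `\n` line endings; the function name `reversed_lines` and the `block_size` parameter are made up for illustration.

```python
import io
import os

def reversed_lines(f, block_size=4096):
    """Yield the lines of binary file f from last to first, without b'\\n'."""
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    if pos == 0:
        return                      # empty file: no lines at all
    tail = b''                      # start of a line whose beginning is in an earlier block
    at_end = True
    while pos > 0:
        read_size = min(block_size, pos)
        pos -= read_size
        f.seek(pos)
        block = f.read(read_size)
        if at_end:
            # Ignore the single newline that terminates the file, if present.
            if block.endswith(b'\n'):
                block = block[:-1]
            at_end = False
        parts = (block + tail).split(b'\n')
        tail = parts[0]             # may be incomplete; finish it with the next block
        for line in reversed(parts[1:]):
            yield line
    yield tail                      # the first line of the file

# Tiny block size to force lines to straddle block boundaries.
data = io.BytesIO(b"one\ntwo\nthree\n")
result = list(reversed_lines(data, block_size=4))
```

Memory use is bounded by the block size plus the longest line, not by the file size. Decoding the bytes to text is left to the caller.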
--
Steven
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2012-10-24 14:36 -0400 |
| Message-ID | <mailman.2790.1351103785.27098.python-list@python.org> |
| In reply to | #32019 |
On 24 Oct 2012 08:05:02 GMT, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> declaimed the following in
gmane.comp.python.general:
>
> Yes, but the bottleneck is still that the list comprehension will run to
> completion, trying to process the entire 100+ GB file in one go.
>
Conceded, but holding 100GB in memory once still has to be better than
holding it twice <G> [or, for an algorithm applied to smaller data sets,
the version without .readlines() may fit in memory where the other fails]
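
A rough way to check the "once versus twice" point on a small scale, assuming `tracemalloc` peak figures are an acceptable proxy for memory use and that 20000 short lines in an `io.StringIO` stand in for the real file. The helper names are hypothetical, and the exact byte counts depend on the Python build, so none are quoted here.

```python
import io
import tracemalloc

def peak_bytes(build, text):
    """Run build() on a fresh in-memory file; return (peak traced bytes, result)."""
    f = io.StringIO(text)
    tracemalloc.start()
    result = build(f)
    _, peak = tracemalloc.get_traced_memory()   # returns (current, peak)
    tracemalloc.stop()
    return peak, result

def with_readlines(f):
    # The list from readlines() stays alive while the new list is built.
    return [line.rstrip('\n') for line in f.readlines()]

def with_iteration(f):
    # Only one raw line exists at a time alongside the new list.
    return [line.rstrip('\n') for line in f]

text = ''.join('line number %d\n' % i for i in range(20000))
peak_a, a = peak_bytes(with_readlines, text)
peak_b, b = peak_bytes(with_iteration, text)
print(peak_a, peak_b)
```

Both versions return the same list; the `.readlines()` version should show the higher peak because it keeps a second full list of lines alive.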
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/