Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <50871ff6$0$29978$c3e8da3$5496439d@news.astraweb.com>
References: <5086AA35.4000509@it.uu.se> <mailman.2694.1351029058.27098.python-list@python.org> <50871ff6$0$29978$c3e8da3$5496439d@news.astraweb.com>
Date: Tue, 23 Oct 2012 19:34:15 -0400
Subject: Re: Fast forward-backward (write-read)
From: David Hutto <dwightdhutto@gmail.com>
To: "Steven D'Aprano" <steve+comp.lang.python@pearwood.info>
Content-Type: text/plain; charset=ISO-8859-1
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.2701.1351035258.27098.python-list@python.org>
Lines: 63
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:31972

On Tue, Oct 23, 2012 at 6:53 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
>
>> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
>>> I am working with some rather large data files (>100GB)
> [...]
>>> Finally, to my question --- What is a fast way to write these variables
>>> to an external file and then read them in backwards?
>>
>> Don't forget to use timeit for an average OS utilization.
>
> Given that the data files are larger than 100 gigabytes, the time
> required to process each file is likely to be in hours, not microseconds.
> That being the case, timeit is the wrong tool for the job, it is
> optimized for timing tiny code snippets. You could use it, of course,
> but the added inconvenience doesn't gain you any added accuracy.

It depends on the end result. If the iterations themselves each take
about the same time, then you can time just a segment of the iterations
and scale that figure up to the full data set. A full timed run might
still be worth it if you have a second computer to run the optimization on.
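
A minimal sketch of timing a segment and scaling it, assuming each
iteration costs roughly the same. process_record, the sample size, and
the record counts are hypothetical stand-ins, not the OP's code:

```python
import time


def process_record(record):
    # Hypothetical stand-in for the real per-record work.
    return sum(record)


def estimate_full_runtime(records, sample_size, total_count):
    """Time the first sample_size records and scale linearly to total_count."""
    start = time.perf_counter()
    processed = 0
    for record in records:
        process_record(record)
        processed += 1
        if processed >= sample_size:
            break
    elapsed = time.perf_counter() - start
    if processed == 0:
        raise ValueError("no records to sample")
    # Linear extrapolation: only valid if per-record cost is about constant.
    return (elapsed / processed) * total_count


if __name__ == "__main__":
    sample = ([i, i + 1, i + 2] for i in range(10_000))
    seconds = estimate_full_runtime(sample, sample_size=1_000, total_count=10**9)
    print(f"estimated full run: {seconds / 3600:.2f} hours")
```

The estimate is only as good as the assumption behind it: if later
records are slower (disk cache effects, growing state), the sample will
understate the full run.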

>
> Here's a neat context manager that makes timing long-running code simple:
>
>
> http://code.activestate.com/recipes/577896
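
For anyone who doesn't want to click through, the general idea of a
timing context manager looks roughly like this. It is my own minimal
sketch, not the recipe's code, and the name timed and the results
argument are made up for illustration:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) the wall-clock time of a with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so long jobs still report a time.
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.3f} s")


if __name__ == "__main__":
    with timed("sum of squares"):
        total = sum(i * i for i in range(1_000_000))
```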


I'll test this out for big O notation later. For the OP:

http://en.wikipedia.org/wiki/Big_O_notation

>
>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:
>
> I would be very surprised if the poster will be able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.
Again, these can be scaled down, depending on the operations of the
function in question and the average time of the aforementioned function(s).
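
To make the scaling concrete, and to get at the OP's actual question,
here is a rough sketch of writing records to a file and reading them
back in reverse, one bounded chunk at a time. It assumes each step's
variables pack into a fixed-size record of three doubles; that layout,
the chunk size, and the function names are my invention, not the OP's:

```python
import os
import struct

# Assumed layout: three little-endian float64 values per time step.
RECORD = struct.Struct("<3d")


def write_records(path, rows):
    """Append-free forward pass: pack each row as one fixed-size record."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))


def read_records_backward(path, chunk_records=4096):
    """Yield records last-to-first, holding only one chunk in memory."""
    size = os.path.getsize(path)
    if size % RECORD.size:
        raise ValueError("file size is not a whole number of records")
    remaining = size // RECORD.size
    with open(path, "rb") as f:
        while remaining > 0:
            count = min(chunk_records, remaining)
            remaining -= count  # index of the first record in this chunk
            f.seek(remaining * RECORD.size)
            data = f.read(count * RECORD.size)
            # Walk the chunk from its last record to its first.
            for i in range(count - 1, -1, -1):
                yield RECORD.unpack_from(data, i * RECORD.size)
```

Memory use is set by chunk_records rather than by the file size, so the
same code can be tried on a small slice first and then on the full file.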

>
> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory

This is a common misconception. You can engineer a device that
accommodates this if it's a direct experimental necessity.
>

-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com