Groups > comp.lang.python > #31961 > unrolled thread

Re: Fast forward-backward (write-read)

Started by	David Hutto <dwightdhutto@gmail.com>
First post	2012-10-23 17:50 -0400
Last post	2012-10-24 13:56 +0000
Articles	8 — 5 participants

This discussion began before the indexed window; earlier articles aren't shown. The article labeled "Started by" above is the oldest one visible, not the original post.

  Re: Fast forward-backward (write-read) David Hutto <dwightdhutto@gmail.com> - 2012-10-23 17:50 -0400
    Re: Fast forward-backward (write-read) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-23 22:53 +0000
      Re: Fast forward-backward (write-read) Demian Brecht <demianbrecht@gmail.com> - 2012-10-23 15:57 -0700
      Re: Fast forward-backward (write-read) David Hutto <dwightdhutto@gmail.com> - 2012-10-23 19:34 -0400
      Re: Fast forward-backward (write-read) Virgil Stokes <vs@it.uu.se> - 2012-10-24 09:17 +0200
      Re: Fast forward-backward (write-read) Virgil Stokes <vs@it.uu.se> - 2012-10-24 09:19 +0200
      Re: Fast forward-backward (write-read) David Hutto <dwightdhutto@gmail.com> - 2012-10-24 03:26 -0400
      Re: Fast forward-backward (write-read) Grant Edwards <invalid@invalid.invalid> - 2012-10-24 13:56 +0000

#31961 — Re: Fast forward-backward (write-read)

From	David Hutto <dwightdhutto@gmail.com>
Date	2012-10-23 17:50 -0400
Subject	Re: Fast forward-backward (write-read)
Message-ID	<mailman.2694.1351029058.27098.python-list@python.org>

On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
> I am working with some rather large data files (>100GB) that contain time
> series data. The data (t_k,y(t_k)), k = 0,1,...,N are stored in ASCII
> format. I perform various types of processing on these data (e.g. moving
> median, moving average, and Kalman-filter, Kalman-smoother) in a sequential
> manner and only a small number of these data need be stored in RAM when
> being processed. When performing Kalman-filtering (forward in time pass, k =
> 0,1,...,N) I need to save to an external file several variables (e.g. 11*32
> bytes) for each (t_k, y(t_k)). These are inputs to the Kalman-smoother
> (backward in time pass, k = N,N-1,...,0). Thus, I will need to input these
> variables saved to an external file from the forward pass, in reverse order
> --- from last written to first written.
>
> Finally, to my question --- What is a fast way to write these variables to
> an external file and then read them in backwards?
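One common answer to the question as posed, sketched here under stated assumptions: if the forward pass writes each step's variables as a fixed-size binary record rather than as ASCII text, the byte offset of record k is simply k times the record size, so the backward pass can seek straight to each record starting from the end of the file. The layout below (11 little-endian doubles per step, struct format "<11d") and the names write_forward and read_backward are hypothetical; the post only says "several variables (e.g. 11*32 bytes)".

```python
import os
import struct
import tempfile

# Hypothetical record layout: 11 little-endian doubles per time step k.
# The post only says "e.g. 11*32 bytes"; change the format to match.
RECORD = struct.Struct("<11d")


def write_forward(path, records):
    """Write one fixed-size binary record per time step (forward pass)."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_backward(path):
    """Yield records from last written to first written (backward pass)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size % RECORD.size:
            raise ValueError("file size is not a multiple of the record size")
        # Offsets of the last record, the one before it, ..., down to 0.
        for offset in range(size - RECORD.size, -1, -RECORD.size):
            f.seek(offset)
            yield RECORD.unpack(f.read(RECORD.size))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "kalman_forward.bin")
    data = [tuple(float(k * 11 + j) for j in range(11)) for k in range(5)]
    write_forward(path, data)
    assert list(read_backward(path)) == data[::-1]
```

Seeking once per record is the simplest version; reading many records per seek cuts the number of system calls if that turns out to be the bottleneck.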

Don't forget to use timeit for an average OS utilization.

I'd suggest two list comprehensions for now, until I've reviewed it some more:

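# range() excludes its stop value, so forward covers 33..125;
# backward must start at 125 (not 126) to be its exact reverse.
forward = ["%i = %s" % (i, chr(i)) for i in range(33, 126)]
backward = ["%i = %s" % (i, chr(i)) for i in range(125, 32, -1)]

for var in forward:
    print(var)

for var in backward:
    print(var)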

You could also use a dict: iterate through a straight loop that
assigns a front and back value to each key, as in dict_one = {0 : [0.100], 1 : [1.99]},
and then iterate through the loop again, calling the first or second item
in each key's list for forward or backward calls.


But there might be faster implementations, depending on other
functions' usage of certain lower-level functions.


-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

#31967

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2012-10-23 22:53 +0000
Message-ID	<50871ff6$0$29978$c3e8da3$5496439d@news.astraweb.com>
In reply to	#31961

On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:

> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
>> I am working with some rather large data files (>100GB) 
[...]
>> Finally, to my question --- What is a fast way to write these variables
>> to an external file and then read them in backwards?
> 
> Don't forget to use timeit for an average OS utilization.

Given that the data files are larger than 100 gigabytes, the time 
required to process each file is likely to be in hours, not microseconds. 
That being the case, timeit is the wrong tool for the job; it is 
optimized for timing tiny code snippets. You could use it, of course, 
but the added inconvenience doesn't gain you any added accuracy.

Here's a neat context manager that makes timing long-running code simple:

http://code.activestate.com/recipes/577896
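The recipe itself isn't reproduced in this thread. As a hypothetical stand-in (not the recipe's code), a minimal timing context manager for long-running work can be built from contextlib.contextmanager and time.perf_counter (Python 3.3+); the name stopwatch and its dict-based result are assumptions made for this sketch.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label="block"):
    """Time a long-running block with a wall-clock timer (hypothetical helper).

    Yields a dict; once the with-block exits, timing["elapsed"] holds seconds.
    """
    timing = {}
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # Runs even if the block raises, so interrupted runs are reported too.
        timing["elapsed"] = time.perf_counter() - start
        print("%s: %.3f s" % (label, timing["elapsed"]))


if __name__ == "__main__":
    with stopwatch("forward pass") as t:
        total = sum(x * x for x in range(10**6))
```

A single wall-clock measurement like this is all that is needed for a job that runs for hours; timeit's repetition only pays off when each run is tiny.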

> I'd suggest two list comprehensions for now, until I've reviewed it some
> more:

I would be very surprised if the poster were able to fit 100 gigabytes 
of data into even a single list comprehension, let alone two.

This is a classic example of why the old external processing algorithms 
of the 1960s and 70s will never be obsolete. No matter how much memory 
you have, there will always be times when you want to process more data 
than you can fit into memory.
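To make the external-processing point concrete: a moving average, one of the operations the original poster lists, only needs the most recent values, so it can run over a file of any size while holding just one window in memory. A minimal sketch, assuming one whitespace-separated "t y" pair per line of the ASCII file (the actual file format isn't shown in the thread); moving_average is a hypothetical name.

```python
import io
from collections import deque


def moving_average(lines, window):
    """Yield (t, mean of the last `window` y values) from "t y" text lines.

    Only `window` values are kept in memory, however long the input is.
    The "t y" per-line format is an assumption about the ASCII files.
    """
    buf = deque(maxlen=window)
    total = 0.0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        t_str, y_str = line.split()
        y = float(y_str)
        if len(buf) == window:
            total -= buf[0]  # oldest value, about to be evicted by append()
        buf.append(y)
        total += y
        if len(buf) == window:
            yield float(t_str), total / window


if __name__ == "__main__":
    # io.StringIO stands in for an open file; iterating a real file object
    # also reads it one line at a time.
    fake_file = io.StringIO("0 1\n1 2\n2 3\n3 4\n")
    for t, avg in moving_average(fake_file, window=2):
        print(t, avg)
```

Over billions of samples the running total can accumulate floating-point rounding error; recomputing it from the buffer now and then (total = sum(buf)) is one simple remedy.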

-- 
Steven

#31968

From	Demian Brecht <demianbrecht@gmail.com>
Date	2012-10-23 15:57 -0700
Message-ID	<mailman.2698.1351033070.27098.python-list@python.org>
In reply to	#31967

> This is a classic example of why the old external processing algorithms 
> of the 1960s and 70s will never be obsolete. No matter how much memory 
> you have, there will always be times when you want to process more data 
> than you can fit into memory.


But surely nobody will *ever* need more than 640k…

Right?

Demian Brecht
@demianbrecht
http://demianbrecht.github.com

#31972

From	David Hutto <dwightdhutto@gmail.com>
Date	2012-10-23 19:34 -0400
Message-ID	<mailman.2701.1351035258.27098.python-list@python.org>
In reply to	#31967

On Tue, Oct 23, 2012 at 6:53 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
>
>> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
>>> I am working with some rather large data files (>100GB)
> [...]
>>> Finally, to my question --- What is a fast way to write these variables
>>> to an external file and then read them in backwards?
>>
>> Don't forget to use timeit for an average OS utilization.
>
> Given that the data files are larger than 100 gigabytes, the time
> required to process each file is likely to be in hours, not microseconds.
> That being the case, timeit is the wrong tool for the job; it is
> optimized for timing tiny code snippets. You could use it, of course,
> but the added inconvenience doesn't gain you any added accuracy.

It depends on the end result. If the iterations each take about the
same time, you can time just a segment of them and scale the result
up; a full run might still be worth it if you have a second computer
to run the optimization.
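The scaling argument can be sketched as follows, assuming (as the post does) that each iteration takes roughly the same time: time a representative sample with timeit, then multiply by the ratio of total records to sampled records. estimate_total_seconds, process_one and sample are hypothetical names.

```python
import timeit


def estimate_total_seconds(process_one, sample, total_count, repeat=3):
    """Extrapolate a full run's time from a timed sample of records.

    Assumes every record costs about the same (linear scaling).
    process_one and sample are placeholders for your own per-record
    function and a list of representative records.
    """
    def run_sample():
        for rec in sample:
            process_one(rec)

    # Take the fastest of `repeat` runs: slower runs mostly reflect
    # interference from other processes, not the code being timed.
    best = min(timeit.repeat(run_sample, number=1, repeat=repeat))
    return best * total_count / len(sample)


if __name__ == "__main__":
    sample = [(float(k), float(k) ** 0.5) for k in range(10000)]
    est = estimate_total_seconds(lambda rec: rec[0] * rec[1], sample, 10**9)
    print("estimated full run: %.0f s" % est)
```

An in-memory sample like this leaves out disk I/O, which may well dominate when the real input is a 100 GB file, so the estimate is likely to be optimistic unless the sample is read from disk the same way the full run would read it.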

>
> Here's a neat context manager that makes timing long-running code simple:
>
>
> http://code.activestate.com/recipes/577896


I'll test this out for big O notation later. For the OP:

http://en.wikipedia.org/wiki/Big_O_notation

>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:
>
> I would be very surprised if the poster were able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.

Again, these can be scaled depending on the operations of the function
in question, and the average time of the aforementioned function(s).

>
> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory.

This is a common misconception. You can engineer a device that
accommodates this if it's a direct experimental necessity.

-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

#32011

From	Virgil Stokes <vs@it.uu.se>
Date	2012-10-24 09:17 +0200
Message-ID	<mailman.2734.1351063030.27098.python-list@python.org>
In reply to	#31967

On 24-Oct-2012 00:57, Demian Brecht wrote:
>> This is a classic example of why the old external processing algorithms
>> of the 1960s and 70s will never be obsolete. No matter how much memory
>> you have, there will always be times when you want to process more data
>> than you can fit into memory.
>
> But surely nobody will *ever* need more than 640k…
>
> Right?
>
> Demian Brecht
> @demianbrecht
> http://demianbrecht.github.com
Yes, I can still remember such quotes --- thanks for jogging my memory, Demian :-)

#32012

From	Virgil Stokes <vs@it.uu.se>
Date	2012-10-24 09:19 +0200
Message-ID	<mailman.2735.1351063172.27098.python-list@python.org>
In reply to	#31967

On 24-Oct-2012 00:53, Steven D'Aprano wrote:
> On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
>
>> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes <vs@it.uu.se> wrote:
>>> I am working with some rather large data files (>100GB)
> [...]
>>> Finally, to my question --- What is a fast way to write these variables
>>> to an external file and then read them in backwards?
>> Don't forget to use timeit for an average OS utilization.
> Given that the data files are larger than 100 gigabytes, the time
> required to process each file is likely to be in hours, not microseconds.
> That being the case, timeit is the wrong tool for the job; it is
> optimized for timing tiny code snippets. You could use it, of course,
> but the added inconvenience doesn't gain you any added accuracy.
>
> Here's a neat context manager that makes timing long-running code simple:
>
>
> http://code.activestate.com/recipes/577896
Thanks for this link
>
>
>
>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:
> I would be very surprised if the poster were able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.
You are correct and I have been looking at working with blocks that are sized to 
the RAM available for processing.
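Reading the backward-pass records in blocks sized to available RAM could look like the sketch below. It assumes the forward pass wrote fixed-size binary records (a hypothetical "<11d" layout); each block is read with one seek, unpacked in file order with struct.Struct.iter_unpack (Python 3.4+), and its records are yielded in reverse. read_backward_blocks and records_per_block are illustrative names; the block size is the knob to tune to the memory you have.

```python
import os
import struct

# Hypothetical record layout, matching whatever the forward pass wrote.
RECORD = struct.Struct("<11d")


def read_backward_blocks(path, records_per_block=65536):
    """Yield fixed-size records from last written to first written.

    Reads records_per_block records per seek, so memory use is bounded by
    records_per_block * RECORD.size bytes (about 5.8 MB with the defaults).
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end % RECORD.size:
            raise ValueError("file size is not a multiple of the record size")
        block_bytes = records_per_block * RECORD.size
        while end > 0:
            # start stays a multiple of RECORD.size because end and block_bytes are.
            start = max(0, end - block_bytes)
            f.seek(start)
            chunk = f.read(end - start)
            # Unpack in file order, then hand the records out last-first.
            for rec in reversed(list(RECORD.iter_unpack(chunk))):
                yield rec
            end = start
```

Larger blocks mean fewer seeks; as long as one block fits comfortably in RAM, the rest of the file never needs to be loaded.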
>
> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory.
Thanks for your insights :-)

#32014

From	David Hutto <dwightdhutto@gmail.com>
Date	2012-10-24 03:26 -0400
Message-ID	<mailman.2737.1351063592.27098.python-list@python.org>
In reply to	#31967

On Wed, Oct 24, 2012 at 3:17 AM, Virgil Stokes <vs@it.uu.se> wrote:
> On 24-Oct-2012 00:57, Demian Brecht wrote:
>>>
>>> This is a classic example of why the old external processing algorithms
>>> of the 1960s and 70s will never be obsolete. No matter how much memory
>>> you have, there will always be times when you want to process more data
>>> than you can fit into memory.
>>
>>
>> But surely nobody will *ever* need more than 640k…
>>
>> Right?
>>
>> Demian Brecht
>> @demianbrecht
>> http://demianbrecht.github.com
> Yes, I can still remember such quotes --- thanks for jogging my memory,
> Demian :-)


This is only on equipment designed by others; otherwise, you could
engineer the hardware yourself to perform just certain functions for
you (RISC), and pass that back to the CISC (from a PCB design).


-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com

#32042

From	Grant Edwards <invalid@invalid.invalid>
Date	2012-10-24 13:56 +0000
Message-ID	<k68s38$lqr$1@reader1.panix.com>
In reply to	#31967

On 2012-10-23, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> I would be very surprised if the poster were able to fit 100
> gigabytes of data into even a single list comprehension, let alone
> two.
>
> This is a classic example of why the old external processing
> algorithms of the 1960s and 70s will never be obsolete. No matter how
> much memory you have, there will always be times when you want to
> process more data than you can fit into memory.

Too true.  One of the projects I did in grad school about 20 years ago
was a plugin for some fancy data visualization software (I think it
was DX: http://www.research.ibm.com/dx/). My plugin would subsample
"on the fly" a selected section of a huge 2D array of data in a file.
IBM and SGI had all sorts of widgets you could use to sample,
transform and visualize data, but they all assumed that the input data
would fit into virtual memory.
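Subsampling a region of an on-disk 2D array "on the fly" can be sketched with plain seeks, assuming the array is stored row-major as little-endian float64 with a known column count (the actual DX file format isn't described here). Only the needed slice of each selected row is read, so the whole array never has to fit in memory. subsample and its parameters are hypothetical names; numpy.memmap offers the same idea with array slicing, if NumPy is available.

```python
import struct

# Hypothetical cell format: little-endian float64, stored row-major.
ITEM = struct.Struct("<d")


def subsample(path, n_cols, row_range, col_range):
    """Return a strided sub-block of a row-major 2D array stored in a file.

    row_range and col_range are range objects, e.g. range(0, 20000, 100).
    For each selected row, only the bytes between the first and last selected
    column are read, so memory use is one row segment plus the result.
    """
    if col_range.step < 0:
        raise ValueError("col_range must have a positive step")
    if len(col_range) == 0:
        return [[] for _ in row_range]
    first, last = col_range[0], col_range[-1]
    span = last - first + 1
    result = []
    with open(path, "rb") as f:
        for r in row_range:
            # Byte offset of cell (r, first) in a row-major layout.
            f.seek((r * n_cols + first) * ITEM.size)
            raw = f.read(span * ITEM.size)
            result.append(
                [ITEM.unpack_from(raw, (c - first) * ITEM.size)[0] for c in col_range]
            )
    return result
```

For a 6x5 grid whose cell (r, c) holds r*10 + c, subsample(path, 5, range(0, 6, 2), range(1, 5, 2)) picks rows 0, 2, 4 and columns 1, 3.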

-- 
Grant Edwards               grant.b.edwards        Yow! I Know A Joke!!
                                  at               
                              gmail.com
