From: David Hutto
Date: Tue, 23 Oct 2012 19:34:15 -0400
Subject: Re: Fast forward-backward (write-read)
Newsgroups: comp.lang.python

On Tue, Oct 23, 2012 at 6:53 PM, Steven D'Aprano wrote:
> On Tue, 23 Oct 2012 17:50:55 -0400, David Hutto wrote:
>
>> On Tue, Oct 23, 2012 at 10:31 AM, Virgil Stokes wrote:
>>> I am working with some rather large data files (>100GB)
> [...]
>>> Finally, to my question --- What is a fast way to write these variables
>>> to an external file and then read them in backwards?
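A minimal sketch of one way to answer the quoted question without loading the file into memory. It assumes each time step's variables can be stored as a fixed-size binary record. The three-double layout (`RECORD`) and the names `write_records` and `read_records_backward` are hypothetical. Records are written forward with the standard `struct` module, then read back from the end of the file one batch at a time, so memory use stays bounded however large the file gets.

```python
import os
import struct
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout: three little-endian doubles per time step.
# Adjust the format string to match the real variables.
RECORD = struct.Struct("<3d")


def write_records(path: str, records: Iterable[Tuple[float, float, float]]) -> None:
    """Write fixed-size binary records to `path` in forward order."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


def read_records_backward(path: str, batch: int = 4096) -> Iterator[Tuple[float, ...]]:
    """Yield records from last to first, reading `batch` records per seek."""
    size = RECORD.size
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell() // size  # number of complete records in the file
        while end > 0:
            start = max(0, end - batch)
            f.seek(start * size)
            chunk = f.read((end - start) * size)
            # The block is stored forward on disk; emit its records in reverse.
            for i in range(end - start - 1, -1, -1):
                yield RECORD.unpack_from(chunk, i * size)
            end = start
```

Because every record has the same size, the reader can seek straight to any block. Variable-length text lines would instead require scanning backward for newline characters, which is slower and more fiddly.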
>>
>> Don't forget to use timeit for an average OS utilization.
>
> Given that the data files are larger than 100 gigabytes, the time
> required to process each file is likely to be in hours, not microseconds.
> That being the case, timeit is the wrong tool for the job; it is
> optimized for timing tiny code snippets. You could use it, of course,
> but the added inconvenience doesn't gain you any added accuracy.

It depends on the end result. If the iterations each take about the same
time, you can time just a segment of them and scale the result up to the
full run. A full run might still be worth it if you have a second computer
to run the optimization on.

>
> Here's a neat context manager that makes timing long-running code simple:
>
>
> http://code.activestate.com/recipes/577896

I'll test this out for big O notation later. For the OP:

http://en.wikipedia.org/wiki/Big_O_notation

>
>
>
>> I'd suggest two list comprehensions for now, until I've reviewed it some
>> more:
>
> I would be very surprised if the poster will be able to fit 100 gigabytes
> of data into even a single list comprehension, let alone two.

Again, these can be scaled down, depending on the operations of the
function in question and the average time of the aforementioned
function(s).

>
> This is a classic example of why the old external processing algorithms
> of the 1960s and 70s will never be obsolete. No matter how much memory
> you have, there will always be times when you want to process more data
> than you can fit into memory.

This is a common misconception. You can engineer a device that
accommodates this if it's a direct experimental necessity.

-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com
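On the point about timing only a segment of the iterations and scaling up: a minimal sketch, assuming every record costs roughly the same to process. The function name `estimate_total_seconds` and its parameters are hypothetical. It times the first `sample_size` items with `time.perf_counter` rather than `timeit`, which fits a single long-running pass better, and extrapolates linearly to the full count.

```python
import time
from itertools import islice
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def estimate_total_seconds(
    process: Callable[[T], object],
    items: Iterable[T],
    sample_size: int,
    total_items: int,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time `process` on the first `sample_size` items, then extrapolate
    linearly to `total_items`.

    Assumes a roughly constant cost per item; if early records are cheaper
    or more expensive than later ones, the estimate will be biased.
    """
    start = clock()
    count = 0
    for item in islice(items, sample_size):
        process(item)
        count += 1
    elapsed = clock() - start
    if count == 0:
        raise ValueError("no items were timed")
    # Average cost per item, scaled up to the full workload.
    return elapsed / count * total_items
```

The `clock` parameter exists so a fake clock can be injected when checking the arithmetic. In real use the default `time.perf_counter` applies. A linear extrapolation like this is only as good as the constant-cost assumption, which is why a full run on a spare machine can still be worthwhile.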