From: David Hutto
To: emile
Cc: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Fast forward-backward (write-read)
Date: Tue, 23 Oct 2012 20:01:36 -0400
References: <5086AA35.4000509@it.uu.se>

On Tue, Oct 23, 2012 at 7:35 PM, emile wrote:
> On 10/23/2012 04:19 PM, David Hutto wrote:
>>
>> Whether this is fast enough, or not, I don't know:
>
>
> well, the OP's original post started with
> "I am working with some rather large data files (>100GB)..."

Well, is this a dedicated system, and one that they have the budget to upgrade?

Data files have some sort of parsing, unless it's one huge dict or list, so there has to be an average size per parsed record. So a big-O style estimate should begin to refine from a sample, without the full file.

>
>
>> filename = "data_file.txt"
>> f = open(filename, 'r')
>> forward = [line.rstrip('\n') for line in f.readlines()]
>
>
> f.readlines() will be big(!) and have overhead... and forward results in
> something again as big.
>

Not if an average can be taken, and then refined as the actual gigs are being iterated through.

>
>> backward = [line.rstrip('\n') for line in reversed(forward)]
>
>
> and defining backward looks to me to require space to build backward and
> hold reversed(forward)
>
> So, let's see, at that point in time (building backward) you've got
> probably somewhere close to 400-500GB in memory.
>
> My guess -- probably not so fast. Thrashing is sure to be a factor on all
> but machines I'll never have a chance to work on.

But does the OP have access? They never stated their hardware or their upgrade budget.

>
>
>> f.close()
>> print forward, "\n\n", "********************\n\n", backward, "\n"
>
>
>
> It's good to retain context.

Trying to practice good form ;).

-- 
Best Regards,
David Hutto
CEO: http://www.hitwebdevelopment.com
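
P.S. A rough sketch of the "take an average and refine it while iterating" idea, in case it helps. It assumes Python 3 and a newline-delimited file. The name iter_with_estimate is made up for illustration and is not part of any library. It reads one line at a time, so memory use stays flat, and it updates an estimate of the total line count from the average line length seen so far:

```python
import os


def iter_with_estimate(path):
    """Yield (line, estimated_total_lines) pairs while streaming the file.

    Hypothetical helper, for illustration only. The estimate is
    file_size / average_line_length_so_far, so it starts rough and
    becomes exact once the last line has been read.
    """
    total_bytes = os.path.getsize(path)
    seen_bytes = 0
    # Binary mode so that len(line) counts bytes, matching getsize().
    with open(path, 'rb') as f:
        for n, line in enumerate(f, 1):
            seen_bytes += len(line)
            # total_bytes / (seen_bytes / n), rearranged.
            yield line, total_bytes * n / float(seen_bytes)
```

Usage would be something like `for line, est in iter_with_estimate("data_file.txt"): ...`. This only covers the forward pass; a backward pass over a file that size would still need something like seeking from the end in blocks rather than building a reversed list.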