MIME-Version: 1.0
In-Reply-To: <4F5E50F6.9070309@it.uu.se>
References: <4F5E50F6.9070309@it.uu.se>
Date: Mon, 12 Mar 2012 20:31:26 +0000
Subject: Re: Fast file data retrieval?
From: Arnaud Delobelle <arnodel@gmail.com>
To: Virgil Stokes <vs@it.uu.se>
Content-Type: text/plain; charset=UTF-8
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.593.1331584290.3037.python-list@python.org>

On 12 March 2012 19:39, Virgil Stokes <vs@it.uu.se> wrote:
> I have a rather large ASCII file that is structured as follows
>
> header line
> 9 nonblank lines with alphanumeric data
> header line
> 9 nonblank lines with alphanumeric data
> ...
> ...
> ...
> header line
> 9 nonblank lines with alphanumeric data
> EOF
>
> where, a data set contains 10 lines (header + 9 nonblank) and there can be
> several thousand
> data sets in a single file. In addition, each header has a unique ID code.
>
> Is there a fast method for the retrieval of a data set from this large file
> given its ID code?

It depends.  If it's a long-running application, you could load all
the data into a dictionary at startup time (several thousand data
sets doesn't sound like that much).  Another option would be to put
all this into a dbm database file
(http://docs.python.org/library/dbm.html), which would be very easy
to do.
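
A rough sketch of both options, not a definitive implementation.  I
am assuming Python 3, that the ID is the first whitespace-separated
token of each header line (the real header layout isn't given), and
that every data set is exactly 10 lines.  The names parse_id,
iter_datasets and make_sample are made up for illustration, and the
sample data stands in for the real file.

```python
import dbm
import io
import os
import tempfile
from itertools import islice

LINES_PER_SET = 10  # header + 9 data lines, as described in the question


def parse_id(header):
    # Assumption: the ID is the first whitespace-separated token.
    return header.split()[0]


def iter_datasets(f):
    """Yield (id, text) for each 10-line data set in an open text file."""
    while True:
        block = list(islice(f, LINES_PER_SET))
        if not block:
            return
        yield parse_id(block[0]), "".join(block)


def make_sample():
    # Hypothetical stand-in for the real file: 3 data sets.
    lines = []
    for i in range(3):
        lines.append("ID%d header\n" % i)
        lines.extend("data %d.%d\n" % (i, j) for j in range(9))
    return io.StringIO("".join(lines))


# Option 1: keep everything in a dictionary.
datasets = dict(iter_datasets(make_sample()))

# Option 2: store the data sets in a dbm file, built once and reused.
db_path = os.path.join(tempfile.mkdtemp(), "datasets")
with dbm.open(db_path, "c") as db:
    for key, text in iter_datasets(make_sample()):
        db[key] = text

with dbm.open(db_path, "r") as db:
    from_db = db["ID1"].decode("ascii")  # dbm returns values as bytes
```

With the dictionary, a lookup is just datasets["ID1"].  The dbm file
has to be built only once; after that the application can open it
read-only without parsing the text file again.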

Or you could roll your own solution: scan the file once and build a
dictionary mapping keys to file offsets.  Then, when a dataset is
requested, you can seek directly to the correct position.
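
The offset approach could look something like this.  Again only a
sketch under the same assumptions: the ID is the first token of the
header, and each data set is exactly 10 lines.  The file is opened
in binary mode so that tell() and seek() work on plain byte offsets.
build_index and read_dataset are illustrative names, and the temp
file stands in for the real one.

```python
import os
import tempfile

LINES_PER_SET = 10  # header + 9 data lines


def parse_id(header):
    # Assumption: the ID is the first whitespace-separated token.
    return header.split()[0].decode("ascii")


def build_index(f):
    """Scan a file opened in binary mode; map each ID to its byte offset."""
    index = {}
    while True:
        offset = f.tell()
        header = f.readline()
        if not header:
            break
        index[parse_id(header)] = offset
        # Skip the 9 data lines that follow the header.
        for _ in range(LINES_PER_SET - 1):
            f.readline()
    return index


def read_dataset(f, index, key):
    """Seek to the data set with the given ID and return its 10 lines."""
    f.seek(index[key])
    return [f.readline().decode("ascii") for _ in range(LINES_PER_SET)]


# Hypothetical sample file with 3 data sets.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "wb") as out:
    for i in range(3):
        out.write(b"ID%d header\n" % i)
        for j in range(9):
            out.write(b"data %d.%d\n" % (i, j))

with open(path, "rb") as f:
    index = build_index(f)
    record = read_dataset(f, index, "ID2")
```

Only the index lives in memory, so this scales to files much larger
than the dictionary approach would comfortably handle.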
-- 
Arnaud