Path: csiph.com!usenet.pasdenom.info!gegeweb.org!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!feeds.phibee-telecom.net!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Date: Mon, 12 Mar 2012 20:31:35 +0000
From: MRAB <python@mrabarnett.plus.com>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: Fast file data retrieval?
References: <4F5E50F6.9070309@it.uu.se>
In-Reply-To: <4F5E50F6.9070309@it.uu.se>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.592.1331584145.3037.python-list@python.org>
Lines: 28
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:21544

On 12/03/2012 19:39, Virgil Stokes wrote:
> I have a rather large ASCII file that is structured as follows
>
> header line
> 9 nonblank lines with alphanumeric data
> header line
> 9 nonblank lines with alphanumeric data
> ...
> ...
> ...
> header line
> 9 nonblank lines with alphanumeric data
> EOF
>
> where a data set contains 10 lines (header + 9 nonblank) and there can
> be several thousand data sets in a single file. In addition, *each
> header has a unique ID code*.
>
> Is there a fast method for the retrieval of a data set from this large
> file given its ID code?
>
Probably the best solution is to put it into a database. Have a look at
the sqlite3 module.
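
As a rough sketch of the database route (not a definitive implementation): the
code below assumes every data set is exactly 10 lines and that the ID is the
first whitespace-separated token of the header line. The original post does not
say where the ID sits, so header_id() is a hypothetical helper you would adapt.
The table and function names are made up for illustration.

```python
import sqlite3

LINES_PER_SET = 10  # header + 9 data lines, as described in the question


def header_id(header_line):
    # Assumption: the ID is the first whitespace-separated token of the header.
    return header_line.split()[0]


def load_into_db(data_path, db_path):
    """Read the file in 10-line chunks and store each chunk under its ID."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets (id TEXT PRIMARY KEY, body TEXT)"
    )
    with open(data_path, "r") as f:
        while True:
            chunk = [f.readline() for _ in range(LINES_PER_SET)]
            # Stop at EOF (empty string) or at trailing blank lines.
            if not chunk[0].strip():
                break
            conn.execute(
                "INSERT OR REPLACE INTO datasets VALUES (?, ?)",
                (header_id(chunk[0]), "".join(chunk)),
            )
    conn.commit()
    return conn


def fetch(conn, id_code):
    """Return the stored 10-line data set as one string, or None if absent."""
    row = conn.execute(
        "SELECT body FROM datasets WHERE id = ?", (id_code,)
    ).fetchone()
    return row[0] if row else None
```

The load is a one-off cost; after that, each lookup goes through the primary
key index instead of rescanning the text file.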

Alternatively, you could scan the file, recording the ID and the file
offset in a dict so that, given an ID, you can seek directly to that
file position.
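
A minimal sketch of that offset index, under the same assumptions as before:
exactly 10 lines per data set, and the ID taken as the first
whitespace-separated token of the header (header_id() is a hypothetical helper,
adjust it to the real header layout). The file is opened in binary mode so that
tell() and seek() work with plain byte offsets.

```python
LINES_PER_SET = 10  # header + 9 data lines, as described in the question


def header_id(header_line):
    # Assumption: the ID is the first whitespace-separated token of the header.
    return header_line.split()[0].decode("ascii")


def build_index(path):
    """Scan the file once and map each ID to the byte offset of its header."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.readline()
            # Stop at EOF (empty bytes) or at trailing blank lines.
            if not header.strip():
                break
            index[header_id(header)] = offset
            # Skip over the 9 data lines that follow the header.
            for _ in range(LINES_PER_SET - 1):
                f.readline()
    return index


def read_dataset(path, index, id_code):
    """Seek straight to the data set for id_code and return its 10 lines."""
    with open(path, "rb") as f:
        f.seek(index[id_code])
        return [f.readline().decode("ascii") for _ in range(LINES_PER_SET)]
```

Build the index once with build_index(path) and keep the dict around; each
read_dataset() call is then one seek plus ten readline() calls. An unknown ID
raises KeyError here.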