Re: Fast file data retrieval?

Newsgroups comp.lang.python
Date 2012-03-12 20:38 -0700
References <4F5E50F6.9070309@it.uu.se> <mailman.592.1331584145.3037.python-list@python.org>
Subject Re: Fast file data retrieval?
From Jon Clements <joncle@googlemail.com>
Message-ID <mailman.599.1331609909.3037.python-list@python.org> (permalink)

On Monday, 12 March 2012 20:31:35 UTC, MRAB  wrote:
> On 12/03/2012 19:39, Virgil Stokes wrote:
> > I have a rather large ASCII file that is structured as follows
> >
> > header line
> > 9 nonblank lines with alphanumeric data
> > header line
> > 9 nonblank lines with alphanumeric data
> > ...
> > ...
> > ...
> > header line
> > 9 nonblank lines with alphanumeric data
> > EOF
> >
> > where, a data set contains 10 lines (header + 9 nonblank) and there can
> > be several thousand
> > data sets in a single file. In addition, *each header has a unique ID
> > code*.
> >
> > Is there a fast method for the retrieval of a data set from this large
> > file given its ID code?
> >
> Probably the best solution is to put it into a database. Have a look at
> the sqlite3 module.
> 
> Alternatively, you could scan the file, recording the ID and the file
> offset in a dict so that, given an ID, you can seek directly to that
> file position.
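
A minimal sketch of the scan-and-seek idea quoted above. It assumes every data set is exactly 10 lines and that the ID is the first whitespace-separated token of the header line; the original post does not give the header layout, so adjust that part. The names build_index and read_set are made up for illustration. The file is read in binary mode so that tell() and seek() work on plain byte offsets.

```python
import io

LINES_PER_SET = 10  # header + 9 data lines, as described in the question


def build_index(f):
    """Scan a binary file object once and map each ID to its header's byte offset."""
    index = {}
    while True:
        offset = f.tell()
        header = f.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between data sets
        # Assumption: the ID is the first token of the header line.
        set_id = header.split()[0].decode("ascii")
        index[set_id] = offset
        for _ in range(LINES_PER_SET - 1):
            f.readline()  # skip the 9 data lines
    return index


def read_set(f, index, set_id):
    """Seek straight to a data set and return its 10 lines as strings."""
    f.seek(index[set_id])
    return [f.readline().decode("ascii").rstrip("\r\n") for _ in range(LINES_PER_SET)]


# Small in-memory stand-in for the real file.
sample = io.BytesIO(
    "".join(
        "ID%d header\n" % i + "".join("row %d.%d\n" % (i, j) for j in range(9))
        for i in range(3)
    ).encode("ascii")
)
index = build_index(sample)
record = read_set(sample, index, "ID1")
print(record[0])
```

With a real file, open it with open(path, "rb") and keep it open between lookups. The index costs one full pass over the file, after which each retrieval is a single seek plus ten readline calls.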

I would have a look at either bsddb, Tokyo (or Kyoto) Cabinet or hamsterdb. If it's really going to get large and needs a full-blown server, maybe MongoDB/redis/hadoop...
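
A rough sketch of the key-value store route. It uses the standard library's dbm module as a stand-in, since bsddb is not part of the Python 3 standard library; the other stores mentioned would need their own bindings. As an assumption, the ID is taken to be the first token of each header line, and load_sets and fetch_set are hypothetical helper names.

```python
import dbm
import os
import tempfile

LINES_PER_SET = 10  # header + 9 data lines


def load_sets(text_path, db_path):
    """Store each 10-line data set in a dbm file, keyed by its ID."""
    with open(text_path, "r", encoding="ascii") as src, dbm.open(db_path, "c") as db:
        while True:
            block = [src.readline() for _ in range(LINES_PER_SET)]
            if not block[0].strip():
                break  # end of file (or a trailing blank line)
            # Assumption: the ID is the first token of the header line.
            set_id = block[0].split()[0]
            db[set_id.encode("ascii")] = "".join(block).encode("ascii")


def fetch_set(db_path, set_id):
    """Look up one data set by ID and return its lines."""
    with dbm.open(db_path, "r") as db:
        return db[set_id.encode("ascii")].decode("ascii").splitlines()


# Demo on a small temporary file standing in for the real data.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "sets.txt")
    db_path = os.path.join(tmp, "sets")
    with open(text_path, "w", encoding="ascii") as out:
        for i in range(3):
            out.write("ID%d header\n" % i)
            out.writelines("row %d.%d\n" % (i, j) for j in range(9))
    load_sets(text_path, db_path)
    record = fetch_set(db_path, "ID1")
print(record[0])
```

Unlike an in-memory dict of offsets, the store persists between runs, so the text file only has to be parsed once.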
