Re: fastest data / database format for reading large files

Started by	Chris Rebert <clp2@rebertia.com>
First post	2012-10-28 02:26 -0700
Last post	2012-10-28 02:26 -0700
Articles	1 — 1 participant

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

#32308 — Re: fastest data / database format for reading large files

From	Chris Rebert <clp2@rebertia.com>
Date	2012-10-28 02:26 -0700
Subject	Re: fastest data / database format for reading large files
Message-ID	<mailman.2965.1351416384.27098.python-list@python.org>

On Tue, Oct 16, 2012 at 11:35 AM, Pradipto Banerjee
<pradipto.banerjee@adainvestments.com> wrote:
> I am working with a series of large files, 4 to 10 GB each, and may need to read these files repeatedly. Which data format (e.g. pickle, JSON, CSV) is considered the fastest for reading via Python?

Pickle /ought/ to be fastest, since it's native to Python and binary
(unless you use protocol 0, the oldest version, which is ASCII). Be
sure to pass HIGHEST_PROTOCOL when dumping, and use cPickle (the C
implementation) rather than the pure-Python pickle module.
http://docs.python.org/2/library/pickle.html#module-cPickle
http://docs.python.org/2/library/pickle.html#pickle.HIGHEST_PROTOCOL
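
Something along these lines is what I have in mind; treat it as a rough
sketch, not a benchmark. The helper names (save_records, load_records),
the sample records, and the temp-file path are all made up for
illustration. The import fallback is only there so the snippet also runs
on Python 3, where cPickle no longer exists as a separate module.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically


def save_records(records, path):
    """Write records in binary mode using the newest pickle protocol."""
    with open(path, "wb") as f:
        pickle.dump(records, f, pickle.HIGHEST_PROTOCOL)


def load_records(path):
    """Read back whatever save_records() wrote."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for the real data set.
records = [{"id": i, "value": i * 0.5} for i in range(1000)]

path = os.path.join(tempfile.mkdtemp(), "records.pkl")
save_records(records, path)
loaded = load_records(path)
```

Note that pickle.load() rebuilds the whole object in memory, so for
4 to 10 GB inputs you would probably need to pickle the data in chunks;
how to split it depends on the data, which I can't tell from here.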

You might consider using SQLite (or some other database) if you will
be doing queries over the data that would be amenable to SQL or
similar.
http://docs.python.org/2/library/sqlite3.html
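
A minimal sketch of the SQLite route, assuming the data is tabular. The
table name (trades), its columns, and the sample rows are invented for
the example; an in-memory database is used only to keep it
self-contained, and you would pass a file path to connect() for real
data.

```python
import sqlite3

# ":memory:" keeps the example self-contained; use a filename in practice.
conn = sqlite3.connect(":memory:")

# Hypothetical schema; the real columns depend on the data.
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL)")

rows = [("AAA", 10.0), ("BBB", 20.5), ("AAA", 12.0)]
conn.executemany("INSERT INTO trades VALUES (?, ?)", rows)
conn.commit()

# An index on the column you filter or group by speeds up repeated reads.
conn.execute("CREATE INDEX idx_trades_symbol ON trades (symbol)")

# Let the database do the aggregation instead of loading every row.
result = conn.execute(
    "SELECT symbol, AVG(price) FROM trades GROUP BY symbol ORDER BY symbol"
).fetchall()

conn.close()
```

The point is that you load the data once and afterwards read only the
rows each query needs, rather than re-parsing the whole file every time.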

Cheers,
Chris

P.S. The verbose disclaimer at the end of your emails is kinda annoying...
