Groups > comp.lang.python > #11848 > unrolled thread

Replacement for the shelve module?

Started by: Forafo San <ppv.grps@gmail.com>
First post: 2011-08-19 08:31 -0700
Last post: 2011-08-21 01:54 +0100
Articles: 11 (8 participants)


Contents

  Replacement for the shelve module? Forafo San <ppv.grps@gmail.com> - 2011-08-19 08:31 -0700
    Re: Replacement for the shelve module? Ken Watford <kwatford@gmail.com> - 2011-08-19 11:49 -0400
    Re: Replacement for the shelve module? Thomas Jollans <t@jollybox.de> - 2011-08-19 17:54 +0200
      Re: Replacement for the shelve module? Forafo San <ppv.grps@gmail.com> - 2011-08-19 09:21 -0700
    Re: Replacement for the shelve module? Miki Tebeka <miki.tebeka@gmail.com> - 2011-08-19 10:15 -0700
    Re: Replacement for the shelve module? Robert Kern <robert.kern@gmail.com> - 2011-08-19 12:45 -0500
    Re: Replacement for the shelve module? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-08-20 06:36 +1000
      Re: Replacement for the shelve module? Robert Kern <robert.kern@gmail.com> - 2011-08-19 17:24 -0500
        Re: Replacement for the shelve module? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-08-20 11:38 +1000
          Re: Replacement for the shelve module? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-08-21 12:37 +1200
            Re: Replacement for the shelve module? Chris Angelico <rosuav@gmail.com> - 2011-08-21 01:54 +0100

#11848 — Replacement for the shelve module?

From: Forafo San <ppv.grps@gmail.com>
Date: 2011-08-19 08:31 -0700
Subject: Replacement for the shelve module?
Message-ID: <1e35ff5e-785e-41db-a50f-976e6ef60692@h9g2000vbr.googlegroups.com>

Folks,
What might be a good replacement for the shelve module, one that
can handle a few gigs of data? I'm doing some calculations on daily
stock prices and the result is a nested list like:

[[date_1, floating result 1],
 [date_2, floating result 2],
...
 [date_n, floating result n]]

However, there are about 5,000 lists like that, one for each stock
symbol. Using the shelve module I could easily save them to a file
(myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
data. But shelve is deprecated AND when a lot of data is written
shelve was acting weird (refusing to write, filesizes reported with an
"ls" did not make sense, etc.).

Thanks in advance for your suggestions.



#11849

From: Ken Watford <kwatford@gmail.com>
Date: 2011-08-19 11:49 -0400
Message-ID: <mailman.223.1313768999.27778.python-list@python.org>
In reply to: #11848

On Fri, Aug 19, 2011 at 11:31 AM, Forafo San <ppv.grps@gmail.com> wrote:
> Folks,
> What might be a good replacement for the shelve module, but one that
> can handle a few gigs of data. I'm doing some calculations on daily
> stock prices and the result is a nested list like:

For what you're doing, I would give PyTables a try.



#11852

From: Thomas Jollans <t@jollybox.de>
Date: 2011-08-19 17:54 +0200
Message-ID: <mailman.225.1313769596.27778.python-list@python.org>
In reply to: #11848

On 19/08/11 17:31, Forafo San wrote:
> Folks,
> What might be a good replacement for the shelve module, but one that
> can handle a few gigs of data. I'm doing some calculations on daily
> stock prices and the result is a nested list like:
> 
> [[date_1, floating result 1],
>  [date_2, floating result 2],
> ...
>  [date_n, floating result n]]
> 
> However, there are about 5,000 lists like that, one for each stock
> symbol. Using the shelve module I could easily save them to a file
> (myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
> data. But shelve is deprecated AND when a lot of data is written
> shelve was acting weird (refusing to write, filesizes reported with an
> "ls" did not make sense, etc.).
> 
> Thanks in advance for your suggestions.

Firstly, since when is shelve deprecated? Shouldn't there be a
deprecation warning on http://docs.python.org/dev/library/shelve.html ?

If you want to keep your current approach of having an object containing
all the data for each symbol, you will have to think about how to
serialise the data, as well as how to store the documents/objects
individually. For the serialisation, you can use pickle (as shelve does)
or JSON (probably better because it's easier to edit directly, and
therefore easier to debug).
To store these documents, you could use a huge pickle'd Python
dictionary (bad idea), a UNIX database (dbm module, anydbm in Python2;
this is what shelve uses), or simply the file system: one file per
serialised object.
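
A minimal sketch of the "one file per serialised object" option with JSON, under stated assumptions: the helper names `save_series` and `load_series`, the temporary directory, and the sample rows are all hypothetical, and dates are stored as ISO strings because JSON has no date type.

```python
import json
import tempfile
from pathlib import Path


def save_series(directory, symbol, rows):
    """Write one symbol's [date, value] rows to <directory>/<symbol>.json."""
    path = Path(directory) / f"{symbol}.json"
    with path.open("w", encoding="utf-8") as fh:
        json.dump(rows, fh)


def load_series(directory, symbol):
    """Read one symbol's rows back as a list of [date, value] lists."""
    path = Path(directory) / f"{symbol}.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical sample data; a real run would loop over all symbols.
data_dir = tempfile.mkdtemp()
save_series(data_dir, "symbol_1", [["2011-08-18", 1.5], ["2011-08-19", 1.75]])
restored = load_series(data_dir, "symbol_1")
print(restored)
```

Each file can be opened in a text editor, which is the debugging advantage mentioned above.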

Looking at your use case, however, I think what you really should use is
a SQL database. SQLite is part of Python and will do the job nicely.
Just use a single table with three columns: symbol, date, value.
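
The single-table layout could look roughly like the sketch below, which uses the standard-library `sqlite3` module. The table name `results`, the helper `load_symbol`, the in-memory database, and the sample rows are assumptions made for illustration; dates are stored as ISO text so that sorting by the column gives chronological order.

```python
import sqlite3

# Hypothetical sample data: symbol -> list of (ISO date, result) pairs.
results = {
    "AAA": [("2011-08-17", 1.25), ("2011-08-18", 1.5)],
    "BBB": [("2011-08-18", -0.75)],
}

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
conn.execute(
    "CREATE TABLE IF NOT EXISTS results ("
    " symbol TEXT NOT NULL,"
    " date TEXT NOT NULL,"
    " value REAL NOT NULL,"
    " PRIMARY KEY (symbol, date))"
)

# Flatten the per-symbol lists into (symbol, date, value) rows.
rows = [(sym, d, v) for sym, series in results.items() for d, v in series]
with conn:  # commits on success, rolls back on an exception
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)


def load_symbol(conn, symbol):
    """Return one symbol's [date, value] pairs, oldest first."""
    cur = conn.execute(
        "SELECT date, value FROM results WHERE symbol = ? ORDER BY date",
        (symbol,),
    )
    return [list(row) for row in cur]


aaa = load_symbol(conn, "AAA")
print(aaa)
```

The composite primary key on (symbol, date) also gives SQLite an index for per-symbol lookups, so no separate index is created in this sketch.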

Thomas



#11855

From: Forafo San <ppv.grps@gmail.com>
Date: 2011-08-19 09:21 -0700
Message-ID: <c9f9f26c-ec98-4461-b9ef-becb855bfc69@z7g2000vbp.googlegroups.com>
In reply to: #11852

On Aug 19, 11:54 am, Thomas Jollans <t...@jollybox.de> wrote:
> On 19/08/11 17:31, Forafo San wrote:
>
> > Folks,
> > What might be a good replacement for the shelve module, but one that
> > can handle a few gigs of data. I'm doing some calculations on daily
> > stock prices and the result is a nested list like:
>
> > [[date_1, floating result 1],
> >  [date_2, floating result 2],
> > ...
> >  [date_n, floating result n]]
>
> > However, there are about 5,000 lists like that, one for each stock
> > symbol. Using the shelve module I could easily save them to a file
> > (myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
> > data. But shelve is deprecated AND when a lot of data is written
> > shelve was acting weird (refusing to write, filesizes reported with an
> > "ls" did not make sense, etc.).
>
> > Thanks in advance for your suggestions.
>
> Firstly, since when is shelve deprecated? Shouldn't there be a
> deprecation warning on http://docs.python.org/dev/library/shelve.html ?
>
> If you want to keep your current approach of having an object containing
> all the data for each symbol, you will have to think about how to
> serialise the data, as well as how to store the documents/objects
> individually. For the serialisation, you can use pickle (as shelve does)
> or JSON (probably better because it's easier to edit directly, and
> therefore easier to debug).
> To store these documents, you could use a huge pickle'd Python
> dictionary (bad idea), a UNIX database (dbm module, anydbm in Python2;
> this is what shelve uses), or simply the file system: one file per
> serialised object.
>
> Looking at your use case, however, I think what you really should use is
> a SQL database. SQLite is part of Python and will do the job nicely.
> Just use a single table with three columns: symbol, date, value.
>
> Thomas

Sorry. There is no indication that shelve is deprecated. I was using
it on a FreeBSD system, and it turns out that the bsddb module is
deprecated; I confused it with the shelve module.

Thanks Ken and Thomas for your suggestions -- I will play around with
both and pick one.



#11859

From: Miki Tebeka <miki.tebeka@gmail.com>
Date: 2011-08-19 10:15 -0700
Message-ID: <d339297b-f687-4756-8ba9-0d31d629f9dc@glegroupsg2000goo.googlegroups.com>
In reply to: #11848

You might check one of the many binary encoders (like Avro, Thrift ...).
The other option is to use a database; sqlite3 is pretty fast (if your schema is fixed). Otherwise you can look at some NoSQL ones (like MongoDB).



#11861

From: Robert Kern <robert.kern@gmail.com>
Date: 2011-08-19 12:45 -0500
Message-ID: <mailman.228.1313775949.27778.python-list@python.org>
In reply to: #11848

On 8/19/11 10:49 AM, Ken Watford wrote:
> On Fri, Aug 19, 2011 at 11:31 AM, Forafo San<ppv.grps@gmail.com>  wrote:
>> Folks,
>> What might be a good replacement for the shelve module, but one that
>> can handle a few gigs of data. I'm doing some calculations on daily
>> stock prices and the result is a nested list like:
>
> For what you're doing, I would give PyTables a try.

For a few gigs of stock price data, this is what I use. Much better than SQLite 
for that amount of data.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco



#11878

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-08-20 06:36 +1000
Message-ID: <4e4ec962$0$29986$c3e8da3$5496439d@news.astraweb.com>
In reply to: #11848

Forafo San wrote:

> Folks,
> What might be a good replacement for the shelve module, but one that
> can handle a few gigs of data. I'm doing some calculations on daily
> stock prices and the result is a nested list like:
> 
> [[date_1, floating result 1],
>  [date_2, floating result 2],
> ...
>  [date_n, floating result n]]
> 
> However, there are about 5,000 lists like that, one for each stock
> symbol. 


You might save some memory by using tuples rather than lists:

>>> import sys
>>> sys.getsizeof(["01/01/2000", 123.456])  # On a 32-bit system.
40
>>> sys.getsizeof(("01/01/2000", 123.456))
32
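
The same comparison can be run as a script on any build. This is a sketch only: the variable names are hypothetical, the absolute numbers depend on the Python version and platform, and `sys.getsizeof` is shallow, so it measures the container and not the string and float it refers to.

```python
import sys

row_as_list = ["01/01/2000", 123.456]
row_as_tuple = ("01/01/2000", 123.456)

# Shallow sizes: only the list/tuple object itself is counted.
list_size = sys.getsizeof(row_as_list)
tuple_size = sys.getsizeof(row_as_tuple)

print("list :", list_size, "bytes")
print("tuple:", tuple_size, "bytes")
print("saved per row:", list_size - tuple_size, "bytes")
```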


By the way, you know that you should never, ever use floats for currency,
right? 

http://vladzloteanu.wordpress.com/2010/01/11/why-you-shouldnt-use-float-for-currency-floating-point-issues-explained-for-ruby-and-ror/
http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency


> Using the shelve module I could easily save them to a file 
> (myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
> data. But shelve is deprecated 

It certainly is not.

http://docs.python.org/library/shelve.html
http://docs.python.org/py3k/library/shelve.html

Not a word about it being deprecated in either Python 2.x or 3.x.


> AND when a lot of data is written 
> shelve was acting weird (refusing to write, filesizes reported with an
> "ls" did not make sense, etc.).

I would like to see this replicated. If it is true, that's a bug in shelve,
but I expect you're probably doing something wrong.
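
A replication attempt could start from a minimal round trip like the sketch below. The path, the sample list, and the use of a temporary directory are assumptions for illustration; it is not the original poster's code. Note that the underlying dbm backend may append its own extension or create more than one file, so the names and sizes shown by "ls" can differ from the path passed to `shelve.open`.

```python
import os
import shelve
import tempfile

# Hypothetical stand-in for one symbol's results.
symbol_1_list = [["2011-08-18", 1.5], ["2011-08-19", 1.75]]

path = os.path.join(tempfile.mkdtemp(), "myshelvefile")

# Write, then close the shelf so the data is flushed to disk.
with shelve.open(path) as myshelvefile:
    myshelvefile["symbol_1"] = symbol_1_list

# Reopen and read the value back.
with shelve.open(path) as myshelvefile:
    restored = myshelvefile["symbol_1"]
    keys = sorted(myshelvefile.keys())

print(keys, restored)
print(os.listdir(os.path.dirname(path)))  # actual file names chosen by the backend
```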



-- 
Steven



#11890

From: Robert Kern <robert.kern@gmail.com>
Date: 2011-08-19 17:24 -0500
Message-ID: <mailman.247.1313792673.27778.python-list@python.org>
In reply to: #11878

On 8/19/11 3:36 PM, Steven D'Aprano wrote:

> By the way, you know that you should never, ever use floats for currency,
> right?

That's just incorrect. You shouldn't use (binary) floats for many *accounting* 
purposes, but for many financial/econometric analyses, floats are de rigueur and 
work much better than decimals (either floating or fixed point). If you are 
collecting gigs of stock prices, you are much more likely to be doing the latter 
than the former.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco



#11893

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-08-20 11:38 +1000
Message-ID: <4e4f1004$0$29966$c3e8da3$5496439d@news.astraweb.com>
In reply to: #11890

Robert Kern wrote:

> On 8/19/11 3:36 PM, Steven D'Aprano wrote:
> 
>> By the way, you know that you should never, ever use floats for currency,
>> right?
> 
> That's just incorrect. You shouldn't use (binary) floats for many
> *accounting* purposes, but for many financial/econometric analyses, floats
> are de rigueur and work much better than decimals (either floating or fixed
> point). If you are collecting gigs of stock prices, you are much more
> likely to be doing the latter than the former.


That makes sense, and I stand corrected.



-- 
Steven



#11934

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2011-08-21 12:37 +1200
Message-ID: <9bb2ahFgjbU1@mid.individual.net>
In reply to: #11893

> Robert Kern wrote:
> 
>>That's just incorrect. You shouldn't use (binary) floats for many
>>*accounting* purposes, but for many financial/econometric analyses, floats
>>are de rigueur and work much better than decimals

There's a certain accounting package I work with that *does*
use floats -- binary ones -- for accounting purposes, and
somehow manages to get away with it. Not something I would
recommend trying at home, though.

-- 
Greg



#11935

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-08-21 01:54 +0100
Message-ID: <mailman.274.1313888073.27778.python-list@python.org>
In reply to: #11934

On Sun, Aug 21, 2011 at 1:37 AM, Gregory Ewing
<greg.ewing@canterbury.ac.nz> wrote:
> There's a certain accounting package I work with that *does*
> use floats -- binary ones -- for accounting purposes, and
> somehow manages to get away with it. Not something I would
> recommend trying at home, though.
>

Probably quite a few, actually. It's not a very visible problem so
long as you always have plenty of "spare precision", and you round
everything off to two decimals (or however many for your currency).
Eventually you'll start seeing weird results that are a cent off, but
you won't notice them often. And hey. You store $1.23 as 1.23, and it
just works! It must be the right thing to do!
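
A small sketch of the effect described above, using only the standard library. The ten-item sum is an arbitrary illustration, not data from this thread: the binary-float total misses 1.0 by a tiny amount, rounding to two decimals hides the difference, and `decimal.Decimal` built from strings stays exact.

```python
from decimal import Decimal

# Ten items at 0.10 each, once as binary floats and once as decimals.
float_total = sum([0.10] * 10)
decimal_total = sum([Decimal("0.10")] * 10)

print(repr(float_total))             # not exactly 1.0
print(float_total == 1.0)            # False
print(round(float_total, 2) == 1.0)  # True: rounding masks the error
print(decimal_total == Decimal("1.00"))  # True
```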

Me, I store dollars-and-cents currency in cents. Always. But that's
because I never need fractional cents. I'm not sure what the best way
to handle fractional cents is, but I'm fairly confident that this
isn't it:

http://thedailywtf.com/Articles/Price-in-Nonsense.aspx
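
Storing whole cents as integers might be sketched as follows. The helpers `to_cents` and `format_cents` and the sample prices are hypothetical, and the sketch assumes non-negative amounts with at most two decimal places, so it deliberately rejects fractional cents instead of trying to handle them.

```python
def to_cents(amount):
    """Convert a non-negative 'dollars.cents' string such as '1.23' to integer cents."""
    dollars, _, cents = amount.partition(".")
    if len(cents) > 2:
        raise ValueError("fractional cents are not supported: %r" % amount)
    # Pad '5' -> '50' and '' -> '00' so '1.5' and '1' parse as expected.
    return int(dollars) * 100 + int(cents.ljust(2, "0"))


def format_cents(cents):
    """Render integer cents as a dollars-and-cents string."""
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"


prices = ["1.23", "0.10", "19.99"]
total = sum(to_cents(p) for p in prices)  # exact integer arithmetic
print(total, format_cents(total))
```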

ChrisA


