Groups > comp.lang.python > #11848 > unrolled thread
| Started by | Forafo San <ppv.grps@gmail.com> |
|---|---|
| First post | 2011-08-19 08:31 -0700 |
| Last post | 2011-08-21 01:54 +0100 |
| Articles | 11 — 8 participants |
Replacement for the shelve module? Forafo San <ppv.grps@gmail.com> - 2011-08-19 08:31 -0700
Re: Replacement for the shelve module? Ken Watford <kwatford@gmail.com> - 2011-08-19 11:49 -0400
Re: Replacement for the shelve module? Thomas Jollans <t@jollybox.de> - 2011-08-19 17:54 +0200
Re: Replacement for the shelve module? Forafo San <ppv.grps@gmail.com> - 2011-08-19 09:21 -0700
Re: Replacement for the shelve module? Miki Tebeka <miki.tebeka@gmail.com> - 2011-08-19 10:15 -0700
Re: Replacement for the shelve module? Robert Kern <robert.kern@gmail.com> - 2011-08-19 12:45 -0500
Re: Replacement for the shelve module? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-08-20 06:36 +1000
Re: Replacement for the shelve module? Robert Kern <robert.kern@gmail.com> - 2011-08-19 17:24 -0500
Re: Replacement for the shelve module? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-08-20 11:38 +1000
Re: Replacement for the shelve module? Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-08-21 12:37 +1200
Re: Replacement for the shelve module? Chris Angelico <rosuav@gmail.com> - 2011-08-21 01:54 +0100
| From | Forafo San <ppv.grps@gmail.com> |
|---|---|
| Date | 2011-08-19 08:31 -0700 |
| Subject | Replacement for the shelve module? |
| Message-ID | <1e35ff5e-785e-41db-a50f-976e6ef60692@h9g2000vbr.googlegroups.com> |
Folks,

What might be a good replacement for the shelve module, but one that can handle a few gigs of data? I'm doing some calculations on daily stock prices and the result is a nested list like:

[[date_1, floating result 1],
 [date_2, floating result 2],
 ...
 [date_n, floating result n]]

However, there are about 5,000 lists like that, one for each stock symbol. Using the shelve module I could easily save them to a file (myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the data. But shelve is deprecated AND when a lot of data is written shelve was acting weird (refusing to write, file sizes reported with an "ls" did not make sense, etc.).

Thanks in advance for your suggestions.
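For readers unfamiliar with the pattern described above, here is a minimal sketch of saving and retrieving per-symbol lists with shelve. The sample data, the symbol names, and the scratch-directory path are assumptions made for illustration; they are not taken from the poster's actual program.

```python
import os
import shelve
import tempfile

# Hypothetical sample data: one list of [date, result] pairs per stock symbol.
results = {
    "symbol_1": [["2011-08-18", 1.25], ["2011-08-19", 1.5]],
    "symbol_2": [["2011-08-18", 7.0], ["2011-08-19", 6.75]],
}

# Keep the shelf in a scratch directory; the path is only for illustration.
shelf_path = os.path.join(tempfile.mkdtemp(), "myshelvefile")

# Save: each assignment pickles the value and stores it under a string key.
with shelve.open(shelf_path) as shelf:
    for symbol, rows in results.items():
        shelf[symbol] = rows

# Retrieve: reopen the shelf and read one symbol back.
with shelve.open(shelf_path) as shelf:
    symbol_1_list = shelf["symbol_1"]

print(symbol_1_list)
```

Each value is pickled as a whole, so reading or updating one symbol loads or rewrites that symbol's entire list.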
| From | Ken Watford <kwatford@gmail.com> |
|---|---|
| Date | 2011-08-19 11:49 -0400 |
| Message-ID | <mailman.223.1313768999.27778.python-list@python.org> |
| In reply to | #11848 |
On Fri, Aug 19, 2011 at 11:31 AM, Forafo San <ppv.grps@gmail.com> wrote:
> Folks,
> What might be a good replacement for the shelve module, but one that
> can handle a few gigs of data. I'm doing some calculations on daily
> stock prices and the result is a nested list like:

For what you're doing, I would give PyTables a try.
| From | Thomas Jollans <t@jollybox.de> |
|---|---|
| Date | 2011-08-19 17:54 +0200 |
| Message-ID | <mailman.225.1313769596.27778.python-list@python.org> |
| In reply to | #11848 |
On 19/08/11 17:31, Forafo San wrote:
> Folks,
> What might be a good replacement for the shelve module, but one that
> can handle a few gigs of data. I'm doing some calculations on daily
> stock prices and the result is a nested list like:
>
> [[date_1, floating result 1],
> [date_2, floating result 2],
> ...
> [date_n, floating result n]]
>
> However, there are about 5,000 lists like that, one for each stock
> symbol. Using the shelve module I could easily save them to a file
> ( myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
> data. But shelve is deprecated AND when a lot of data is written
> shelve was acting weird (refusing to write, filesizes reported with an
> "ls" did not make sense, etc.).
>
> Thanks in advance for your suggestions.

Firstly, since when is shelve deprecated? Shouldn't there be a deprecation warning on http://docs.python.org/dev/library/shelve.html ?

If you want to keep your current approach of having an object containing all the data for each symbol, you will have to think about how to serialise the data, as well as how to store the documents/objects individually. For the serialisation, you can use pickle (as shelve does) or JSON (probably better because it's easier to edit directly, and therefore easier to debug).

To store these documents, you could use a huge pickle'd Python dictionary (bad idea), a UNIX database (dbm module, anydbm in Python 2; this is what shelve uses), or simply the file system: one file per serialised object.

Looking at your use case, however, I think what you really should use is a SQL database. SQLite is part of Python and will do the job nicely. Just use a single table with three columns: symbol, date, value.

Thomas
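The single-table layout suggested above could look roughly like the following sketch, which uses the standard-library sqlite3 module. The table name `results`, the in-memory database, the ISO date strings, and the sample rows are all assumptions for illustration; a real program would connect to a file on disk instead of ":memory:".

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# One table with the three suggested columns; the composite primary key
# (an assumption) prevents duplicate rows for the same symbol and date.
conn.execute(
    "CREATE TABLE results ("
    " symbol TEXT NOT NULL,"
    " date TEXT NOT NULL,"
    " value REAL NOT NULL,"
    " PRIMARY KEY (symbol, date))"
)

# Hypothetical sample rows: (symbol, date, value).
rows = [
    ("symbol_1", "2011-08-18", 1.25),
    ("symbol_1", "2011-08-19", 1.5),
    ("symbol_2", "2011-08-18", 7.0),
]

# Using the connection as a context manager commits the inserts on success.
with conn:
    conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Rebuild the nested-list shape from the original post for one symbol.
symbol_1_list = [
    [date, value]
    for date, value in conn.execute(
        "SELECT date, value FROM results WHERE symbol = ? ORDER BY date",
        ("symbol_1",),
    )
]
print(symbol_1_list)
conn.close()
```

Storing dates as ISO 8601 text means that sorting by the date column gives chronological order without any extra conversion.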
| From | Forafo San <ppv.grps@gmail.com> |
|---|---|
| Date | 2011-08-19 09:21 -0700 |
| Message-ID | <c9f9f26c-ec98-4461-b9ef-becb855bfc69@z7g2000vbp.googlegroups.com> |
| In reply to | #11852 |
On Aug 19, 11:54 am, Thomas Jollans <t...@jollybox.de> wrote:
> On 19/08/11 17:31, Forafo San wrote:
> > Folks,
> > What might be a good replacement for the shelve module, but one that
> > can handle a few gigs of data. I'm doing some calculations on daily
> > stock prices and the result is a nested list like:
> >
> > [[date_1, floating result 1],
> > [date_2, floating result 2],
> > ...
> > [date_n, floating result n]]
> >
> > However, there are about 5,000 lists like that, one for each stock
> > symbol. Using the shelve module I could easily save them to a file
> > ( myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
> > data. But shelve is deprecated AND when a lot of data is written
> > shelve was acting weird (refusing to write, filesizes reported with an
> > "ls" did not make sense, etc.).
> >
> > Thanks in advance for your suggestions.
>
> Firstly, since when is shelve deprecated? Shouldn't there be a
> deprecation warning on http://docs.python.org/dev/library/shelve.html ?
>
> If you want to keep your current approach of having an object containing
> all the data for each symbol, you will have to think about how to
> serialise the data, as well as how to store the documents/objects
> individually. For the serialisation, you can use pickle (as shelve does)
> or JSON (probably better because it's easier to edit directly, and
> therefore easier to debug).
> To store these documents, you could use a huge pickle'd Python
> dictionary (bad idea), a UNIX database (dbm module, anydbm in Python2;
> this is what shelve uses), or simple the file system: one file per
> serialised object.
>
> Looking at your use case, however, I think what you really should use is
> a SQL database. SQLite is part of Python and will do the job nicely.
> Just use a single table with three columns: symbol, date, value.
>
> Thomas

Sorry. There is no indication that shelve is deprecated. I was using it on a FreeBSD system, and it turns out that the bsddb module is deprecated; I confused it with the shelve module.

Thanks Ken and Thomas for your suggestions -- I will play around with both and pick one.
| From | Miki Tebeka <miki.tebeka@gmail.com> |
|---|---|
| Date | 2011-08-19 10:15 -0700 |
| Message-ID | <d339297b-f687-4756-8ba9-0d31d629f9dc@glegroupsg2000goo.googlegroups.com> |
| In reply to | #11848 |
You might check one of the many binary encoders (like Avro, Thrift ...). The other option is to use a database; sqlite3 is pretty fast (if your schema is fixed). Otherwise you can look at some NoSQL ones (like MongoDB).
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2011-08-19 12:45 -0500 |
| Message-ID | <mailman.228.1313775949.27778.python-list@python.org> |
| In reply to | #11848 |
On 8/19/11 10:49 AM, Ken Watford wrote:
> On Fri, Aug 19, 2011 at 11:31 AM, Forafo San<ppv.grps@gmail.com> wrote:
>> Folks,
>> What might be a good replacement for the shelve module, but one that
>> can handle a few gigs of data. I'm doing some calculations on daily
>> stock prices and the result is a nested list like:
>
> For what you're doing, I would give PyTables a try.

For a few gigs of stock price data, this is what I use. Much better than SQLite for that amount of data.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2011-08-20 06:36 +1000 |
| Message-ID | <4e4ec962$0$29986$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #11848 |
Forafo San wrote:
> Folks,
> What might be a good replacement for the shelve module, but one that
> can handle a few gigs of data. I'm doing some calculations on daily
> stock prices and the result is a nested list like:
>
> [[date_1, floating result 1],
> [date_2, floating result 2],
> ...
> [date_n, floating result n]]
>
> However, there are about 5,000 lists like that, one for each stock
> symbol.
You might save some memory by using tuples rather than lists:
>>> import sys
>>> sys.getsizeof(["01/01/2000", 123.456]) # On a 32-bit system.
40
>>> sys.getsizeof(("01/01/2000", 123.456))
32
By the way, you know that you should never, ever use floats for currency,
right?
http://vladzloteanu.wordpress.com/2010/01/11/why-you-shouldnt-use-float-for-currency-floating-point-issues-explained-for-ruby-and-ror/
http://stackoverflow.com/questions/3730019/why-not-use-double-or-float-to-represent-currency
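The concern behind those links can be shown with a short sketch using only the standard library. The amounts are made up for illustration, and the example only demonstrates the rounding behaviour of binary floats for exact decimal amounts; it does not settle whether floats are appropriate for a given analysis.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1, 0.2, or 0.3 exactly, so the sum
# differs from the literal 0.3 by a tiny rounding error.
float_total = 0.1 + 0.2
print(float_total == 0.3)

# Decimal values built from strings keep the exact decimal digits.
decimal_total = Decimal("0.10") + Decimal("0.20")
print(decimal_total == Decimal("0.30"))
```

The first comparison prints False and the second prints True, which is why exact decimal arithmetic is usually preferred when every cent must balance.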
> Using the shelve module I could easily save them to a file
> ( myshelvefile['symbol_1'] = symbol_1_list) and likewise retrieve the
> data. But shelve is deprecated
It certainly is not.
http://docs.python.org/library/shelve.html
http://docs.python.org/py3k/library/shelve.html
Not a word about it being deprecated in either Python 2.x or 3.x.
> AND when a lot of data is written
> shelve was acting weird (refusing to write, filesizes reported with an
> "ls" did not make sense, etc.).
I would like to see this replicated. If it is true, that's a bug in shelve,
but I expect you're probably doing something wrong.
--
Steven
| From | Robert Kern <robert.kern@gmail.com> |
|---|---|
| Date | 2011-08-19 17:24 -0500 |
| Message-ID | <mailman.247.1313792673.27778.python-list@python.org> |
| In reply to | #11878 |
On 8/19/11 3:36 PM, Steven D'Aprano wrote:
> By the way, you know that you should never, ever use floats for currency,
> right?

That's just incorrect. You shouldn't use (binary) floats for many *accounting* purposes, but for many financial/econometric analyses, floats are de rigueur and work much better than decimals (either floating or fixed point). If you are collecting gigs of stock prices, you are much more likely to be doing the latter than the former.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2011-08-20 11:38 +1000 |
| Message-ID | <4e4f1004$0$29966$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #11890 |
Robert Kern wrote:
> On 8/19/11 3:36 PM, Steven D'Aprano wrote:
>
>> By the way, you know that you should never, ever use floats for currency,
>> right?
>
> That's just incorrect. You shouldn't use (binary) floats for many
> *accounting* purposes, but for many financial/econometric analyses, floats
> are de rigueur and work much better than decimals (either floating or fixed
> point). If you are collecting gigs of stock prices, you are much more
> likely to be doing the latter than the former.

That makes sense, and I stand corrected.

--
Steven
| From | Gregory Ewing <greg.ewing@canterbury.ac.nz> |
|---|---|
| Date | 2011-08-21 12:37 +1200 |
| Message-ID | <9bb2ahFgjbU1@mid.individual.net> |
| In reply to | #11893 |
> Robert Kern wrote:
>
>>That's just incorrect. You shouldn't use (binary) floats for many
>>*accounting* purposes, but for many financial/econometric analyses, floats
>>are de rigueur and work much better than decimals

There's a certain accounting package I work with that *does* use floats -- binary ones -- for accounting purposes, and somehow manages to get away with it. Not something I would recommend trying at home, though.

--
Greg
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2011-08-21 01:54 +0100 |
| Message-ID | <mailman.274.1313888073.27778.python-list@python.org> |
| In reply to | #11934 |
On Sun, Aug 21, 2011 at 1:37 AM, Gregory Ewing <greg.ewing@canterbury.ac.nz> wrote:
> There's a certain accounting package I work with that *does*
> use floats -- binary ones -- for accounting purposes, and
> somehow manages to get away with it. Not something I would
> recommend trying at home, though.
>

Probably quite a few, actually. It's not a very visible problem so long as you always have plenty of "spare precision", and you round everything off to two decimals (or however many for your currency). Eventually you'll start seeing weird results that are a cent off, but you won't notice them often. And hey. You store $1.23 as 1.23, and it just works! It must be the right thing to do!

Me, I store dollars-and-cents currency in cents. Always. But that's because I never need fractional cents. I'm not sure what the best way to handle fractional cents is, but I'm fairly confident that this isn't it:

http://thedailywtf.com/Articles/Price-in-Nonsense.aspx

ChrisA
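The store-everything-in-cents approach described above might be sketched as follows. The helper names `to_cents` and `format_cents` and the sample prices are hypothetical and chosen for illustration; the sketch deliberately rejects fractional cents rather than guessing how to round them.

```python
from decimal import Decimal


def to_cents(amount: str) -> int:
    """Parse a dollars-and-cents string such as '1.23' into integer cents."""
    cents = Decimal(amount) * 100
    # Refuse amounts like '1.234' instead of silently rounding them.
    if cents != cents.to_integral_value():
        raise ValueError("fractional cents are not supported: " + amount)
    return int(cents)


def format_cents(cents: int) -> str:
    """Format integer cents back into a dollars-and-cents string."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}{dollars}.{remainder:02d}"


# Hypothetical prices; all arithmetic happens on plain integers.
prices = ["1.23", "0.10", "0.20"]
total_cents = sum(to_cents(p) for p in prices)
print(format_cents(total_cents))
```

Because the totals are ordinary integers, sums and comparisons are exact; conversion to a decimal string happens only at the edges, when parsing input or displaying output.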