comp.lang.python, thread #56871 (unrolled)
| Started by | Harsh Jha <harshjha2006@gmail.com> |
|---|---|
| First post | 2013-10-15 23:55 -0700 |
| Last post | 2013-10-16 23:09 +0200 |
| Articles | 9 (9 participants) |
How pickle helps in reading huge files? Harsh Jha <harshjha2006@gmail.com> - 2013-10-15 23:55 -0700
Re: How pickle helps in reading huge files? Stephane Wirtel <stephane@wirtel.be> - 2013-10-16 09:05 +0200
Re: How pickle helps in reading huge files? rusi <rustompmody@gmail.com> - 2013-10-16 01:51 -0700
Re: How pickle helps in reading huge files? Chris Angelico <rosuav@gmail.com> - 2013-10-16 20:09 +1100
Re: How pickle helps in reading huge files? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-16 08:39 +0100
Re: How pickle helps in reading huge files? Roy Smith <roy@panix.com> - 2013-10-16 08:29 -0400
Re: How pickle helps in reading huge files? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-10-16 13:32 -0400
Re: How pickle helps in reading huge files? Peter Cacioppi <peter.cacioppi@gmail.com> - 2013-10-16 14:04 -0700
Re: How pickle helps in reading huge files? Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2013-10-16 23:09 +0200
| From | Harsh Jha <harshjha2006@gmail.com> |
|---|---|
| Date | 2013-10-15 23:55 -0700 |
| Subject | How pickle helps in reading huge files? |
| Message-ID | <0044bfd0-f07f-4f7b-b976-5df034b6fec6@googlegroups.com> |
I've a huge CSV file and I want to read stuff from it again and again. Is it useful to pickle it and keep it, and then unpickle it whenever I need to use that data? Is it faster than accessing the file simply by opening it again and again? Please explain why.

Thank you.
| From | Stephane Wirtel <stephane@wirtel.be> |
|---|---|
| Date | 2013-10-16 09:05 +0200 |
| Message-ID | <mailman.1107.1381907510.18130.python-list@python.org> |
| In reply to | #56871 |
Keep it in memory.

> On 16 oct. 2013, at 08:55 AM, Harsh Jha <harshjha2006@gmail.com> wrote:
>
> I've a huge CSV file and I want to read stuff from it again and again. Is it useful to pickle it and keep it, and then unpickle it whenever I need to use that data? Is it faster than accessing the file simply by opening it again and again? Please explain why.
>
> Thank you.
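As a minimal sketch of "keep it in memory", assuming the data fits in RAM: parse the file once with the standard `csv` module and reuse the resulting list for every later lookup. The temporary file and its two columns are invented for the demo.

```python
import csv
import os
import tempfile


def load_rows(path):
    """Parse the whole CSV file once and return its rows as a list of lists."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


# Made-up demo data standing in for the real file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows([["id", "value"], ["1", "10"], ["2", "20"]])

rows = load_rows(path)  # the only pass over the file on disk
os.remove(path)

header, data = rows[0], rows[1:]
total = sum(int(r[1]) for r in data)  # first use of the in-memory rows
ids = [r[0] for r in data]            # later uses never touch the disk
```

Whether this is practical depends entirely on what "huge" means relative to the machine's memory, which is exactly the point the following replies pick up.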
| From | rusi <rustompmody@gmail.com> |
|---|---|
| Date | 2013-10-16 01:51 -0700 |
| Message-ID | <81e53ed7-cc3e-437d-966d-9c1d79dc8c9f@googlegroups.com> |
| In reply to | #56872 |
On Wednesday, October 16, 2013 12:35:42 PM UTC+5:30, Stéphane Wirtel wrote:
> Keep it in memory

That's a strange answer, given that the OP says his file is huge.
Of course, 'huge' may not really be huge; that really depends on the hardware he's using.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-10-16 20:09 +1100 |
| Message-ID | <mailman.1109.1381914565.18130.python-list@python.org> |
| In reply to | #56874 |
On Wed, Oct 16, 2013 at 7:51 PM, rusi <rustompmody@gmail.com> wrote:
> On Wednesday, October 16, 2013 12:35:42 PM UTC+5:30, Stéphane Wirtel wrote:
>> Keep it in memory
>
> That's a strange answer, given that the OP says his file is huge.
> Of course, 'huge' may not really be huge; that really depends on the hardware he's using.

Most people's idea of a big file is one that has a few thousand lines in it. That may be pretty huge in terms of manual work, but it'd fit in memory easily enough. And even if it really is bigger than memory, chances are you can use your page file and still keep it in "memory", and that's generally the easiest, if perhaps not the most efficient, solution.

ChrisA
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2013-10-16 08:39 +0100 |
| Message-ID | <mailman.1108.1381909219.18130.python-list@python.org> |
| In reply to | #56871 |
On 16/10/2013 07:55, Harsh Jha wrote:
> I've a huge CSV file and I want to read stuff from it again and again. Is it useful to pickle it and keep it, and then unpickle it whenever I need to use that data? Is it faster than accessing the file simply by opening it again and again? Please explain why.
>
> Thank you.

What's your definition of huge? Maybe it would be effective to pickle and unpickle, but until you try it, perhaps with a relatively small data sample, how can you know? Why can't you leave the file open and keep iterating over the contents?

--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence
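Mark's alternative, leaving the file open and iterating over it again, could look like the sketch below: seek back to the start before each pass. The helper name `scan` and the demo rows are hypothetical. Note that this saves reopening the file, but every pass still re-parses the text, so it only pays off when parsing is cheap.

```python
import csv
import tempfile


def scan(fh):
    """Rewind the already-open file and return a fresh csv reader over it."""
    fh.seek(0)
    return csv.reader(fh)


# Made-up demo data; in practice open the real CSV once and keep the handle.
with tempfile.TemporaryFile("w+", newline="") as f:
    csv.writer(f).writerows([["a", "1"], ["b", "2"], ["c", "3"]])

    names = [row[0] for row in scan(f)]          # first pass
    total = sum(int(row[1]) for row in scan(f))  # second pass, same handle
```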
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2013-10-16 08:29 -0400 |
| Message-ID | <roy-0A1DFD.08294616102013@news.panix.com> |
| In reply to | #56871 |
In article <0044bfd0-f07f-4f7b-b976-5df034b6fec6@googlegroups.com>,
 Harsh Jha <harshjha2006@gmail.com> wrote:

> I've a huge CSV file and I want to read stuff from it again and again. Is it useful to pickle it and keep it, and then unpickle it whenever I need to use that data? Is it faster than accessing the file simply by opening it again and again? Please explain why.
>
> Thank you.

It can be. A number of years ago I did a project which involved reading (and parsing) SNMP MIBs before you could do any work. Startup took something like 10-20 seconds. By pre-parsing the MIBs and writing out the data structures as pickles, I could cut startup time to a couple of seconds.

But that's because the parsing I was doing was pretty complicated. Parsing a CSV file is much easier, so I wouldn't expect much improvement from reading a pickle file instead of the original CSV.

The bottom line is, you should try it. Pickling a data structure is about one line of code (not counting the 'import cPickle'). Try it and see what happens. Time how long it takes to read the original file, and how long it takes to read the pickle. Let us know your results.

Also, let us know what "huge" means. 1000 rows? A million? 100 million?
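Roy's suggested experiment can be sketched in Python 3, where the standard `pickle` module uses its C accelerator automatically (the role `cPickle` played in Python 2). The generated data, its row count and its three-column layout are assumptions for illustration only; point `csv_path` at the real file to get meaningful numbers. No particular result is implied, since which side wins depends on the data.

```python
import csv
import os
import pickle
import shutil
import tempfile
import time


def make_demo_csv(path, n_rows=100_000):
    """Write made-up test data; substitute your real CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(n_rows):
            writer.writerow([i, i * 0.5, "name%d" % i])


def read_csv(path):
    # Parse the text and convert numeric columns, as real code would have to.
    with open(path, newline="") as f:
        return [(int(a), float(b), c) for a, b, c in csv.reader(f)]


def read_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


def timed(fn, *args):
    """Return fn(*args) together with the wall-clock seconds it took."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
pkl_path = os.path.join(tmp_dir, "data.pkl")
make_demo_csv(csv_path)

rows, t_csv = timed(read_csv, csv_path)
with open(pkl_path, "wb") as f:
    pickle.dump(rows, f, pickle.HIGHEST_PROTOCOL)  # the "one line" of pickling
rows_back, t_pickle = timed(read_pickle, pkl_path)
shutil.rmtree(tmp_dir)

print("csv: %.3f s   pickle: %.3f s" % (t_csv, t_pickle))
```

Timing the CSV read with the `int()`/`float()` conversions included keeps the comparison fair, because the unpickled data comes back already typed.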
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2013-10-16 13:32 -0400 |
| Message-ID | <mailman.1115.1381944762.18130.python-list@python.org> |
| In reply to | #56871 |
On Tue, 15 Oct 2013 23:55:26 -0700 (PDT), Harsh Jha
<harshjha2006@gmail.com> declaimed the following:
>I've a huge CSV file and I want to read stuff from it again and again. Is it useful to pickle it and keep it, and then unpickle it whenever I need to use that data? Is it faster than accessing the file simply by opening it again and again? Please explain why.
>
As others mention, what is "huge"?
Does it get updated often? How extensive are updates?
	I suspect I'd use the csv module to parse it into an SQLite3 database,
then use the database for the repetitive access. NOTE: I've never used
pickle -- but for stuff that is coming in as simple CSV I'd suspect the
parsing (even including the various int()/float() wrapping of numeric
fields) can't be much slower than the object creation/unwrapping used by
pickle; SQLite3 should let you leave the data in numeric formats without
the translation penalty on each use.
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
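Dennis's approach might be sketched with the standard `csv` and `sqlite3` modules: parse and convert the fields once, store them with native types, then query repeatedly. The table name, columns and sample data are hypothetical, and `":memory:"` is used only to keep the demo self-contained; a file-backed database is what would persist across runs.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content; in practice open the real file instead.
CSV_TEXT = "id,name,price\n1,apple,0.5\n2,pear,0.75\n3,plum,1.25\n"


def load_into_sqlite(csv_file, conn):
    """Parse the CSV once, converting numeric fields, and store the rows."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        ((int(i), n, float(p)) for i, n, p in reader),
    )
    conn.commit()


# Use a path such as "data.db" instead so later runs skip the CSV parse.
conn = sqlite3.connect(":memory:")
load_into_sqlite(io.StringIO(CSV_TEXT), conn)

# Repeated queries get native ints/floats back, with no re-parsing.
cheap = conn.execute(
    "SELECT name FROM items WHERE price < ? ORDER BY id", (1.0,)
).fetchall()
```

If the CSV is updated often, the load step would have to be rerun (or made incremental), which is why Dennis asks how extensive the updates are.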
| From | Peter Cacioppi <peter.cacioppi@gmail.com> |
|---|---|
| Date | 2013-10-16 14:04 -0700 |
| Message-ID | <7e49229c-4dc7-43f3-8785-b72c1ef30018@googlegroups.com> |
| In reply to | #56871 |
On Tuesday, October 15, 2013 11:55:26 PM UTC-7, Harsh Jha wrote:
> I've a huge CSV file and I want to read stuff from it again and again. Is it useful to pickle it and keep it, and then unpickle it whenever I need to use that data? Is it faster than accessing the file simply by opening it again and again? Please explain why.
>
> Thank you.

It's surprising that no one else mentioned a fairly typical pattern for this sort of situation: the compromise between "read from disk" and "read from memory" is "implement a cache".

I've had lots of good experiences hand-rolling simple caches, especially when there is an application-specific access pattern. Python has nice implementations of things like tuple and dictionary, which make caching fairly easy compared to other languages.
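A hand-rolled cache of the kind Peter describes might look like this sketch: one pass records the byte offset of every line, and rows are then parsed from disk only on first access and kept in a dict of tuples afterwards. The class name `CachedCsvRows`, the UTF-8 encoding and the "one physical line per row" rule (no quoted field containing a newline) are all assumptions for illustration.

```python
import csv
import os
import tempfile


class CachedCsvRows:
    """Hypothetical row cache: read a row from disk once, then serve it
    from a dict. Assumes UTF-8 text and no multi-line quoted fields."""

    def __init__(self, path):
        self._f = open(path, "rb")
        self._offsets = []  # byte offset where each line starts
        pos = 0
        for line in self._f:
            self._offsets.append(pos)
            pos += len(line)
        self._cache = {}    # row number -> tuple of fields
        self.disk_reads = 0

    def row(self, n):
        if n not in self._cache:
            self._f.seek(self._offsets[n])
            line = self._f.readline().decode("utf-8")
            self._cache[n] = tuple(next(csv.reader([line])))
            self.disk_reads += 1
        return self._cache[n]

    def close(self):
        self._f.close()


# Made-up demo file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("id,value\n1,10\n2,20\n3,30\n")

rows = CachedCsvRows(path)
first = rows.row(2)  # parsed from disk
again = rows.row(2)  # served from the dict
rows.close()
os.remove(path)
```

This cache grows without bound; if memory is the concern, `functools.lru_cache(maxsize=...)` on a row-loading function is a ready-made bounded alternative.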
| From | Irmen de Jong <irmen.NOSPAM@xs4all.nl> |
|---|---|
| Date | 2013-10-16 23:09 +0200 |
| Message-ID | <525f008a$0$15895$e4fe514c@news.xs4all.nl> |
| In reply to | #56897 |
On 16-10-2013 23:04, Peter Cacioppi wrote:
> On Tuesday, October 15, 2013 11:55:26 PM UTC-7, Harsh Jha wrote:
>> I've a huge CSV file and I want to read stuff from it again and again. Is it useful to pickle it and keep it, and then unpickle it whenever I need to use that data? Is it faster than accessing the file simply by opening it again and again? Please explain why.
>>
>> Thank you.
>
> It's surprising that no one else mentioned a fairly typical pattern for this sort of situation: the compromise between "read from disk" and "read from memory" is "implement a cache".

...or use memory-mapped I/O. Just let the OS deal with the 'caching' of memory pages.

Irmen
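A minimal sketch of Irmen's suggestion with the standard `mmap` module: the file is mapped read-only, and the OS pages data in on demand and keeps recently used pages cached between accesses. The demo file is made up, and the naive `split(",")` ignores CSV quoting rules, so it is only suitable for simple files.

```python
import mmap
import os
import tempfile

# Made-up demo file; in practice this would be the large CSV.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"id,value\n1,10\n2,20\n3,30\n")

with open(path, "rb") as f:
    # Length 0 maps the whole file; nothing is read until pages are touched.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = mm.find(b"\n2,")  # locate a row by searching the mapping
        end = mm.find(b"\n", start + 1)
        row = mm[start + 1:end].decode().split(",")
        # readline() works on the mapping too, starting from position 0.
        lines = sum(1 for _ in iter(mm.readline, b""))

os.remove(path)
```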