| From | Roy Smith <roy@panix.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() |
| Date | 2013-05-27 16:56 -0400 |
| Organization | PANIX Public Access Internet and UNIX, NYC |
| Message-ID | <roy-E04E0C.16563127052013@news.panix.com> |
| References | <10be5c62-4c58-4b4f-b00a-82d85ee4ef8e@googlegroups.com> |

In article <10be5c62-4c58-4b4f-b00a-82d85ee4ef8e@googlegroups.com>,
Bryan Britten <britten.bryan@gmail.com> wrote:

> If I use the following code:
>
> <code>
> import urllib
>
> urlStr = "https://stream.twitter.com/1/statuses/sample.json"
>
> fileHandle = urllib.urlopen(urlStr)
>
> twtrText = fileHandle.readlines()
> </code>
>
> It takes hours (upwards of 6 or 7, if not more) to finish computing the last
> command.

I'm not surprised! readlines() reads in the ENTIRE file in one gulp.
That's a lot of tweets!

> With that being said, my question is whether there is a more efficient manner
> to do this.

In general, when reading a large file, you want to iterate over lines of
the file and process each one. Something like:

    import json

    for line in urllib.urlopen(urlStr):
        twtrDict = json.loads(line)  # parse one line at a time

You still need to download and process all the data, but at least you
don't need to store it in memory all at once.

There is an assumption here that there's exactly one JSON object per
line. If that's not the case, things might get a little more
complicated.
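
To make the line-at-a-time idea concrete, here is a minimal sketch in Python 3 (the post itself uses the Python 2 urllib API). The helper names iter_json_lines and iter_json_concatenated are hypothetical, and the in-memory io.BytesIO stream only stands in for a network response. The second helper shows one possible way to cope when the one-object-per-line assumption does not hold, using json.JSONDecoder.raw_decode; it is an illustration, not necessarily what anyone in the thread had in mind.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one decoded object per non-blank line of a binary file-like stream.

    Hypothetical helper: assumes UTF-8 text with exactly one JSON value per line.
    """
    for raw in stream:  # file-like objects yield one line per iteration
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        yield json.loads(line.decode("utf-8"))


def iter_json_concatenated(text):
    """Yield JSON values from a string that holds several of them back to back.

    Hypothetical helper for the case where records are not split one per line.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode rejects leading whitespace, so step over it first
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)  # returns (value, end index)
        yield obj


if __name__ == "__main__":
    # Stand-in for a network response: two records separated by a blank line.
    fake = io.BytesIO(b'{"id": 1, "text": "a"}\n\n{"id": 2, "text": "b"}\n')
    for tweet in iter_json_lines(fake):
        print(tweet["id"], tweet["text"])
```

In Python 3 the object returned by urllib.request.urlopen can also be iterated line by line, so it could be passed to iter_json_lines in place of the BytesIO stand-in. Only one line is held in memory at a time, which is the point of the advice above. Note that iter_json_concatenated still needs the whole text in memory, so a real stream would have to be buffered in chunks.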