Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #46224

Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()

From Roy Smith <roy@panix.com>
Newsgroups comp.lang.python
Subject Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
Date 2013-05-27 16:56 -0400
Organization PANIX Public Access Internet and UNIX, NYC
Message-ID <roy-E04E0C.16563127052013@news.panix.com> (permalink)
References <10be5c62-4c58-4b4f-b00a-82d85ee4ef8e@googlegroups.com>

Show all headers | View raw


In article <10be5c62-4c58-4b4f-b00a-82d85ee4ef8e@googlegroups.com>,
 Bryan Britten <britten.bryan@gmail.com> wrote:

> If I use the following code:
> 
> <code>
> import urllib
> 
> urlStr = "https://stream.twitter.com/1/statuses/sample.json"
> 
> fileHandle = urllib.urlopen(urlStr)
> 
> twtrText = fileHandle.readlines()
> </code>
> 
> It takes hours (upwards of 6 or 7, if not more) to finish computing the last 
> command.

I'm not surprised!  readlines() reads in the ENTIRE file in one gulp.  
That a lot of tweets!

> With that being said, my question is whether there is a more efficient manner 
> to do this.

In general, when reading a large file, you want to iterate over lines of 
the file and process each one.  Something like:

for line in urllib.urlopen(urlStr):
   twtrDict = json.loads(line)

You still need to download and process all the data, but at least you 
don't need to store it in memory all at once.  There is an assumption 
here that there's exactly one json object per line.  If that's not the 
case, things might get a little more complicated.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 13:47 -0700
  Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Roy Smith <roy@panix.com> - 2013-05-27 16:56 -0400
    Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 14:29 -0700
      Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Denis McMahon <denismfmcmahon@gmail.com> - 2013-05-27 21:35 +0000
      Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Fábio Santos <fabiosantosart@gmail.com> - 2013-05-28 00:36 +0100
  Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Dave Angel <davea@davea.name> - 2013-05-27 19:58 -0400
    Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 20:11 -0700
      Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Fábio Santos <fabiosantosart@gmail.com> - 2013-05-28 08:31 +0100
        Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-28 07:32 -0700
        Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Alister <alister.ware@ntlworld.com> - 2013-05-28 17:52 +0000
  Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-05-27 21:40 -0400

csiph-web