


Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()

Newsgroups comp.lang.python
Date 2013-05-27 13:47 -0700
Message-ID <10be5c62-4c58-4b4f-b00a-82d85ee4ef8e@googlegroups.com>
Subject Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
From Bryan Britten <britten.bryan@gmail.com>



Hey, everyone! 

I'm very new to Python and have only been using it for a couple of days, but I have some programming experience (albeit mostly statistical programming in SAS or R), so I'm hoping someone can answer this question in a technical way without using too much jargon.

I'm trying to pull data from a website to practice Python with, but I'm having trouble getting the data in a timely fashion. If I use the following code:

<code>
import json
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
</code>

I run into a memory problem. I'm running 32-bit Python 2.7 on a machine with 4 GB of RAM, if that helps at all.
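A likely cause of the memory problem is that the list comprehension keeps every parsed tweet in one list, and a streaming endpoint like the sample stream is meant to stay open rather than reach end-of-file, so the list just keeps growing. Below is a minimal sketch, written for Python 3 rather than the poster's Python 2.7, that parses one line at a time with a generator and stops after a fixed number of records using `itertools.islice`. The function name `iter_json_lines`, the `limit` parameter, and the `io.StringIO` stand-in for the network response are illustrative assumptions, not part of the original post.

```python
import io
import itertools
import json


def iter_json_lines(stream, limit=None):
    """Return a lazy iterator of parsed JSON objects, one per non-blank line.

    Only the current line is held in memory. `limit` (a hypothetical
    parameter for this sketch) caps how many records are produced, which
    matters when the stream never reaches end-of-file. limit=None means
    no cap.
    """
    records = (json.loads(line) for line in stream if line.strip())
    return itertools.islice(records, limit)


# Assumed stand-in for the network response: newline-delimited JSON,
# including a blank line such as a keep-alive might produce.
fake_stream = io.StringIO('{"id": 1}\n\n{"id": 2}\n{"id": 3}\n')
first_two = list(iter_json_lines(fake_stream, limit=2))
print(first_two)  # [{'id': 1}, {'id': 2}]
```

With a real Python 3 response from `urllib.request.urlopen`, iterating yields `bytes` lines; `json.loads` accepts `bytes` from Python 3.6 onward, so the same function should work on the response object directly.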

If I use the following code:

<code>
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

fileHandle = urllib.urlopen(urlStr)

twtrText = fileHandle.readlines()
</code>

It takes hours (upwards of 6 or 7, if not more) for the last command to finish.
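`readlines()` only returns once it has read everything up to end-of-file, so against a stream that stays open it can run for hours or indefinitely. One hedged alternative is to call `readline()` in a loop and stop on a line count or a time budget. This is a minimal Python 3 sketch; `read_lines_until`, `max_lines`, and `max_seconds` are hypothetical names, and `io.StringIO` again stands in for the real response.

```python
import io
import time


def read_lines_until(stream, max_lines, max_seconds):
    """Read lines one at a time and stop at EOF, after `max_lines` lines,
    or once `max_seconds` have elapsed, whichever comes first."""
    deadline = time.monotonic() + max_seconds
    lines = []
    while len(lines) < max_lines and time.monotonic() < deadline:
        line = stream.readline()
        if not line:  # readline() returns an empty string at EOF
            break
        lines.append(line)
    return lines


sample = io.StringIO("a\nb\nc\nd\ne\n")
print(read_lines_until(sample, max_lines=3, max_seconds=5.0))  # ['a\n', 'b\n', 'c\n']
```

The time budget is only checked between lines, so a single blocking `readline()` call can still overrun it; the `timeout` argument of `urllib.request.urlopen` is one way to bound how long a read may block.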

So my question is whether there is a more efficient way to do this. I'm worried that if the .readlines() call alone takes this long, actually working with the data is going to be a computational nightmare.
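On the worry about working with the data afterwards: one common approach is to compute what you need while the lines stream in, rather than storing every tweet first. The minimal Python 3 sketch below tallies one field with `collections.Counter` and keeps only the counts in memory. The field name `"lang"`, the function `count_field`, and the sample records are assumptions for illustration.

```python
import collections
import io
import json


def count_field(stream, field, limit):
    """Tally values of `field` across up to `limit` non-blank JSON lines.

    Each record is parsed, counted, and then discarded, so memory use
    depends on the number of distinct values, not the number of records.
    Records without the field are counted under "<missing>".
    """
    counts = collections.Counter()
    seen = 0
    for line in stream:
        if seen >= limit:
            break
        if not line.strip():  # skip blank lines
            continue
        record = json.loads(line)
        counts[record.get(field, "<missing>")] += 1
        seen += 1
    return counts


# Hypothetical sample data standing in for the stream.
sample = io.StringIO(
    '{"lang": "en"}\n'
    '{"lang": "es"}\n'
    '\n'
    '{"lang": "en"}\n'
    '{"text": "no lang here"}\n'
)
print(count_field(sample, "lang", limit=3))  # Counter({'en': 2, 'es': 1})
```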

Thanks in advance for any insights or advice!



Thread

Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 13:47 -0700
  Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Roy Smith <roy@panix.com> - 2013-05-27 16:56 -0400
    Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 14:29 -0700
      Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Denis McMahon <denismfmcmahon@gmail.com> - 2013-05-27 21:35 +0000
      Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Fábio Santos <fabiosantosart@gmail.com> - 2013-05-28 00:36 +0100
  Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Dave Angel <davea@davea.name> - 2013-05-27 19:58 -0400
    Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 20:11 -0700
      Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Fábio Santos <fabiosantosart@gmail.com> - 2013-05-28 08:31 +0100
        Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-28 07:32 -0700
        Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Alister <alister.ware@ntlworld.com> - 2013-05-28 17:52 +0000
  Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-05-27 21:40 -0400
