From: Bryan Britten
Newsgroups: comp.lang.python
Subject: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
Date: Mon, 27 May 2013 13:47:40 -0700 (PDT)

Hey, everyone!

I'm very new to Python and have only been using it for a couple of days, but have some experience in programming (albeit mostly statistical programming in SAS or R) so I'm hoping someone can answer this question in a technical way, but without using an abundant amount of jargon.

The issue I'm having is that I'm trying to pull information from a website to practice Python with, but I'm having trouble getting the data in a timely fashion. If I use the following code:

    import json
    import urllib

    urlStr = "https://stream.twitter.com/1/statuses/sample.json"
    twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]

I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of RAM if that helps at all.

If I use the following code:

    import urllib

    urlStr = "https://stream.twitter.com/1/statuses/sample.json"
    fileHandle = urllib.urlopen(urlStr)
    twtrText = fileHandle.readlines()

It takes hours (upwards of 6 or 7, if not more) to finish computing the last command.

With that being said, my question is whether there is a more efficient manner to do this. I'm worried that if it's taking this long to process the .readlines() command, trying to work with the data is going to be a computational nightmare.

Thanks in advance for any insights or advice!
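
A possible explanation for both symptoms, offered as an assumption rather than a confirmed diagnosis: if the URL is a long-lived stream that never closes, then both the list comprehension and .readlines() try to collect every line before returning, so memory grows without bound and the call never finishes on its own. Under that assumption, a minimal sketch of a bounded alternative is to parse lines lazily and stop after a fixed count. The names iter_json_lines, take, and sample_lines are hypothetical, and the in-memory list stands in for the open URL handle so the example runs without network access.

```python
import itertools
import json


def iter_json_lines(lines):
    """Lazily yield one parsed object per non-blank line."""
    for line in lines:
        # Network handles may yield bytes; json.loads wants text here.
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing to parse them
        yield json.loads(line)


def take(lines, limit):
    """Parse at most `limit` objects, then stop reading from `lines`."""
    return list(itertools.islice(iter_json_lines(lines), limit))


# Stand-in for the URL handle: any iterable of lines works.
sample_lines = [
    '{"id": 1, "text": "a"}\n',
    '\n',
    '{"id": 2, "text": "b"}\n',
    '{"id": 3, "text": "c"}\n',
]
first_two = take(sample_lines, 2)
```

With the handle from the snippets above, the call would presumably look like take(urllib.urlopen(urlStr), 1000), which reads only as many lines as are needed for 1000 objects and holds nothing else in memory.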