


Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()

Date Mon, 27 May 2013 19:58:05 -0400
From Dave Angel <davea@davea.name>
Subject Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines()
Newsgroups comp.lang.python
Message-ID <mailman.2268.1369699108.3114.python-list@python.org>



On 05/27/2013 04:47 PM, Bryan Britten wrote:
> Hey, everyone!
>
> I'm very new to Python and have only been using it for a couple of days, but have some experience in programming (albeit mostly statistical programming in SAS or R) so I'm hoping someone can answer this question in a technical way, but without using an abundant amount of jargon.
>
> The issue I'm having is that I'm trying to pull information from a website to practice Python with, but I'm having trouble getting the data in a timely fashion. If I use the following code:
>
> <code>
> import json
> import urllib
>
> urlStr = "https://stream.twitter.com/1/statuses/sample.json"
>
> twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
> </code>
>
> I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of RAM if that helps at all.

Which OS?

The first question I'd ask is how big this file is.  I can't tell, since
it needs a user name & password to actually get the file.  But it's not
unusual to need at least double the file's size in memory, and on
Windows a 32-bit process is limited to two gig of address space by
default, regardless of how much RAM your hardware has.

If you separately fetch the file, then you can experiment with it, 
including cutting it down to a dozen lines, and see if you can deal with 
that much.
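
For the "cut it down to a dozen lines" step, something like the
following might do, assuming the data has already been saved to a local
file.  The function name head_lines and its arguments are made up for
illustration.  Both files are opened in binary mode so the line endings
pass through untouched.

```python
from itertools import islice


def head_lines(src_path, dest_path, n=12):
    """Copy the first n lines of src_path to dest_path.

    Returns the number of lines actually copied, which is smaller
    than n when the source file is shorter than that.
    """
    count = 0
    # Binary mode on both sides: bytes are copied as-is, so whatever
    # line endings the source uses are preserved.
    with open(src_path, "rb") as src, open(dest_path, "wb") as dst:
        # islice stops after n lines without reading the rest of the file.
        for line in islice(src, n):
            dst.write(line)
            count += 1
    return count
```

If the file turns out to have no newlines at all, the first "line" is
the whole file, which is itself useful to know before trying to parse it
line by line.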

How could you fetch it?  With wget, with a browser (and saveAs), or with
a simple loop which calls read(4096) repeatedly and writes each block to
a local file.  Don't forget to open that local file in 'wb' mode, as you
don't know yet what line endings the data might use.
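
A minimal sketch of that read(4096) loop, written against any file-like
object that has a read() method.  The name save_stream and its
parameters are made up for illustration, and the sketch does nothing
about the user name & password that this particular URL needs.

```python
def save_stream(src, dest_path, block_size=4096):
    """Copy a file-like object to dest_path one block at a time.

    Only one block is held in memory at once, so the size of the
    download does not matter.  Returns the number of bytes written.
    """
    total = 0
    # 'wb' keeps the bytes exactly as received, line endings included.
    with open(dest_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # an empty result means end of stream
                break
            dst.write(block)
            total += len(block)
    return total
```

In Python 2.7 the src argument could be the object returned by
urllib.urlopen(urlStr); in Python 3 the equivalent call is
urllib.request.urlopen(urlStr).  If the source is an endless stream
rather than a finite file, the loop never finishes on its own, so you'd
want to stop it after some number of bytes.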

Once you have an idea what the data looks like, you can answer such 
questions as whether it's json at all, whether the lines each contain a 
single json record, or what.
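
One rough way to answer those questions is to try json.loads() on the
first few lines of the saved file and see what comes back.  This is only
a sketch: the name inspect_lines is made up, and decoding as UTF-8 is an
assumption about the data.

```python
import json


def inspect_lines(path, limit=12):
    """Report what each of the first `limit` lines looks like.

    Returns a list of (line_number, verdict) pairs, where verdict is
    "blank", "not json", or the type name of the parsed value
    (for example "dict" or "list").
    """
    results = []
    with open(path, "rb") as f:
        for number, raw in enumerate(f, start=1):
            if number > limit:
                break
            # Assumption: the data is UTF-8; undecodable bytes are
            # replaced rather than raising an error.
            text = raw.decode("utf-8", errors="replace").strip()
            if not text:
                results.append((number, "blank"))
                continue
            try:
                value = json.loads(text)
            except ValueError:  # json's decode error is a ValueError
                results.append((number, "not json"))
            else:
                results.append((number, type(value).__name__))
    return results
```

If every line comes back as "dict", then one record per line is a
reasonable guess.  If they all come back "not json", the file may be a
single document spread over several lines, or not json at all.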

For all we know, the file might be a few terabytes in size.


-- 
DaveA



