| Newsgroups | comp.lang.python |
|---|---|
| Date | 2013-05-27 20:11 -0700 |
| References | <10be5c62-4c58-4b4f-b00a-82d85ee4ef8e@googlegroups.com> <mailman.2268.1369699108.3114.python-list@python.org> |
| Message-ID | <4db5a3be-d9dc-455c-8e3b-5adebad2dcdd@googlegroups.com> |
| Subject | Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() |
| From | Bryan Britten <britten.bryan@gmail.com> |
On Monday, May 27, 2013 7:58:05 PM UTC-4, Dave Angel wrote:
> On 05/27/2013 04:47 PM, Bryan Britten wrote:
> > Hey, everyone!
> >
> > I'm very new to Python and have only been using it for a couple of days, but I have some programming experience (albeit mostly statistical programming in SAS or R), so I'm hoping someone can answer this question in a technical way without using too much jargon.
> >
> > The issue I'm having is that I'm trying to pull information from a website to practice Python with, but I'm having trouble getting the data in a timely fashion. If I use the following code:
> >
> > <code>
> > import json
> > import urllib
> >
> > urlStr = "https://stream.twitter.com/1/statuses/sample.json"
> >
> > twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
> > </code>
> >
> > I run into a memory problem. I'm running 32-bit Python 2.7 on a machine with 4 GB of RAM, if that helps at all.
>
> Which OS?
I'm running Windows 7.
>
> The first question I'd ask is how big this file is. I can't tell, since
> it needs a user name & password to actually get the file.
If you have a Twitter account, you can log in with your own credentials to access the file.
> But it's not unusual to need at least double that space in memory, and a
> 32-bit process on Windows is limited to 2 GB of address space by default,
> regardless of how much RAM the machine has.
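Since Bryan mentioned a 32-bit build of Python 2.7, it may help to confirm which build is actually running. The `platform` module documentation suggests checking `sys.maxsize > 2**32` for this; the sketch below also shows the pointer-size check via `struct.calcsize("P")`. The function names are illustrative only.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits (32 or 64)."""
    # "P" is the struct format character for a C void * pointer.
    return struct.calcsize("P") * 8


def is_64bit():
    """The 64-bit check suggested in the platform module documentation."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    print("%d-bit interpreter" % interpreter_bits())
```

If this reports a 32-bit interpreter, the 2 GB ceiling applies no matter how much RAM is installed.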
>
> If you fetch the file separately, you can experiment with it,
> including cutting it down to a dozen lines, and see whether you can
> handle that much.
>
> How could you fetch it? With wget, with a browser (using Save As), or
> with a simple loop that calls read(4096) repeatedly and writes each
> block to a local file. Don't forget to open that file in 'wb' mode, as
> you don't know yet what line endings the data might use.
I'm not familiar with read(4096); I'll have to look into it. When I tried to simply save the file, my computer sat in limbo for quite a while and never seemed to finish.
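For reference, here is a minimal sketch of the read(4096) loop Dave describes, written for Python 3 (where `urllib.urlopen` became `urllib.request.urlopen`) rather than the 2.7 used in the thread. One likely reason a plain save never finishes is that `statuses/sample.json` is a streaming endpoint: the server keeps the connection open and keeps sending tweets, so there is no end of file to reach. The sketch therefore stops after a byte limit. `copy_limited`, `save_sample`, and the 1 MiB default are hypothetical choices for illustration, and authentication is not handled.

```python
import io
import urllib.request


def copy_limited(src, dst, chunk_size=4096, max_bytes=1024 * 1024):
    """Copy at most max_bytes from file-like src to dst, chunk_size at a time.

    Returns the number of bytes written. Stops early if src runs dry.
    """
    written = 0
    while written < max_bytes:
        # Never ask for more than the remaining byte budget.
        chunk = src.read(min(chunk_size, max_bytes - written))
        if not chunk:  # end of data: connection closed or file exhausted
            break
        dst.write(chunk)
        written += len(chunk)
    return written


def save_sample(url, path, max_bytes=1024 * 1024):
    """Hypothetical helper: save the first max_bytes of url to path.

    The output file is opened with 'wb' so line endings are kept as sent.
    Authentication, if the server requires it, is not handled here.
    """
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        return copy_limited(response, out, max_bytes=max_bytes)


if __name__ == "__main__":
    # Offline demo: an in-memory byte stream stands in for the HTTP response.
    fake_stream = io.BytesIO(b'{"id": 1}\n' * 1000)
    sink = io.BytesIO()
    print(copy_limited(fake_stream, sink, max_bytes=50), "bytes copied")
```

Because the cut happens at a byte count, the last line of the saved file may be a partial record, so it is worth dropping that line before parsing.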
>
> Once you have an idea what the data looks like, you can answer such
> questions as whether it's JSON at all, whether each line contains a
> single JSON record, or something else entirely.
Based on my *extremely* limited knowledge of JSON, that's definitely what this file is. Here is a snippet of what you see when you log in:
{"created_at":"Tue May 28 03:09:23 +0000 2013","id":339216806461972481,"id_str":"339216806461972481","text":"RT @aleon_11: Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"\u003ca href=\"http:\/\/blackberry.com\/twitter\" rel=\"nofollow\"\u003eTwitter for BlackBerry\u00ae\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":310910123,"id_str":"310910123","name":"\u2661","screen_name":"LaMarielita_","location":"","url":null,"description":"MERCADOLOGA & PUBLICISTA EN PROCESO, AMO A MI DIOS & MI FAMILIA\u2665 ME ENCANTA REIRME , MOLESTAR & HABLAR :D BFF, pancho, ale & china :) LY\u2661","protected":false,"followers_count":506,"friends_count":606,"listed_count":1,"created_at":"Sat Jun 04 15:24:19 +0000 2011","favourites_count":207,"utc_offset":-25200,"time_zone":"Mountain Time (US & Canada)","geo_enabled":false,"verified":false,"statuses_count":17241,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"FF6699","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_link_color":"B40B43","profile_sidebar_border_color":"CC3366","profile_sidebar_fill_color":"E5507E","profile_text_color":"362720","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Tue May 28 02:57:40 +0000 2013","id":339213856922537984,"id_str":"339213856922537984","text":"Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"web","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":105252134,"id_str":"105252134","name":"Alejandra Le\u00f3n","screen_name":"aleon_11","location":"Guatemala","url":null,"description":"La vida se disfruta m\u00e1s, cuando no se le pone tanta importancia.","protected":false,"followers_count":143,"friends_count":251,"listed_count":0,"created_at":"Fri Jan 15 20:49:38 +0000 2010","favourites_count":83,"utc_offset":-28800,"time_zone":"Pacific Time (US & Canada)","geo_enabled":false,"verified":false,"statuses_count":1863,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"F8F2FC","profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/811443451\/81abf2f37ee3e37deda396befa7fb557.jpeg","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/811443451\/81abf2f37ee3e37deda396befa7fb557.jpeg","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3578979563\/e973196904e25af5d960f2971616eb61_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3578979563\/e973196904e25af5d960f2971616eb61_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/105252134\/1364957374","profile_link_color":"F01A1A","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"7AC3EE","profile_text_color":"3D1957","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":2,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[]},"favorited":false,"retweeted":false,"lang":"es"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[{"screen_name":"aleon_11","name":"Alejandra Le\u00f3n","id":105252134,"id_str":"105252134","indices":[3,12]}]},"favorited":false,"retweeted":false,"filter_level":"low"}
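The snippet is one complete JSON object (a single tweet), which fits the one-record-per-line reading that the original list comprehension assumes. The memory problem most likely comes from the list comprehension itself: it tries to keep every decoded tweet from a stream that never ends. A sketch of the alternative is to decode lazily with a generator and take only as many records as needed with `itertools.islice`. The helper names, the default of 12 records (Dave's "dozen lines"), and skipping blank lines (which a stream may send as keep-alives) are assumptions for illustration.

```python
import io
import json
from itertools import islice


def iter_records(lines):
    """Lazily decode one JSON object per non-blank line.

    Accepts any iterable of str or bytes lines: an open file,
    an HTTP response, or a plain list.
    """
    for line in lines:
        line = line.strip()
        if not line:  # skip blank (e.g. keep-alive) lines
            continue
        yield json.loads(line)  # json.loads accepts bytes in Python 3.6+


def first_records(lines, n=12):
    """Return only the first n decoded records; nothing else is read or kept."""
    return list(islice(iter_records(lines), n))


if __name__ == "__main__":
    sample = io.StringIO(
        '{"id": 1, "lang": "es"}\n'
        "\n"
        '{"id": 2, "lang": "en"}\n'
        '{"id": 3, "lang": "es"}\n'
    )
    for record in first_records(sample, n=2):
        print(record["id"], record["lang"])
```

Under Python 3, Bryan's list comprehension could become something like `first_records(urllib.request.urlopen(urlStr), n=100)` (authentication aside), which stops reading after 100 tweets instead of trying to hold the entire stream in memory.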
>
> For all we know, the file might be a few terabytes in size.
>
> --
> DaveA
Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 13:47 -0700
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Roy Smith <roy@panix.com> - 2013-05-27 16:56 -0400
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 14:29 -0700
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Denis McMahon <denismfmcmahon@gmail.com> - 2013-05-27 21:35 +0000
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Fábio Santos <fabiosantosart@gmail.com> - 2013-05-28 00:36 +0100
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Dave Angel <davea@davea.name> - 2013-05-27 19:58 -0400
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-27 20:11 -0700
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Fábio Santos <fabiosantosart@gmail.com> - 2013-05-28 08:31 +0100
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Bryan Britten <britten.bryan@gmail.com> - 2013-05-28 07:32 -0700
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Alister <alister.ware@ntlworld.com> - 2013-05-28 17:52 +0000
Re: Reading *.json from URL - json.loads() versus urllib.urlopen.readlines() Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-05-27 21:40 -0400