| Header | Value |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Ajax Request + Write to Json Extremely Slow (Webpage Crawler) |
| From | jonafleuraime@gmail.com |
| Date | Sun, 3 Jan 2016 03:03:30 -0800 (PST) |
| Message-ID | <43ddcfac-c810-4f85-9b6b-806503ea2b3d@googlegroups.com> |
I'm editing a simple scraper that crawls a YouTube video's comment page. The crawler uses Ajax to page through the comments (infinite scroll) and then saves them to a JSON file. Even with a small number of comments (< 5), it takes 3+ minutes for the comments to be added to the JSON file.

I've tried adding requests-cache and using ujson instead of json, but neither made a noticeable difference. You can view the code here: http://stackoverflow.com/questions/34575586/how-to-speed-up-ajax-requests-python-youtube-scraper

I'm new to Python, so I'm not sure where the bottlenecks are. The finished script will be used to parse 100,000+ comments, so performance is a large factor.

- Would using multithreading solve the issue? If so, how would I refactor this to benefit from it?
- Or is this strictly a network issue?

Thanks!
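
One way to tell a network problem from a file-writing problem is to time the two stages separately. The following is only a minimal sketch of that idea, not the code from the linked question: `crawl_comments`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the sketch assumes each Ajax call returns a list of comments plus a token for the next page (or `None` when there are no more pages). It also writes the JSON file once at the end rather than after every comment, which is an assumption about where time might be lost, not a diagnosis.

```python
import json
import time


def crawl_comments(fetch_page, out_path):
    """Page through comments, write them to JSON once, and time each stage.

    fetch_page is a hypothetical callable: it takes a page token (None for
    the first page) and returns (list_of_comments, next_token_or_None).
    """
    comments = []
    fetch_seconds = 0.0
    token = None
    while True:
        start = time.perf_counter()
        page, token = fetch_page(token)  # the (possibly slow) network stage
        fetch_seconds += time.perf_counter() - start
        comments.extend(page)
        if token is None:  # no further pages to request
            break

    # Serialize everything in a single write instead of once per comment.
    start = time.perf_counter()
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(comments, fh)
    write_seconds = time.perf_counter() - start

    return {
        "count": len(comments),
        "fetch_seconds": fetch_seconds,
        "write_seconds": write_seconds,
    }


def fake_fetch_page(token):
    """Stand-in for the real Ajax request: serves two pages from memory."""
    pages = {None: (["first", "second"], "page2"), "page2": (["third"], None)}
    return pages[token]
```

Calling `crawl_comments(fake_fetch_page, "comments.json")` returns the comment count and the seconds spent in each stage. With the real request function swapped in, a large `fetch_seconds` would point at the network (where threads might help), while a large `write_seconds` would point at the JSON writing.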
Ajax Request + Write to Json Extremely Slow (Webpage Crawler) jonafleuraime@gmail.com - 2016-01-03 03:03 -0800
Re: Ajax Request + Write to Json Extremely Slow (Webpage Crawler) Steven D'Aprano <steve@pearwood.info> - 2016-01-04 02:42 +1100