Ajax Request + Write to Json Extremely Slow (Webpage Crawler)

Newsgroups comp.lang.python
Date 2016-01-03 03:03 -0800
Message-ID <43ddcfac-c810-4f85-9b6b-806503ea2b3d@googlegroups.com> (permalink)
Subject Ajax Request + Write to Json Extremely Slow (Webpage Crawler)
From jonafleuraime@gmail.com


I'm editing a simple scraper that crawls a YouTube video's comment page. The crawler uses Ajax requests to page through the comments (the page uses infinite scroll) and then saves them to a JSON file. Even with a small number of comments (< 5), it still takes 3+ minutes for the comments to be written to the JSON file.

I've tried adding requests-cache and using ujson instead of json to see whether either helps, but there's no noticeable difference.

You can view the code here: http://stackoverflow.com/questions/34575586/how-to-speed-up-ajax-requests-python-youtube-scraper

I'm new to Python, so I'm not sure where the bottlenecks are. The finished script will be used to parse 100,000+ comments, so performance is a major factor.
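
To narrow down where the time goes, something like the following minimal sketch could time the network step and the JSON write separately. It is not the code from the link above: `fetch_page`, `timed`, `crawl` and the output path are hypothetical names, and `fetch_page` only simulates an Ajax request with a short sleep.

```python
import json
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per labelled step.
timings = {}


@contextmanager
def timed(label):
    """Add the time spent inside the with-block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


def fetch_page(page_number):
    # Hypothetical stand-in for one Ajax request; the sleep simulates latency.
    time.sleep(0.05)
    return [{"page": page_number, "text": "comment %d" % i} for i in range(3)]


def crawl(pages, out_path):
    comments = []
    for n in range(pages):
        with timed("network"):
            comments.extend(fetch_page(n))
    # Write once at the end rather than once per comment.
    with timed("json_write"):
        with open(out_path, "w") as fh:
            json.dump(comments, fh)
    return comments
```

If `timings["network"]` dominates, swapping json for ujson would not be expected to change much, which would match what I'm seeing.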

- Would using multithreading solve the issue? And if so, how would I refactor this to benefit from it?
- Or is this strictly a network issue?
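
For reference, the kind of multithreaded refactor I have in mind looks roughly like the sketch below. It assumes the units of work are independent of each other (for example, separate videos), which may not hold for paging within a single video if each request needs a token from the previous response. `fetch_comments` and `crawl_many` are hypothetical names, and the sleep only simulates network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_comments(video_id):
    # Hypothetical stand-in for crawling all comments of one video.
    time.sleep(0.1)
    return {"video_id": video_id, "comments": ["first", "second"]}


def crawl_many(video_ids, max_workers=4):
    # Threads overlap the time spent waiting on the network;
    # pool.map() returns results in the same order as video_ids.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_comments, video_ids))
```

Calling `crawl_many(["a", "b", "c", "d"])` would run the four simulated requests concurrently instead of one after another.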

Thanks!


