crawling/parsing a webpage based on dynamic javascript

Started by	bruce <badouglas@gmail.com>
First post	2013-08-18 08:40 -0400
Last post	2013-08-18 08:40 -0400
Articles	1 — 1 participant

#52656 — crawling/parsing a webpage based on dynamic javascript

From	bruce <badouglas@gmail.com>
Date	2013-08-18 08:40 -0400
Subject	crawling/parsing a webpage based on dynamic javascript
Message-ID	<mailman.27.1376829645.23369.python-list@python.org>

Hi.

I'm looking at using Python/Celery/Twisted to test parsing a test site. I'm also
looking at being able to parse a site whose content is generated dynamically by
JavaScript.

I've got test apps to parse a site, but I'm interested in getting a better
understanding of using multi-thread/multi-processing approaches to spin out
as many fetch processes as possible.
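
For illustration, the multi-thread fetch approach mentioned above could be
sketched roughly as follows. This is only a minimal sketch using the standard
library (concurrent.futures and urllib), not Celery or Twisted; the names
fetch, fetch_all, fetcher and max_workers are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch many URLs concurrently.

    Returns a dict mapping each URL to its body, or to the exception
    raised while fetching it. `fetcher` is a hypothetical hook so the
    download function can be swapped out (e.g. for testing).
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every URL up front; the pool limits actual concurrency.
        futures = {pool.submit(fetcher, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception as exc:  # one failed fetch should not stop the rest
                results[url] = exc
    return results
```

Threads tend to suit this kind of work because fetching is I/O-bound; a
process pool or an event-driven framework would be alternatives if parsing
becomes the bottleneck.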

At the same time, I'm interested in understanding a bit better what's used
for parsing the javascript pages in the py world.

Also, rather than just point me to something like "scrapy", I'm actually
interested in finding someone who's done this that I can talk to.

Heck, for the right person, I'll even toss some cash your way!!

Thanks
