Re: download web pages that are updated by ajax

Started by	Chris Rebert <clp2@rebertia.com>
First post	2011-04-12 11:30 -0700
Last post	2011-04-12 11:30 -0700
Articles	1 — 1 participant

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

#3075 — Re: download web pages that are updated by ajax

From	Chris Rebert <clp2@rebertia.com>
Date	2011-04-12 11:30 -0700
Subject	Re: download web pages that are updated by ajax
Message-ID	<mailman.273.1302633041.9059.python-list@python.org>

On Tue, Apr 12, 2011 at 7:47 AM, Jabba Laci <jabba.laci@gmail.com> wrote:
> Hi,
>
> I want to download a web page that is updated by AJAX. The page
> requires no human interaction, it is updated automatically:
> http://www.ncbi.nlm.nih.gov/nuccore/CP002059.1
>
> If I download it with wget, I get a file of size 97 KB. The source is
> full of AJAX calls, i.e. the content of the page is not expanded.
> If I open it in a browser and save it manually, the result is a file
> of almost 5 MB whose content is expanded.
>
> (1) How to download such a page with Python? I need the post-AJAX
> version of the page.

I've heard you can drive a web browser using Selenium
(http://code.google.com/p/selenium/), have it visit the webpage and
run the JavaScript on it, and then grab the final result.
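
As a rough, untested-against-that-site sketch of what this might look like with Selenium's Python bindings: load the page in a real browser, wait until the rendered source stops growing, then return it. The helper names (wait_until_stable, fetch_rendered_html), the choice of Firefox, and the "length stops changing" heuristic are all assumptions for illustration, not something Selenium prescribes.

```python
import time


def wait_until_stable(get_length, interval=1.0, stable_checks=3, timeout=60.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_length() until it returns the same value stable_checks
    times in a row, then return that value.

    Hypothetical helper: "size stopped changing" is only a heuristic for
    "the AJAX calls have finished".
    """
    deadline = clock() + timeout
    last = get_length()
    unchanged = 0
    while unchanged < stable_checks:
        if clock() > deadline:
            raise TimeoutError("page did not settle within %s seconds" % timeout)
        sleep(interval)
        current = get_length()
        if current == last:
            unchanged += 1
        else:
            # Content is still changing; restart the count.
            last, unchanged = current, 0
    return last


def fetch_rendered_html(url):
    """Return the page source after the browser has run the page's JavaScript."""
    # Imported here so wait_until_stable() works without Selenium installed.
    from selenium import webdriver

    # Assumption: Firefox and a matching driver are available on this machine.
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        wait_until_stable(lambda: len(driver.page_source))
        return driver.page_source
    finally:
        # Always close the browser, even if loading or waiting fails.
        driver.quit()
```

Calling fetch_rendered_html("http://www.ncbi.nlm.nih.gov/nuccore/CP002059.1") and writing the returned string to a file should then give something closer to the expanded version you get from the browser's "save" than to the wget output, though for a page this large the interval and timeout would probably need tuning.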

Cheers,
Chris
