Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #76156

Re: Suitable Python code to scrape specific details from web pages.

From Roy Smith <roy@panix.com>
Newsgroups comp.lang.python
Subject Re: Suitable Python code to scrape specific details from web pages.
Date 2014-08-12 20:30 -0400
Organization PANIX Public Access Internet and UNIX, NYC
Message-ID <roy-008918.20303912082014@news.panix.com> (permalink)
References <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com> <e2011de5-10fa-4de1-89fa-4e41882a6646@googlegroups.com> <53eaab7d$0$29979$c3e8da3$5496439d@news.astraweb.com>

Show all headers | View raw


In article <53eaab7d$0$29979$c3e8da3$5496439d@news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> By studying how other scraping programs work, and studying how your racing
> pages store data, you should be able to put the two together and see how to
> get the data you want.

It's also worth mentioning, that some web sites *want* you to have their 
data, and make it easy to do so by exposing it via public APIs or other 
download methods.  Wikipedia.  Many government web sites.  Twitter.  
Facebook.  Reddit.

Whenever you start thinking about web scraping, it's always worth 
spending a little time investigating if such an API exists.  If it does, 
that's where you want to go.  If not, well, there's always Beautiful 
Soup :-)

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 13:00 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Rob Gaddi <rgaddi@technologyhighland.invalid> - 2014-08-12 13:11 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 17:28 -0400
    Re: Suitable Python code to scrape specific details from  web pages. alex23 <wuwei23@gmail.com> - 2014-08-18 15:04 +1000
  Re: Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 15:44 -0700
    Re: Suitable Python code to scrape specific details from  web pages. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-13 10:04 +1000
      Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 20:30 -0400
    Re: Suitable Python code to scrape specific details from  web pages. Peter Pearson <ppearson@nowhere.invalid> - 2014-08-13 00:50 +0000
  Re: Suitable Python code to scrape specific details from  web pages. Denis McMahon <denismfmcmahon@gmail.com> - 2014-08-13 14:53 +0000

csiph-web