


Re: Suitable Python code to scrape specific details from web pages.

From alex23 <wuwei23@gmail.com>
Newsgroups comp.lang.python
Subject Re: Suitable Python code to scrape specific details from web pages.
Date 2014-08-18 15:04 +1000
Organization A noiseless patient Spider
Message-ID <lss1gu$qaf$1@dont-email.me>
References <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com> <roy-AD3509.17281512082014@news.panix.com>



On 13/08/2014 7:28 AM, Roy Smith wrote:
> Second, if you're going to be parsing web pages, trying to use regexes
> is a losing game.  You need something that knows how to parse HTML.  The
> canonical answer is lxml (http://lxml.de/), but Beautiful Soup
> (http://www.crummy.com/software/BeautifulSoup/) is less intimidating to
> use.

lxml also has a BeautifulSoup parser, so you can easily mix and match 
approaches:

http://lxml.de/elementsoup.html
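Here is a minimal sketch of that mix-and-match idea. It assumes a current lxml with the beautifulsoup4 package installed. It parses the same deliberately sloppy markup twice. The first pass uses lxml's own `lxml.html.fromstring`. The second uses `lxml.html.soupparser.fromstring`, which lets BeautifulSoup do the parsing but hands back an ordinary lxml element tree. The sample HTML, the `price` class, and the `price_texts` helper are made up for illustration.

```python
from lxml import html
from lxml.html import soupparser

# Hypothetical sample markup: the unclosed <p> tags are deliberate,
# since coping with sloppy HTML is why you want a real parser, not regexes.
SAMPLE = (
    "<html><body>"
    "<p class='price'>12.50"
    "<p class='price'>7.25"
    "</body></html>"
)


def price_texts(root):
    """Hypothetical helper: text of every <p class="price"> under root."""
    return [str(t) for t in root.xpath("//p[@class='price']/text()")]


# Parse with lxml's own (libxml2-based) HTML parser.
lxml_root = html.fromstring(SAMPLE)

# Let BeautifulSoup parse, but get an lxml element tree back,
# so the same XPath-based helper works unchanged.
soup_root = soupparser.fromstring(SAMPLE)

for name, root in (("lxml", lxml_root), ("soupparser", soup_root)):
    print(name, price_texts(root))
```

Because both calls return lxml elements, the extraction code does not care which parser built the tree. You can switch parsers per page without touching the XPath side.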



Thread

Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 13:00 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Rob Gaddi <rgaddi@technologyhighland.invalid> - 2014-08-12 13:11 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 17:28 -0400
    Re: Suitable Python code to scrape specific details from  web pages. alex23 <wuwei23@gmail.com> - 2014-08-18 15:04 +1000
  Re: Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 15:44 -0700
    Re: Suitable Python code to scrape specific details from  web pages. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-13 10:04 +1000
      Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 20:30 -0400
    Re: Suitable Python code to scrape specific details from  web pages. Peter Pearson <ppearson@nowhere.invalid> - 2014-08-13 00:50 +0000
  Re: Suitable Python code to scrape specific details from  web pages. Denis McMahon <denismfmcmahon@gmail.com> - 2014-08-13 14:53 +0000
