Re: Suitable Python code to scrape specific details from web pages.

From	Denis McMahon <denismfmcmahon@gmail.com>
Newsgroups	comp.lang.python
Subject	Re: Suitable Python code to scrape specific details from web pages.
Date	Wed, 13 Aug 2014 14:53:41 +0000 (UTC)
Organization	A noiseless patient Spider
Lines	36
Message-ID	<lsfu5l$o7d$3@dont-email.me>
References	<a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com>

On Tue, 12 Aug 2014 13:00:30 -0700, Simon Evans wrote:

> in accessing from the 'Racing Post' on a daily basis. Anyhow, the code

Following is some starter code. You will have to look at the output,
compare it to the web page, and work out how you want to process it
further. Note that I use BeautifulSoup and requests. The output is the
HTML for each cell in the table, with a line of "+" characters at each
table row break. I suggest you look at the BeautifulSoup documentation
at http://www.crummy.com/software/BeautifulSoup/bs4/doc/ to work out how
to select the table cells that contain the data you are interested in
and how to extract it.

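#!/usr/bin/env python3
"""
Program to extract data from racingpost.
"""

from bs4 import BeautifulSoup
import requests

# The URL is split across two string literals only to keep lines short.
URL = ("http://www.racingpost.com/horses2/cards/card.sd?"
       "race_id=607466&r_date=2014-08-13#raceTabs=sc_")

r = requests.get(URL)

if r.status_code == 200:
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(r.content, "html.parser")
    table = soup.find("table", id="sc_horseCard")
    if table is None:
        print("No table with id sc_horseCard found")
    else:
        for row in table.find_all("tr"):
            # Print the raw HTML of each cell in this row.
            for cell in row.find_all("td"):
                print(cell)
            # Mark the end of the table row.
            print("+++++++++++++++++++++++++++++++++++++")
else:
    print("HTTP Status", r.status_code)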
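
As a possible next step from the starter code above, here is a minimal
sketch of pulling the text out of each cell instead of printing its raw
HTML. It runs against a small made-up table rather than the live page, so
SAMPLE_HTML, its columns, and the helper name table_rows are assumptions
for illustration only; the real page will need its own selection logic.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the columns on the real page will differ.
SAMPLE_HTML = """
<table id="sc_horseCard">
  <tr><th>No.</th><th>Horse</th></tr>
  <tr><td>1</td><td><a href="/horse/1">Example Runner</a></td></tr>
  <tr><td>2</td><td><a href="/horse/2">Another Runner</a></td></tr>
</table>
"""


def table_rows(html, table_id):
    """Return one list per table row holding the text of its <td> cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # No table with that id in the document.
        return []
    rows = []
    for row in table.find_all("tr"):
        # get_text() drops the markup; strip=True trims surrounding whitespace.
        cells = [cell.get_text(" ", strip=True) for cell in row.find_all("td")]
        if cells:  # rows holding only <th> cells yield an empty list; skip them
            rows.append(cells)
    return rows


for cells in table_rows(SAMPLE_HTML, "sc_horseCard"):
    print(cells)
```

Each row comes back as a plain list of strings, so picking out the cells
of interest becomes ordinary list indexing once you know which column
holds what.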

-- 
Denis McMahon, denismfmcmahon@gmail.com
