Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #76206

Re: Suitable Python code to scrape specific details from web pages.

Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!eu.feeder.erje.net!eternal-september.org!feeder.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From Denis McMahon <denismfmcmahon@gmail.com>
Newsgroups comp.lang.python
Subject Re: Suitable Python code to scrape specific details from web pages.
Date Wed, 13 Aug 2014 14:53:41 +0000 (UTC)
Organization A noiseless patient Spider
Lines 36
Message-ID <lsfu5l$o7d$3@dont-email.me> (permalink)
References <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com>
Mime-Version 1.0
Content-Type text/plain; charset=UTF-8
Content-Transfer-Encoding 8bit
Injection-Date Wed, 13 Aug 2014 14:53:41 +0000 (UTC)
Injection-Info mx05.eternal-september.org; posting-host="66ffcfa4470a58bcddbdcd1913f98ab4"; logging-data="24813"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18qa4bR61MJJXjAPCHedPETSWXhQd8YPTM="
User-Agent Pan/0.136 (I'm far too busy being delicious; GIT 926a150 git://git.gnome.org/pan2)
Cancel-Lock sha1:FhQooKItpCAFjQRGkLfay/byHeY=
Xref csiph.com comp.lang.python:76206

Show key headers only | View raw


On Tue, 12 Aug 2014 13:00:30 -0700, Simon Evans wrote:

> in accessing from the 'Racing Post' on a daily basis. Anyhow, the code

Following is some starter code. You will have to look at the output, 
compare it to the web page, and work out how you want to process it 
further. Note that I use beautifulsoup and requests. The output is the 
html for each cell in the table with a line of "+" characters at the 
table row breaks. I suggest you look at the beautifulsoup documentation 
at http://www.crummy.com/software/BeautifulSoup/bs4/doc/ to work out how 
you may wish to select which table cells contain data you are interested 
in and how to extract it.

#!/usr/bin/python
"""
Program to extract data from racingpost.
"""

from bs4 import BeautifulSoup
import requests

r = requests.get( "http://www.racingpost.com/horses2/cards/card.sd?
race_id=607466&r_date=2014-08-13#raceTabs=sc_" )

if r.status_code == 200:
    soup = BeautifulSoup( r.content )
    table = soup.find( "table", id="sc_horseCard" )
    for row in table.find_all( "tr" ):
        for cell in row.find_all( "td" ):
            print cell
        print "+++++++++++++++++++++++++++++++++++++"
else:
    print "HTTP Status", r.status_code

-- 
Denis McMahon, denismfmcmahon@gmail.com

Back to comp.lang.python | Previous | NextPrevious in thread | Find similar | Unroll thread


Thread

Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 13:00 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Rob Gaddi <rgaddi@technologyhighland.invalid> - 2014-08-12 13:11 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 17:28 -0400
    Re: Suitable Python code to scrape specific details from  web pages. alex23 <wuwei23@gmail.com> - 2014-08-18 15:04 +1000
  Re: Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 15:44 -0700
    Re: Suitable Python code to scrape specific details from  web pages. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-13 10:04 +1000
      Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 20:30 -0400
    Re: Suitable Python code to scrape specific details from  web pages. Peter Pearson <ppearson@nowhere.invalid> - 2014-08-13 00:50 +0000
  Re: Suitable Python code to scrape specific details from  web pages. Denis McMahon <denismfmcmahon@gmail.com> - 2014-08-13 14:53 +0000

csiph-web