Re: Suitable Python code to scrape specific details from web pages.

Newsgroups	comp.lang.python
Date	Tue, 12 Aug 2014 15:44:58 -0700 (PDT)
Message-ID	<e2011de5-10fa-4de1-89fa-4e41882a6646@googlegroups.com>
From	Simon Evans <musicalhacksaw@yahoo.co.uk>

On Tuesday, August 12, 2014 9:00:30 PM UTC+1, Simon Evans wrote:
> Dear Programmers,
> 
> I have been looking at the YouTube 'Web Scraping Tutorials' of Chris Reeves. I have tried a few of his Python programs at the Python27 command prompt, but altered them from accessing data via links to, say, the Dow Jones index, to accessing the details I would be interested in getting from the 'Racing Post' on a daily basis. However, the output in the example I am going to give is not the information I am seeking: instead of returning the given odds on a horse, it only returns a [], which isn't much use.
> 
> I would be glad if you could tell me where I am going wrong. 
> 
> Yours faithfully
> 
> Simon Evans.
> 
> --------------------------------------------------------------------------------
> >>> import urllib
> >>> import re
> >>> htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
> >>> htmltext = htmlfile.read()
> >>> regex = '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd?horse_id=758752"onclick="scorecards.send(&quot;horse_name&quot:):return Html.popup(this,{width:695,height:800})"title="Full details about this HORSE">Lively Baron</a>9/4F</strong><br/>'
> >>> pattern = re.compile(regex)
> >>> odds = re.findall(pattern, htmltext)
> >>> print odds
> []
> >>>
> --------------------------------------------------------------------------------
> >>> import urllib
> >>> import re
> >>> htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
> >>> htmltext = htmlfile.read()
> >>> regex = '<a></a>'
> >>> pattern = re.compile(regex)
> >>> odds = re.findall(pattern, htmltext)
> >>> print odds
> []
> >>>
> -------------------------------------------------------------------------------
Dear Programmers, Thank you for your responses. I have installed 'Beautiful Soup' and I have the 'Getting Started in Beautiful Soup' book, but I can't seem to make any progress with it; I am too thick to make much use of it. I was hoping I could scrape specific items off web pages without using it. I have also installed 'Requests'. Can you suggest any code that can access the sort of web page values I have referred to, such as odds and names of runners, from the 'inspect element' or 'source' HTML pages on www.Racingpost.com?
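
To illustrate the kind of extraction being asked about, here is a minimal sketch that uses only the standard `re` module. It rests on several assumptions: `SAMPLE_HTML` is a hypothetical fragment modelled on the text inside the quoted regex (the real page markup may well differ), and `RUNNER_RE`, `LITERAL` and `extract_runners` are names made up for this example. It also shows one reason a long literal pattern can return []: characters such as `?`, `(` and `)` are regex metacharacters, so they do not match themselves unless the pattern is escaped.

```python
import re

# Hypothetical sample, modelled on the fragment quoted in the regex above.
# The markup actually served by the site may be different.
SAMPLE_HTML = (
    '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd'
    '?horse_id=758752" title="Full details about this HORSE">'
    'Lively Baron</a>9/4F</strong><br/>'
)

# A literal piece of the page used directly as a pattern. The unescaped '?'
# means "the preceding 'd' is optional", not "a question mark".
LITERAL = 'horse_home.sd?horse_id=758752'

# Capture the link text (runner name) and the text right after the link (odds).
# [^>]* skips whatever attributes the <a> tag carries, so the pattern does not
# depend on the exact href/onclick/title contents.
RUNNER_RE = re.compile(
    r'<a\b[^>]*horse_id=\d+[^>]*>\s*([^<]+?)\s*</a>\s*([^<\s]+)'
)


def extract_runners(html):
    """Return a list of (name, odds) tuples found in the given HTML text."""
    return RUNNER_RE.findall(html)


print(re.findall(LITERAL, SAMPLE_HTML))             # []
print(re.findall(re.escape(LITERAL), SAMPLE_HTML))  # the literal text, once
runners = extract_runners(SAMPLE_HTML)
print(runners)                                      # [('Lively Baron', '9/4F')]
```

With 'Requests' installed, the text passed to `extract_runners` could come from `requests.get(url).text` rather than a hard-coded sample. Whether the odds appear in that response at all depends on how the site builds its pages, so it is worth printing part of the downloaded text first and writing the pattern against that.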
