Suitable Python code to scrape specific details from web pages.

Newsgroups	comp.lang.python
Date	2014-08-12 13:00 -0700
Message-ID	<a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com>
Subject	Suitable Python code to scrape specific details from web pages.
From	Simon Evans <musicalhacksaw@yahoo.co.uk>


Dear Programmers,
I have been watching Chris Reeves's 'Web Scraping Tutorials' on YouTube. I have tried a few of his Python programs at the Python 2.7 command prompt, but changed them so that, instead of following links to data such as the Dow Jones index, they fetch the details I would like to collect daily from the 'Racing Post'. However, as the examples below show, the code does not return the information I am seeking: instead of the odds given for a horse, it returns only [], which isn't much use.
I would be glad if you could tell me where I am going wrong.
Yours faithfully
Simon Evans.
--------------------------------------------------------------------------------
>>>import urllib
>>>import re
>>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
>>>htmltext = htmlfile.read()
>>>regex = '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd?horse_id=758752"onclick="scorecards.send(&quot;horse_name&quot:):return Html.popup(this,{width:695,height:800})"title="Full details about this HORSE">Lively Baron</a>9/4F</strong><br/>'
>>>pattern = re.compile(regex)
>>>odds=re.findall(pattern,htmltext)
>>>print odds
[]
>>>
--------------------------------------------------------------------------------
>>>import urllib
>>>import re
>>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?race_id=600048r_date=2014-05-08#raceTabs=sc_")
>>>htmltext = htmlfile.read()
>>>regex = '<a></a>'
>>>pattern = re.compile(regex)
>>>odds=re.findall(pattern,htmltext)
>>>print odds
[]
>>>
-------------------------------------------------------------------------------
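
One possible reason the first pattern returns [] is that a chunk of HTML pasted into a regex is not matched literally: characters such as ?, ( and ) have special meaning, so "sd?" means "s followed by an optional d" and never consumes the actual question mark in the page. The sketch below illustrates this with a made-up fragment. The html string, the example.com URL and the odds pattern are assumptions for illustration only, not the real Racing Post markup, and the real page source may also differ from what a browser displays.

```python
import re

# Hypothetical stand-in for a fragment of the downloaded page.
# This is NOT the real Racing Post markup, just a similar shape.
html = ('<a href="http://example.com/horse_home.sd?horse_id=1" '
        'onclick="send(1)">Lively Baron</a>9/4F')

# Using the fragment itself as the pattern fails, because "?", "(" and ")"
# are regex metacharacters rather than literal characters.
raw_matches = re.findall(html, html)
print(raw_matches)       # []

# re.escape() makes every special character literal, so the text matches itself.
escaped_matches = re.findall(re.escape(html), html)
print(len(escaped_matches))   # 1

# A shorter pattern with a capture group pulls out only the odds.
# The odds format (digits/digits with an optional trailing F) is an assumption.
odds_pattern = re.compile(r'>Lively Baron</a>\s*([0-9]+/[0-9]+F?)')
odds = odds_pattern.findall(html)
print(odds)              # ['9/4F']
```

Matching a short, distinctive piece of text and capturing only the part of interest tends to be less brittle than reproducing a whole tag with all its attributes. An HTML parser is generally more robust still, since attribute order and whitespace in the page can change.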


Thread

Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 13:00 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Rob Gaddi <rgaddi@technologyhighland.invalid> - 2014-08-12 13:11 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 17:28 -0400
    Re: Suitable Python code to scrape specific details from  web pages. alex23 <wuwei23@gmail.com> - 2014-08-18 15:04 +1000
  Re: Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 15:44 -0700
    Re: Suitable Python code to scrape specific details from  web pages. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-13 10:04 +1000
      Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 20:30 -0400
    Re: Suitable Python code to scrape specific details from  web pages. Peter Pearson <ppearson@nowhere.invalid> - 2014-08-13 00:50 +0000
  Re: Suitable Python code to scrape specific details from  web pages. Denis McMahon <denismfmcmahon@gmail.com> - 2014-08-13 14:53 +0000
