Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #76154

Re: Suitable Python code to scrape specific details from web pages.

X-Received by 10.224.45.131 with SMTP id e3mr521429qaf.1.1407883498444; Tue, 12 Aug 2014 15:44:58 -0700 (PDT)
X-Received by 10.140.32.227 with SMTP id h90mr8068qgh.26.1407883498428; Tue, 12 Aug 2014 15:44:58 -0700 (PDT)
Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!usenet.blueworldhosting.com!feeder01.blueworldhosting.com!peer02.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!j15no6667950qaq.0!news-out.google.com!b3ni24360qac.1!nntp.google.com!j15no6667947qaq.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups comp.lang.python
Date Tue, 12 Aug 2014 15:44:58 -0700 (PDT)
In-Reply-To <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com>
Complaints-To groups-abuse@google.com
Injection-Info glegroupsg2000goo.googlegroups.com; posting-host=146.90.214.3; posting-account=59tTfwoAAACIDa2nz1oVlQJc3aCJi_5b
NNTP-Posting-Host 146.90.214.3
References <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com>
User-Agent G2/1.0
MIME-Version 1.0
Message-ID <e2011de5-10fa-4de1-89fa-4e41882a6646@googlegroups.com> (permalink)
Subject Re: Suitable Python code to scrape specific details from web pages.
From Simon Evans <musicalhacksaw@yahoo.co.uk>
Injection-Date Tue, 12 Aug 2014 22:44:58 +0000
Content-Type text/plain; charset=ISO-8859-1
Content-Transfer-Encoding quoted-printable
X-Received-Bytes 4017
X-Received-Body-CRC 1676937530
Xref csiph.com comp.lang.python:76154

Show key headers only | View raw


On Tuesday, August 12, 2014 9:00:30 PM UTC+1, Simon Evans wrote:
> Dear Programmers,
> 
> I have been looking at the You tube 'Web Scraping Tutorials' of Chris Reeves. I have tried a few of his python programs in the Python27 command prompt, but altered them from accessing data using links say from the Dow Jones index, to accessing the details I would be interested in accessing from the 'Racing Post' on a daily basis. Anyhow, the code it returns is not in the example I am going to give, is not the information I am seeking, instead of returning the given odds on a horse, it only returns a [], which isn't much use. 
> 
> I would be glad if you could tell me where I am going wrong. 
> 
> Yours faithfully
> 
> Simon Evans.
> 
> --------------------------------------------------------------------------------
> 
> >>>import urllib
> 
> >>>import re
> 
> >>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?
> 
> 
> 
> race_id=600048r_date=2014-05-08#raceTabs=sc_")
> 
> htmltext = htmlfile.read()
> 
> regex = '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd?
> 
> 
> 
> horse_id=758752"onclick="scorecards.send(&quot;horse_name&quot:):return Html.popup(this,
> 
> 
> 
> {width:695,height:800})"title="Full details about this HORSE">Lively 
> 
> 
> 
> Baron</a>9/4F</strong><br/>'
> 
> >>>pattern = re.compile(regex)
> 
> >>>odds=re.findall(pattern,htmltext)
> 
> >>>print odds
> 
> []
> 
> >>>
> 
> --------------------------------------------------------------------------------
> 
> >>>import urllib
> 
> >>>import re
> 
> >>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?
> 
> 
> 
> >>>race_id=600048r_date=2014-05-08#raceTabs=sc_")
> 
> >>>htmltext = htmlfile.read()
> 
> >>>regex = '<a></a>'
> 
> >>>pattern = re.compile(regex)
> 
> >>>odds=re.findall(pattern,htmltext)
> 
> >>>print odds
> 
> []
> 
> >>>
> 
> -------------------------------------------------------------------------------
Dear Programmers, Thank you for your responses. I have installed 'Beautiful Soup' and I have the 'Getting Started in Beautiful Soup' book, but can't seem to make  any progress with it, I am too thick to make much use of it. I was hoping I could scrape specified stuff off Web pages without using it. I have installed 'Requests' also, is there any code I can use that you can suggest that can access the sort of Web page values that I have referred to ?  such as odds, names of runners, stuff like that off the 'inspect element' or 'source' htaml pages, on www.Racingpost.com. 

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 13:00 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Rob Gaddi <rgaddi@technologyhighland.invalid> - 2014-08-12 13:11 -0700
  Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 17:28 -0400
    Re: Suitable Python code to scrape specific details from  web pages. alex23 <wuwei23@gmail.com> - 2014-08-18 15:04 +1000
  Re: Suitable Python code to scrape specific details from  web pages. Simon Evans <musicalhacksaw@yahoo.co.uk> - 2014-08-12 15:44 -0700
    Re: Suitable Python code to scrape specific details from  web pages. Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-08-13 10:04 +1000
      Re: Suitable Python code to scrape specific details from  web pages. Roy Smith <roy@panix.com> - 2014-08-12 20:30 -0400
    Re: Suitable Python code to scrape specific details from  web pages. Peter Pearson <ppearson@nowhere.invalid> - 2014-08-13 00:50 +0000
  Re: Suitable Python code to scrape specific details from  web pages. Denis McMahon <denismfmcmahon@gmail.com> - 2014-08-13 14:53 +0000

csiph-web