| From | Rob Gaddi <rgaddi@technologyhighland.invalid> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Suitable Python code to scrape specific details from web pages. |
| Date | Tue, 12 Aug 2014 13:11:47 -0700 |
| Organization | Highland Technology, Inc. |
| Message-ID | <20140812131147.5c99507c@rg.highlandtechnology.com> |
| References | <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com> |
On Tue, 12 Aug 2014 13:00:30 -0700 (PDT)
Simon Evans <musicalhacksaw@yahoo.co.uk> wrote:
> Dear Programmers,
> I have been watching Chris Reeves's 'Web Scraping Tutorials' on YouTube. I have tried a few of his Python programs at the Python 2.7 command prompt, altering them so that instead of following links from, say, the Dow Jones index, they fetch the details I want from the 'Racing Post' each day. However, in the examples below the code does not return the information I am seeking: instead of the odds quoted on a horse, it returns only [], which isn't much use.
> I would be glad if you could tell me where I am going wrong.
> Yours faithfully
> Simon Evans.
> --------------------------------------------------------------------------------
> >>>import urllib
> >>>import re
> >>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?
>
> race_id=600048r_date=2014-05-08#raceTabs=sc_")
> htmltext = htmlfile.read()
> regex = '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd?
>
> horse_id=758752"onclick="scorecards.send("horse_name":):return Html.popup(this,
>
> {width:695,height:800})"title="Full details about this HORSE">Lively
>
> Baron</a>9/4F</strong><br/>'
> >>>pattern = re.compile(regex)
> >>>odds=re.findall(pattern,htmltext)
> >>>print odds
> []
> >>>
> --------------------------------------------------------------------------------
> >>>import urllib
> >>>import re
> >>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?
>
> >>>race_id=600048r_date=2014-05-08#raceTabs=sc_")
> >>>htmltext = htmlfile.read()
> >>>regex = '<a></a>'
> >>>pattern = re.compile(regex)
> >>>odds=re.findall(pattern,htmltext)
> >>>print odds
> []
> >>>
> -------------------------------------------------------------------------------
If you want web scraping, you want to use
http://www.crummy.com/software/BeautifulSoup/ . End of story.
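
As a rough illustration of that suggestion, here is a minimal sketch using BeautifulSoup 4 (the third-party `bs4` package). It is not taken from the thread. The sample markup is an assumption loosely modelled on the snippet in the quoted regex, the second runner and its `horse_id` are made up, and the helper name `extract_odds` is hypothetical. The real page may be laid out differently, so the selectors would likely need adjusting.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Hypothetical markup, loosely modelled on the snippet in the quoted regex.
# The real Racing Post page may be structured differently.
SAMPLE_HTML = """
<div class="card">
  <strong>1 <a href="/horses/horse_home.sd?horse_id=758752"
     title="Full details about this HORSE">Lively Baron</a> 9/4F</strong><br/>
  <strong>2 <a href="/horses/horse_home.sd?horse_id=111111"
     title="Full details about this HORSE">Example Runner</a> 5/1</strong><br/>
</div>
"""


def extract_odds(html):
    """Return a list of (horse name, odds) pairs found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser backend
    results = []
    # Locate each horse link by its title attribute instead of matching
    # the whole tag character for character, as the regex tried to do.
    for link in soup.find_all("a", title="Full details about this HORSE"):
        name = link.get_text(strip=True)
        # Assumption: the odds are the text node directly after the link.
        trailing = link.next_sibling
        odds = trailing.strip() if isinstance(trailing, str) else ""
        results.append((name, odds))
    return results


for name, odds in extract_odds(SAMPLE_HTML):
    print("%s %s" % (name, odds))
```

Unlike a regex that spells out the entire tag, this keeps working if attribute order, quoting, or whitespace in the markup changes. To run it against a live page, pass the string returned by `htmlfile.read()` to `extract_odds` instead of `SAMPLE_HTML`.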
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.