| From | Peter Pearson <ppearson@nowhere.invalid> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Suitable Python code to scrape specific details from web pages. |
| Date | 2014-08-13 00:50 +0000 |
| Message-ID | <c4vr3fFd48mU1@mid.individual.net> |
| References | <a8f10c4f-d4a0-48ed-ae92-2a43e9a094c3@googlegroups.com> <e2011de5-10fa-4de1-89fa-4e41882a6646@googlegroups.com> |
On Tue, 12 Aug 2014 15:44:58 -0700 (PDT), Simon Evans wrote:
[snip]
> Dear Programmers, Thank you for your responses. I have installed
> 'Beautiful Soup' and I have the 'Getting Started in Beautiful Soup'
> book, but can't seem to make any progress with it, I am too thick to
> make much use of it. I was hoping I could scrape specified stuff off
> Web pages without using it.
I've only used BeautifulSoup a little bit, and am no expert, but
with it one can do wonderfully complex things with simple code.
Perhaps you can find some examples online; this newsgroup sometimes
has awesome demonstrations of BS prowess.
At the risk of embarrassing myself in public, I'll show you some
code I wrote that scrapes data from a web page containing a
description of a drug. The drug's web page contains the desired
data in tags that look like this:
<input id="form-widgets-minconcentration" name="form.widgets.minconcentration"
class="text-widget float-field" value="1.0" type="text" />
The following code finds all these tags and builds a dict in which you
can look up the "value" for any given "name".
import urllib2  # needed by urlopen() below (Python 2)
from BeautifulSoup import BeautifulSoup as BS
...
def dump_drug_data(url):
    """Fetch data from one drug's URL and print selected fields in columns.
    """
    contents = urllib2.urlopen(url=url).read()
    soup = BS(contents)
    # Collect every <input> tag on the page.
    inputs = soup.findAll("input")
    # Map each tag's "name" attribute to its "value" attribute.
    input_dict = dict((i.get("name"), i.get("value")) for i in inputs)
    # Print the chosen fields, each padded to a fixed column width.
    print(" ".join(f.format(input_dict[n]) for f, n in (
        ("{0:5s}", "form.widgets.absorption_halflife"),
        ("{0:5s}", "form.widgets.elimination_halflife"),
        ("{0:5s}", "form.widgets.minconcentration"),
        ("{0:5s}", "form.widgets.maxconcentration"),
        ("{0:13s}", "form.widgets.title"),
        )))
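For anyone who would rather not install BeautifulSoup at all, here is a
minimal sketch of the same name-to-value lookup using only the standard
library's html.parser module. It is written for Python 3 (the code above
is Python 2), and the names InputCollector, parse_inputs, and sample are
made up for illustration; the sample string just reuses the tag shown
earlier rather than fetching a real page.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Hypothetical helper: record name -> value for every <input> tag."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        # Self-closing tags such as <input ... /> also end up here,
        # because the default handle_startendtag calls this method.
        if tag == "input":
            attr_map = dict(attrs)
            if "name" in attr_map:
                self.values[attr_map["name"]] = attr_map.get("value")


def parse_inputs(html):
    """Return a dict mapping each input's "name" to its "value"."""
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.values


# Sample markup copied from the tag shown earlier (not a real page fetch).
sample = '''<input id="form-widgets-minconcentration" name="form.widgets.minconcentration"
class="text-widget float-field" value="1.0" type="text" />'''

fields = parse_inputs(sample)
# Pad the value to a five-character column, as in the code above.
print("{0:5s}".format(fields["form.widgets.minconcentration"]))
```

To use this on a live page, you would feed the parser the decoded text
returned by urllib.request.urlopen(url).read() instead of the sample
string.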
Try giving a more specific picture of your quest, and it's very
likely that people smarter than me will give you good help.
--
To email me, substitute nowhere->spamcop, invalid->net.