| From | ryguy7272 <ryanshuell@gmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Does Python allow variables to be passed into function for dynamic screen scraping? |
| Date | Sat, 28 Nov 2015 14:03:10 -0800 (PST) |
| Message-ID | <e13afc4b-ac4e-4a75-bca6-1c7be9399cb6@googlegroups.com> |
I'm looking at this URL.
https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names
If I hit F12 I can see tags such as these:
`<a title=`
`<a class=`
And so on and so forth.
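The `title=` and `class=` fragments above are attribute names on `<a>` tags. As a minimal sketch of passing both the tag and the attribute in as variables, the function below uses only the standard-library `html.parser` (swapped in for BeautifulSoup so it runs without extra packages). `collect_attribute` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collect the value of one attribute from every occurrence of one tag."""

    def __init__(self, tag, attr):
        super().__init__()
        self.tag = tag
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names;
        # attrs arrives as a list of (name, value) pairs.
        if tag == self.tag:
            for name, value in attrs:
                if name == self.attr:
                    self.values.append(value)


def collect_attribute(html, tag, attr):
    """Hypothetical helper: return every `attr` value found on `tag` elements."""
    parser = AttributeCollector(tag.lower(), attr.lower())
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical markup, not taken from the Wikipedia page.
sample = '<p><a title="Example Town" href="/wiki/X">X</a> <a class="external" href="#">y</a></p>'
print(collect_attribute(sample, "a", "title"))  # ['Example Town']
print(collect_attribute(sample, "a", "class"))  # ['external']
```

Because `tag` and `attr` are ordinary arguments, the same function can be called in a loop over any list of (tag, attribute) pairs.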
I'm wondering if someone can share a script, or a function, that will let me pass in variables and download (or simply print) the results. I found a sample online that I thought would work and made a few modifications, but now I keep getting this error: `ValueError: All objects passed were None`
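That message comes from `pd.concat`, which raises `ValueError: All objects passed were None` when every object it receives is `None`. One likely way to get `None` here (the archived code lost its indentation, so this can't be confirmed) is a function whose only `return` sits inside a loop: if `find_all` matches nothing, the loop body never runs and Python falls off the end of the function, returning `None`. The stdlib-only sketch below contrasts the two placements; both function names are hypothetical.

```python
def collect_broken(items):
    result = []
    for item in items:
        result.append(item)
        return result  # inside the loop: returns after the first item, or never runs


def collect_fixed(items):
    result = []
    for item in items:
        result.append(item)
    return result  # after the loop: always reached, even for an empty input


print(collect_broken([]))          # None
print(collect_fixed([]))           # []
print(collect_broken(["a", "b"]))  # ['a']
print(collect_fixed(["a", "b"]))   # ['a', 'b']
```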
Here's the script that I'm playing around with.
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Get the relevant webpage and set the data up for parsing
url = "https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names"
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")

# Set up a function to parse the "soup" for each category of information and put it in a DataFrame
def get_match_info(soup, tag, class_name):
    info_array = []
    for info in soup.find_all(tag, attrs={'class': class_name}):
        # The loop body was lost in the archived post; collecting each element's text is assumed here
        info_array.append(info.get_text())
    # Return after the loop: a return inside the loop would make the function return None
    # when nothing matches, and pd.concat raises "All objects passed were None" if every input is None
    return pd.DataFrame(info_array)

# For each category, pass the above function the relevant information, i.e. tag names
tag1 = get_match_info(soup, "td", "title")
tag2 = get_match_info(soup, "td", "class")

# Concatenate the DataFrames to present a final table of all the above info
match_info = pd.concat([tag1, tag2], ignore_index=False, axis=1)
print(match_info)
```
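Python has no trouble passing the tag and class in as variables; the catch is that `get_match_info` filters on a class *value*, so `"title"` and `"class"` (which are attribute names) only match `<td>` elements literally carrying `class="title"` or `class="class"`. Below is a minimal stdlib sketch of the same filter that returns each matching element's text. It assumes well-formed markup with explicit closing tags, treats `class` as a space-separated list (BeautifulSoup likewise matches any one of an element's classes), and uses hypothetical names and sample markup.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Gather the text of every `tag` element whose class list contains `class_value`."""

    def __init__(self, tag, class_value):
        super().__init__()
        self.tag = tag
        self.class_value = class_value
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.texts = []     # finished matches

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Count nested elements of the same tag so the right end tag closes the match.
            if tag == self.tag:
                self.depth += 1
            return
        if tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.class_value in classes:
                self.depth = 1
                self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def texts_by_class(html, tag, class_value):
    """Hypothetical stand-in for get_match_info that returns a plain list."""
    parser = ClassTextCollector(tag.lower(), class_value)
    parser.feed(html)
    parser.close()
    return parser.texts


# Hypothetical markup, not taken from the Wikipedia page.
sample = """
<table>
  <tr><td class="name">Example Town</td><td class="note">Hypothetical row</td></tr>
  <tr><td class="name bold">Sample Village</td><td>no class</td></tr>
</table>
"""
print(texts_by_class(sample, "td", "name"))   # ['Example Town', 'Sample Village']
print(texts_by_class(sample, "td", "title"))  # []
```

Each returned list could be wrapped in `pd.DataFrame(...)` as in the script above; an empty list gives an empty DataFrame rather than `None`, so `pd.concat` does not fail.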
I'd greatly appreciate any help with this.
Does Python allow variables to be passed into function for dynamic screen scraping? ryguy7272 <ryanshuell@gmail.com> - 2015-11-28 14:03 -0800
Re: Does Python allow variables to be passed into function for dynamic screen scraping? Laura Creighton <lac@openend.se> - 2015-11-28 23:28 +0100
Re: Does Python allow variables to be passed into function for dynamic screen scraping? ryguy7272 <ryanshuell@gmail.com> - 2015-11-28 14:37 -0800
Re: Does Python allow variables to be passed into function for dynamic screen scraping? Laura Creighton <lac@openend.se> - 2015-11-28 23:44 +0100
Re: Does Python allow variables to be passed into function for dynamic screen scraping? Steven D'Aprano <steve@pearwood.info> - 2015-11-29 12:58 +1100
Re: Does Python allow variables to be passed into function for dynamic screen scraping? ryguy7272 <ryanshuell@gmail.com> - 2015-11-28 20:52 -0800