Newsgroups: comp.lang.python
Date: Sat, 28 Nov 2015 14:37:26 -0800 (PST)
Message-ID: <48f7bb74-93f0-4bf8-b781-e7f4b2daf032@googlegroups.com>
Subject: Re: Does Python allow variables to be passed into function for dynamic screen scraping?
From: ryguy7272

On Saturday, November 28, 2015 at 5:28:55 PM UTC-5, Laura Creighton wrote:
> In a message of Sat, 28 Nov 2015 14:03:10 -0800, ryguy7272 writes:
> >I'm looking at this URL.
> >https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names
> >
> >If I hit F12 I can see tags such as these:
> >
> >And so on and so forth.
> >
> >I'm wondering if someone can share a script, or a function, that will allow me to pass in variables and download (or simply print) the results. I saw a sample online that I thought would work, and I made a few modifications but now I keep getting a message that says: ValueError: All objects passed were None
> >
> >Here's the script that I'm playing around with.
> >
> >import requests
> >import pandas as pd
> >from bs4 import BeautifulSoup
> >
> >#Get the relevant webpage set the data up for parsing
> >url = "https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names"
> >r = requests.get(url)
> >soup=BeautifulSoup(r.content,"lxml")
> >
> >#set up a function to parse the "soup" for each category of information and put it in a DataFrame
> >def get_match_info(soup,tag,class_name):
> >    info_array=[]
> >    for info in soup.find_all('%s'%tag,attrs={'class':'%s'%class_name}):
> >        return pd.DataFrame(info_array)
> >
> >#for each category pass the above function the relevant information i.e. tag names
> >tag1 = get_match_info(soup,"td","title")
> >tag2 = get_match_info(soup,"td","class")
> >
> >#Concatenate the DataFrames to present a final table of all the above info
> >match_info = pd.concat([tag1,tag2],ignore_index=False,axis=1)
> >
> >print match_info
> >
> >I'd greatly appreciate any help with this.
>
> Post your error traceback. If you are getting Value Errors about None,
> then probably something you expect to return a match, isn't. But without
> the actual error, we cannot help much.
>
> Laura

Ok. How do I post the error traceback? I'm using Spyder Python 2.7.
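
For what it's worth, one way to get the whole traceback as plain text that can be pasted into a reply is the standard library's traceback module. What follows is only a rough sketch and not the script quoted above: first_match and combine are made-up stand-ins (no requests, bs4 or pandas involved), and the error message is copied from the one reported earlier just for illustration. It also shows the situation Laura describes, where a lookup that is expected to return a match returns None instead.

```python
import traceback


def first_match(items, wanted):
    # Hypothetical stand-in for a scraping helper: the only return is
    # inside the loop, so when nothing matches the function falls off
    # the end and implicitly returns None.
    for item in items:
        if item == wanted:
            return item


def combine(parts):
    # Hypothetical stand-in for a later step that rejects all-None input.
    if all(part is None for part in parts):
        raise ValueError("All objects passed were None")
    return [part for part in parts if part is not None]


def run_and_capture():
    # Run the failing step and return the full traceback as a string.
    try:
        combine([first_match([], "td"), first_match([], "th")])
    except ValueError:
        return traceback.format_exc()
    return ""


report = run_and_capture()
print(report)
```

The printed text starts with the "Traceback (most recent call last):" line and ends with the exception type and message. That whole block, copied as-is, is what is usually meant by "post your traceback". Without any extra code, the same text is normally what the interpreter prints in the console when the script fails.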