Re: Does Python allow variables to be passed into function for dynamic screen scraping?

Path csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From Laura Creighton <lac@openend.se>
Newsgroups comp.lang.python
Subject Re: Does Python allow variables to be passed into function for dynamic screen scraping?
Date Sat, 28 Nov 2015 23:44:21 +0100
Lines 71
Message-ID <mailman.2.1448750672.14615.python-list@python.org> (permalink)
References <e13afc4b-ac4e-4a75-bca6-1c7be9399cb6@googlegroups.com> <mailman.1.1448749716.14615.python-list@python.org> <48f7bb74-93f0-4bf8-b781-e7f4b2daf032@googlegroups.com>
In-reply-to <48f7bb74-93f0-4bf8-b781-e7f4b2daf032@googlegroups.com>
Comments In-reply-to ryguy7272 <ryanshuell@gmail.com> message dated "Sat, 28 Nov 2015 14:37:26 -0800."


In a message of Sat, 28 Nov 2015 14:37:26 -0800, ryguy7272 writes:
>On Saturday, November 28, 2015 at 5:28:55 PM UTC-5, Laura Creighton wrote:
>> In a message of Sat, 28 Nov 2015 14:03:10 -0800, ryguy7272 writes:
>> >I'm looking at this URL.
>> >https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names
>> >
>> >If I hit F12 I can see tags such as these:
>> ><a title=
>> ><a class=
>> >And so on and so forth.  
>> >
>> >I'm wondering if someone can share a script, or a function, that will allow me to pass in variables and download (or simply print) the results.  I saw a sample online that I thought would work, and I made a few modifications but now I keep getting a message that says: ValueError: All objects passed were None
>> >
>> >Here's the script that I'm playing around with.
>> >
>> >import requests
>> >import pandas as pd
>> >from bs4 import BeautifulSoup
>> >
>> >#Get the relevant webpage set the data up for parsing
>> >url = "https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names"
>> >r = requests.get(url)
>> >soup=BeautifulSoup(r.content,"lxml")
>> >
>> >#set up a function to parse the "soup" for each category of information and put it in a DataFrame
>> >def get_match_info(soup,tag,class_name):
>> >    info_array=[]
>> >    for info in soup.find_all('%s'%tag,attrs={'class':'%s'%class_name}):
>> >        return pd.DataFrame(info_array)
>> >
>> >#for each category pass the above function the relevant information i.e. tag names
>> >tag1 = get_match_info(soup,"td","title")
>> >tag2 = get_match_info(soup,"td","class")
>> >
>> >#Concatenate the DataFrames to present a final table of all the above info 
>> >match_info = pd.concat([tag1,tag2],ignore_index=False,axis=1)
>> >
>> >print match_info
>> >
>> >I'd greatly appreciate any help with this.
>> 
>> Post your error traceback.  If you are getting Value Errors about None,
>> then probably something you expect to return a match, isn't.  But without
>> the actual error, we cannot help much.
>> 
>> Laura
>
>
>Ok.  How do I post the error traceback?  I'm using Spyder Python 2.7.

Copy it from wherever you are reading it and paste it into the email,
along with your code, also copied and pasted from its source (such as
your editor).  That way we get the exact code that caused the exact
traceback you are getting.
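
If the console makes the traceback awkward to select, one option is to
capture it as text from inside the script and print or save it.  This is
only a minimal sketch: risky() and capture_traceback() are hypothetical
names used for illustration, and risky() merely stands in for whatever
call fails in the real script.

```python
import traceback


def risky():
    # Hypothetical stand-in for the failing call in the real script.
    raise ValueError("All objects passed were None")


def capture_traceback(func):
    """Call func() and return the full traceback as text if it raises."""
    try:
        func()
    except Exception:
        # format_exc() returns the same text the interpreter would print.
        return traceback.format_exc()
    return ""


report = capture_traceback(risky)
print(report)
```

The printed text can then be pasted into the email unchanged, next to the
code that produced it.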

Laura
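
One plausible reading of the quoted script, which only the real traceback
can confirm: get_match_info() has its return statement inside the for
loop.  If find_all() matches nothing, the loop body never runs and the
function returns None.  pandas.concat raises "ValueError: All objects
passed were None" when every item it is given is None.  The sketch below
shows the same control flow in plain Python, with no pandas or bs4; both
function names are hypothetical.

```python
def return_inside_loop(matches):
    # Mirrors the shape of the quoted function: the return sits in the loop.
    rows = []
    for item in matches:
        return rows  # runs only if there is at least one match
    # No match: execution falls off the end, so the result is None.


def return_after_loop(matches):
    # Collect every match, then return once the loop has finished.
    rows = []
    for item in matches:
        rows.append(item)
    return rows


no_matches = []
print(return_inside_loop(no_matches))  # None
print(return_after_loop(no_matches))   # []
```

With the return after the loop, the function always hands back a list,
even an empty one, so the caller never receives None.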


Thread

Does Python allow variables to be passed into function for dynamic screen scraping? ryguy7272 <ryanshuell@gmail.com> - 2015-11-28 14:03 -0800
  Re: Does Python allow variables to be passed into function for dynamic screen scraping? Laura Creighton <lac@openend.se> - 2015-11-28 23:28 +0100
    Re: Does Python allow variables to be passed into function for dynamic screen scraping? ryguy7272 <ryanshuell@gmail.com> - 2015-11-28 14:37 -0800
      Re: Does Python allow variables to be passed into function for dynamic screen scraping? Laura Creighton <lac@openend.se> - 2015-11-28 23:44 +0100
  Re: Does Python allow variables to be passed into function for dynamic screen scraping? Steven D'Aprano <steve@pearwood.info> - 2015-11-29 12:58 +1100
    Re: Does Python allow variables to be passed into function for dynamic screen scraping? ryguy7272 <ryanshuell@gmail.com> - 2015-11-28 20:52 -0800