Re: Screen scraper to get all 'a title' elements

Newsgroups comp.lang.python
Date 2015-11-25 14:04 -0800
References <23ed6f4b-0ef2-4c9e-ade6-e597e7e03ca2@googlegroups.com>
Message-ID <a6f3a0a7-acc3-46db-a36b-c3d774293347@googlegroups.com> (permalink)
Subject Re: Screen scraper to get all 'a title' elements
From ryguy7272 <ryanshuell@gmail.com>



On Wednesday, November 25, 2015 at 3:42:21 PM UTC-5, ryguy7272 wrote:
> Hello experts.  I'm looking at this url:
> https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names
> 
> I'm trying to figure out how to list all 'a title' elements.  For instance, I see the following:
> <a title="Accident, Maryland" href="/wiki/Accident,_Maryland">Accident</a>
> <a class="new" title="Ala-Lemu (page does not exist)" href="/w/index.php?title=Ala-Lemu&action=edit&redlink=1">Ala-Lemu</a>
> <a title="Alert, Nunavut" href="/wiki/Alert,_Nunavut">Alert</a>
> <a title="Apocalypse Peaks" href="/wiki/Apocalypse_Peaks">Apocalypse Peaks</a>
> 
> So, I tried putting a script together to get 'title'.  Here's my attempt.
> 
> import requests
> import sys
> from bs4 import BeautifulSoup
> 
> url = "https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names"     
> source_code = requests.get(url) 
> plain_text = source_code.text
> soup = BeautifulSoup(plain_text)
> for link in soup.findAll('title'):
>     print(link)
> 
> All that does is get the title of the page.  I tried to get the links from that url, with this script.
> 
> import urllib2
> import re
> 
> #connect to a URL
> website = urllib2.urlopen('https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names')
> 
> #read html code
> html = website.read()
> 
> #use re.findall to get all the links
> links = re.findall('"((http|ftp)s?://.*?)"', html)
> 
> print links
> 
> That doesn't work either.  Basically, I'd like to see this.
> 
> Accident
> Ala-Lemu
> Alert
> Apocalypse Peaks
> Athol
> Å
> Barbecue
> Båstad
> Bastardstown
> Batman
> Bathmen (Battem), Netherlands
> ...
> Worms
> Yell
> Zigzag
> Zzyzx
> 
> How can I do that?
> Thanks all!!



Ok, I guess that makes sense.  So, I just tried the script below, and got nothing...

import requests
from bs4 import BeautifulSoup

r = requests.get("https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names")
soup = BeautifulSoup(r.content)
print soup.find_all("a",{"title"})
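
For what it's worth, `{"title"}` in that last line is a Python set, not a dict that maps attribute names to values, so it is probably not filtering on the title attribute at all. With BeautifulSoup, the usual way to ask for "every a tag that has a title attribute" is `soup.find_all("a", title=True)`, and the visible name is then each tag's text. Below is a minimal sketch of the same idea that runs without network access or third-party packages. It swaps in the standard library's `html.parser` for BeautifulSoup, and `SAMPLE_HTML`, `TitledLinkParser`, and `titled_link_texts` are names made up for illustration; the sample markup is the four anchors quoted above plus one anchor without a title.

```python
from html.parser import HTMLParser

# Sample markup: the anchors quoted earlier in this post, plus one anchor
# with no title attribute (added here as an assumption, to show it is skipped).
SAMPLE_HTML = """
<a title="Accident, Maryland" href="/wiki/Accident,_Maryland">Accident</a>
<a class="new" title="Ala-Lemu (page does not exist)" href="/w/index.php?title=Ala-Lemu&action=edit&redlink=1">Ala-Lemu</a>
<a title="Alert, Nunavut" href="/wiki/Alert,_Nunavut">Alert</a>
<a title="Apocalypse Peaks" href="/wiki/Apocalypse_Peaks">Apocalypse Peaks</a>
<a href="/wiki/Main_Page">no title attribute here</a>
"""


class TitledLinkParser(HTMLParser):
    """Collect the visible text of every <a> tag that carries a title attribute."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_titled_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a" and dict(attrs).get("title") is not None:
            self._in_titled_link = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside a titled link.
        if self._in_titled_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_titled_link:
            self.names.append("".join(self._buffer).strip())
            self._in_titled_link = False


def titled_link_texts(html):
    """Return the link text of each <a title=...> element, in document order."""
    parser = TitledLinkParser()
    parser.feed(html)
    parser.close()
    return parser.names


if __name__ == "__main__":
    for name in titled_link_texts(SAMPLE_HTML):
        print(name)
```

On the real page this would also pick up navigation and sidebar links that happen to have a title, so some extra filtering (for example, keeping only links inside the article body) would likely still be needed.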



Thread

Screen scraper to get all 'a title' elements ryguy7272 <ryanshuell@gmail.com> - 2015-11-25 12:42 -0800
  Re: Screen scraper to get all 'a title' elements MRAB <python@mrabarnett.plus.com> - 2015-11-25 20:55 +0000
    Re: Screen scraper to get all 'a title' elements Grobu <snailcoder@retrosite.invalid> - 2015-11-25 23:30 +0100
      Re: Screen scraper to get all 'a title' elements ryguy7272 <ryanshuell@gmail.com> - 2015-11-25 14:48 -0800
        Re: Screen scraper to get all 'a title' elements Chris Angelico <rosuav@gmail.com> - 2015-11-26 10:06 +1100
          Re: Screen scraper to get all 'a title' elements Grobu <snailcoder@retrosite.invalid> - 2015-11-26 00:44 +0100
            Re: Screen scraper to get all 'a title' elements Marko Rauhamaa <marko@pacujo.net> - 2015-11-26 01:53 +0200
              Re: Screen scraper to get all 'a title' elements Chris Angelico <rosuav@gmail.com> - 2015-11-26 10:59 +1100
            Re: Screen scraper to get all 'a title' elements Chris Angelico <rosuav@gmail.com> - 2015-11-26 10:54 +1100
            Re: Screen scraper to get all 'a title' elements Grobu <snailcoder@retrosite.invalid> - 2015-11-26 02:05 +0100
        Re: Screen scraper to get all 'a title' elements Grobu <snailcoder@retrosite.invalid> - 2015-11-26 00:33 +0100
          Re: Screen scraper to get all 'a title' elements ryguy7272 <ryanshuell@gmail.com> - 2015-11-25 15:37 -0800
            Re: Screen scraper to get all 'a title' elements Chris Angelico <rosuav@gmail.com> - 2015-11-26 10:42 +1100
  Re: Screen scraper to get all 'a title' elements ryguy7272 <ryanshuell@gmail.com> - 2015-11-25 14:04 -0800
    Re: Screen scraper to get all 'a title' elements Chris Angelico <rosuav@gmail.com> - 2015-11-26 09:10 +1100
  Re: Screen scraper to get all 'a title' elements TP <wingusr@gmail.com> - 2015-11-25 17:15 -0800
  Re: Screen scraper to get all 'a title' elements Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-26 14:49 +0000
