Newsgroups: comp.lang.python
Date: Wed, 25 Nov 2015 12:42:00 -0800 (PST)
From: ryguy7272
Subject: Screen scraper to get all 'a title' elements

Hello experts.

I'm looking at this URL:
https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names

I'm trying to figure out how to list the title attribute of every <a> element (the 'a title' elements). For instance, I see links such as these:

Accident
Ala-Lemu
Alert
Apocalypse Peaks

So, I tried putting a script together to get 'title'. Here's my attempt:

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names"
source_code = requests.get(url)
plain_text = source_code.text
# name an explicit parser so BeautifulSoup does not have to guess one
soup = BeautifulSoup(plain_text, "html.parser")
# 'title' here matches <title> elements (the page title), not title attributes
for link in soup.findAll('title'):
    print(link)

All that does is get the title of the page. I also tried to get the links from that URL with this script:
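The `findAll('title')` call in that script matches `<title>` elements, which is why only the page title comes back; what the question describes are `title` attributes on `<a>` tags. Below is a minimal sketch of collecting those attributes. It uses the standard library's `html.parser` instead of BeautifulSoup, and it runs on a small hypothetical HTML snippet rather than the live page. `AnchorTitleParser` and `anchor_titles` are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class AnchorTitleParser(HTMLParser):
    """Collect the value of the title attribute on every <a> tag."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            title = dict(attrs).get("title")
            if title:  # skip anchors with no (or an empty) title attribute
                self.titles.append(title)


def anchor_titles(html):
    parser = AnchorTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical sample markup, not copied from the Wikipedia page
sample = (
    "<html><head><title>Page title</title></head><body>"
    '<a href="/wiki/Accident,_Maryland" title="Accident, Maryland">Accident</a>'
    '<a href="/wiki/Alert,_Nunavut" title="Alert, Nunavut">Alert</a>'
    '<a href="#top">no title here</a>'
    "</body></html>"
)
print(anchor_titles(sample))
```

With BeautifulSoup, the same filter is `soup.find_all('a', title=True)`, and you read each value with `link['title']`.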
# Python 2 (urllib2 and the print statement)
import urllib2
import re

# connect to a URL
website = urllib2.urlopen('https://en.wikipedia.org/wiki/Wikipedia:Unusual_place_names')
# read HTML code
html = website.read()
# use re.findall to get all the links (matches only absolute http(s)/ftp(s) URLs in double quotes)
links = re.findall('"((http|ftp)s?://.*?)"', html)
print links

That doesn't work either.

Basically, I'd like to see this:

Accident
Ala-Lemu
Alert
Apocalypse Peaks
Athol
Å
Barbecue
Båstad
Bastardstown
Batman
Bathmen (Battem), Netherlands
...
Worms
Yell
Zigzag
Zzyzx

How can I do that?

Thanks all!!
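The regex attempt has three problems. It is Python 2 code. It only matches absolute `http(s)://` or `ftp(s)://` URLs in double quotes, while Wikipedia's internal links are typically relative (`/wiki/...`), so the place links are never matched. And `re.findall` with two groups returns tuples rather than strings. Below is a minimal Python 3 sketch that uses only the standard library. It assumes the wanted list is the visible text of each `<a>` carrying a `title` attribute. `LinkTextParser`, `extract_links`, `fetch_html`, and the placeholder User-Agent string are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkTextParser(HTMLParser):
    """Collect (link text, absolute URL) for every <a> with a title and href."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # resolved href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            if attrs.get("title") and attrs.get("href"):
                # resolve relative links such as "/wiki/..." against the page URL
                self._href = urljoin(self.base_url, attrs["href"])
                self._text = []

    def handle_data(self, data):
        # keep text only while inside a matching <a>, including nested tags like <b>
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text:
                self.links.append((text, self._href))
            self._href = None


def extract_links(html, base_url):
    parser = LinkTextParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_html(url, user_agent="example-scraper/0.1 (placeholder contact)"):
    # placeholder User-Agent; some sites may refuse requests without a descriptive one
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Against the live page, usage would be `extract_links(fetch_html(url), url)` with `url` set to the page address, printing the first element of each pair. Expect navigation and sidebar links to appear alongside the place names, so further filtering (for example, restricting to the article body) would still be needed. That filtering is not shown here.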