| Path | csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <python@mrabarnett.plus.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-AUTH | mrabarnett:2500 |
| Date | Thu, 31 Oct 2013 17:36:25 +0000 |
| From | MRAB <python@mrabarnett.plus.com> |
| User-Agent | Mozilla/5.0 (Windows NT 5.1; rv:24.0) Gecko/20100101 Thunderbird/24.1.0 |
| MIME-Version | 1.0 |
| To | python-list@python.org |
| Subject | Re: how to extract page-URL using BeautifulSoup |
| References | <7fb9c035-c663-4874-9597-ac47d1c30da7@googlegroups.com> |
| In-Reply-To | <7fb9c035-c663-4874-9597-ac47d1c30da7@googlegroups.com> |
| Content-Type | text/plain; charset=ISO-8859-1; format=flowed |
| Content-Transfer-Encoding | 7bit |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| Reply-To | python-list@python.org |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1886.1383240984.18130.python-list@python.org> |
| Lines | 21 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1383240984 news.xs4all.nl 15956 [2001:888:2000:d::a6]:41326 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:58195 |
On 31/10/2013 15:59, bhaktanishant@gmail.com wrote:
> I want to extract the page-url. for example:
> if i have this code
>
> import urllib2
> from bs4 import BeautifulSoup
> link = "http://www.google.com"
> page = urllib2.urlopen(link).read()
> soup = BeautifulSoup(page)
>
> then i can extract title of page by:
>
> title = soup.title
>
> but i want to know that how to extract page-URL from "soup" that will be "http://www.google.com"
>

Have a look at what you're passing to BeautifulSoup (save it to a file and look at it in an editor). It's HTML. Does it contain anything that says where it came from? No. So BeautifulSoup can't know either. All BeautifulSoup does is parse the HTML that it's given.
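
Since the URL is known only to the code that fetched the page, one option is to keep it next to the HTML rather than look for it in the parse tree. The following is a minimal sketch of that idea, not something from the original post: it assumes Python 3 (`urllib.request` in place of `urllib2`), and `fetch_page` is a hypothetical helper name chosen for illustration.

```python
from urllib.request import urlopen


def fetch_page(link):
    """Return (url, html) for link; fetch_page is a hypothetical helper.

    The URL is taken from the response object, because the HTML body
    itself carries no record of where it was downloaded from.
    """
    with urlopen(link) as response:
        # geturl() reports the URL that was actually retrieved, which can
        # differ from `link` if the server redirected the request.
        return response.geturl(), response.read()
```

A call such as `url, page = fetch_page("http://www.google.com")` then yields both values. `page` can be passed to `BeautifulSoup` exactly as in the quoted code, and `url` stays available alongside the resulting soup.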
how to extract page-URL using BeautifulSoup bhaktanishant@gmail.com - 2013-10-31 08:59 -0700
Re: how to extract page-URL using BeautifulSoup MRAB <python@mrabarnett.plus.com> - 2013-10-31 17:36 +0000
Re: how to extract page-URL using BeautifulSoup Alister <alister.ware@ntlworld.com> - 2013-11-01 11:33 +0000
Re: how to extract page-URL using BeautifulSoup Joel Goldstick <joel.goldstick@gmail.com> - 2013-11-01 10:14 -0400