


Re: how to extract page-URL using BeautifulSoup

Date Thu, 31 Oct 2013 17:36:25 +0000
From MRAB <python@mrabarnett.plus.com>
Subject Re: how to extract page-URL using BeautifulSoup
Newsgroups comp.lang.python
Message-ID <mailman.1886.1383240984.18130.python-list@python.org>



On 31/10/2013 15:59, bhaktanishant@gmail.com wrote:
> I want to extract the page URL. For example,
> if I have this code:
>
> import urllib2
> from bs4 import BeautifulSoup
> link = "http://www.google.com"
> page = urllib2.urlopen(link).read()
> soup = BeautifulSoup(page)
>
> then I can extract the title of the page with:
>
> title = soup.title
>
> but I want to know how to extract the page URL from "soup", which would be "http://www.google.com"
>
Have a look at what you're passing to BeautifulSoup (save it to a file
and look at it in an editor). It's HTML. Does it contain anything that
says where it came from? No. So BeautifulSoup can't know either.

All BeautifulSoup does is parse the HTML that it's given.
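
Since the parsed HTML does not carry its own address, one option is to keep
the URL next to the HTML yourself. The following is only a rough sketch, not
something from the original question: it assumes Python 3 (urllib.request
rather than the urllib2 used above), and the names FetchedPage, fetch and
opener are made up for illustration.

```python
from dataclasses import dataclass
from urllib.request import urlopen


@dataclass
class FetchedPage:
    """Hypothetical container pairing the HTML with the URL it came from."""
    url: str     # URL reported by the response (after any redirects)
    html: bytes  # raw page body, exactly what gets passed to BeautifulSoup

    def soup(self):
        # Imported lazily so the rest of the sketch needs only the stdlib.
        from bs4 import BeautifulSoup
        return BeautifulSoup(self.html, "html.parser")


def fetch(link, opener=urlopen):
    """Fetch `link` and remember where the HTML came from.

    `opener` is an assumption added for illustration so the function can be
    exercised without network access; by default it is urllib's urlopen.
    """
    with opener(link) as response:
        # geturl() gives the URL that was actually retrieved, which may
        # differ from `link` if the server redirected the request.
        return FetchedPage(url=response.geturl(), html=response.read())
```

With that in place, page = fetch("http://www.google.com") gives you page.url
for the address and page.soup().title for the title. The URL comes from the
response object, not from BeautifulSoup.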



Thread

how to extract page-URL using BeautifulSoup bhaktanishant@gmail.com - 2013-10-31 08:59 -0700
  Re: how to extract page-URL using BeautifulSoup MRAB <python@mrabarnett.plus.com> - 2013-10-31 17:36 +0000
  Re: how to extract page-URL using BeautifulSoup Alister <alister.ware@ntlworld.com> - 2013-11-01 11:33 +0000
    Re: how to extract page-URL using BeautifulSoup Joel Goldstick <joel.goldstick@gmail.com> - 2013-11-01 10:14 -0400
