Date: Thu, 31 Oct 2013 17:36:25 +0000
From: MRAB <python@mrabarnett.plus.com>
To: python-list@python.org
Subject: Re: how to extract page-URL using BeautifulSoup
References: <7fb9c035-c663-4874-9597-ac47d1c30da7@googlegroups.com>
In-Reply-To: <7fb9c035-c663-4874-9597-ac47d1c30da7@googlegroups.com>
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.1886.1383240984.18130.python-list@python.org>

On 31/10/2013 15:59, bhaktanishant@gmail.com wrote:
> I want to extract the page URL. For example,
> if I have this code:
>
> import urllib2
> from bs4 import BeautifulSoup
> link = "http://www.google.com"
> page = urllib2.urlopen(link).read()
> soup = BeautifulSoup(page)
>
> then I can extract the title of the page with:
>
> title = soup.title
>
> but I want to know how to extract the page URL from "soup", which would be "http://www.google.com"
>
Have a look at what you're passing to BeautifulSoup (save it to a file
and look at it in an editor). It's HTML. Does it contain anything that
says where it came from? No. So BeautifulSoup can't know either.

All BeautifulSoup does is parse the HTML that it's given. You already
have the URL in your own variable, link, so use that.
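
Since the soup can't tell you where the HTML came from, one option is to
carry the URL around next to the HTML yourself. Below is a minimal
sketch of that idea, not a definitive implementation. It assumes Python 3,
where urllib.request.urlopen takes the place of urllib2.urlopen; the
names Page and fetch are made up for illustration and are not part of
BeautifulSoup or the standard library.

```python
from collections import namedtuple
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen

# Hypothetical helper type: pairs the raw HTML with the URL it came from.
Page = namedtuple("Page", ["url", "html"])


def fetch(link):
    """Fetch `link` and return the URL together with the response body."""
    response = urlopen(link)
    try:
        # geturl() reports the URL that was actually retrieved, which can
        # differ from `link` if the server redirected the request.
        return Page(url=response.geturl(), html=response.read())
    finally:
        response.close()


# Usage (needs network access and bs4, so it is left commented out):
# from bs4 import BeautifulSoup
# page = fetch("http://www.google.com")
# soup = BeautifulSoup(page.html, "html.parser")
# print(page.url, soup.title)
```

If you only ever fetch one page, the plain link variable is enough; a
pairing like this is mainly useful once you have many pages and want
each soup to stay associated with its source URL.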