how to extract page-URL using BeautifulSoup

Started by	bhaktanishant@gmail.com
First post	2013-10-31 08:59 -0700
Last post	2013-11-01 10:14 -0400
Articles	4 — 4 participants

  how to extract page-URL using BeautifulSoup bhaktanishant@gmail.com - 2013-10-31 08:59 -0700
    Re: how to extract page-URL using BeautifulSoup MRAB <python@mrabarnett.plus.com> - 2013-10-31 17:36 +0000
    Re: how to extract page-URL using BeautifulSoup Alister <alister.ware@ntlworld.com> - 2013-11-01 11:33 +0000
      Re: how to extract page-URL using BeautifulSoup Joel Goldstick <joel.goldstick@gmail.com> - 2013-11-01 10:14 -0400

#58188 — how to extract page-URL using BeautifulSoup

From	bhaktanishant@gmail.com
Date	2013-10-31 08:59 -0700
Subject	how to extract page-URL using BeautifulSoup
Message-ID	<7fb9c035-c663-4874-9597-ac47d1c30da7@googlegroups.com>

I want to extract the page URL. For example,
if I have this code:

import urllib2
from bs4 import BeautifulSoup
link = "http://www.google.com"
page = urllib2.urlopen(link).read()
soup = BeautifulSoup(page)

then I can extract the title of the page with:

title = soup.title

but I want to know how to extract the page URL from "soup", which would be "http://www.google.com"

#58195

From	MRAB <python@mrabarnett.plus.com>
Date	2013-10-31 17:36 +0000
Message-ID	<mailman.1886.1383240984.18130.python-list@python.org>
In reply to	#58188

On 31/10/2013 15:59, bhaktanishant@gmail.com wrote:
> I want to extract the page-url. for example:
> if i have this code
>
> import urllib2
> from bs4 import BeautifulSoup
> link = "http://www.google.com"
> page = urllib2.urlopen(link).read()
> soup = BeautifulSoup(page)
>
> then i can extract title of page by:
>
> title = soup.title
>
> but i want to know that how to extract page-URL from "soup" that will be "http://www.google.com"
>
Have a look at what you're passing to BeautifulSoup (save it to a file
and look at it in an editor). It's HTML. Does it contain anything that
says where it came from? No. So BeautifulSoup can't know either.

All BeautifulSoup does is parse the HTML that it's given.
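
To illustrate the point above: the URL lives on the response object that
urlopen returns, not in the HTML, so it has to be captured before or
alongside parsing. A minimal sketch, assuming Python 3 (where urllib2's
urlopen moved to urllib.request). fetch is a hypothetical helper name, not
part of urllib or BeautifulSoup. In Python 2's urllib2 the same value should
be available from the response's geturl() method.

```python
# Sketch for Python 3, where urllib2's urlopen lives in urllib.request.
from urllib.request import urlopen


def fetch(link):
    """Return (final_url, html_bytes) for link.

    fetch is a hypothetical helper for illustration only.
    """
    with urlopen(link) as response:
        # The response object, not the HTML, records where the page came
        # from. After redirects this can differ from the link requested.
        final_url = response.url
        html = response.read()
    return final_url, html
```

The returned html can then be passed to BeautifulSoup(html) as in the
original post, with final_url kept in its own variable next to the soup.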

#58257

From	Alister <alister.ware@ntlworld.com>
Date	2013-11-01 11:33 +0000
Message-ID	<VjMcu.4720$J14.740@fx31.am4>
In reply to	#58188

On Thu, 31 Oct 2013 08:59:00 -0700, bhaktanishant wrote:

> I want to extract the page-url. for example:
> if i have this code
> 
> import urllib2
> from bs4 import BeautifulSoup
> link = "http://www.google.com"
> page = urllib2.urlopen(link).read()
> soup = BeautifulSoup(page)
> 
> then i can extract title of page by:
> 
> title = soup.title
> 
> but i want to know that how to extract page-URL from "soup" that will be
> "http://www.google.com"

I must be missing something here: the page URL is what you use to open
the page in the first place, in your case link.




-- 
May a Misguided Platypus lay its Eggs in your Jockey Shorts.

#58268

From	Joel Goldstick <joel.goldstick@gmail.com>
Date	2013-11-01 10:14 -0400
Message-ID	<mailman.1929.1383315259.18130.python-list@python.org>
In reply to	#58257

This is nearly the same question you asked under another name
yesterday.  It's not clear what you really want to do.  You are asking
for the URL of a page that you retrieved by providing that same URL.

On Fri, Nov 1, 2013 at 7:33 AM, Alister <alister.ware@ntlworld.com> wrote:
> On Thu, 31 Oct 2013 08:59:00 -0700, bhaktanishant wrote:
>
>> I want to extract the page-url. for example:
>> if i have this code
>>
>> import urllib2
>> from bs4 import BeautifulSoup
>> link = "http://www.google.com"
>> page = urllib2.urlopen(link).read()
>> soup = BeautifulSoup(page)
>>
>> then i can extract title of page by:
>>
>> title = soup.title
>>
>> but i want to know that how to extract page-URL from "soup" that will be
>> "http://www.google.com"
>
> I must be missing something here, the page url is what you use to open
> the page in the first place in your case link.
>
>
>
>
> --
> May a Misguided Platypus lay its Eggs in your Jockey Shorts.
> --
> https://mail.python.org/mailman/listinfo/python-list



-- 
Joel Goldstick
http://joelgoldstick.com
