Path: csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
MIME-Version: 1.0
In-Reply-To: <34998ea2-6b19-4a98-8ea0-389aca0192ca@googlegroups.com>
References: <34998ea2-6b19-4a98-8ea0-389aca0192ca@googlegroups.com>
From: Michael Herman <hermanmu@gmail.com>
Date: Thu, 21 Feb 2013 04:59:26 -0800
Subject: Re: Urllib's urlopen and urlretrieve
To: qoresucks@gmail.com
Content-Type: multipart/alternative; boundary=f46d043894633a0e5e04d63ba6e7
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.2162.1361451589.2939.python-list@python.org>
Lines: 123
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:39421

--f46d043894633a0e5e04d63ba6e7
Content-Type: text/plain; charset=ISO-8859-1

Are you just trying to get the HTML? If so, you can use this code:

import urllib

# fetch a webpage and save it as test.html
# (Python 2; in Python 3 the function is urllib.request.urlretrieve)
urllib.urlretrieve("http://www.web2py.com/", filename="test.html")
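
Since you also asked about urlopen followed by a read: that route gives you the same bytes, you just write the file yourself. Here is a rough sketch, assuming Python 3 (where urlopen lives in urllib.request). The name save_page is only something I made up for illustration, not part of urllib.

```python
import urllib.request


def save_page(url, filename):
    """Fetch url with urlopen, read the body, and write it to filename."""
    # urlopen returns a file-like response; read() returns the raw bytes
    with urllib.request.urlopen(url) as response:
        body = response.read()
    # open in binary mode so the bytes are written unchanged
    with open(filename, "wb") as f:
        f.write(body)
    # return the number of bytes saved
    return len(body)


# example call (needs network access):
# save_page("http://www.web2py.com/", "test_urlopen.html")
```

Either way you end up with one HTML file and nothing else.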


I recommend using the requests library, as it's easier to use and more
powerful:

import requests

# retrieve the webpage
r = requests.get("http://www.web2py.com/")

# write the content to test_requests.html
with open("test_requests.html", "wb") as code:
    code.write(r.content)
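
Note that both snippets above save only the HTML document itself. Images, stylesheets, and scripts are separate files that the page merely references by URL, so each one has to be requested on its own, which is most likely why the formatting and images go missing. As a rough sketch (assuming Python 3 and only the standard library; ResourceFinder and find_resource_urls are names I made up for illustration), you could list the files a page refers to like this:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect URLs of images, scripts, and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # images and scripts point to their files with "src"
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        # stylesheets are referenced by <link rel="stylesheet" href="...">
        elif tag == "link" and attrs.get("href"):
            if "stylesheet" in (attrs.get("rel") or "").lower():
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def find_resource_urls(html, base_url):
    """Return absolute URLs of the resources referenced in html."""
    finder = ResourceFinder(base_url)
    finder.feed(html)
    return finder.urls
```

Each URL returned could then be downloaded the same way as the page. A real archiver would also have to rewrite the links inside the saved HTML to point at the local copies, which this sketch does not attempt.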


If you want to get up to speed quickly on internet programming, I am
developing a course, which is on Kickstarter: http://kck.st/VQj8hq. The
first section of the book dives into web fundamentals and internet
programming.


On Thu, Feb 21, 2013 at 4:12 AM, <qoresucks@gmail.com> wrote:

> I only just started Python and, given that I know nothing about network
> programming or internet programming of any kind, I thought it would be
> interesting to try to write something that could create an archive of a
> website for myself. I started with the urllib library, but I am having
> trouble understanding why certain things won't work with
> urllib.urlretrieve or with urllib.urlopen followed by a read.
>
> Why is it that urllib.urlopen followed by a read, or urllib.urlretrieve,
> only gives me parts of a site, losing the formatting, images, etc.? How
> can I get around this?
>
> Lastly, while it's a bit off topic, I lack a good understanding of network
> programming as a whole. From making programs communicate to simply
> extracting data from URLs, I don't know where to even begin, which has led
> me to learn Python in the hope of understanding it better and then carrying
> that over to other languages I know. Can anyone give me some advice on where
> to begin learning this? Even if it's in another language.
> --
> http://mail.python.org/mailman/listinfo/python-list
>

--f46d043894633a0e5e04d63ba6e7--