| Started by | Ulli Horlacher <framstag@rus.uni-stuttgart.de> |
|---|---|
| First post | 2015-11-10 13:08 +0000 |
| Last post | 2015-11-10 17:21 +0000 |
| Articles | 8 — 2 participants |
corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:08 +0000
Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 14:20 +0100
Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:43 +0000
Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 14:40 +0100
Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:59 +0000
Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 15:51 +0000
Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 17:48 +0100
Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 17:21 +0000
| From | Ulli Horlacher <framstag@rus.uni-stuttgart.de> |
|---|---|
| Date | 2015-11-10 13:08 +0000 |
| Subject | corrupt download with urllib2 |
| Message-ID | <n1sq82$k3g$1@news2.informatik.uni-stuttgart.de> |
I am currently developing a program which should run on Linux and Windows.
Later it shall be compiled with PyInstaller. Therefore I am using Python 2.7.
My program must download http://fex.belwue.de/download/7za.exe
I am using this code:
sz = path.join(fexhome,'7za.exe')
szurl = "http://fex.belwue.de/download/7za.exe"
try:
    szo = open(sz,'w')
except (IOError,OSError) as e:
    die('cannot write %s - %s' % (sz,e.strerror))
import urllib2
printf("\ndownloading %s\n",szurl)
try:
    req = urllib2.Request(szurl)
    req.add_header('User-Agent',useragent)
    u = urllib2.urlopen(req)
except urllib2.URLError as e:
    die('cannot get %s - %s' % (szurl,e.reason))
except urllib2.HTTPError as e:
    die('cannot get %s - server reply: %d %s' % (szurl,e.code,e.reason))
if u.getcode() == 200:
    print(u.read(),file=szo,end='')
    szo.close()
else:
    die('cannot get %s - server reply: %d' % (szurl,u.getcode()))
It works with Linux, but not with Windows 7, where the downloaded 7za.exe is
corrupt: it has the wrong size, 589044 instead of 587776 Bytes.
Where is my error?
--
Ullrich Horlacher Server und Virtualisierung
Rechenzentrum IZUS/TIK E-Mail: horlacher@tik.uni-stuttgart.de
Universitaet Stuttgart Tel: ++49-711-68565868
Allmandring 30a Fax: ++49-711-682357
70550 Stuttgart (Germany) WWW: http://www.tik.uni-stuttgart.de/
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-11-10 14:20 +0100 |
| Message-ID | <mailman.209.1447161643.16136.python-list@python.org> |
| In reply to | #98595 |
Ulli Horlacher wrote:
> I am currently developing a program which should run on Linux and Windows.
> Later it shall be compiled with PyInstaller. Therefore I am using Python
> 2.7
>
> My program must download http://fex.belwue.de/download/7za.exe
>
> I am using this code:
> It works with Linux, but not with Windows 7, where the downloaded 7za.exe
> is corrupt: it has the wrong size, 589044 instead of 587776 Bytes.
>
> Where is my error?
> sz = path.join(fexhome,'7za.exe')
> szurl = "http://fex.belwue.de/download/7za.exe"
>
> try:
>     szo = open(sz,'w')
Open the file in binary mode to avoid the translation of "\n" into "\r\n":
    szo = open(sz, 'wb')
> except (IOError,OSError) as e:
>     die('cannot write %s - %s' % (sz,e.strerror))
Unrelated, but I recommend that you let the exceptions bubble up for easier
debugging.
Python is not Perl ;)
> import urllib2
> printf("\ndownloading %s\n",szurl)
> try:
>     req = urllib2.Request(szurl)
>     req.add_header('User-Agent',useragent)
>     u = urllib2.urlopen(req)
> except urllib2.URLError as e:
>     die('cannot get %s - %s' % (szurl,e.reason))
> except urllib2.HTTPError as e:
>     die('cannot get %s - server reply: %d %s' % (szurl,e.code,e.reason))
> if u.getcode() == 200:
>     print(u.read(),file=szo,end='')
>     szo.close()
> else:
>     die('cannot get %s - server reply: %d' % (szurl,u.getcode()))
>
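A minimal sketch of the effect, assuming Python 3 rather than the Python 2.7 used in the thread. Passing newline="\r\n" is an assumption made here to force the Windows-style translation on any platform; the helper name write_both_ways and the sample data are hypothetical. The text-mode copy grows by one byte per "\n" in the data, which would be consistent with the reported difference of 589044 - 587776 = 1268 bytes if the original file contains 1268 such bytes.

```python
import os
import tempfile


def write_both_ways(data, directory):
    """Write data in binary mode and in a simulated Windows text mode.

    Returns the two resulting file sizes as (binary_size, text_size).
    """
    binary_path = os.path.join(directory, "binary.bin")
    text_path = os.path.join(directory, "text.bin")

    # Binary mode: the bytes reach the disk unchanged.
    with open(binary_path, "wb") as f:
        f.write(data)

    # Assumption: newline="\r\n" imitates what mode 'w' does on Windows,
    # turning every "\n" into "\r\n" on the way out.
    with open(text_path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))

    return os.path.getsize(binary_path), os.path.getsize(text_path)


# Made-up sample data containing every byte value four times.
data = bytes(range(256)) * 4
with tempfile.TemporaryDirectory() as tmp:
    binary_size, text_size = write_both_ways(data, tmp)

print(binary_size, text_size, data.count(b"\n"))  # 1024 1028 4
```

The extra bytes are exactly the inserted "\r" characters, so a binary file written this way is both larger and corrupt.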
| From | Ulli Horlacher <framstag@rus.uni-stuttgart.de> |
|---|---|
| Date | 2015-11-10 13:43 +0000 |
| Message-ID | <n1ss9f$kvp$1@news2.informatik.uni-stuttgart.de> |
| In reply to | #98596 |
Peter Otten <__peter__@web.de> wrote:
> > It works with Linux, but not with Windows 7, where the downloaded 7za.exe
> > is corrupt: it has the wrong size, 589044 instead of 587776 Bytes.
> >
> > Where is my error?
>
> > sz = path.join(fexhome,'7za.exe')
> > szurl = "http://fex.belwue.de/download/7za.exe"
> >
> > try:
> >     szo = open(sz,'w')
>
> Open the file in binary mode to avoid the translation of "\n" into "\r\n":
>
> szo = open(sz, 'wb')
Damn.. I should have known this!
Ok, now it works like on Linux. Windows is such a *BEEEP* *CENSORED*
> > except (IOError,OSError) as e:
> >     die('cannot write %s - %s' % (sz,e.strerror))
>
> Unrelated, but I recommend that you let the exceptions bubble up for easier
> debugging.
die() is my debugging function :-)
> Python is not Perl ;)
*sigh* This is the problem ;-)
I have been a Perl programmer for more than 25 years...
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-11-10 14:40 +0100 |
| Message-ID | <mailman.210.1447162874.16136.python-list@python.org> |
| In reply to | #98595 |
Ulli Horlacher wrote:
> if u.getcode() == 200:
>     print(u.read(),file=szo,end='')
>     szo.close()
> else:
>     die('cannot get %s - server reply: %d' % (szurl,u.getcode()))
More random remarks:
- print() gives the impression that you are dealing with text, and using it
with binary strings will produce surprising results when you migrate to
Python 3:
Python 2:
>>> from __future__ import print_function
>>> print(b"foo")
foo
Python 3:
>>> print(b"foo")
b'foo'
- with open(...) ensures that the file is closed when an exception occurs.
It doesn't matter here as your script is going to die() anyway, but using
with is a good habit to get into.
- consider shutil.copyfileobj to limit memory usage when dealing with data
of arbitrary size.
Putting it together:
with open(sz, "wb") as szo:
    shutil.copyfileobj(u, szo)
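The remarks above can be sketched as follows, assuming Python 3. The io.BytesIO objects are stand-ins (assumptions) for the urlopen() response u and for the file opened with open(sz, "wb"); payload is made-up sample data, not the real 7za.exe.

```python
import io
import shutil

# Made-up binary sample that includes "\r\n" and "\n" bytes.
payload = b"MZ\x90\x00\r\n\n\x00binary"

# print() formats its argument as text; in Python 3 a bytes object is
# written as its repr (b'...'), not as the raw bytes.
text_out = io.StringIO()
print(payload, file=text_out, end="")

# copyfileobj() reads chunks from the source and writes them unchanged.
source = io.BytesIO(payload)  # stands in for the urlopen() response `u`
dest = io.BytesIO()           # stands in for open(sz, "wb")
shutil.copyfileobj(source, dest)

print(text_out.getvalue() == repr(payload))  # True
print(dest.getvalue() == payload)            # True
```

A plain dest.write(data) call would also write the bytes unchanged; copyfileobj only adds the chunked loop.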
| From | Ulli Horlacher <framstag@rus.uni-stuttgart.de> |
|---|---|
| Date | 2015-11-10 13:59 +0000 |
| Message-ID | <n1st8h$l75$1@news2.informatik.uni-stuttgart.de> |
| In reply to | #98598 |
Peter Otten <__peter__@web.de> wrote:
> Ulli Horlacher wrote:
>
> > if u.getcode() == 200:
> >     print(u.read(),file=szo,end='')
> >     szo.close()
> > else:
> >     die('cannot get %s - server reply: %d' % (szurl,u.getcode()))
>
> More random remarks:
Always welcome - I am here to learn :-)
> - print() gives the impression that you are dealing with text, and using it
> with binary strings will produce surprising results when you migrate to
> Python 3:
>
> Python 2:
>
> >>> from __future__ import print_function
I already have this in my code, to make a later transition to Python 3
easier.
> >>> print(b"foo")
> foo
>
> Python 3:
>
> >>> print(b"foo")
> b'foo'
Bad.
Is there a better alternative to write arbitrary binary data?
> - with open(...) ensures that the file is closed when an exception occurs.
> It doesn't matter here as your script is going to die() anyway, but using
> with is a good habit to get into.
When an error occurs I do not want to write more data, anyway.
> - consider shutil.copyfileobj to limit memory usage when dealing with data
> of arbitrary size.
>
> Putting it together:
>
> with open(sz, "wb") as szo:
>     shutil.copyfileobj(u, szo)
This writes the HTTP stream to the file in binary, without my having to
handle it manually chunk by chunk?
Great. This would be my next task! You are answering my questions, before
I ask them! :-)
Background: I am rewriting my Perl program fexsend in Python.
fexsend transfers files up to TB range, see:
http://fex.rus.uni-stuttgart.de/
| From | Ulli Horlacher <framstag@rus.uni-stuttgart.de> |
|---|---|
| Date | 2015-11-10 15:51 +0000 |
| Message-ID | <n1t3p9$nq3$1@news2.informatik.uni-stuttgart.de> |
| In reply to | #98600 |
Ulli Horlacher <framstag@rus.uni-stuttgart.de> wrote:
> Peter Otten <__peter__@web.de> wrote:
> > - consider shutil.copyfileobj to limit memory usage when dealing with data
> >   of arbitrary size.
> >
> > Putting it together:
> >
> > with open(sz, "wb") as szo:
> >     shutil.copyfileobj(u, szo)
>
> This writes the http stream binary to the file. without handling it
> manually chunk by chunk?
I have a problem with it: There is no feedback for the user about the
progress of the transfer, which can last several hours.
For small files shutil.copyfileobj() is a good idea, but not for huge ones.
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-11-10 17:48 +0100 |
| Message-ID | <mailman.212.1447174160.16136.python-list@python.org> |
| In reply to | #98602 |
Ulli Horlacher wrote:
> Ulli Horlacher <framstag@rus.uni-stuttgart.de> wrote:
>> Peter Otten <__peter__@web.de> wrote:
>
>> > - consider shutil.copyfileobj to limit memory usage when dealing with
>> > data
>> > of arbitrary size.
>> >
>> > Putting it together:
>> >
>> > with open(sz, "wb") as szo:
>> >     shutil.copyfileobj(u, szo)
>>
>> This writes the http stream binary to the file. without handling it
>> manually chunk by chunk?
>
> I have a problem with it: There is no feedback for the user about the
> progress of the transfer, which can last several hours.
>
> For small files shutil.copyfileobj() is a good idea, but not for huge
> ones.
Indeed. Have a look at the source code:
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
As simple as can be. I suggested the function as an alternative to writing
the loop yourself when your example code basically showed
dest.write(source.read())
For the huge downloads that you intend to cater to, you probably want your
script to do more than print a dot on every iteration: you need expected
remaining time, checksums, the ability to stop and resume a download, and
whatnot.
Does the Perl code offer that? Then why rewrite?
Or are there Python libraries that do that out of the box? Can you reuse
them?
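A minimal sketch of such a loop with a progress hook, modeled on the copyfileobj source shown above and assuming Python 3. The function name copy_with_progress and the report callback are hypothetical; a real transfer tool would still need the remaining-time estimate, checksums and resume support mentioned here.

```python
import io


def copy_with_progress(fsrc, fdst, report, length=16 * 1024):
    """Copy fsrc to fdst in chunks, calling report(bytes_so_far) per chunk.

    Returns the total number of bytes copied.
    """
    copied = 0
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
        copied += len(buf)
        report(copied)
    return copied


# In-memory stand-ins (assumptions) for the HTTP response and the output file.
source = io.BytesIO(b"x" * 40000)
dest = io.BytesIO()

seen = []  # collects the running totals instead of printing them
total = copy_with_progress(source, dest, seen.append)
print(total, seen)  # 40000 [16384, 32768, 40000]
```

In a command-line tool the callback could print a percentage, given a total size taken from the Content-Length header when the server sends one.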
| From | Ulli Horlacher <framstag@rus.uni-stuttgart.de> |
|---|---|
| Date | 2015-11-10 17:21 +0000 |
| Message-ID | <n1t92k$q1o$1@news2.informatik.uni-stuttgart.de> |
| In reply to | #98603 |
Peter Otten <__peter__@web.de> wrote:
> > I have a problem with it: There is no feedback for the user about the
> > progress of the transfer, which can last several hours.
> >
> > For small files shutil.copyfileobj() is a good idea, but not for huge
> > ones.
>
> Indeed. Have a look at the source code:
>
> def copyfileobj(fsrc, fdst, length=16*1024):
>     """copy data from file-like object fsrc to file-like object fdst"""
>     while 1:
>         buf = fsrc.read(length)
>         if not buf:
>             break
>         fdst.write(buf)
>
> As simple as can be
Oooops - that's all?!
> I suggested the function as an alternative to writing
> the loop yourself when your example code basically showed
Good idea :-)
> For the huge downloads that you intend to cater to you probably want your
> script not just to print a dot on every iteration, you need expected
> remaining time, checksums, ability to stop and resume a download and
> whatnot.
>
> Does the Perl code offer that?
Of course, yes. For download AND upload.
> Then why rewrite?
There is no longer a Perl compiler for Windows which supports https.
> Or are there Python libraries that do that out of the box?
No.
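As one illustration of the features listed in the quoted text, a checksum can be computed while the data is being copied, so the file does not have to be read a second time. This is a sketch assuming Python 3 and SHA-256; copy_with_checksum is a hypothetical helper, not part of fexsend or of any library.

```python
import hashlib
import io


def copy_with_checksum(fsrc, fdst, length=16 * 1024):
    """Copy fsrc to fdst in chunks and return the SHA-256 hex digest."""
    digest = hashlib.sha256()
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        digest.update(buf)  # hash the chunk before writing it out
        fdst.write(buf)
    return digest.hexdigest()


# Made-up sample data; the BytesIO objects stand in for the HTTP
# response and for a file opened in "wb" mode.
data = b"example payload\n" * 5000
source = io.BytesIO(data)
dest = io.BytesIO()

checksum = copy_with_checksum(source, dest)
print(checksum == hashlib.sha256(data).hexdigest())  # True
```

Comparing the result with a digest published by the server would also have caught the text-mode corruption that started this thread.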