corrupt download with urllib2

Started by	Ulli Horlacher <framstag@rus.uni-stuttgart.de>
First post	2015-11-10 13:08 +0000
Last post	2015-11-10 17:21 +0000
Articles	8 — 2 participants

  corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:08 +0000
    Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 14:20 +0100
      Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:43 +0000
    Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 14:40 +0100
      Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:59 +0000
        Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 15:51 +0000
          Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 17:48 +0100
            Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 17:21 +0000

#98595 — corrupt download with urllib2

From	Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date	2015-11-10 13:08 +0000
Subject	corrupt download with urllib2
Message-ID	<n1sq82$k3g$1@news2.informatik.uni-stuttgart.de>

I am currently developing a program which should run on Linux and Windows.
Later it shall be compiled with PyInstaller. Therefore I am using Python 2.7.

My program must download http://fex.belwue.de/download/7za.exe

I am using this code:

    sz = path.join(fexhome,'7za.exe')
    szurl = "http://fex.belwue.de/download/7za.exe"

    try:
      szo = open(sz,'w')
    except (IOError,OSError) as e:
      die('cannot write %s - %s' % (sz,e.strerror))
    import urllib2
    printf("\ndownloading %s\n",szurl)
    try:
      req = urllib2.Request(szurl)
      req.add_header('User-Agent',useragent)
      u = urllib2.urlopen(req)
    except urllib2.URLError as e:
      die('cannot get %s - %s' % (szurl,e.reason))
    except urllib2.HTTPError as e:
      die('cannot get %s - server reply: %d %s' % (szurl,e.code,e.reason))
    if u.getcode() == 200:
      print(u.read(),file=szo,end='')
      szo.close()
    else:
      die('cannot get %s - server reply: %d' % (szurl,u.getcode()))

It works with Linux, but not with Windows 7, where the downloaded 7za.exe is
corrupt: it has the wrong size, 589044 instead of 587776 Bytes.

Where is my error?


-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum IZUS/TIK         E-Mail: horlacher@tik.uni-stuttgart.de
Universitaet Stuttgart         Tel:    ++49-711-68565868
Allmandring 30a                Fax:    ++49-711-682357
70550 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/

#98596

From	Peter Otten <__peter__@web.de>
Date	2015-11-10 14:20 +0100
Message-ID	<mailman.209.1447161643.16136.python-list@python.org>
In reply to	#98595

Ulli Horlacher wrote:

> I am currently developing a program which should run on Linux and Windows.
> Later it shall be compiled with PyInstaller. Therefore I am using Python
> 2.7
> 
> My program must download http://fex.belwue.de/download/7za.exe
> 
> I am using this code:

> It works with Linux, but not with Windows 7, where the downloaded 7za.exe
> is corrupt: it has the wrong size, 589044 instead of 587776 Bytes.
> 
> Where is my error?

>     sz = path.join(fexhome,'7za.exe')
>     szurl = "http://fex.belwue.de/download/7za.exe"
> 
>     try:
>       szo = open(sz,'w')

Open the file in binary mode to avoid the translation of "\n" into "\r\n":

        szo = open(sz, 'wb')
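
Why the sizes differ: in text mode on Windows every "\n" byte (0x0A) is
written as "\r\n", so the file grows by one byte per 0x0A in the payload.
The reported sizes differ by 589044 - 587776 = 1268 bytes, which is
consistent with 1268 such bytes in 7za.exe. The sketch below reproduces the
effect on any platform by forcing the translation with
io.open(..., newline="\r\n"). The payload and the file names are made up for
illustration.

```python
import io
import os
import shutil
import tempfile

# Made-up "binary" payload that contains two 0x0A bytes.
payload = b"MZ\x90\n\x00\x03\n\xff"


def write_text_mode(path, data):
    # newline="\r\n" forces the "\n" -> "\r\n" translation that text mode
    # applies by default on Windows. latin-1 maps every byte to one character.
    with io.open(path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))


def write_binary_mode(path, data):
    # Binary mode stores the bytes exactly as given.
    with io.open(path, "wb") as f:
        f.write(data)


tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, "text_mode.bin")
bin_path = os.path.join(tmpdir, "binary_mode.bin")

write_text_mode(text_path, payload)
write_binary_mode(bin_path, payload)

text_size = os.path.getsize(text_path)  # payload plus one byte per "\n"
bin_size = os.path.getsize(bin_path)    # same size as the payload

shutil.rmtree(tmpdir)

print("binary mode: %d bytes, text mode: %d bytes" % (bin_size, text_size))
```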

>     except (IOError,OSError) as e:
>       die('cannot write %s - %s' % (sz,e.strerror))

Unrelated, but I recommend that you let the exceptions bubble up for easier 
debugging.

Python is not Perl ;)

>     import urllib2
>     printf("\ndownloading %s\n",szurl)
>     try:
>       req = urllib2.Request(szurl)
>       req.add_header('User-Agent',useragent)
>       u = urllib2.urlopen(req)
>     except urllib2.URLError as e:
>       die('cannot get %s - %s' % (szurl,e.reason))
>     except urllib2.HTTPError as e:
>       die('cannot get %s - server reply: %d %s' % (szurl,e.code,e.reason))
>     if u.getcode() == 200:
>       print(u.read(),file=szo,end='')
>       szo.close()
>     else:
>       die('cannot get %s - server reply: %d' % (szurl,u.getcode()))
>

#98599

From	Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date	2015-11-10 13:43 +0000
Message-ID	<n1ss9f$kvp$1@news2.informatik.uni-stuttgart.de>
In reply to	#98596

Peter Otten <__peter__@web.de> wrote:


> > It works with Linux, but not with Windows 7, where the downloaded 7za.exe
> > is corrupt: it has the wrong size, 589044 instead of 587776 Bytes.
> > 
> > Where is my error?
> 
> >     sz = path.join(fexhome,'7za.exe')
> >     szurl = "http://fex.belwue.de/download/7za.exe"
> > 
> >     try:
> >       szo = open(sz,'w')
> 
> Open the file in binary mode to avoid the translation of "\n" into "\r\n":
> 
>         szo = open(sz, 'wb')

Damn.. I should have known this!

OK, now it works as it does on Linux. Windows is such a *BEEEP* *CENSORED*


> >     except (IOError,OSError) as e:
> >       die('cannot write %s - %s' % (sz,e.strerror))
> 
> Unrelated, but I recommend that you let the exceptions bubble up for easier 
> debugging.

die() is my debugging function :-)


> Python is not Perl ;)

*sigh* This is the problem ;-)
I have been a Perl programmer for more than 25 years...


#98598

From	Peter Otten <__peter__@web.de>
Date	2015-11-10 14:40 +0100
Message-ID	<mailman.210.1447162874.16136.python-list@python.org>
In reply to	#98595

Ulli Horlacher wrote:

>     if u.getcode() == 200:
>       print(u.read(),file=szo,end='')
>       szo.close()
>     else:
>       die('cannot get %s - server reply: %d' % (szurl,u.getcode()))

More random remarks:

- print() gives the impression that you are dealing with text, and using it
  with binary strings will produce surprising results when you migrate to
  Python 3:

Python 2:

>>> from __future__ import print_function
>>> print(b"foo")
foo

Python 3:

>>> print(b"foo")
b'foo'

- with open(...) ensures that the file is closed when an exception occurs.
  It doesn't matter here as your script is going to die() anyway, but using
  with is a good habit to get into.
- consider shutil.copyfileobj to limit memory usage when dealing with data
  of arbitrary size.

Putting it together:

    with open(sz, "wb") as szo:
        shutil.copyfileobj(u, szo)
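
A fuller sketch of the same idea, under some assumptions: fetch_to_file,
DownloadError and the default useragent string are hypothetical names, and
the try/except import is only there so the block also runs on Python 3 (the
thread itself uses urllib2 on Python 2.7). One side note that the thread
does not raise: HTTPError is a subclass of URLError, so in the original code
the "except urllib2.HTTPError" branch can never be reached; it has to come
first. As far as I know, urlopen() already raises HTTPError for non-2xx
replies, which would make the explicit getcode() check largely redundant.

```python
import shutil
from contextlib import closing

try:  # Python 3
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError
except ImportError:  # Python 2.7, as used in the thread
    from urllib2 import Request, urlopen, HTTPError, URLError


class DownloadError(Exception):
    """Hypothetical error type for this sketch; the thread uses die()."""


def fetch_to_file(url, dest_path, useragent="example-agent/0.1"):
    """Stream the resource at url into dest_path without loading it whole."""
    req = Request(url)
    req.add_header("User-Agent", useragent)
    try:
        # closing() works with both Python 2 and Python 3 response objects.
        # The destination is opened in binary mode, and only after the
        # request has succeeded.
        with closing(urlopen(req)) as u, open(dest_path, "wb") as szo:
            shutil.copyfileobj(u, szo)
    except HTTPError as e:
        # Must precede URLError, because HTTPError is a subclass of it.
        raise DownloadError("cannot get %s - server reply: %d %s"
                            % (url, e.code, e.msg))
    except URLError as e:
        raise DownloadError("cannot get %s - %s" % (url, e.reason))
```

A call would look like fetch_to_file(szurl, sz), with szurl and sz as in the
original post.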

#98600

From	Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date	2015-11-10 13:59 +0000
Message-ID	<n1st8h$l75$1@news2.informatik.uni-stuttgart.de>
In reply to	#98598

Peter Otten <__peter__@web.de> wrote:
> Ulli Horlacher wrote:
> 
> >     if u.getcode() == 200:
> >       print(u.read(),file=szo,end='')
> >       szo.close()
> >     else:
> >       die('cannot get %s - server reply: %d' % (szurl,u.getcode()))
> 
> More random remarks:

Always welcome - I am here to learn :-)


> - print() gives the impression that you are dealing with text, and using it
>   with binary strings will produce surprising results when you migrate to
>   Python 3:
> 
> Python 2:
> 
> >>> from __future__ import print_function

I already have this in my code, to make a later transition to Python 3
easier.


> >>> print(b"foo")
> foo
> 
> Python 3:
> 
> >>> print(b"foo")
> b'foo'

Bad.
Is there a better alternative to write arbitrary binary data?
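
One common answer, not given explicitly in the thread, is the file object's
own write() method on a file opened in binary mode. It stores the bytes
unchanged and behaves the same on Python 2 and Python 3. A minimal sketch
with a made-up payload and a temporary file:

```python
import os
import tempfile

# Every possible byte value once, including 0x0A and 0x0D.
data = bytes(bytearray(range(256)))

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb") as szo:
    # write() stores the bytes as they are: no repr(), no added newline,
    # no line-ending translation.
    szo.write(data)

with open(path, "rb") as f:
    round_trip = f.read()

os.remove(path)

print(round_trip == data)
```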


> - with open(...) ensures that the file is closed when an exception occurs.
>   It doesn't matter here as your script is going to die() anyway, but using
>   with is a good habit to get into.

When an error occurs I do not want to write more data, anyway.


> - consider shutil.copyfileobj to limit memory usage when dealing with data
>   of arbitrary size.
> 
> Putting it together:
> 
>     with open(sz, "wb") as szo:
>         shutil.copyfileobj(u, szo)

So this writes the HTTP stream to the file in binary form, without my
having to handle it manually chunk by chunk?

Great. This would be my next task! You are answering my questions, before
I ask them! :-)

Background: I am rewriting my Perl program fexsend in Python.
fexsend transfers files up to TB range, see:
http://fex.rus.uni-stuttgart.de/


#98602

From	Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date	2015-11-10 15:51 +0000
Message-ID	<n1t3p9$nq3$1@news2.informatik.uni-stuttgart.de>
In reply to	#98600

Ulli Horlacher <framstag@rus.uni-stuttgart.de> wrote:
> Peter Otten <__peter__@web.de> wrote:

> > - consider shutil.copyfileobj to limit memory usage when dealing with data
> >   of arbitrary size.
> > 
> > Putting it together:
> > 
> >     with open(sz, "wb") as szo:
> >         shutil.copyfileobj(u, szo)
> 
> So this writes the HTTP stream to the file in binary form, without my
> having to handle it manually chunk by chunk?

I have a problem with it: There is no feedback for the user about the
progress of the transfer, which can last several hours.

For small files shutil.copyfileobj() is a good idea, but not for huge ones.


#98603

From	Peter Otten <__peter__@web.de>
Date	2015-11-10 17:48 +0100
Message-ID	<mailman.212.1447174160.16136.python-list@python.org>
In reply to	#98602

Ulli Horlacher wrote:

> Ulli Horlacher <framstag@rus.uni-stuttgart.de> wrote:
>> Peter Otten <__peter__@web.de> wrote:
> 
>> > - consider shutil.copyfileobj to limit memory usage when dealing with
>> > data
>> >   of arbitrary size.
>> > 
>> > Putting it together:
>> > 
>> >     with open(sz, "wb") as szo:
>> >         shutil.copyfileobj(u, szo)
>> 
>> So this writes the HTTP stream to the file in binary form, without my
>> having to handle it manually chunk by chunk?
> 
> I have a problem with it: There is no feedback for the user about the
> progress of the transfer, which can last several hours.
> 
> For small files shutil.copyfileobj() is a good idea, but not for huge
> ones.

Indeed. Have a look at the source code:

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)

As simple as can be. I suggested the function as an alternative to writing 
the loop yourself when your example code basically showed

dest.write(source.read())

For the huge downloads that you intend to cater to, you probably want your 
script to do more than print a dot on every iteration: you need the expected 
remaining time, checksums, the ability to stop and resume a download, and 
whatnot.

Does the Perl code offer that? Then why rewrite?

Or are there Python libraries that do that out of the box? Can you reuse 
them?
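
The copyfileobj() loop above is short enough to extend by hand. A minimal
sketch, assuming a caller-supplied callback for progress reporting:
copy_with_progress, print_progress, report and total are hypothetical names,
and SHA-256 is an arbitrary choice of checksum. Remaining-time estimates and
resuming are left out.

```python
import hashlib


def copy_with_progress(fsrc, fdst, total=None, report=None, length=16 * 1024):
    """Copy fsrc to fdst in chunks; return (bytes_copied, sha256_hexdigest).

    report, if given, is called as report(bytes_copied_so_far, total)
    after every chunk. total may be None when the size is unknown.
    """
    copied = 0
    digest = hashlib.sha256()
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
        digest.update(buf)
        copied += len(buf)
        if report is not None:
            report(copied, total)
    return copied, digest.hexdigest()


def print_progress(copied, total):
    # Example callback: print a percentage when the total size is known.
    if total:
        print("%d of %d bytes (%.0f%%)" % (copied, total, 100.0 * copied / total))
    else:
        print("%d bytes" % copied)
```

It would be called as copy_with_progress(u, szo, total=size,
report=print_progress), where size could come from the Content-Length
response header if the server sends one.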

#98604

From	Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date	2015-11-10 17:21 +0000
Message-ID	<n1t92k$q1o$1@news2.informatik.uni-stuttgart.de>
In reply to	#98603

Peter Otten <__peter__@web.de> wrote:

> > I have a problem with it: There is no feedback for the user about the
> > progress of the transfer, which can last several hours.
> > 
> > For small files shutil.copyfileobj() is a good idea, but not for huge
> > ones.
> 
> Indeed. Have a look at the source code:
> 
> def copyfileobj(fsrc, fdst, length=16*1024):
>     """copy data from file-like object fsrc to file-like object fdst"""
>     while 1:
>         buf = fsrc.read(length)
>         if not buf:
>             break
>         fdst.write(buf)
> 
> As simple as can be

Oooops - that's all?!


> I suggested the function as an alternative to writing 
> the loop yourself when your example code basically showed

Good idea :-)


> For the huge downloads that you intend to cater to, you probably want your 
> script to do more than print a dot on every iteration: you need the expected 
> remaining time, checksums, the ability to stop and resume a download, and 
> whatnot.
> 
> Does the Perl code offer that?

Of course, yes. For download AND upload.



> Then why rewrite?

There is no longer a Perl compiler for Windows that supports HTTPS.



> Or are there Python libraries that do that out of the box?

No.

