


Re: corrupt download with urllib2

From Peter Otten <__peter__@web.de>
Newsgroups comp.lang.python
Subject Re: corrupt download with urllib2
Date Tue, 10 Nov 2015 17:48:48 +0100
Message-ID <mailman.212.1447174160.16136.python-list@python.org>
References <n1sq82$k3g$1@news2.informatik.uni-stuttgart.de> <mailman.210.1447162874.16136.python-list@python.org> <n1st8h$l75$1@news2.informatik.uni-stuttgart.de> <n1t3p9$nq3$1@news2.informatik.uni-stuttgart.de>
Xref csiph.com comp.lang.python:98603



Ulli Horlacher wrote:

> Ulli Horlacher <framstag@rus.uni-stuttgart.de> wrote:
>> Peter Otten <__peter__@web.de> wrote:
> 
>> > - consider shutil.copyfileobj to limit memory usage when dealing with
>> >   data of arbitrary size.
>> > 
>> > Putting it together:
>> > 
>> >     with open(sz, "wb") as szo:
>> >         shutil.copyfileobj(u, szo)
>> 
>> This writes the HTTP stream to the file in binary mode, without my having
>> to handle it manually, chunk by chunk?
> 
> I have a problem with it: There is no feedback for the user about the
> progress of the transfer, which can last several hours.
> 
> For small files shutil.copyfileobj() is a good idea, but not for huge
> ones.

Indeed. Have a look at the source code of shutil.copyfileobj():

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
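Because the loop is this simple, adding progress feedback just means copying it and calling a hook after each chunk. A minimal sketch in Python 3, assuming a hypothetical helper `copy_with_progress` and a caller-supplied `report` callback (neither is part of `shutil`):

```python
import io
import sys


def copy_with_progress(fsrc, fdst, length=16 * 1024, report=None):
    """Copy fsrc to fdst in chunks, like shutil.copyfileobj.

    Hypothetical helper: after each chunk, report(copied) is called with the
    running total of bytes written, so the caller can show feedback.
    Returns the total number of bytes copied.
    """
    copied = 0
    while True:
        buf = fsrc.read(length)
        if not buf:  # b"" signals end of stream
            break
        fdst.write(buf)
        copied += len(buf)
        if report is not None:
            report(copied)
    return copied


def print_dot(copied):
    """Simplest possible feedback: one dot per chunk, flushed immediately."""
    sys.stderr.write(".")
    sys.stderr.flush()


# Demo with in-memory streams; with urllib the source would be the response object.
src = io.BytesIO(b"x" * 100000)
dst = io.BytesIO()
total = copy_with_progress(src, dst, report=print_dot)
sys.stderr.write("\n")
print(total)  # 100000
```

Since `report` receives the byte count rather than printing by itself, the same helper can drive a dot printer, a percentage display or a log line.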

As simple as can be. I suggested the function as an alternative to writing 
the loop yourself when your example code basically showed

dest.write(source.read())
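That one-liner holds the entire response body in memory at once, whereas a bounded read(length) loop never holds more than one chunk. The same loop is also a natural place to compute a checksum incrementally. A sketch using the standard-library `hashlib`, with a hypothetical helper name `copy_and_hash`:

```python
import functools
import hashlib
import io


def copy_and_hash(fsrc, fdst, length=16 * 1024):
    """Copy in bounded chunks and return the SHA-256 hex digest of the data.

    Unlike fdst.write(fsrc.read()), at most `length` bytes are read at a time.
    """
    digest = hashlib.sha256()
    # iter(callable, sentinel) keeps calling fsrc.read(length) until it returns b"" (EOF).
    for buf in iter(functools.partial(fsrc.read, length), b""):
        fdst.write(buf)
        digest.update(buf)
    return digest.hexdigest()


data = b"hello world\n" * 1000
out = io.BytesIO()
print(copy_and_hash(io.BytesIO(data), out) == hashlib.sha256(data).hexdigest())  # True
```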

For the huge downloads you intend to cater to, you probably want more than a
script that prints a dot on every iteration: you need the expected remaining
time, checksums, the ability to stop and resume a download, and whatnot.

Does the Perl code offer that? Then why rewrite?

Or are there Python libraries that do that out of the box? Can you reuse 
them?
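Two of the features listed above, an estimate of the remaining time and resuming an interrupted download, can be sketched with the standard library alone. This assumes Python 3 (where urllib2 became `urllib.request`) and a server that honors HTTP `Range` requests; `eta_seconds`, `resume_request` and `download` are hypothetical names. A server that ignores the header replies `200` with the full body instead of `206 Partial Content`, so the sketch falls back to rewriting the file in that case.

```python
import os
import shutil
import tempfile
import urllib.request


def eta_seconds(copied, total, elapsed):
    """Estimate remaining seconds from the average rate so far (None if unknown)."""
    if total is None or copied <= 0 or elapsed <= 0:
        return None
    rate = copied / elapsed  # bytes per second
    return (total - copied) / rate


def resume_request(url, path):
    """Build a request for the bytes not yet present in `path`; return (request, offset)."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    headers = {"Range": "bytes=%d-" % offset} if offset else {}
    return urllib.request.Request(url, headers=headers), offset


def download(url, path, length=16 * 1024):
    """Download url to path, appending to a partial file when the server allows it.

    If the file is already complete, servers typically answer 416, which
    urlopen raises as urllib.error.HTTPError.
    """
    req, offset = resume_request(url, path)
    with urllib.request.urlopen(req) as resp:
        if offset and resp.status != 206:
            offset = 0  # Range ignored: the full body is coming, so start over
        with open(path, "ab" if offset else "wb") as out:
            shutil.copyfileobj(resp, out, length)


# Offline demo: pretend 1024 bytes of big.iso were already downloaded.
with tempfile.TemporaryDirectory() as tmp:
    part = os.path.join(tmp, "big.iso")
    with open(part, "wb") as f:
        f.write(b"\0" * 1024)
    req, offset = resume_request("http://example.com/big.iso", part)

print(offset, req.get_header("Range"))  # 1024 bytes=1024-
print(eta_seconds(25, 100, 10))  # 30.0
```

The estimate uses the average rate since the start; a moving average over recent chunks reacts faster to changing bandwidth, at the cost of a noisier display.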




Thread

corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:08 +0000
  Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 14:20 +0100
    Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:43 +0000
  Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 14:40 +0100
    Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 13:59 +0000
      Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 15:51 +0000
        Re: corrupt download with urllib2 Peter Otten <__peter__@web.de> - 2015-11-10 17:48 +0100
          Re: corrupt download with urllib2 Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-11-10 17:21 +0000
