From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: corrupt download with urllib2
Date: Tue, 10 Nov 2015 17:48:48 +0100

Ulli Horlacher wrote:

> Ulli Horlacher wrote:
>> Peter Otten <__peter__@web.de> wrote:
>
>> > - consider shutil.copyfileobj to limit memory usage when dealing with
>> >   data of arbitrary size.
>> >
>> > Putting it together:
>> >
>> >     with open(sz, "wb") as szo:
>> >         shutil.copyfileobj(u, szo)
>>
>> This writes the http stream binary to the file. without handling it
>> manually chunk by chunk?
>
> I have a problem with it: There is no feedback for the user about the
> progress of the transfer, which can last several hours.
>
> For small files shutil.copyfileobj() is a good idea, but not for huge
> ones.

Indeed. Have a look at the source code:

    def copyfileobj(fsrc, fdst, length=16*1024):
        """copy data from file-like object fsrc to file-like object fdst"""
        while 1:
            buf = fsrc.read(length)
            if not buf:
                break
            fdst.write(buf)

As simple as can be. I suggested the function as an alternative to writing
the loop yourself because your example code basically showed

    dest.write(source.read())

which reads the whole download into memory before writing anything.

For the huge downloads that you intend to cater to, you probably want your
script to do more than print a dot on every iteration: you need expected
remaining time, checksums, the ability to stop and resume a download, and
whatnot. Does the Perl code offer that? Then why rewrite?

Or are there Python libraries that do that out of the box? Can you reuse
them?
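
If you do end up writing the loop yourself, a progress-reporting variant of the copyfileobj() loop might look roughly like this. It is only a sketch: copy_with_progress(), print_progress() and the report callback are names I made up for illustration, not part of shutil, and the demo copies an in-memory buffer instead of a real HTTP response.

```python
import io


def copy_with_progress(fsrc, fdst, total=None, length=16 * 1024, report=None):
    """Copy fsrc to fdst chunk by chunk, like shutil.copyfileobj().

    Hypothetical helper: after every chunk it calls report(copied, total),
    where total is the expected size in bytes or None if unknown.
    Returns the number of bytes copied.
    """
    copied = 0
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
        copied += len(buf)
        if report is not None:
            report(copied, total)
    return copied


def print_progress(copied, total):
    """Example callback: print bytes copied, with a percentage if total is known."""
    if total:
        print("%d of %d bytes (%.1f%%)" % (copied, total, 100.0 * copied / total))
    else:
        print("%d bytes" % copied)


# Demo with in-memory file-like objects standing in for the download.
source = io.BytesIO(b"x" * 40000)
dest = io.BytesIO()
copy_with_progress(source, dest, total=40000, report=print_progress)
```

For a real download, total could come from the response's Content-Length header, if the server sends one. Remaining time, checksums and resuming are not covered here; the callback is just the hook where such things could be attached.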