Groups > comp.lang.python > #93332 > unrolled thread

Python 3 resume a file download

Started by	zljubisic@gmail.com
First post	2015-06-30 08:34 -0700
Last post	2015-07-04 22:31 -0700
Articles	15 — 8 participants

  Python 3 resume a file download zljubisic@gmail.com - 2015-06-30 08:34 -0700
    Re: Python 3 resume a file download Cameron Simpson <cs@zip.com.au> - 2015-07-01 09:21 +1000
      Re: Python 3 resume a file download zljubisic@gmail.com - 2015-07-01 07:18 -0700
        Re: Python 3 resume a file download Ian Kelly <ian.g.kelly@gmail.com> - 2015-07-01 09:10 -0600
          Re: Python 3 resume a file download zljubisic@gmail.com - 2015-07-01 12:24 -0700
            Re: Python 3 resume a file download Chris Angelico <rosuav@gmail.com> - 2015-07-02 05:28 +1000
              Re: Python 3 resume a file download zljubisic@gmail.com - 2015-07-01 12:31 -0700
                Re: Python 3 resume a file download Peter Otten <__peter__@web.de> - 2015-07-01 21:51 +0200
                Re: Python 3 resume a file download zljubisic@gmail.com - 2015-07-01 12:59 -0700
                  Re: Python 3 resume a file download Peter Otten <__peter__@web.de> - 2015-07-01 22:06 +0200
                Re: Python 3 resume a file download Tim Chase <python.list@tim.thechases.com> - 2015-07-01 15:04 -0500
                  Re: Python 3 resume a file download zljubisic@gmail.com - 2015-07-02 13:27 -0700
                    Re: Python 3 resume a file download Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2015-07-02 23:31 +0200
                    Re: Python 3 resume a file download MRAB <python@mrabarnett.plus.com> - 2015-07-02 23:07 +0100
                      Re: Python 3 resume a file download zljubisic@gmail.com - 2015-07-04 22:31 -0700

#93332 — Python 3 resume a file download

From	zljubisic@gmail.com
Date	2015-06-30 08:34 -0700
Subject	Python 3 resume a file download
Message-ID	<9a629cf3-e256-494a-8ff8-3f1f6fc2218c@googlegroups.com>

Hi,

I would like to download a file (http://video.hrt.hr/2906/otv296.mp4)

If the connection is OK, I can download the file with:

import urllib.request
urllib.request.urlretrieve(remote_file, local_file)

Sometimes, when I am connected to a weak wireless network (not mine), I get a WinError 10054 exception (Windows 7).

When that happens, I would like to resume the download instead of starting over from the very beginning.

How can I do that?

I read about the Range header and chunks, but this server doesn't have any headers.

What options do I have with this particular file?

Regards.

#93355

From	Cameron Simpson <cs@zip.com.au>
Date	2015-07-01 09:21 +1000
Message-ID	<mailman.204.1435707788.3674.python-list@python.org>
In reply to	#93332

On 30Jun2015 08:34, zljubisic@gmail.com <zljubisic@gmail.com> wrote:
>I would like to download a file (http://video.hrt.hr/2906/otv296.mp4)
>If the connection is OK, I can download the file with:
>
>import urllib.request
>urllib.request.urlretrieve(remote_file, local_file)
>
>Sometimes when I am connected on week wireless (not mine) network I get WinError 10054 exception (windows 7).
>
>When it happens, I would like to resume download instead of doing everything from very beginning.
>
>How to do that?
>
>I read about Range header and chunks, but this server doesn't have any headers.
>
>What options do I have with this particular file?

You need to use a Range: header. I don't know what you mean when you say "this 
server doesn't have any headers". All HTTP requests and responses use headers.  
Possibly you mean your code isn't setting any headers.

What you need to do is separate your call to urlretrieve into a call to 
construct a Request object, add a Range header, then fetch the URL using the 
Request object, appending the results (if successful) to the end of your local 
file.

If you go to:

  https://docs.python.org/3/library/urllib.request.html#urllib.request.urlretrieve

and scroll up you will find example code doing that kind of thing in the 
examples above.
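
To make the point about headers concrete, here is a minimal sketch (not from the original post) of sending a ranged request, printing every response header, and checking whether the server honoured the range. The names `honours_range` and `probe` are hypothetical; the 206 status code and the Content-Range header are standard HTTP, and the calls used are plain `urllib.request`.

```python
import urllib.request


def honours_range(status, headers):
    """Return True if the reply looks like a partial-content response.

    `status` is the HTTP status code; `headers` is any object with a
    .get() method, for example response.headers from urlopen().
    """
    return status == 206 and headers.get("Content-Range") is not None


def probe(url, offset=0):
    """Send a ranged GET, print the response headers, report range support."""
    req = urllib.request.Request(url, headers={"Range": "bytes={}-".format(offset)})
    with urllib.request.urlopen(req) as response:
        # Every HTTP response carries headers; this shows what the server sent.
        for name, value in response.headers.items():
            print("{}: {}".format(name, value))
        return honours_range(response.status, response.headers)
```

If `probe(url, 500)` returns False, the server most likely ignored the Range header and sent the whole file, so appending the body to a partial file would corrupt it.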

Cheers,
Cameron Simpson <cs@zip.com.au>

The British Interplanetary Society? How many planets are members then?
        - G. Robb

#93373

From	zljubisic@gmail.com
Date	2015-07-01 07:18 -0700
Message-ID	<ef677bbd-5b40-40b7-8f56-e6a7d39f9ce0@googlegroups.com>
In reply to	#93355

On Wednesday, 1 July 2015 01:43:19 UTC+2, Cameron Simpson  wrote:
> On 30Jun2015 08:34, zljubisic@gmail.com <zljubisic@gmail.com> wrote:
> >I would like to download a file (http://video.hrt.hr/2906/otv296.mp4)
> >If the connection is OK, I can download the file with:
> >
> >import urllib.request
> >urllib.request.urlretrieve(remote_file, local_file)
> >
> >Sometimes when I am connected on week wireless (not mine) network I get WinError 10054 exception (windows 7).
> >
> >When it happens, I would like to resume download instead of doing everything from very beginning.
> >
> >How to do that?
> >
> >I read about Range header and chunks, but this server doesn't have any headers.
> >
> >What options do I have with this particular file?
> 
> You need to use a Range: header. I don't know what you mean when you say "this 
> server doesn't have any headers". All HTTP requests and responses use headers.  
> Possibly you mean you code isn't setting any headers.
> 
> What you need to do is separate your call to urlretrieve into a call to 
> construct a Request object, add a Range header, then fetch the URL using the 
> Request object, appending the results (if successful) to the end of your local 
> file.
> 
> If you go to:
> 
>   https://docs.python.org/3/library/urllib.request.html#urllib.request.urlretrieve
> 
> and scroll up you will find example code doing that kind of thing in the 
> examples above.
> 
> Cheers,
> Cameron Simpson <cs@zip.com.au>
> 
> The British Interplanetary Society? How many planets are members then?
>         - G. Robb

Hi,

If I understood you correctly (I am not sure which example you are referring to), I should do the following:
1. check already downloaded file size in bytes = downloaded
2. url = 'http://video.hrt.hr/2906/otv296.mp4'
3. req = urllib.request.Request(url)
4. req.add_header('Range', downloaded)
5. urllib.request.urlretrieve(url, 'otv296.mp4')

Is that what you were saying?

Regards.

#93376

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2015-07-01 09:10 -0600
Message-ID	<mailman.219.1435763456.3674.python-list@python.org>
In reply to	#93373

On Wed, Jul 1, 2015 at 8:18 AM,  <zljubisic@gmail.com> wrote:
> if I understood you correctly (I am not sure about which example you are refering), I should do the following:
> 1. check already downloaded file size in bytes = downloaded
> 2. url = 'http://video.hrt.hr/2906/otv296.mp4'
> 3. req = urllib.request.Request(url)
> 4. req.add_header('Range', downloaded)

You need to use the correct format for the Range header; see RFC 7233.
If you have 500 bytes and want the rest of the file, then the value
for the Range header would be "bytes=500-", not just "500". You can
build that string using string formatting, e.g.
"bytes={}-".format(downloaded)

> 5. urllib.request.urlretrieve(url, 'otv296.mp4')

A couple of problems with this. One is that it doesn't use the Request
object that you just constructed, so it wouldn't pass the Range
header. The other is that it will overwrite that file, not append to
it. You should use the urllib.request.urlopen function, and pass it
the Request object rather than the URL. You can then open your local
file in append mode, read the file data from the HTTPResponse object
returned by urlopen, and write it to the local file.

#93384

From	zljubisic@gmail.com
Date	2015-07-01 12:24 -0700
Message-ID	<4a7fae78-276e-42c6-9d84-1fddbeb99853@googlegroups.com>
In reply to	#93376

Currently I am executing the following code:

import os
import urllib.request

def Download(rfile, lfile):

    retval = False

    if os.path.isfile(lfile):
        lsize = os.stat(lfile).st_size
    else:
        lsize = 0

    req = urllib.request.Request(rfile)
    req.add_header('Range', "bytes={}-".format(lsize))


    with urllib.request.urlopen(req) as response, open(lfile, 'ab') as out_file:
        data = response.read() # a `bytes` object
        out_file.write(data)

    if response.headers.headers['Content-Length'] == os.stat(lfile).st_size:
        retval = True

    return retval

Download('http://video.hrt.hr/2906/otv296.mp4', 'otv296.mp4')

The internet connection here is very slow, so execution will take about an hour.

In the meantime, can I change the procedure so that it writes to the file chunk by chunk instead of all at once?

Furthermore, how can I tell whether the file has been fully downloaded?
I tried "if response.headers.headers['Content-Length'] == os.stat(lfile).st_size:", but I believe there is a better way.

Big regards.

#93385

From	Chris Angelico <rosuav@gmail.com>
Date	2015-07-02 05:28 +1000
Message-ID	<mailman.225.1435778925.3674.python-list@python.org>
In reply to	#93384

On Thu, Jul 2, 2015 at 5:24 AM,  <zljubisic@gmail.com> wrote:
>     with urllib.request.urlopen(req) as response, open(lfile, 'ab') as out_file:
>         data = response.read() # a `bytes` object
>         out_file.write(data)
>

If a file is big enough to want to resume the download once, you
almost certainly want to be able to resume it a second time. I would
recommend not attempting to do the entire read and write as a single
operation - read chunks and write them to the disk. As written,
response.read() will attempt to read the entire response into a
gigantic in-memory bytestring, and only then start writing.

ChrisA

#93386

From	zljubisic@gmail.com
Date	2015-07-01 12:31 -0700
Message-ID	<36d1cf6d-9b99-4d38-8f07-83443bcd2f60@googlegroups.com>
In reply to	#93385

But how to read chunks?

#93387

From	Peter Otten <__peter__@web.de>
Date	2015-07-01 21:51 +0200
Message-ID	<mailman.226.1435780326.3674.python-list@python.org>
In reply to	#93386

zljubisic@gmail.com wrote:

> But how to read chunks?

Instead of

>         data = response.read() # a `bytes` object
>         out_file.write(data)
 
use a loop:

CHUNKSIZE = 16*1024 # for example
while True:
    data = response.read(CHUNKSIZE)
    if not data:
        break
    out_file.write(data)


This can be simplified (after "import shutil"):

shutil.copyfileobj(response, out_file)

https://docs.python.org/dev/library/shutil.html#shutil.copyfileobj

#93388

From	zljubisic@gmail.com
Date	2015-07-01 12:59 -0700
Message-ID	<482382f0-8ea4-4085-ae73-0d24e4fd8917@googlegroups.com>
In reply to	#93386

New version with chunks:

import os
import urllib.request

def Download(rfile, lfile):

    retval = False

    if os.path.isfile(lfile):
        lsize = os.stat(lfile).st_size
    else:
        lsize = 0

    req = urllib.request.Request(rfile)
    req.add_header('Range', "bytes={}-".format(lsize))


    response = urllib.request.urlopen(req)

    with open(lfile, 'ab') as out_file:
        while True:
            try:
                chunk = response.read(8192)
                if not chunk: break
                out_file.write(chunk)
            except ConnectionResetError as e:
                print('Exception ConnectionResetError {0}'.format(os.stat(lfile).st_size))



    if response.headers.headers['Content-Length'] == os.stat(lfile).st_size:
        retval = True

    return retval

Download('http://video.hrt.hr/2906/otv296.mp4', 'c:\\Users\\zoran\\hrt\\sync\\otv296.mp4')

#93389

From	Peter Otten <__peter__@web.de>
Date	2015-07-01 22:06 +0200
Message-ID	<mailman.227.1435781221.3674.python-list@python.org>
In reply to	#93388

zljubisic@gmail.com wrote:

> New version with chunks:
> 
> import os
> import urllib.request
> 
> def Download(rfile, lfile):
> 
>     retval = False
> 
>     if os.path.isfile(lfile):
>         lsize = os.stat(lfile).st_size
>     else:
>         lsize = 0
> 
>     req = urllib.request.Request(rfile)
>     req.add_header('Range', "bytes={}-".format(lsize))
> 
> 
>     response = urllib.request.urlopen(req)
> 
>     with open(lfile, 'ab') as out_file:
>         while True:
>             try:
>                 chunk = response.read(8192)
>                 if not chunk: break
>                 out_file.write(chunk)
>             except ConnectionResetError as e:
>                 print('Exception ConnectionResetError {0}'.format(os.stat(lfile).st_size))

Catching the exception inside the while-True loop is not a good idea.

>     if response.headers.headers['Content-Length'] == os.stat(lfile).st_size:
>         retval = True
> 
>     return retval
> 
> Download('http://video.hrt.hr/2906/otv296.mp4',
> 'c:\\Users\\zoran\\hrt\\sync\\otv296.mp4')

#93408

From	Tim Chase <python.list@tim.thechases.com>
Date	2015-07-01 15:04 -0500
Message-ID	<mailman.232.1435805143.3674.python-list@python.org>
In reply to	#93386

On 2015-07-01 21:51, Peter Otten wrote:
> use a loop:
> 
> CHUNKSIZE = 16*1024 # for example
> while True:
>    data =  response.read(CHUNKSIZE)
>    if not data:
>        break
>    out_file.write(data)
> 
> 
> This can be simplified:
> 
> shutil.copyfileobj(response, out_file)

It's these little corners of Python that make me wonder how many
times I've written that `while` loop when I could have just used a
tested function from the stdlib.  Sigh.

-tkc

#93449

From	zljubisic@gmail.com
Date	2015-07-02 13:27 -0700
Message-ID	<e37f20c5-1baf-48fa-a5c7-762616d18e5c@googlegroups.com>
In reply to	#93408

This is my final version which doesn't work. :(
Actually, it works with another file on another server, but doesn't work with mp4 files on this particular server.

I really don't know what to do?

Regards.

import os
import urllib.request

def Download(rfile, lfile):

    retval = False

    if os.path.isfile(lfile):
        lsize = os.stat(lfile).st_size
    else:
        lsize = 0

    req = urllib.request.Request(rfile)

    rfsize = urllib.request.urlopen(req).length

    req.add_header('Range', "bytes={}-".format(lsize))

    response = urllib.request.urlopen(req)

    with open(lfile, 'ab') as out_file:
        while True:
            try:
                chunk = response.read(64 * 1024)
                # chunk = response.read(128)
                if not chunk: break
                out_file.write(chunk)
                out_file.flush()

                lfsize = os.stat(lfile).st_size

                print("{0:.2f}".format(lfsize / rfsize * 100))
            except ConnectionResetError as e:
                print('Exception ConnectionResetError {0}'.format(os.stat(lfile).st_size))
                response = urllib.request.urlopen(req)


    if lfsize == rfsize:
        retval = True

    return retval


while not Download('http://video.hrt.hr/2906/otv296.mp4', 'otv296.mp4'):
    print('1')

#93451

From	Irmen de Jong <irmen.NOSPAM@xs4all.nl>
Date	2015-07-02 23:31 +0200
Message-ID	<5595ad9b$0$2917$e4fe514c@news.xs4all.nl>
In reply to	#93449

On 2-7-2015 22:27, zljubisic@gmail.com wrote:
> This is my final version which doesn't work. :(
> Actually, it works with another file on another server, but doesn't work with mp4 files on this particular server.
> 
> I really don't know what to do?

Do you really need to download these files using Python?
Why not use a tool such as curl or wget:

$ wget --continue http://video.hrt.hr/2906/otv296.mp4

will download it and you'll be able to resume a partial download with that command as well.

Irmen

#93452

From	MRAB <python@mrabarnett.plus.com>
Date	2015-07-02 23:07 +0100
Message-ID	<mailman.262.1435874866.3674.python-list@python.org>
In reply to	#93449

On 2015-07-02 21:27, zljubisic@gmail.com wrote:
> This is my final version which doesn't work. :(
> Actually, it works with another file on another server, but doesn't work with mp4 files on this particular server.
>
> I really don't know what to do?
>
> Regards.
>
> import os
> import urllib.request
>
> def Download(rfile, lfile):
>
>      retval = False
>
>      if os.path.isfile(lfile):
>          lsize = os.stat(lfile).st_size
>      else:
>          lsize = 0
>
>      req = urllib.request.Request(rfile)
>
>      rfsize = urllib.request.urlopen(req).length
>
>      req.add_header('Range', "bytes={}-".format(lsize))
>
>      response = urllib.request.urlopen(req)
>
>      with open(lfile, 'ab') as out_file:
>          while True:
>              try:
>                  chunk = response.read(64 * 1024)
>                  # chunk = response.read(128)
>                  if not chunk: break
>                  out_file.write(chunk)
>                  out_file.flush()
>
>
>                  lfsize = os.stat(lfile).st_size
>
>                   print("{0:.2f}".format(lfsize / rfsize * 100))
>              except ConnectionResetError as e:
>                  print('Exception ConnectionResetError {0}'.format(os.stat(lfile).st_size))
>                  response = urllib.request.urlopen(req)
>
>
>      if lfsize == rfsize:
>          retval = True
>
>      return retval
>
>
> while not Download('http://video.hrt.hr/2906/otv296.mp4', 'otv296.mp4'):
>      print('1')
>
If a ConnectionResetError is raised, it'll restart the request from the
beginning, but continue appending to the same file. You should reset
the writing too.

#93500

From	zljubisic@gmail.com
Date	2015-07-04 22:31 -0700
Message-ID	<f221f214-aa91-404f-a4d8-a847eede2c75@googlegroups.com>
In reply to	#93452

I have a working solution. :)
The function below will download a file reliably, resuming after a dropped connection.
Thanks to everyone who helped. I wouldn't have been able to write this function without your help.
I hope someone else will benefit from our joint work.

Best regards.

import os
import urllib.request

def Download(rfile, lfile):

    lsize = -1
    rsize = -2
    prc_dloaded = 0.0  # defined up front so the except branch can always print it

    while True:
        try:
            # Size of what is already on disk (0 if nothing yet).
            if os.path.isfile(lfile):
                lsize = os.stat(lfile).st_size
            else:
                lsize = 0

            req = urllib.request.Request(rfile)

            # Un-ranged request, used only to learn the full remote size.
            with urllib.request.urlopen(req) as probe:
                rsize = probe.length

            if lsize == rsize:
                break

            # Ask only for the bytes that are still missing.
            req.add_header('Range', "bytes={}-".format(lsize))

            with urllib.request.urlopen(req) as response, open(lfile, 'ab') as out_file:
                # One chunk per pass; the outer loop then issues a new request.
                chunk = response.read(64 * 1024)

                if not chunk:
                    break

                out_file.write(chunk)
                out_file.flush()

                lsize = os.stat(lfile).st_size
                prc_dloaded = round(lsize / rsize * 100, 2)

                print(prc_dloaded)
                if prc_dloaded == 100:
                    break
        except ConnectionResetError:
            print('Exception ConnectionResetError {0} %'.format(prc_dloaded))

    # True only when the local file has reached the remote size.
    return lsize == rsize


while not Download('http://video.hrt.hr/2906/otv296.mp4', 'otv296.mp4'):
    print('1')
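
The function above issues a separate un-ranged request just to learn the total size. Another option, sketched here and not taken from the thread, is to read the total from the Content-Range header of the ranged response, whose RFC 7233 form is `bytes FIRST-LAST/TOTAL` (TOTAL may be `*` when the server does not know it). The helper name `total_from_content_range` is hypothetical.

```python
def total_from_content_range(value):
    """Return the total size from a Content-Range value, or None if unknown.

    Expects the RFC 7233 form 'bytes FIRST-LAST/TOTAL'; TOTAL may be '*'.
    """
    if not value or "/" not in value:
        return None
    total = value.rsplit("/", 1)[1].strip()
    return int(total) if total.isdigit() else None
```

A caller could compare `total_from_content_range(response.headers.get("Content-Range"))` with `os.path.getsize(lfile)` after the copy to decide whether the download is complete.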
