Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #107997

Re: Fastest way to retrieve and write html contents to file

From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Subject Re: Fastest way to retrieve and write html contents to file
Date 2016-05-02 17:19 +1000
Message-ID <mailman.311.1462173544.32212.python-list@python.org> (permalink)
References (2 earlier) <mailman.298.1462164614.32212.python-list@python.org> <ng6mn7$8nv$1@dont-email.me> <5726ee33$0$1617$c3e8da3$5496439d@news.astraweb.com> <ng6sul$o85$1@dont-email.me> <CAPTjJmqO3jMMi57EZEAhFDD3OMbAeG9XVat2aAX529kEbpA7JQ@mail.gmail.com>

Show all headers | View raw


On Mon, May 2, 2016 at 4:47 PM, DFS <nospam@dfs.com> wrote:
> I'm not specifying a local web cache with either (wouldn't know how or where
> to look).  If you have Windows, you can try it.
> -------------------------------------------------------------------
> Option Explicit
> Dim xmlHTTP, fso, fOut, startTime, endTime, webpage, webfile,i
> webpage = "http://econpy.pythonanywhere.com/ex/001.html"
> webfile  = "D:\econpy001.html"
> startTime = Timer
> For i = 1 to 10
>  Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
>  xmlHTTP.Open "GET", webpage
>  xmlHTTP.Send
>  Set fso = CreateObject("Scripting.FileSystemObject")
>  Set fOut = fso.CreateTextFile(webfile, True)
>   fOut.WriteLine xmlHTTP.ResponseText
>  fOut.Close
>  Set fOut    = Nothing
>  Set fso     = Nothing
>  Set xmlHTTP = Nothing
> Next
> endTime = Timer
> wscript.echo "Finished VBScript in " & FormatNumber(endTime - startTime,3) &
> " seconds"
> -------------------------------------------------------------------

There's an easier way to test if there's caching happening. Just crank
the iterations up from 10 to 100 and see what happens to the times. If
your numbers are perfectly fair, they should be perfectly linear in
the iteration count; eg a 1.8 second ten-iteration loop should become
an 18 second hundred-iteration loop. Obviously they won't be exactly
that, but I would expect them to be reasonably close (eg 17-19
seconds, but not 2 seconds).

Then the next thing to test would be to create a deliberately-slow web
server, and connect to that. Put a two-second delay into it, to
simulate a distant or overloaded server, and see if your logs show the
correct result. Something like this:

--------

import time
try:
    import http.server as BaseHTTPServer # Python 3
except ImportError:
    import BaseHTTPServer # Python 2

class SlowHTTP(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type","text/html")
        self.end_headers()
        self.wfile.write(b"Hello, ")
        time.sleep(2)
        self.wfile.write(b"world!")

server = BaseHTTPServer.HTTPServer(("", 1234), SlowHTTP)
server.serve_forever()

-------

Test that with a web browser or command-line downloader (go to
http://127.0.0.1:1234/), and make sure that (a) it produces "Hello,
world!", and (b) it takes two seconds. Then set your test scripts to
downloading that URL. (Be sure to set them back to low iteration
counts first!) If the times are true and fair, they should all come
out pretty much the same - ten iterations, twenty seconds. And since
all that's changed is the server, this will be an accurate
demonstration of what happens in the real world: network requests
aren't always fast. Incidentally, you can also watch the server's log
to see if it's getting the appropriate number of requests.

It may turn out that changing the web server actually materially
changes your numbers. Comment out the sleep call and try it again -
you might find that your numbers come closer together, because this
naive server doesn't send back 204 NOT MODIFIED responses or anything.
Again, though, this would prove that you're not actually measuring
language performance, because the tests are more dependent on the
server than the client.

Even if the files themselves aren't being cached, you might find that
DNS is. So if you truly want to eliminate variables, replace the name
in your URL with an IP address. It's another thing that might mess
with your timings, without actually being a language feature.

Networking has about four billion variables in it. You're messing with
one of the least significant: the programming language :)

ChrisA

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 00:06 -0400
  Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 21:34 -0700
  Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 14:40 +1000
    Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 00:50 -0400
      Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:00 -0700
        Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 01:04 -0400
          Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 15:12 +1000
          Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:17 -0700
          Re: Fastest way to retrieve and write html contents to file Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-02 15:57 +1000
  Re: Fastest way to retrieve and write html contents to file Ben Finney <ben+python@benfinney.id.au> - 2016-05-02 14:49 +1000
    Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 01:00 -0400
      Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:15 -0700
        Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 01:59 -0400
          Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 23:27 -0700
            Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 03:37 -0400
              Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-02 00:58 -0700
              Re: Fastest way to retrieve and write html contents to file Michael Torrie <torriem@gmail.com> - 2016-05-02 22:06 -0600
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-03 00:24 -0400
                Re: Fastest way to retrieve and write html contents to file Tim Chase <python.list@tim.thechases.com> - 2016-05-03 10:28 -0500
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-03 13:00 -0400
                Re: Fastest way to retrieve and write html contents to file Tim Chase <python.list@tim.thechases.com> - 2016-05-03 13:41 -0500
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-04 02:10 -0400
      Re: Fastest way to retrieve and write html contents to file Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-02 16:05 +1000
        Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 02:47 -0400
          Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 17:19 +1000
            Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 21:51 -0400
              Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-03 12:00 +1000
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 22:01 -0400
          Re: Fastest way to retrieve and write html contents to file Peter Otten <__peter__@web.de> - 2016-05-02 10:42 +0200
            Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 21:52 -0400
  Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 14:53 +1000
  Re: Fastest way to retrieve and write html contents to file Tim Chase <python.list@tim.thechases.com> - 2016-05-02 07:38 -0500

csiph-web