Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #108000

Re: Fastest way to retrieve and write html contents to file

From Stephen Hansen <me+python@ixokai.io>
Newsgroups comp.lang.python
Subject Re: Fastest way to retrieve and write html contents to file
Date 2016-05-02 00:58 -0700
Message-ID <mailman.312.1462175886.32212.python-list@python.org> (permalink)
References (6 earlier) <ng6q67$gnm$1@dont-email.me> <1462170452.1180117.595306673.68B64F02@webmail.messagingengine.com> <mailman.308.1462170455.32212.python-list@python.org> <ng6vsl$1jc$1@dont-email.me> <1462175882.1196948.595349401.3B7C84C9@webmail.messagingengine.com>

Show all headers | View raw


On Mon, May 2, 2016, at 12:37 AM, DFS wrote:
> On 5/2/2016 2:27 AM, Stephen Hansen wrote:
> > I'm again going back to the point of: its fast enough. When comparing
> > two small numbers, "twice as slow" is meaningless.
> 
> Speed is always meaningful.
> 
> I know python is relatively slow, but it's a cool, concise, powerful 
> language.  I'm extremely impressed by how tight the code can get.

I'm sorry, but no. Speed is not always meaningful. 

It's not even usually meaningful, because you can't quantify what "speed
is". In context, you're claiming this is twice as slow (even though my
tests show dramatically better performance), but what details are
different?

You're ignoring the fact that Python might have a constant overhead --
meaning, for a 1k download, it might have X speed cost. For a 1meg
download, it might still have the exact same X cost.

Looking narrowly, that overhead looks like "twice as slow", but that's
not meaningful at all. Looking larger, that overhead is a pittance.

You aren't measuring that.

> > You have an assumption you haven't answered, that downloading a 10 meg
> > file will be twice as slow as downloading this tiny file. You haven't
> > proven that at all.
> 
> True.  And it has been my assumption - tho not with 10MB file.

And that assumption is completely invalid.

> I noticed urllib and curl returned the html as is, but urllib2 and 
> requests added enhancements that should make the data easier to parse. 
> Based on speed and functionality and documentation, I believe I'll be 
> using the requests HTTP library (I will actually be doing a small amount 
> of web scraping).

The requests library's added-value is ease-of-use, and its overhead is
likely tiny: so using it means you spend less effort making a thing
happen. I recommend you embrace this. 

> VBScript
> 1st run: 7.70 seconds
> 2nd run: 5.38
> 3rd run: 7.71
> 
> So python matches or beats VBScript at this much larger file.  Kewl.

This is what I'm talking about: Python might have a constant overhead,
but looking at larger operations, its probably comparable. Not fast,
mind you. Python isn't the fastest language out there. But in real world
work, its usually fast enough.

-- 
Stephen Hansen
  m e @ i x o k a i . i o

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 00:06 -0400
  Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 21:34 -0700
  Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 14:40 +1000
    Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 00:50 -0400
      Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:00 -0700
        Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 01:04 -0400
          Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 15:12 +1000
          Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:17 -0700
          Re: Fastest way to retrieve and write html contents to file Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-02 15:57 +1000
  Re: Fastest way to retrieve and write html contents to file Ben Finney <ben+python@benfinney.id.au> - 2016-05-02 14:49 +1000
    Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 01:00 -0400
      Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:15 -0700
        Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 01:59 -0400
          Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-01 23:27 -0700
            Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 03:37 -0400
              Re: Fastest way to retrieve and write html contents to file Stephen Hansen <me+python@ixokai.io> - 2016-05-02 00:58 -0700
              Re: Fastest way to retrieve and write html contents to file Michael Torrie <torriem@gmail.com> - 2016-05-02 22:06 -0600
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-03 00:24 -0400
                Re: Fastest way to retrieve and write html contents to file Tim Chase <python.list@tim.thechases.com> - 2016-05-03 10:28 -0500
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-03 13:00 -0400
                Re: Fastest way to retrieve and write html contents to file Tim Chase <python.list@tim.thechases.com> - 2016-05-03 13:41 -0500
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-04 02:10 -0400
      Re: Fastest way to retrieve and write html contents to file Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-02 16:05 +1000
        Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 02:47 -0400
          Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 17:19 +1000
            Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 21:51 -0400
              Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-03 12:00 +1000
                Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 22:01 -0400
          Re: Fastest way to retrieve and write html contents to file Peter Otten <__peter__@web.de> - 2016-05-02 10:42 +0200
            Re: Fastest way to retrieve and write html contents to file DFS <nospam@dfs.com> - 2016-05-02 21:52 -0400
  Re: Fastest way to retrieve and write html contents to file Chris Angelico <rosuav@gmail.com> - 2016-05-02 14:53 +1000
  Re: Fastest way to retrieve and write html contents to file Tim Chase <python.list@tim.thechases.com> - 2016-05-02 07:38 -0500

csiph-web