| From | DFS <nospam@dfs.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Fastest way to retrieve and write html contents to file |
| Date | 2016-05-02 03:37 -0400 |
| Organization | A noiseless patient Spider |
| Message-ID | <ng6vsl$1jc$1@dont-email.me> |
| References | (4 earlier) <1462166136.1167243.595273897.291B0865@webmail.messagingengine.com> <mailman.303.1462166138.32212.python-list@python.org> <ng6q67$gnm$1@dont-email.me> <1462170452.1180117.595306673.68B64F02@webmail.messagingengine.com> <mailman.308.1462170455.32212.python-list@python.org> |
On 5/2/2016 2:27 AM, Stephen Hansen wrote:
> On Sun, May 1, 2016, at 10:59 PM, DFS wrote:
>> startTime = time.clock()
>> for i in range(loops):
>>     r = urllib2.urlopen(webpage)
>>     f = open(webfile,"w")
>>     f.write(r.read())
>>     f.close
>> endTime = time.clock()
>> print "Finished urllib2 in %.2g seconds" %(endTime-startTime)
>
> Yeah on my system I get 1.8 out of this, amounting to 0.18s.

You get 1.8 seconds total for the 10 loops? That's less than half as fast as my results. Surprising.

> I'm again going back to the point of: its fast enough. When comparing
> two small numbers, "twice as slow" is meaningless.

Speed is always meaningful.

I know python is relatively slow, but it's a cool, concise, powerful language. I'm extremely impressed by how tight the code can get.

> You have an assumption you haven't answered, that downloading a 10 meg
> file will be twice as slow as downloading this tiny file. You haven't
> proven that at all.

True. And it has been my assumption - tho not with 10MB file.

> I suspect you have a constant overhead of X, and in this toy example,
> that makes it seem twice as slow. But when downloading a file of size,
> you'll have the same constant factor, at which point the difference is
> irrelevant.

Good point. Test below.

> If you believe otherwise, demonstrate it.

http://www.usdirectory.com/ypr.aspx?fromform=qsearch&qs=ga&wqhqn=2&qc=Atlanta&rg=30&qhqn=restaurant&sb=zipdisc&ap=2

It's a 58854 byte file when saved to disk (smaller file was 3546 bytes), so this is 16.6x larger.

So I would expect python to linearly run in 16.6 * 0.88 = 14.6 seconds.
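For anyone who wants to check the arithmetic behind that estimate, here is a minimal sketch. The byte counts and the 0.88 figure are the ones quoted above; the function name `linear_estimate` is made up for illustration, and the model deliberately assumes zero fixed per-request overhead, which is exactly the assumption being questioned in this thread.

```python
# Figures taken from the post; 0.88 is assumed to be the small-page timing.
SMALL_BYTES = 3546
LARGE_BYTES = 58854
SMALL_SECONDS = 0.88


def linear_estimate(small_bytes, large_bytes, small_seconds):
    """Scale the small-page time by the size ratio.

    Assumes time is proportional to size, i.e. no constant overhead
    (connection setup, DNS lookup, server latency).
    """
    ratio = large_bytes / small_bytes
    return ratio, ratio * small_seconds


ratio, estimate = linear_estimate(SMALL_BYTES, LARGE_BYTES, SMALL_SECONDS)
print("size ratio: %.1fx" % ratio)
print("linear estimate: %.1f seconds" % estimate)
```

If a real run comes in well under this number, that points toward a fixed per-request cost dominating the small-page timing rather than the transfer itself.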
10 loops per run

1st run
$ python timeGetHTML.py
Finished urllib in 8.5 seconds
Finished urllib2 in 5.6 seconds
Finished requests in 7.8 seconds
Finished pycurl in 6.5 seconds

wait a couple minutes, then 2nd run
$ python timeGetHTML.py
Finished urllib in 5.6 seconds
Finished urllib2 in 5.7 seconds
Finished requests in 5.2 seconds
Finished pycurl in 6.4 seconds

It's a little more than 1/3 of my estimate - so good news.

(when I was doing these tests, some of the python results were 0.75 seconds - way too fast, so I checked and no data was written to file, and I couldn't even open the webpage with a browser. Looks like I had been temporarily blocked from the site. After a couple minutes, I was able to access it again).

I noticed urllib and curl returned the html as is, but urllib2 and requests added enhancements that should make the data easier to parse.

Based on speed and functionality and documentation, I believe I'll be using the requests HTTP library (I will actually be doing a small amount of web scraping).

VBScript
1st run: 7.70 seconds
2nd run: 5.38
3rd run: 7.71

So python matches or beats VBScript at this much larger file. Kewl.
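One way to guard against the "way too fast because nothing was downloaded" runs is to have the timing loop refuse to count a round that returns too little data. The following is only a sketch, not the timeGetHTML.py script from this thread: it is written for Python 3 (the original is Python 2), the names `time_fetch`, `fetch_with_urllib` and `min_bytes` are hypothetical, and it uses `time.perf_counter()` where the original used `time.clock()`. It also uses a `with` block, since the quoted loop references `f.close` without calling it.

```python
import time


def time_fetch(fetch, url, path, loops=10, min_bytes=1):
    """Time `loops` download-and-save rounds.

    `fetch` is any callable that takes a URL and returns the body as bytes.
    Raises ValueError if a round returns fewer than `min_bytes` bytes, so a
    blocked or empty response cannot pass as a fast result.
    """
    start = time.perf_counter()
    for _ in range(loops):
        data = fetch(url)
        if len(data) < min_bytes:
            raise ValueError(
                "got %d bytes, expected at least %d" % (len(data), min_bytes)
            )
        with open(path, "wb") as f:  # the with block closes the file each round
            f.write(data)
    return time.perf_counter() - start


def fetch_with_urllib(url):
    """Hypothetical fetcher built on urllib.request, Python 3's successor to urllib2."""
    from urllib.request import urlopen

    with urlopen(url) as response:
        return response.read()
```

A call might look like `time_fetch(fetch_with_urllib, webpage, webfile, min_bytes=50000)`, with `webpage` and `webfile` as in the quoted loop. Passing the fetcher in as a parameter keeps the timing and checking code identical across libraries, so only the download call differs between runs.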