From: Ben Finney
Newsgroups: comp.lang.python
Subject: Re: Fastest way to retrieve and write html contents to file
Date: Mon, 02 May 2016 14:49:57 +1000

DFS writes:

> Then I tested them in loops - the VBScript is MUCH faster: 0.44 for 10
> iterations, vs 0.88 for python.
>
> […]
>
> urllib2 and requests were about the same speed as urllib.urlretrieve,
> while pycurl was significantly slower (1.2 seconds).

Network access is notoriously erratic in its timing. The program, and
the machine on which it runs, are subject to a great many external
effects once the request is sent, effects which will significantly
alter the delay before a response is completed.

How have you controlled for the wide variability in the duration, for
even a given request by the *same code on the same machine*, at
different points in time?

One simple way to do that: Run the exact same test many times (say,
10 000 or so) on the same machine, and then compute the average of all
the durations. Do the same for each different program, and then you may
have more meaningfully comparable measurements.
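
To make the "run it many times and average" suggestion concrete, here is a minimal sketch in Python 3. The names `measure`, `summarize` and `placeholder_task` are made up for illustration and do not come from the thread. The placeholder does a small computation so the sketch runs without a network; the real fetch-and-write code under test would be passed in instead. The median and standard deviation are reported next to the mean as an extra, since a few slow responses can pull an average a long way.

```python
import statistics
import time


def measure(task, repeats=1000):
    """Call task() `repeats` times; return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to a few summary statistics."""
    return {
        "runs": len(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        # stdev needs at least two samples
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


def placeholder_task():
    # Hypothetical stand-in so the sketch runs offline; replace it with
    # the fetch-and-write code being compared.
    sum(range(1000))


if __name__ == "__main__":
    print(summarize(measure(placeholder_task, repeats=1000)))
```

Running the same harness once per candidate program, on the same machine, gives summaries that can be compared side by side. A large stdev relative to the mean is a hint that the number of runs is still too small to trust the difference.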