From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Tips on Speeding up Python Execution
Date: Fri, 8 Apr 2011 17:25:20 +1000

On Fri, Apr 8, 2011 at 5:04 PM, Abhijeet Mahagaonkar wrote:
> I was able to isolate that major chunk of run time is eaten up in opening a
> webpages, reading from them and extracting text.
> I wanted to know if there is a way to concurrently calling the functions.

So, to clarify: you have code that's loading lots of separate pages, and the time is spent waiting for the internet? If you're saturating your connection, then this won't help, but if they're all small pages and they're coming over the internet, then yes, you certainly CAN fetch them concurrently. As the Perl folks say, There's More Than One Way To Do It; one is to spawn a thread for each request, then collect up all the results at the end.
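
A minimal sketch of the thread-per-request idea, not a definitive implementation. The names fetch_page and fetch_all are made up for illustration, and the code assumes Python 3's urllib.request (on the Python 2 of this thread, urllib2.urlopen plays the same role). Each URL gets its own thread, and the results are collected into a dict once every thread has been joined.

```python
import threading
import urllib.request


def fetch_page(url, timeout=10):
    """Download one page and return its body as bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch=fetch_page):
    """Run fetch(url) in one thread per URL; return {url: body or exception}."""
    results = {}
    lock = threading.Lock()

    def worker(url):
        try:
            outcome = fetch(url)
        except Exception as exc:
            # Store the failure so it is not lost inside the thread.
            outcome = exc
        with lock:
            results[url] = outcome

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        # Wait for every request before handing back the results.
        thread.join()
    return results
```

Calling fetch_all(["http://example.com/a", "http://example.com/b"]) (placeholder URLs) returns a dict keyed by URL, where each value is either the page body or the exception raised while fetching it. One thread per URL is fine for a handful of small pages; for hundreds of URLs you would probably want to cap the number of threads.
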
Look up the 'threading' module for details:

http://docs.python.org/library/threading.html

It should also be possible to directly use asynchronous I/O and select(), but I couldn't see a way to do that with urllib/urllib2. If you're using sockets directly, this ought to be an option.

I don't know what's the most Pythonesque option, but if you already have specific Python code for each of your functions, it's probably going to be easiest to spawn threads for them all.

Chris Angelico
Threading fan ever since he met OS/2 in 1993 or so