Newsgroups: comp.lang.python
Date: Thu, 20 Dec 2012 16:43:34 -0800 (PST)
Subject: Re: help with making my code more efficient
From: "Larry.Martell@gmail.com"
To: comp.lang.python@googlegroups.com

On Thursday, December 20, 2012 5:38:03 PM UTC-7, Chris Angelico wrote:
> On Fri, Dec 21, 2012 at 11:19 AM, Larry.Martell@gmail.com wrote:
> > This code works, but it takes way too long to run - e.g. when cdata has 600,000 elements (which is typical for my app) it takes 2 hours for this to run.
> >
> > Can anyone give me some suggestions on speeding this up?
>
> It sounds like you may have enough data to want to not keep it all in
> memory. Have you considered switching to a database? You could then
> execute SQL queries against it.

It came from a database. Originally I was getting just the data I wanted using SQL, but that was taking too long as well. I was selecting just the messages I wanted, then for each one of those running another query to get the data within the time diff of that message. That resulted in tens of thousands of queries. So I changed it to pull all the potential matches at once and then process them in Python.
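
The original code is not shown in this message, so what follows is only a rough sketch of the "pull all the potential matches at once, then process in Python" idea, not the poster's actual code. It assumes each message and each data row carries a numeric timestamp; the names `match_within_window`, `messages`, `rows`, and `max_diff` are hypothetical. The rows are sorted once, and the standard-library `bisect` module then finds the rows within the time diff of each message by binary search rather than by rescanning the whole list.

```python
from bisect import bisect_left, bisect_right


def match_within_window(messages, rows, max_diff):
    """Map each message id to the rows whose timestamp is within max_diff of it.

    messages: iterable of (msg_id, timestamp) pairs   (hypothetical layout)
    rows:     iterable of (timestamp, payload) pairs  (hypothetical layout)
    max_diff: maximum allowed absolute time difference, same units as timestamps
    """
    # Sort the candidate rows by timestamp once, up front.
    rows = sorted(rows, key=lambda r: r[0])
    times = [r[0] for r in rows]

    result = {}
    for msg_id, ts in messages:
        # Binary search for the first and last row inside [ts - max_diff, ts + max_diff].
        lo = bisect_left(times, ts - max_diff)
        hi = bisect_right(times, ts + max_diff)
        result[msg_id] = rows[lo:hi]
    return result


if __name__ == "__main__":
    messages = [("a", 10), ("b", 100)]
    rows = [(50, "z"), (8, "x"), (99, "w"), (12, "y")]
    for msg_id, matched in match_within_window(messages, rows, max_diff=2).items():
        print(msg_id, matched)
```

With this layout, each lookup costs a binary search plus the size of the matching slice, instead of one SQL query or one full pass over the data per message. Whether it helps in practice depends on how the real data is structured, which this message does not say.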