Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!ecngs!feeder2.ecngs.de!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.001 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'attribute': 0.05; 'diff': 0.05; 'executed': 0.07; 'list?': 0.07; 'matches': 0.07; 'subject:code': 0.07; 'tool,': 0.07; 'subject:help': 0.07; 'dict': 0.09; 'exists,': 0.09; 'keyed': 0.09; 'logic': 0.09; 'loop.': 0.09; 'matched': 0.09; 'def': 0.10; 'value.': 0.15; '(1),': 0.16; 'clues': 0.16; 'dictionary.': 0.16; 'loops': 0.16; 'row': 0.16; 'run.': 0.16; 'subject:making': 0.16; 'tool.': 0.16; 'top-level': 0.16; 'tup': 0.16; 'wrote:': 0.17; 'instance': 0.17; 'items.': 0.17; 'string,': 0.17; 'typical': 0.17; 'code,': 0.18; 'fixing': 0.22; 'object.': 0.22; 'tuples': 0.22; 'elements': 0.23; 'matching': 0.23; 'specified': 0.23; 'this:': 0.23; 'header:In- Reply-To:1': 0.25; 'header:User-Agent:1': 0.26; '(which': 0.26; 'select': 0.26; 'question': 0.27; 'convention': 0.27; 'run': 0.28; 'record': 0.28; 'comparison': 0.29; 'way?': 0.29; 'no,': 0.29; 'e.g.': 0.30; 'function': 0.30; 'code': 0.31; 'gets': 0.32; 'not.': 0.32; 'could': 0.32; 'message.': 0.33; 'anyone': 0.33; 'to:addr:python-list': 0.33; 'likely': 0.33; 'themselves': 0.33; 'times.': 0.33; 'list': 0.35; 'false': 0.35; 'doing': 0.35; 'pm,': 0.35; 'there': 0.35; 'list.': 0.35; 'tool': 0.36; 'but': 0.36; 'anything': 0.36; 'subject:with': 0.36; 'too': 0.36; 'enough': 0.36; 'why': 0.37; 'item': 0.37; 'far': 0.37; 'subject:: ': 0.38; 'some': 0.38; 'performance': 0.39; 'to:addr:python.org': 0.39; 'takes': 0.39; 'received:192': 0.39; 'called': 0.39; 'where': 0.40; 'received:192.168': 0.40; 'your': 0.60; 'map': 0.61; 'first': 0.61; 'time,': 0.62; 'email addr:gmail.com': 0.63; 'more': 0.63; 'within': 0.64; 'making': 0.64; 'here': 0.65; 'importantly,': 0.65; 'hours': 0.66; 'header:Reply-To:1': 0.68; 'hour': 0.69; 'cut': 0.71; 'received:74.208': 0.71; 'reply-to:no real name:2**0': 0.72; 'correctly?': 0.84; 'judicious': 0.84; 'nearby': 0.84; '360': 0.91; 'fragment': 0.91 Date: Thu, 20 Dec 2012 20:17:04 -0500 From: Dave Angel User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 MIME-Version: 1.0 To: python-list@python.org Subject: Re: help with making my code more efficient References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Provags-ID: V02:K0:2X/HtjaDbhHvl1J5yJhvYhjWuX4keU2KGwRPcoLQqgP 1lxW+7DlAOU5DZW4kBTClARUgdjiut7kM/5fEBuBC6NG1Q5/Ez A76d82JuWFqsnM8YHmKdVx76cl3GUa8A1C4poSaNSTbS7BJmdG 0x84ZghKpemAT2R8PFKNxuN/DGsuwqWwxCYxC4DQ5MYaPgYOHV j7AChImZjc2OZ/SZAWAJ0I3IVySVMPr2SyaezZEFlwDZuQncoL bDjMGCxYTo0K1zgHRd36ya7hDdk02xgQ0G3Mec5pyH1o7KaJfE GSefQw3o/lgt+FU6twqDPoyf4N+V31T70wbzZIvAXXzvtu8Xw= = X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: d@davea.name List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 73 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1356052648 news.xs4all.nl 6940 [2001:888:2000:d::a6]:36108 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:35252 On 12/20/2012 07:19 PM, Larry.Martell@gmail.com wrote: > I have a list of tuples that contains a tool_id, a time, and a message. I want to select from this list all the elements where the message matches some string, and all the other elements where the time is within some diff of any matching message for that tool. > > Here is how I am currently doing this: No, it's not. This is a fragment of code, without enough clues as to what else is going. We can guess, but that's likely to make a mess. First question is whether this code works exactly correctly? Are you only concerned about speed, not fixing features? As far as I can tell, the logic that includes the time comparison is bogus. You don't do anything there to worry about the value of tup[2], just whether some item has a nearby time. Of course, I could misunderstand the spec. Are you making a global called 'self' ? That name is by convention only used in methods to designate the instance object. What's the attribute self? Can cdata have duplicates, and are they significant? Are you including the time building that as part of your 2 hour measurement? Is the list sorted in any way? Chances are your performance bottleneck is the doubly-nested loop. You have a list comprehension at top-level code, and inside it calls a function that also loops over the 600,000 items. So the inner loop gets executed 360 billion times. You can cut this down drastically by some judicious sorting, as well as by having a map of lists, where the map is keyed by the tool. > > # record time for each message matching the specified message for each tool > messageTimes = {} You're building a dictionary; are you actually using the value (1), or is only the key relevant? A set is a dict without a value. But more importantly, you never look up anything in this dictionary. So why isn't it a list? For that matter, why don't you just use the messageTimes list? > for row in cdata: # tool, time, message > if self.message in row[2]: > messageTimes[row[0], row[1]] = 1 > > # now pull out each message that is within the time diff for each matched message > # as well as the matched messages themselves > > def determine(tup): > if self.message in tup[2]: return True # matched message > > for (tool, date_time) in messageTimes: > if tool == tup[0]: > if abs(date_time-tup[1]) <= tdiff: > return True > > return False > > cdata[:] = [tup for tup in cdata if determine(tup)] As the code exists, there's no need to copy the list. Just do a simple bind. > > This code works, but it takes way too long to run - e.g. when cdata has 600,000 elements (which is typical for my app) it takes 2 hours for this to run. > > Can anyone give me some suggestions on speeding this up? > > TIA! -- DaveA