Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #35262

Re: help with making my code more efficient

Path csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!eweka.nl!lightspeed.eweka.nl!194.134.4.91.MISMATCH!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <msirenef@lightbird.net>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.000
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'attribute': 0.05; 'diff': 0.05; 'executed': 0.07; 'list?': 0.07; 'matches': 0.07; 'subject:code': 0.07; 'tool,': 0.07; 'works.': 0.07; 'subject:help': 0.07; 'python': 0.09; '(1,': 0.09; 'dict': 0.09; 'exists,': 0.09; 'keyed': 0.09; 'logic': 0.09; 'loop.': 0.09; 'matched': 0.09; 'optimizing': 0.09; 'python:': 0.09; 'rows': 0.09; 'def': 0.10; 'value.': 0.15; '(1),': 0.16; 'better:': 0.16; 'clues': 0.16; 'dictionary.': 0.16; 'keys.': 0.16; 'loops': 0.16; 'parts.': 0.16; 'received:74.55.86': 0.16; 'received:74.55.86.74': 0.16; 'received:smtp.webfaction.com': 0.16; 'received:webfaction.com': 0.16; 'row': 0.16; 'run.': 0.16; 'set,': 0.16; 'subject:making': 0.16; 'tool.': 0.16; 'top-level': 0.16; 'tup': 0.16; 'string': 0.17; 'wrote:': 0.17; 'instance': 0.17; 'items.': 0.17; 'string,': 0.17; 'typical': 0.17; '>>>': 0.18; 'code,': 0.18; 'thanks.': 0.21; 'fixing': 0.22; 'object.': 0.22; 'simpler': 0.22; 'tuples': 0.22; 'elements': 0.23; 'matching': 0.23; 'specified': 0.23; 'statement': 0.23; 'this:': 0.23; 'header:In-Reply-To:1': 0.25; 'header:User-Agent:1': 0.26; '(which': 0.26; 'select': 0.26; 'question': 0.27; 'convention': 0.27; 'all.': 0.28; 'run': 0.28; 'record': 0.28; '>>>>': 0.29; 'changes:': 0.29; 'comparison': 0.29; 'long.': 0.29; 'way?': 0.29; 'no,': 0.29; 'probably': 0.29; "i'm": 0.29; 'that.': 0.30; 'e.g.': 0.30; 'query': 0.30; 'thursday,': 0.30; 'function': 0.30; 'code': 0.31; 'gets': 0.32; 'december': 0.32; 'not.': 0.32; 'could': 0.32; 'message.': 0.33; 'function.': 0.33; 'anyone': 0.33; 'to:addr :python-list': 0.33; 'likely': 0.33; 'themselves': 0.33; 'times.': 0.33; 'skip:d 20': 0.34; 'version': 0.34; 'self': 0.34; 'list': 0.35; 'false': 0.35; 'faster': 0.35; 'doing': 0.35; 'pm,': 0.35; 'too.': 0.35; 'there': 0.35; 'list.': 0.35; 'really': 0.36; 'tool': 0.36; 'but': 0.36; 'anything': 0.36; 'subject:with': 0.36; 'should': 0.36; 'too': 0.36; 'enough': 0.36; 'does': 0.37; 'why': 0.37; 'item': 0.37; 'far': 0.37; 'data': 0.37; 'subject:: ': 0.38; 'easier': 0.38; 'mean': 0.38; 'some': 0.38; 'performance': 0.39; 'to:addr:python.org': 0.39; 'takes': 0.39; 'received:192': 0.39; 'called': 0.39; 'map': 0.61; 'first': 0.61; 'interest': 0.62; 'time,': 0.62; 'email addr:gmail.com': 0.63; 'more': 0.63; 'within': 0.64; 'making': 0.64; 'here': 0.65; '20,': 0.65; 'want,': 0.65; 'hours': 0.66; 'hour': 0.69; 'cut': 0.71; 'sounds': 0.71; 'actually,': 0.84; 'column.': 0.84; 'correctly?': 0.84; 'experiment': 0.84; 'judicious': 0.84; 'nearby': 0.84; '360': 0.91; 'faster.': 0.91; 'fragment': 0.91; 'angel': 0.93
Date Thu, 20 Dec 2012 21:49:43 -0500
From Mitya Sirenef <msirenef@lightbird.net>
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121011 Thunderbird/16.0.1
MIME-Version 1.0
To python-list@python.org
Subject Re: help with making my code more efficient
References <bd5ae6b7-2440-42e4-a93c-eb877feebcfe@googlegroups.com> <mailman.1126.1356052648.29569.python-list@python.org> <d6aaa5b5-7d21-4018-ba9a-ea354b15b6c5@googlegroups.com> <50D3CBCD.4080203@lightbird.net>
In-Reply-To <50D3CBCD.4080203@lightbird.net>
Content-Type text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding 7bit
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.1132.1356058187.29569.python-list@python.org> (permalink)
Lines 145
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1356058187 news.xs4all.nl 6889 [2001:888:2000:d::a6]:42865
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:35262

Show key headers only | View raw


On 12/20/2012 09:39 PM, Mitya Sirenef wrote:
> On 12/20/2012 08:46 PM, Larry.Martell@gmail.com wrote:
>> On Thursday, December 20, 2012 6:17:04 PM UTC-7, Dave Angel wrote:
>>> On 12/20/2012 07:19 PM, Larry.Martell@gmail.com wrote:
>>>
>>>> I have a list of tuples that contains a tool_id, a time, and a 
>>>> message. I want to select from this list all the elements where the 
>>>> message matches some string, and all the other elements where the 
>>>> time is within some diff of any matching message for that tool.
>>>> Here is how I am currently doing this:
>>> No, it's not.  This is a fragment of code, without enough clues as to
>>>
>>> what else is going.  We can guess, but that's likely to make a mess.
>> Of course it's a fragment - it's part of a large program and I was 
>> just showing the relevant parts.
>>
>>> First question is whether this code works exactly correctly?
>> Yes, the code works. I end up with just the rows I want.
>>
>>> Are you only concerned about speed, not fixing features?
>> Don't know what you mean by 'fixing features'. The code does what I 
>> want, it just takes too long.
>>
>>> As far as I can tell, the logic that includes the time comparison is 
>>> bogus.
>> Not at all.
>>
>>> You don't do  anything there to worry about the value of tup[2], 
>>> just whether some
>>> item has a nearby time.  Of course, I could misunderstand the spec.
>> The data comes from a database. tup[2] is a datetime column. tdiff 
>> comes from a datetime.timedelta()
>>
>>> Are you making a global called 'self' ? That name is by convention only
>>> used in methods to designate the instance object.  What's the attribute
>>> self?
>> Yes, self is my instance object. self.message contains the string of 
>> interest that I need to look for.
>>
>>> Can cdata have duplicates, and are they significant?
>> No, it will not have duplicates.
>>
>>> Are you including  the time building that as part of your 2 hour 
>>> measurement?
>> No, the 2 hours is just the time to run the
>>
>> cdata[:] = [tup for tup in cdata if determine(tup)]
>>
>>> Is the list sorted in any way?
>> Yes, the list is sorted by tool and datetime.
>>
>>> Chances are your performance bottleneck is the doubly-nested loop.  You
>>> have a list comprehension at top-level code, and inside it calls a
>>> function that also loops over the 600,000 items.  So the inner loop 
>>> gets
>>> executed 360 billion times.  You can cut this down drastically by some
>>> judicious sorting, as well as by having a map of lists, where the 
>>> map is
>>> keyed by the tool.
>> Thanks. I will try that.
>>
>>>> # record time for each message matching the specified message for 
>>>> each tool
>>>> messageTimes = {}
>>> You're building a dictionary;  are you actually using the value (1), or
>>>   is only the key relevant?
>> Only the keys.
>>
>>> A set is a dict without a value.
>> Yes, I could use a set, but I don't think that would make it 
>> measurably faster.
>>
>>> But more mportantly, you never look up anything in this dictionary.  
>>> So why
>>> isn't it a list?  For that matter, why don't you just use the
>>> messageTimes list?
>> Yes, it could be a list too.
>>>> for row in cdata:   # tool, time, message
>>>>      if self.message in row[2]:
>>>>          messageTimes[row[0], row[1]] = 1
>>>> # now pull out each message that is within the time diff for each 
>>>> matched message
>>>> # as well as the matched messages themselves
>>>> def determine(tup):
>>>>      if self.message in tup[2]: return True      # matched message
>>>>      for (tool, date_time) in messageTimes:
>>>>          if tool == tup[0]:
>>>>              if abs(date_time-tup[1]) <= tdiff:
>>>>                 return True
>>>>      return False
>>>>          cdata[:] = [tup for tup in cdata if determine(tup)]
>>>
>>>
>>> As the code exists, there's no need to copy the list.  Just do a simple
>>> bind.
>> This statement is to remove the items from cdata that I don't want. I 
>> don't know what you mean by bind. I'm not familiar with that python 
>> function.
>>
>>>
>>>
>>>> This code works, but it takes way too long to run - e.g. when cdata 
>>>> has 600,000 elements (which is typical for my app) it takes 2 hours 
>>>> for this to run.
>>>> Can anyone give me some suggestions on speeding this up?
>
>
> This code probably is not faster but it's simpler and may be easier 
> for you to work with
> to experiment with speed-improving changes:
>
>
> diffrng = 1
>
> L = [
>      # id, time, string
>      (1, 5, "ok"),
>      (1, 6, "ok"),
>      (1, 7, "no"),
>      (1, 8, "no"),
>      ]
>
> match_times = [t[1] for t in L if "ok" in t[2]]
>
> def in_range(timeval):
>     return bool( min([abs(timeval-v) for v in match_times]) <= diffrng )
>
> print([t for t in L if in_range(t[1])])
>
>
> But it really sounds like you could look into optimizing the db
> query and db indexes, etc.
>
>

Actually, it might be slower.. this version of in_range should be better:

def in_range(timeval):
     return any( abs(timeval-v) <= diffrng for v in match_times )



-- 
Lark's Tongue Guide to Python: http://lightbird.net/larks/

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 16:19 -0800
  Re: help with making my code more efficient Chris Angelico <rosuav@gmail.com> - 2012-12-21 11:38 +1100
    Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 16:43 -0800
      Re: help with making my code more efficient Chris Angelico <rosuav@gmail.com> - 2012-12-21 13:56 +1100
      Re: help with making my code more efficient Roy Smith <roy@panix.com> - 2012-12-20 22:30 -0500
    Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 16:43 -0800
  Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-20 20:17 -0500
    Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 17:46 -0800
      Re: help with making my code more efficient Mitya Sirenef <msirenef@lightbird.net> - 2012-12-20 21:39 -0500
      Re: help with making my code more efficient Mitya Sirenef <msirenef@lightbird.net> - 2012-12-20 21:49 -0500
      Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-20 22:31 -0500
        Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 09:57 -0800
        Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 09:57 -0800
          Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 12:36 -0800
            Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-21 22:19 -0500
              Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 20:47 -0800
                Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-22 01:47 -0500
                Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-24 09:57 -0800
                Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-24 09:57 -0800
              Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 20:47 -0800
          Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 12:36 -0800
    Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 17:46 -0800
  Re: help with making my code more efficient MRAB <python@mrabarnett.plus.com> - 2012-12-21 02:08 +0000

csiph-web