Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #35262
| Path | csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!eweka.nl!lightspeed.eweka.nl!194.134.4.91.MISMATCH!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <msirenef@lightbird.net> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.000 |
| X-Spam-Evidence | '*H*': 1.00; '*S*': 0.00; 'attribute': 0.05; 'diff': 0.05; 'executed': 0.07; 'list?': 0.07; 'matches': 0.07; 'subject:code': 0.07; 'tool,': 0.07; 'works.': 0.07; 'subject:help': 0.07; 'python': 0.09; '(1,': 0.09; 'dict': 0.09; 'exists,': 0.09; 'keyed': 0.09; 'logic': 0.09; 'loop.': 0.09; 'matched': 0.09; 'optimizing': 0.09; 'python:': 0.09; 'rows': 0.09; 'def': 0.10; 'value.': 0.15; '(1),': 0.16; 'better:': 0.16; 'clues': 0.16; 'dictionary.': 0.16; 'keys.': 0.16; 'loops': 0.16; 'parts.': 0.16; 'received:74.55.86': 0.16; 'received:74.55.86.74': 0.16; 'received:smtp.webfaction.com': 0.16; 'received:webfaction.com': 0.16; 'row': 0.16; 'run.': 0.16; 'set,': 0.16; 'subject:making': 0.16; 'tool.': 0.16; 'top-level': 0.16; 'tup': 0.16; 'string': 0.17; 'wrote:': 0.17; 'instance': 0.17; 'items.': 0.17; 'string,': 0.17; 'typical': 0.17; '>>>': 0.18; 'code,': 0.18; 'thanks.': 0.21; 'fixing': 0.22; 'object.': 0.22; 'simpler': 0.22; 'tuples': 0.22; 'elements': 0.23; 'matching': 0.23; 'specified': 0.23; 'statement': 0.23; 'this:': 0.23; 'header:In-Reply-To:1': 0.25; 'header:User-Agent:1': 0.26; '(which': 0.26; 'select': 0.26; 'question': 0.27; 'convention': 0.27; 'all.': 0.28; 'run': 0.28; 'record': 0.28; '>>>>': 0.29; 'changes:': 0.29; 'comparison': 0.29; 'long.': 0.29; 'way?': 0.29; 'no,': 0.29; 'probably': 0.29; "i'm": 0.29; 'that.': 0.30; 'e.g.': 0.30; 'query': 0.30; 'thursday,': 0.30; 'function': 0.30; 'code': 0.31; 'gets': 0.32; 'december': 0.32; 'not.': 0.32; 'could': 0.32; 'message.': 0.33; 'function.': 0.33; 'anyone': 0.33; 'to:addr :python-list': 0.33; 'likely': 0.33; 'themselves': 0.33; 'times.': 0.33; 'skip:d 20': 0.34; 'version': 0.34; 'self': 0.34; 'list': 0.35; 'false': 0.35; 'faster': 0.35; 'doing': 0.35; 'pm,': 0.35; 'too.': 0.35; 'there': 0.35; 'list.': 0.35; 'really': 0.36; 'tool': 0.36; 'but': 0.36; 'anything': 0.36; 'subject:with': 0.36; 'should': 0.36; 'too': 0.36; 'enough': 0.36; 'does': 0.37; 'why': 0.37; 'item': 0.37; 'far': 0.37; 'data': 0.37; 'subject:: ': 0.38; 'easier': 0.38; 'mean': 0.38; 'some': 0.38; 'performance': 0.39; 'to:addr:python.org': 0.39; 'takes': 0.39; 'received:192': 0.39; 'called': 0.39; 'map': 0.61; 'first': 0.61; 'interest': 0.62; 'time,': 0.62; 'email addr:gmail.com': 0.63; 'more': 0.63; 'within': 0.64; 'making': 0.64; 'here': 0.65; '20,': 0.65; 'want,': 0.65; 'hours': 0.66; 'hour': 0.69; 'cut': 0.71; 'sounds': 0.71; 'actually,': 0.84; 'column.': 0.84; 'correctly?': 0.84; 'experiment': 0.84; 'judicious': 0.84; 'nearby': 0.84; '360': 0.91; 'faster.': 0.91; 'fragment': 0.91; 'angel': 0.93 |
| Date | Thu, 20 Dec 2012 21:49:43 -0500 |
| From | Mitya Sirenef <msirenef@lightbird.net> |
| User-Agent | Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121011 Thunderbird/16.0.1 |
| MIME-Version | 1.0 |
| To | python-list@python.org |
| Subject | Re: help with making my code more efficient |
| References | <bd5ae6b7-2440-42e4-a93c-eb877feebcfe@googlegroups.com> <mailman.1126.1356052648.29569.python-list@python.org> <d6aaa5b5-7d21-4018-ba9a-ea354b15b6c5@googlegroups.com> <50D3CBCD.4080203@lightbird.net> |
| In-Reply-To | <50D3CBCD.4080203@lightbird.net> |
| Content-Type | text/plain; charset=ISO-8859-1; format=flowed |
| Content-Transfer-Encoding | 7bit |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1132.1356058187.29569.python-list@python.org> (permalink) |
| Lines | 145 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1356058187 news.xs4all.nl 6889 [2001:888:2000:d::a6]:42865 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:35262 |
Show key headers only | View raw
On 12/20/2012 09:39 PM, Mitya Sirenef wrote:
> On 12/20/2012 08:46 PM, Larry.Martell@gmail.com wrote:
>> On Thursday, December 20, 2012 6:17:04 PM UTC-7, Dave Angel wrote:
>>> On 12/20/2012 07:19 PM, Larry.Martell@gmail.com wrote:
>>>
>>>> I have a list of tuples that contains a tool_id, a time, and a
>>>> message. I want to select from this list all the elements where the
>>>> message matches some string, and all the other elements where the
>>>> time is within some diff of any matching message for that tool.
>>>> Here is how I am currently doing this:
>>> No, it's not. This is a fragment of code, without enough clues as to
>>>
>>> what else is going. We can guess, but that's likely to make a mess.
>> Of course it's a fragment - it's part of a large program and I was
>> just showing the relevant parts.
>>
>>> First question is whether this code works exactly correctly?
>> Yes, the code works. I end up with just the rows I want.
>>
>>> Are you only concerned about speed, not fixing features?
>> Don't know what you mean by 'fixing features'. The code does what I
>> want, it just takes too long.
>>
>>> As far as I can tell, the logic that includes the time comparison is
>>> bogus.
>> Not at all.
>>
>>> You don't do anything there to worry about the value of tup[2],
>>> just whether some
>>> item has a nearby time. Of course, I could misunderstand the spec.
>> The data comes from a database. tup[2] is a datetime column. tdiff
>> comes from a datetime.timedelta()
>>
>>> Are you making a global called 'self' ? That name is by convention only
>>> used in methods to designate the instance object. What's the attribute
>>> self?
>> Yes, self is my instance object. self.message contains the string of
>> interest that I need to look for.
>>
>>> Can cdata have duplicates, and are they significant?
>> No, it will not have duplicates.
>>
>>> Are you including the time building that as part of your 2 hour
>>> measurement?
>> No, the 2 hours is just the time to run the
>>
>> cdata[:] = [tup for tup in cdata if determine(tup)]
>>
>>> Is the list sorted in any way?
>> Yes, the list is sorted by tool and datetime.
>>
>>> Chances are your performance bottleneck is the doubly-nested loop. You
>>> have a list comprehension at top-level code, and inside it calls a
>>> function that also loops over the 600,000 items. So the inner loop
>>> gets
>>> executed 360 billion times. You can cut this down drastically by some
>>> judicious sorting, as well as by having a map of lists, where the
>>> map is
>>> keyed by the tool.
>> Thanks. I will try that.
>>
>>>> # record time for each message matching the specified message for
>>>> each tool
>>>> messageTimes = {}
>>> You're building a dictionary; are you actually using the value (1), or
>>> is only the key relevant?
>> Only the keys.
>>
>>> A set is a dict without a value.
>> Yes, I could use a set, but I don't think that would make it
>> measurably faster.
>>
>>> But more mportantly, you never look up anything in this dictionary.
>>> So why
>>> isn't it a list? For that matter, why don't you just use the
>>> messageTimes list?
>> Yes, it could be a list too.
>>>> for row in cdata: # tool, time, message
>>>> if self.message in row[2]:
>>>> messageTimes[row[0], row[1]] = 1
>>>> # now pull out each message that is within the time diff for each
>>>> matched message
>>>> # as well as the matched messages themselves
>>>> def determine(tup):
>>>> if self.message in tup[2]: return True # matched message
>>>> for (tool, date_time) in messageTimes:
>>>> if tool == tup[0]:
>>>> if abs(date_time-tup[1]) <= tdiff:
>>>> return True
>>>> return False
>>>> cdata[:] = [tup for tup in cdata if determine(tup)]
>>>
>>>
>>> As the code exists, there's no need to copy the list. Just do a simple
>>> bind.
>> This statement is to remove the items from cdata that I don't want. I
>> don't know what you mean by bind. I'm not familiar with that python
>> function.
>>
>>>
>>>
>>>> This code works, but it takes way too long to run - e.g. when cdata
>>>> has 600,000 elements (which is typical for my app) it takes 2 hours
>>>> for this to run.
>>>> Can anyone give me some suggestions on speeding this up?
>
>
> This code probably is not faster but it's simpler and may be easier
> for you to work with
> to experiment with speed-improving changes:
>
>
> diffrng = 1
>
> L = [
> # id, time, string
> (1, 5, "ok"),
> (1, 6, "ok"),
> (1, 7, "no"),
> (1, 8, "no"),
> ]
>
> match_times = [t[1] for t in L if "ok" in t[2]]
>
> def in_range(timeval):
> return bool( min([abs(timeval-v) for v in match_times]) <= diffrng )
>
> print([t for t in L if in_range(t[1])])
>
>
> But it really sounds like you could look into optimizing the db
> query and db indexes, etc.
>
>
Actually, it might be slower.. this version of in_range should be better:
def in_range(timeval):
return any( abs(timeval-v) <= diffrng for v in match_times )
--
Lark's Tongue Guide to Python: http://lightbird.net/larks/
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 16:19 -0800
Re: help with making my code more efficient Chris Angelico <rosuav@gmail.com> - 2012-12-21 11:38 +1100
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 16:43 -0800
Re: help with making my code more efficient Chris Angelico <rosuav@gmail.com> - 2012-12-21 13:56 +1100
Re: help with making my code more efficient Roy Smith <roy@panix.com> - 2012-12-20 22:30 -0500
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 16:43 -0800
Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-20 20:17 -0500
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 17:46 -0800
Re: help with making my code more efficient Mitya Sirenef <msirenef@lightbird.net> - 2012-12-20 21:39 -0500
Re: help with making my code more efficient Mitya Sirenef <msirenef@lightbird.net> - 2012-12-20 21:49 -0500
Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-20 22:31 -0500
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 09:57 -0800
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 09:57 -0800
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 12:36 -0800
Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-21 22:19 -0500
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 20:47 -0800
Re: help with making my code more efficient Dave Angel <d@davea.name> - 2012-12-22 01:47 -0500
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-24 09:57 -0800
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-24 09:57 -0800
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 20:47 -0800
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-21 12:36 -0800
Re: help with making my code more efficient "Larry.Martell@gmail.com" <Larry.Martell@gmail.com> - 2012-12-20 17:46 -0800
Re: help with making my code more efficient MRAB <python@mrabarnett.plus.com> - 2012-12-21 02:08 +0000
csiph-web