Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #12596 > unrolled thread

List comprehension timing difference.

Started byBart Kastermans <bkasterm@gmail.com>
First post2011-09-01 18:35 -0600
Last post2011-09-02 20:15 -0600
Articles 5 — 3 participants

Back to article view | Back to comp.lang.python


Contents

  List comprehension timing difference. Bart Kastermans <bkasterm@gmail.com> - 2011-09-01 18:35 -0600
    Re: List comprehension timing difference. MRAB <python@mrabarnett.plus.com> - 2011-09-02 02:12 +0100
      Re: List comprehension timing difference. Bart Kastermans <bkasterm@gmail.com> - 2011-09-02 07:54 -0600
        Re: List comprehension timing difference. ting@thsu.org - 2011-09-02 07:50 -0700
          Re: List comprehension timing difference. Bart Kastermans <bkasterm@gmail.com> - 2011-09-02 20:15 -0600

#12596 — List comprehension timing difference.

FromBart Kastermans <bkasterm@gmail.com>
Date2011-09-01 18:35 -0600
SubjectList comprehension timing difference.
Message-ID<87zkinkcht.fsf@gmail.com>
In the following code I create the graph with vertices
sgb-words.txt (the file of 5 letter words from the
stanford graphbase), and an edge if two words differ 
by one letter.  The two methods I wrote seem to me to 
likely perform the same computations, the list comprehension 
is faster though (281 seconds VS 305 seconds on my dell mini).

Is the right interpretation of this timing difference
that the comprehension is performed in the lower level
C code?

As this time I have no other conjecture about the cause.

---------------------------------------------------------
import time
import copy

data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())

def d (w1, w2):
    count = 0
    for idx in range(0,5):
        if w1[idx] != w2[idx]:
            count += 1
    return count

print "creating graph"
t0 = time.clock ()
graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a < b]
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

t0 = time.clock ()
graph2 = []
for i in range (0, len(data)):
    for j in range(0,len(data)):
        if d(data[i],data[j]) == 1 and i < j:
            graph2.append ([i,j])
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

[toc] | [next] | [standalone]


#12598

FromMRAB <python@mrabarnett.plus.com>
Date2011-09-02 02:12 +0100
Message-ID<mailman.676.1314925972.27778.python-list@python.org>
In reply to#12596
On 02/09/2011 01:35, Bart Kastermans wrote:
>
> In the following code I create the graph with vertices
> sgb-words.txt (the file of 5 letter words from the
> stanford graphbase), and an edge if two words differ
> by one letter.  The two methods I wrote seem to me to
> likely perform the same computations, the list comprehension
> is faster though (281 seconds VS 305 seconds on my dell mini).
>
> Is the right interpretation of this timing difference
> that the comprehension is performed in the lower level
> C code?
>
> As this time I have no other conjecture about the cause.
>
> ---------------------------------------------------------
> import time
> import copy
>
> data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())
>
> def d (w1, w2):
>      count = 0
>      for idx in range(0,5):
>          if w1[idx] != w2[idx]:
>              count += 1
>      return count
>
> print "creating graph"
> t0 = time.clock ()
> graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a<  b]
> t1 = time.clock ()
> print "took " + str (t1 - t0) + " seconds."
>
> t0 = time.clock ()
> graph2 = []
> for i in range (0, len(data)):
>      for j in range(0,len(data)):
>          if d(data[i],data[j]) == 1 and i<  j:
>              graph2.append ([i,j])
> t1 = time.clock ()
> print "took " + str (t1 - t0) + " seconds."

Are they actually equivalent? Does graph == graph2?

The first version (list comprehension) creates a list of pairs of
values:

     [a, b]

whereas the second version (for loops) creates a list of pairs of
indexes:

     [i, j]

The second version has subscripting ("data[i]" and "data[j]"), which
will slow it down.

[toc] | [prev] | [next] | [standalone]


#12642

FromBart Kastermans <bkasterm@gmail.com>
Date2011-09-02 07:54 -0600
Message-ID<87sjofjbh1.fsf@gmail.com>
In reply to#12598
MRAB <python@mrabarnett.plus.com> writes:

> On 02/09/2011 01:35, Bart Kastermans wrote:

>> graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a<  b]

>> graph2 = []
>> for i in range (0, len(data)):
>>      for j in range(0,len(data)):
>>          if d(data[i],data[j]) == 1 and i<  j:
>>              graph2.append ([i,j])

>
> Are they actually equivalent? Does graph == graph2?
>
> The first version (list comprehension) creates a list of pairs of
> values:
>
>     [a, b]
>
> whereas the second version (for loops) creates a list of pairs of
> indexes:
>
>     [i, j]
>
> The second version has subscripting ("data[i]" and "data[j]"), which
> will slow it down.

You are absolutely right.  I had changed the code from the
equivalent:

    graph2 = []
    for i in range (0, len(data)):
        for j in range(0,len(data)):
            if d(data[i],data[j]) == 1 and i < j:
                graph2.append ([data[i],data[j]])

But then also tried the equivalent

    for a in data:
        for b in data:
            if d(a,b) == 1 and a < b:
                graph2.append([a,b])

Which does away with the indexing, and is just about exactly as
fast as the list comprehension.


That'll teach me; I almost didn't ask the question thinking it might 
be silly.  And it was, but I thought it for the wrong reason.  I tell my
students there are no stupid questions, I should listen to myself
more when I do.  Thanks!

[toc] | [prev] | [next] | [standalone]


#12643

Fromting@thsu.org
Date2011-09-02 07:50 -0700
Message-ID<07906043-d732-43ca-8313-bee4746eca75@fi7g2000vbb.googlegroups.com>
In reply to#12642
On Sep 2, 9:54 am, Bart Kastermans <bkast...@gmail.com> wrote:
> if d(a,b) == 1 and a < b:

It will probably be faster if you reverse the evaluation order of that
expression.

if a<b and d(a,b)==1:

That way the d() function is called less than half the time. Of course
this assumes that a<b is a faster evaluation than d(a,b), but I think
that's true for your example.
--
// T.Hsu

[toc] | [prev] | [next] | [standalone]


#12686

FromBart Kastermans <bkasterm@gmail.com>
Date2011-09-02 20:15 -0600
Message-ID<87bov29xsg.fsf@gmail.com>
In reply to#12643
ting@thsu.org writes:

> On Sep 2, 9:54 am, Bart Kastermans <bkast...@gmail.com> wrote:
>> if d(a,b) == 1 and a < b:
>
> It will probably be faster if you reverse the evaluation order of that
> expression.
>
> if a<b and d(a,b)==1:
>
> That way the d() function is called less than half the time. Of course
> this assumes that a<b is a faster evaluation than d(a,b), but I think
> that's true for your example.
> --
> // T.Hsu

Indeed makes quite a difference, goes from 275 seconds down to
153 seconds.

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web