Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!news.dougwise.org!nntpfeed.proxad.net!proxad.net!feeder1-2.proxad.net!news.wiretrip.org!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Date: Fri, 02 Sep 2011 02:12:47 +0100
From: MRAB <python@mrabarnett.plus.com>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:6.0.1) Gecko/20110830 Thunderbird/6.0.1
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: List comprehension timing difference.
References: <87zkinkcht.fsf@gmail.com>
In-Reply-To: <87zkinkcht.fsf@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.676.1314925972.27778.python-list@python.org>
Lines: 57
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: x330-a1.tempe.blueboxinc.net comp.lang.python:12598

On 02/09/2011 01:35, Bart Kastermans wrote:
>
> In the following code I create the graph with vertices
> sgb-words.txt (the file of 5 letter words from the
> stanford graphbase), and an edge if two words differ
> by one letter.  The two methods I wrote seem to me to
> likely perform the same computations, the list comprehension
> is faster though (281 seconds VS 305 seconds on my dell mini).
>
> Is the right interpretation of this timing difference
> that the comprehension is performed in the lower level
> C code?
>
> As this time I have no other conjecture about the cause.
>
> ---------------------------------------------------------
> import time
> import copy
>
> data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())
>
> def d (w1, w2):
>      count = 0
>      for idx in range(0,5):
>          if w1[idx] != w2[idx]:
>              count += 1
>      return count
>
> print "creating graph"
> t0 = time.clock ()
> graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a<  b]
> t1 = time.clock ()
> print "took " + str (t1 - t0) + " seconds."
>
> t0 = time.clock ()
> graph2 = []
> for i in range (0, len(data)):
>      for j in range(0,len(data)):
>          if d(data[i],data[j]) == 1 and i<  j:
>              graph2.append ([i,j])
> t1 = time.clock ()
> print "took " + str (t1 - t0) + " seconds."

Are they actually equivalent? Does graph == graph2?

The first version (list comprehension) creates a list of pairs of
values:

     [a, b]

whereas the second version (for loops) creates a list of pairs of
indexes:

     [i, j]

The second version has subscripting ("data[i]" and "data[j]"), which
will slow it down.