Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #101891 > unrolled thread

Why generators take long time?

Started byArshpreet Singh <arsh840@gmail.com>
First post2016-01-18 23:27 -0800
Last post2016-01-19 15:25 -0500
Articles 8 — 4 participants

Back to article view | Back to comp.lang.python


Contents

  Why generators take long time? Arshpreet Singh <arsh840@gmail.com> - 2016-01-18 23:27 -0800
    Re: Why generators take long time? Arshpreet Singh <arsh840@gmail.com> - 2016-01-19 00:06 -0800
    Re: Why generators take long time? Steven D'Aprano <steve@pearwood.info> - 2016-01-19 21:11 +1100
      Re: Why generators take long time? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-01-19 10:24 +0000
        Re: Why generators take long time? Steven D'Aprano <steve@pearwood.info> - 2016-01-20 10:58 +1100
      Re: Why generators take long time? Arshpreet Singh <arsh840@gmail.com> - 2016-01-19 11:35 -0800
    Re: Why generators take long time? Jason Swails <jason.swails@gmail.com> - 2016-01-19 15:19 -0500
    Re: Why generators take long time? Jason Swails <jason.swails@gmail.com> - 2016-01-19 15:25 -0500

#101891 — Why generators take long time?

FromArshpreet Singh <arsh840@gmail.com>
Date2016-01-18 23:27 -0800
SubjectWhy generators take long time?
Message-ID<1e092318-5052-4383-aabe-dd979d72645d@googlegroups.com>
I was playing with Generators and found that using Generators time is bit more than list-comprehensions or I am doing it wrong?


Function with List comprehensions:

def sum_text(number_range):
    return sum([i*i for i in xrange(number_range)])

%timeit sum_text(100000000)
1 loops, best of 3: 14.8 s per loop

Using generator Expressions:

def sum_text(number_range):

    return sum((i*i for i in xrange(number_range))) 

%timeit sum_text(100000000)

1 loops, best of 3: 16.4 s per loop

[toc] | [next] | [standalone]


#101892

FromArshpreet Singh <arsh840@gmail.com>
Date2016-01-19 00:06 -0800
Message-ID<aed75e44-e6b4-4c0b-b9f0-bfcd26d5ba00@googlegroups.com>
In reply to#101891
On Tuesday, 19 January 2016 12:58:28 UTC+5:30, Arshpreet Singh  wrote:
> I was playing with Generators and found that using Generators time is bit more than list-comprehensions or I am doing it wrong?
> 
> 
> Function with List comprehensions:
> 
> def sum_text(number_range):
>     return sum([i*i for i in xrange(number_range)])
> 
> %timeit sum_text(100000000)
> 1 loops, best of 3: 14.8 s per loop
> 
> Using generator Expressions:
> 
> def sum_text(number_range):
> 
>     return sum((i*i for i in xrange(number_range))) 
> 
> %timeit sum_text(100000000)
> 
> 1 loops, best of 3: 16.4 s per loop

Or even when I use generator function:

def sum_text(number_range):
    yield sum((i*i for i in xrange(number_range)))

another function to get value:

def get_sum():
    for i in sum_text(100000000):
        return i

%timeit get_sum()
1 loops, best of 3: 16.1 s per loop

[toc] | [prev] | [next] | [standalone]


#101897

FromSteven D'Aprano <steve@pearwood.info>
Date2016-01-19 21:11 +1100
Message-ID<569e0beb$0$1621$c3e8da3$5496439d@news.astraweb.com>
In reply to#101891
On Tue, 19 Jan 2016 06:27 pm, Arshpreet Singh wrote:

> 
> I was playing with Generators and found that using Generators time is bit
> more than list-comprehensions or I am doing it wrong?

Generators may have slightly more overhead than iterating over a list. They
save memory, not time.

On my machine, I see a small difference of about 15% between summing a list
comp or a generator expression:

[steve@ando ~]$ python -m timeit "sum([i for i in xrange(1000)])"
10000 loops, best of 3: 102 usec per loop

[steve@ando ~]$ python -m timeit "sum(i for i in xrange(1000))"
10000 loops, best of 3: 117 usec per loop



But both are MUCH slower than summing the xrange object itself:

[steve@ando ~]$ python -m timeit "sum(xrange(1000))"
10000 loops, best of 3: 21.4 usec per loop



The generator has to pause execution of the code, then start it up again,
over and over again, while the list comprehension can run without
interruption, then iterate over that list.

This is, I think, the fastest way to iterate over the objects, which
eliminates as much of the overhead as possible:


[steve@ando ~]$ python -m timeit -s "from collections import deque" 
    -s "it = iter([i for i in xrange(1000)])" "deque(it, maxlen=0)"
1000000 loops, best of 3: 0.913 usec per loop


[steve@ando ~]$ python -m timeit -s "from collections import deque" 
    -s "it = (i for i in xrange(1000))" "deque(it, maxlen=0)"
1000000 loops, best of 3: 0.965 usec per loop


which is about a 6% difference. So if you eliminate everything else, and
just iterate over a pre-populated list as fast as possible, versus a simple
generator expression, the list is about 6% faster.

BUT there is a lot of variation in these timings. For the list comprehension
version, I ran it four times and got these four results:

1000000 loops, best of 3: 0.913 usec per loop
1000000 loops, best of 3: 0.911 usec per loop
1000000 loops, best of 3: 0.977 usec per loop
1000000 loops, best of 3: 0.938 usec per loop

and for the generator expression:

1000000 loops, best of 3: 0.965 usec per loop
1000000 loops, best of 3: 0.914 usec per loop
1000000 loops, best of 3: 0.981 usec per loop
1000000 loops, best of 3: 0.931 usec per loop

So there's plenty of random variation in the times.




-- 
Steven

[toc] | [prev] | [next] | [standalone]


#101899

FromOscar Benjamin <oscar.j.benjamin@gmail.com>
Date2016-01-19 10:24 +0000
Message-ID<mailman.105.1453199051.15297.python-list@python.org>
In reply to#101897
On 19 Jan 2016 10:16, "Steven D'Aprano" <steve@pearwood.info> wrote:
>
> [steve@ando ~]$ python -m timeit -s "from collections import deque"
>     -s "it = iter([i for i in xrange(1000)])" "deque(it, maxlen=0)"
> 1000000 loops, best of 3: 0.913 usec per loop
>
>
> [steve@ando ~]$ python -m timeit -s "from collections import deque"
>     -s "it = (i for i in xrange(1000))" "deque(it, maxlen=0)"
> 1000000 loops, best of 3: 0.965 usec per loop

Surely the iterator in the above benchmarks is consumed during the first of
1000000 loops. Subsequent loops are just calling next on an exhausted
iterator.

--
Oscar

[toc] | [prev] | [next] | [standalone]


#101912

FromSteven D'Aprano <steve@pearwood.info>
Date2016-01-20 10:58 +1100
Message-ID<569ecdb4$0$1600$c3e8da3$5496439d@news.astraweb.com>
In reply to#101899
On Tue, 19 Jan 2016 09:24 pm, Oscar Benjamin wrote:

> On 19 Jan 2016 10:16, "Steven D'Aprano" <steve@pearwood.info> wrote:
>>
>> [steve@ando ~]$ python -m timeit -s "from collections import deque"
>>     -s "it = iter([i for i in xrange(1000)])" "deque(it, maxlen=0)"
>> 1000000 loops, best of 3: 0.913 usec per loop
>>
>>
>> [steve@ando ~]$ python -m timeit -s "from collections import deque"
>>     -s "it = (i for i in xrange(1000))" "deque(it, maxlen=0)"
>> 1000000 loops, best of 3: 0.965 usec per loop
> 
> Surely the iterator in the above benchmarks is consumed during the first
> of 1000000 loops. Subsequent loops are just calling next on an exhausted
> iterator.

You know, I think you're right. Damn.

I keep getting bitten by that aspects of iterators and timeit.



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#101904

FromArshpreet Singh <arsh840@gmail.com>
Date2016-01-19 11:35 -0800
Message-ID<8aca5094-8690-41db-8fa6-92243ed39712@googlegroups.com>
In reply to#101897
On Tuesday, 19 January 2016 15:42:16 UTC+5:30, Steven D'Aprano  wrote:
 
> [steve@ando ~]$ python -m timeit -s "from collections import deque" 
>     -s "it = iter([i for i in xrange(1000)])" "deque(it, maxlen=0)"
> 1000000 loops, best of 3: 0.913 usec per loop
> 
> 
> [steve@ando ~]$ python -m timeit -s "from collections import deque" 
>     -s "it = (i for i in xrange(1000))" "deque(it, maxlen=0)"
> 1000000 loops, best of 3: 0.965 usec per loop

Thanks Steven, it was really helpful answer.  

[toc] | [prev] | [next] | [standalone]


#101906

FromJason Swails <jason.swails@gmail.com>
Date2016-01-19 15:19 -0500
Message-ID<mailman.111.1453234780.15297.python-list@python.org>
In reply to#101891
On Tue, Jan 19, 2016 at 2:27 AM, Arshpreet Singh <arsh840@gmail.com> wrote:

>
> I was playing with Generators and found that using Generators time is bit
> more than list-comprehensions or I am doing it wrong?
>
>
> Function with List comprehensions:
>
> def sum_text(number_range):
>     return sum([i*i for i in xrange(number_range)])
>
> %timeit sum_text(100000000)
> 1 loops, best of 3: 14.8 s per loop
>
> Using generator Expressions:
>
> def sum_text(number_range):
>
>     return sum((i*i for i in xrange(number_range)))
>
> %timeit sum_text(100000000)
>
> 1 loops, best of 3: 16.4 s per loop
>

​Steven already pointed out the additional overhead in a generator
expression vs. a list comprehension.  In addition to the memory savings you
get via generator expressions, though, you can also get significant time
savings when generator expressions have the ability to short-circuit.

For instance, have a look at the following:

In [1]: import random

In [2]: %timeit all(random.random() < 0.5 for i in range(1000))
The slowest run took 4.85 times longer than the fastest. This could mean
that an intermediate result is being cached
100000 loops, best of 3: 3.57 µs per loop

In [3]: %timeit all([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 422 µs per loop

In [4]: %timeit any(random.random() < 0.5 for i in range(1000))
100000 loops, best of 3: 3.18 µs per loop

In [5]: %timeit any([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 408 µs per loop

This is using IPython with Python 3.5.  The difference here is that for
functions that short-circuit (like any and all), the generator expression
does not have to exhaust all of its elements (particularly since for each
element there's a 50-50 chance of being True or False in each case).  In
this case, the difference is a couple orders of magnitude.  The larger the
range argument is, the bigger this difference.

Also, in Python 2, the generator expression does not leak into the global
namespace, while the list comprehension does:

Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> list(i for i in range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined
>>> [i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
9

Python 3 does not leak the iterator variable in either case.  However, it
would be madness to have code actually relying on this behavior :).

At the end of the day, I use list comprehensions in the following
circumstances:

- I *know* I won't blow memory with a too-large list
- I want to iterate over the object multiple times or I want/may want
non-sequential access
- I know I want all the elements I'm creating (i.e., no chance of
short-circuiting)

I use generator expressions when

- I *might* want to

All the best,
Jason

P.S. There is a "cross-over" point where the memory requirements of the
list comp passes the generator overhead.  For instance:


In [17]: %timeit sum(i for i in range(10000000))
1 loops, best of 3: 2.08 s per loop

In [18]: %timeit sum([i for i in range(10000000)])
1 loops, best of 3: 1.86 s per loop

In [19]: %timeit sum(i for i in range(100000000))
1 loops, best of 3: 21.8 s per loop

In [20]: %timeit sum([i for i in range(100000000)])
1 loops, best of 3: 26.1 s per loop

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

[toc] | [prev] | [next] | [standalone]


#101907

FromJason Swails <jason.swails@gmail.com>
Date2016-01-19 15:25 -0500
Message-ID<mailman.112.1453235154.15297.python-list@python.org>
In reply to#101891
On Tue, Jan 19, 2016 at 3:19 PM, Jason Swails <jason.swails@gmail.com>
wrote:

>
> I use generator expressions when
>
> - I *might* want to
>

​I forgot to finish my thought here.  I use generator expressions when I
don't want to worry about memory, there's a decent chance of
short-circuiting​, or I just want to do some simple iteration where the
iterator will not be the bottleneck (which is almost always).

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web