Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #101906

Re: Why generators take long time?

Path csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From Jason Swails <jason.swails@gmail.com>
Newsgroups comp.lang.python
Subject Re: Why generators take long time?
Date Tue, 19 Jan 2016 15:19:37 -0500
Lines 113
Message-ID <mailman.111.1453234780.15297.python-list@python.org> (permalink)
References <1e092318-5052-4383-aabe-dd979d72645d@googlegroups.com>
Mime-Version 1.0
Content-Type text/plain; charset=UTF-8
Content-Transfer-Encoding quoted-printable
X-Trace news.uni-berlin.de TOdAHbEtdri4m2x/x9H/CwwQ/iy107WIZgv1iQnmzcmw==
Return-Path <jason.swails@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.001
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'expressions': 0.07; 'cc:addr:python-list': 0.09; '[0,': 0.09; '[1]:': 0.09; '[2]:': 0.09; '[3]:': 0.09; 'cached': 0.09; 'generators': 0.09; 'iterate': 0.09; 'nameerror:': 0.09; 'subject:Why': 0.09; 'python': 0.10; 'jan': 0.11; 'def': 0.13; 'argument': 0.15; 'intermediate': 0.15; '2016': 0.16; '6.0': 0.16; '[4]:': 0.16; 'cc:name:python list': 0.16; 'comp': 0.16; 'expressions,': 0.16; 'instance:': 0.16; 'iterator': 0.16; 'magnitude.': 0.16; 'namespace,': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'slowest': 0.16; 'wrote:': 0.16; 'memory': 0.17; '&lt;': 0.18; 'case.': 0.18; 'element': 0.18; 'instance,': 0.18; 'passes': 0.18; 'pointed': 0.18; 'email addr:gmail.com&gt;': 0.18; 'variable': 0.18; '>>>': 0.20; 'cc:2**0': 0.20; 'cc:addr:python.org': 0.20; '"",': 0.22; 'am,': 0.23; 'defined': 0.23; 'bit': 0.23; '(like': 0.23; 'bigger': 0.23; 'elements': 0.23; 'import': 0.24; '(most': 0.24; 'header:In-Reply-To:1': 0.24; 'compatible': 0.27; 'message- id:@mail.gmail.com': 0.27; 'function': 0.28; '0.5': 0.29; '100000': 0.29; 'case).': 0.29; 'random': 0.29; "i'm": 0.30; 'creating': 0.30; 'code': 0.30; '(i.e.,': 0.30; 'though,': 0.32; 'run': 0.33; 'point': 0.33; 'traceback': 0.33; 'case,': 0.34; 'tue,': 0.34; 'file': 0.34; 'list': 0.34; 'requirements': 0.35; 'skip:& 20': 0.35; 'received:google.com': 0.35; 'best,': 0.35; 'could': 0.35; 'false': 0.35; 'jason': 0.35; 'subject:time': 0.35; 'skip:i 20': 0.36; 'there': 0.36; 'received:209.85': 0.36; 'apple': 0.36; 'subject:?': 0.36; 'subject:: ': 0.37; 'being': 0.37; 'skip:& 10': 0.37; 'received:209.85.213': 0.37; 'skip:9 10': 0.37; 'doing': 0.38; 'difference': 0.38; "won't": 0.38; 'received:209': 0.38; 'mean': 0.38; 'end': 0.39; 'does': 0.39; 'where': 0.40; 'chance': 0.60; 'behavior': 0.61; 'per': 0.62; 'skip:n 10': 0.62; 'more': 0.63; 'times': 0.63; 'p.s.': 0.65; 'day,': 0.65; 'here': 0.66; 'jul': 0.72; 'savings': 0.79; '4.2.1': 0.84; '408': 0.84; '50-50': 0.84; 'blow': 0.84; 'darwin': 0.84; 'difference.': 0.84; 'madness': 0.84; 'subject:long': 0.84; ':).': 0.91; 'leak': 0.91
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ZoMOgQMwGZl2F+tkUdb/IL+5SMEi37lJHscRYHkYHyM=; b=ffQc7jNhAHFSYbiyKSv8NGFdBFBjXKKudSiTz8mb21MRo7cN562zCYJgzohbqtE2HQ 2PxdmWkQd5+Ai/2jdknhTj4Y1dwUhxWPjaQvmyiYhJRsffalNxCU4o32LqM7rp8UhS8n V0Fl1y3UH3vg/IEwNSfuuyWGEh1QRPzNKxIXsdjAs5B4DdMBscW3HI0nT+N6tZJktxqo zNCYvsdS1JnT8MVwnZu0+CNfziK7gTXyluogxmJGUwMF+NhoRcc7uPogUlYsbLjzkyIO 3unOTLqlOru4gafe6IEoP2UuOHKqBcA/JBQV1MEaFDJnbZSAUUApzqWKG3yE5HyyCo41 wT3A==
X-Google-DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=ZoMOgQMwGZl2F+tkUdb/IL+5SMEi37lJHscRYHkYHyM=; b=YT7Inf5K2MPqmLPoJ/jR+Z8vajRs+j+JmltYyItdKG0Bw6SlulnhEmmZDA9Xg4i5os Sn9+RKtroNpAU1fuT3t3qyC5cArAybg7RvEDQCvQmTz1dCbL6SWNm5xaBK/AjK6xUNgn 8U979EOoRSF9FKGNysbDT6HDje8lpgOCLvXX1Y6Ml7Ih4+U9QDYgiUfUzh6pYla857VR yPfYscKsDtr7tqH5IVf+/adVJ5xJ2PhJkaHHCywK1MZb8X5VNH+dOWAnRfALBvier85T 1HJe3u820WRaMKZP0TfM+gRoP0noSxQByLbgnqaF5KfGE0Mtsw24m8r3OfZsqiNeBrES LGXA==
X-Gm-Message-State ALoCoQk8Qv+sbuNiWLsICG052iyRszIyGZx69mshxm96cOUlRyKRctsO1+daszZOF4FmEygMWouuyH4fOE/bpOMlLURBd1ztMA==
X-Received by 10.31.155.23 with SMTP id d23mr19268261vke.146.1453234777359; Tue, 19 Jan 2016 12:19:37 -0800 (PST)
In-Reply-To <1e092318-5052-4383-aabe-dd979d72645d@googlegroups.com>
X-Content-Filtered-By Mailman/MimeDel 2.1.20+
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.20+
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Xref csiph.com comp.lang.python:101906

Show key headers only | View raw


On Tue, Jan 19, 2016 at 2:27 AM, Arshpreet Singh <arsh840@gmail.com> wrote:

>
> I was playing with Generators and found that using Generators time is bit
> more than list-comprehensions or I am doing it wrong?
>
>
> Function with List comprehensions:
>
> def sum_text(number_range):
>     return sum([i*i for i in xrange(number_range)])
>
> %timeit sum_text(100000000)
> 1 loops, best of 3: 14.8 s per loop
>
> Using generator Expressions:
>
> def sum_text(number_range):
>
>     return sum((i*i for i in xrange(number_range)))
>
> %timeit sum_text(100000000)
>
> 1 loops, best of 3: 16.4 s per loop
>

​Steven already pointed out the additional overhead in a generator
expression vs. a list comprehension.  In addition to the memory savings you
get via generator expressions, though, you can also get significant time
savings when generator expressions have the ability to short-circuit.

For instance, have a look at the following:

In [1]: import random

In [2]: %timeit all(random.random() < 0.5 for i in range(1000))
The slowest run took 4.85 times longer than the fastest. This could mean
that an intermediate result is being cached
100000 loops, best of 3: 3.57 µs per loop

In [3]: %timeit all([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 422 µs per loop

In [4]: %timeit any(random.random() < 0.5 for i in range(1000))
100000 loops, best of 3: 3.18 µs per loop

In [5]: %timeit any([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 408 µs per loop

This is using IPython with Python 3.5.  The difference here is that for
functions that short-circuit (like any and all), the generator expression
does not have to exhaust all of its elements (particularly since for each
element there's a 50-50 chance of being True or False in each case).  In
this case, the difference is a couple orders of magnitude.  The larger the
range argument is, the bigger this difference.

Also, in Python 2, the generator expression does not leak into the global
namespace, while the list comprehension does:

Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> list(i for i in range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined
>>> [i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
9

Python 3 does not leak the iterator variable in either case.  However, it
would be madness to have code actually relying on this behavior :).

At the end of the day, I use list comprehensions in the following
circumstances:

- I *know* I won't blow memory with a too-large list
- I want to iterate over the object multiple times or I want/may want
non-sequential access
- I know I want all the elements I'm creating (i.e., no chance of
short-circuiting)

I use generator expressions when

- I *might* want to

All the best,
Jason

P.S. There is a "cross-over" point where the memory requirements of the
list comp passes the generator overhead.  For instance:


In [17]: %timeit sum(i for i in range(10000000))
1 loops, best of 3: 2.08 s per loop

In [18]: %timeit sum([i for i in range(10000000)])
1 loops, best of 3: 1.86 s per loop

In [19]: %timeit sum(i for i in range(100000000))
1 loops, best of 3: 21.8 s per loop

In [20]: %timeit sum([i for i in range(100000000)])
1 loops, best of 3: 26.1 s per loop

-- 
Jason M. Swails
BioMaPS,
Rutgers University
Postdoctoral Researcher

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Why generators take long time? Arshpreet Singh <arsh840@gmail.com> - 2016-01-18 23:27 -0800
  Re: Why generators take long time? Arshpreet Singh <arsh840@gmail.com> - 2016-01-19 00:06 -0800
  Re: Why generators take long time? Steven D'Aprano <steve@pearwood.info> - 2016-01-19 21:11 +1100
    Re: Why generators take long time? Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2016-01-19 10:24 +0000
      Re: Why generators take long time? Steven D'Aprano <steve@pearwood.info> - 2016-01-20 10:58 +1100
    Re: Why generators take long time? Arshpreet Singh <arsh840@gmail.com> - 2016-01-19 11:35 -0800
  Re: Why generators take long time? Jason Swails <jason.swails@gmail.com> - 2016-01-19 15:19 -0500
  Re: Why generators take long time? Jason Swails <jason.swails@gmail.com> - 2016-01-19 15:25 -0500

csiph-web