
Dict comprehensions - improvement to docs?

Started by	"Frank Millman" <frank@chagford.com>
First post	2015-03-16 07:25 +0200
Last post	2015-03-18 02:39 -0600
Articles	12 — 4 participants


  Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-16 07:25 +0200
    Re: Dict comprehensions - improvement to docs? Paul Rubin <no.email@nospam.invalid> - 2015-03-15 23:22 -0700
      Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-16 10:30 +0200
        Re: Dict comprehensions - improvement to docs? Paul Rubin <no.email@nospam.invalid> - 2015-03-16 01:38 -0700
          Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-16 11:01 +0200
          Re: Dict comprehensions - improvement to docs? Ian Kelly <ian.g.kelly@gmail.com> - 2015-03-16 09:57 -0600
            Re: Dict comprehensions - improvement to docs? Paul Rubin <no.email@nospam.invalid> - 2015-03-16 15:41 -0700
          Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-17 08:44 +0200
          Re: Dict comprehensions - improvement to docs? Lele Gaifax <lele@metapensiero.it> - 2015-03-17 15:06 +0100
          Re: Dict comprehensions - improvement to docs? Ian Kelly <ian.g.kelly@gmail.com> - 2015-03-17 09:48 -0600
          Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-18 10:01 +0200
          Re: Dict comprehensions - improvement to docs? Ian Kelly <ian.g.kelly@gmail.com> - 2015-03-18 02:39 -0600

#87505 — Dict comprehensions - improvement to docs?

From	"Frank Millman" <frank@chagford.com>
Date	2015-03-16 07:25 +0200
Subject	Dict comprehensions - improvement to docs?
Message-ID	<mailman.410.1426483550.21433.python-list@python.org>

Hi all

I like dict comprehensions, but I don't use them very often, so when I do I 
need to look up the format.

I always struggle to find the information in the Library Reference. The 
obvious location, Mapping Types, shows various constructors, but not the 
comprehension.

https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

So I turn to Google. It shows a couple of StackOverflow questions, and then 
a link to the Data Structures section of the Tutorial, which explains it 
succinctly.

https://docs.python.org/3/tutorial/datastructures.html#dictionaries
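
For quick reference, the format in question is a key expression, a colon, a value expression, then the usual for/if clauses, all inside braces. A minimal sketch follows; the first line mirrors the kind of example the Tutorial gives, and the variable names are illustrative only:

```python
# General shape: {key_expr: value_expr for target in iterable if condition}
squares = {n: n ** 2 for n in (2, 4, 6)}
print(squares)        # {2: 4, 4: 16, 6: 36}

# The optional "if" clause filters items; here only odd code points are kept.
letters = {code: chr(code) for code in range(65, 91) if code % 2}
print(len(letters))   # 13
```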

I feel that the Library Reference should be updated to include dict 
comprehensions.

Just checking here first before I raise a Documentation issue.

Frank Millman


#87510

From	Paul Rubin <no.email@nospam.invalid>
Date	2015-03-15 23:22 -0700
Message-ID	<87fv95fom0.fsf@jester.gateway.sonic.net>
In reply to	#87505

"Frank Millman" <frank@chagford.com> writes:
> I like dict comprehensions, but I don't use them very often, so when I do I 
> need to look up the format.

I never felt a need for them.  Do they generate better code than

   d = dict((k,v) for k,v in [('name','paul'),('language','python')])  ?

Anyway, since they are syntax, they'd be in the language reference rather
than the library reference.


#87531

From	"Frank Millman" <frank@chagford.com>
Date	2015-03-16 10:30 +0200
Message-ID	<mailman.427.1426494675.21433.python-list@python.org>
In reply to	#87510

"Paul Rubin" <no.email@nospam.invalid> wrote in message 
news:87fv95fom0.fsf@jester.gateway.sonic.net...
> "Frank Millman" <frank@chagford.com> writes:
>> I like dict comprehensions, but I don't use them very often, so when I do
>> I need to look up the format.
>
> I never felt a need for them.  Do they generate better code than
>
>   d = dict((k,v) for k,v in [('name','paul'),('language','python')])  ?
>

I ran timeit, and dict comps are quite a bit quicker. This is with Python 
3.4.1 -

C:\>python -m timeit -s "x=range(65, 91); y=(chr(z) for z in x)" "dict((a, b) for a, b in zip(x, y))"
100000 loops, best of 3: 16.1 usec per loop

C:\>python -m timeit -s "x=range(65, 91); y=(chr(z) for z in x)" "{a: b for a, b in zip(x, y)}"
100000 loops, best of 3: 6.38 usec per loop

Frank


#87534

From	Paul Rubin <no.email@nospam.invalid>
Date	2015-03-16 01:38 -0700
Message-ID	<87wq2hfibu.fsf@jester.gateway.sonic.net>
In reply to	#87531

"Frank Millman" <frank@chagford.com> writes:
> dict((a, b) for a, b in zip(x, y))
> 100000 loops, best of 3: 16.1 usec per loop
> {a: b for a, b in zip(x, y)}
> 100000 loops, best of 3: 6.38 usec per loop

Hmm, I bet the difference is from the (a,b) consing all those tuples.

Can you try just dict(zip(x,y))  ?


#87539

From	"Frank Millman" <frank@chagford.com>
Date	2015-03-16 11:01 +0200
Message-ID	<mailman.432.1426496495.21433.python-list@python.org>
In reply to	#87534

"Paul Rubin" <no.email@nospam.invalid> wrote in message 
news:87wq2hfibu.fsf@jester.gateway.sonic.net...
> "Frank Millman" <frank@chagford.com> writes:
>> dict((a, b) for a, b in zip(x, y))
>> 100000 loops, best of 3: 16.1 usec per loop
>> {a: b for a, b in zip(x, y)}
>> 100000 loops, best of 3: 6.38 usec per loop
>
> Hmm, I bet the difference is from the (a,b) consing all those tuples.
>
> Can you try just dict(zip(x,y))  ?

C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "dict(zip(x, y))"
100000 loops, best of 3: 11.9 usec per loop

C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a: b for a, b in zip(x, y)}"
100000 loops, best of 3: 7.24 usec per loop

I ran the dict comp again in case the machine load had changed - it is a bit 
slower, but not much.

Frank


#87570

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2015-03-16 09:57 -0600
Message-ID	<mailman.450.1426521470.21433.python-list@python.org>
In reply to	#87534

On Mon, Mar 16, 2015 at 3:01 AM, Frank Millman <frank@chagford.com> wrote:
> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
> "dict(zip(x, y))"
> 100000 loops, best of 3: 11.9 usec per loop
>
> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a: b
> for a, b in zip(x, y)}"
> 100000 loops, best of 3: 7.24 usec per loop

Since the setup code is only run once, the generator expression used
for y is only iterated over once. On every subsequent loop, zip is
producing an empty result. So this measurement is really just
capturing the overhead of the dict construction. Compare:

$ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "dict(zip(x,y))"
1000000 loops, best of 3: 0.9 usec per loop
$ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]" "dict(zip(x,y))"
100000 loops, best of 3: 2.69 usec per loop
$ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a:b for a,b in zip(x,y)}"
1000000 loops, best of 3: 0.837 usec per loop
$ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]" "{a:b for a,b in zip(x,y)}"
100000 loops, best of 3: 2.67 usec per loop


#87586

From	Paul Rubin <no.email@nospam.invalid>
Date	2015-03-16 15:41 -0700
Message-ID	<87bnjso99q.fsf@jester.gateway.sonic.net>
In reply to	#87570

Ian Kelly <ian.g.kelly@gmail.com> writes:
> Since the setup code is only run once, the generator expression used
> for y is only iterated over once. 

Ah, thanks, I'd been wondering what was going on with Frank's example
but hadn't gotten around to trying to analyze it.


#87625

From	"Frank Millman" <frank@chagford.com>
Date	2015-03-17 08:44 +0200
Message-ID	<mailman.490.1426574679.21433.python-list@python.org>
In reply to	#87534

"Ian Kelly" <ian.g.kelly@gmail.com> wrote in message 
news:CALwzid=u19YMkfJBhbLZi1qh2U9UK4ohY5wco1zO-i3T5AtyOA@mail.gmail.com...
> On Mon, Mar 16, 2015 at 3:01 AM, Frank Millman <frank@chagford.com> wrote:
>> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
>> "dict(zip(x, y))"
>> 100000 loops, best of 3: 11.9 usec per loop
>>
>> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a: b
>> for a, b in zip(x, y)}"
>> 100000 loops, best of 3: 7.24 usec per loop
>
> Since the setup code is only run once, the generator expression used
> for y is only iterated over once. On every subsequent loop, zip is
> producing an empty result. So this measurement is really just
> capturing the overhead of the dict construction. Compare:
>
> $ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
> "dict(zip(x,y))"
> 1000000 loops, best of 3: 0.9 usec per loop
> $ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]"
> "dict(zip(x,y))"
> 100000 loops, best of 3: 2.69 usec per loop
> $ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
> "{a:b for a,b in zip(x,y)}"
> 1000000 loops, best of 3: 0.837 usec per loop
> $ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]"
> "{a:b for a,b in zip(x,y)}"
> 100000 loops, best of 3: 2.67 usec per loop

Thanks for the explanation. I'll try not to make that mistake again.

However, to go back to the original example, we want to compare a dict 
comprehension with a dict() constructor using a generator expression.

Let's see if I have got this one right -

C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "dict((a, b) for a, b in zip(x, y))"
10000 loops, best of 3: 49.6 usec per loop

C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b for a, b in zip(x, y)}"
10000 loops, best of 3: 25.8 usec per loop

Or to use Paul's original example -

C:\>python -m timeit "d = dict((k, v) for k, v in [('name', 'paul'), ('language', 'python')])"
100000 loops, best of 3: 16.6 usec per loop

C:\>python -m timeit "d = {k: v for k, v in [('name', 'paul'), ('language', 'python')]}"
100000 loops, best of 3: 5.2 usec per loop

It seems that a dict comp is noticeably faster.

Does this sound right, or are there other factors I should be taking into 
account?

Frank
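
For anyone wanting to repeat the comparison from a script rather than the command line, here is a minimal sketch using the timeit module directly. The loop count and the best-of-3 choice are assumptions picked to resemble the command-line output; absolute numbers will differ by machine and Python version.

```python
import timeit

# Setup mirrors the thread: y is a list, so it can be re-iterated on every loop.
setup = "x = range(65, 91); y = [chr(z) for z in x]"

statements = [
    "dict(zip(x, y))",                     # no comprehension at all
    "dict((a, b) for a, b in zip(x, y))",  # generator expression fed to dict()
    "{a: b for a, b in zip(x, y)}",        # dict comprehension
]

results = {}
for stmt in statements:
    # Best of 3 runs of 10,000 loops each, reported per loop in microseconds.
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=10_000))
    results[stmt] = best / 10_000 * 1e6
    print(f"{results[stmt]:8.3f} usec per loop  {stmt}")
```

Including plain dict(zip(x, y)) gives a baseline that shows how much of the cost belongs to the generator expression rather than to the dict() call.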


#87637

From	Lele Gaifax <lele@metapensiero.it>
Date	2015-03-17 15:06 +0100
Message-ID	<mailman.495.1426601188.21433.python-list@python.org>
In reply to	#87534

"Frank Millman" <frank@chagford.com> writes:

> It seems that a dict comp is noticeably faster.
>
> Does this sound right, or are there other factors I should be taking into 
> account?

The dict comp does not execute any function call. Consider the following:

    $ python3 -m timeit "d=dict()"
    10000000 loops, best of 3: 0.113 usec per loop
    $ python3 -m timeit "d={}"
    10000000 loops, best of 3: 0.0601 usec per loop

ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it  |                 -- Fortunato Depero, 1929.
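
The difference between the two spellings can also be seen in the bytecode. This is a rough sketch using the dis module; the helper name opnames is invented here, and exact opcode names vary between CPython versions, so treat the printed lists as indicative only:

```python
import dis

def opnames(source):
    """Return the opcode names CPython emits for a single expression."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

literal_ops = opnames("{}")      # builds the dict with a dedicated opcode
call_ops = opnames("dict()")     # looks up the name "dict", then calls it

print(literal_ops)
print(call_ops)
```

The literal compiles to a BUILD_MAP instruction, while dict() needs a name lookup followed by some form of call instruction.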


#87643

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2015-03-17 09:48 -0600
Message-ID	<mailman.498.1426611876.21433.python-list@python.org>
In reply to	#87534

On Tue, Mar 17, 2015 at 12:44 AM, Frank Millman <frank@chagford.com> wrote:
> Thanks for the explanation. I'll try not to make that mistake again.
>
> However, to go back to the original example, we want to compare a dict
> comprehension with a dict() constructor using a generator expression.
>
> Let's see if I have got this one right -
>
> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "dict((a,
> b) for a, b in zip(x, y))"
> 10000 loops, best of 3: 49.6 usec per loop
>
> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b for
> a, b in zip(x, y)}"
> 10000 loops, best of 3: 25.8 usec per loop

Why did you revert back to the no-op generator expression in the first
case instead of the more efficient dict(zip(x, y))?


#87668

From	"Frank Millman" <frank@chagford.com>
Date	2015-03-18 10:01 +0200
Message-ID	<mailman.513.1426665702.21433.python-list@python.org>
In reply to	#87534

"Ian Kelly" <ian.g.kelly@gmail.com> wrote in message 
news:CALwzidmNDcSvER7S6inEaVZA=DHUrDX1KzL-WRVwhd=o3_LvWA@mail.gmail.com...
> On Tue, Mar 17, 2015 at 12:44 AM, Frank Millman <frank@chagford.com> 
> wrote:
>> Thanks for the explanation. I'll try not to make that mistake again.
>>
>> However, to go back to the original example, we want to compare a dict
>> comprehension with a dict() constructor using a generator expression.
>>
>> Let's see if I have got this one right -
>>
>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "dict((a,
>> b) for a, b in zip(x, y))"
>> 10000 loops, best of 3: 49.6 usec per loop
>>
>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b for
>> a, b in zip(x, y)}"
>> 10000 loops, best of 3: 25.8 usec per loop
>
> Why did you revert back to the no-op generator expression in the first
> case instead of the more efficient dict(zip(x, y))?

Firstly, I just want to emphasise that I am not trying to prove anything 
here, I am trying to learn something.

Why do you call it a no-op? I understood your previous point that a 
generator in the setup statement is exhausted after the first execution, but 
this is in the run-time statement, so I thought it would be executed every 
time.

If one does not need any 'comprehension' functionality, I agree that 
dict(zip(x, y)) is the way to go.

However, if you do need a comprehension, Paul questioned why a special dict 
comprehension was introduced when you can easily use a generator expression.

I ran the timing tests to compare the two, and I was surprised to see such a 
difference. However, you have already pointed out one case where I was not 
comparing apples and apples, and it is quite possible that I have stumbled 
across another one.

I read Lele's post where he pointed out that part of the difference is 
explained by the fact that dict() involves a function call, whereas {} does 
not. However, this does not seem to explain the entire difference.

Frank


#87669

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2015-03-18 02:39 -0600
Message-ID	<mailman.514.1426668000.21433.python-list@python.org>
In reply to	#87534

On Wed, Mar 18, 2015 at 2:01 AM, Frank Millman <frank@chagford.com> wrote:
>
> "Ian Kelly" <ian.g.kelly@gmail.com> wrote in message
> news:CALwzidmNDcSvER7S6inEaVZA=DHUrDX1KzL-WRVwhd=o3_LvWA@mail.gmail.com...
>> On Tue, Mar 17, 2015 at 12:44 AM, Frank Millman <frank@chagford.com>
>> wrote:
>>> Thanks for the explanation. I'll try not to make that mistake again.
>>>
>>> However, to go back to the original example, we want to compare a dict
>>> comprehension with a dict() constructor using a generator expression.
>>>
>>> Let's see if I have got this one right -
>>>
>>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "dict((a,
>>> b) for a, b in zip(x, y))"
>>> 10000 loops, best of 3: 49.6 usec per loop
>>>
>>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b for
>>> a, b in zip(x, y)}"
>>> 10000 loops, best of 3: 25.8 usec per loop
>>
>> Why did you revert back to the no-op generator expression in the first
>> case instead of the more efficient dict(zip(x, y))?
>
> Firstly, I just want to emphasise that I am not trying to prove anything
> here, I am trying to learn something.
>
> Why do you call it a no-op? I understood your previous point that a
> generator in the setup statement is exhausted after the first execution, but
> this is in the run-time statement, so I thought it would be executed every
> time.

I call it a no-op because it does not perform any transformation on
the iterable stream. It takes as input a sequence of 2-tuples, and it
yields the same tuples in the same sequence. It does however add a
non-trivial amount of code to perform that lack of work: for every
tuple in the sequence, it unpacks it, builds a new tuple, and yields
the result.
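
A small sketch of what "no-op" means here, reusing x and y from the timing setup (the names pairs_direct and pairs_via_genexpr are illustrative): the generator expression hands on exactly the pairs that zip already produced, so all three spellings build the same dict.

```python
x = range(65, 91)
y = [chr(z) for z in x]

# The generator expression unpacks each pair and rebuilds an equal tuple.
pairs_direct = list(zip(x, y))
pairs_via_genexpr = list((a, b) for a, b in zip(x, y))
print(pairs_direct == pairs_via_genexpr)      # True

# Consequently the three constructions are interchangeable in result.
from_zip = dict(zip(x, y))
from_genexpr = dict((a, b) for a, b in zip(x, y))
from_dictcomp = {a: b for a, b in zip(x, y)}
print(from_zip == from_genexpr == from_dictcomp)  # True
```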

> I read Lele's post where he pointed out that part of the difference is
> explained by the fact that dict() involves a function call, whereas {} does
> not. However, this does not seem to explain the entire difference.

The dict call may be part of it, but I think it has more to do with
differences in the compiled loop. All the major work done by the
dict() function has to be done just the same by the comprehension. In
my own timing, the version with no comprehension and the dict
comprehension are very similar in speed, whereas the generator
expression is substantially slower. This points to the generator
expression being the bottleneck, not the dict call (which is a
function call, but at least it's implemented in C).

The bytecode for the generator expression looks like this:

>>> dis.dis(compile('((a,b) for a,b in zip(x,y))', '', 'eval').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                23 (to 29)
              6 UNPACK_SEQUENCE          2
              9 STORE_FAST               1 (a)
             12 STORE_FAST               2 (b)
             15 LOAD_FAST                1 (a)
             18 LOAD_FAST                2 (b)
             21 BUILD_TUPLE              2
             24 YIELD_VALUE
             25 POP_TOP
             26 JUMP_ABSOLUTE            3
        >>   29 LOAD_CONST               0 (None)
             32 RETURN_VALUE

while the bytecode for the dict comprehension looks like this:

>>> dis.dis(compile('{a:b for a,b in zip(x,y)}', '', 'eval').co_consts[0])
  1           0 BUILD_MAP                0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                21 (to 30)
              9 UNPACK_SEQUENCE          2
             12 STORE_FAST               1 (a)
             15 STORE_FAST               2 (b)
             18 LOAD_FAST                2 (b)
             21 LOAD_FAST                1 (a)
             24 MAP_ADD                  2
             27 JUMP_ABSOLUTE            6
        >>   30 RETURN_VALUE

The major difference between the two is that where the former has
BUILD_TUPLE followed by YIELD_VALUE, the latter just has MAP_ADD. That
looks to be more efficient on two counts: one, that it avoids building
a tuple on every iteration; two, that it invokes code to contribute to
the dict directly, versus going through all the machinery of yielding
a value to be consumed by the dict() function.
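
The comparison can be checked on other interpreters without reading full disassembly. The sketch below is written under a few assumptions: the helper name opnames is invented, opcode names and offsets differ between CPython versions, and newer releases may inline comprehensions instead of compiling them to a nested code object, so the helper walks any nested code objects rather than indexing co_consts[0].

```python
import dis
import types

def opnames(source):
    """Collect opcode names from an expression and any nested code objects."""
    pending = [compile(source, "<example>", "eval")]
    names = set()
    while pending:
        code = pending.pop()
        names.update(ins.opname for ins in dis.get_instructions(code))
        # Generator expressions (and, on older versions, comprehensions)
        # live in separate code objects stored among the constants.
        pending.extend(c for c in code.co_consts
                       if isinstance(c, types.CodeType))
    return names

genexpr_ops = opnames("((a, b) for a, b in zip(x, y))")
dictcomp_ops = opnames("{a: b for a, b in zip(x, y)}")

for name in ("BUILD_TUPLE", "YIELD_VALUE", "MAP_ADD"):
    print(f"{name:12} genexpr={name in genexpr_ops!s:5} "
          f"dictcomp={name in dictcomp_ops}")
```

Compiling the expressions does not evaluate them, so x and y do not need to exist for this check.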

