| Started by | "Frank Millman" <frank@chagford.com> |
|---|---|
| First post | 2015-03-16 07:25 +0200 |
| Last post | 2015-03-18 02:39 -0600 |
| Articles | 12 — 4 participants |
Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-16 07:25 +0200
Re: Dict comprehensions - improvement to docs? Paul Rubin <no.email@nospam.invalid> - 2015-03-15 23:22 -0700
Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-16 10:30 +0200
Re: Dict comprehensions - improvement to docs? Paul Rubin <no.email@nospam.invalid> - 2015-03-16 01:38 -0700
Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-16 11:01 +0200
Re: Dict comprehensions - improvement to docs? Ian Kelly <ian.g.kelly@gmail.com> - 2015-03-16 09:57 -0600
Re: Dict comprehensions - improvement to docs? Paul Rubin <no.email@nospam.invalid> - 2015-03-16 15:41 -0700
Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-17 08:44 +0200
Re: Dict comprehensions - improvement to docs? Lele Gaifax <lele@metapensiero.it> - 2015-03-17 15:06 +0100
Re: Dict comprehensions - improvement to docs? Ian Kelly <ian.g.kelly@gmail.com> - 2015-03-17 09:48 -0600
Re: Dict comprehensions - improvement to docs? "Frank Millman" <frank@chagford.com> - 2015-03-18 10:01 +0200
Re: Dict comprehensions - improvement to docs? Ian Kelly <ian.g.kelly@gmail.com> - 2015-03-18 02:39 -0600
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2015-03-16 07:25 +0200 |
| Subject | Dict comprehensions - improvement to docs? |
| Message-ID | <mailman.410.1426483550.21433.python-list@python.org> |
Hi all

I like dict comprehensions, but I don't use them very often, so when I do I need to look up the format.

I always struggle to find the information in the Library Reference. The obvious location, Mapping Types, shows various constructors, but not the comprehension.

https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

So I turn to Google. It shows a couple of StackOverflow questions, and then a link to the Data Structures section of the Tutorial, which explains it succinctly.

https://docs.python.org/3/tutorial/datastructures.html#dictionaries

I feel that the Library Reference should be updated to include dict comprehensions. Just checking here first before I raise a Documentation issue.

Frank Millman
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2015-03-15 23:22 -0700 |
| Message-ID | <87fv95fom0.fsf@jester.gateway.sonic.net> |
| In reply to | #87505 |
"Frank Millman" <frank@chagford.com> writes:
> I like dict comprehensions, but I don't use them very often, so when I do I
> need to look up the format.
I never felt a need for them. Do they generate better code than
d = dict((k,v) for k,v in [('name','paul'),('language','python')]) ?
Anyway, since they are syntax, they'd be in the language reference rather
than the library reference.
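For reference, a minimal sketch of the two spellings being compared here, assuming Python 3. The variable names are illustrative only and do not come from the thread.

```python
pairs = [('name', 'paul'), ('language', 'python')]

# dict() constructor fed by a generator expression
d_gen = dict((k, v) for k, v in pairs)

# Dict comprehension: the same mapping, written as syntax
d_comp = {k: v for k, v in pairs}

# When no transformation is needed, dict() accepts the pairs directly
d_plain = dict(pairs)

print(d_gen == d_comp == d_plain)  # prints True
```

All three build the same dict; the rest of the thread is about how much work each one does to get there.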
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2015-03-16 10:30 +0200 |
| Message-ID | <mailman.427.1426494675.21433.python-list@python.org> |
| In reply to | #87510 |
"Paul Rubin" <no.email@nospam.invalid> wrote in message
news:87fv95fom0.fsf@jester.gateway.sonic.net...
> "Frank Millman" <frank@chagford.com> writes:
>> I like dict comprehensions, but I don't use them very often, so when I do
>> I need to look up the format.
>
> I never felt a need for them. Do they generate better code than
>
> d = dict((k,v) for k,v in [('name','paul'),('language','python')]) ?
>
I ran timeit, and dict comps are quite a bit quicker. This is with Python
3.4.1 -
C:\>python -m timeit -s "x=range(65, 91); y=(chr(z) for z in x)" "dict((a, b) for a, b in zip(x, y))"
100000 loops, best of 3: 16.1 usec per loop
C:\>python -m timeit -s "x=range(65, 91); y=(chr(z) for z in x)" "{a: b for a, b in zip(x, y)}"
100000 loops, best of 3: 6.38 usec per loop
Frank
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2015-03-16 01:38 -0700 |
| Message-ID | <87wq2hfibu.fsf@jester.gateway.sonic.net> |
| In reply to | #87531 |
"Frank Millman" <frank@chagford.com> writes:
> dict((a, b) for a, b in zip(x, y))
> 100000 loops, best of 3: 16.1 usec per loop
> {a: b for a, b in zip(x, y)}"
> 100000 loops, best of 3: 6.38 usec per loop
Hmm, I bet the difference is from the (a,b) consing all those tuples.
Can you try just dict(zip(x,y)) ?
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2015-03-16 11:01 +0200 |
| Message-ID | <mailman.432.1426496495.21433.python-list@python.org> |
| In reply to | #87534 |
"Paul Rubin" <no.email@nospam.invalid> wrote in message
news:87wq2hfibu.fsf@jester.gateway.sonic.net...
> "Frank Millman" <frank@chagford.com> writes:
>> dict((a, b) for a, b in zip(x, y))
>> 100000 loops, best of 3: 16.1 usec per loop
>> {a: b for a, b in zip(x, y)}"
>> 100000 loops, best of 3: 6.38 usec per loop
>
> Hmm, I bet the difference is from the (a,b) consing all those tuples.
>
> Can you try just dict(zip(x,y)) ?
C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "dict(zip(x, y))"
100000 loops, best of 3: 11.9 usec per loop
C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a: b for a, b in zip(x, y)}"
100000 loops, best of 3: 7.24 usec per loop
I ran the dict comp again in case the machine load had changed - it is a bit
slower, but not much.
Frank
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-03-16 09:57 -0600 |
| Message-ID | <mailman.450.1426521470.21433.python-list@python.org> |
| In reply to | #87534 |
On Mon, Mar 16, 2015 at 3:01 AM, Frank Millman <frank@chagford.com> wrote:
> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
> "dict(zip(x, y))"
> 100000 loops, best of 3: 11.9 usec per loop
>
> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a: b
> for a, b in zip(x, y)}"
> 100000 loops, best of 3: 7.24 usec per loop
Since the setup code is only run once, the generator expression used
for y is only iterated over once. On every subsequent loop, zip is
producing an empty result. So this measurement is really just
capturing the overhead of the dict construction. Compare:
$ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "dict(zip(x,y))"
1000000 loops, best of 3: 0.9 usec per loop
$ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]" "dict(zip(x,y))"
100000 loops, best of 3: 2.69 usec per loop
$ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a:b for a,b in zip(x,y)}"
1000000 loops, best of 3: 0.837 usec per loop
$ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]" "{a:b for a,b in zip(x,y)}"
100000 loops, best of 3: 2.67 usec per loop
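The exhaustion effect described above can be seen without timeit. This is a small sketch, assuming Python 3; the names `first`, `second` and `y_list` are illustrative and not from the thread.

```python
x = range(65, 91)
y = (chr(z) for z in x)        # generator: can be consumed only once

first = dict(zip(x, y))        # consumes the generator
second = dict(zip(x, y))       # generator is already exhausted here

print(len(first), len(second))  # prints 26 0

y_list = [chr(z) for z in x]   # a list can be iterated repeatedly
print(len(dict(zip(x, y_list))), len(dict(zip(x, y_list))))  # prints 26 26
```

With the generator in the setup statement, every timed loop after the first behaves like `second`: it builds an empty dict.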
| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Date | 2015-03-16 15:41 -0700 |
| Message-ID | <87bnjso99q.fsf@jester.gateway.sonic.net> |
| In reply to | #87570 |
Ian Kelly <ian.g.kelly@gmail.com> writes:
> Since the setup code is only run once, the generator expression used
> for y is only iterated over once.
Ah, thanks, I'd been wondering what was going on with Frank's example
but hadn't gotten around to trying to analyze it.
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2015-03-17 08:44 +0200 |
| Message-ID | <mailman.490.1426574679.21433.python-list@python.org> |
| In reply to | #87534 |
"Ian Kelly" <ian.g.kelly@gmail.com> wrote in message
news:CALwzid=u19YMkfJBhbLZi1qh2U9UK4ohY5wco1zO-i3T5AtyOA@mail.gmail.com...
> On Mon, Mar 16, 2015 at 3:01 AM, Frank Millman <frank@chagford.com> wrote:
>> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
>> "dict(zip(x, y))"
>> 100000 loops, best of 3: 11.9 usec per loop
>>
>> C:\>python -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)" "{a:
>> b
>> for a, b in zip(x, y)}"
>> 100000 loops, best of 3: 7.24 usec per loop
>
> Since the setup code is only run once, the generator expression used
> for y is only iterated over once. On every subsequent loop, zip is
> producing an empty result. So this measurement is really just
> capturing the overhead of the dict construction. Compare:
>
> $ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
> "dict(zip(x,y))"
> 1000000 loops, best of 3: 0.9 usec per loop
> $ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]"
> "dict(zip(x,y))"
> 100000 loops, best of 3: 2.69 usec per loop
> $ python3 -m timeit -s "x = range(65, 91); y = (chr(z) for z in x)"
> "{a:b for a,b in zip(x,y)}"
> 1000000 loops, best of 3: 0.837 usec per loop
> $ python3 -m timeit -s "x = range(65, 91); y = [chr(z) for z in x]"
> "{a:b for a,b in zip(x,y)}"
> 100000 loops, best of 3: 2.67 usec per loop
Thanks for the explanation. I'll try not to make that mistake again.
However, to go back to the original example, we want to compare a dict
comprehension with a dict() constructor using a generator expression.
Let's see if I have got this one right -
C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "dict((a, b) for a, b in zip(x, y))"
10000 loops, best of 3: 49.6 usec per loop
C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b for a, b in zip(x, y)}"
10000 loops, best of 3: 25.8 usec per loop
Or to use Paul's original example -
C:\>python -m timeit "d = dict((k, v) for k, v in [('name', 'paul'), ('language', 'python')])"
100000 loops, best of 3: 16.6 usec per loop
C:\>python -m timeit "d = {k: v for k, v in [('name', 'paul'), ('language', 'python')]}"
100000 loops, best of 3: 5.2 usec per loop
It seems that a dict comp is noticeably faster.
Does this sound right, or are there other factors I should be taking into
account?
Frank
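The same comparison can be run from a script, which avoids shell quoting problems. This is a rough sketch using the standard `timeit` module, assuming Python 3; the labels and the `results` dict are illustrative, and absolute numbers will differ by machine and interpreter version.

```python
import timeit

# y is a list, so it can be iterated again on every timed loop
setup = "x = range(65, 91); y = [chr(z) for z in x]"

statements = {
    "dict(genexpr)": "dict((a, b) for a, b in zip(x, y))",
    "dict(zip)": "dict(zip(x, y))",
    "dict comp": "{a: b for a, b in zip(x, y)}",
}

number = 10000
results = {}
for label, stmt in statements.items():
    # repeat() returns one total time per run; the minimum is the least noisy
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=3))
    results[label] = best / number * 1e6  # microseconds per loop
    print(f"{label:15s} {results[label]:.3f} usec per loop")
```

Including `dict(zip(x, y))` alongside the other two gives a baseline for how much of the cost comes from the generator expression itself.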
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2015-03-17 15:06 +0100 |
| Message-ID | <mailman.495.1426601188.21433.python-list@python.org> |
| In reply to | #87534 |
"Frank Millman" <frank@chagford.com> writes:
> It seems that a dict comp is noticeably faster.
>
> Does this sound right, or are there other factors I should be taking into
> account?
The dict comp does not execute any function call. Consider the following:
$ python3 -m timeit "d=dict()"
10000000 loops, best of 3: 0.113 usec per loop
$ python3 -m timeit "d={}"
10000000 loops, best of 3: 0.0601 usec per loop
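One way to see where the extra cost of `dict()` comes from is to inspect the compiled code. This is a sketch assuming CPython 3; the exact opcode names printed vary between versions, so only the general shape should be relied on.

```python
import dis

call_code = compile("dict()", "<example>", "eval")
literal_code = compile("{}", "<example>", "eval")

# dict() has to look up the global name "dict" at run time, then call it
print(call_code.co_names)     # prints ('dict',)

# {} refers to no names at all; the compiler emits a map-building instruction
print(literal_code.co_names)  # prints ()

call_ops = [ins.opname for ins in dis.get_instructions(call_code)]
literal_ops = [ins.opname for ins in dis.get_instructions(literal_code)]
print(call_ops)
print(literal_ops)
```

The name lookup is also why `dict()` can be shadowed by a user-defined `dict`, while `{}` always means the built-in type.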
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it | -- Fortunato Depero, 1929.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-03-17 09:48 -0600 |
| Message-ID | <mailman.498.1426611876.21433.python-list@python.org> |
| In reply to | #87534 |
On Tue, Mar 17, 2015 at 12:44 AM, Frank Millman <frank@chagford.com> wrote:
> Thanks for the explanation. I'll try not to make that mistake again.
>
> However, to go back to the original example, we want to compare a dict
> comprehension with a dict() constructor using a generator expression.
>
> Let's see if I have got this one right -
>
> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "dict((a,
> b) for a, b in zip(x, y))"
> 10000 loops, best of 3: 49.6 usec per loop
>
> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b for
> a, b in zip(x, y)}"
> 10000 loops, best of 3: 25.8 usec per loop
Why did you revert back to the no-op generator expression in the first
case instead of the more efficient dict(zip(x, y))?
| From | "Frank Millman" <frank@chagford.com> |
|---|---|
| Date | 2015-03-18 10:01 +0200 |
| Message-ID | <mailman.513.1426665702.21433.python-list@python.org> |
| In reply to | #87534 |
"Ian Kelly" <ian.g.kelly@gmail.com> wrote in message
news:CALwzidmNDcSvER7S6inEaVZA=DHUrDX1KzL-WRVwhd=o3_LvWA@mail.gmail.com...
> On Tue, Mar 17, 2015 at 12:44 AM, Frank Millman <frank@chagford.com>
> wrote:
>> Thanks for the explanation. I'll try not to make that mistake again.
>>
>> However, to go back to the original example, we want to compare a dict
>> comprehension with a dict() constructor using a generator expression.
>>
>> Let's see if I have got this one right -
>>
>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]"
>> "dict((a,
>> b) for a, b in zip(x, y))"
>> 10000 loops, best of 3: 49.6 usec per loop
>>
>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b
>> for
>> a, b in zip(x, y)}"
>> 10000 loops, best of 3: 25.8 usec per loop
>
> Why did you revert back to the no-op generator expression in the first
> case instead of the more efficient dict(zip(x, y))?
Firstly, I just want to emphasise that I am not trying to prove anything
here, I am trying to learn something.
Why do you call it a no-op? I understood your previous point that a
generator in the setup statement is exhausted after the first execution, but
this is in the run-time statement, so I thought it would be executed every
time.
If one does not need any 'comprehension' functionality, I agree that
dict(zip(x, y)) is the way to go.
However, if you do need a comprehension, Paul questioned why a special dict
comprehension was introduced when you can easily use a generator expression.
I ran the timing tests to compare the two, and I was surprised to see such a
difference. However, you have already pointed out one case where I was not
comparing apples and apples, and it is quite possible that I have stumbled
across another one.
I read Lele's post where he pointed out that part of the difference is
explained by the fact that dict() involves a function call, whereas {} does
not. However, this does not seem to explain the entire difference.
Frank
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2015-03-18 02:39 -0600 |
| Message-ID | <mailman.514.1426668000.21433.python-list@python.org> |
| In reply to | #87534 |
On Wed, Mar 18, 2015 at 2:01 AM, Frank Millman <frank@chagford.com> wrote:
>
> "Ian Kelly" <ian.g.kelly@gmail.com> wrote in message
> news:CALwzidmNDcSvER7S6inEaVZA=DHUrDX1KzL-WRVwhd=o3_LvWA@mail.gmail.com...
>> On Tue, Mar 17, 2015 at 12:44 AM, Frank Millman <frank@chagford.com>
>> wrote:
>>> Thanks for the explanation. I'll try not to make that mistake again.
>>>
>>> However, to go back to the original example, we want to compare a dict
>>> comprehension with a dict() constructor using a generator expression.
>>>
>>> Let's see if I have got this one right -
>>>
>>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]"
>>> "dict((a,
>>> b) for a, b in zip(x, y))"
>>> 10000 loops, best of 3: 49.6 usec per loop
>>>
>>> C:\>python -m timeit -s "x=range(65, 91); y=[chr(z) for z in x]" "{a: b
>>> for
>>> a, b in zip(x, y)}"
>>> 10000 loops, best of 3: 25.8 usec per loop
>>
>> Why did you revert back to the no-op generator expression in the first
>> case instead of the more efficient dict(zip(x, y))?
>
> Firstly, I just want to emphasise that I am not trying to prove anything
> here, I am trying to learn something.
>
> Why do you call it a no-op? I understood your previous point that a
> generator in the setup statement is exhausted after the first execution, but
> this is in the run-time statement, so I thought it would be executed every
> time.
I call it a no-op because it does not perform any transformation on
the iterable stream. It takes as input a sequence of 2-tuples, and it
yields the same tuples in the same sequence. It does however add a
non-trivial amount of code to perform that lack of work: for every
tuple in the sequence, it unpacks it, builds a new tuple, and yields
the result.
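A quick sketch of what "no-op" means here, assuming Python 3; `passthrough` and `direct` are illustrative names.

```python
x = range(65, 91)
y = [chr(z) for z in x]

# The generator expression unpacks each tuple and rebuilds an equal one
passthrough = list((a, b) for a, b in zip(x, y))
direct = list(zip(x, y))

print(passthrough == direct)                 # prints True
print(dict(passthrough) == dict(zip(x, y)))  # prints True
```

The output stream is identical to the input stream, so the per-item unpack and rebuild is pure overhead.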
> I read Lele's post where he pointed out that part of the difference is
> explained by the fact that dict() involves a function call, whereas {} does
> not. However, this does not seem to explain the entire difference.
The dict call may be part of it, but I think it has more to do with
differences in the compiled loop. All the major work done by the
dict() function has to be done just the same by the comprehension. In
my own timing, the version with no comprehension and the dict
comprehension are very similar in speed, whereas the generator
expression is substantially slower. This points to the generator
expression being the bottleneck, not the dict call (which is a
function call, but at least it's implemented in C).
The bytecode for the generator expression looks like this:
>>> dis.dis(compile('((a,b) for a,b in zip(x,y))', '', 'eval').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                23 (to 29)
              6 UNPACK_SEQUENCE          2
              9 STORE_FAST               1 (a)
             12 STORE_FAST               2 (b)
             15 LOAD_FAST                1 (a)
             18 LOAD_FAST                2 (b)
             21 BUILD_TUPLE              2
             24 YIELD_VALUE
             25 POP_TOP
             26 JUMP_ABSOLUTE            3
        >>   29 LOAD_CONST               0 (None)
             32 RETURN_VALUE
while the bytecode for the dict comprehension looks like this:
>>> dis.dis(compile('{a:b for a,b in zip(x,y)}', '', 'eval').co_consts[0])
  1           0 BUILD_MAP                0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                21 (to 30)
              9 UNPACK_SEQUENCE          2
             12 STORE_FAST               1 (a)
             15 STORE_FAST               2 (b)
             18 LOAD_FAST                2 (b)
             21 LOAD_FAST                1 (a)
             24 MAP_ADD                  2
             27 JUMP_ABSOLUTE            6
        >>   30 RETURN_VALUE
The major difference between the two is that where the former has
BUILD_TUPLE followed by YIELD_VALUE, the latter just has MAP_ADD. That
looks to be more efficient on two counts: one, that it avoids building
a tuple on every iteration; two, that it invokes code to contribute to
the dict directly, versus going through all the machinery of yielding
a value to be consumed by the dict() function.
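The opcode difference can be checked on a current interpreter with a sketch like the following, assuming CPython 3. The helper `opnames` is hypothetical, not part of `dis`. It walks nested code objects because newer CPython versions may inline comprehensions, in which case the `co_consts[0]` lookup used above may not find a separate code object; exact opcode sets also vary by version.

```python
import dis
import types

def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names

genexp_ops = opnames(compile("((a, b) for a, b in zip(x, y))", "<example>", "eval"))
dictcomp_ops = opnames(compile("{a: b for a, b in zip(x, y)}", "<example>", "eval"))

# Opcodes that appear in only one of the two forms
print(sorted(genexp_ops - dictcomp_ops))
print(sorted(dictcomp_ops - genexp_ops))
```

On the versions this was written against, the first list includes `BUILD_TUPLE` and `YIELD_VALUE` and the second includes `MAP_ADD`, matching the analysis above.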