Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #98933 > unrolled thread
| Started by | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| First post | 2015-11-17 16:15 +0000 |
| Last post | 2015-11-17 16:15 +0000 |
| Articles | 1 — 1 participant |
Back to article view | Back to comp.lang.python
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Which type should be used when testing static structure appartenance Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-11-17 16:15 +0000
| From | Oscar Benjamin <oscar.j.benjamin@gmail.com> |
|---|---|
| Date | 2015-11-17 16:15 +0000 |
| Subject | Re: Which type should be used when testing static structure appartenance |
| Message-ID | <mailman.397.1447776964.16136.python-list@python.org> |
On 17 November 2015 at 14:27, Nicolas Évrard <nicoe@altern.org> wrote:
> Hello,
>
> I saw the following retweet by Raymond Hettinger in this morning:
>
> https://twitter.com/sanityinc/status/666485814214287360
>
> Programming tip: many of those arrays and hashes in your code
> should actually be sets. Match data structures to data
> constraints!
>
> I saw just in time because in a review I wrote something like this:
>
> if operator not in ('where', 'not where')
>
> and my colleague proposed that I should use a list instead of a tuple.
> But reading the mentioned tweet I tend to think that a set would be a
> better choice.
>
> What are your opinion on this issue (I realize it's not something of
> the utmost importance but rather a "philosophical" question).
Conceptually it should be a set. Really it makes little difference though.
I think that what the tip is really referring to is a situation where
you have a larger data structure. For example asking if x in y has
different algorithmic performance when y is a large tuple/list than a
set. This is because a set is a hash-table with O(1) average lookup
cost and a tuple/list needs to be scanned linearly having O(N) average
cost. This can be very costly and can often be seen in beginner code
where someone does something like:
for x in stuff:
if x in alist:
# do stuff
Here the "in" operator loops over alist once for each item in stuff
(until it finds a match). If there are M things in stuff and N things
in alist then this is O(M*N) items to compare. If we convert alist to
a set at the start of the loop then we have:
aset = set(alist)
for xin stuff:
if x in aset:
# do stuff
Here the set constructor scans alist which is O(N) and then for each
of M things in stuff we do an O(1) lookup in aset. So the result is
O(M+N) meaning linear rather than quadratic performance. This can be a
significant difference especially for something so trivially easy to
do.
--
Oscar
Back to top | Article view | comp.lang.python
csiph-web