Groups > comp.lang.python > #98933 > unrolled thread

Re: Which type should be used when testing static structure appartenance

Started by	Oscar Benjamin <oscar.j.benjamin@gmail.com>
First post	2015-11-17 16:15 +0000
Last post	2015-11-17 16:15 +0000
Articles	1 — 1 participant

Back to article view | Back to comp.lang.python

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Which type should be used when testing static structure appartenance Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-11-17 16:15 +0000

#98933 — Re: Which type should be used when testing static structure appartenance

From	Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date	2015-11-17 16:15 +0000
Subject	Re: Which type should be used when testing static structure appartenance
Message-ID	<mailman.397.1447776964.16136.python-list@python.org>

On 17 November 2015 at 14:27, Nicolas Évrard <nicoe@altern.org> wrote:
> Hello,
>
> I saw the following retweet by Raymond Hettinger in this morning:
>
>    https://twitter.com/sanityinc/status/666485814214287360
>
>    Programming tip: many of those arrays and hashes in your code
>    should actually be sets. Match data structures to data
>    constraints!
>
> I saw just in time because in a review I wrote something like this:
>
>    if operator not in ('where', 'not where')
>
> and my colleague proposed that I should use a list instead of a tuple.
> But reading the mentioned tweet I tend to think that a set would be a
> better choice.
>
> What are your opinion on this issue (I realize it's not something of
> the utmost importance but rather a "philosophical" question).

Conceptually it should be a set. Really it makes little difference though.

I think that what the tip is really referring to is a situation where
you have a larger data structure. For example asking if x in y has
different algorithmic performance when y is a large tuple/list than a
set. This is because a set is a hash-table with O(1) average lookup
cost and a tuple/list needs to be scanned linearly having O(N) average
cost. This can be very costly and can often be seen in beginner code
where someone does something like:

   for x in stuff:
       if x in alist:
           # do stuff

Here the "in" operator loops over alist once for each item in stuff
(until it finds a match). If there are M things in stuff and N things
in alist then this is O(M*N) items to compare. If we convert alist to
a set at the start of the loop then we have:

    aset = set(alist)
    for xin stuff:
        if x in aset:
            # do stuff

Here the set constructor scans alist which is O(N) and then for each
of M things in stuff we do an O(1) lookup in aset. So the result is
O(M+N) meaning linear rather than quadratic performance. This can be a
significant difference especially for something so trivially easy to
do.

--
Oscar

[toc] | [standalone]

csiph-web

Re: Which type should be used when testing static structure appartenance

Contents

#98933 — Re: Which type should be used when testing static structure appartenance