Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #98933

Re: Which type should be used when testing static structure appartenance

From Oscar Benjamin <oscar.j.benjamin@gmail.com>
Newsgroups comp.lang.python
Subject Re: Which type should be used when testing static structure appartenance
Date 2015-11-17 16:15 +0000
Message-ID <mailman.397.1447776964.16136.python-list@python.org> (permalink)
References <20151117142739.GC20735@localhost.localdomain>

Show all headers | View raw


On 17 November 2015 at 14:27, Nicolas Évrard <nicoe@altern.org> wrote:
> Hello,
>
> I saw the following retweet by Raymond Hettinger in this morning:
>
>    https://twitter.com/sanityinc/status/666485814214287360
>
>    Programming tip: many of those arrays and hashes in your code
>    should actually be sets. Match data structures to data
>    constraints!
>
> I saw just in time because in a review I wrote something like this:
>
>    if operator not in ('where', 'not where')
>
> and my colleague proposed that I should use a list instead of a tuple.
> But reading the mentioned tweet I tend to think that a set would be a
> better choice.
>
> What are your opinion on this issue (I realize it's not something of
> the utmost importance but rather a "philosophical" question).

Conceptually it should be a set. Really it makes little difference though.

I think that what the tip is really referring to is a situation where
you have a larger data structure. For example asking if x in y has
different algorithmic performance when y is a large tuple/list than a
set. This is because a set is a hash-table with O(1) average lookup
cost and a tuple/list needs to be scanned linearly having O(N) average
cost. This can be very costly and can often be seen in beginner code
where someone does something like:

   for x in stuff:
       if x in alist:
           # do stuff

Here the "in" operator loops over alist once for each item in stuff
(until it finds a match). If there are M things in stuff and N things
in alist then this is O(M*N) items to compare. If we convert alist to
a set at the start of the loop then we have:

    aset = set(alist)
    for xin stuff:
        if x in aset:
            # do stuff

Here the set constructor scans alist which is O(N) and then for each
of M things in stuff we do an O(1) lookup in aset. So the result is
O(M+N) meaning linear rather than quadratic performance. This can be a
significant difference especially for something so trivially easy to
do.

--
Oscar

Back to comp.lang.python | Previous | Next | Find similar | Unroll thread


Thread

Re: Which type should be used when testing static structure appartenance Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2015-11-17 16:15 +0000

csiph-web