Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail From: Oscar Benjamin Newsgroups: comp.lang.python Subject: Re: Which type should be used when testing static structure appartenance Date: Tue, 17 Nov 2015 16:15:37 +0000 Lines: 54 Message-ID: References: <20151117142739.GC20735@localhost.localdomain> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Trace: news.uni-berlin.de QQ5mA8wc7JFUdY0rEEHgRA1POYpQ+n6m10vLDwQb+4bw== Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.006 X-Spam-Evidence: '*H*': 0.99; '*S*': 0.00; 'operator': 0.03; 'constructor': 0.07; 'hettinger': 0.07; 'linear': 0.07; 'lookup': 0.09; 'tuple.': 0.09; '"in"': 0.16; 'algorithmic': 0.16; 'alist': 0.16; 'compare.': 0.16; 'hashes': 0.16; 'loops': 0.16; 'nicolas': 0.16; 'quadratic': 0.16; 'received:io': 0.16; 'received:psf.io': 0.16; 'structure.': 0.16; 'subject:type': 0.16; 'subject:when': 0.16; 'to:name:python list': 0.16; 'trivially': 0.16; 'wrote:': 0.16; 'beginner': 0.18; 'have:': 0.18; 'url:status': 0.18; '2015': 0.20; 'proposed': 0.20; 'do.': 0.22; 'arrays': 0.22; 'referring': 0.22; 'programming': 0.22; 'wrote': 0.23; 'this:': 0.23; 'header :In-Reply-To:1': 0.24; 'example': 0.26; 'message- id:@mail.gmail.com': 0.27; 'tend': 0.27; 'finds': 0.29; 'sets.': 0.29; 'convert': 0.29; 'code': 0.30; 'raymond': 0.30; 'especially': 0.32; 'realize': 0.32; 'scanned': 0.32; 'though.': 0.33; 'list': 0.34; 'received:google.com': 0.35; 'set.': 0.35; 'something': 0.35; 'asking': 0.35; 'item': 0.35; 'but': 0.36; 'should': 0.36; 'instead': 0.36; 'there': 0.36; 'received:209.85': 0.36; 'structures': 0.36; 'to:addr:python-list': 0.36; 'subject:: ': 0.37; 'really': 0.37; 'things': 0.38; 'difference': 0.38; 'received:209': 0.38; 'stuff': 0.38; 'someone': 0.38; 'data': 0.39; 'does': 0.39; 'rather': 0.39; 'to:addr:python.org': 0.40; 'where': 0.40; 'hello,': 0.40; 'easy': 0.60; 'your': 0.60; 'different': 0.63; 'here': 0.66; 'situation': 0.67; 'saw': 0.77; 'costly': 0.84; 'morning:': 0.84; 'oscar': 0.84; 'stuff:': 0.84; 'utmost': 0.84; 'xin': 0.84; 'average': 0.93; 'choice.': 0.93; 'colleague': 0.93 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type:content-transfer-encoding; bh=sbaR+GMgowj7XI5bwiu/mVe8UpKrnVfH1zp0fATx+ew=; b=lN/I7a0H8P7TSlh/XewAB2aabbGV7JO7UCHaxBlbgJ0nrmjv+mIPOGIy6DGg777bhx jgQaxebFwDY4+8Z+zimdWi+DsAdNLPed17rkz8+aT5OwzJlpsWXYfxBwnk/p5bxfC1kb I4TEWI1YIroPJW4gei8uaZLNkEkvS9JcrLl1gtJINvqSrIzJXjNM6zRiZKx0SOJeoC5Z kUje6KclxjCVWnIVWlOx5fxHiJwSN+eO0tjoo54i76prbhWO6qA/jcdSTHGinwRa2aF3 afcQAjloyAEpNGJEWgUJMEZj6hrpguCdiHG+dkzJ5hM7YeUuO60600kvpA8lEDBgJFGe lp+A== X-Received: by 10.25.81.13 with SMTP id f13mr162621lfb.126.1447776957008; Tue, 17 Nov 2015 08:15:57 -0800 (PST) In-Reply-To: <20151117142739.GC20735@localhost.localdomain> X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.20+ Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Xref: csiph.com comp.lang.python:98933 On 17 November 2015 at 14:27, Nicolas =C3=89vrard wrote: > Hello, > > I saw the following retweet by Raymond Hettinger in this morning: > > https://twitter.com/sanityinc/status/666485814214287360 > > Programming tip: many of those arrays and hashes in your code > should actually be sets. Match data structures to data > constraints! > > I saw just in time because in a review I wrote something like this: > > if operator not in ('where', 'not where') > > and my colleague proposed that I should use a list instead of a tuple. > But reading the mentioned tweet I tend to think that a set would be a > better choice. > > What are your opinion on this issue (I realize it's not something of > the utmost importance but rather a "philosophical" question). Conceptually it should be a set. Really it makes little difference though. I think that what the tip is really referring to is a situation where you have a larger data structure. For example asking if x in y has different algorithmic performance when y is a large tuple/list than a set. This is because a set is a hash-table with O(1) average lookup cost and a tuple/list needs to be scanned linearly having O(N) average cost. This can be very costly and can often be seen in beginner code where someone does something like: for x in stuff: if x in alist: # do stuff Here the "in" operator loops over alist once for each item in stuff (until it finds a match). If there are M things in stuff and N things in alist then this is O(M*N) items to compare. If we convert alist to a set at the start of the loop then we have: aset =3D set(alist) for xin stuff: if x in aset: # do stuff Here the set constructor scans alist which is O(N) and then for each of M things in stuff we do an O(1) lookup in aset. So the result is O(M+N) meaning linear rather than quadratic performance. This can be a significant difference especially for something so trivially easy to do. -- Oscar