Path: csiph.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: Chris Angelico <rosuav@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Which type should be used when testing static structure appartenance
Date: Thu, 19 Nov 2015 00:42:32 +1100
Lines: 47
Message-ID: <mailman.415.1447854155.16136.python-list@python.org>
References: <mailman.388.1447770859.16136.python-list@python.org> <564c62f3$0$1593$c3e8da3$5496439d@news.astraweb.com> <mailman.411.1447850412.16136.python-list@python.org> <564c7d80$0$1606$c3e8da3$5496439d@news.astraweb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
In-Reply-To: <564c7d80$0$1606$c3e8da3$5496439d@news.astraweb.com>
Precedence: list
Xref: csiph.com comp.lang.python:98971

On Thu, Nov 19, 2015 at 12:30 AM, Steven D'Aprano <steve@pearwood.info> wrote:
> On Wed, 18 Nov 2015 11:40 pm, Chris Angelico wrote:
>> All the questions of performance should be
>> secondary to code clarity, though;
>
> "All"? Surely not.

The OP's example was checking if a string was equal to either of two
strings. Even if that's in a tight loop, the performance difference
between the various options is negligible.

The "all" is a little misleading (of course there are times when you
warp your code for the sake of performance), but I was talking about
this example, where it's basically coming down to microbenchmarks.

>> so I would say the choices are: Set
>> literal if available, else tuple. Forget the performance.
>
> It seems rather strange to argue that we should ignore performance when the
> whole reason for using sets in the first place is for performance.

They do perform well, but that's not the main point - not when you're
working with just two strings. Of course, when you can get performance
AND readability, it's perfect. That doesn't happen with Py2 set
literals (the set gets built on every evaluation), but it does with
Python 3:

rosuav@sikorsky:~$ python -m timeit -s "x='asdf'" "x in {'asdf','qwer'}"
10000000 loops, best of 3: 0.12 usec per loop
rosuav@sikorsky:~$ python -m timeit -s "x='asdf'" "x in ('asdf','qwer')"
10000000 loops, best of 3: 0.0344 usec per loop
rosuav@sikorsky:~$ python -m timeit -s "x='asdf'" "x=='asdf' or x=='qwer'"
10000000 loops, best of 3: 0.0392 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x='asdf'" "x in {'asdf','qwer'}"
10000000 loops, best of 3: 0.0356 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x='asdf'" "x in ('asdf','qwer')"
10000000 loops, best of 3: 0.0342 usec per loop
rosuav@sikorsky:~$ python3 -m timeit -s "x='asdf'" "x=='asdf' or x=='qwer'"
10000000 loops, best of 3: 0.0418 usec per loop
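
If you want to repeat the comparison on your own interpreter, the
same three statements can be driven from the timeit module directly.
This is only a rough sketch: the helper name time_candidates and the
loop counts are made up for illustration, and the numbers will vary
by machine and Python version.

```python
import timeit

# The three spellings timed above; x is bound in the setup string.
CANDIDATES = [
    "x in {'asdf','qwer'}",
    "x in ('asdf','qwer')",
    "x=='asdf' or x=='qwer'",
]


def time_candidates(number=100000, repeat=3):
    """Return {statement: best seconds per loop}. Hypothetical helper."""
    results = {}
    for stmt in CANDIDATES:
        # timeit.repeat returns one total time per repetition;
        # take the best run, as the command-line tool does.
        totals = timeit.repeat(stmt, setup="x='asdf'",
                               number=number, repeat=repeat)
        results[stmt] = min(totals) / number
    return results


if __name__ == "__main__":
    for stmt, secs in time_candidates().items():
        print("%-28s %.4f usec per loop" % (stmt, secs * 1e6))
```

Run it under each interpreter you care about and compare the set
literal line between them.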

No set construction in Py3 - the optimizer figures out that you don't
need mutability, and uses a constant frozenset. (Both versions do the
same thing for list->tuple.) Despite the performance hit from using a
set in Py2, though, I would still advocate its use (assuming you don't
need to support 2.6 or older, which lack set literals), because it
accurately represents the *concept* of "is this any one of these".
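
One way to see the constant for yourself is to compile the expression
and look at what ended up in the code object. A minimal sketch,
assuming CPython 3 (this is an implementation detail of its optimizer,
not a language guarantee); constant_types is a name invented for this
example.

```python
import dis


def constant_types(expr):
    """Compile expr and return the type names of its constants.

    Hypothetical helper; relies on CPython storing folded constants
    in code.co_consts.
    """
    code = compile(expr, "<demo>", "eval")
    return sorted({type(c).__name__ for c in code.co_consts})


if __name__ == "__main__":
    for expr in ("x in {'asdf','qwer'}", "x in ['asdf','qwer']"):
        # Show which constant types the compiler stored, then the bytecode.
        print(expr, "->", constant_types(expr))
        dis.dis(compile(expr, "<demo>", "eval"))
```

On CPython 3 the set literal should show up as a frozenset constant
and the list as a tuple, with no set- or list-building instruction in
the disassembly.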

ChrisA