Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #8027 > unrolled thread

Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries?

Started bydeathweaselx86 <deathweasel@gmail.com>
First post2011-06-20 12:43 -0700
Last post2011-06-25 01:05 -0700
Articles 6 — 6 participants

Back to article view | Back to comp.lang.python


Contents

  Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries? deathweaselx86 <deathweasel@gmail.com> - 2011-06-20 12:43 -0700
    Re: Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries? Ian Kelly <ian.g.kelly@gmail.com> - 2011-06-20 13:59 -0600
    Re: Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-06-21 01:10 +0000
      Re: Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries? Ethan Furman <ethan@stoneleaf.us> - 2011-06-20 20:29 -0700
    Re: Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries? rusi <rustompmody@gmail.com> - 2011-06-20 20:59 -0700
    Re: Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries? Raymond Hettinger <python@rcn.com> - 2011-06-25 01:05 -0700

#8027 — Is there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries?

Fromdeathweaselx86 <deathweasel@gmail.com>
Date2011-06-20 12:43 -0700
SubjectIs there any advantage or disadvantage to using sets over list comps to ensure a list of unique entries?
Message-ID<5b73ae60-506f-45e4-a82c-e59571252d47@w4g2000yqm.googlegroups.com>
Howdy guys, I am new.

I've been converting lists to sets, then back to lists again to get
unique lists.
e.g

Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> foo = ['1','2','3']
>>> bar = ['2','5']
>>> foo.extend(bar)
>>> foo = list(set(foo))
>>> foo
['1', '3', '2', '5']

I used to use list comps to do this instead.
>>> foo = ['1','2','3']
>>> bar = ['2','5']
>>> foo.extend([a for a in bar if a not in foo])
>>> foo
['1', '2', '3', '5']

A very long time ago, we all used dictionaries, but I'm not interested
in that ever again. ;-)
Is there any performance hit to using one of these methods over the
other for rather large lists?

[toc] | [next] | [standalone]


#8028

FromIan Kelly <ian.g.kelly@gmail.com>
Date2011-06-20 13:59 -0600
Message-ID<mailman.191.1308600028.1164.python-list@python.org>
In reply to#8027
On Mon, Jun 20, 2011 at 1:43 PM, deathweaselx86 <deathweasel@gmail.com> wrote:
> Howdy guys, I am new.
>
> I've been converting lists to sets, then back to lists again to get
> unique lists.
> e.g
>
> Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
> [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> foo = ['1','2','3']
>>>> bar = ['2','5']
>>>> foo.extend(bar)
>>>> foo = list(set(foo))
>>>> foo
> ['1', '3', '2', '5']
>
> I used to use list comps to do this instead.
>>>> foo = ['1','2','3']
>>>> bar = ['2','5']
>>>> foo.extend([a for a in bar if a not in foo])
>>>> foo
> ['1', '2', '3', '5']
>
> A very long time ago, we all used dictionaries, but I'm not interested
> in that ever again. ;-)
> Is there any performance hit to using one of these methods over the
> other for rather large lists?

Yes.  In the second approach, "if a not in foo" is O(n) if foo is a
list, and since you're doing it m times, that's O(n * m).

The first approach is merely O(n + m).

[toc] | [prev] | [next] | [standalone]


#8048

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2011-06-21 01:10 +0000
Message-ID<4dffefa1$0$30002$c3e8da3$5496439d@news.astraweb.com>
In reply to#8027
On Mon, 20 Jun 2011 12:43:52 -0700, deathweaselx86 wrote:

> Howdy guys, I am new.
> 
> I've been converting lists to sets, then back to lists again to get
> unique lists.
> e.g
> 
> Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48) [GCC 4.2.4 (Ubuntu
> 4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or
> "license" for more information.
>>>> foo = ['1','2','3']
>>>> bar = ['2','5']
>>>> foo.extend(bar)
>>>> foo = list(set(foo))
>>>> foo
> ['1', '3', '2', '5']
> 
> I used to use list comps to do this instead.
>>>> foo = ['1','2','3']
>>>> bar = ['2','5']
>>>> foo.extend([a for a in bar if a not in foo]) foo
> ['1', '2', '3', '5']
> 
> A very long time ago, we all used dictionaries, but I'm not interested
> in that ever again. ;-)
> Is there any performance hit to using one of these methods over the
> other for rather large lists?

Absolutely!

Membership testing in lists is O(N), that is, it gets slower as the 
number of items in the list increases. So if list foo above is huge, "a 
not in foo" may take time proportional to how huge it is.

For sets, membership testing is (roughly) O(1), that it, it takes 
(roughly) constant time no matter how big the set is. There are special 
cases where this does not hold, but you have to work hard to find one, so 
in general you can assume that any lookup in a dict or set will take the 
same amount of time.

However, the cost of that extra power is that sets are limited to 
hashable objects (e.g. ints, strings, but not lists, etc.), they can't 
contain duplicates, and they are unordered. If you care about the order 
of the list, it is better to do something like this:

# pseudo-code
target = ['1', '3', '2', '7', '5', '9']
source = ['6', '2', '1', '0']
# add unique items of source to target, keeping the order
tmp = sorted(target)
for item in source:
    flag = binary search for item in tmp
    if not flag:
        insert item in tmp keeping the list sorted
        append item to the end of target
del tmp

The bisect module will be useful here.

For small lists, it really doesn't matter what you do. This probably only 
matters beyond a few tens of thousands of items.



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#8063

FromEthan Furman <ethan@stoneleaf.us>
Date2011-06-20 20:29 -0700
Message-ID<mailman.213.1308627006.1164.python-list@python.org>
In reply to#8048
Steven D'Aprano wrote:
> On Mon, 20 Jun 2011 12:43:52 -0700, deathweaselx86 wrote:
>> I've been converting lists to sets, then back to lists again to get
>> unique lists.
>>
>> I used to use list comps to do this instead.
>>>>> foo = ['1','2','3']
>>>>> bar = ['2','5']
>>>>> foo.extend([a for a in bar if a not in foo]) foo
>> ['1', '2', '3', '5']
>>
>> Is there any performance hit to using one of these methods over the
>> other for rather large lists?
> 
> Absolutely!
> 
> For small lists, it really doesn't matter what you do. This probably only 
> matters beyond a few tens of thousands of items.

Depends on the complexity of the object.  It only took a couple thousand 
dbf records to notice a *huge* slowdown using 'in' tests on regular lists.

~Ethan~

[toc] | [prev] | [next] | [standalone]


#8065

Fromrusi <rustompmody@gmail.com>
Date2011-06-20 20:59 -0700
Message-ID<8f67e8a1-bb2d-4d20-af2c-dfdb93679301@35g2000prp.googlegroups.com>
In reply to#8027
On Jun 21, 12:43 am, deathweaselx86 <deathwea...@gmail.com> wrote:
> Howdy guys, I am new.
>
> I've been converting lists to sets, then back to lists again to get
> unique lists.

Maybe you should consider whether its best to work with sets only and
not use lists at all.
This needs to be said because:
1. Most intros to python show lists (and tuples which is almost the
same) and hardly mention sets
2. Until recent versions python did not have sets

The basic question to ask yourself is: Is order/repetition
significant?
If not you want a set.

[toc] | [prev] | [next] | [standalone]


#8438

FromRaymond Hettinger <python@rcn.com>
Date2011-06-25 01:05 -0700
Message-ID<195cb0f9-1553-47f0-a945-71cfd563c951@u28g2000yqf.googlegroups.com>
In reply to#8027
On Jun 20, 9:43 pm, deathweaselx86 <deathwea...@gmail.com> wrote:
> Howdy guys, I am new.
>
> I've been converting lists to sets, then back to lists again to get
> unique lists.
> e.g
>
> Python 2.5.2 (r252:60911, Jan 20 2010, 21:48:48)
> [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.>>> foo = ['1','2','3']
> >>> bar = ['2','5']
> >>> foo.extend(bar)
> >>> foo = list(set(foo))

That works great and is very clear.

If you want to also preserve order, use an OrderedDict:

>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']


Raymond

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web