Is there a unique method in python to unique a list?

Started by	Token Type <typetoken@gmail.com>
First post	2012-09-08 22:43 -0700
Last post	2012-09-09 11:36 +0300
Articles	9 — 4 participants

  Is there a unique method in python to unique a list? Token Type <typetoken@gmail.com> - 2012-09-08 22:43 -0700
    Re: Is there a unique method in python to unique a list? Chris Angelico <rosuav@gmail.com> - 2012-09-09 15:48 +1000
    Re: Is there a unique method in python to unique a list? Chris Angelico <rosuav@gmail.com> - 2012-09-09 16:32 +1000
      Re: Is there a unique method in python to unique a list? Token Type <typetoken@gmail.com> - 2012-09-08 23:44 -0700
        Re: Is there a unique method in python to unique a list? Paul Rubin <no.email@nospam.invalid> - 2012-09-09 01:41 -0700
          Re: Is there a unique method in python to unique a list? Paul Rubin <no.email@nospam.invalid> - 2012-09-09 02:06 -0700
          Re: Is there a unique method in python to unique a list? Token Type <typetoken@gmail.com> - 2012-09-09 06:44 -0700
            Re: Is there a unique method in python to unique a list? Chris Angelico <rosuav@gmail.com> - 2012-09-10 00:13 +1000
    Re: Is there a unique method in python to unique a list? Serhiy Storchaka <storchaka@gmail.com> - 2012-09-09 11:36 +0300

#28754 — Is there a unique method in python to unique a list?

From	Token Type <typetoken@gmail.com>
Date	2012-09-08 22:43 -0700
Subject	Is there a unique method in python to unique a list?
Message-ID	<c44aff41-71ed-4323-9184-9c87bd0e1119@googlegroups.com>

Is there a unique method in python to unique a list? thanks

#28755

From	Chris Angelico <rosuav@gmail.com>
Date	2012-09-09 15:48 +1000
Message-ID	<mailman.402.1347169718.27098.python-list@python.org>
In reply to	#28754

On Sun, Sep 9, 2012 at 3:43 PM, Token Type <typetoken@gmail.com> wrote:
> Is there a unique method in python to unique a list? thanks

I don't believe there's a method for that, but if you don't care about
order, try turning your list into a set and then back into a list.

ChrisA
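
A minimal sketch of that suggestion, using a made-up `words` list. The `dict.fromkeys` variant is not mentioned in the post; it is included only as one way to keep first-seen order, and it relies on dicts preserving insertion order (guaranteed from Python 3.7).

```python
words = ["b", "a", "b", "c", "a"]  # hypothetical sample data

# Round trip through a set: duplicates go away, but order is not preserved.
unique_unordered = list(set(words))

# Keeps the first occurrence of each element, in the original order.
unique_ordered = list(dict.fromkeys(words))

print(sorted(unique_unordered))  # sorted only to get a stable display
print(unique_ordered)
```

Both approaches require the elements to be hashable.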

#28756

From	Chris Angelico <rosuav@gmail.com>
Date	2012-09-09 16:32 +1000
Message-ID	<mailman.405.1347172371.27098.python-list@python.org>
In reply to	#28754

On Sun, Sep 9, 2012 at 4:29 PM, John H. Li <typetoken@gmail.com> wrote:
> However, if I don't put  list(set(lemma_list))  to a variable name, it works
> much faster.

Try backdenting that statement. You're currently doing it at every
iteration of the loop - that's why it's so much slower.

But you'll probably find it better to work with the set directly,
instead of uniquifying a list as a separate operation.

ChrisA
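
The message being replied to is not part of this archive, so the following is only a guess at the pattern being described: a deduplication step indented inside a loop is recomputed on every iteration, and moving it out of the loop (or building a set from the start) does the work once. All names and data here are hypothetical.

```python
groups = [["a", "b"], ["b", "c"], ["c", "a"]]  # hypothetical sample data

# Slow pattern: the dedup sits inside the loop, so it reruns every iteration.
collected = []
for group in groups:
    collected.extend(group)
    unique_each_time = list(set(collected))  # rebuilt len(groups) times

# Backdented: dedup once, after the loop has finished.
collected = []
for group in groups:
    collected.extend(group)
unique_once = list(set(collected))

# Working with the set directly: no intermediate list holding duplicates.
unique_set = set()
for group in groups:
    unique_set.update(group)
```

All three variants end with the same distinct elements; they differ only in how much work is repeated.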

#28757

From	Token Type <typetoken@gmail.com>
Date	2012-09-08 23:44 -0700
Message-ID	<mailman.407.1347173102.27098.python-list@python.org>
In reply to	#28756

 
> Try backdenting that statement. You're currently doing it at every
> iteration of the loop - that's why it's so much slower.

Thanks. It works now.

>>> def average_polysemy(pos):
	synset_list = list(wn.all_synsets(pos))
	sense_number = 0
	lemma_list = []
	for synset in synset_list:
		lemma_list.extend(synset.lemma_names)		
	for lemma in list(set(lemma_list)):
		sense_number_new = len(wn.synsets(lemma, pos))
		sense_number = sense_number + sense_number_new
	return sense_number/len(set(lemma_list))

>>> average_polysemy('n')
1

 
> But you'll probably find it better to work with the set directly,
> instead of uniquifying a list as a separate operation.

Yes, the second method below still runs faster if I don't assign list(set(lemma_list)) to a separate variable name. Why does this happen?

>>> def average_polysemy(pos):
	synset_list = list(wn.all_synsets(pos))
	sense_number = 0
	lemma_list = []
	for synset in synset_list:
		lemma_list.extend(synset.lemma_names)		
	for lemma in list(set(lemma_list)):
		sense_number_new = len(wn.synsets(lemma, pos))
		sense_number = sense_number + sense_number_new
	return sense_number/len(set(lemma_list))

>>> average_polysemy('n')
1

#28759

From	Paul Rubin <no.email@nospam.invalid>
Date	2012-09-09 01:41 -0700
Message-ID	<7xwr03oafu.fsf@ruckus.brouhaha.com>
In reply to	#28757

Token Type <typetoken@gmail.com> writes:
>>>> def average_polysemy(pos):
> 	synset_list = list(wn.all_synsets(pos))
> 	sense_number = 0
> 	lemma_list = []
> 	for synset in synset_list:
> 		lemma_list.extend(synset.lemma_names)		
> 	for lemma in list(set(lemma_list)):
> 		sense_number_new = len(wn.synsets(lemma, pos))
> 		sense_number = sense_number + sense_number_new
> 	return sense_number/len(set(lemma_list))

I think you mean (untested):

     synsets = wn.all_synsets(pos)
     sense_number = 0
     lemma_set = set()
     for synset in synsets:
         lemma_set.add(synset.lemma_names)
     for lemma in lemma_set:
         sense_number += len(wn.synsets(lemma,pos))
     return sense_number / len(lemma_set)

#28760

From	Paul Rubin <no.email@nospam.invalid>
Date	2012-09-09 02:06 -0700
Message-ID	<7x7gs34lau.fsf@ruckus.brouhaha.com>
In reply to	#28759

Paul Rubin <no.email@nospam.invalid> writes:
> I think you mean (untested):
>
>      synsets = wn.all_synsets(pos)
>      sense_number = 0
>      lemma_set = set()
>      for synset in synsets:
>          lemma_set.add(synset.lemma_names)
>      for lemma in lemma_set:
>          sense_number += len(wn.synsets(lemma,pos))
>      return sense_number / len(lemma_set)

Or even:

  lemma_set = set(synset for synset in wn.all_synsets(pos))
  sense_number = sum(len(wn.synsets(lemma, pos)) for lemma in lemma_set)
  return sense_number / len(lemma_set)
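
Note that the one-liner above collects the synsets themselves rather than their lemma names, so it may not compute what was intended. The sketch below flattens the lemma names into one set instead. To stay self-contained it uses a small hypothetical dictionary in place of WordNet; `synset_lemmas` and `senses_of` are invented for illustration and are not part of NLTK. It also uses `float()` for the division, on the assumption that the result of `1` reported earlier in the thread came from integer division under Python 2.

```python
# Hypothetical stand-in for WordNet data; this is not the NLTK API.
synset_lemmas = {
    "dog.n.01": ["dog", "domestic_dog"],
    "frump.n.01": ["frump", "dog"],
    "cat.n.01": ["cat", "true_cat"],
}

def senses_of(lemma):
    """Return the toy synsets whose lemma list contains `lemma`."""
    return [name for name, lemmas in synset_lemmas.items() if lemma in lemmas]

# Flatten the per-synset lemma lists into one set of distinct lemmas.
lemma_set = {lemma for lemmas in synset_lemmas.values() for lemma in lemmas}

sense_number = sum(len(senses_of(lemma)) for lemma in lemma_set)

# float() forces true division on Python 2 as well as Python 3.
average = float(sense_number) / len(lemma_set)
print(average)
```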

#28773

From	Token Type <typetoken@gmail.com>
Date	2012-09-09 06:44 -0700
Message-ID	<d3107c91-2644-41d1-ba45-1aff0caf59af@googlegroups.com>
In reply to	#28759

Thanks. I tried to use set() as you suggested, but without success. Please see:
>>> synsets = list(wn.all_synsets('n'))
>>> synsets[:5]
[Synset('entity.n.01'), Synset('physical_entity.n.01'), Synset('abstraction.n.06'), Synset('thing.n.12'), Synset('object.n.01')]
>>> lemma_set = set()
>>> for synset in synsets:
		lemma_set.add(synset.lemma_names)
	

Traceback (most recent call last):
  File "<pyshell#43>", line 2, in <module>
    lemma_set.add(synset.lemma_names)
TypeError: unhashable type: 'list'
>>> for synset in synsets:
		lemma_set.add(set(synset.lemma_names))

Traceback (most recent call last):
  File "<pyshell#45>", line 2, in <module>
    lemma_set.add(set(synset.lemma_names))
TypeError: unhashable type: 'set'

#28776

From	Chris Angelico <rosuav@gmail.com>
Date	2012-09-10 00:13 +1000
Message-ID	<mailman.419.1347200000.27098.python-list@python.org>
In reply to	#28773

On Sun, Sep 9, 2012 at 11:44 PM, Token Type <typetoken@gmail.com> wrote:
>               lemma_set.add(synset.lemma_names)

That tries to add the whole list as a single object, which doesn't
work because lists are mutable and therefore unhashable, so they can't
be set elements. There are two solutions, depending on what you want
to do.

1) If you want each addition to remain discrete, make a tuple instead:
lemma_set.add(tuple(synset.lemma_names))

2) If you want to add the elements of that list individually into the
set, use update:
lemma_set.update(synset.lemma_names)

I'm thinking you probably want option 2 here.

ChrisA
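
The two options can be sketched side by side, with a made-up `lemma_names` list standing in for `synset.lemma_names`. The `frozenset` line is an addition not mentioned in the post; it is the hashable counterpart of the `set(...)` attempt that failed earlier in the thread.

```python
lemma_names = ["dog", "domestic_dog"]  # hypothetical sample data

as_groups = set()
try:
    as_groups.add(lemma_names)  # a list is unhashable, so this raises
except TypeError as exc:
    error_message = str(exc)

# Option 1: keep each list as one discrete, hashable element.
as_groups.add(tuple(lemma_names))

# Same idea when the order within each group does not matter.
as_frozen = {frozenset(lemma_names)}

# Option 2: add the elements of the list individually.
as_items = set()
as_items.update(lemma_names)
```

For counting distinct lemmas, option 2 is the one that yields a flat set of names.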

#28758

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2012-09-09 11:36 +0300
Message-ID	<mailman.411.1347179816.27098.python-list@python.org>
In reply to	#28754

On 09.09.12 08:47, Donald Stufft wrote:
> If you don't need to retain order you can just use a set,

Only if elements are hashable.
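
When the elements are not hashable (lists of lists, for example), one fallback is a linear scan that compares with `==`. This is a minimal sketch, not something proposed in the thread; `unique_unhashable` is a hypothetical helper name, and the approach is quadratic in the length of the input.

```python
def unique_unhashable(items):
    """Return items without duplicates, using == rather than hashing."""
    result = []
    for item in items:
        if item not in result:  # linear scan, so O(n**2) overall
            result.append(item)
    return result

rows = [[1, 2], [3], [1, 2], [3], [4]]  # hypothetical sample data
print(unique_unhashable(rows))
```

Unlike the set round trip, this keeps the first-seen order of the elements.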
