Groups > comp.lang.python > #30925 > unrolled thread

wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value?

Started by	Token Type <typetoken@gmail.com>
First post	2012-10-07 09:15 -0700
Last post	2012-10-07 20:37 -0700
Articles	14 — 6 participants

  wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Token Type <typetoken@gmail.com> - 2012-10-07 09:15 -0700
    Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-10-07 18:49 +0100
      Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Token Type <typetoken@gmail.com> - 2012-10-08 20:13 -0700
        Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Ian Kelly <ian.g.kelly@gmail.com> - 2012-10-08 21:24 -0600
        Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? alex23 <wuwei23@gmail.com> - 2012-10-08 20:31 -0700
          Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Token Type <typetoken@gmail.com> - 2012-10-08 21:16 -0700
            Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? alex23 <wuwei23@gmail.com> - 2012-10-08 22:44 -0700
              Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Token Type <typetoken@gmail.com> - 2012-10-09 09:00 -0700
      Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Token Type <typetoken@gmail.com> - 2012-10-08 20:13 -0700
        Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Token Type <typetoken@gmail.com> - 2012-10-08 20:23 -0700
        Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Token Type <typetoken@gmail.com> - 2012-10-08 20:23 -0700
    Re: wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value? Terry Reedy <tjreedy@udel.edu> - 2012-10-07 15:53 -0400
    How to control the internet explorer? yujian <yujian4newsgroup@gmail.com> - 2012-10-08 11:02 +0800
      Re: How to control the internet explorer? alex23 <wuwei23@gmail.com> - 2012-10-07 20:37 -0700

#30925 — wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value?

From	Token Type <typetoken@gmail.com>
Date	2012-10-07 09:15 -0700
Subject	wordnet semantic similarity: how to refer to elements of a pair in a list? can we sort dictionary according to the value?
Message-ID	<2d6d84d4-0f70-4280-96e2-f9fe17d5be8b@googlegroups.com>

In order to solve the following question, http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html:
★ Use one of the predefined similarity measures to score the similarity of each of the following pairs of words. Rank the pairs in order of decreasing similarity. How close is your ranking to the order given here, an order that was established experimentally by (Miller & Charles, 1998): car-automobile, gem-jewel, journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard, midday-noon, furnace-stove, food-fruit, bird-cock, bird-crane, tool-implement, brother-monk, lad-brother, crane-implement, journey-car, monk-oracle, cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland, monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, rooster-voyage, noon-string.

(1) First, I put the word pairs in a list, e.g.
pairs = [(car, automobile), (gem, jewel), (journey, voyage) ]. According to http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html, I need to put them in the following format to calculate the semantic similarity: wn.synset('right_whale.n.01').path_similarity(wn.synset('minke_whale.n.01')).

In this case, I need to use a loop to iterate over each element of the above pairs. How can I refer to each element in the above pairs, i.e. pairs = [(car, automobile), (gem, jewel), (journey, voyage) ]? What's the index for 'car' and for 'automobile'? Thanks for your tips.

(2) Since I couldn't solve the above index issue, I tried using a dictionary as follows:
>>> import nltk
>>> from nltk.corpus import wordnet as wn
>>> pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}
>>> for key in pairs:
	word1 = wn.synset(str(key) + '.n.01')
	word2 = wn.synset(str(pairs[key])+'.n.01')
	similarity = word1.path_similarity(word2)
	print key+'-'+pairs[key],similarity

	
car-automobile 1.0
journey-voyage 0.25
gem-jewel 0.125

Now it seems that I can calculate the semantic similarity for each group in the above dictionary. However, I want to sort by the similarity value before printing the results. Can dictionary elements be sorted according to their values? This is one of the requirements of this exercise. How can we sort each group of words (e.g. car-automobile, journey-voyage, gem-jewel) according to its similarity value?
Thanks for your tips.

#30928

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2012-10-07 18:49 +0100
Message-ID	<mailman.1928.1349632173.27098.python-list@python.org>
In reply to	#30925

On 07/10/2012 17:15, Token Type wrote:
> In order to solve the following question, http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html:
> ★ Use one of the predefined similarity measures to score the similarity of each of the following pairs of words. Rank the pairs in order of decreasing similarity. How close is your ranking to the order given here, an order that was established experimentally by (Miller & Charles, 1998): car-automobile, gem-jewel, journey-voyage, boy-lad, coast-shore, asylum-madhouse, magician-wizard, midday-noon, furnace-stove, food-fruit, bird-cock, bird-crane, tool-implement, brother-monk, lad-brother, crane-implement, journey-car, monk-oracle, cemetery-woodland, food-rooster, coast-hill, forest-graveyard, shore-woodland, monk-slave, coast-forest, lad-wizard, chord-smile, glass-magician, rooster-voyage, noon-string.
>
> (1) First, I put the word pairs in a list eg.
> pairs = [(car, automobile), (gem, jewel), (journey, voyage) ]. According to http://nltk.googlecode.com/svn/trunk/doc/book/ch02.html, I need to put them in the following format so as to calculate teh semantic similarity : wn.synset('right_whale.n.01').path_similarity(wn.synset('minke_whale.n.01')).
>
> In this case, I need to use loop to iterate each element in the above pairs. How can I refer to each element in the above pairs, i.e. pairs = [(car, automobile), (gem, jewel), (journey, voyage) ]. What's the index for 'car' and for 'automobile'? Thanks for your tips.
>
> (2) Since I can't solve the above index issue. I try to use dictionary as follows:
> >>> import nltk
> >>> from nltk.corpus import wordnet as wn
> >>> pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}
> >>> for key in pairs:
> 	word1 = wn.synset(str(key) + '.n.01')
> 	word2 = wn.synset(str(pairs[key])+'.n.01')
> 	similarity = word1.path_similarity(word2)
> 	print key+'-'+pairs[key],similarity
>
> 	
> car-automobile 1.0
> journey-voyage 0.25
> gem-jewel 0.125
>
> Now it seems that I can calculate the semantic similarity for each groups in the above dictionary. However, I want to sort according to the similarity value in the result before print the result out. Can sort dictionary elements according to their values? This is one of the requirement in this exercise. How can we make each group of words (e.g. car-automobile, jounrney-voyage, gem-jewel)
> sorted according to their similarity value?
> Thanks for your tips.
>

In your for loop save the data in a list rather than print it out and 
sort according to this 
http://wiki.python.org/moin/HowTo/Sorting#Operator_Module_Functions
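
A minimal sketch of this suggestion, written in Python 3 syntax (the thread itself uses Python 2). The scores are copied from the interpreter output in the original post rather than recomputed, so the example runs without NLTK; the names `results` and `ranked` are illustrative only.

```python
from operator import itemgetter

# (label, score) tuples; the scores are copied from the output shown
# in the original post, not computed here.
results = [
    ("gem-jewel", 0.125),
    ("car-automobile", 1.0),
    ("journey-voyage", 0.25),
]

# itemgetter(1) picks the score out of each (label, score) tuple,
# and reverse=True gives decreasing similarity.
ranked = sorted(results, key=itemgetter(1), reverse=True)

for label, score in ranked:
    print(label, score)
```

In the real exercise the list would be filled inside the loop, one tuple per word pair, and sorted once the loop has finished.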

-- 
Cheers.

Mark Lawrence.

#30989

From	Token Type <typetoken@gmail.com>
Date	2012-10-08 20:13 -0700
Message-ID	<62e7500b-0d7d-4d5e-8c10-197d1988f364@googlegroups.com>
In reply to	#30928

Yes, thanks for all your tips. I did try sorted with itemgetter. However, the sorted results are the same, as shown below, whether I set reverse=True or reverse=False. Isn't that strange? Thanks.

>>> import nltk
>>> from nltk.corpus import wordnet as wn
>>> pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}
>>> for key in pairs:
        list_simi=[]
        from operator import itemgetter
        word1 = wn.synset(str(key) + '.n.01') 
        word2 = wn.synset(str(pairs[key])+'.n.01') 
        similarity = word1.path_similarity(word2) 
        list_simi.append((key+'-'+pairs[key],similarity))
        sorted(list_simi, key=itemgetter(1), reverse=True)

        
[('car-automobile', 1.0)]
[('journey-voyage', 0.25)]
[('gem-jewel', 0.125)]
>>> for key in pairs:
        list_simi=[]
        from operator import itemgetter
        word1 = wn.synset(str(key) + '.n.01') 
        word2 = wn.synset(str(pairs[key])+'.n.01') 
        similarity = word1.path_similarity(word2) 
        list_simi.append((key+'-'+pairs[key],similarity))
        sorted(list_simi, key=itemgetter(1), reverse=False)

        
[('car-automobile', 1.0)]
[('journey-voyage', 0.25)]
[('gem-jewel', 0.125)]

#30993

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2012-10-08 21:24 -0600
Message-ID	<mailman.1980.1349753118.27098.python-list@python.org>
In reply to	#30989

On Mon, Oct 8, 2012 at 9:13 PM, Token Type <typetoken@gmail.com> wrote:
> yes, thanks all your tips. I did try sorted with itemgetter. However, the sorted results are same as follows whether I set reverse=True or reverse= False. Isn't it strange? Thanks.

First of all, "sorted" does not sort the list in place as you seem to
be expecting.
It returns a new sorted list.  Since your code does not store the
return value of the sorted call anywhere, the sorted list is discarded
and only the original list is kept.  If you want to sort a list in
place, use the list.sort method instead.

Second, you're not sorting the overall list.  On each iteration your
code: 1) assigns a new empty list to list_simi; 2) processes one of
the pairs; 3) adds the pair to the empty list; and 4) sorts the list.
On the next iteration you then start all over again with a new empty
list, and so when you get to the sorting step you're only sorting one
item each time.  You need to accumulate the list instead of wiping it
out on each iteration, and only sort it after the loop has completed.
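
Both points can be sketched as follows, in Python 3 syntax. This is an illustration under assumptions, not the poster's code: the scores are hard-coded from the output quoted in the thread, and `by_score`, `in_place` and `accumulated` are made-up names.

```python
def by_score(item):
    """Sort key: the score in a (label, score) tuple."""
    return item[1]

scores = [("journey-voyage", 0.25), ("car-automobile", 1.0), ("gem-jewel", 0.125)]

# sorted() builds a new list; `scores` itself keeps its original order.
ranked = sorted(scores, key=by_score, reverse=True)

# list.sort() reorders the list in place and returns None.
in_place = list(scores)
returned = in_place.sort(key=by_score, reverse=True)

# Accumulate first, sort once: the list is created before the loop
# and sorted only after the loop has finished.
accumulated = []
for item in scores:
    accumulated.append(item)
accumulated.sort(key=by_score, reverse=True)

print(ranked)
```

All three approaches end with the same ordering; the difference is only whether a new list is returned or the existing one is modified.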

#30994

From	alex23 <wuwei23@gmail.com>
Date	2012-10-08 20:31 -0700
Message-ID	<748ab20a-8e54-4732-b61d-272dd9717bb0@q7g2000pbj.googlegroups.com>
In reply to	#30989

On Oct 9, 1:13 pm, Token Type <typeto...@gmail.com> wrote:
> yes, thanks all your tips. I did try sorted with itemgetter.
> However, the sorted results are same as follows whether I
> set reverse=True or reverse= False. Isn't it strange? Thanks.

That's because you're sorting each entry individually, not the entire
result. For every key-value pair, you create a new empty list, append
one tuple, and then sort it. The consistent order you're seeing is the
outcome of stepping through the dictionary keys.

This is untested, but it should be closer to what you're after, I
think. First it creates `list_simi` as a generator, then it sorts it.

    import nltk
    from nltk.corpus import wordnet as wn
    from operator import itemgetter

    pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}

    def find_similarity(word1, word2):
        as_synset = lambda word: wn.synset( str(word) + '.n.01' )
        return as_synset(word1).path_similarity( as_synset(word2) )

    similarity_value = itemgetter(1)

    list_simi = (
        ('%s-%s' % (word1, word2), find_similarity(word1, word2) )
        for word1, word2 in pairs.iteritems()
    )
    list_simi = sorted(list_simi, key=similarity_value, reverse=True)
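
The same generator-then-sort shape can be tried without NLTK installed. The sketch below is in Python 3 syntax and takes the scoring function as a parameter; `rank_pairs` and `letter_overlap` are hypothetical names, and `letter_overlap` is only a stand-in scorer (shared letters divided by all letters), not a WordNet measure.

```python
from operator import itemgetter

def rank_pairs(pairs, similarity):
    """Return (label, score) tuples sorted by decreasing score.

    `similarity` is any callable that takes two words and returns a number.
    """
    # Generator expression: nothing is scored until sorted() consumes it.
    scored = (
        ("%s-%s" % (word1, word2), similarity(word1, word2))
        for word1, word2 in pairs
    )
    return sorted(scored, key=itemgetter(1), reverse=True)

def letter_overlap(word1, word2):
    """Stand-in scorer: fraction of distinct letters the two words share."""
    a, b = set(word1), set(word2)
    return len(a & b) / len(a | b)

pairs = [("car", "automobile"), ("gem", "jewel"), ("journey", "voyage")]
ranked = rank_pairs(pairs, letter_overlap)

for label, score in ranked:
    print(label, round(score, 3))
```

With NLTK available, a function like `find_similarity` above could be passed in place of `letter_overlap`.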

#30995

From	Token Type <typetoken@gmail.com>
Date	2012-10-08 21:16 -0700
Message-ID	<4d2a3205-ff2a-45f8-9c54-7ed98193aded@googlegroups.com>
In reply to	#30994

Thanks indeed for all your suggestions. When I try my code above, what puzzles me is that as the data in the dictionary grows, some entries go missing from the sorted result. Quite odd. The pairs include {'journey':'voyage'}, but the sorted result has no ('journey-voyage', 0.25), which did appear in my first post, a small-scale experiment. I am quite puzzled...

>>> pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage','boy':'lad','coast':'shore', 'asylum':'madhouse', 'magician':'wizard', 'midday':'noon', 'furnace':'stove', 'food':'fruit', 'bird':'cock', 'bird':'crane', 'tool':'implement', 'brother':'monk', 'lad':'brother', 'crane':'implement', 'journey':'car', 'monk':'oracle', 'cemetery':'woodland', 'food':'rooster', 'coast':'hill', 'forest':'graveyard', 'shore':'woodland', 'monk':'slave', 'coast':'forest','lad':'wizard', 'chord':'smile', 'glass':'magician', 'rooster':'voyage', 'noon':'string'}
>>> list_simi=[]
>>> for key in pairs:
        word1 = wn.synset(str(key) + '.n.01') 
        word2 = wn.synset(str(pairs[key])+'.n.01') 
        similarity = word1.path_similarity(word2) 
        list_simi.append((key+'-'+pairs[key],similarity))

        
>>> from operator import itemgetter

>>> sorted(list_simi, key=itemgetter(1), reverse=True)
[('midday-noon', 1.0), ('car-automobile', 1.0), ('tool-implement', 0.5), ('boy-lad', 0.3333333333333333), ('lad-wizard', 0.2), ('monk-slave', 0.2), ('shore-woodland', 0.2), ('magician-wizard', 0.16666666666666666), ('brother-monk', 0.125), ('asylum-madhouse', 0.125), ('gem-jewel', 0.125), ('cemetery-woodland', 0.1111111111111111), ('bird-crane', 0.1111111111111111), ('glass-magician', 0.1111111111111111), ('crane-implement', 0.1), ('chord-smile', 0.09090909090909091), ('coast-forest', 0.09090909090909091), ('furnace-stove', 0.07692307692307693), ('forest-graveyard', 0.07142857142857142), ('food-rooster', 0.0625), ('noon-string', 0.058823529411764705), ('journey-car', 0.05), ('rooster-voyage', 0.041666666666666664)]

#30996

From	alex23 <wuwei23@gmail.com>
Date	2012-10-08 22:44 -0700
Message-ID	<ebbf2ca7-0933-4d25-9dc7-74d5fb7f9b67@wz4g2000pbc.googlegroups.com>
In reply to	#30995

On Oct 9, 2:16 pm, Token Type <typeto...@gmail.com> wrote:
> When I try my above codes, what puzzles me is that when
> the data in the dictionary increase, some data become
> missing in the sorted result. Quite odd. In the pairs,
> we have {'journey':'voyage'} but in the sorted result no (
> 'journey-voyage',0.25)
>
> >>> pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage','boy':'lad','coast':'shore', 'asylum':'madhouse', 'magician':'wizard', 'midday':'noon', 'furnace':'stove', 'food':'fruit', 'bird':'cock', 'bird':'crane', 'tool':'implement', 'brother':'monk', 'lad':'brother', 'crane':'implement', 'journey':'car', 'monk':'oracle', 'cemetery':'woodland', 'food':'rooster', 'coast':'hill', 'forest':'graveyard', 'shore':'woodland', 'monk':'slave', 'coast':'forest','lad':'wizard', 'chord':'smile', 'glass':'magician', 'rooster':'voyage', 'noon':'string'}

Keys are unique in dictionaries. You have two uses of 'journey'; the
second will overwrite the first.
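
A small sketch of the effect, in Python 3 syntax, using four of the pairs from the exercise. Building the dictionary from a list of tuples makes the loss visible; the variable names are illustrative.

```python
from collections import Counter

pairs = [
    ("journey", "voyage"),
    ("bird", "cock"),
    ("bird", "crane"),
    ("journey", "car"),
]

# A later entry with the same key replaces the earlier one.
as_dict = dict(pairs)

print(len(pairs), len(as_dict))
print(as_dict["journey"])

# Count how often each first word occurs to find the keys that collide.
first_words = Counter(word1 for word1, _ in pairs)
repeated = sorted(word for word, count in first_words.items() if count > 1)
print(repeated)
```

The same check applied to the full list of 30 pairs would show every first word that appears more than once, not only 'journey'.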

Do you _need_ these items to be a dictionary? Are you doing any look
up? If not, just make it a list of tuples:

   pairs = [ ('car', 'automobile'), ('gem', 'jewel') ...]

Then make your main loop:

   for word1, word2 in pairs:

If you do need a dictionary for other reasons, you might want to try a
dictionary of lists:

    pairs = {
        'car': ['automobile', 'vehicle'],
        'gem': ['jewel'],
    }

    for word1, synonyms in pairs.items():
        for word2 in synonyms:
            ...
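
One way to build such a dictionary of lists from existing pairs is sketched below in Python 3 syntax, assuming `collections.defaultdict` is acceptable; `pair_list`, `grouped` and `flattened` are illustrative names.

```python
from collections import defaultdict

pair_list = [
    ("journey", "voyage"),
    ("bird", "cock"),
    ("bird", "crane"),
    ("journey", "car"),
]

# Each first word maps to a list, so repeated first words keep all partners.
grouped = defaultdict(list)
for word1, word2 in pair_list:
    grouped[word1].append(word2)

# Iterating a dict directly yields only its keys, so .items() is used
# to get each key together with its list.
flattened = [
    (word1, word2)
    for word1, partners in grouped.items()
    for word2 in partners
]

print(dict(grouped))
```

Nothing is lost this way: flattening the grouped dictionary gives back every original pair.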

#31044

From	Token Type <typetoken@gmail.com>
Date	2012-10-09 09:00 -0700
Message-ID	<3723738e-62d5-420d-a42c-4f5ee7034a17@googlegroups.com>
In reply to	#30996

Thanks indeed for your tips. Now I understand the difference between tuples and dictionaries better.

#30991

From	Token Type <typetoken@gmail.com>
Date	2012-10-08 20:23 -0700
Message-ID	<de950410-af23-440c-b805-09a92a50df1a@googlegroups.com>
In reply to	#30990

Dear all, the problem has been solved as follows. Thanks anyway:
>>> import nltk
>>> from nltk.corpus import wordnet as wn
>>> pairs = {'car':'automobile', 'gem':'jewel', 'journey':'voyage'}
>>> list_simi=[]
>>> for key in pairs:
        word1 = wn.synset(str(key) + '.n.01') 
        word2 = wn.synset(str(pairs[key])+'.n.01') 
        similarity = word1.path_similarity(word2) 
        list_simi.append((key+'-'+pairs[key],similarity))

        
>>> from operator import itemgetter
>>> sorted(list_simi, key=itemgetter(1), reverse=False)
[('gem-jewel', 0.125), ('journey-voyage', 0.25), ('car-automobile', 1.0)]
>>> sorted(list_simi, key=itemgetter(1), reverse=True)
[('car-automobile', 1.0), ('journey-voyage', 0.25), ('gem-jewel', 0.125)]
>>> sorted(list_simi, key=itemgetter(1))
[('gem-jewel', 0.125), ('journey-voyage', 0.25), ('car-automobile', 1.0)]

#30932

From	Terry Reedy <tjreedy@udel.edu>
Date	2012-10-07 15:53 -0400
Message-ID	<mailman.1932.1349639642.27098.python-list@python.org>
In reply to	#30925

On 10/7/2012 12:15 PM, Token Type wrote:

> In this case, I need to use loop to iterate each element in the above
> pairs. How can I refer to each element in the above pairs, i.e. pairs
> = [(car, automobile), (gem, jewel), (journey, voyage) ]. What's the
> index for 'car' and for 'automobile'? Thanks for your tips.

 >>> pairs = [('car', 'automobile'), ('gem', 'jewel')]
 >>> pairs[0][0]
'car'
 >>> pairs[1][1]
'jewel'
 >>> for a,b in pairs: a,b

('car', 'automobile')
('gem', 'jewel')
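
The same indexing and unpacking, written as a script in Python 3 syntax, with one extra step that builds the synset names the original post needs. Appending '.n.01' follows the poster's own code and is an assumption about which sense is wanted; no NLTK call is made here.

```python
pairs = [("car", "automobile"), ("gem", "jewel"), ("journey", "voyage")]

# The first index selects the tuple, the second selects the word inside it.
first_pair = pairs[0]
first_word = pairs[0][0]
second_word = pairs[0][1]

# Tuple unpacking in the loop header names both words directly,
# so no indices are needed inside the loop.
synset_names = [
    (word1 + ".n.01", word2 + ".n.01")
    for word1, word2 in pairs
]

print(first_word, second_word)
print(synset_names[0])
```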

-- 
Terry Jan Reedy

#30948 — How to control the internet explorer?

From	yujian <yujian4newsgroup@gmail.com>
Date	2012-10-08 11:02 +0800
Subject	How to control the internet explorer?
Message-ID	<mailman.1944.1349665387.27098.python-list@python.org>
In reply to	#30925

I want to save all the URLs in current opened windows,  and then close 
all the windows.

#30949 — Re: How to control the internet explorer?

From	alex23 <wuwei23@gmail.com>
Date	2012-10-07 20:37 -0700
Subject	Re: How to control the internet explorer?
Message-ID	<303d6d50-0672-4e76-8c3b-081ca433a8e3@p5g2000pbs.googlegroups.com>
In reply to	#30948

On Oct 8, 1:03 pm, yujian <yujian4newsgr...@gmail.com> wrote:
> I want to save all the URLs in current opened windows,  and then close
> all the windows.

Try mechanize or Selenium.
