| Newsgroups | comp.lang.python |
|---|---|
| Date | 2015-05-29 13:39 -0700 |
| Message-ID | <d7b239fc-c814-4935-8172-5c6a6c782d9c@googlegroups.com> (permalink) |
| Subject | Generating list of unique search sub-phrases |
| From | Nick Mellor <thebalancepro@gmail.com> |
Hi all,
My own solution works, but I'm sure it could be simpler or read better. How would you do it?
Say you've got a list of companies:
Aerosonde Ltd
Amcor
ANCA
Austal Ships
Australia Post
Australian Air Express
Australian Defence Industries
Australian Railroad Group
Australian Submarine Corporation
and you need to extract phrases from each company name that identify that company uniquely, i.e. phrases that do not occur anywhere in any other company's name. The results for the above list of companies should be:
```
Company: 'Aerosonde Ltd'
Aliases: Aerosonde,Ltd,Aerosonde Ltd

Company: 'Amcor'
Aliases: Amcor

Company: 'ANCA'
Aliases: ANCA

Company: 'Austal Ships'
Aliases: Austal,Ships,Austal Ships

Company: 'Australia Post'
Aliases: Post,Australia Post

Company: 'Australian Air Express'
Aliases: Air,Express,Australian Air,Air Express,Australian Air Express

Company: 'Australian Defence Industries'
Aliases: Defence,Industries,Australian Defence,Defence Industries,Australian Defence Industries

Company: 'Australian Railroad Group'
Aliases: Railroad,Group,Australian Railroad,Railroad Group,Australian Railroad Group

Company: 'Australian Submarine Corporation'
Aliases: Submarine,Corporation,Australian Submarine,Submarine Corporation,Australian Submarine Corporation
```
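The uniqueness test in the solution that follows is a plain substring count over all the names joined with spaces, which is why "Australia" does not appear among the aliases of Australia Post: it is also counted inside each "Australian". A quick check of a few phrases (the variable name `text_stream` is only illustrative):

```python
companies = [
    "Aerosonde Ltd", "Amcor", "ANCA", "Austal Ships", "Australia Post",
    "Australian Air Express", "Australian Defence Industries",
    "Australian Railroad Group", "Australian Submarine Corporation",
]

# All names in one string; str.count counts non-overlapping substring matches
text_stream = ' '.join(companies)

for phrase in ["Australia", "Post", "Australian Air", "Austal"]:
    print(repr(phrase), text_stream.count(phrase))
# 'Australia' 5        (Australia Post plus the four "Australian ..." names)
# 'Post' 1
# 'Australian Air' 1
# 'Austal' 1
```

Only phrases with a count of exactly 1 survive as aliases.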
Here's my solution:
```python
from itertools import combinations, chain

companies = [
    "Aerosonde Ltd",
    "Amcor",
    "ANCA",
    "Austal Ships",
    "Australia Post",
    "Australian Air Express",
    "Australian Defence Industries",
    "Australian Railroad Group",
    "Australian Submarine Corporation",
]

def flatten(i):
    # Concatenate an iterable of iterables into one list
    return list(chain.from_iterable(i))

# All names joined into one string, so occurrences can be counted across the whole list
companies_as_text_stream = ' '.join(companies)

for company in companies:
    words = company.split()
    # Word combinations of every length, from single words up to the full name
    word_combinations = [list(combinations(words, r)) for r in range(1, len(words) + 1)]
    phrases = [' '.join(phrase) for phrase in flatten(word_combinations)]
    # Keep only phrases that occur exactly once in the whole list of names
    unique_phrases = [phrase for phrase in phrases if companies_as_text_stream.count(phrase) == 1]
    aliases = ','.join(unique_phrases)
    print("Company: '{0}'\n Aliases: {1}\n".format(company, aliases))
```
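One possible simplification, offered as a sketch rather than a definitive answer: `combinations()` also yields non-adjacent word groups such as "Australian Express", which the loop above only discards because they occur zero times in the text. Generating only runs of consecutive words avoids that wasted work and removes the need for `flatten`. The helper names `contiguous_phrases` and `unique_aliases` are hypothetical, and counting how many names *contain* a phrase (instead of counting occurrences in one joined string) is an assumption; it happens to give the same aliases, in the same order, for this list.

```python
def contiguous_phrases(words):
    """Yield every run of consecutive words, shortest runs first."""
    for length in range(1, len(words) + 1):
        for start in range(len(words) - length + 1):
            yield ' '.join(words[start:start + length])


def unique_aliases(companies):
    """Map each company name to the sub-phrases that no other name contains.

    Containment is a plain, case-sensitive substring test, so 'Australia'
    also counts as occurring in 'Australian Air Express'.
    """
    return {
        company: [
            phrase
            for phrase in contiguous_phrases(company.split())
            # Keep the phrase only if exactly one company name contains it
            if sum(phrase in other for other in companies) == 1
        ]
        for company in companies
    }


companies = [
    "Aerosonde Ltd", "Amcor", "ANCA", "Austal Ships", "Australia Post",
    "Australian Air Express", "Australian Defence Industries",
    "Australian Railroad Group", "Australian Submarine Corporation",
]

# Dicts keep insertion order (Python 3.7+), so output follows the input list
for company, aliases in unique_aliases(companies).items():
    print("Company: '{0}'\n Aliases: {1}\n".format(company, ','.join(aliases)))
```

Counting per company rather than over a joined string also means a phrase can never match across the boundary between two adjacent names.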