| Newsgroups | comp.lang.python |
|---|---|
| Date | 2015-06-17 15:55 -0700 |
| References | <d7b239fc-c814-4935-8172-5c6a6c782d9c@googlegroups.com> |
| Message-ID | <b1b6e40f-85ca-4e0b-8da3-4ef954ebddaa@googlegroups.com> |
| Subject | Re: Generating list of unique search sub-phrases |
| From | Nick Mellor <thebalancepro@gmail.com> |
On Saturday, 30 May 2015 06:39:44 UTC+10, Nick Mellor wrote:
> Hi all,
>
> My own solution works but I'm sure it could be simpler or read better. How would you do it?
>
> Say you've got a list of companies:
>
> Aerosonde Ltd
> Amcor
> ANCA
> Austal Ships
> Australia Post
> Australian Air Express
> Australian Defence Industries
> Australian Railroad Group
> Australian Submarine Corporation
>
> and you need to extract phrases from the company names that uniquely identify that company. The results for the above list of companies should be:
>
> Company: 'Aerosonde Ltd'
> Aliases: Aerosonde,Ltd,Aerosonde Ltd
>
> Company: 'Amcor'
> Aliases: Amcor
>
> Company: 'ANCA'
> Aliases: ANCA
>
> Company: 'Austal Ships'
> Aliases: Austal,Ships,Austal Ships
>
> Company: 'Australia Post'
> Aliases: Post,Australia Post
>
> Company: 'Australian Air Express'
> Aliases: Air,Express,Australian Air,Air Express,Australian Air Express
>
> Company: 'Australian Defence Industries'
> Aliases: Defence,Industries,Australian Defence,Defence Industries,Australian Defence Industries
>
> Company: 'Australian Railroad Group'
> Aliases: Railroad,Group,Australian Railroad,Railroad Group,Australian Railroad Group
>
> Company: 'Australian Submarine Corporation'
> Aliases: Submarine,Corporation,Australian Submarine,Submarine Corporation,Australian Submarine Corporation
>
> Here's my solution:
>
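> from itertools import combinations, chain
>
> companies = [
>     "Aerosonde Ltd",
>     "Amcor",
>     "ANCA",
>     "Austal Ships",
>     "Australia Post",
>     "Australian Air Express",
>     "Australian Defence Industries",
>     "Australian Railroad Group",
>     "Australian Submarine Corporation",
> ]
>
> def flatten(i):
>     # Turn a list of lists into one flat list.
>     return list(chain.from_iterable(i))
>
> # All names in one string, so each phrase can be counted across the whole list.
> companies_as_text_stream = ' '.join(companies)
> for company in companies:
>     words = company.split()
>     # Every group of 1..n words, where n is the number of words (not characters).
>     # combinations() also yields non-adjacent groups such as 'Australian Express';
>     # these drop out below because they never occur in the joined text.
>     word_combinations = [list(combinations(words, r)) for r in range(1, len(words) + 1)]
>     phrases = [' '.join(phrase) for phrase in flatten(word_combinations)]
>     # Keep only phrases that occur exactly once (case-sensitive substring count).
>     unique_phrases = [phrase for phrase in phrases if companies_as_text_stream.count(phrase) == 1]
>     aliases = ','.join(unique_phrases)
>     print("Company: '{0}'\nAliases: {1}\n".format(company, aliases))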
Great reply, Peter, thank you. Lots to think about.
Cheers,
Nick
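A variant of the quoted approach, offered as a sketch and not as Peter's reply or the code from the original post. Instead of `itertools.combinations`, it generates only runs of consecutive words, which are the only phrases that appear in the expected output. Instead of counting occurrences in one joined string, it counts how many company names contain each phrase, so a phrase can never be matched across the boundary between two neighbouring names. It keeps the original's case-sensitive substring test, so 'Australia' is still rejected because it occurs inside 'Australian'. The helper names `contiguous_phrases` and `unique_aliases` are hypothetical.

```python
def contiguous_phrases(words):
    """Yield every run of consecutive words, shortest runs first."""
    for length in range(1, len(words) + 1):
        for start in range(len(words) - length + 1):
            yield " ".join(words[start:start + length])


def unique_aliases(companies):
    """Map each company to the sub-phrases found in no other company name.

    Matching is a case-sensitive substring test, as in the original post,
    so 'Australia' is not unique because it occurs inside 'Australian'.
    """
    result = {}
    for company in companies:
        aliases = []
        for phrase in contiguous_phrases(company.split()):
            # Count how many company names (including this one) contain the phrase.
            hits = sum(1 for other in companies if phrase in other)
            if hits == 1 and phrase not in aliases:
                aliases.append(phrase)
        result[company] = aliases
    return result


companies = [
    "Aerosonde Ltd",
    "Amcor",
    "ANCA",
    "Austal Ships",
    "Australia Post",
    "Australian Air Express",
    "Australian Defence Industries",
    "Australian Railroad Group",
    "Australian Submarine Corporation",
]

for company, aliases in unique_aliases(companies).items():
    print("Company: '{0}'\nAliases: {1}\n".format(company, ",".join(aliases)))
```

For the nine names in the post, this produces the same aliases, in the same order, as the expected output listed there. One difference in behaviour: the joined-string version rejects a phrase that occurs twice inside a single name, while this version counts each name at most once.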
Thread:
- Generating list of unique search sub-phrases, Nick Mellor <thebalancepro@gmail.com>, 2015-05-29 13:39 -0700
- Re: Generating list of unique search sub-phrases, Peter Otten <__peter__@web.de>, 2015-05-30 01:07 +0200
- Re: Generating list of unique search sub-phrases, Nick Mellor <thebalancepro@gmail.com>, 2015-06-17 15:55 -0700