From: MRAB
Date: Sat, 28 Sep 2013 18:00:29 +0100
To: python-list@python.org
Subject: Re: replace only full words
Newsgroups: comp.lang.python

On 28/09/2013 17:11, cerr wrote:
> Hi,
>
> I have a list of sentences and a list of words. Every full word that
> appears within sentence shall be extended by <word>, i.e. "I drink in
> the house." would become "I <drink> in the <house>." (and not "I in the .").
> I have attempted it like this:
>
>   for sentence in sentences:
>       for noun in nouns:
>           if " "+noun+" " in sentence or " "+noun+"?" in sentence or " "+noun+"!" in sentence or " "+noun+"." in sentence:
>               sentence = sentence.replace(noun, '<' + noun + '>')
>
>       print(sentence)
>
> but what if the word is at the beginning of a sentence? I also don't
> like the approach using defined word terminations. Also, is there a
> way to make it faster?
>
It sounds like a regex problem to me:

    import re

    nouns = ["drink", "house"]
    pattern = re.compile(r"\b(" + "|".join(nouns) + r")\b")

    for sentence in sentences:
        sentence = pattern.sub(r"<\g<0>>", sentence)
        print(sentence)
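
For what it's worth, here is a slightly more defensive sketch of the same idea. It is not part of the original suggestion: the helper name `mark_whole_words` and the sample sentences are made up for illustration. It assumes the words should be matched literally (hence `re.escape`) and case-sensitively, and it relies on `\b` to handle a word at the very start or end of a sentence without listing terminators.

```python
import re

def mark_whole_words(sentences, words):
    """Return copies of `sentences` with whole-word matches wrapped in <...>.

    Hypothetical helper for illustration; matching is case-sensitive.
    """
    if not words:
        # An empty alternation would match at every word boundary, so bail out.
        return list(sentences)
    # Escape each word so regex metacharacters in it are matched literally.
    # Longer words go first so the alternation prefers the longest candidate.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in ordered)
    # \b is a word boundary: it also matches at the start and end of the string.
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    # \g<0> in the replacement stands for the whole match.
    return [pattern.sub(r"<\g<0>>", s) for s in sentences]

sentences = ["I drink in the house.", "Drink up, housemate!", "house rules"]
nouns = ["drink", "house"]
marked = mark_whole_words(sentences, nouns)
for s in marked:
    print(s)
```

Compiling one pattern for all words means each sentence is scanned once rather than once per noun, which should help with the speed question. If "Drink" at the start of a sentence should match as well, passing `re.IGNORECASE` to `re.compile` is one option.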