| From | Jussi Piitulainen <jpiitula@ling.helsinki.fi> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: replace only full words |
| Date | 2013-09-28 20:37 +0300 |
| Organization | University of Helsinki |
| Message-ID | <qotpprsoniq.fsf@ruuvi.it.helsinki.fi> |
| References | <bd024ecf-2428-4d6a-bc0c-163112b31842@googlegroups.com> <mailman.423.1380387623.18130.python-list@python.org> |
MRAB writes:
> On 28/09/2013 17:11, cerr wrote:
> > Hi,
> >
> > I have a list of sentences and a list of words. Every full word
> > that appears within a sentence shall be extended by <WORD>, i.e. "I
> > drink in the house." would become "I <drink> in the <house>." (and
> > not "I <d<rink> in the <house>."). I have attempted it like this:
>
> > for sentence in sentences:
> >     for noun in nouns:
> >         if " "+noun+" " in sentence or " "+noun+"?" in sentence or " "+noun+"!" in sentence or " "+noun+"." in sentence:
> >             sentence = sentence.replace(noun, '<' + noun + '>')
> >
> >     print(sentence)
> >
> > But what if the word is at the beginning of a sentence? I also
> > don't like the approach of using predefined word terminations. Also,
> > is there a way to make it faster?
> >
> It sounds like a regex problem to me:
>
> import re
>
> nouns = ["drink", "house"]
>
> pattern = re.compile(r"\b(" + "|".join(nouns) + r")\b")
>
> for sentence in sentences:
>     sentence = pattern.sub(r"<\g<0>>", sentence)
>     print(sentence)
Maybe tokenize by a regex and then join the replacements of all
tokens:

    import re

    def substitute(token):
        if isfullword(token.lower()):
            return '<{}>'.format(token)
        else:
            return token

    def tokenize(sentence):
        return re.split(r'(\W)', sentence)

    sentence = 'This is, like, a test.'
    tokens = map(substitute, tokenize(sentence))
    sentence = ''.join(tokens)

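The snippet above leaves isfullword undefined, so it will not run as
posted. A minimal runnable sketch follows, assuming isfullword is a
simple membership test against a set of words; the NOUNS set and the
mark helper are names made up for this illustration and are not part
of the original post.

```python
import re

# Hypothetical word list; the post does not say where the words come from.
NOUNS = {'drink', 'house'}

def isfullword(token):
    # Assumed helper: True when the token is one of the words to mark.
    return token in NOUNS

def substitute(token):
    # Compare case-insensitively, but keep the original spelling in the output.
    if isfullword(token.lower()):
        return '<{}>'.format(token)
    return token

def tokenize(sentence):
    # The capturing group makes re.split keep the separators as tokens,
    # so joining the tokens reproduces the sentence exactly.
    return re.split(r'(\W)', sentence)

def mark(sentence):
    return ''.join(map(substitute, tokenize(sentence)))

print(mark('I drink in the house.'))
print(mark('Drink! The drinks are in the household.'))
```

Because each token is compared as a whole, "drinks" and "household"
are left alone, and a word at the start of a sentence needs no
special handling.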
For better results, both tokenization and substitution need to depend
on context. Doing some of that should be an interesting exercise.
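
One possible first step toward context-dependent tokenization, offered
only as a sketch: treat internal apostrophes and hyphens as part of a
word, so that "house-boat" is not split into "house" and "boat". The
TOKEN pattern, the NOUNS set and the mark helper are assumptions made
for this example.

```python
import re

# Hypothetical word list, as in the earlier examples in this thread.
NOUNS = {'drink', 'house'}

# A word may contain internal apostrophes or hyphens; every other
# character is emitted as a one-character token of its own.
TOKEN = re.compile(r"\w+(?:['-]\w+)*|\W")

def tokenize(sentence):
    # The group is non-capturing, so findall returns the whole matches.
    return TOKEN.findall(sentence)

def mark(sentence):
    return ''.join('<{}>'.format(t) if t.lower() in NOUNS else t
                   for t in tokenize(sentence))

print(mark("The house-boat isn't a house."))
```

With the plain \W split, "house-boat" would come out as
"<house>-boat"; here the compound stays a single token and is left
unmarked. Whether that is the desired behaviour depends on the
application.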