

comp.lang.python, thread #99790 (unrolled)

Exclude text within quotation marks and words beginning with a capital letter

Started by: Kevin Glover <kevingloveruk@gmail.com>
First post: 2015-12-01 03:17 -0800
Last post: 2015-12-04 10:38 -0700
Articles: 3 (3 participants)



Contents

  Exclude text within quotation marks and words beginning with a capital letter Kevin Glover <kevingloveruk@gmail.com> - 2015-12-01 03:17 -0800
    Re: Exclude text within quotation marks and words beginning with a capital letter Peter Otten <__peter__@web.de> - 2015-12-01 13:43 +0100
    Re: Exclude text within quotation marks and words beginning with a capital letter Jason Friedman <jsf80238@gmail.com> - 2015-12-04 10:38 -0700

#99790 — Exclude text within quotation marks and words beginning with a capital letter

From: Kevin Glover <kevingloveruk@gmail.com>
Date: 2015-12-01 03:17 -0800
Subject: Exclude text within quotation marks and words beginning with a capital letter
Message-ID: <a1da3bd6-5643-46b4-bcc5-69659367d491@googlegroups.com>

I am working on a program written in Python 2.7 so that it is compatible with the POS tagger I import from Pattern. The tagger identifies all the nouns in a text. I need to exclude from the tagger any text within quotation marks, and also any word that begins with an uppercase letter (including words at the beginning of sentences).

Any advice on how to code this would be gratefully received. Thanks.

Kevin
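If you would rather filter the text before it reaches the tagger, here is a minimal sketch that runs under both Python 2.7 and Python 3. It assumes double quotes (straight or curly) that are balanced and never nested, and it treats whitespace-separated chunks as words. QUOTED and prefilter are hypothetical names, not part of Pattern. One caveat: removing words before tagging also removes context the tagger relies on, so the surviving words may be tagged differently than they would be in the full sentence.

```python
import re

# Hypothetical helper: matches a straight "..." span or a curly-quoted span
# (U+201C ... U+201D). Assumes quotes are balanced and not nested.
QUOTED = re.compile(u'"[^"]*"|\u201c[^\u201d]*\u201d')


def prefilter(text):
    """Remove quoted spans, then drop words that start with an uppercase letter."""
    text = QUOTED.sub(u' ', text)  # replace each quoted span with a space
    # Sentence-initial words are capitalized too, so they are dropped as well.
    kept = [w for w in text.split() if not w[:1].isupper()]
    return u' '.join(kept)


print(prefilter(u'Did you say "Hello world"? The cat sat.'))
# you say ? cat sat.
```

Punctuation attached to a word (such as "sat.") is left in place, so splitting it off is still up to the tagger's own tokenizer.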



#99797

From: Peter Otten <__peter__@web.de>
Date: 2015-12-01 13:43 +0100
Message-ID: <mailman.72.1448973862.14615.python-list@python.org>
In reply to: #99790

Kevin Glover wrote:

> I am working on a program that is written in Python 2.7 to be compatible
> with the POS tagger that I import from Pattern. The tagger identifies all
> the nouns in a text. I need to exclude from the tagger any text that is
> within quotation marks, and also any word that begins with an upper case
> letter (including words at the beginning of sentences).
> 
> Any advice on coding that would be gratefully received. Thanks.

How about tagging the whole text and removing the unwanted tokens afterwards?

>>> def skip_quoted(pairs):
...     # Flip the "inside quotes" flag on each '"' token and yield
...     # only the (word, tag) pairs that lie outside quoted spans.
...     quoted = False
...     for a, b in pairs:
...         if a == '"':
...             quoted = not quoted
...         elif not quoted:
...             yield a, b
...
>>> from pattern.en import tag
>>> [p for p in skip_quoted(tag('Did you say "Hello world"?')) if not p[0][0].isupper()]
[(u'you', u'PRP'), (u'say', u'VB'), (u'?', u'.')]
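The same filter can be written as a self-contained sketch that needs no Pattern install and also keeps only nouns, which is what the original question was after. It runs under Python 2.7 and Python 3. It assumes the tagger uses Penn Treebank noun tags (NN, NNS, NNP, NNPS) and emits quotation marks as a plain double-quote token, as the Treebank-style pair of two backticks and two apostrophes, or as curly quotes. QUOTE_TOKENS and lowercase_nouns are hypothetical names, and the tagged sentence is hand-written for illustration, not real tagger output.

```python
# Quote tokens a tagger might emit (assumption: your tokenizer uses one of
# these): a plain double quote, the Penn Treebank `` and '' pair, or curly quotes.
QUOTE_TOKENS = {'"', '``', "''", u'\u201c', u'\u201d'}


def skip_quoted(pairs):
    """Yield the (word, tag) pairs that fall outside quoted spans."""
    quoted = False
    for word, tag in pairs:
        if word in QUOTE_TOKENS:
            quoted = not quoted  # an opening or closing quote flips the state
        elif not quoted:
            yield word, tag


def lowercase_nouns(pairs):
    """Hypothetical helper: unquoted nouns that do not start with an uppercase letter."""
    return [(w, t) for w, t in skip_quoted(pairs)
            if t.startswith('NN') and not w[:1].isupper()]


# Hand-written (word, tag) pairs standing in for tagger output.
tagged = [('The', 'DT'), ('cat', 'NN'), ('said', 'VBD'), ('"', '"'),
          ('dogs', 'NNS'), ('bark', 'VBP'), ('"', '"'), ('to', 'TO'),
          ('Paris', 'NNP'), ('and', 'CC'), ('tourists', 'NNS'), ('.', '.')]
print(lowercase_nouns(tagged))
# [('cat', 'NN'), ('tourists', 'NNS')]
```

Filtering after tagging, as above, lets the tagger see every word in context, so the tags of the words you keep are not affected by what you later throw away.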



#100009

From: Jason Friedman <jsf80238@gmail.com>
Date: 2015-12-04 10:38 -0700
Message-ID: <mailman.202.1449250726.14615.python-list@python.org>
In reply to: #99790

>
> I am working on a program that is written in Python 2.7 to be compatible
> with the POS tagger that I import from Pattern. The tagger identifies all
> the nouns in a text. I need to exclude from the tagger any text that is
> within quotation marks, and also any word that begins with an upper case
> letter (including words at the beginning of sentences).
>
> Any advice on coding that would be gratefully received. Thanks.
>

Perhaps overkill, but I wanted to make sure you knew about the Natural
Language Toolkit (NLTK): http://www.nltk.org/.
