| Started by | Kevin Glover <kevingloveruk@gmail.com> |
|---|---|
| First post | 2015-12-01 03:17 -0800 |
| Last post | 2015-12-04 10:38 -0700 |
| Articles | 3 — 3 participants |
Exclude text within quotation marks and words beginning with a capital letter Kevin Glover <kevingloveruk@gmail.com> - 2015-12-01 03:17 -0800
Re: Exclude text within quotation marks and words beginning with a capital letter Peter Otten <__peter__@web.de> - 2015-12-01 13:43 +0100
Re: Exclude text within quotation marks and words beginning with a capital letter Jason Friedman <jsf80238@gmail.com> - 2015-12-04 10:38 -0700
| From | Kevin Glover <kevingloveruk@gmail.com> |
|---|---|
| Date | 2015-12-01 03:17 -0800 |
| Subject | Exclude text within quotation marks and words beginning with a capital letter |
| Message-ID | <a1da3bd6-5643-46b4-bcc5-69659367d491@googlegroups.com> |
I am working on a program that is written in Python 2.7 to be compatible with the POS tagger that I import from Pattern. The tagger identifies all the nouns in a text. I need to exclude from the tagger any text that is within quotation marks, and also any word that begins with an upper case letter (including words at the beginning of sentences).

Any advice on coding that would be gratefully received. Thanks.

Kevin
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2015-12-01 13:43 +0100 |
| Message-ID | <mailman.72.1448973862.14615.python-list@python.org> |
| In reply to | #99790 |
Kevin Glover wrote:
> I am working on a program that is written in Python 2.7 to be compatible
> with the POS tagger that I import from Pattern. The tagger identifies all
> the nouns in a text. I need to exclude from the tagger any text that is
> within quotation marks, and also any word that begins with an upper case
> letter (including words at the beginning of sentences).
>
> Any advice on coding that would be gratefully received. Thanks.
How about removing them afterwards?
>>> def skip_quoted(pairs):
...     quoted = False
...     for a, b in pairs:
...         if a == '"':
...             quoted = not quoted
...         elif not quoted:
...             yield a, b
...
>>> from pattern.en import tag
>>> [p for p in skip_quoted(tag('Did you say "Hello world"?')) if not p[0][0].isupper()]
[(u'you', u'PRP'), (u'say', u'VB'), (u'?', u'.')]
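For anyone who wants to try the filter above without Pattern installed, here is a minimal sketch that folds both conditions into one generator and runs on hand-written pairs. The name `skip_quoted_and_capitalized` and the sample data (including the tags given to each token) are made up for illustration. It assumes the tagger emits each straight double quote as its own token; curly or single quotes would need extra handling.

```python
def skip_quoted_and_capitalized(pairs):
    """Yield (token, tag) pairs that are outside double quotes
    and whose token does not start with an uppercase letter."""
    quoted = False
    for token, tag in pairs:
        if token == '"':
            # A double-quote token toggles the "inside a quotation" state
            # and is itself dropped.
            quoted = not quoted
        elif not quoted and not token[:1].isupper():
            yield token, tag


# Hypothetical sample standing in for tagger output; the tags are
# illustrative and not taken from a real Pattern run.
sample = [
    ("Did", "VBD"), ("you", "PRP"), ("say", "VB"),
    ('"', '"'), ("Hello", "UH"), ("world", "NN"), ('"', '"'),
    ("?", "."),
]

filtered = list(skip_quoted_and_capitalized(sample))
```

Slicing with `token[:1]` rather than indexing with `token[0]` keeps the check from raising an error if an empty token ever turns up.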
| From | Jason Friedman <jsf80238@gmail.com> |
|---|---|
| Date | 2015-12-04 10:38 -0700 |
| Message-ID | <mailman.202.1449250726.14615.python-list@python.org> |
| In reply to | #99790 |
> I am working on a program that is written in Python 2.7 to be compatible
> with the POS tagger that I import from Pattern. The tagger identifies all
> the nouns in a text. I need to exclude from the tagger any text that is
> within quotation marks, and also any word that begins with an upper case
> letter (including words at the beginning of sentences).
>
> Any advice on coding that would be gratefully received. Thanks.

Perhaps overkill, but wanted to make sure you knew about the Natural Language Toolkit: http://www.nltk.org/.
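Whichever tagger ends up being used, the request as literally stated (keeping the text away from the tagger in the first place) could be sketched with the standard `re` module. This is only a rough sketch: `strip_excluded` and the two pattern names are hypothetical, it handles only straight double quotes and ASCII capitals, and removing words before tagging changes the context the tagger sees, so the remaining tags may differ from those of the full sentence.

```python
import re

# Text between straight double quotes, including the quotes themselves.
QUOTED = re.compile(r'"[^"]*"')
# Whole words whose first letter is an uppercase ASCII letter.
CAPITALIZED = re.compile(r"\b[A-Z]\w*")


def strip_excluded(text):
    """Remove quoted passages and capitalized words, then tidy whitespace."""
    text = QUOTED.sub(" ", text)
    text = CAPITALIZED.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return " ".join(text.split())


cleaned = strip_excluded('Did you say "Hello world" to the dog?')
```

The cleaned string can then be handed to the tagger in place of the original text.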