From: Jason Friedman
Newsgroups: comp.lang.python
Subject: Re: Exclude text within quotation marks and words beginning with a capital letter
Date: Fri, 4 Dec 2015 10:38:43 -0700

> I am working on a program that is written in Python 2.7 to be compatible
> with the POS tagger that I import from Pattern. The tagger identifies all
> the nouns in a text. I need to exclude from the tagger any text that is
> within quotation marks, and also any word that begins with an upper case
> letter (including words at the beginning of sentences).
>
> Any advice on coding that would be gratefully received. Thanks.

Perhaps overkill, but wanted to make sure you knew about the Natural
Language Toolkit: http://www.nltk.org/.
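
If NLTK turns out to be more than the task needs, a rough sketch using only the standard re module might pre-filter the text before it reaches the tagger. This is an assumption about the approach, not something taken from Pattern or NLTK: the names QUOTED, CAPITALIZED and strip_excluded are made up for illustration, and the patterns assume straight double quotes, no nested quotes, and ASCII capital letters.

```python
import re

# Anything between a pair of straight double quotes (no nesting assumed).
QUOTED = re.compile(r'"[^"]*"')

# Any word whose first character is an ASCII upper-case letter.
CAPITALIZED = re.compile(r"\b[A-Z]\w*\b")


def strip_excluded(text):
    """Drop quoted spans and capitalized words, then collapse whitespace."""
    text = QUOTED.sub(" ", text)        # remove quoted passages first
    text = CAPITALIZED.sub(" ", text)   # then remove capitalized words
    return " ".join(text.split())       # tidy up the leftover gaps


if __name__ == "__main__":
    sample = 'The dog said "hello there" to Alice and the cat.'
    print(strip_excluded(sample))
```

The filtered string could then be handed to the tagger in place of the original text. Note that this also removes sentence-initial words, as requested, and that a possessive such as Bob's would leave a stray 's behind, so the patterns may need adjusting for real input.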