In-Reply-To: <90b8ca83-fb81-40d6-a864-f1c0e07bca76@googlegroups.com>
References: <90b8ca83-fb81-40d6-a864-f1c0e07bca76@googlegroups.com>
Date: Tue, 1 Oct 2013 16:24:37 +0200
Subject: Re: extraction tool using CRF++
From: Vlastimil Brom <vlastimil.brom@gmail.com>
To: python <python-list@python.org>
Newsgroups: comp.lang.python
Message-ID: <mailman.560.1380637486.18130.python-list@python.org>

2013/10/1 cerr <ron.eggler@gmail.com>:
> Hi,
>
> I want to write an extraction tool using CRF++
> (http://crfpp.googlecode.com/svn/trunk/doc/index.html).
> I have written a trainings file and a template:
> training:
> banana  FOOD    B-NP
> bread   FOOD    I-NP
> template:
> U01:%x[0,1]
> U02:%x[1,1]
>
> and now I want to go ahead and extract the foods from a sentence like
> "how do I make a banana bread". Also, I'm unsure how I interface to crf++
> with python, I compiled and installed it from source as described on the
> above website but I don't have a crf module available in python...


Hi,
unfortunately I have no experience with CRF++. If no Python wrapper is
available for it, using it from Python may not be easy; depending on how
the library is built, you could try accessing it via ctypes, for example.
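
If neither a wrapper nor ctypes works out, another route is to drive the
CRF++ command-line tools from Python. The following is only a minimal
sketch under assumptions: it presumes that the source build put crf_test
(and crf_learn, for producing the model file) on the PATH, and it relies
on the CRF++ file format of one token per line with whitespace-separated
columns and a blank line closing each sentence. The helper names and the
placeholder feature value "X" are hypothetical; the feature columns have
to match whatever the training file and template actually use.

```python
import subprocess


def to_crfpp_rows(sentence, columns_for=lambda token: ("X",)):
    """Format a sentence as CRF++ input text.

    One token per line, columns separated by tabs, and an empty line
    to terminate the sentence. `columns_for` supplies the extra feature
    columns for a token; the default "X" is only a placeholder.
    """
    lines = []
    for token in sentence.split():
        lines.append("\t".join((token,) + tuple(columns_for(token))))
    return "\n".join(lines) + "\n\n"


def parse_crfpp_output(text):
    """Split tagger output into sentences of column lists.

    Each token becomes a list of its columns; the tagger appends the
    predicted tag as the last column.
    """
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split())
    if current:
        sentences.append(current)
    return sentences


def run_crf_test(model_path, input_path):
    """Run `crf_test -m MODEL FILE` and return the parsed output.

    Assumes the crf_test executable is installed and on the PATH.
    """
    result = subprocess.run(
        ["crf_test", "-m", model_path, input_path],
        capture_output=True, text=True, check=True,
    )
    return parse_crfpp_output(result.stdout)
```

Writing the result of to_crfpp_rows("how do I make a banana bread") to a
file and passing that file to run_crf_test, together with a model trained
via crf_learn, should give one list of columns per token, with the
predicted tag last.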

Alternatively, you could try another package that is already available,
e.g. NLTK: http://nltk.org/

# (in NLTK 3 and later, lexname is a method: write synset.lexname() instead)
>>> import nltk
>>> any(synset.lexname == "noun.food" for synset in nltk.corpus.wordnet.synsets("apple"))
True
>>> any(synset.lexname == "noun.food" for synset in nltk.corpus.wordnet.synsets("bread"))
True
>>> any(synset.lexname == "noun.food" for synset in nltk.corpus.wordnet.synsets("wine"))
True
>>> any(synset.lexname == "noun.food" for synset in nltk.corpus.wordnet.synsets("book"))
False
>>> any(synset.lexname == "noun.food" for synset in nltk.corpus.wordnet.synsets("pencil"))
False

# of course, there may be some surprises, probably due to polysemy or
# to some specifics of the semantic description (presumably senses such
# as "hot dog" and "egg white" in the two cases that follow)...

>>> any(synset.lexname == "noun.food" for synset in nltk.corpus.wordnet.synsets("dog"))
True
>>> any(synset.lexname == "noun.food" for synset in nltk.corpus.wordnet.synsets("white"))
True
>>>
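
The checks from the session can be wrapped into a small helper and
applied to every word of a sentence. This is just a sketch, not a tuned
extractor: the function names and the injectable synsets_for lookup are
hypothetical (the lookup makes it possible to try the logic without the
WordNet corpus), and plain whitespace splitting stands in for a real
tokenizer. Polysemy will produce false positives here as well.

```python
def is_food(word, synsets_for=None):
    """Return True if any WordNet sense of `word` has lexname "noun.food"."""
    if synsets_for is None:
        # requires NLTK with the WordNet corpus downloaded
        from nltk.corpus import wordnet
        synsets_for = wordnet.synsets
    for synset in synsets_for(word):
        lexname = synset.lexname
        # NLTK 3 exposes lexname as a method, older releases as an attribute
        if callable(lexname):
            lexname = lexname()
        if lexname == "noun.food":
            return True
    return False


def extract_foods(sentence, synsets_for=None):
    """Return the words of `sentence` that have a food sense, in order."""
    return [word for word in sentence.lower().split()
            if is_food(word, synsets_for)]
```

With NLTK and its WordNet data installed,
extract_foods("how do I make a banana bread") can be called without the
second argument; short function words may occasionally show up in the
result because of unexpected noun senses.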

cf.
http://nltk.org/
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
http://www.velvetcache.org/2010/03/01/looking-up-words-in-a-dictionary-using-python
http://wordnet.princeton.edu/man/lexnames.5WN.html

hth,
   vbr