Evaluating LingPipe
Wiki Page Author: Hailey Pan
Reviewed by: Chen Li
LingPipe is a toolkit for processing text using computational linguistics. It can be used for tasks such as:
- Recognizing named entities
- Automatically classifying Twitter search results into categories
- Suggesting correct spellings of queries
The JAR is available for download at http://alias-i.com/lingpipe/web/download.html.
LingPipe uses statistically trained models to perform extraction for a given query. Each trained model covers only one kind of extraction. We wrote an example program that uses LingPipe to extract information from a sample data set of MEDLINE abstracts, using an English gene model trained for named-entity recognition. The example can be found in the Texera code at the following path (subject to change):
texera/texera/texera-sandbox/src/main/java/edu/uci/ics/texera/sandbox/lingpipeexample/LingpipeExample.java
This model can only recognize the names of genes, so it tags every chunk it extracts from the dataset as GENE, including chunks that are clearly not gene names. The program produces the following output:
text= "pmid" type=GENE
text= title" type=GENE
text= issue" type=GENE
text= title" type=GENE
text= gentlemen type=GENE
text= Epoch type=GENE
text= struggles type=GENE
text= Gentlemen type=GENE
text= "zipf type=GENE
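
Each output line above pairs the text of a chunk with its type. In LingPipe, a `Chunker` returns a `Chunking` whose chunks carry start and end character offsets plus a type, so lines like these come from slicing the input text by those offsets. The sketch below illustrates only that last step, in plain Java with no LingPipe dependency. The `Span` class, the `render` helper, and the sample sentence are hypothetical stand-ins, not code from `LingpipeExample.java`.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkOutputSketch {

    /** Hypothetical stand-in for a chunk: character offsets plus an entity type. */
    static final class Span {
        final int start;   // index of the first character of the chunk
        final int end;     // index one past the last character of the chunk
        final String type; // entity type; a single-type model always yields the same value

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Renders each span in the same "text= ... type=..." form as the output above. */
    static List<String> render(String text, List<Span> spans) {
        List<String> lines = new ArrayList<>();
        for (Span span : spans) {
            String chunkText = text.substring(span.start, span.end);
            lines.add("text= " + chunkText + " type=" + span.type);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample sentence and offsets are made up for illustration.
        String text = "Mutations in p53 are common.";
        List<Span> spans = List.of(new Span(13, 16, "GENE"));
        for (String line : render(text, spans)) {
            System.out.println(line);
        }
    }
}
```

Because the type comes from the model rather than from the chunk text, a model trained on a single entity type labels every span it returns with that type, which is consistent with every line above ending in `type=GENE`.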
For more information, see the LingPipe tutorials: http://alias-i.com/lingpipe/demos/tutorial/read-me.html