Evaluating LingPipe

Wiki Page Author: Hailey Pan

Reviewed by: Chen Li

LingPipe is a toolkit for processing text using computational linguistics. It can be used for tasks such as:

  • Named-entity recognition
  • Automatically classifying Twitter search results into categories
  • Suggesting correct spellings of queries

The JAR is available for download at http://alias-i.com/lingpipe/web/download.html.

LingPipe performs extraction with statistically trained models. Each trained model targets a single kind of extraction. We wrote an example program that uses LingPipe to extract information from a sample data set of MEDLINE abstracts, using an English gene model trained for named-entity recognition. The example can be found in the Texera code at the following path (subject to change): texera/texera/texera-sandbox/src/main/java/edu/uci/ics/texera/sandbox/lingpipeexample/LingpipeExample.java

This model recognizes only gene names, so every chunk it extracts from the data set is tagged GENE. The following is the output:

text= "pmid" type=GENE
text= title" type=GENE
text= issue" type=GENE
text= title" type=GENE
text= gentlemen type=GENE
text= Epoch type=GENE
text= struggles type=GENE
text= Gentlemen type=GENE
text= "zipf type=GENE
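
Each output line above pairs a span of the input text with a type label. The following is a minimal sketch of how such lines can be produced from character offsets. It is not the code in LingpipeExample.java and does not call LingPipe: the `Span` class is a hypothetical stand-in for what LingPipe exposes as a `Chunk` (with `start()`, `end()`, and `type()`) in its `com.aliasi.chunk` package, and the sample sentence and offsets are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkOutputSketch {

    /** Hypothetical stand-in for an extracted chunk: character offsets plus a type label. */
    static final class Span {
        final int start;   // inclusive offset into the text
        final int end;     // exclusive offset into the text
        final String type; // label assigned by the model, e.g. "GENE"

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Builds one "text= ... type=..." line per span, in the order given. */
    static List<String> format(String text, List<Span> spans) {
        List<String> lines = new ArrayList<>();
        for (Span s : spans) {
            lines.add("text= " + text.substring(s.start, s.end) + " type=" + s.type);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up sentence and offsets; a single-type model labels every span GENE.
        String text = "The p53 protein regulates BRCA1 expression.";
        List<Span> spans = List.of(new Span(4, 7, "GENE"), new Span(26, 31, "GENE"));
        format(text, spans).forEach(System.out::println);
    }
}
```

Because a model trained for one kind of extraction has only one label to assign, every line carries the same type, whether or not the span is really a gene name.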

Reference


http://alias-i.com/lingpipe/demos/tutorial/read-me.html