Lucene Architecture - dbpedia-spotlight/dbpedia-spotlight GitHub Wiki
The DBpedia Spotlight architecture is composed of the following modules:
- Web application, a demonstration client (HTML/JavaScript interface) that lets users enter or paste text in a Web browser and view the resulting annotated text.
- Web Service, a RESTful/SOAP? Web API that exposes the functionality of annotating and/or disambiguating entities in text.
- Annotation Java/Scala API, exposing the underlying logic that performs the annotation/disambiguation.
- Indexing Java/Scala API, executing the data processing necessary to enable the annotation/disambiguation algorithms used.
- Evaluation module, where we test disambiguators, log the results, and use them to train the system to perform better.
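
To illustrate how a client might talk to the Web Service module, here is a minimal Java sketch that only builds a request URL. The base URL, the `/annotate` path, and the `text` and `confidence` parameter names are assumptions made for this example and are not stated on this page; check your own deployment for the actual endpoint and parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class AnnotateRequestSketch {
    // Hypothetical base URL: replace with wherever your Web Service is deployed.
    static final String BASE_URL = "http://localhost:2222/rest";

    /**
     * Builds a GET URL asking the service to annotate the given text.
     * The "/annotate" path and the "text"/"confidence" parameter names
     * are assumptions for illustration.
     */
    static String annotateUrl(String baseUrl, String text, double confidence) {
        try {
            // URL-encode the text so spaces and punctuation survive the query string.
            return baseUrl + "/annotate?text=" + URLEncoder.encode(text, "UTF-8")
                    + "&confidence=" + confidence;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(annotateUrl(BASE_URL, "Berlin is the capital of Germany.", 0.5));
    }
}
```

The sketch stops at URL construction on purpose: sending the request and parsing the response depend on the output format the service is configured to return.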
External dependencies:
- DBpedia Extraction Framework (only for the indexing module), extracting the necessary data from the Wikipedia dumps.
- Lucene 3.6, providing the low-level indexing framework used by DBpedia Spotlight.
- LingPipe 4.0.0, providing the string matching implementation used for the Spotter module.
System Requirements
- Java 1.7+
- Scala 2.9+
- Spotlight JAR
- Spotlight Library JARs
- Lucene disambiguation index
- Spotter dictionary
- Enough RAM to set a JVM heap size large enough for the Spotter (approx. 8 GB)
- Maven 3 for the automatic installation of dependencies.
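
The heap requirement above can be checked before loading the Spotter. The following is a minimal sketch, assuming the "approx. 8G" figure is used as a threshold; the class and method names are hypothetical and not part of Spotlight. Note that `Runtime.maxMemory()` may report slightly less than the configured `-Xmx` value on some JVMs, so the result is best treated as a warning rather than a hard failure.

```java
public class HeapCheck {
    // Assumed threshold, taken from the "approx. 8G" figure in the requirements list.
    static final long REQUIRED_GB = 8;

    /** Converts gigabytes (binary, 1024^3 bytes) to bytes. */
    static long gigabytesToBytes(long gb) {
        return gb * 1024L * 1024L * 1024L;
    }

    /** Returns true if the given maximum heap is at least requiredGb gigabytes. */
    static boolean heapIsSufficient(long maxHeapBytes, long requiredGb) {
        return maxHeapBytes >= gigabytesToBytes(requiredGb);
    }

    public static void main(String[] args) {
        // Maximum amount of memory the JVM will attempt to use for the heap.
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (heapIsSufficient(maxHeap, REQUIRED_GB)) {
            System.out.println("Heap looks large enough: " + maxHeap + " bytes");
        } else {
            System.out.println("Warning: max heap is " + maxHeap
                    + " bytes; consider a larger -Xmx setting");
        }
    }
}
```

The maximum heap is set when the JVM is launched, for example with `java -Xmx8g HeapCheck`.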