Model implementation
Our latest implementation is based on statistical methods and is available for a number of languages. Data collection can be performed on a Hadoop cluster using our version of PigNLProc. More details on the indexing process for this implementation can be found here, and a fully automated indexing tool can be found here.
Open issues and questions
There are still several open issues with this implementation; see our Issue tracker for the full list.
Q: Can the memory footprint be reduced?
A: The memory footprint of this implementation is mainly due to context words. There are three ways to reduce it:
1. Use disk-based context lookup instead of memory-based context lookup (see Issue #187).
2. Do not consider context at all by using the smaller model (en_small.tar.gz).
3. Prune the context data (see Issue #167).
Q: I want to pass a parameter to show more or fewer entities depending on their score.
A: See Issue #188. A hedged example of such a call is sketched below.
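For illustration only, here is a minimal sketch of filtering entities by score through the web service's `confidence` request parameter. Whether and how this parameter is exposed by the statistical implementation is exactly what Issue #188 discusses, so treat the parameter, the local server URL and port, and the JSON field names below as assumptions based on common Spotlight setups rather than as something this page specifies:

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of a locally running Spotlight server; adjust host/port as needed.
SPOTLIGHT_URL = "http://localhost:2222/rest/annotate"

def annotate(text, confidence=0.5):
    """Ask Spotlight to annotate `text`, keeping only entities above `confidence`."""
    params = urllib.parse.urlencode({"text": text, "confidence": confidence})
    request = urllib.request.Request(
        SPOTLIGHT_URL + "?" + params,
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Raising the confidence value returns fewer, higher-scored entities;
# lowering it returns more. The field names follow the JSON output of the
# public Spotlight web service and may differ in other versions.
result = annotate("Berlin is the capital of Germany.", confidence=0.35)
for resource in result.get("Resources", []):
    print(resource["@URI"], resource["@similarityScore"])
```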
Downloads
You can also use Spotlight out of the box on a Linux machine by following this guide.
For the memory requirements of the models, see our paper. Because the full English model is fairly large, en_small.tar.gz is provided as a low-memory alternative; it does not consider context words and will therefore provide lower accuracy.
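As a convenience, the sketch below shows one way to fetch and unpack the low-memory model from a script. The download URL is a placeholder, since this page links the actual file location directly; substitute the real en_small.tar.gz link before running:

```python
import tarfile
import urllib.request

# Placeholder URL: substitute the actual en_small.tar.gz link from the Downloads list.
MODEL_URL = "https://example.org/spotlight/en_small.tar.gz"
ARCHIVE = "en_small.tar.gz"

urllib.request.urlretrieve(MODEL_URL, ARCHIVE)
with tarfile.open(ARCHIVE, "r:gz") as archive:
    # Extract into a local directory that can then be passed to the Spotlight server.
    archive.extractall("models")
```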