Downloads - dbpedia-spotlight/dbpedia-spotlight GitHub Wiki
In order to run DBpedia Spotlight on your server, you need to download our software and the required data, whose size varies depending on the kind of annotations you need.
The latest source code is available from the project's Git repository and can be browsed online.
Please refer to our installation instructions for more detailed information on how to install DBpedia Spotlight.
Since we rely on data extracted from the entire Wikipedia, we cannot embed the dataset into our software distribution. We therefore provide the required files below in several sizes to suit different needs. You can also build these files yourself if you prefer (see the index module).
As our development progresses, the system may require different datasets to enable more sophisticated algorithms. Therefore, we organized this section according to the software release tags. Please make sure to use the data required by the release that you have downloaded. Since trunk is cutting-edge, consult the discussion list if your build breaks: it may need a recently generated dataset.
Furthermore, we rely on some DBpedia datasets to generate dictionaries and resource types, among other information. They can be downloaded at http://dbpedia.org/downloads.
If you would like to run DBpedia Spotlight on your server, you will need the files below:
- Disambiguation index (Lucene): compact (tar.gz), large (tar.gz)
- Spotter lexicon (~LingPipe dictionary): small (gz), medium (gz), large (gz)
- Spot selection model: (tar.gz)
- DBpedia Lexicalizations dataset n-quads.tar.gz
If you are running the indexing, you will also need our stopwords file (or you can create your own):
- stopwords_en.list
If you would like to run DBpedia Spotlight on your server, you will need the files below:
- Disambiguation index (Lucene): compact (tar.gz), large (tar.gz)
- Spotter lexicon (~LingPipe dictionary): small (gz), medium (gz), large (gz)
- Spot selection cooccurrence model: (tar.gz)
- OpenNLP models for NERSpotter and OpenNLPNGramSpotter (tar.gz)
- stopwords_en.list
Suppose you have downloaded and decompressed the compact index and a spotter dictionary as follows:
wget http://spotlight.dbpedia.org/download/release-0.5/context-index-compact.tgz
tar zxvf context-index-compact.tgz
wget http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
gunzip surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
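If you fetch several of these files, or repeat the setup on another machine, the commands above can be wrapped in a small function. This is a minimal sketch, assuming bash with wget, tar and gzip available; the function name fetch_and_unpack is hypothetical and not part of DBpedia Spotlight. It skips the download when the archive is already present and, unlike plain gunzip, keeps the .gz file so a rerun does not download it again.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): fetch a DBpedia Spotlight data file into the
# current directory unless it is already there, then unpack it.
set -euo pipefail

fetch_and_unpack() {
  local url="$1"
  local file
  file="$(basename "$url")"

  # Only download if the archive is not already present, so reruns are cheap.
  if [ ! -e "$file" ]; then
    wget "$url"
  fi

  case "$file" in
    *.tgz|*.tar.gz)
      # Tarballs (e.g. the Lucene disambiguation index) unpack into a directory.
      tar zxf "$file"
      ;;
    *.gz)
      # Single gzipped files (e.g. a spotter dictionary): write the
      # decompressed copy next to the archive and keep the .gz itself.
      gunzip -c "$file" > "${file%.gz}"
      ;;
    *)
      echo "fetch_and_unpack: don't know how to unpack $file" >&2
      return 1
      ;;
  esac
}

# Example, using the release files named above:
# fetch_and_unpack http://spotlight.dbpedia.org/download/release-0.5/context-index-compact.tgz
# fetch_and_unpack http://spotlight.dbpedia.org/download/release-0.4/surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary.gz
```

The file-name suffix decides how each download is unpacked, which matches the tar.gz and gz formats listed for the index and the spotter lexicon.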
Now you just need to change the server.properties file to point to your newly extracted files:
org.dbpedia.spotlight.index.dir = index-withSF-withTypes-compressed
org.dbpedia.spotlight.spot.dictionary = surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary
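Editing server.properties by hand works fine; if you script your setup, a small helper can set each key instead. This is a sketch under the assumption that the file uses simple one-line "key = value" entries (no ":" separators or line continuations); the helper name set_property is hypothetical. It replaces an existing entry for the key or appends one if the key is missing, and it uses awk rather than sed -i, whose flags differ between GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): set "key = value" in a properties file,
# replacing an existing entry for the key or appending one if it is missing.
set -euo pipefail

set_property() {
  local file="$1" key="$2" value="$3"
  local tmp
  tmp="$(mktemp)"

  # Note: awk -v interprets backslash escapes, so avoid backslashes in values.
  awk -v k="$key" -v v="$value" '
    {
      eq = index($0, "=")
      if (eq > 0) {
        # Compare the trimmed text before the first "=" with the key.
        name = substr($0, 1, eq - 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        if (name == k) { print k " = " v; found = 1; next }
      }
      print
    }
    END { if (!found) print k " = " v }
  ' "$file" > "$tmp"

  # Copy back instead of mv so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example, pointing the server at the files extracted above:
# set_property server.properties org.dbpedia.spotlight.index.dir index-withSF-withTypes-compressed
# set_property server.properties org.dbpedia.spotlight.spot.dictionary surface_forms-Wikipedia-TitRedDis.uriThresh75.tsv.spotterDictionary
```

Comments and unrelated keys are left untouched, so the helper can be run repeatedly against the same file.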
More info on how to: