Virtuoso Freebase Setup - sameersingh/nlp_serde GitHub Wiki
Virtuoso Freebase Setup
Notes on how the Virtuoso Freebase instance was set up, and how to query it using SPARQL.
Creating the dump
- Install Virtuoso Open-Source on Ubuntu using
sudo aptitude install virtuoso-opensource
- Ensure
/var/lib/virtuoso-opensource-6.1/db
is linked to an HDD with a lot of space.
- Get the Freebase dump using
wget http://download.freebaseapps.com/
into the db folder.
- Gunzip it (requires ~330G):
gunzip freebase-rdf-*.gz
- Open an isql session to load the RDF triples into Virtuoso:
isql-vt 1111
- Register load request:
SQL> ld_dir('.', 'freebase-rdf-*', 'http://freebase.com');
- To see if the request registered:
SQL> select * from DB.DBA.load_list;
- Start the loader:
SQL> rdf_loader_run();
- In another
isql-vt
window, check progress by counting the triples loaded per graph:
SQL> SPARQL SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;
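The interactive steps above can also be scripted. The following is a minimal sketch, not the procedure that was actually used here. It assumes the default dba/dba credentials on port 1111, that the dump directory is permitted under DirsAllowed in virtuoso.ini, and that isql-vt accepts an exec= argument like the stock Virtuoso isql client. The names ISQL, vsql and load_freebase are hypothetical. The final checkpoint follows the bulk loader documentation linked under Resources.

```shell
#!/bin/sh
# Sketch only: non-interactive version of the load steps.
# Assumptions (not from the notes above): dba/dba credentials, port 1111,
# and the hypothetical helper names ISQL, vsql and load_freebase.
ISQL="${ISQL:-isql-vt}"
PORT="${PORT:-1111}"

vsql() {
  # Run a single SQL statement through isql's exec= argument.
  "$ISQL" "$PORT" dba dba exec="$1"
}

load_freebase() {
  # Register the dump files, list what was registered, then run the loader.
  vsql "ld_dir('.', 'freebase-rdf-*', 'http://freebase.com');" &&
  vsql "select ll_file, ll_state from DB.DBA.load_list;" &&
  vsql "rdf_loader_run();" &&
  # Persist the loaded data once the loader returns.
  vsql "checkpoint;"
}
```

Calling `load_freebase` from the db folder would then run all four statements in order and stop at the first one that fails.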
Resources
- http://sivareddy.in/load-freebase-dump-into-virtuoso-sparql-sql
- http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader
Querying Freebase Using SPARQL
Based on https://groups.google.com/forum/#!topic/sindicetech-freebase/93PBGJBnnIU.
Basic steps for each query:
- Point a browser to http://localhost:8890/sparql.
- Ensure the query produces the expected output (use http://freebase.com as the Graph IRI).
- Run the same query using
curl
with limits off, TSV format, etc.
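The curl step might look roughly like the helper below. It is a sketch under assumptions: the endpoint and graph IRI are the ones used on this page, query, default-graph-uri and format are request parameters of the Virtuoso SPARQL endpoint, and run_sparql is a hypothetical helper name. Row limits are usually enforced on the server side (for example through ResultSetMaxRows in virtuoso.ini), so turning limits off may need a config change rather than a curl flag.

```shell
#!/bin/sh
# Sketch only: send a SPARQL query file to the local endpoint, print TSV.
# run_sparql is a hypothetical helper name; ENDPOINT and GRAPH default to
# the values used on this page.
ENDPOINT="${ENDPOINT:-http://localhost:8890/sparql}"
GRAPH="${GRAPH:-http://freebase.com}"

run_sparql() {
  # $1: path to a file containing the SPARQL query.
  # --get appends the URL-encoded fields to the endpoint URL;
  # "query@file" makes curl read and encode the query text from the file.
  curl --silent --fail --get "$ENDPOINT" \
    --data-urlencode "default-graph-uri=$GRAPH" \
    --data-urlencode "query@$1" \
    --data-urlencode "format=text/tab-separated-values"
}
```

For example, `run_sparql query.rq > results.tsv` would write the result rows to a tab-separated file.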
Queries
For the complete reference, see Freebase types and relations, and Virtuoso SPARQL service.
Get the number of triples in the DB.
SELECT COUNT(*) {
?s ?p ?o
}
Get all relations of a mention.
PREFIX ns: <http://rdf.freebase.com/ns/>
select * where {
ns:m.014zcr ?p ?o
}
LIMIT 10
For multi-hop relations, one would do:
PREFIX ns: <http://rdf.freebase.com/ns/>
select * where {
ns:m.014zcr ns:film.actor.film ?film_performance .
?film_performance ns:film.performance.film ?film .
?film ns:type.object.name ?name .
?film ns:film.film.initial_release_date ?initial_release_date .
FILTER(lang(?name) = 'en')
}
LIMIT 1
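To reuse the multi-hop pattern for other entities, the query text can be generated from a template. The sketch below only prints the query and does not contact the server. film_query is a hypothetical helper name, and the LIMIT clause is left out on the assumption that the full result set is wanted.

```shell
#!/bin/sh
# Sketch only: print the multi-hop film query for a given Freebase MID.
# film_query is a hypothetical helper, not part of any tool mentioned here.
film_query() {
  mid="$1"   # e.g. m.014zcr
  # Unquoted heredoc so that ${mid} is substituted into the query text.
  cat <<EOF
PREFIX ns: <http://rdf.freebase.com/ns/>
select * where {
ns:${mid} ns:film.actor.film ?film_performance .
?film_performance ns:film.performance.film ?film .
?film ns:type.object.name ?name .
?film ns:film.film.initial_release_date ?initial_release_date .
FILTER(lang(?name) = 'en')
}
EOF
}
```

For example, `film_query m.014zcr > film.rq` would write the query above to a file that can be pasted into the SPARQL form or sent with curl.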