OwlgresLubm3x - ConstantB/ontop-spatial GitHub Wiki
Owlgres is a prototype triple store/SPARQL endpoint that uses query rewriting technology to support OWL 2 QL semantics. Development stopped in order to focus on Stardog, Clark & Parsia's native store, which imports technology from Owlgres. Rome reported that Owlgres is incomplete w.r.t. DL-Lite formal semantics because it does not rewrite with respect to existentials.
This is still compatible with the "official" SPARQL semantics over OWL 2 QL ontologies, since those removed existential variables.
We used LUBM3x-lite. We checked the ontology using:
sh/expchk --explain --tbox data/lubm3x-lite-rdfxml.owl
This returned errors like this:
FRAGMENT ERROR: No support for axiom OWLClassAssertionAxiom On OWL Axiom: Type(telephone1xx DataProperty)
The problem was caused by axioms of the following kind:
<owl:DataProperty rdf:about="#email">
  <rdfs:subPropertyOf rdf:resource="#email1xx"/>
</owl:DataProperty>
These generate axioms of the form email1xx rdf:type DataProperty, which are not supported by Owlgres. To fix this, one needs to remove any references to DataProperty unless they are needed. We replaced declarations like the one above with:
<rdf:Description rdf:about="#$1">
  <rdfs:subPropertyOf rdf:resource="#$2"/>
</rdf:Description>
and then replaced any string owl:DataProperty with rdf:Description.
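Taken together, the two replacement steps amount to renaming the tag in every opening and closing element, so they can be approximated with a single textual substitution. The following is a minimal sketch, assuming the declarations use exactly the `owl:DataProperty` spelling shown above; the function name `fix_tbox` is hypothetical, and this is not necessarily the command that was actually used.

```shell
#!/bin/bash
# Sketch only: fix_tbox is a hypothetical helper, not part of Owlgres.
# It reads RDF/XML on stdin and renames every owl:DataProperty tag
# (opening and closing) to rdf:Description.
fix_tbox() {
  sed -e 's/owl:DataProperty/rdf:Description/g'
}

# Example on a declaration of the kind discussed here.
printf '%s\n' \
  '<owl:DataProperty rdf:about="#email">' \
  '  <rdfs:subPropertyOf rdf:resource="#email1xx"/>' \
  '</owl:DataProperty>' | fix_tbox
```

On a real file this could be run as `fix_tbox < data/lubm3x-lite-rdfxml.owl > fixed.owl`, where `fixed.owl` is a placeholder output name.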
After merging each university into a single nt file, we converted each file to RDF/XML using the following bash script:
#!/bin/bash
echo "Generating nt files"
for i in {0..49}
do
  echo "Doing uni $i to RDFXML"
  rdfcat -out RDF/XML -t university-data-$i.nt > university-data-$i.rdf
done
To load the data we used this script:
#!/bin/bash
echo "creating database"
sh/create --db lubm50owlgres --user postgres --passwd obdaps83 --tbox data/lubm3x-lite-rdfxml.owl
echo "Generating nt files"
for i in {0..49}
do
  echo "Doing uni $i to RDFXML"
  time sh/load --db lubm50owlgres --user postgres --passwd password --abox /Users/mariano/Documents/Archive/Work/projects/semantic-index/uba1.7/lubm100/university-data-$i.rdf
done
We executed it with the following command to get a full log:
( time ./loadall ) >loadlog.log 2>&1
The output of one loop iteration (one university) looks like this:
Doing uni 49 to RDFXML
WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department11.University49.edu/UndergraduateStudent215, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#name, UndergraduateStudent215]
WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department11.University49.edu/UndergraduateStudent215, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#emailAddress, [email protected]]
WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department11.University49.edu/UndergraduateStudent215, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#telephone, xxx-xxx-xxxx]
WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department5.University49.edu/FullProfessor0, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#researchInterest, Research16]
INFO [main] (ABoxOnlyConsumer.java:148) - 10000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 20000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 30000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 40000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 50000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 60000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 70000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 80000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 90000 statements read
INFO [main] (ABoxOnlyConsumer.java:148) - 100000 statements read
KB is consistent.
ABox load successful.
real 0m56.819s
user 0m11.880s
sys 0m2.867s
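Because each iteration prints a `Doing uni N` line followed by the `time` summary, the per-university load times can be pulled out of the log afterwards. The sketch below assumes the log layout shown above, with bash's `real XmY.YYYs` format; the function name `load_times` is made up for illustration.

```shell
#!/bin/bash
# Sketch only: load_times is a hypothetical helper.
# Prints "<university> <seconds>" for each "Doing uni N ..." line that is
# followed by a "real XmY.YYYs" line in the given log file.
load_times() {
  awk '
    /^Doing uni/ { uni = $3 }              # remember the current university
    $1 == "real" && uni != "" {
      split($2, t, /[ms]/)                 # "0m56.819s" -> t[1]=0, t[2]=56.819
      printf "%s %.3f\n", uni, t[1] * 60 + t[2]
      uni = ""                             # ignore any later unpaired "real" line
    }
  ' "$1"
}

# Run only if the log produced by the command above exists.
if [ -f loadlog.log ]; then
  load_times loadlog.log
fi
```

The final `real` line written by the outer `time` is not paired with a `Doing uni` line, so the sketch skips it rather than counting the total twice.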
Note that some data assertions may have been ignored because of the fixes in the TBox; in particular, some properties are not recognized as data properties. To fix this, we made sure every data property declaration looked like this:
<rdf:Description rdf:about="#emailAddress">
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
  <rdfs:label>can be reached at</rdfs:label>
  <rdfs:domain rdf:resource="#Person"/>
</rdf:Description>
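To see which predicates were affected, the `Ignoring statement with unknown predicate` warnings in the load log can be tallied per predicate. This is a minimal sketch that assumes the warning format shown in the load log, with subject, predicate and object separated by a comma and a space; the function name `ignored_predicates` is hypothetical.

```shell
#!/bin/bash
# Sketch only: ignored_predicates is a hypothetical helper.
# Prints "<count> <predicate URI>" for every predicate that appears in an
# "Ignoring statement with unknown predicate" warning, most frequent first.
ignored_predicates() {
  awk -F', ' '
    /Ignoring statement with unknown predicate/ { n[$2]++ }
    END { for (p in n) print n[p], p }
  ' "$1" | sort -rn
}

# Run only if the load log exists in the current directory.
if [ -f loadlog.log ]; then
  ignored_predicates loadlog.log
fi
```

A predicate that still shows up after the TBox fix would be a candidate for a missing or malformed data property declaration.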
The Owlgres schema is very similar to that of Quest: it uses "universal" tables for classes and properties. In particular, it uses three such tables: one for concept assertions, one for object property assertions, and one for data property assertions. Additionally, Owlgres stores the TBox assertions in the database, as well as a map from URI strings to integer IDs. The map covers all URIs mentioned in the input: TBox URIs are processed when the repository is created, and IDs for individual URIs are generated during data loading.
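The idea of mapping URI strings to integer IDs can be illustrated with a small dictionary-encoding sketch. This is not Owlgres code: it uses a single in-memory map and whitespace-separated triples, both of which are simplifying assumptions, whereas Owlgres keeps its maps in database tables. The function name `encode_triples` and the example.org URIs are hypothetical.

```shell
#!/bin/bash
# Sketch only: encode_triples is a hypothetical helper, not Owlgres code.
# Reads "subject predicate object" lines on stdin, assigns each distinct
# term an integer ID the first time it is seen, and prints the ID triples.
encode_triples() {
  awk '
    function id(term) {
      if (!(term in ids)) ids[term] = ++last_id
      return ids[term]
    }
    { s = id($1); p = id($2); o = id($3); print s, p, o }
  '
}

printf '%s\n' \
  'http://example.org/a http://example.org/advisor http://example.org/b' \
  'http://example.org/b http://example.org/advisor http://example.org/c' \
  | encode_triples
```

For the two example triples this prints `1 2 3` and `3 2 4`: the shared predicate and the shared individual reuse their IDs, which is what lets the assertion tables store and index integers instead of long strings.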
Owlgres includes two additional tables for annotations. Although annotations could be handled as simple ABox assertions, Owlgres seems to do this to improve performance, so that PostgreSQL does not have to consider annotation data when answering queries unrelated to annotations.
The full DDL generated by Owlgres for LUBM3x-lite can be found in [attachment:owlgres.sql].
Owlgres uses B-tree indexes, simple and compound, on all tables. The following are the most important ones:
CREATE INDEX concept_assertion_concept_idx ON concept_assertion USING btree (concept);
CREATE INDEX data_role_assertion_individual_role_idx ON data_role_assertion USING btree (individual, data_role);
CREATE INDEX data_role_assertion_role_idx ON data_role_assertion USING btree (data_role);
CREATE INDEX object_role_assertion_a_role_idx ON object_role_assertion USING btree (a, object_role);
CREATE INDEX object_role_assertion_b_role_idx ON object_role_assertion USING btree (b, object_role);
CREATE INDEX object_role_assertion_role_idx ON object_role_assertion USING btree (object_role);
CREATE UNIQUE INDEX individual_name_name_idx ON individual_name USING btree (name);
CREATE INDEX annotation_to_literal_individual_role_idx ON annotation_to_literal USING btree (individual, annotation_role);
CREATE INDEX annotation_to_literal_role_idx ON annotation_to_literal USING btree (annotation_role);
CREATE INDEX annotation_to_resource_a_role_idx ON annotation_to_resource USING btree (a, annotation_role);
CREATE INDEX annotation_to_resource_b_role_idx ON annotation_to_resource USING btree (b, annotation_role);
Note that this does not include the implicit indexes created by the PRIMARY KEY constraints.
Owlgres defines many constraints. We believe the most important ones are the primary keys, which keep the data free of duplicates (although at the cost of poor load time) and which generate implicit B-tree indexes, in many cases compound ones.
ALTER TABLE ONLY concept_assertion ADD CONSTRAINT concept_assertion_pkey PRIMARY KEY (concept, individual);
ALTER TABLE ONLY data_role_assertion ADD CONSTRAINT data_role_assertion_pkey PRIMARY KEY (data_role, individual, value, datatype, language);
ALTER TABLE ONLY individual_name ADD CONSTRAINT individual_name_pkey PRIMARY KEY (id);
ALTER TABLE ONLY object_role_assertion ADD CONSTRAINT object_role_assertion_pkey PRIMARY KEY (object_role, a, b);
ALTER TABLE ONLY tbox_concept_inclusion ADD CONSTRAINT tbox_concept_inclusion_pkey PRIMARY KEY (sub, super);
ALTER TABLE ONLY tbox_data_role_inclusion ADD CONSTRAINT tbox_data_role_inclusion_pkey PRIMARY KEY (sub, super);
ALTER TABLE ONLY tbox_name ADD CONSTRAINT tbox_name_pkey PRIMARY KEY (id);
ALTER TABLE ONLY tbox_object_role_inclusion ADD CONSTRAINT tbox_object_role_inclusion_pkey PRIMARY KEY (sub, super);
ALTER TABLE ONLY annotation_to_literal ADD CONSTRAINT annotation_to_literal_pkey PRIMARY KEY (annotation_role, individual, value, datatype, language);
ALTER TABLE ONLY annotation_to_resource ADD CONSTRAINT annotation_to_resource_pkey PRIMARY KEY (annotation_role, a, b);