OwlgresLubm3x - ConstantB/ontop-spatial GitHub Wiki

Table of Contents: Owlgres, Input TBOX, Input ABoxes, Loading, Owlgres schema, Indexes, Constraints

Owlgres

Owlgres is a prototype triple store/SPARQL endpoint that uses query rewriting technology to support OWL 2 QL semantics. Development stopped so that its developers could focus on Stardog, Clark & Parsia's native store, which imports technology from Owlgres. Rome reported that Owlgres is incomplete w.r.t. the formal semantics of DL-Lite because it does not rewrite with respect to existential quantification. This is still compatible with the "official" SPARQL semantics over OWL 2 QL ontologies, since existential variables were removed there.

Input TBOX

We used LUBM3x-lite. We checked the ontology using:

sh/expchk --explain --tbox data/lubm3x-lite-rdfxml.owl

This returned errors like this:

FRAGMENT ERROR: No support for axiom OWLClassAssertionAxiom
	On OWL Axiom: Type(telephone1xx DataProperty)

The problem was axioms of the following kind:

<owl:DataProperty rdf:about="#email">
    <rdfs:subPropertyOf rdf:resource="#email1xx"/>
</owl:DataProperty>

These generate axioms of the form email1xx rdf:type DataProperty, which are not supported by Owlgres. To fix this, one needs to remove any references to DataProperty unless they are needed. We replaced each axiom of the kind shown above with:

<rdf:Description rdf:about="#$1">
    <rdfs:subPropertyOf rdf:resource="#$2"/>
</rdf:Description>

and then replaced any string owl:DataProperty with rdf:Description.
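
The string replacement described above can be sketched with sed. This is a minimal sketch and not the exact command we ran: the function name strip_dataproperty is hypothetical, and it assumes that the string owl:DataProperty occurs only as an element name in the ontology file.

```shell
#!/bin/bash
# Hypothetical helper (not part of Owlgres): rewrite every owl:DataProperty
# element name as rdf:Description, leaving attributes and child elements as is.
# Assumes the string owl:DataProperty appears only in element tags.
strip_dataproperty() {
    sed 's/owl:DataProperty/rdf:Description/g' "$1"
}
```

A possible use is to write the result to a new file (for example `strip_dataproperty data/lubm3x-lite-rdfxml.owl > fixed.owl`, where fixed.owl is a made-up name) and run sh/expchk on it again.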

Input ABoxes

Each university is stored in a single nt file (university-data-N.nt). We converted these files to RDF/XML with the following bash script:

#!/bin/bash
# Convert the N-Triples file of each university to RDF/XML with rdfcat.
echo "Generating RDF/XML files"
for i in {0..49}
do
    echo "Doing uni $i to RDFXML"
    rdfcat -out RDF/XML -t "university-data-$i.nt" > "university-data-$i.rdf"
done

Loading

To load the data we used this script (saved as loadall):

#!/bin/bash
# Create the repository from the TBox, then load the ABox of each university.
echo "creating database"
sh/create --db lubm50owlgres --user postgres --passwd password --tbox data/lubm3x-lite-rdfxml.owl
echo "Loading ABox files"
for i in {0..49}
do
    echo "Doing uni $i to RDFXML"
    # time reports how long the load of this university took
    time sh/load --db lubm50owlgres --user postgres --passwd password --abox "/Users/mariano/Documents/Archive/Work/projects/semantic-index/uba1.7/lubm100/university-data-$i.rdf"
done

We executed it with the following command to capture a full log:

( time ./loadall ) >loadlog.log 2>&1

The output of one loop iteration (one university) looks like this:

Doing uni 49 to RDFXML
 WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department11.University49.edu/UndergraduateStudent215, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#name, UndergraduateStudent215]
 WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department11.University49.edu/UndergraduateStudent215, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#emailAddress, [email protected]]
 WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department11.University49.edu/UndergraduateStudent215, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#telephone, xxx-xxx-xxxx]
 WARN [main] (ABoxOnlyConsumer.java:138) - Ignoring statement with unknown predicate (lit): [http://www.Department5.University49.edu/FullProfessor0, http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl#researchInterest, Research16]
 INFO [main] (ABoxOnlyConsumer.java:148) - 10000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 20000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 30000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 40000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 50000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 60000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 70000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 80000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 90000 statements read
 INFO [main] (ABoxOnlyConsumer.java:148) - 100000 statements read
KB is consistent.
ABox load successful.

real	0m56.819s
user	0m11.880s
sys	0m2.867s
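
To compare load times across universities, the per-iteration real times can be extracted from the log. The following is a minimal sketch, assuming the log has the layout shown above (a "Doing uni N" line followed later by the matching real line); the function name load_times is hypothetical. The final real line written by the outer time command has no preceding "Doing uni" line and is therefore skipped.

```shell
#!/bin/bash
# Hypothetical helper: print "<university> <seconds>" for each loop iteration
# recorded in a load log such as loadlog.log.
load_times() {
    awk '
        /^Doing uni/ { uni = $3 }                # remember the current university number
        $1 == "real" && uni != "" {
            split($2, t, /[ms]/)                 # "0m56.819s" -> t[1]="0", t[2]="56.819"
            printf "%s %.3f\n", uni, t[1] * 60 + t[2]
            uni = ""                             # ignore real lines until the next iteration
        }
    ' "$1"
}
```

For example, `load_times loadlog.log | sort -k2,2n` would list the universities from fastest to slowest load.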

Note that some data assertions may have been ignored because of the fixes in the TBox; in particular, some properties are no longer recognized as data properties. To fix this, we made sure every data property declaration looked like this:

    <rdf:Description rdf:about="#emailAddress">
        <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>
        <rdfs:label>can be reached at</rdfs:label>
        <rdfs:domain rdf:resource="#Person"/>
    </rdf:Description>
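
To see which properties are affected, the WARN lines of the load log can be summarised per predicate. The following is a minimal sketch, assuming the WARN format shown in the log excerpt above; the function name ignored_predicates is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the ignored statements per predicate in a load log.
# Prints one "<count> <predicate URI>" line per predicate, sorted by URI.
ignored_predicates() {
    # The predicate is the second comma-separated item inside the [...] list.
    sed -n 's/.*Ignoring statement with unknown predicate[^[]*\[[^,]*, \([^,]*\),.*/\1/p' "$1" |
        LC_ALL=C sort |
        uniq -c |
        awk '{ print $1, $2 }'    # normalise the padding that uniq -c adds
}
```

Running `ignored_predicates loadlog.log` before and after fixing the declarations shows whether a property such as emailAddress is still being dropped.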

Owlgres schema

The Owlgres schema is very similar to that of Quest: it uses "universal" tables for classes and properties. In particular, it uses three of these tables: one for concept assertions, one for object property assertions and one for data property assertions. Additionally, Owlgres stores the TBox assertions in the database, as well as a map from URI strings to integer IDs. The URI map covers all URIs mentioned in the input: TBox URIs are processed during the creation of the repository, and IDs for individual URIs are generated during data loading.

Owlgres includes two additional tables for annotations. Although annotations could be handled as ordinary ABox assertions, Owlgres seems to do this to improve performance, so that Postgres does not have to consider annotation data while answering queries that do not involve annotations.

The full DDL generated by Owlgres for LUBM3x-lite can be found in [attachment:owlgres.sql].

Indexes

Owlgres uses B-tree indexes, simple and compound, on all tables. The following are the most important ones:

CREATE INDEX concept_assertion_concept_idx ON concept_assertion USING btree (concept);
CREATE INDEX data_role_assertion_individual_role_idx ON data_role_assertion USING btree (individual, data_role);
CREATE INDEX data_role_assertion_role_idx ON data_role_assertion USING btree (data_role);

CREATE INDEX object_role_assertion_a_role_idx ON object_role_assertion USING btree (a, object_role);
CREATE INDEX object_role_assertion_b_role_idx ON object_role_assertion USING btree (b, object_role);
CREATE INDEX object_role_assertion_role_idx ON object_role_assertion USING btree (object_role);

CREATE UNIQUE INDEX individual_name_name_idx ON individual_name USING btree (name);

CREATE INDEX annotation_to_literal_individual_role_idx ON annotation_to_literal USING btree (individual, annotation_role);
CREATE INDEX annotation_to_literal_role_idx ON annotation_to_literal USING btree (annotation_role);
CREATE INDEX annotation_to_resource_a_role_idx ON annotation_to_resource USING btree (a, annotation_role);
CREATE INDEX annotation_to_resource_b_role_idx ON annotation_to_resource USING btree (b, annotation_role);

Note that this list does not include the implicit indexes created by the PRIMARY KEY constraints.

Constraints

Owlgres defines many constraints. We believe the most important ones are the primary keys, which keep the data free of duplicates (although at the cost of poor load time) and which generate implicit B-tree indexes, in many cases compound ones.

ALTER TABLE ONLY concept_assertion
    ADD CONSTRAINT concept_assertion_pkey PRIMARY KEY (concept, individual);
ALTER TABLE ONLY data_role_assertion
    ADD CONSTRAINT data_role_assertion_pkey PRIMARY KEY (data_role, individual, value, datatype, language);
ALTER TABLE ONLY individual_name
    ADD CONSTRAINT individual_name_pkey PRIMARY KEY (id);
ALTER TABLE ONLY object_role_assertion
    ADD CONSTRAINT object_role_assertion_pkey PRIMARY KEY (object_role, a, b);


ALTER TABLE ONLY tbox_concept_inclusion
    ADD CONSTRAINT tbox_concept_inclusion_pkey PRIMARY KEY (sub, super);
ALTER TABLE ONLY tbox_data_role_inclusion
    ADD CONSTRAINT tbox_data_role_inclusion_pkey PRIMARY KEY (sub, super);
ALTER TABLE ONLY tbox_name
    ADD CONSTRAINT tbox_name_pkey PRIMARY KEY (id);
ALTER TABLE ONLY tbox_object_role_inclusion
    ADD CONSTRAINT tbox_object_role_inclusion_pkey PRIMARY KEY (sub, super);

                                                                               
ALTER TABLE ONLY annotation_to_literal
    ADD CONSTRAINT annotation_to_literal_pkey PRIMARY KEY (annotation_role, individual, value, datatype, language);
ALTER TABLE ONLY annotation_to_resource
    ADD CONSTRAINT annotation_to_resource_pkey PRIMARY KEY (annotation_role, a, b);