Lubm3xData - ConstantB/ontop-spatial GitHub Wiki

This page describes how we loaded the data generated by LUBM into Quest.

Issues to address

At the moment Quest uses the OWL API to load the TBox and ABoxes. This is very inefficient for large ABoxes. We need a lighter mechanism that does little parsing and allows triples to be streamed.

Solution:

  • Generate all LUBM data files.
  • Transform and merge all the data into a simple triple file (e.g., N-Triples).
  • Create a new ABox assertion streamer that reads the file line by line with very simple parsing.
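
To illustrate why N-Triples suits the third step, here is a minimal sketch of line-by-line streaming with very simple parsing. It is not the Quest streamer: the function name `stream_triples` is hypothetical, and the sketch only counts class assertions (triples whose predicate is rdf:type) and property assertions, where a real streamer would hand each triple to the ABox loader. It assumes one triple per line with whitespace-separated subject and predicate.

```shell
#!/bin/bash
# Hypothetical sketch: stream an N-Triples file one line at a time.
RDF_TYPE='<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'

stream_triples() {
    local file class_count prop_count s p o
    file=$1
    class_count=0
    prop_count=0
    # read splits each line into subject, predicate, and the rest (object + " .")
    while read -r s p o; do
        # Skip blank lines and comment lines
        case $s in
            ''|'#'*) continue ;;
        esac
        # Drop the trailing " ." terminator from the object
        o=${o% .}
        # A real streamer would pass (s, p, o) to the ABox loader here
        if [ "$p" = "$RDF_TYPE" ]; then
            class_count=$((class_count + 1))
        else
            prop_count=$((prop_count + 1))
        fi
    done < "$file"
    echo "class assertions: $class_count"
    echo "property assertions: $prop_count"
}
```

Calling `stream_triples University0-99-clean2.nt` would print the two counts. Only one line is held in memory at a time, which is the property the OWL API based loading lacks.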

Generating the files

This is done with the standard LUBM data generator, using the command:

java -cp classes/ edu.lehigh.swat.bench.uba.Generator -univ 1000 -onto http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl

Transforming and merging (Source)

To do this we will use Jena, in particular the command rdfcat.

  • Setting up Jena. Download Jena and set up your environment as follows:
    • Add the following to your .bashrc file:
      export JENAROOT=~/Documents/OBDA/related_software/Jena-2.6.4
      export PATH=$JENAROOT/bin:$PATH
  • Make the Jena scripts executable by running the command:
      chmod u+x $JENAROOT/bin/*
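
Before running the conversion, it can help to confirm that the setup took effect. The sketch below uses a hypothetical helper named `require_tool`, which relies only on the shell built-in `command -v` to check that a program is reachable through PATH.

```shell
#!/bin/bash
# Hypothetical helper: fail early if a required tool is not on PATH.
require_tool() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not found on PATH; check JENAROOT and PATH in .bashrc" >&2
        return 1
    fi
}
```

For example, `require_tool rdfcat && echo "rdfcat is available"` should print the message in a new shell once .bashrc has been reloaded.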
    
With Jena configured, we can now process the original data and dump it as N-Triples with the command:

find . -type f -name "University*.owl" -exec rdfcat -out N-TRIPLE -x {} \; >> University0-99.nt

We also need to remove imports and other non-data triples with the commands:

cat University0-99.nt | grep -v http://www.w3.org/2002/07/owl#Ont > University0-99-clean.nt
cat University0-99-clean.nt | grep -v http://www.w3.org/2002/07/owl#imports > University0-99-clean2.nt
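
The two filtering steps can also be done in a single pass, without the intermediate file. This is a sketch with a hypothetical function name, `clean_nt`. It uses the same two patterns as the commands above and matches them as fixed strings.

```shell
#!/bin/bash
# Hypothetical helper: drop ontology-declaration and owl:imports triples in one pass.
# Usage: clean_nt input.nt output.nt
clean_nt() {
    # -v inverts the match, -F treats patterns as fixed strings,
    # and each -e adds one pattern to exclude.
    grep -vF \
        -e 'http://www.w3.org/2002/07/owl#Ont' \
        -e 'http://www.w3.org/2002/07/owl#imports' \
        "$1" > "$2"
}
```

For example, `clean_nt University0-99.nt University0-99-clean2.nt` should produce the same result as the two commands above. Note that grep returns a non-zero exit status if every line is filtered out.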

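To merge the files of each university into a single .nt file, we used the following bash script:

#!/bin/bash
# Merge all RDF/XML files of each university into one N-Triples file.
echo "Generating nt files"
for i in {0..99}
do
    echo "Doing uni $i"
    # ${i}_ stops the shell from reading "i_" as the variable name.
    # The output name matches the input expected by the cleaning script.
    find . -type f -name "University${i}_*.owl" -exec rdfcat -out N-TRIPLE -x {} \; >> university-data-$i.nt
done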

To clean all the files, we ran:

#!/bin/bash
# Remove ontology declarations and owl:imports triples from each file.
echo "Cleaning nt files"
for i in {0..99}
do
    echo "Doing uni $i"
    grep -v "http://www.w3.org/2002/07/owl#Ont" university-data-$i.nt | grep -v "http://www.w3.org/2002/07/owl#imports" > university-data-$i.nt.tmp
    # mv overwrites the original file, so no separate rm is needed
    mv university-data-$i.nt.tmp university-data-$i.nt
done