Tools - mconlon17/vivo-etl GitHub Wiki
We use the following tools, though others could be substituted. The aim is to demonstrate a basic pattern for creating consistent RDF from multiple sources for multiple targets.
csv2json.py
The csv2json.py utility, provided in this repository, converts CSV and TSV files to JSON. When data arrives as TSV or CSV, run it through csv2json.py first. For example, the command line below reads a file called enterprise.tsv and produces a file called enterprise.json, ready for further processing.
python3 csv2json.py <enterprise.tsv >enterprise.json
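To make the conversion step concrete, here is a minimal sketch of what a converter of this kind might do. It is not the code of csv2json.py itself; the function name `delimited_to_json` and the sample data are hypothetical, and it assumes the first row of the input holds column names.

```python
import csv
import io
import json


def delimited_to_json(text: str, delimiter: str = "\t") -> str:
    """Convert delimited text with a header row into a JSON array of objects.

    Hypothetical helper for illustration; not the repository's csv2json.py.
    """
    # DictReader uses the first row as the keys for every following row.
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return json.dumps(list(reader), indent=2)


# Made-up sample input: one header row and one data row, tab separated.
sample = "name\tcity\nAcme\tGainesville\n"
print(delimited_to_json(sample))
```

Every value comes out as a JSON string in this sketch, because the `csv` module does not infer types.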
wget
Use the command-line tool wget to retrieve JSON from an API and save it to a file.
JSON2RDF
Use JSON2RDF to convert any JSON file to a raw RDF file. JSON2RDF uses the semantics of the JSON file, that is, its keys, to create the predicates of the triples it generates.
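The idea of turning JSON keys into predicates can be illustrated with a small sketch. This is a simplified illustration, not JSON2RDF's actual algorithm: the function `json_to_ntriples`, the base IRI `https://example.org/`, and the `base#key` predicate pattern are assumptions made for the example. It expects a JSON object at the top level and writes every scalar as a plain string literal.

```python
import json
from itertools import count


def json_to_ntriples(obj, base="https://example.org/"):
    """Emit N-Triples lines for a JSON object, one blank node per object.

    Illustrative only; the base IRI and predicate pattern are assumptions.
    """
    counter = count()
    lines = []

    def walk(node):
        # Each JSON object becomes a labeled blank node.
        subject = f"_:b{next(counter)}"
        for key, value in node.items():
            # The JSON key becomes the predicate of the triple.
            predicate = f"<{base}#{key}>"
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict):
                    # Nested objects are emitted first, then linked to.
                    lines.append(f"{subject} {predicate} {walk(item)} .")
                elif item is not None:
                    # json.dumps supplies the quoting and escaping.
                    lines.append(f"{subject} {predicate} {json.dumps(str(item))} .")
        return subject

    walk(obj)
    return lines


record = {"name": "Acme", "address": {"city": "Gainesville"}}
print("\n".join(json_to_ntriples(record)))
```

The output uses labeled blank nodes (`_:b0`, `_:b1`) as subjects, which is the kind of raw RDF the following steps reshape.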
riot
Use riot, the Apache Jena command-line "RDF input/output tool", to convert the output of JSON2RDF to a Turtle representation appropriate for robot (see below). By default, JSON2RDF produces N-Triples with labeled blank nodes for entities. robot is not yet able to read such files. Converting the N-Triples to an anonymous Turtle (TTL) format allows robot to assign blank node identifiers and proceed with processing.
robot
Use robot with an appropriate SPARQL CONSTRUCT query to convert the raw RDF produced by JSON2RDF and riot into VIVO RDF for the ontologies of your choice. Examples are provided.
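As a rough sketch of this step, the snippet below pairs a small CONSTRUCT query with the command line that would run it. It assumes robot here is the ROBOT command-line tool and its `query` command. The file names, the `raw:` namespace, and the mapping to `foaf:Organization` and `rdfs:label` are hypothetical and are not taken from the examples in this repository.

```python
import shlex

# Hypothetical query: relabel a raw "name" predicate as rdfs:label and
# type the subject as an organization. The raw: namespace is an assumption.
CONSTRUCT_QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX raw: <https://example.org/#>

CONSTRUCT {
  ?org a foaf:Organization ;
       rdfs:label ?name .
}
WHERE {
  ?org raw:name ?name .
}
"""


def robot_query_command(input_path, query_path, output_path):
    """Build the argument list for a `robot query` run (nothing is executed)."""
    return ["robot", "query", "--input", input_path, "--query", query_path, output_path]


# Print the command that would be run, with made-up file names.
print(shlex.join(robot_query_command("enterprise.ttl", "enterprise.rq", "enterprise-vivo.ttl")))
```

Saving the query text to the `.rq` file and running the printed command would, under these assumptions, write the constructed triples to the output file.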
tdbloader
Use tdbloader to load triples into a triple store. It's super fast. And on macOS and Linux, there's also tdbloader2, which is even faster.