CI Pipeline - OpenCS-ontology/OpenCS GitHub Wiki
Introduction
The CI pipeline in the OpenCS project comprises three GitHub Actions workflows, each defined in a separate .yml file:
- `pre-release.yml` -- workflow triggered each time a push is made to the `main` branch of the main OpenCS repository.
- `tagged-release.yml` -- workflow triggered each time a push containing a new tag is made to the main OpenCS repository (i.e. when a new tagged release is done).
- `validate.yml` -- workflow triggered on pushes and pull requests to the `main` branch of the main OpenCS repository. Its main purpose is to validate the new version of the OpenCS ontology.
Each workflow will be described in more detail in its dedicated section below.
Pre-release workflow
This workflow is responsible for all the necessary preparatory activities when a pre-release action is done. A pre-release in the context of the OpenCS repository is currently interpreted as any push to the repository without a tag; the default tag in such a situation is `dev`. Each step of this workflow is described in its dedicated sub-section.
Preliminary steps
The preliminary steps in this workflow include defining the Docker container as well as the `checkout repository` step. The Docker container is defined in the OpenCS ci-worker repository and uses the Ubuntu operating system. The `checkout repository` step invokes the ready-made checkout action (version 4) to access the main OpenCS repository.
Packaging ontology
This step involves invoking the package.py script, which is located in the ci-worker repository. It takes 3 parameters: `input dir`, `output dir` and `version tag`. The `input dir` parameter should always point to the root of the main OpenCS repository (so it should be set to `opencs`). The `output dir` in the case of this workflow is set to the `package` directory, and the `version tag` is set to `dev`.
The main purpose of this script is to parse the entire OpenCS graph (including both the ontology and the schema), which is stored in the form of `.ttl` files. The script comprises two functions that are called one after the other when it is invoked:
- `process_simple` -- processes all the RDF files located in the `in_glob` path using the `parse_all` helper function. The parsed graph is then serialized into the `out_base_path` path using the `serialize_multi` helper function.
- `process_ontology` -- processes the entire OpenCS ontology graph. It begins by retrieving and parsing the header via the `get_auto_header` helper function; the graph with only the header parsed is then serialized using the `serialize_multi` function. It then parses all the ontology files located in the `core` folder of the main OpenCS repository using the `parse_all` helper function. Finally, it parses the `authors.ttl` file and serializes the whole graph using the `serialize_multi` function. It takes the `in_dir`, `out_dir` and `version` parameters, which are passed directly from the main parameter list of the `package.py` script.

These rely on the following helper functions:
- `get_auto_header` -- creates an automatic graph header by adding the version IRI and the time of creation. It takes the current `version` (the tag passed during release) as a parameter and returns a graph with the header created.
- `parse_all` -- parses all the RDF files located in the `glob_path` directory and adds them to the passed graph `g`.
- `serialize_multi` -- serializes a given graph into 3 file formats: `.ttl`, `.rdf` and `.nt`. As parameters, it takes the graph `g`, the path `base_path` to save the files into, and a `use_gzip` flag that determines whether to additionally compress the files into gzip format.
The results of this step are the `opencs.ttl.gz`, `opencs.rdf.gz` and `opencs.nt.gz` files located in the `package` folder. These files are later sent to another repository.
Inferring additional assertions
This step involves invoking the infer_assertions.sh script, which is located in the ci-worker repository. The script's output is then translated from the `.ofn` format into the `.ttl` format using the ofn_to_ttl script (also located in the ci-worker repository). Finally, the resulting `inferred_assertions.ttl` file is gzipped and moved into the `package` directory.
The `infer_assertions.sh` script utilizes the robot tool to perform various actions, such as changing the schema import to a SKOS import or removing unnecessary axioms.
Automatic releases action
This step invokes the ready-made softprops/action-gh-release action (version 1), which automates release preparation. In the case of this workflow, it is used to add all the files from the `/package` folder to the new pre-release, as well as to set the tag to `dev` and the name of the pre-release to `Development build`.
Preparing files to be committed
This is a simple step in which a new directory named `/output_files` is created, into which the files `opencs.ttl.gz`, `opencs.rdf.gz` and `opencs.nt.gz` are unpacked.
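This unpacking step can be sketched with the Python standard library as follows. The `unpack_packages` helper name and the exact paths are hypothetical; the real workflow performs the equivalent with shell commands.

```python
import gzip
import shutil
from pathlib import Path


def unpack_packages(package_dir: str, output_dir: str) -> list[str]:
    """Decompress every *.gz file from package_dir into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    unpacked = []
    for gz_file in sorted(Path(package_dir).glob("*.gz")):
        # e.g. opencs.ttl.gz -> opencs.ttl
        target = out / gz_file.name.removesuffix(".gz")
        with gzip.open(gz_file, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        unpacked.append(target.name)
    return unpacked
```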
Generating and preparing index pages
This step involves invoking the generate_pages.py script, located in the ci-worker repository. Its main purpose is to generate the page for the dedicated OpenCS GitHub Pages repository. This page contains information such as the current OpenCS version, the previous version and the date the version was introduced. It takes 3 parameters: `current_release`, which is the current version tag (`dev` in the case of the pre-release), `repository_name` (OpenCS), and `output_path`, the path to which the page is to be saved. The page is saved as a Markdown `.md` file.
The script comprises 3 functions:
- `get_releases` -- creates a list of all the release names in a specified repository. It takes `repository_name` and `token` as parameters, where `token` is the special GitHub authorization token.
- `save_page` -- dumps the `yaml_dict` content into a file specified with `path`.
- `main` -- the main function invoked by the script. It creates the `page.md` file containing information on which version is the current one, which version is the previous one and the date on which the current version was introduced. It utilizes the `get_releases` function to obtain the releases list and creates the page by invoking the `save_page` function. Its parameters are `current_release`, which is the current tag name, `repository_name`, `output_path` to which the page is to be saved, and the GitHub authorization `token`.
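The page-assembly logic could look roughly like this, assuming a simple YAML-front-matter layout. The field names and the `build_page` helper are hypothetical, and the real `main` obtains the release list from the GitHub API via `get_releases` rather than taking it as an argument.

```python
import datetime
from pathlib import Path


def save_page(yaml_dict: dict, path: str) -> None:
    """Dump the page metadata dictionary into a Markdown file at path."""
    lines = ["---"] + [f"{key}: {value}" for key, value in yaml_dict.items()] + ["---", ""]
    Path(path).write_text("\n".join(lines), encoding="utf-8")


def build_page(current_release: str, releases: list[str], output_path: str) -> dict:
    """Assemble current/previous version info and save it as page.md."""
    # Naive choice of "previous": the last entry of the release list.
    previous = releases[-1] if releases else "none"
    page = {
        "current_version": current_release,
        "previous_version": previous,
        "release_date": datetime.date.today().isoformat(),
    }
    save_page(page, output_path)
    return page
```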
Pushing files to another directory
A step in which files from the `/output_files` folder are copied into the `/releases/dev/` folder in the OpenCS ontology website repository. This is done using the cpina/github-action-push-to-another-repository action.
Tagged release workflow
This is a workflow responsible for all the preparatory activities needed when a tagged release action is done. It is triggered when an explicit tag is pushed alongside a new version. Please note that this is in many ways analogous to the pre-release workflow, with only some steps being expanded upon. Individual steps of this workflow are described in their dedicated sub-sections below.
Preliminary steps
The preliminary steps in this workflow include defining the Docker container as well as the `checkout repository` step. The Docker container is defined in the OpenCS ci-worker repository and uses the Ubuntu operating system. The `checkout repository` step invokes the ready-made checkout action (version 4) to access the main OpenCS repository.
Packaging ontology
This step is completely analogous to the same step in the pre-release workflow; the only difference is that instead of the pre-set `dev` tag, the current release tag is passed to the `package.py` script.
Inferring additional assertions
This step is analogous to the one in the pre-release workflow.
Automatic releases action
In this step the ready-made softprops/action-gh-release action (version 1) is invoked. It is used to add the files from the `/package` folder to the new release, as well as to set the appropriate tag name (passed in the push triggering this workflow).
Preparing files to be committed
In this step 4 new folders are created: `/output_files`, `/tag`, `/stable` and `/versions`. The files `opencs.ttl.gz`, `opencs.rdf.gz` and `opencs.nt.gz` are then unpacked into the `/output_files` folder. Finally, all the unpacked files are copied from the `/output_files` folder into the `/tag` and `/stable` folders.
Preparing data for ontology web browser
In this step a Python script, `prepare_browser_data.py`, is executed. It takes as input the latest `/stable/opencs.ttl` file, loads the knowledge graph to add the `skos:narrower` property (derived from `skos:broader`), and outputs a `browser` directory with subdirectories `00`, `01`, etc., each containing serialized JSON-LD files for each concept (1000 per subdirectory), plus an additional subdirectory `core` that stores `index_dict.json`, a dictionary of (prefLabel, conceptId) pairs used by the web browser.
Generating and preparing index pages
This step is almost analogous to the one in the pre-release workflow. The only difference is that the resulting `page.md` file is copied as `index.md` into the `tag`, `stable` and `versions` directories, and there is an additional step of setting a new environment variable containing a path to the folder named after the current tag version.
Pushing files to another directory
A step in which files from the `/tag`, `/stable`, `/versions` and `/browser` folders are copied into their respective counterpart folders in the OpenCS ontology website repository. It begins with a git set-up step, in which the ready-made webfactory/ssh-agent action is invoked. This action is used in version 0.7.0 (it is worth noting that the newer version, 0.8.0, raises security errors). It is used to properly set up the private SSH keys on the current worker node in the workflow.
Finally, the files are copied across repositories. This is done by simply cloning the OpenCS ontology website repository using the `git clone` command. Files are copied using a simple `cp` command, and then all the changes are added to git (`git add`), committed (`git commit`) and pushed to the ontology website repository.
Validate workflow
Workflow used for invoking all the validation check functions, as well as for creating the validation report. It is triggered every time a push or a pull request is made to the main OpenCS repository. All the individual steps of this workflow are explained in their dedicated sub-sections.
Preliminary steps
The preliminary steps in this workflow are analogous to those in the other workflows. They include defining the Docker container (from the OpenCS ci-worker repository) and performing a checkout of the main OpenCS repository using the ready-made checkout action (version 4).
Validation step
This step involves invoking the validate_jena.sh script. This script is located in the ci-worker repository.
The script relies on applying the Apache Jena framework to the OpenCS ontology. Apache Jena is installed in the Docker image (defined in the OpenCS ci-worker repository) in version 4.10.0.
The first step of the script is creating the /package folder. Then all the files from the core directory (containing all the graph entries) are merged into one `core.ttl` file using Apache Jena's turtle command. Next, Jena is used to perform validation checks on all the entries -- the checks are defined in the shacl_constraints.ttl file in the main OpenCS repository. A more detailed description of the checks can be found on a separate page: SHACL validation.
Finally, a short report based on the performed checks is displayed. The report is defined in the form of a SPARQL query in the validation_short_report.rq file (also in the ci-worker repository). It displays a summary of the number of warnings, information items and errors that occurred during the check run in the form of a small table. If the run was not successful, i.e. there were errors and not just warnings or information, the whole workflow run fails.
Uploading validation report
In the final step of the workflow, the validation report is uploaded. The `validation_report.ttl` file is located in the `/package` folder. Using the actions/upload-artifact action (version 3), it is uploaded as an artifact and attached to the run summary files. To find the report in the main OpenCS repository, go to Actions -> validate (on the left) -> choose the run to be checked -> the report is located in the artifacts section and can be downloaded.
The validation report is uploaded as a zipped `validation_report.ttl` file.
An example warning message in the report looks as follows:

```turtle
sh:result [ rdf:type sh:ValidationResult;
    sh:focusNode ocs:C45220;
    sh:resultMessage "Disjoint[<http://www.w3.org/2004/02/skos/core#related>]: not disjoint: <https://w3id.org/ocs/ont/C35315> is in [https://w3id.org/ocs/ont/C41081, https://w3id.org/ocs/ont/C42913, https://w3id.org/ocs/ont/C44461, https://w3id.org/ocs/ont/C41326, https://w3id.org/ocs/ont/C36051, https://w3id.org/ocs/ont/C33053, https://w3id.org/ocs/ont/C38074, https://w3id.org/ocs/ont/C19629, https://w3id.org/ocs/ont/C35315, https://w3id.org/ocs/ont/C32749, https://w3id.org/ocs/ont/C36335, https://w3id.org/ocs/ont/C40450]";
    sh:resultPath [ sh:oneOrMorePath skos:broader ];
    sh:resultSeverity sh:Warning;
    sh:sourceConstraintComponent sh:DisjointConstraintComponent;
    sh:sourceShape _:b0;
    sh:value ocs:C35315
];
```
This particular message indicates that skos:related and skos:broader are not disjoint for certain elements of the graph.
This step runs only if the workflow has not been canceled.