CI Pipeline - OpenCS-ontology/OpenCS GitHub Wiki
Introduction
The CI pipeline in the OpenCS project comprises three GitHub Actions workflows, each defined in a separate .yml file:
- `pre-release.yml` -- workflow triggered each time a push is made to the `main` branch of the main OpenCS repository.
- `tagged-release.yml` -- workflow triggered each time a push containing a new tag is made to the main OpenCS repository (i.e. when a new tagged release is done).
- `validate.yml` -- workflow triggered on pushes and pull requests to the `main` branch of the main OpenCS repository. Its main purpose is to validate the new version of the OpenCS ontology.
Each workflow will be described in more detail in its dedicated section below.
Pre-release workflow
This workflow is responsible for all the necessary preparatory activities when a pre-release action is done. A pre-release in the context of the OpenCS repository is currently interpreted as any push to the repository without a tag; the default tag in such a situation is `dev`. Each step of this workflow is described in its dedicated sub-section.
Preliminary steps
The preliminary steps in this workflow include defining the Docker container as well as the `checkout repository` step. The Docker container is defined in the OpenCS ci-worker repository and uses the Ubuntu operating system. The `checkout repository` step invokes the ready-made checkout action (version 4) to access the main OpenCS repository.
Packaging ontology
This step involves invoking the package.py script, which is located in the ci-worker repository. It takes 3 parameters: `input dir`, `output dir` and `version tag`. The `input dir` parameter should always point to the root of the main OpenCS repository (so it should be set to `opencs`). The `output dir` in the case of this workflow is set to the `package` directory, and the `version tag` is set to `dev`.
The main purpose of this script is to parse the entire OpenCS graph (including both the ontology and the schema), which is stored in the form of `.ttl` files. The script comprises two functions that are called one after the other when it is invoked:
- `process_simple` -- processes all the RDF files located in the `in_glob` path using the `parse_all` helper function. The parsed graph is then serialized into the `out_base_path` path using the `serialize_multi` helper function.
- `process_ontology` -- processes the entire OpenCS ontology graph. It begins by retrieving and parsing the header via the `get_auto_header` helper function; the graph with only the header parsed is then serialized using the `serialize_multi` function. It then parses all the ontology files located in the `core` folder of the main OpenCS repository using the `parse_all` helper function. Finally, it parses the `authors.ttl` file and serializes the whole graph using the `serialize_multi` function. It takes the `in_dir`, `out_dir` and `version` parameters, which are passed directly from the main parameter list of the `package.py` script.

These rely on the following helper functions:
- `get_auto_header` -- creates an automatic graph header by adding the version IRI and the time of creation. It takes the current `version` (the tag passed during release) as a parameter and returns a graph with the header created.
- `parse_all` -- parses all the RDF files located in the `glob_path` directory and adds them to the passed graph `g`.
- `serialize_multi` -- serializes a given graph into 3 file formats: `.ttl`, `.rdf` and `.nt`. As parameters, it takes the graph `g`, the path `base_path` to save the files into, and a `use_gzip` flag that determines whether to additionally compress the files into gzip format.
The results of this step are the `opencs.ttl.gz`, `opencs.rdf.gz` and `opencs.nt.gz` files located in the `package` folder. These files are later sent to another repository.
Inferring additional assertions
This step involves invoking the infer_assertions.sh script, which is located in the ci-worker repository. The script's output is then translated from the `.ofn` format into the `.ttl` format using the ofn_to_ttl script (also located in the ci-worker repository). Finally, the resulting `inferred_assertions.ttl` file is gzipped and moved into the `package` directory.
The `infer_assertions.sh` script utilizes the robot tool to perform various actions, such as changing the schema import to a SKOS import or removing unnecessary axioms.
Automatic releases action
This step invokes the ready-made softprops/action-gh-release action (version 1), which automates release preparation. In the case of this workflow, it is used to add all the files from the `/package` folder to the new pre-release, as well as to set the tag to `dev` and the name of the pre-release to `Development build`.
Preparing files to be committed
This is a simple step in which a new directory named `/output_files` is created, into which the files `opencs.ttl.gz`, `opencs.rdf.gz` and `opencs.nt.gz` are unpacked.
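This unpacking step can be sketched with the Python standard library as follows. The `unpack_packages` helper name and the exact paths are hypothetical; the real workflow performs the equivalent with shell commands.

```python
import gzip
import shutil
from pathlib import Path


def unpack_packages(package_dir: str, output_dir: str) -> list[str]:
    """Decompress every *.gz file from package_dir into output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    unpacked = []
    for gz_file in sorted(Path(package_dir).glob("*.gz")):
        # e.g. opencs.ttl.gz -> opencs.ttl
        target = out / gz_file.name.removesuffix(".gz")
        with gzip.open(gz_file, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        unpacked.append(target.name)
    return unpacked
```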
Generating and preparing index pages
This step involves invoking the generate_pages.py script, located in the ci-worker repository. Its main purpose is to generate the page for the dedicated OpenCS GitHub Pages repository. This page contains information such as the current OpenCS version, the previous version and the date the version was introduced. It takes 3 parameters: `current_release`, which is the current version tag (`dev` in the case of the pre-release), `repository_name` (OpenCS), and `output_path`, the path to which the page is to be saved. The page is saved as a Markdown `.md` file.
The script comprises 3 functions:
- `get_releases` -- creates a list of all the release names in a specified repository. It takes `repository_name` and `token` as parameters, where `token` is the special GitHub authorization token.
- `save_page` -- dumps the `yaml_dict` content into a file specified with `path`.
- `main` -- the main function invoked by the script. It creates the `page.md` file containing information on which version is the current one, which version is the previous one and the date on which the current version was introduced. It utilizes the `get_releases` function to obtain the releases list and creates the page by invoking the `save_page` function. Its parameters are `current_release`, which is the current tag name, `repository_name`, `output_path` to which the page is to be saved, and the GitHub authorization `token`.
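The page-assembly logic could look roughly like this, assuming a simple YAML-front-matter layout. The field names and the `build_page` helper are hypothetical, and the real `main` obtains the release list from the GitHub API via `get_releases` rather than taking it as an argument.

```python
import datetime
from pathlib import Path


def save_page(yaml_dict: dict, path: str) -> None:
    """Dump the page metadata dictionary into a Markdown file at path."""
    lines = ["---"] + [f"{key}: {value}" for key, value in yaml_dict.items()] + ["---", ""]
    Path(path).write_text("\n".join(lines), encoding="utf-8")


def build_page(current_release: str, releases: list[str], output_path: str) -> dict:
    """Assemble current/previous version info and save it as page.md."""
    # Naive choice of "previous": the last entry of the release list.
    previous = releases[-1] if releases else "none"
    page = {
        "current_version": current_release,
        "previous_version": previous,
        "release_date": datetime.date.today().isoformat(),
    }
    save_page(page, output_path)
    return page
```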
Pushing files to another directory
A step in which files from the `/output_files` folder are copied into the `/releases/dev/` folder in the OpenCS ontology website repository. This is done using the cpina/github-action-push-to-another-repository action.
Tagged release workflow
This is a workflow responsible for all the preparatory activities needed when a tagged release action is done. It is triggered when an explicit tag is pushed alongside a new version. Please note that this is in many ways analogous to the pre-release workflow, with only some steps being expanded upon. Individual steps of this workflow are described in their dedicated sub-sections below.
Preliminary steps
The preliminary steps in this workflow include defining the Docker container as well as the `checkout repository` step. The Docker container is defined in the OpenCS ci-worker repository and uses the Ubuntu operating system. The `checkout repository` step invokes the ready-made checkout action (version 4) to access the main OpenCS repository.
Packaging ontology
This step is completely analogous to the same step in the pre-release workflow; the only difference is that instead of the pre-set `dev` tag, the current release tag is passed to the `package.py` script.
Inferring additional assertions
This step is analogous to the one in the pre-release workflow.
Automatic releases action
In this step the ready-made softprops/action-gh-release action (version 1) is invoked. It is used to add the files from the `/package` folder to the new release, as well as to set the appropriate tag name (passed in the push triggering this workflow).
Preparing files to be committed
In this step 4 new folders are created: `/output_files`, `/tag`, `/stable` and `/versions`. The files `opencs.ttl.gz`, `opencs.rdf.gz` and `opencs.nt.gz` are then unpacked into the `/output_files` folder. Finally, all the unpacked files are copied from the `/output_files` folder into the `/tag` and `/stable` folders.
Preparing data for ontology web browser
In this step a Python script, `prepare_browser_data.py`, is executed. It takes as input the latest `/stable/opencs.ttl` file, loads the knowledge graph to add the `skos:narrower` property (derived from `skos:broader`), and outputs a `browser` directory with subdirectories `00`, `01`, etc., each containing serialized JSON-LD files for each concept (1000 per subdirectory), plus an additional subdirectory `core` that stores `index_dict.json`, a dictionary of (prefLabel, conceptId) pairs used by the web browser.
Generating and preparing index pages
This step is almost analogous to the one in the pre-release workflow. The only difference is that the resulting `page.md` file is copied as `index.md` into the `tag`, `stable` and `versions` directories, and there is an additional step of setting a new environment variable containing a path to the folder named after the current tag version.
Pushing files to another directory
A step in which files from the `/tag`, `/stable`, `/versions` and `/browser` folders are copied into their respective counterpart folders in the OpenCS ontology website repository. It begins with a git set-up step, in which the ready-made webfactory/ssh-agent action is invoked. This action is used in version 0.7.0 (it is worth noting that the newer version, 0.8.0, raises security errors). It is used to properly set up the private SSH keys on the current worker node in the workflow.
Finally, the files are copied across repositories. This is done by simply cloning the OpenCS ontology website repository using the `git clone` command. Files are copied using a simple `cp` command, and then all the changes are added to git (`git add`), committed (`git commit`) and pushed to the ontology website repository.
Validate workflow
Workflow used for invoking all the validation check functions, as well as for creating the validation report. It is triggered every time a push or a pull request is made to the main OpenCS repository. All the individual steps of this workflow are explained in their dedicated sub-sections.
Preliminary steps
The preliminary steps in this workflow are analogous to those in the other workflows. They include defining the Docker container (from the OpenCS ci-worker repository) and performing a checkout of the main OpenCS repository using the ready-made checkout action (version 4).
Validation step
This step involves invoking the validate_jena.sh script. This script is located in the ci-worker repository.
The script relies on applying the Apache Jena framework to the OpenCS ontology. Apache Jena is installed in the Docker image (defined in the OpenCS ci-worker repository) in version 4.10.0.
The first step of the script is creating the /package folder. Then all the files from the core directory (containing all the graph entries) are merged into one `core.ttl` file using Apache Jena's turtle command. Next, Jena is used to perform validation checks on all the entries -- the checks are defined in the shacl_constraints.ttl file in the main OpenCS repository. A more detailed description of the checks can be found on a separate page: SHACL validation.
Finally, a short report based on the performed checks is displayed. The report is defined in the form of a SPARQL query in the validation_short_report.rq file (also in the ci-worker repository). It displays a summary of the number of warnings, information items and errors that occurred during the check run in the form of a small table. If the run was not successful, i.e. there were errors and not just warnings or information, the whole workflow run fails.
Uploading validation report
In the final step of the workflow, the validation report is uploaded. The `validation_report.ttl` file is located in the `/package` folder. Using the actions/upload-artifact action (version 3), it is uploaded as an artifact and attached to the run summary files. To find the report in the main OpenCS repository, go to Actions -> validate (on the left) -> choose the run to be checked -> the report is located in the artifacts section and can be downloaded.
The validation report is uploaded as a zipped `validation_report.ttl` file.
An example warning message in the report looks as follows:

```turtle
sh:result [ rdf:type sh:ValidationResult;
    sh:focusNode ocs:C45220;
    sh:resultMessage "Disjoint[<http://www.w3.org/2004/02/skos/core#related>]: not disjoint: <https://w3id.org/ocs/ont/C35315> is in [https://w3id.org/ocs/ont/C41081, https://w3id.org/ocs/ont/C42913, https://w3id.org/ocs/ont/C44461, https://w3id.org/ocs/ont/C41326, https://w3id.org/ocs/ont/C36051, https://w3id.org/ocs/ont/C33053, https://w3id.org/ocs/ont/C38074, https://w3id.org/ocs/ont/C19629, https://w3id.org/ocs/ont/C35315, https://w3id.org/ocs/ont/C32749, https://w3id.org/ocs/ont/C36335, https://w3id.org/ocs/ont/C40450]";
    sh:resultPath [ sh:oneOrMorePath skos:broader ];
    sh:resultSeverity sh:Warning;
    sh:sourceConstraintComponent sh:DisjointConstraintComponent;
    sh:sourceShape _:b0;
    sh:value ocs:C35315
];
```
This particular message indicates that skos:related and skos:broader are not disjoint for certain elements of the graph.
This step runs only if the workflow has not been canceled.