Infrastructure Layer: Utility and generic functionality - i-on-project/integration GitHub Wiki

The infrastructure layer provides generic technical capabilities that could, in theory, be reused in any other project. This layer includes functionality such as downloading a remote file, interacting with a git server, connecting to a database, or emitting notifications.

Git Handler

Submitting data to a Git repository is one of the main new features of Integration ’21, following the decision to publish all output to a common repository on GitHub. We use the JGit library to interact with Git. JGit exposes two API levels: plumbing APIs operate on low-level objects, while porcelain APIs offer higher-level, more user-friendly operations. To encapsulate dependencies and implementation details, we created two interfaces:

  • IGitHandlerFactory is a functional interface whose single method, checkout, returns an IGitHandler object. The checkout method expects the authentication information needed to connect to a remote repository, as well as the path to a local directory in which to place the retrieved data.
  • The IGitHandler interface abstracts interactions with a single Git repository and exposes a subset of Git commands such as add, commit, and push. It also defines an update method that brings the local repository up to date by running git fetch and git pull, then checks whether the target branch exists on the remote server and, if it does not, creates and publishes it.
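
The relationship between the two interfaces can be sketched in Java as follows. This is only an illustration: the parameter lists, the branch argument, and the RecordingGitHandler and GitSubmission helpers are assumptions made for the example, not the project's actual signatures, and a real implementation would delegate to JGit rather than record calls.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shapes: parameter lists and types are illustrative assumptions.
interface IGitHandler {
    void update();               // fetch + pull; create and publish the branch if missing
    void add(String pattern);
    void commit(String message);
    void push();
}

@FunctionalInterface
interface IGitHandlerFactory {
    // Authentication data plus the local directory that receives the checkout.
    IGitHandler checkout(String remoteUrl, String user, String token, String branch, Path localDir);
}

// In-memory stand-in that only records calls; a real handler would wrap JGit.
final class RecordingGitHandler implements IGitHandler {
    final List<String> calls = new ArrayList<>();

    @Override public void update() { calls.add("update"); }
    @Override public void add(String pattern) { calls.add("add " + pattern); }
    @Override public void commit(String message) { calls.add("commit " + message); }
    @Override public void push() { calls.add("push"); }
}

final class GitSubmission {
    // A plausible submission flow: synchronise first, then stage, commit and publish.
    static void submit(IGitHandler git, String message) {
        git.update();
        git.add(".");
        git.commit(message);
        git.push();
    }
}
```

Because callers depend only on the two interfaces, the JGit-backed implementation can be replaced by a stand-in like the one above in tests.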

File Hash

File parsers in the Domain Layer use file hashes to skip files that have already been parsed. The IFileDigest functional interface provides this service: it takes a File argument and returns a ByteArray containing the file’s hash value. Its implementation, FileDigestImpl, computes the hash using a MessageDigest instance configured with the SHA-256 algorithm. The IHashRepository interface lets clients look up previously calculated hashes and insert or update hash values in the database.
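
As a minimal sketch of this technique, the Java class below streams a file through a SHA-256 MessageDigest and compares the result with a stored hash. The class and method names (Sha256FileDigest, digest, unchanged, toHex) are made up for the example and are not taken from FileDigestImpl.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha256FileDigest {
    // Streams the file through a SHA-256 MessageDigest and returns the 32-byte hash.
    static byte[] digest(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return md.digest();
    }

    // A parser can skip a file when its fresh hash equals the stored one.
    static boolean unchanged(byte[] storedHash, byte[] currentHash) {
        return storedHash != null && MessageDigest.isEqual(storedHash, currentHash);
    }

    // Lower-case hexadecimal rendering, convenient for logging or storage.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Reading the file in fixed-size chunks keeps memory use constant regardless of file size.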

Integration Job Repository

Spring Batch maintains its own database schema to persist and retrieve the data it needs for regular operation. To avoid creating additional database tables, we opted to create a database view that queries Spring Batch’s default schema and provides a unified view of job metadata.

A plain JOIN with the batch_job_execution_params table would return multiple rows per job execution, because that table holds one row per job parameter and therefore has a many-to-one relationship with the batch_job_execution table. To avoid these repeated tuples, we use the crosstab function to pivot job parameters into columns, so that the query returns only one row per job execution.

-- One row per job execution, with selected job parameters pivoted into columns.
CREATE OR REPLACE VIEW public.vw_job_detail
AS SELECT bje.job_instance_id AS id,
    bji.job_name AS name,
    timezone('utc'::text, bje.create_time) AS creation_date,
    timezone('utc'::text, bje.start_time) AS start_date,
    timezone('utc'::text, bje.end_time) AS end_time,
        -- Executions still STARTED/STARTING more than one hour after creation are reported as FAILED.
        CASE
            WHEN (bje.status::text = ANY (ARRAY['STARTED'::CHARACTER VARYING, 'STARTING'::CHARACTER VARYING]::text[]))
                AND timezone('utc'::text, bje.create_time) < (timezone('utc'::text, CURRENT_TIMESTAMP) - '01:00:00'::INTERVAL)
            THEN 'FAILED'::CHARACTER VARYING
            ELSE bje.status
        END AS STATUS,
    ct.format AS output_format,
    ct.institution,
    ct.programme,
    ct.uri AS resource_uri
   FROM batch_job_execution bje
     JOIN batch_job_instance bji ON bji.job_instance_id = bje.job_instance_id
     -- crosstab (tablefunc extension) turns the listed key_name values into columns.
     JOIN crosstab(
        'SELECT job_execution_id, key_name, string_val
        FROM batch_job_execution_params
        ORDER BY 1'::text,
        'SELECT unnest(''{format,institution,programme,srcRemoteLocation}''::text[])'::text
     ) ct(
        job_execution_id BIGINT,
        format CHARACTER VARYING(100),
        institution CHARACTER VARYING(250),
        programme CHARACTER VARYING(250),
        uri CHARACTER VARYING(250)
     ) ON ct.job_execution_id = bje.job_execution_id;
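
To make the pivot step concrete, the Java sketch below performs a roughly equivalent transformation in memory: it takes one row per job parameter and produces one entry per job execution, with the four listed keys as columns. It is an illustration of the idea only, not a reimplementation of crosstab; the ParamPivot and ParamRow names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParamPivot {
    // One input row per job parameter, as in batch_job_execution_params.
    static final class ParamRow {
        final long executionId;
        final String key;
        final String value;

        ParamRow(long executionId, String key, String value) {
            this.executionId = executionId;
            this.key = key;
            this.value = value;
        }
    }

    // Keys pivoted into columns, in the order the view lists them.
    static final List<String> COLUMNS =
            Arrays.asList("format", "institution", "programme", "srcRemoteLocation");

    // Collapses the parameter rows into a single column map per job execution.
    static Map<Long, Map<String, String>> pivot(List<ParamRow> rows) {
        Map<Long, Map<String, String>> result = new LinkedHashMap<>();
        for (ParamRow row : rows) {
            Map<String, String> columns =
                    result.computeIfAbsent(row.executionId, id -> emptyRow());
            if (COLUMNS.contains(row.key)) {   // keys outside the list are ignored
                columns.put(row.key, row.value);
            }
        }
        return result;
    }

    // Every execution gets all columns; parameters it lacks stay null.
    private static Map<String, String> emptyRow() {
        Map<String, String> row = new LinkedHashMap<>();
        for (String column : COLUMNS) {
            row.put(column, null);
        }
        return row;
    }
}
```

Joining batch_job_execution against this pivoted shape yields one result row per execution, which is the effect the view relies on.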

The IJobRepository interface allows clients to retrieve all running jobs and to look up a specific job by its ID. It is implemented by the IntegrationJobRepository class, which uses JDBC to query the database and map the results. IntegrationJobRepository also ensures that the database view described above exists, creating it if necessary, before it runs its first query.
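
The "create the view before the first query" guarantee can be sketched in Java as a one-time, lazily executed initialisation step. The example below is an assumption about how such a guard might look, not the IntegrationJobRepository code: createView stands in for executing the CREATE OR REPLACE VIEW statement over JDBC, and findById stands in for a query against vw_job_detail.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of lazy, one-time view creation in front of every query.
final class LazyViewJobRepository {
    private final Runnable createView;                       // e.g. runs the CREATE OR REPLACE VIEW statement
    private final Function<Long, Optional<String>> findById; // e.g. a JDBC lookup in vw_job_detail
    private boolean viewReady = false;

    LazyViewJobRepository(Runnable createView, Function<Long, Optional<String>> findById) {
        this.createView = createView;
        this.findById = findById;
    }

    Optional<String> getJobById(long id) {
        ensureView();
        return findById.apply(id);
    }

    // Synchronized so that concurrent first callers wait until the view exists.
    // If creation throws, the flag stays false and the next call retries.
    private synchronized void ensureView() {
        if (!viewReady) {
            createView.run();
            viewReady = true;
        }
    }
}
```

Deferring the step to the first query means the view is only created when the repository is actually used, at the cost of a cheap check on every call.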

Institution and Programme Repositories

Information about supported institutions and programmes is stored in a project configuration file, as described in Section 4.5. The IInstitutionRepository interface allows querying institutions by their identifier, while the IProgrammeRepository interface takes an InstitutionModel object and a programme acronym and returns the matching ProgrammeModel object. The implementations of both interfaces use the Jackson library’s YAMLFactory utility to parse the configuration file and retrieve its contents.
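
The two-step lookup (institution first, then programme within that institution) can be sketched in Java as follows. Everything beyond the interface and model names is an assumption: the method names, the model fields, and the in-memory list that replaces the parsed YAML configuration are all hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model fields; the real models are not described in this page.
final class ProgrammeModel {
    final String acronym;
    final String name;

    ProgrammeModel(String acronym, String name) {
        this.acronym = acronym;
        this.name = name;
    }
}

final class InstitutionModel {
    final String identifier;
    final List<ProgrammeModel> programmes;

    InstitutionModel(String identifier, List<ProgrammeModel> programmes) {
        this.identifier = identifier;
        this.programmes = programmes;
    }
}

interface IInstitutionRepository {
    Optional<InstitutionModel> findByIdentifier(String identifier);
}

interface IProgrammeRepository {
    Optional<ProgrammeModel> findByAcronym(InstitutionModel institution, String acronym);
}

// In-memory stand-in; the list plays the role of the parsed configuration file.
final class InMemoryConfigRepository implements IInstitutionRepository, IProgrammeRepository {
    private final List<InstitutionModel> institutions;

    InMemoryConfigRepository(List<InstitutionModel> institutions) {
        this.institutions = institutions;
    }

    @Override
    public Optional<InstitutionModel> findByIdentifier(String identifier) {
        return institutions.stream()
                .filter(i -> i.identifier.equals(identifier))
                .findFirst();
    }

    @Override
    public Optional<ProgrammeModel> findByAcronym(InstitutionModel institution, String acronym) {
        return institution.programmes.stream()
                .filter(p -> p.acronym.equals(acronym))
                .findFirst();
    }
}
```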

Database Schema

The src/main/resources directory includes the schema-postgresql.sql file, which Spring uses to set up custom database schema configurations. This file contains the SQL script that creates the table used by the File Hash repository and enables Postgres’ tablefunc extension, which provides utility functions such as crosstab.