Infrastructure Layer: Utility and generic functionality - i-on-project/integration GitHub Wiki
The infrastructure layer provides generic technical capabilities that could, in theory, be reused in any other project. This layer includes functionality such as downloading a remote file, interacting with a git server, connecting to a database, or emitting notifications.
Git Handler
Submitting data to a Git repository is one of the main new features of Integration ’21, following the decision to publish all output to a common repository on GitHub. To interact with Git, we opted to use the JGit library. JGit exposes two API levels: plumbing and porcelain. The plumbing APIs interact with low-level objects, while the porcelain APIs allow more user-friendly, high-level interactions. To encapsulate dependencies and implementation details, we created two interfaces:
- IGitHandlerFactory is a functional interface whose single method, checkout, returns an IGitHandler object. The checkout method expects the authentication information needed to connect to a remote repository, as well as the path to a local directory in which to place the retrieved data.
- The IGitHandler interface is an abstraction for interactions with a single Git repository, exposing a subset of Git commands such as add, commit, and push. IGitHandler also defines an update method that updates the repository by running git fetch and git pull, checks whether the target branch exists on the remote server and, if it does not, creates and publishes it.
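A minimal Kotlin sketch of how the two interfaces might be shaped and consumed. The parameter names, the GitCredentials type, and the RecordingGitHandler class are hypothetical and not taken from the project. The fake only records the commands a JGit-backed implementation would run, which illustrates both the update sequence and how the abstraction lets callers be tested without touching JGit.

```kotlin
// Hypothetical shapes; the real interfaces may use different names and types.
data class GitCredentials(val remoteUri: String, val username: String, val token: String)

// Functional interface: its single method returns a handler for a checked-out repository.
fun interface IGitHandlerFactory {
    fun checkout(credentials: GitCredentials, localDirectory: String): IGitHandler
}

// Abstraction over a single repository, exposing a subset of Git commands.
interface IGitHandler {
    fun add(path: String)
    fun commit(message: String)
    fun push()
    fun update(branch: String)
}

// Hypothetical test double: records Git commands instead of running them through JGit.
class RecordingGitHandler(private val remoteBranches: MutableSet<String>) : IGitHandler {
    val log = mutableListOf<String>()

    override fun add(path: String) { log += "add $path" }
    override fun commit(message: String) { log += "commit -m $message" }
    override fun push() { log += "push" }

    // fetch + pull, then create and publish the target branch only if the remote lacks it.
    override fun update(branch: String) {
        log += "fetch"
        log += "pull"
        if (branch !in remoteBranches) {
            log += "checkout -b $branch"
            log += "push -u origin $branch"
            remoteBranches += branch
        }
    }
}
```

Because IGitHandlerFactory is a functional interface, a test can supply one as a lambda, e.g. `IGitHandlerFactory { _, _ -> RecordingGitHandler(mutableSetOf("main")) }`, and code that depends only on the interfaces never needs to know JGit is involved.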
File Hash
File hashes are used by file parsers in the Domain Layer to skip files that have already been parsed. This service is provided by the IFileDigest functional interface, which takes a File argument and returns a ByteArray containing the file’s hash value. Its implementation, FileDigestImpl, calculates the hash with a MessageDigest using the SHA-256 algorithm.
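A minimal sketch of the digest service, built only on the JDK's java.security.MessageDigest. The method name digest and the toHex helper are assumptions for illustration, not the project's actual names. The file is streamed in fixed-size chunks so that large files are never loaded into memory at once.

```kotlin
import java.io.File
import java.security.MessageDigest

// Hypothetical method name; the source only states that a File goes in and a ByteArray comes out.
fun interface IFileDigest {
    fun digest(file: File): ByteArray
}

class FileDigestImpl : IFileDigest {
    override fun digest(file: File): ByteArray {
        val md = MessageDigest.getInstance("SHA-256")
        file.inputStream().use { input ->
            val buffer = ByteArray(8192)
            while (true) {
                val read = input.read(buffer)
                if (read == -1) break          // end of file
                md.update(buffer, 0, read)     // feed only the bytes actually read
            }
        }
        return md.digest()                     // 32 bytes for SHA-256
    }
}

// Hypothetical helper: lowercase hex rendering, handy for logging or comparisons in tests.
fun ByteArray.toHex(): String = joinToString("") { "%02x".format(it) }
```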
The IHashRepository interface allows clients to look up previously calculated hashes, as well as to insert or update hash values in the database.
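The skip-if-unchanged check can be sketched with an in-memory stand-in for IHashRepository. The method names (fetchHash, putHash), the use of a string file identifier as the key, and the isUnchanged helper are all hypothetical; the real repository persists hashes in PostgreSQL. Note that `==` on a Kotlin ByteArray compares references, so hashes must be compared with contentEquals.

```kotlin
// Hypothetical interface shape; the real method names and key type may differ.
interface IHashRepository {
    fun fetchHash(fileId: String): ByteArray?
    fun putHash(fileId: String, hash: ByteArray)   // insert or update
}

// In-memory stand-in for the database-backed repository.
class InMemoryHashRepository : IHashRepository {
    private val hashes = mutableMapOf<String, ByteArray>()

    override fun fetchHash(fileId: String): ByteArray? = hashes[fileId]?.copyOf()

    override fun putHash(fileId: String, hash: ByteArray) {
        hashes[fileId] = hash.copyOf()   // copy so callers cannot mutate stored state
    }
}

// True when a stored hash exists and matches, meaning the parser can skip the file.
fun isUnchanged(repo: IHashRepository, fileId: String, newHash: ByteArray): Boolean =
    repo.fetchHash(fileId)?.contentEquals(newHash) ?: false
```

In this sketch a parser would call isUnchanged before parsing and call putHash only after the file has been processed successfully, so a failed run does not mark the file as done.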
Integration Job Repository
Spring Batch maintains its own database schema to persist and retrieve the data it needs for regular operation. To avoid creating additional tables, we opted to create a database view that queries Spring Batch’s default schema and presents job metadata in a single place.
A plain JOIN with the batch_job_execution_params table would produce multiple rows per job execution: that table holds one row per job parameter, so each row of batch_job_execution matches several parameter rows (a one-to-many relationship). To avoid repeated tuples, the view uses the crosstab function to pivot job parameters into columns, so the query returns only one row per job execution.
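In simplified form, the pivot that crosstab performs can be reproduced in memory: rows of (job_execution_id, key_name, string_val) are grouped by execution, and each group is projected onto a fixed, ordered list of categories, leaving a null where an execution lacks a parameter. The ParamRow type and pivot function are hypothetical and only illustrate the idea. The category list mirrors the view's category query, whose output order must match the order of the output columns declared for crosstab.

```kotlin
// One row of batch_job_execution_params, reduced to the columns the view reads.
data class ParamRow(val jobExecutionId: Long, val keyName: String, val stringVal: String?)

// Same categories, in the same order, as the view's category query.
val CATEGORIES = listOf("format", "institution", "programme", "srcRemoteLocation")

// Pivots one-row-per-parameter into one-row-per-execution.
// In this sketch, keys outside CATEGORIES get no column and missing keys become null.
fun pivot(rows: List<ParamRow>): Map<Long, List<String?>> =
    rows.groupBy { it.jobExecutionId }
        .mapValues { (_, params) ->
            val byKey = params.associate { it.keyName to it.stringVal }
            CATEGORIES.map { byKey[it] }
        }
```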
The view is defined as follows:

```sql
CREATE OR REPLACE VIEW public.vw_job_detail
AS SELECT bje.job_instance_id AS id,
    bji.job_name AS name,
    timezone('utc'::text, bje.create_time) AS creation_date,
    timezone('utc'::text, bje.start_time) AS start_date,
    timezone('utc'::text, bje.end_time) AS end_time,
    CASE
        WHEN (bje.status::text = ANY (ARRAY['STARTED'::CHARACTER VARYING, 'STARTING'::CHARACTER VARYING]::text[]))
             AND timezone('utc'::text, bje.create_time) < (timezone('utc'::text, CURRENT_TIMESTAMP) - '01:00:00'::INTERVAL)
        THEN 'FAILED'::CHARACTER VARYING
        ELSE bje.status
    END AS STATUS,
    ct.format AS output_format,
    ct.institution,
    ct.programme,
    ct.uri AS resource_uri
FROM batch_job_execution bje
JOIN batch_job_instance bji ON bji.job_instance_id = bje.job_instance_id
JOIN crosstab(
        'SELECT job_execution_id, key_name, string_val
         FROM batch_job_execution_params
         ORDER BY 1'::text,
        'SELECT unnest(''{format,institution,programme,srcRemoteLocation}''::text[])'::text
     ) ct(job_execution_id BIGINT,
          format CHARACTER VARYING(100),
          institution CHARACTER VARYING(250),
          programme CHARACTER VARYING(250),
          uri CHARACTER VARYING(250))
  ON ct.job_execution_id = bje.job_execution_id;
```
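Besides the pivot, the view's CASE expression reports any execution still marked STARTED or STARTING more than one hour after its creation time as FAILED, presumably so that executions that died without updating Spring Batch's tables do not appear to run forever. The same rule in Kotlin, as a hypothetical effectiveStatus function with the current time passed in so it can be tested deterministically:

```kotlin
import java.time.Duration
import java.time.Instant

// Statuses the view treats as "still running".
val RUNNING_STATUSES = setOf("STARTED", "STARTING")

// Mirrors the view's CASE expression: a running status older than one hour is reported as FAILED.
fun effectiveStatus(status: String, createTime: Instant, now: Instant = Instant.now()): String =
    if (status in RUNNING_STATUSES && createTime < now.minus(Duration.ofHours(1))) "FAILED" else status
```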
The IJobRepository interface allows retrieving all running jobs and querying for a specific job by its ID. It is implemented by the IntegrationJobRepository class, which uses JDBC to query the database and map the results. IntegrationJobRepository also ensures that the database view described above exists, creating it if necessary, before the first query is run.
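A condensed sketch of a JDBC-backed repository that creates the view lazily, once, before the first query. The method names, the JobDetail type, the selected columns, and passing the DDL in as a string are assumptions made for illustration; only the view name and its column names come from the definition above. Because the view already maps stale executions to FAILED, filtering on STARTING and STARTED is enough to list running jobs in this sketch.

```kotlin
import java.sql.ResultSet
import javax.sql.DataSource

// Hypothetical projection of a vw_job_detail row.
data class JobDetail(
    val id: Long,
    val name: String,
    val status: String,
    val institution: String?,
    val programme: String?,
)

interface IJobRepository {
    fun getRunningJobs(): List<JobDetail>
    fun getJob(id: Long): JobDetail?
}

class IntegrationJobRepository(
    private val dataSource: DataSource,
    private val viewDdl: String, // hypothetical: the CREATE OR REPLACE VIEW statement for vw_job_detail
) : IJobRepository {
    @Volatile private var viewReady = false

    // Double-checked flag so the DDL runs at most once per repository instance.
    private fun ensureView() {
        if (viewReady) return
        synchronized(this) {
            if (!viewReady) {
                dataSource.connection.use { c -> c.createStatement().use { it.execute(viewDdl) } }
                viewReady = true
            }
        }
    }

    private fun query(sql: String, vararg params: Any): List<JobDetail> {
        ensureView()
        return dataSource.connection.use { c ->
            c.prepareStatement(sql).use { ps ->
                params.forEachIndexed { i, p -> ps.setObject(i + 1, p) } // JDBC parameters are 1-based
                ps.executeQuery().use { rs ->
                    generateSequence { if (rs.next()) rs.toJob() else null }.toList()
                }
            }
        }
    }

    private fun ResultSet.toJob() = JobDetail(
        id = getLong("id"),
        name = getString("name"),
        status = getString("status"),
        institution = getString("institution"),
        programme = getString("programme"),
    )

    override fun getRunningJobs(): List<JobDetail> =
        query("SELECT * FROM vw_job_detail WHERE status IN ('STARTING', 'STARTED')")

    override fun getJob(id: Long): JobDetail? =
        query("SELECT * FROM vw_job_detail WHERE id = ?", id).firstOrNull()
}
```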
Institution and Programme Repositories
Information about supported Institutions and Programmes is stored in a project configuration file as described in Section 4.5.
The IInstitutionRepository interface allows querying institutions by their identifier, while the IProgrammeRepository interface takes an InstitutionModel object and a programme acronym and retrieves the matching ProgrammeModel object.
Both implementations use the Jackson library’s YAMLFactory to parse the configuration file and retrieve its contents.
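The two-step lookup can be sketched with plain data classes standing in for the parsed configuration. All field names, method names, and example identifiers below are hypothetical, and the YAML parsing step (done with Jackson in the project) is replaced by an in-memory list so the example runs without extra dependencies.

```kotlin
// Hypothetical model shapes; the real ones are read from the project configuration file.
data class ProgrammeModel(val acronym: String, val name: String)
data class InstitutionModel(val identifier: String, val name: String, val programmes: List<ProgrammeModel>)

interface IInstitutionRepository {
    fun getInstitution(identifier: String): InstitutionModel?
}

interface IProgrammeRepository {
    fun getProgramme(institution: InstitutionModel, acronym: String): ProgrammeModel?
}

// Stand-in for the Jackson-backed implementations: both lookups read the same parsed configuration.
class ConfigRepository(private val institutions: List<InstitutionModel>) :
    IInstitutionRepository, IProgrammeRepository {

    override fun getInstitution(identifier: String): InstitutionModel? =
        institutions.firstOrNull { it.identifier == identifier }

    // Resolve the institution against the configuration first, then find the programme by acronym.
    override fun getProgramme(institution: InstitutionModel, acronym: String): ProgrammeModel? =
        getInstitution(institution.identifier)?.programmes?.firstOrNull { it.acronym == acronym }
}
```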
Database Schema
The src/main/resources directory includes the schema-postgresql.sql file, which Spring uses to set up custom database schema configuration. This file contains the SQL script that creates the table used by the File Hash repository and enables PostgreSQL’s tablefunc extension, which provides utility functions such as crosstab.