Architecture and Design - i-on-project/integration GitHub Wiki

i-on is a long-lived, iteratively built initiative, and the Integration project is no exception. Regular updates are expected to add support for new programmes and institutions as the initiative grows, and ongoing maintenance will be needed to keep producing robust, valid data each semester as source data changes.

i-on Integration Architecture

The i-on Integration architecture has three major components: the application, the database, and the file repository.

The application contains the business logic and handles all processing tasks. The Postgres database stores the Spring Batch metadata, which records information about batch job instances, batch job executions, and step executions. Finally, the file repository, where the output files produced by job executions are stored, is implemented as a Git repository hosted on a Git server.

i-on Integration project structure

Our approach was to implement a project structure reflecting a simplified Domain-Driven Layered Architecture (as proposed by Eric Evans in his Domain-Driven Design book). This structure helps define clear application boundaries, keeps related design aspects cohesive, and makes the design easier to interpret and modify.

As shown in the figure below, the architecture is divided into four layers that provide separation of concerns by splitting the concepts integral to i-on Integration.

Each layer only interacts with layers below it in the architecture diagram.
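The dependency rule above can be checked mechanically. The following is a minimal sketch, not part of the project: the layer names are hypothetical placeholders borrowed from Evans's general terminology (the figure defines the actual ones), and the `violations` helper is invented for illustration. It reports every dependency that points from a layer to a layer above it.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layer names, ordered from top to bottom.
# The real names are the ones shown in the architecture figure.
LAYERS = ["interface", "application", "domain", "infrastructure"]


def violations(
    dependencies: Dict[str, Set[str]], layers: List[str] = LAYERS
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where a layer depends on a layer above it."""
    rank = {name: index for index, name in enumerate(layers)}
    bad = []
    for source, targets in dependencies.items():
        for target in sorted(targets):
            # A smaller rank means the target sits higher in the diagram.
            if rank[target] < rank[source]:
                bad.append((source, target))
    return bad


if __name__ == "__main__":
    deps = {
        "interface": {"application"},
        "application": {"domain", "infrastructure"},
        "domain": {"application"},  # points upward, so it is reported
        "infrastructure": set(),
    }
    print(violations(deps))
```

Under this reading of the rule, a layer may depend on any layer below it, not only the one directly beneath.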

Scheduler

i-on Integration exposes a public API for job execution and management, delegating scheduling tasks to outside actors.

This design allows other i-on projects to request job executions on demand, as they might have their own internal logic to detect or predict source data changes (or simply require data in a format that is not yet available). However, this is not sufficient to guarantee data will be updated regularly and predictably.

To keep data updated on a regular and predictable basis we have introduced the Scheduler component. The Scheduler’s main and only responsibility is to periodically call Integration’s web API to trigger job executions.
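As an illustration of this responsibility, here is a minimal sketch of a periodic trigger loop. It is not the project's Scheduler: the `/jobs` path, the JSON body, and the job names are assumptions made for the example, and the actual endpoints are defined by the Integration web API.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterable, List, Tuple


def trigger_job(base_url: str, job_name: str) -> int:
    """POST a job-execution request and return the HTTP status code.

    The '/jobs' path and the request body are hypothetical.
    """
    body = json.dumps({"job": job_name}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/jobs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def run_scheduler(
    trigger: Callable[[str], Any],
    job_names: Iterable[str],
    interval_seconds: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, Any]]:
    """Call trigger(job) for every job, `rounds` times, sleeping between rounds."""
    jobs = list(job_names)
    results = []
    for round_index in range(rounds):
        for job in jobs:
            try:
                results.append((job, trigger(job)))
            except OSError as error:
                # urllib's URLError and HTTPError are OSError subclasses;
                # record the failure and keep going with the next job.
                results.append((job, error))
        if round_index < rounds - 1:
            sleep(interval_seconds)
    return results
```

A real deployment would pass something like `lambda job: trigger_job(base_url, job)` as `trigger`. Injecting `trigger` and `sleep` keeps the loop testable without network access or real waiting.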

File Repository

Part of the decoupling between the Integration and Core projects was the deprecation of the Write API, which created the need for a shared File Repository.

This repository is designed to act as an intermediate data holder that all i-on projects can read from and write to programmatically. Several alternative services fit this purpose, such as Microsoft SharePoint or Google Drive, but we chose a Git repository hosted on GitHub because it meets our requirements: it can be used as a file repository, allows public access, and provides versioning. Additionally, GitHub supports Pull Requests, which are used as a manual quality control mechanism to ensure files are reviewed by a human before being made available to the public at large.

Output files are exported to the integration-data GitHub repository. More information about the file structure can be found in the readme section of that project.
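The export flow can be pictured with a short sketch. It is an assumption-laden illustration rather than the project's implementation: the branch name, file path, and both helper functions are hypothetical, and it shells out to the `git` command line against an existing local clone. It writes one output file, commits it on a separate branch, and pushes that branch so a Pull Request can be opened for human review.

```python
import subprocess
from pathlib import Path
from typing import List


def publish_commands(branch: str, relative_path: str, message: str) -> List[List[str]]:
    """Build the git commands that put one output file on a review branch."""
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", "--", relative_path],
        ["git", "commit", "-m", message],
        ["git", "push", "--set-upstream", "origin", branch],
    ]


def publish(repo_dir: Path, relative_path: str, content: str,
            branch: str, message: str) -> None:
    """Write the file into a local clone and run the planned commands there."""
    target = repo_dir / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    for command in publish_commands(branch, relative_path, message):
        # check=True stops at the first failing git command.
        subprocess.run(command, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Hypothetical branch and path, printed rather than executed.
    for command in publish_commands(
        "review/example", "example/output.json", "Add example output"
    ):
        print(" ".join(command))
```

Opening the Pull Request itself is left out of the sketch; it would happen on GitHub once the branch has been pushed.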