Jenkins For Biocache Store And Other LA Tasks - AtlasOfLivingAustralia/documentation GitHub Wiki
Introduction
This page explains how to use Jenkins as a task manager for biocache-store jobs and other LA operations.
Running biocache-store from Jenkins has some advantages:
- You have a history of biocache-store commands, so it's easy to compare durations, errors and outputs.
- Other team members can see which operations were done or are running in your biocache-store, along with their logs, etc.
- You can enrich the biocache-store output, adding color to logs, summaries, etc.
- You can receive notifications (HTML, email, etc.) when a job fails or ends successfully, or run other jobs.
- You can automate your different biocache-store jobs.
- etc.
Installation
You can install Jenkins on Ubuntu, Debian and derivatives with:
wget -q -O - https://pkg.jenkins.io/debian/jenkins.io.key | sudo apt-key add -
sudo sh -c 'echo deb http://pkg.jenkins.io/debian-stable binary/ > /etc/apt/sources.list.d/jenkins.list'
sudo apt update
sudo apt install jenkins
Jenkins can be installed on the same server as biocache-store, or on a different one. In the latter case, you will need to call biocache-store tasks through ssh, a Jenkins agent, etc.
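As a rough sketch of the ssh option, a build step could wrap the remote call in a small helper like the one below. The host name, the remote user, the `biocache` command name and the `ingest -dr dr615` arguments are assumptions for illustration; adapt them to your installation. By default the helper only prints the command so that it can be reviewed first.

```shell
# Hypothetical values: adjust the host and the remote user.
BIOCACHE_HOST="${BIOCACHE_HOST:-biocache.example.org}"
BIOCACHE_USER="${BIOCACHE_USER:-ubuntu}"

# Print the ssh command line that would run a biocache-store task remotely.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# which suits a non-interactive Jenkins build.
remote_biocache_cmd() {
  printf 'ssh -o BatchMode=yes %s@%s biocache %s\n' \
    "$BIOCACHE_USER" "$BIOCACHE_HOST" "$*"
}

# Run the task for real only when RUN_REMOTE=1; otherwise just print it.
run_remote_biocache() {
  if [ "${RUN_REMOTE:-0}" = "1" ]; then
    ssh -o BatchMode=yes "$BIOCACHE_USER@$BIOCACHE_HOST" biocache "$@"
  else
    remote_biocache_cmd "$@"
  fi
}
```

For example, `run_remote_biocache ingest -dr dr615` prints the command, and `RUN_REMOTE=1 run_remote_biocache ingest -dr dr615` executes it, provided the jenkins user has key-based ssh access to that host.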
Permissions
If you run biocache-store jobs as the jenkins user, you will need to change the ownership of these LA directories:
/data/biocache/
/data/solr/
/data/biocache-load/
This is not needed if you use a different approach (like sudo).
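A minimal sketch of that ownership change, assuming Jenkins runs as the `jenkins` user and group (the default for the Debian package) and that a recursive change is acceptable on your server. The helper only prints the commands, so they can be reviewed before being applied.

```shell
# Assumption: Jenkins runs as the "jenkins" user and group.
JENKINS_OWNER="${JENKINS_OWNER:-jenkins:jenkins}"

# Print one chown command per LA directory listed above.
print_chown_commands() {
  for dir in /data/biocache/ /data/solr/ /data/biocache-load/; do
    printf 'sudo chown -R %s %s\n' "$JENKINS_OWNER" "$dir"
  done
}
```

Run `print_chown_commands` to inspect the three commands, and `print_chown_commands | sh` to apply them.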
Tasks
In general we add new jobs to Jenkins with the sidebar entry "New item", as a "Freestyle project".
Since many biocache-store tasks need parameters (like a data resource number), you will need to use a parameterized build.
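Jenkins exposes build parameters to an "Execute shell" step as environment variables, so the step can validate the value before passing it on. The sketch below assumes a hypothetical string parameter named `DATA_RESOURCE` and an assumed `biocache ingest -dr` command line; neither name comes from this page, so check them against your own job and installation.

```shell
# Succeeds only if the argument looks like a data resource uid ("dr" + digits).
is_valid_dr() {
  case "$1" in
    dr[0-9]*) ;;            # must start with "dr" followed by a digit
    *) return 1 ;;
  esac
  case "${1#dr}" in
    *[!0-9]*) return 1 ;;   # reject anything but digits after "dr"
  esac
  return 0
}

# Body of the "Execute shell" step.
# DATA_RESOURCE is a hypothetical build parameter name.
ingest_step() {
  if ! is_valid_dr "${DATA_RESOURCE:-}"; then
    echo "ERROR: invalid data resource: '${DATA_RESOURCE:-}'" >&2
    return 1
  fi
  # Assumed command line; set BIOCACHE_CMD=echo for a dry run.
  ${BIOCACHE_CMD:-biocache} ingest -dr "$DATA_RESOURCE"
}
```

Validating the parameter first keeps a typo, or an unexpected value typed into the build form, from reaching the command line.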
Some ALA Jenkins ingestion tasks
Quoting @djtfmartin on the ALA task workflow:
ALA doesn't use the ingest for large datasets (...) The way ALA uses it is we load datasets during the week, but we have jenkins jobs that twice a week that run processing, sampling and index everything
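A scheduled job of that kind could chain its steps with a small runner like the sketch below, which stops at the first failure so that Jenkins marks the build as failed. The script names in the usage comment are hypothetical placeholders for whatever processing, sampling and indexing commands your installation uses, and the cron line is only one possible "twice a week" schedule for the "Build periodically" trigger.

```shell
# Possible "Build periodically" schedule (Mondays and Thursdays, around 02:00):
#   H 2 * * 1,4

# Run each argument as one command line, in order.
# Stop at the first failure and report how long each step took.
run_steps() {
  for step in "$@"; do
    echo "INFO: starting: $step"
    start=$(date +%s)
    if ! sh -c "$step"; then
      echo "ERROR: failed: $step" >&2
      return 1
    fi
    echo "INFO: finished: $step ($(( $(date +%s) - start ))s)"
  done
}

# Hypothetical usage:
#   run_steps ./process-all.sh ./sample-all.sh ./index-all.sh
```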
Recommended plugins
Some Jenkins plugins we recommend to improve its use with biocache-store and other similar jobs:
Plugin | Description | Comments |
---|---|---|
AnsiColor | Adds ANSI coloring to the Console Output | Useful for enriching the biocache job output with colors using grc, so it's easier to detect ERROR, WARN, etc. log messages |
Build Name and Description Setter | This plug-in sets the display name and description of a build to something other than #1, #2, #3 | So instead of #number we can name a build something like "ingest dr615" or similar |
HTML5 Notifier Plugin | The HTML5 Notifier Plugin provides W3C Web Notifications support for builds. | You can receive jobs notifications in your browser thanks to this plugin |
Log Parser Plugin | Parses the console log generated by a build | This generates a summary of the ERROR, WARN and INFO messages of a job |
Mailer Plugin | This plugin allows you to configure email notifications for build results | For email notifications to your team |
Email Extension Plugin | This plugin is a replacement for Jenkins's email publisher. It allows to configure every aspect of email notifications: when an email is sent, who should receive it and what the email says | Or this more advanced email extension |
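To get a feel for the kind of per-level summary the Log Parser Plugin produces, the same counts can be approximated outside Jenkins with a few lines of shell. This is only a plain `grep` sketch, not the plugin's own mechanism, and it assumes the log marks levels with the words ERROR, WARN and INFO.

```shell
# Print how many lines of a log file contain each level keyword.
# Note that "WARN" also matches lines containing "WARNING".
summarize_log() {
  for level in ERROR WARN INFO; do
    # grep -c prints the number of matching lines (0 when there are none).
    printf '%s: %s\n' "$level" "$(grep -c "$level" "$1")"
  done
}
```

Calling `summarize_log` with the path to a saved console log prints one line per level.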
Adding tests to our jobs
You can add additional tests to your jobs, for instance to your ingest jobs, to improve your data processing tasks with extra checks like:
- Verification that your data is correctly mapped in the collectory.
- A basic LA record consumers test.
- Verification of LA data mappings in Solr.
- Verification of mappings in LA biocache-store after load.
These are just some unofficial scripts that we use instead of doing manual checks. Please feel free to improve them, or add more and document them here.
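The scripts themselves are not reproduced here. As a generic illustration of how such a check plugs into a job, the sketch below relies on the fact that an "Execute shell" step fails the build when it exits with a non-zero status. The function name and the idea of comparing a record count against a minimum are assumptions for the example, not one of the scripts mentioned above.

```shell
# Fail when the number of records found is below the expected minimum.
# Usage: check_min_records EXPECTED_MIN ACTUAL
check_min_records() {
  expected_min=$1
  actual=$2
  case "$actual" in
    ''|*[!0-9]*)
      echo "ERROR: record count is not a number: '$actual'" >&2
      return 2 ;;
  esac
  if [ "$actual" -lt "$expected_min" ]; then
    echo "ERROR: only $actual records, expected at least $expected_min" >&2
    return 1
  fi
  echo "INFO: $actual records (minimum $expected_min)"
}
```

The actual count would come from wherever your check reads it, for example a query against your index after the load.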
Screenshots
Sample of jobs in the gbif.es Jenkins:
Ingestion logs with Build Name Setter plugin:
Logs with colors:
Logs summary per job: