Infrastructure Requirements - AtlasOfLivingAustralia/documentation GitHub Wiki

Infrastructure Requirements for running a Living Atlas

The infrastructure required to run a Living Atlas depends on the following factors:

The number of components you wish to run beyond the core set
The number of occurrence records you need to index
The number of spatial layers you wish to incorporate

We recommend using cloud infrastructure for Living Atlas installations. This could be a commercial provider (e.g. Amazon EC2, Google Compute Engine, Microsoft Azure) or a cloud infrastructure within your country operated by an institution (e.g. an OpenStack-based installation).

Basic Installation

A basic installation of the core components, supporting up to 20 million records, could be a single Ubuntu 18.04 server with 4-8 CPUs, 32GB RAM and SSD storage. Ideally, however, Cassandra and SOLR should run on separate virtual machines, because both components require a substantial amount of resources. Running Cassandra and SOLR separately lets you run data maintenance tasks (loading, processing, indexing) without impacting the performance of your web portal tools.

Recommended Installation for larger installations

For installations that need to index large amounts of data (over 50 million records and/or a large number of spatial layers), we recommend a clustered installation. This clustered setup is in use in Australia (75 million records and 500+ spatial layers) and the UK (219 million records and 50+ spatial layers).
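The sizing guidance above can be expressed as a simple decision rule. The Python sketch below is a hypothetical helper and is not part of any Living Atlas tooling. The function name `suggest_deployment`, the `many_spatial_layers` flag and the returned labels are illustrative assumptions. Only the record-count thresholds come from this page. This page gives no recommendation for installations between 20 and 50 million records, so the sketch reports that case as unspecified rather than guessing. It also does not define what counts as "a large number" of spatial layers, so the caller has to make that judgement.

```python
from dataclasses import dataclass

# Thresholds quoted on this page (hypothetical helper, not ALA tooling).
BASIC_MAX_RECORDS = 20_000_000    # basic installation: "up to 20 million records"
CLUSTER_MIN_RECORDS = 50_000_000  # clustered installation: "over 50 million records"


@dataclass
class Deployment:
    layout: str  # "basic", "clustered" or "unspecified"
    notes: str


def suggest_deployment(records: int, many_spatial_layers: bool = False) -> Deployment:
    """Map a record count (and a caller-judged spatial-layer flag) to a layout."""
    if records < 0:
        raise ValueError("records must be non-negative")
    # Clustering is recommended for >50M records and/or many spatial layers.
    if records > CLUSTER_MIN_RECORDS or many_spatial_layers:
        return Deployment(
            "clustered",
            "Cluster SOLR, Cassandra and the biocache command-line tools.",
        )
    # A single server is described as sufficient for up to 20M records.
    if records <= BASIC_MAX_RECORDS:
        return Deployment(
            "basic",
            "Single server (4-8 CPUs, 32GB RAM, SSD); ideally run Cassandra "
            "and SOLR on separate virtual machines.",
        )
    # 20M-50M records: this page does not give a recommendation.
    return Deployment(
        "unspecified",
        "Between 20 and 50 million records; assess resources case by case.",
    )


if __name__ == "__main__":
    print(suggest_deployment(75_000_000).layout)
```

Keeping the unspecified band explicit avoids implying a recommendation that the guidance does not make.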

Clustering affects the installation of Apache SOLR, Apache Cassandra and the biocache command-line tools.

See the Cassandra requirements and SOLR requirements pages.

Core components for a Living Atlas

The core set of components that a Living Atlas will require as a starting point is the following:

Data registry (component name: collectory, example: UK registry)
Occurrence search UI (component name: biocache-hub, example: ALA occurrence search)
Occurrence web services (component name: biocache-service)
Occurrence data loading tools (component name: biocache-store aka biocache-cli)
Image service (component name: image-service, example: ALA images)
Apache SOLR
Apache Cassandra
MySQL (for the collectory) and PostgreSQL (for the image-service)
Apache or nginx as proxies

These components will give a Living Atlas installation the following capabilities:

metadata editing for collections, institutions and data publishers
loading, processing and indexing of Darwin Core archives
occurrence searching
image storage and serving
basic mapping capabilities

These components are installed as part of the ALA demo installation scripts. This is the recommended starting point for projects in the initial phase of evaluating the Living Atlas components.

Additional components - advanced installations

In addition to the core components, the following components can be set up to enhance an installation further:

Authentication (CAS based)
Species lists (component name: specieslist, example: NBN UK lists portal)
Species pages & services
Spatial services
Spatial portal - advanced spatial tools and species distribution modelling (component name: spatial-hub, example: ALA spatial portal)
Alerts
Logger
Regions
Dashboard
Sandbox