Data Hub - AtlasOfLivingAustralia/documentation GitHub Wiki


Introduction

A Data Hub is a front end to a Living Atlas (LA) portal whose main goal is to show a subset of the portal's data.

The subset of data can be defined by region, taxonomy, basis of record (for example, specimens only), time period, or any other query.

The display of such a subset is a Hub. Hubs can be regional, taxonomic or thematic, depending on how the data is split.


One LA portal can have many Hubs, each focused on different data and environments: one can cover the data within a specific region or an institution (such as a herbarium), another can show the data about a species in the 20s, and so on.

Hubs can have different active modules, each independent of the others (at least on the front end).


Hub Examples

Notable examples of hubs:


Biocache, Regions and BIE

Biocache (records), Regions and BIE (species pages) can each be run with a query context; they are configured in external configuration files:

  • Biocache: /data/ala-hub/config/ala-hub-config.properties
  • BIE: /data/ala-bie/config/ala-bie-config.properties
  • Regions: /data/regions/config/regions-config.properties

Demo queries in the biocache can be made with dq=.

BIE loads Darwin Core Archives (DwCA) and is extensible: it supports ad hoc fields and the "Distribution" extension. Example of a BIE query: fq=distribution:Scotland
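As a rough illustration of the fq= filter mentioned above, the following Python sketch builds a search URL that carries that filter. The /search path and the use of q=* are assumptions made for illustration and are not taken from this page; the host name is the demo BIE hub used later in this page. Adjust both to your own deployment.

```python
from urllib.parse import urlencode

# Demo BIE hub host from this page; the search path is an assumption.
BIE_BASE_URL = "https://biehubdemo.l-a.site"
SEARCH_PATH = "/search"  # hypothetical path, check your BIE deployment


def bie_search_url(query, filters):
    """Build a search URL with one fq= parameter per filter."""
    params = [("q", query)] + [("fq", f) for f in filters]
    return f"{BIE_BASE_URL}{SEARCH_PATH}?{urlencode(params)}"


url = bie_search_url("*", ["distribution:Scotland"])
print(url)
```

Note that urlencode percent-encodes the colon, so the filter travels as fq=distribution%3AScotland.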


Basic steps to setup a Hub

Styling your Hub

You can create different branding for each of your hubs, so that it differs from your main LA branding. You will probably also use a different domain or subdomain, for example one from the institution that the hub belongs to (a local community, a herbarium, etc.).

For details about styling see Styling the web app.


Create and configure the hub in the Collectory

In the Collectory admin interface, look for the View all data hubs option; there you can add a new DataHub. Edit the name and any other information you require.

In the Members section, add the identifiers of the Institutions, Collections and Data Resources that belong to the Hub. This is the information that will actually be shown in the web app.


Configure the web app to show the records of the new hub

Find the UID of the hub you created in the Collectory admin interface. In the web app config file (grails-app/conf/config.groovy), add the following property with the appropriate UID.

biocache.queryContext = "data_hub_uid:dh1"

Or via ala-install inventory:

enable_query_context = true
biocache_query_context = data_hub_uid:dh1
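To make the effect of the query context more concrete, here is a conceptual Python sketch. It assumes that the context behaves like an extra filter combined with every user query; this is only a mental model, not the actual implementation in the hub or in biocache-service. The function name restrict_to_hub and the example query are hypothetical.

```python
# Value configured as the query context in the settings above.
QUERY_CONTEXT = "data_hub_uid:dh1"


def restrict_to_hub(user_query, query_context=QUERY_CONTEXT):
    """Return a query that matches user_query only within the hub's records.

    Conceptual model only: the context is ANDed with the user query.
    """
    if not user_query or user_query.strip() in ("", "*:*"):
        # An empty or match-all query reduces to the context itself.
        return query_context
    return f"({user_query}) AND {query_context}"


print(restrict_to_hub("genus:Acacia"))
print(restrict_to_hub("*:*"))
```

Under this model, a visitor to the hub never sees records outside dh1, whatever they search for.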

Hub detailed configuration and deployment

You can deploy your hub using your main LA inventories, plus some specific hub configuration. For instance:

[biocache-hub]

hubdemo.l-a.site

[bie-hub]

biehubdemo.l-a.site

[regions]

regionshubdemo.l-a.site

[all:vars]

# used in biocache and bie hubs
biocache_query_context = data_hub_uid:dh1

# used in regions
enable_query_context = true
query_context = data_hub_uid:dh1
hub_filter = data_hub_uid:dh1
enable_hub_data = true

orgNameLong = Herbarium of ACME
orgNameShort = Herb-ACME

header_and_footer_baseurl = https://skins.l-a.site/acme-hub-demo
header_and_footer_version = 2

orgCity=Some city
orgStateProvince=Some state
orgPostcode=
orgCountry=Some country
orgPhone=
orgFax=
org_url = https://example.org

skin_favicon_baseurl = https://example.org/favicon.ico
favicon_url = https://example.org/favicon.ico
skin_favicon = https://example.org/favicon.ico

# TODO put here other page
explore_url = https://www.ala.org.au/explore-by-location/
regions_explore_url = https://www.ala.org.au/explore-by-location/,Explore

biocache_hub_hostname = hubdemo.l-a.site
biocache_hub_url = https://hubdemo.l-a.site
biocache_hub_base_url = https://hubdemo.l-a.site
biocache_hub_context_path =
biocache_base_url = https://hubdemo.l-a.site
biocache_records_url = https://hubdemo.l-a.site

bie_hub_base_url = https://biehubdemo.l-a.site
bie_hub_hostname = biehubdemo.l-a.site
bie_hub_context_path =
bie_base_url = https://biehubdemo.l-a.site

regions_base_url = https://regionshubdemo.l-a.site
regions_hostname = regionshubdemo.l-a.site

[biocache-hub:vars]

skin_home_url = https://hubdemo.l-a.site
ala_base_url = https://hubdemo.l-a.site
ssl_certificate_server_dir=/etc/letsencrypt/live/hubdemo.l-a.site/

[bie-hub:vars]

skin_home_url = https://biehubdemo.l-a.site
ala_base_url = https://biehubdemo.l-a.site
ssl_certificate_server_dir=/etc/letsencrypt/live/biehubdemo.l-a.site/

[regions:vars]

skin_home_url = https://regionshubdemo.l-a.site
ala_base_url = https://regionshubdemo.l-a.site
ssl_certificate_server_dir=/etc/letsencrypt/live/regionshubdemo.l-a.site/


And deploy using the rest of your inventories:

ansible-playbook -u ubuntu -i l-a.site-inventory.yml -i l-a.site-local-extras.yml -i l-a.site-local-passwords.yml -i hubdemo.yml ../../ala-install/ansible/biocache-hub-standalone.yml --limit hubdemo.l-a.site

Process and Index your data again

Now we need to "mark" which records belong to the data hub. To do so, ingest again the data resources that are part of the hub. After processing, the Cassandra field dataHubUid_p should contain your hub id (dh1). Similarly, after indexing, the Solr field data_hub_uid should contain dh1.
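One way to check the result of indexing is to count the records that carry the hub id. The Python sketch below only builds such a request and reads the count from a response body; it makes no network call. The base URL, the /occurrences/search path, the pageSize parameter and the totalRecords field are assumptions about your biocache-service and should be verified against your deployment.

```python
import json
from urllib.parse import urlencode

# Assumed biocache-service base URL; replace with your own.
BIOCACHE_WS = "https://biocache-ws.l-a.site/ws"


def hub_count_url(hub_uid, base_url=BIOCACHE_WS):
    """Build a search URL that asks only for the number of records in a hub."""
    query = urlencode({"q": f"data_hub_uid:{hub_uid}", "pageSize": 0})
    return f"{base_url}/occurrences/search?{query}"


def total_records(response_text):
    """Read the record count from a JSON response body (assumed field name)."""
    return int(json.loads(response_text).get("totalRecords", 0))


print(hub_count_url("dh1"))
# Illustrative response body only, not real output from a server.
print(total_records('{"totalRecords": 0}'))
```

A count of zero after reindexing would suggest that the data resources were not yet reprocessed or are not members of the hub in the Collectory.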


Configure the facet fields

The config variables starting with facets. are responsible for this:

  • facets.include - comma-separated list of fields to include (usually only those fields not in the default set as specified by ${biocache.baseUrl}/search/facets)
  • facets.exclude - comma-separated list of fields to exclude, i.e. fields in the default set that you don't want to appear
  • facets.hide - comma-separated list of fields that are included in the facet column but that you want hidden by default (i.e. un-ticked in the "customise filters" drop-down menu). These fields will be displayed if the user changes the default display settings and chooses to turn them on.

Note that you can also change the default set of facets, but this is set in the biocache-service application, probably also via config variables. You may also want to change the way facets are grouped together (see ${biocache.baseUrl}/search/grouped/facets).


CORS configuration

As the hub will request data from your main LA deployment, you'll need to allow your new hub domain to access your main LA biocache.


Allow Data Hub IP address

You have to add the IP address of the hub to the list of authorised systems in userdetails, at https://auth.your-l-a.site/userdetails/admin/authorisedSystem/list


Location of configurations

The above sample inventories generate the configurations in /data like:

$ ls  -1 /data/
ala-bie
ala-hub
regions

the same location as your main biocache/bie/regions.

But if you configure some Ansible variables (as, for instance, the inventories generated by the la-toolkit do), you can set up a different location:

biocache_hub = sample-hub
biocache_hub_artifact = ala-hub
bie_hub = sample-bie-hub
bie_hub_artifact = ala-bie
regions = sample-regions
regions_artifact = regions

it will produce:

$ ls -1 /data/
sample-hub
sample-bie-hub
sample-regions

And configurations like /data/sample-hub/config/sample-hub-config.properties.
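The paths shown on this page follow the pattern /data/NAME/config/NAME-config.properties. A small Python helper can derive the expected location from the name you set; the function name hub_config_path is hypothetical, and the pattern is inferred from the examples on this page rather than from the ala-install source.

```python
from pathlib import PurePosixPath


def hub_config_path(name, data_dir="/data"):
    """Return the expected config file path for an app deployed under `name`."""
    return str(PurePosixPath(data_dir) / name / "config" / f"{name}-config.properties")


# Default location and a custom one, as in the examples on this page.
print(hub_config_path("ala-hub"))
print(hub_config_path("sample-hub"))
```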

But in this case you will need a custom Grails hub app (and artifact) that reads the configuration from these locations. See, for instance, the OZCAM hub. The generic-hub is a starting project for this kind of Grails biocache hub.

See how the ala-hub reads its config from the standard location, /data/ala-hub/...

Another option is to deploy the standard ala-hub WAR and add the following to the generated inventories:

biocache_hub = ala-hub
biocache_hub_artifact = ala-hub

This way you do not need to customize the WAR.

Customizations

See this page for information about hub customizations.


More information

For more technical information, see section 2.6 Data Hub of the ALA Key Technical Documentation (English).