Import Genomics DB data using Ansible - GenomicsDB/GenomicsSampleAPIs GitHub Wiki

The genomicsdb-webserver playbook repo provides a quick way to import Genomics DB data (Tile DB and Meta DB) and set up a working Genomics DB webserver on a target node. The data must first be exported from another instance and made available to the playbook.

The genomicsdb-webserver playbook uses the genomicsdb playbook, so all the infrastructure setup and variables from that playbook apply. The details of the genomicsdb playbook can be found here

The genomicsdb-webserver Ansible playbook performs the following steps:

  1. Copy the data set from target to the host node
  2. Load the data into Tile DB
  3. Import the data into Meta DB
  4. Update Meta DB with the workspace and array name that the playbook uses
  5. Set up the nginx configuration
  6. Set up the GA4GH configuration
  7. Set up the GA4GH service
  8. Start both the nginx and GA4GH services
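
As a rough illustration of how a role like this is typically applied, here is a minimal playbook sketch. The file name `site.yml`, the inventory group `genomicsdb_webservers`, and the use of `become` are assumptions made for this example and are not taken from the repo, which may already ship its own playbook file. Only the role name `genomicsdb-webserver` comes from this page.

```yaml
# site.yml (hypothetical file name, for illustration only)
- hosts: genomicsdb_webservers   # hypothetical inventory group for the target node
  become: true                   # assumption: service and nginx setup need elevated privileges
  roles:
    - genomicsdb-webserver       # role named on this page; runs the steps listed above
```

Whether the genomicsdb role has to be listed separately, or is pulled in automatically, depends on how the repo declares its dependencies and is not shown here.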

In addition to the variables from the genomicsdb playbook, the following variables can be overridden if necessary. The defaults can be found in defaults/main.yml under the genomicsdb-webserver role.

| Name | Description |
| --- | --- |
| array_name | Name of the Tile DB array |
| system_services_path | Path where system services are stored |
| system_services_extension | File extension for the services file |
| webserver_port | Externally available port where the webserver can be accessed |
| import_path | Path where the tiledb_csv and metadb_csv files are available |
| metadb_file | Name of the db.gz file that has the Meta DB data (Export using `pg_dump --no-owner --data-only -d db_name |
| callset_mapping_file | Name of the callset mapping file for GenomicsDB import; contains the callset information mapping for GenomicsDB. NOTE: callset_mapping_file also contains the paths to the files that will be used during the import process. These paths must be accessible from the node for the owner_user. |
| vid_mapping_file | Name of the vid mapping file for GenomicsDB import; contains fields and reference set information for GenomicsDB |
| size_per_column_partition | Buffer size that GenomicsDB will allocate while reading a sample/CallSet. See the GenomicsDB wiki for more info. |
| delete_and_create_tiledb_array | If set to true, the GenomicsDB loading process will delete existing data in the array |
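
To make the override mechanism concrete, here is a sketch of a variables file that sets some of the variables listed above. The file name `overrides.yml` and every value in it (paths, port, file names, buffer size) are hypothetical placeholders chosen for illustration; the real defaults live in defaults/main.yml and may differ.

```yaml
# overrides.yml (hypothetical file name; all values below are illustrative placeholders)
array_name: example_array                    # name of the Tile DB array
webserver_port: 8080                         # externally reachable webserver port
import_path: /data/genomicsdb_import         # where the exported data files are placed
metadb_file: metadb.db.gz                    # exported Meta DB dump
callset_mapping_file: callset_mapping.json   # file paths listed inside must be readable by owner_user
vid_mapping_file: vid_mapping.json           # fields and reference set information
size_per_column_partition: 10000             # read buffer size; see the GenomicsDB wiki for sizing
delete_and_create_tiledb_array: false        # true deletes existing data in the array before loading
```

A file like this can be passed to Ansible with the extra-vars option, for example `ansible-playbook -i <inventory> <playbook>.yml -e @overrides.yml`, where the inventory and playbook names are placeholders. Extra vars take precedence over role defaults, so only the variables that need to change have to be listed.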