Ansible Tips for Variant Store Playbook - GenomicsDB/GenomicsSampleAPIs GitHub Wiki

Useful tips before you begin

  1. If your environment uses a proxy to access the internet, the playbook assumes that you have set up the http_proxy and https_proxy environment variables correctly before running it.

  2. "Passing Variables On The Command Line" in the Ansible documentation is a useful read on how to pass variables when running ansible-playbook.

  3. If you are not running as root, prefix the ansible-playbook commands in the sections below with sudo -E (the -E flag asks sudo to preserve your environment, including the proxy variables).

  4. If you need to set up ssh_proxy, you must also use the user_libs variable to add any packages that your ProxyCommand relies on. An example --extra-vars argument is shown below:

    --extra-vars '{"user_libs":["nmap"], "ssh_proxy": "ProxyCommand ncat --proxy-type socks4 --proxy <socks-proxy-url> %h %p"}'
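
A quick pre-flight check can catch a missing proxy variable before the playbook starts. The sketch below is not part of the repository: the function name check_proxy_env is hypothetical, and it only tests that the two variables from tip 1 are non-empty, not that the proxy itself is reachable.

```shell
# Hypothetical helper, not part of the repository: report which of the
# proxy variables from tip 1 are unset or empty in the current shell.
check_proxy_env() {
    missing=0
    for name in http_proxy https_proxy; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Only relevant behind a proxy; prints nothing when both variables are set.
check_proxy_env || echo "export the variables above, then rerun with sudo -E"
```

If you are not behind a proxy, the message can be ignored.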
    

Running the Ansible Playbook

Pre-requisites

  1. If ansible is not installed already, run:

    pip install ansible==1.9.4
    
  2. Clone the repository, change to GenomicsSampleAPIs/infrastructure (where the Ansible-related files live), and run the ansible-galaxy command to set up ansible-role-build-essential and ansible-role-repo-epel:

    git clone git@github.com:Intel-HLS/GenomicsSampleAPIs.git
    cd GenomicsSampleAPIs/infrastructure
    ansible-galaxy install -p ./ansible/roles -r ansible_requirements.txt
    

Running the Genomics DB Ansible Playbook

"Install Genomics DB Infrastructure" describes the playbook in detail, including all the variables that are used below. The steps below are a practical usage guide.

  1. In the ansible directory, run the playbook.

    cd ansible
    ansible-playbook -i 'default,' genomicsdb.yml --connection=local
    

Running the Genomics DB web server Ansible Playbook

"Install Genomics DB web server" describes the playbook in detail, including all the variables that are used below. The steps below are a practical usage guide to setting up a web server with an existing data set.

Files required to run the web server

The web server requires the following files:

  1. PostgreSQL dump of the Meta DB that corresponds to the Genomics DB that you want to instantiate. (extra-vars metadb_file). You can get a dump of the PostgreSQL DB using:

    pg_dump --no-owner --data-only db_name | gzip > db_name.db.gz
    
  2. CallSet mapping file for GenomicsDB (extra-vars callset_mapping_file)

    1. Note that this file contains paths to the CSV or VCF files. These paths must be accessible on the host where the web server is being set up.
  3. Vid mapping file for GenomicsDB (extra-vars vid_mapping_file)
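
Before running the playbook, it can help to confirm that the three files listed above are readable on the target host. The following is a minimal sketch, not part of the repository: check_webserver_inputs is a hypothetical helper, and the file names are placeholders taken from the examples on this page. It does not inspect the CSV or VCF paths stored inside the CallSet mapping file.

```shell
# Hypothetical helper: confirm that each input file named on the command
# line exists and is readable before it is passed through --extra-vars.
check_webserver_inputs() {
    status=0
    for path in "$@"; do
        if [ ! -r "$path" ]; then
            echo "cannot read: $path"
            status=1
        fi
    done
    return "$status"
}

# Placeholder file names; substitute your own dump and mapping files.
check_webserver_inputs db_name.db.gz file.callset_mapping file.vid_mapping \
    || echo "fix the paths above before running the playbook"
```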

Running the Genomics DB web server Ansible Playbook

You do not need to run the genomicsdb playbook first if you plan to run the genomicsdb-webserver playbook. By default, the genomicsdb-webserver playbook will force override the repos from the genomicsdb playbook.

  1. Run the steps under Pre-requisites.

  2. In the ansible directory, run the playbook.

    cd ansible
    ansible-playbook -i 'default,' --extra-vars '{"meta_db_name" : "<name for metadb>", "array_name" : "<Name of TileDB array>", "webserver_port" : "<port number to run the webserver>", "import_path" : "<path to folder with exported files>", "metadb_file" : "<pg_dump db_name.db.gz>", "callset_mapping_file" : "<file.callset_mapping>", "vid_mapping_file" : "<file.vid_mapping>"}' genomicsdb-webserver.yml --connection=local
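
The inline JSON in the command above is long and easy to mistype. ansible-playbook can also load extra variables from a JSON file when the --extra-vars value starts with "@". The sketch below assumes a file name of your choosing (webserver_vars.json in the usage comment) and two hypothetical helper functions; the values are the same placeholders as in the command above and must all be replaced.

```shell
# Hypothetical helper: write the web server variables to the JSON file
# given as $1. The values are the placeholders from the command above.
write_webserver_vars() {
    cat > "$1" <<'EOF'
{
  "meta_db_name": "<name for metadb>",
  "array_name": "<Name of TileDB array>",
  "webserver_port": "<port number to run the webserver>",
  "import_path": "<path to folder with exported files>",
  "metadb_file": "<pg_dump db_name.db.gz>",
  "callset_mapping_file": "<file.callset_mapping>",
  "vid_mapping_file": "<file.vid_mapping>"
}
EOF
}

# Hypothetical wrapper: the "@" prefix makes ansible-playbook load the
# variables from the file instead of from an inline JSON string.
run_webserver_playbook() {
    ansible-playbook -i 'default,' --extra-vars "@$1" \
        genomicsdb-webserver.yml --connection=local
}

# Usage (the file name is an assumption):
#   write_webserver_vars webserver_vars.json
#   run_webserver_playbook webserver_vars.json
```

Keeping one variables file per data set also makes it easier to compare values between runs.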
    

Note: The genomicsdb-webserver playbook can be run multiple times with different data sets in order to set up multiple instances of the web server. Make sure that the values of the overridden variables (--extra-vars) do not overlap between runs; the consequences of overlapping values are unknown.

Common issues that users run into

  1. If you make a mistake in your ssh_proxy, the /home/${owner_user}/.ssh/config file is no longer valid. You will have to sudo as ${owner_user} and remove the incorrect lines.
  2. If the webserver playbook was used for setup and variants are not being returned, the TileDB loading settings may need to be altered. Read more in this table about delete_and_create_tiledb_array and size_per_column_partition.
  3. If the webserver playbook was used and no array_name is given, test data is used by default.