Add new molecular database

Adding a new molecular database is a manual process. This document describes the required input and the sequence of commands.

Required input

  1. A TSV file with the following columns:
  • a sequential row number, in a column without a header name
  • a unique ID for the molecule within the database, in a column named “id”
  • the molecule name, in a column named “name”
  • the chemical formula of the molecule, in a column named “formula”
      id           name              formula
13    NPA024518    Nocapyrone R      C11H16O4
14    NPA024517    Penixanthone A    C16H18O3
  2. Database name, version, description and citation. For example:
  • Name: NPA
  • Version: 2019-08
  • Description: Taken from the NPA homepage: "The Natural Products Atlas provides open access coverage of bacterial and fungal natural products, giving researchers the power to visualize the chemical diversity of the natural world."
  • Citation: van Santen, J. A.; Jacob, G.; Leen Singh, A.; Aniebok, V.; Balunas, M. J.; Bunsko, D.; Carnevale Neto, F.; Castaño-Espriu, L.; Chang, C.; Clark, T. N.; Cleary Little, J. L.; Delgadillo, D. A.; Dorrestein, P. C.; Duncan, K. R.; Egan, J. M.; Galey, M. M.; Haeckl, F. P. J.; Hua, A.; Hughes, A. H.; Iskakova, D.; Khadilkar, A.; Lee, J.-H.; Lee, S.; LeGrow, N.; Liu, D. Y.; Macho, J. M.; McCaughey, C. S.; Medema, M. H.; Neupane, R. P.; O’Donnell, T. J.; Paula, J. S.; Sanchez, L. M.; Shaikh, A. F.; Soldatou, S.; Terlouw, B. R.; Tran, T. A.; Valentine, M.; van der Hooft, J. J. J.; Vo, D. A.; Wang, M.; Wilson, D.; Zink, K. E.; Linington, R. G. "The Natural Products Atlas: An Open Access Knowledge Base for Microbial Natural Products Discovery", ACS Central Science, 2019, 5, 11, 1824-1833. 10.1021/acscentsci.9b00806
  3. Molecule images. Build the Docker image used to generate them from the Dockerfile:
cd metaspace/engine/docker/mol-struct-gen/
docker build -t metaspace2020/mol-struct-gen -f Dockerfile .
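
Before importing, the TSV layout described in item 1 can be sanity-checked locally. The following is only a minimal sketch and not part of the METASPACE tooling: the function name check_moldb_tsv is hypothetical, and it assumes the unnamed first column has an empty header cell and that the columns appear in the order shown in the example rows.

```python
import csv
import io

# Assumed header: an unnamed index column followed by id, name, formula.
EXPECTED_HEADER = ["", "id", "name", "formula"]


def check_moldb_tsv(lines):
    """Return a list of problems found in a molecular-database TSV.

    `lines` is any iterable of text lines, e.g. an open file or StringIO.
    """
    problems = []
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    if [h.strip() for h in header] != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header!r}")

    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"line {line_no}: expected 4 columns, got {len(row)}"
            )
            continue
        _, mol_id, name, formula = (cell.strip() for cell in row)
        if not (mol_id and name and formula):
            problems.append(f"line {line_no}: empty id, name or formula")
        if mol_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {mol_id!r}")
        seen_ids.add(mol_id)
    return problems


# The two example rows from this page, written with real tab separators.
sample = (
    "\tid\tname\tformula\n"
    "13\tNPA024518\tNocapyrone R\tC11H16O4\n"
    "14\tNPA024517\tPenixanthone A\tC16H18O3\n"
)
print(check_moldb_tsv(io.StringIO(sample)))  # prints []
```

To check a real file, pass an open file object instead of the StringIO, for example open("/tmp/npa_2019-08.tsv", newline="", encoding="utf-8").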

List of commands for adding a database

TSV file

  1. Upload the TSV file to the S3 bucket sm-mol-db and set "Read Object" for "Everyone" in the Permission tab.
  2. Add information about the database (name, version, link) to the vars.yml.template file for all environments in the metaspace-ansible-config repository.
  3. Copy the TSV file to the EC2 instance.
  4. Run the import script on each environment:
source activate sm38
cd /opt/dev/metaspace/metaspace/engine/
python scripts/import_molecular_db.py NPA 2019-08 "/tmp/npa_2019-08.tsv"
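  5. Set the following fields in the database, replacing ID with the id of the new database and URL with its molecule link template: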
UPDATE public.molecular_db SET targeted=false WHERE id=ID;
UPDATE public.molecular_db SET molecule_link_template='URL' WHERE id=ID;

Molecular images

  1. Generate images of the molecules using the Docker container:
docker run -v $PWD:/home/obabel/mol-struct-gen --rm metaspace2020/mol-struct-gen <MOLDB_FILE> <MOL_IMG_DIR>
  2. Archive the directory with the SVG files into a tar.xz archive:
tar cf - mol_img_dir/ | xz -z - > mol-images-name.tar.xz
  3. Upload the *.tar.xz archive to the S3 bucket s3-mol-db and set "Read Object" for "Everyone" in the Permission tab.
  4. Run the Ansible playbook to copy the images to the EC2 instance:

ansible-playbook -i env/ENV provision/web.yml -t sm-web --start-at-task="Create directory for molecular structure images"
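
The archiving step (step 2) can also be done from Python when tar or xz is not available on the machine. This is a sketch using only the standard library tarfile module. The helper name archive_images and the sample file name NPA024518.svg are hypothetical, chosen for illustration; the actual names of the generated image files are not specified on this page.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_images(img_dir, archive_path):
    """Pack img_dir into an xz-compressed tar archive.

    Returns the member names stored in the archive so the result can be
    inspected before uploading.
    """
    img_dir = Path(img_dir)
    # "w:xz" writes a tar stream compressed with xz, like `tar | xz`.
    with tarfile.open(archive_path, "w:xz") as tar:
        # arcname keeps only the directory name, not the full local path.
        tar.add(img_dir, arcname=img_dir.name)
    with tarfile.open(archive_path, "r:xz") as tar:
        return tar.getnames()


# Demo on a throwaway directory with one placeholder SVG file.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "mol_img_dir"
    demo_dir.mkdir()
    (demo_dir / "NPA024518.svg").write_text("<svg/>")  # hypothetical name
    members = archive_images(demo_dir, Path(tmp) / "mol-images-name.tar.xz")
    print(sorted(members))
```

Listing the members after writing is a cheap way to confirm that the archive contains the image directory at its top level before it is uploaded.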

Description for the METASPACE site

  1. Add a database description for the /help page in this file.
  2. Run the Ansible playbook to apply the changes:
ansible-playbook -i env/dev deploy/web.yml