Config, databases, and SFTP - RTXteam/RTX GitHub Wiki

Config files

There are two ARAX config files: config_secrets.json and config_dbs.json.

config_secrets.json

This config file is auto-downloaded to machines running ARAX code (approximately every 24 hours, using the same auto-download system as the old configv2.json) from the 'master' copy on araxconfig.rtx.ai at /home/araxconfig/config_secrets.json.

This config file is meant to contain things that should never be checked into the repo or shared publicly (usernames/passwords, etc.).

If you want to override the 'master' config_secrets.json, create a local copy of config_secrets.json, rename it config_secrets_local.json, and edit its contents as needed. If a config_secrets_local.json is present, it will always be used in preference to config_secrets.json.
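The precedence rule above can be sketched in Python as follows. This is only an illustration, not the actual RTXConfiguration logic; the function name pick_secrets_file and its config_dir parameter are hypothetical.

```python
from pathlib import Path


def pick_secrets_file(config_dir: str) -> Path:
    """Return the secrets file to load, preferring a local override.

    Hypothetical helper for illustration; the real lookup lives in
    RTXConfiguration and may differ.
    """
    directory = Path(config_dir)
    local_copy = directory / "config_secrets_local.json"
    master_copy = directory / "config_secrets.json"
    # A local override, when present, always wins over the downloaded copy.
    return local_copy if local_copy.exists() else master_copy
```

Deleting config_secrets_local.json restores the default behavior, since the sketch then falls back to the auto-downloaded file.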

config_dbs.json

This file lives in the RTX repo (at RTX/code/config_dbs.json). It essentially contains the paths for the 'master' copies of the various databases on arax-databases.rtx.ai that are auto-downloaded to machines running ARAX code (by ARAX_database_manager.py), as well as the paths of the current KG2pre/KG2c Neo4j endpoints.

NOTE: The root of the paths in config_dbs.json (i.e., /translator/data/orangeboard/databases/) is not the current root path for databases on arax-databases.rtx.ai (which is actually /home/rtxconfig/); they are legacy paths from our old database storage location. ITRB could not adjust their scripts to work with the new root paths when we moved to arax-databases.rtx.ai, so we left those root paths as they were in config_dbs.json, and instead RTXConfiguration maps from the old root paths to the current root paths as appropriate. It's silly, but it works. When you upload a database to arax-databases.rtx.ai, just put it under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0).
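To make the path mapping concrete, here is a minimal sketch of a legacy-to-current translation. It assumes the mapping is a plain prefix swap between the two roots named above; the function name to_current_path and the example file name are hypothetical, and the real RTXConfiguration code may handle more cases.

```python
LEGACY_ROOT = "/translator/data/orangeboard/databases/"
CURRENT_ROOT = "/home/rtxconfig/"


def to_current_path(config_path: str) -> str:
    """Map a legacy path from config_dbs.json to its assumed current location."""
    if config_path.startswith(LEGACY_ROOT):
        # Assumption: only the root differs; the KG2 directory and file name carry over.
        return CURRENT_ROOT + config_path[len(LEGACY_ROOT):]
    # Paths that do not use the legacy root are returned unchanged.
    return config_path


# Hypothetical database file name, used only for illustration.
print(to_current_path(
    "/translator/data/orangeboard/databases/KG2.8.0/my_database_v1.1_KG2.8.0.sqlite"
))
# prints /home/rtxconfig/KG2.8.0/my_database_v1.1_KG2.8.0.sqlite
```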

Note: Before pushing a change to config_dbs.json in master, ensure that any new databases pointed to have already been uploaded to arax-databases.rtx.ai in the proper KG2 directory as well as to the ITRB SFTP server! If you point config_dbs.json (in the master branch) to a database that does not exist in both of those two places, things will break.

When updating a database, follow the steps in the "Steps when updating a database" section to ensure nothing breaks!

Overriding maturity and Plover/KG2 URL

RTXConfiguration dynamically determines a machine's 'maturity' (based on current branch and/or instance/domain name), which is used to select which Plover KG2 URLs to use. But it also provides a mechanism for overriding that maturity. If, for example, you wanted your own machine to run as 'production' maturity, simply create a local one-line file called maturity_override.txt that contains that maturity:

echo "production" > RTX/code/maturity_override.txt

Remember to delete your local override file after you're done!
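As a rough illustration of how such an override file could be consumed, the sketch below reads maturity_override.txt if it exists. The function name read_maturity_override and the code_dir parameter are hypothetical; how RTXConfiguration actually reads the file is not shown here.

```python
from pathlib import Path
from typing import Optional


def read_maturity_override(code_dir: str) -> Optional[str]:
    """Return the maturity named in maturity_override.txt, or None if absent.

    Hypothetical helper; not the actual RTXConfiguration implementation.
    """
    override_file = Path(code_dir) / "maturity_override.txt"
    if not override_file.exists():
        return None
    # echo writes a trailing newline, so strip surrounding whitespace.
    value = override_file.read_text().strip()
    # Treat an empty file the same as no override.
    return value or None
```

A caller would use the returned value when it is not None and otherwise fall back to the dynamically determined maturity.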

By default, ARAX will determine the correct URL to use for querying KG2 (hosted in Plover2.0) by looking at KG2's SmartAPI registration. If, however, you want ARAX to use a different KG2 URL (e.g., you're working on rolling out a new KG2 version), you can force ARAX to use a certain KG2 URL by putting it in the plover_url_override slot in RTX/code/config_dbs.json. So, for instance, that line in config_dbs.json might look like this:

   "plover_url_override": "https://kg2cplover.rtx.ai:9990",

When you're done, be sure to set the plover_url_override slot back to null.
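The override behavior described above amounts to "use plover_url_override when it is set, otherwise use the registered URL". The sketch below shows that decision, assuming plover_url_override is a top-level key in config_dbs.json; the function name resolve_kg2_url and the registered_url parameter (standing in for the SmartAPI lookup result) are hypothetical.

```python
import json
from typing import Optional


def resolve_kg2_url(config_dbs_text: str, registered_url: str) -> str:
    """Pick the KG2 URL: the override if set, else the registered URL.

    Assumes "plover_url_override" sits at the top level of the JSON document.
    """
    config = json.loads(config_dbs_text)
    # A JSON null is parsed as None, which means "no override".
    override: Optional[str] = config.get("plover_url_override")
    return override if override else registered_url


# Placeholder for whatever URL the SmartAPI registration would yield.
default_url = "https://registered.example/kg2"
print(resolve_kg2_url('{"plover_url_override": null}', default_url))
print(resolve_kg2_url(
    '{"plover_url_override": "https://kg2cplover.rtx.ai:9990"}', default_url
))
```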

Steps when updating a database

When you need to update one of the auto-downloaded databases listed in config_dbs.json, whether for a new KG2 version or for any other reason, follow these steps (order is important!):

  1. Make sure to give the new/updated database a new (unique) name (e.g., bump v1.0 --> v1.1, or KG2.X.1 --> KG2.X.2, as appropriate)
  2. Locally or in the branch you're working in (if applicable), update config_dbs.json to refer to the new database name
  3. Test the new database locally
    1. This includes running the ARAX pytest suite! Make sure you didn't break any tests.
  4. If all tests pass, upload the database to arax-databases.rtx.ai under the proper KG2 directory (e.g., /home/rtxconfig/KG2.8.0)
    1. Before uploading, ensure there is enough free disk space on arax-databases.rtx.ai (e.g., using df -h)
  5. Copy the database from arax-databases.rtx.ai to arax.ncats.io:
    1. First ensure there is enough free disk space on arax.ncats.io
    2. Then run these commands:
      1. ssh [email protected]
      2. cd ../../data/orangeboard/databases/KG2.X.Y
      3. scp [email protected]:/home/rtxconfig/KG2.X.Y/my_database_v1.1_KG2.X.Y.sqlite .
  6. Follow the steps in the "Uploading databases to ITRB's SFTP server" section to upload the new database and its md5sum to the ITRB SFTP server
  7. Update config_dbs.json in master to point to the new database.
    1. If you're working in a branch, merge your branch into master at this point; this should carry your previous change to config_dbs.json into master.
    2. This should trigger an auto-deployment to ITRB's Staging instances, which should already have access to the new database thanks to Step 6.
  8. At this point it's safe for master to be rolled out to arax.ncats.io.
    1. It's generally a good idea to run the DatabaseManager when doing so, but it shouldn't be required.
  9. Download the new database to cicd.rtx.ai (automatic downloads to that instance don't currently work quite right), via these steps:
    1. ssh [email protected]
    2. cd RTX/
    3. git pull origin master
    4. python3 code/ARAX/ARAXQuery/ARAX_database_manager.py --mnt --skip-if-exists --remove_unused
    5. Note: You can do this step either before or after Step 8. Prior to completing this step, commits may show as 'Failing' in GitHub.
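Several of the steps above ask you to confirm free disk space before copying a database. Besides df -h, a scripted check is possible; the following is a minimal sketch meant to be run on the destination host. The function name has_room_for and the 10% safety margin are assumptions, not part of the ARAX tooling.

```python
import shutil


def has_room_for(size_bytes: int, target_dir: str, margin: float = 1.1) -> bool:
    """Return True if target_dir's filesystem can hold size_bytes plus a margin.

    The 1.1 margin (10% headroom) is an arbitrary illustrative choice.
    """
    needed = size_bytes * margin
    # disk_usage reports total, used, and free bytes for the filesystem.
    free = shutil.disk_usage(target_dir).free
    return free >= needed
```

For example, has_room_for(50 * 1024**3, "/home/rtxconfig") would check whether a 50 GiB database fits, where 50 GiB is just an example size.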

Uploading databases to ITRB's SFTP server

In addition to arax-databases.rtx.ai, all databases must be uploaded to ITRB's SFTP server, which is the instance ITRB's system downloads databases from.

ITRB manages users for the SFTP server (contact them if you need to gain access).

When uploading databases to the SFTP server, you need to upload not only the database file itself, but also its md5 checksum.

Steps for a single database

Below is a complete example showing how to upload a single database (in this case, curie_to_pmids_v1.0_KG2.7.6.sqlite) and its md5 checksum to ITRB's SFTP server:

ssh [email protected]
cd /data/orangeboard/databases/KG2.7.6
sudo bash
md5sum curie_to_pmids_v1.0_KG2.7.6.sqlite > curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
sftp [email protected]
cd databases/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite
cd ../../md5_sums/KG2.7.6
put curie_to_pmids_v1.0_KG2.7.6.sqlite.md5
exit
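If you prefer to generate the checksum file from Python rather than with md5sum, the sketch below writes a file in the same "digest, two spaces, file name" layout that md5sum prints. Whether ITRB's scripts depend on that exact layout is an assumption; the function name write_md5_file is hypothetical.

```python
import hashlib
from pathlib import Path


def write_md5_file(db_path: str) -> Path:
    """Write <db_path>.md5 next to the database, mimicking md5sum's output."""
    source = Path(db_path)
    digest = hashlib.md5()
    with source.open("rb") as handle:
        # Read in 1 MiB chunks so large databases need not fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    md5_path = source.with_name(source.name + ".md5")
    # md5sum prints the hex digest, two spaces, then the file name.
    md5_path.write_text(f"{digest.hexdigest()}  {source.name}\n")
    return md5_path
```

Calling write_md5_file("curie_to_pmids_v1.0_KG2.7.6.sqlite") in the database directory would produce curie_to_pmids_v1.0_KG2.7.6.sqlite.md5, ready to upload with put.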

Steps for all databases at once

Generally it's easier to upload all the new databases for a new KG2 version to the SFTP server in one batch. Below is an example of doing so for the KG2.8.0 databases:

# First upload all database files to the SFTP server
ssh arax.ncats.io
cd /data/orangeboard/databases/KG2.8.0
sftp team-expander-[myuser]@sftp.transltr.io
cd databases
mkdir KG2.8.0
cd KG2.8.0
put *2.8.0*
exit

# Then create their md5 checksums and upload those as well
sudo bash
mkdir md5_sums
chmod 777 md5_sums
exit
for file in *2.8.0*; do md5sum "${file}" > "md5_sums/${file}.md5"; done
cd md5_sums
sftp team-expander-[myuser]@sftp.transltr.io
cd md5_sums
mkdir KG2.8.0
# IMPORTANT: do not skip the next cd (it was missed for KG2.9.0)
cd KG2.8.0
put *2.8.0*
exit
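Because a batch upload makes it easy to miss a checksum, a quick local check before uploading can help. The sketch below lists database files that have no matching .md5 file; the function name find_missing_md5 is hypothetical, and the default glob simply mirrors the *2.8.0* pattern used in the example.

```python
from pathlib import Path
from typing import List


def find_missing_md5(db_dir: str, md5_dir: str, pattern: str = "*2.8.0*") -> List[str]:
    """List database files in db_dir that lack a matching .md5 file in md5_dir."""
    md5_root = Path(md5_dir)
    missing = []
    for db_file in sorted(Path(db_dir).glob(pattern)):
        if not db_file.is_file():
            continue  # Skip any subdirectory that happens to match the pattern.
        if not (md5_root / (db_file.name + ".md5")).exists():
            missing.append(db_file.name)
    return missing
```

An empty list from find_missing_md5(".", "md5_sums") suggests every matching database has a checksum file ready to upload.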

You do not need to warn ITRB when deploying a new database; simply ensure that you have uploaded it and its md5 checksum to the SFTP server as shown above, and then push your code change to config_dbs.json that points to that new database.

If your commit was to master, it will trigger a rebuild of the ITRB ARAX CI instance (arax.ci.transltr.io); it is wise to test this instance to confirm it is working properly. Note that if your commit pointed config_dbs.json to a new database, you may need to wait up to around an hour before testing, since the system needs time to download the new database(s) while it rebuilds. If the instance still isn't working after that, post a message in the devops-teamexpanderagent channel in the Translator Slack workspace and mention @Pouyan Ahmadi.