Dev info - RTXteam/RTX GitHub Wiki
- Coding guidelines
- Setting up for local dev work on ARAX
- Setting up local UI
- For ARAX Developers: General software development guidelines
- Testing
- Different instances
- Config files
- Branches and merging
- Dealing with Pull Requests
- Changing passwords
- Update Generic Terms Blocklist
- Old or infrequently used info
Please see the coding guidelines for SOPs and general practices.
NOTE: Use Python 3.9! (Other versions may result in errors.) A Python environment management tool (e.g., pyenv, virtualenv) may be your friend.
- Clone the RTX repo and navigate into it (`cd RTX`)
- Run `pip install -r requirements.txt`
- Give your public RSA key to another ARAX dev for authentication
  - If you don't have an RSA key already, you'll need to generate one
  - The dev will need to put your public key on `araxconfig.rtx.ai` (under the `araxconfig` user) and `arax-databases.rtx.ai` (under the `rtxconfig` user).
  - A simple test to see if it has worked is to `ssh` into `araxconfig.rtx.ai` as the `araxconfig` user
- Navigate to `RTX/code/ARAX/test` and run `pytest -v`
  - Note that this triggers downloading of the necessary sqlite databases to your machine and can take over an hour depending on your internet connection. The databases take up about 200GB combined, so make sure you have that much space free on your machine! Also make sure your computer doesn't go to sleep while this step is running (otherwise some of the downloads may fail).
- On an MBP, you will have to comment out a bunch of code in `openapi_server/__main__.py` that does the forking; use this code for `__main__.py` instead:
import sys
import os
import traceback
import json

sys.path.append(os.path.dirname(os.path.abspath(__file__)) +
                "/../../../../ARAX/ARAXQuery")
sys.path.append(os.path.dirname(os.path.abspath(__file__)) +
                "/../../../..")

from RTXConfiguration import RTXConfiguration
from ARAX_database_manager import ARAXDatabaseManager


def eprint(*args, **kwargs): print(*args, file=sys.stderr, **kwargs)


FLASK_DEFAULT_TCP_PORT = 8080
CONFIG_FILE = 'openapi_server/flask_config.json'


def main():
    rtx_config = RTXConfiguration()  # noqa: F841
    dbmanager = ARAXDatabaseManager(allow_downloads=True)
    try:
        eprint("Checking for complete databases")
        if dbmanager.check_versions():
            eprint("Databases incomplete; running update_databases")
            dbmanager.update_databases()
        else:
            eprint("Databases seem to be complete")
    except Exception as e:
        eprint(traceback.format_exc())
        raise e
    del dbmanager

    # Read any load configuration details for this instance
    try:
        with open(CONFIG_FILE, 'r') as infile:
            local_config = json.load(infile)
    except Exception:
        eprint(f"Error loading config file: {CONFIG_FILE}")
        local_config = {"port": FLASK_DEFAULT_TCP_PORT}

    # Start the Flask/connexion app in this process (no forking)
    import connexion
    import flask_cors
    import openapi_server.encoder
    app = connexion.App(__name__, specification_dir='./openapi/')
    app.app.json_encoder = openapi_server.encoder.JSONEncoder
    app.add_api('openapi.yaml',
                arguments={'title': 'ARAX Translator Reasoner'},
                pythonic_params=True)
    flask_cors.CORS(app.app)
    app.run(port=local_config['port'], threaded=True)


if __name__ == '__main__':
    main()
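Before kicking off the first `pytest -v` run described above, it can help to confirm the interpreter version and the free disk space. The following is a minimal sketch and not part of the RTX repo: the helper names are hypothetical, and the 3.9 and 200 GB values are simply taken from the notes above.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 9)   # version named in the note above
REQUIRED_FREE_GB = 200     # approximate combined database size from the note above


def python_version_ok(version_info=sys.version_info, required=REQUIRED_PYTHON):
    """Return True if the interpreter's major.minor matches the required version."""
    return tuple(version_info[:2]) == tuple(required)


def free_space_ok(path=".", required_gb=REQUIRED_FREE_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


def preflight(path="."):
    """Collect human-readable problems; an empty list means both checks passed."""
    problems = []
    if not python_version_ok():
        problems.append("Python 3.9 is recommended; found %d.%d" % sys.version_info[:2])
    if not free_space_ok(path):
        problems.append("Less than %d GB free at %s" % (REQUIRED_FREE_GB, path))
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

Pass the directory where the databases will be stored as `path` if it is on a different disk than your working directory.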
NOTE: This section is slightly outdated and does not seem to work as is; it needs updating.
If you are running ARAX_query and friends on your local machine and are generating nice JSON, but you want to be able to visualize these JSON beasts through the UI, here's how you can do that (at least it worked for me on my Windows box):
Step 1) Install one more module needed for CORS support and make sure connexion[swagger-ui] is installed
pip3 install flask_cors
pip3 install connexion[swagger-ui]
Step 2) Add a custom endpoint destination:
cd code/UI/interactive
cp config.js.example config.js
edit config.js to contain:
config.base = 'http://localhost:5001/';
config.baseAPI = config.base + "api/arax/v1.2";
Step 3) Start the Flask server (blocks this shell and runs until ^C)
cd code/UI/OpenAPI/python-flask-server
python3 -m openapi_server
Step 4) Point your web browser to the UI files on your local filesystem, something like:
file://G:/Repositories/GitHub/RTX/code/UI/interactive/index.html
or
file://G:/Repositories/GitHub/RTX/code/UI/interactive/index.html?r=1
(the r number is the response id that you want to view in the UI)
By changing the r number, you should be able to view the messages you are creating and storing via ARAX_query (make sure you don't have return(store=false) in your DSL, otherwise there's no r number). In theory, launching queries from the GUI should work too, but I haven't properly tested it.
Care should be taken that the code never just dies, because then there is no feedback about the problem in the API/UI. Use the `ARAXResponse.error` mechanism to log informative messages throughout your code (see the section below for more details):
- `DEBUG`: Only something an ARAX team member would want to see
- `INFO`: Something an API user might like to see to examine the steps behind the scenes. Good for innocuous assumptions.
- `WARNING`: Something that an API user should sit up and notice. Good for assumptions with impact.
- `ERROR`: A failure that prevents fulfilling the request. Note that logging an error may not halt processing. Several can accumulate. If you need processing to terminate, either `return` or `raise` an `Exception` depending on where this error occurs.
An `ARAXResponse` object is passed into each ARAX module's `apply()` method; among many things, this object serves as ARAX's log. You may either use this same response object throughout your module by passing it to different methods/classes as needed OR you may instantiate new response objects and then merge them with the response object that is ultimately returned from the module's `apply()` method.
- Major methods (not little helper ones that can't fail) and calls to different ARAX classes should always:
  - Either instantiate a new `ARAXResponse` object or take one as an input parameter
  - Log with `response.debug`, `response.info`, `response.warning`, and `response.error`
  - Place returned data objects in the `response.data` envelope (`dict`)
  - Return that response object
- Callers of major methods should call with `result = object.method()`
  - Then immediately merge the new result into the active response (if they are separate response objects)
  - Then immediately check `result.status` to make sure it is `'OK'`, and if not, return response or take some other action for method call failure
- The class may store the `Response` object as an object variable and share it among the methods that way (this may be convenient)
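The call, merge, and check pattern above can be sketched as follows. This is only an illustration: `SketchResponse` is a hypothetical stand-in written for this example and is not the real `ARAXResponse` class. Its `debug`/`info`/`warning`/`error` methods, `status`, and `data` mirror the names used above, while the `merge` method name, its behavior, and the `count_nodes` method are assumptions.

```python
class SketchResponse:
    """Hypothetical stand-in for ARAXResponse, for illustration only."""

    def __init__(self):
        self.status = "OK"
        self.messages = []   # list of (level, text) tuples acting as the log
        self.data = {}       # envelope for returned data objects

    def _log(self, level, text):
        self.messages.append((level, text))

    def debug(self, text):
        self._log("DEBUG", text)

    def info(self, text):
        self._log("INFO", text)

    def warning(self, text):
        self._log("WARNING", text)

    def error(self, text):
        # Logging an error records the failure but does not halt processing
        self._log("ERROR", text)
        self.status = "ERROR"

    def merge(self, other):
        # Assumed merge semantics: append the other log and inherit a failure status
        self.messages.extend(other.messages)
        if other.status != "OK":
            self.status = other.status


def count_nodes(node_ids):
    """A 'major method': creates its own response, logs, fills data, returns it."""
    response = SketchResponse()
    response.debug(f"Received {len(node_ids)} node ids")
    if not node_ids:
        response.error("No node ids were provided")
        return response   # terminate explicitly; error() alone would not stop us
    response.data["node_count"] = len(set(node_ids))
    response.info("Counted distinct node ids")
    return response


def apply(response, node_ids):
    """A caller: call the method, merge immediately, then check the status."""
    result = count_nodes(node_ids)
    response.merge(result)
    if result.status != "OK":
        return response
    response.data["node_count"] = result.data["node_count"]
    return response
```

Because the merge happens before the status check, the caller's response carries the failing method's log messages even when it returns early.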
We generally manage all work (bug fixes, features, and enhancements) via GitHub issues. The general workflow for working on a GitHub issue is as follows:
- Create a branch for your issue (typically off of the `master` branch, but possibly another branch depending on your particular issue)
- Implement the necessary code changes for your issue in your branch
  - Ensure your commit messages are under 70 characters and always reference the issue in your commit (e.g., with '#1000', if your issue number was 1000)
  - It is generally ok to push commits to your branch that leave the system in a broken state, unless the branch is shared with other devs who do not expect the system to be broken (but you should never push breaking changes to `master`!)
- If you are working on this issue for an extended period of time you will likely want to periodically merge `master` (or whatever your parent branch was) into your branch (see section on Branches and Merging)
- It can be a good idea to add one or more pytests (see the Testing section) that test out your fix/changes, but please ensure the test completes speedily (within ~10 seconds) or mark it with `@pytest.mark.slow`!
- Once you believe you are done implementing changes, merge `master` into your branch and run the ARAX Pytest suite
  - If any tests are failing, you need to figure out why and address those
- Once all tests are passing, you can make a Pull Request to merge your branch into `master` (or whatever your parent branch was)
  - Be sure to reference the issue from your PR (same way as in commit messages)
  - Once you become more experienced you may omit creating a PR and instead directly merge your branch into `master`
- Once your PR is merged, please delete your branch (assuming you aren't using it for any other issues)
- Within about 10 minutes after your code is merged to `master`, it should be live on `arax.ci.transltr.io` (thanks to ITRB's auto-deployment)
- Verify that your changes are working as expected on arax.ci.transltr.io!
- After that, post a message in the GitHub issue letting whoever submitted the issue know that the changes are complete (and ideally provide a link to an example response demonstrating the changes, like https://arax.ci.transltr.io/?r=317043)
- If the person who submitted the issue is satisfied, the issue can be closed
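The commit message convention in the workflow above (under 70 characters, with an issue reference such as '#1000') can be checked mechanically. The helper below is a hypothetical sketch, not an existing RTX tool; applying the length limit to the first line only is an assumption on my part.

```python
import re

MAX_LENGTH = 70                         # "under 70 characters" per the workflow above
ISSUE_REFERENCE = re.compile(r"#\d+")   # e.g. '#1000'


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty list if none)."""
    first_line = message.splitlines()[0] if message.strip() else ""
    problems = []
    if not first_line:
        problems.append("message is empty")
    if len(first_line) >= MAX_LENGTH:
        problems.append(
            f"message is {len(first_line)} characters; keep it under {MAX_LENGTH}"
        )
    if not ISSUE_REFERENCE.search(message):
        problems.append("message does not reference an issue such as '#1000'")
    return problems
```

For example, `check_commit_message("Fix node count in overlay #1000")` returns an empty list.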
- In your code, do not assume a particular location for the "current working directory". In general, try to use `os.path.abspath` to find the location of `__file__` for your module and then construct a relative path to find other ARAX/RTX files/modules.
- Always run the ARAX Pytest suite before pushing to `master`; do not push your changes to `master` if any pytests are failing!
- Strive to adhere to PEP8 style in your Python code.
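As a minimal sketch of the first guideline above, the helper below builds a path relative to a module's own location rather than the current working directory. The function name `path_relative_to` is hypothetical and only meant for illustration.

```python
import os


def path_relative_to(module_file, *parts):
    """Build an absolute path starting from the directory containing `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.normpath(os.path.join(module_dir, *parts))
```

Inside a module you would call it as `path_relative_to(__file__, "..", "ARAXQuery")`, which gives the same result no matter which directory the process was started from.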
The ARAX Pytest suite lives at `RTX/code/ARAX/test/`. The README in that directory provides details on how to use the test suite, but some examples are provided below as well.
Note that running pytests automatically triggers updating of your databases and KP info cache (as needed), so you do not need to run those update processes manually.
To run the default tests, `cd` to that folder and run
pytest -v .
To run the tests in a specific file:
pytest -v <file.py>
To run a specific test:
pytest -v <file.py> -k <a test like test_example_3>
To run the slow tests:
pytest -v --runslow
To run the 'external' tests:
pytest -v --runexternal
To run all tests, including the slow and external ones:
pytest -v --runslow --runexternal
The /asyncquery endpoint is a bit hard to test because you need to have a callback receiver that is Internet accessible or accessible to ARAX. There is a crude callback receiver available on ARAX itself.
How to use such a system is documented here: https://github.com/RTXteam/RTX/issues/1756
- "our" prod: arax.ncats.io
- "our" test: arax.ncats.io/test
- "our" beta: arax.ncats.io/beta
- ITRB production: arax.transltr.io
- ITRB test: arax.test.transltr.io
- ITRB CI/staging: arax.ci.transltr.io
See also this google doc with all endpoints and the branches they run.
The Jenkins dashboard for ITRB builds is here: https://deploy.transltr.io/.
ARAX has one config file that does not live in the RTX repo; it is called `config_secrets.json`. The 'master copy' of this file lives on [email protected] at `/home/araxconfig/config_secrets.json`. ARAX developers' public RSA keys need to be listed in `authorized_keys` on this instance; this allows `config_secrets.json` to be automatically downloaded to their machine when queries are run (it auto-refreshes every 24 hours).
If desired, you may override `config_secrets.json` by creating a (local) copy of it at `RTX/code/config_secrets_local.json`, which you can tweak to contain whatever usernames/passwords you need. If a `config_secrets_local.json` file is present, it will always be used instead of the regular `config_secrets.json`.
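The precedence rule above can be illustrated with a short sketch. This is not how `RTXConfiguration` is actually implemented; `choose_secrets_file` is a hypothetical helper, and it assumes for simplicity that both files sit in the same directory.

```python
import os

REGULAR_NAME = "config_secrets.json"
LOCAL_NAME = "config_secrets_local.json"


def choose_secrets_file(code_dir):
    """Return the path of the secrets file that takes precedence in `code_dir`."""
    local_path = os.path.join(code_dir, LOCAL_NAME)
    if os.path.exists(local_path):
        return local_path   # a local override always wins
    return os.path.join(code_dir, REGULAR_NAME)
```

Deleting the local file is enough to fall back to the regular, auto-refreshed configuration.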
NOTE: You should never push `config_secrets.json` or share its contents in a public space (i.e., beyond our team)!
The ARAX database config file lives in the RTX repo at `RTX/code/config_dbs.json`. This file specifies which versions of our various databases should be used. The `ARAXDatabaseManager` automatically takes care of downloading/removing databases from developers' machines as needed, according to what is specified in `config_dbs.json`.
- `production` and `itrb-test` should not be committed to, save for ITRB-specific changes
- `master` is to be merged into `production` and/or `itrb-test`, not the other way around
To merge `master` into `mybranch` (replace with your own branch name), do the following:
git checkout master
git pull origin master
git checkout mybranch
git pull origin mybranch
git merge --no-ff origin/master
[if any merge conflicts: fix them and commit]
git push origin mybranch
To merge `mybranch` into `master`, do the following:
WARNING: Be very careful when merging anything into `master`! Be sure your changes are fully tested, and always first merge `master` into your branch and test before doing this.
git checkout mybranch
git pull origin mybranch
git checkout master
git pull origin master
git merge --no-ff origin/mybranch
[if any merge conflicts: fix them and commit]
git push origin master
See this gist
- Install `gh` via these directions.
- Check out the PR locally: `gh pr checkout <PR number>`
- Edit, check, commit, etc.
- If everything looks good:
  - `git branch` to see what `<branch name>` you are on
  - `git checkout master` to switch to the master branch
  - `git pull origin master` to make sure master is up to date
  - `git checkout <branch name>` to switch back to the PR branch
  - `git merge --no-ff origin/master` to merge master into the PR
  - Fix any merge conflicts
  - `git checkout master` to switch to master
  - `git merge --no-ff origin/<branch name>` to merge the PR into master
- To switch back to master: `git checkout master`
:server change-password
sudo service neo4j stop
sudo rm -rf /var/lib/neo4j/data/dbms
sudo -u neo4j neo4j-admin set-initial-password PASSWORD
sudo service neo4j start
$sudo mysql
>GRANT ALL ON RTXFeedback.* TO "rt"@"localhost" IDENTIFIED BY 'PASSWORD';
If rejected use:
$sudo mysql
>set password for 'rt'@'localhost'='PASSWORD';
The blocklist is stored in `general_concepts.json`. There are two ways of filtering out generic concepts:
- Specifying curies: In the `curie` section, add the curies you want to filter out, in lower case, to the list. It is not necessary to specify equivalent curies, as they get filtered out too.
- Specifying terms: In the `synonyms` section, add the terms you want to filter out to the list. A term can be given as a plain string, for example `congenital`. Terms are case insensitive. A node gets filtered out if its name or any of its synonyms (specified in the node attributes) matches an item in this list. Alternatively, in the `patterns` section, a valid Python regular expression such as `pharmacolog.*` can be added; a node gets filtered out if its name or any of its synonyms (specified in the node attributes) matches an item in this list.
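To make the filtering rules above concrete, here is a rough sketch of how such a check could work. It is not the actual ARAX implementation: the `blocklist` contents, the `is_blocked` helper, and the example curie are hypothetical, equivalent-curie handling is left out, and matching each pattern against the whole name case-insensitively is an assumption.

```python
import re

# Hypothetical blocklist content using the three sections described above
blocklist = {
    "curie": ["example:0000001"],
    "synonyms": ["congenital"],
    "patterns": ["pharmacolog.*"],
}


def is_blocked(curie, names, blocklist):
    """Return True if a node with this curie and these names/synonyms is filtered out."""
    # Curies are stored in lower case, so compare in lower case
    if curie.lower() in blocklist["curie"]:
        return True
    # Terms are case insensitive: compare lower-cased names against lower-cased terms
    lowered = [name.lower() for name in names]
    blocked_terms = {term.lower() for term in blocklist["synonyms"]}
    if any(name in blocked_terms for name in lowered):
        return True
    # Assumption: a pattern must match the whole name, case-insensitively
    return any(
        re.fullmatch(pattern, name, flags=re.IGNORECASE)
        for pattern in blocklist["patterns"]
        for name in names
    )
```

Here `names` stands for the node's name plus any synonyms taken from its attributes.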
Note: The synonymizer should be automatically downloaded into your dev environment upon running the pytest suite (or `ARAX_database_manager.py`). But if you need to build one yourself for some reason, this explains how to do so.
See the "Build the synonymizer" instructions here.