Operations & deployment info - RTXteam/RTX GitHub Wiki
This is a draft operations page for the ARAX system. It is not complete. We are working on filling it with instructions and procedures.
- Navigate your web browser to the ARAX web browser UI, for example arax.test.transltr.io.
- In the Queries page (selected using the "Queries" navigation link in the "Input" section of the navigation bar on the left-hand side of the window), click on the {JSON} tab.
- In the text box below "JSON input", enter the following TRAPI query graph verbatim:
{"nodes": {"n00": {"ids": ["RTX:KG2c"]}}, "edges": {}}
- Click the blue "Post to ARAX" button.
- In the navigation bar on the left-hand side of the window, under the "Output" section, click on the "Results" navigation link.
- In the "Expansion Results" section, you should see a single "result", whose title should indicate the RTX-KG2 release version, like "Result 1 :: RTX-KG2.10.0c".
- ssh into the arax server:
ssh <user>@<arax server name>
- get into the docker container:
sudo docker exec -ti rtx1 bash
- look at which services are running:
service --status-all
this should return a list that looks similar to the following:
[ + ] RTX_Complete
[ + ] RTX_OpenAPI_beta
[ + ] RTX_OpenAPI_devED
[ + ] RTX_OpenAPI_devLM
[ - ] RTX_OpenAPI_dili
[ - ] RTX_OpenAPI_legacy
[ - ] RTX_OpenAPI_mvp
[ - ] RTX_OpenAPI_production
[ + ] RTX_OpenAPI_test
[ + ] apache-htcacheclean
[ + ] apache2
[ - ] apparmor
[ - ] bootmisc.sh
[ - ] checkfs.sh
[ - ] checkroot-bootclean.sh
[ - ] checkroot.sh
[ - ] cron
[ - ] dbus
[ - ] hostname.sh
[ ? ] hwclock.sh
[ - ] killprocs
[ - ] mountall-bootclean.sh
[ - ] mountall.sh
[ - ] mountdevsubfs.sh
[ - ] mountkernfs.sh
[ - ] mountnfs-bootclean.sh
[ - ] mountnfs.sh
[ + ] mysql
[ + ] neo4j
[ ? ] networking
[ - ] nginx
[ ? ] ondemand
[ - ] procps
[ - ] rc.local
[ - ] rsync
[ - ] sendsigs
[ - ] umountfs
[ - ] umountnfs.sh
[ - ] umountroot
[ - ] unattended-upgrades
[ - ] urandom
[ - ] x11-common
- the services that need to be running for production are apache2, mysql, apache-htcacheclean, RTX_Complete, and RTX_OpenAPI_production.
- In this case RTX_OpenAPI_production is not running. To start it again, run
service RTX_OpenAPI_production start
This should print the following if all goes well:
* Starting system RTX_OpenAPI_production daemon [ OK ]
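The check against the service list above can be sketched as a small shell helper. This is only an illustration: the function name missing_services is hypothetical, and it assumes that running services are shown with a "[ + ]" marker exactly as in the sample listing.

```shell
# Required production services, as listed above.
REQUIRED_SERVICES="apache2 mysql apache-htcacheclean RTX_Complete RTX_OpenAPI_production"

# missing_services (hypothetical helper): reads `service --status-all`
# output on stdin and prints each required service that does not appear
# on a line marked "[ + ]".
missing_services() {
    status_output=$(cat)
    for svc in $REQUIRED_SERVICES; do
        if ! printf '%s\n' "$status_output" |
            grep -Eq "^[[:space:]]*\[ \+ \][[:space:]]+${svc}[[:space:]]*\$"; then
            printf '%s\n' "$svc"
        fi
    done
}

# Usage inside the rtx1 container:
#   service --status-all 2>&1 | missing_services
```

An empty result means every required service is reported as running; any name printed is a candidate for service <name> start.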
- Check the list of containers:
sudo docker ps -a
- (a) If the container rtx1 is running but is not responding, restart it with
sudo docker restart rtx1
(b) Otherwise, if it is stopped, start it with
sudo docker start rtx1
- get into the docker container:
sudo docker exec -ti rtx1 bash
- Start all of the commonly used services:
service apache2 start
service apache-htcacheclean start
service mysql start
service RTX_Complete start
service RTX_OpenAPI_production start
service RTX_OpenAPI_beta start
service RTX_OpenAPI_test start
service RTX_OpenAPI_devED start
service RTX_OpenAPI_devLM start
- Wait a few seconds and double check that it is running at arax.ncats.io
Important: Please do not start ARAX by running the init script /etc/init.d/RTX_OpenAPI_<DEVAREA> directly. Instead, always use the service command to start ARAX; otherwise it will cause issues like RTX issue 2350.
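The service start commands above could be wrapped in a loop, as in the following sketch. The function name start_arax_services and the variable ARAX_SERVICES are hypothetical; the service names and their order are taken from the list above, and each one is started through the service command rather than the init script.

```shell
# Services started in the steps above, in the same order.
ARAX_SERVICES="apache2 apache-htcacheclean mysql RTX_Complete RTX_OpenAPI_production RTX_OpenAPI_beta RTX_OpenAPI_test RTX_OpenAPI_devED RTX_OpenAPI_devLM"

# start_arax_services (hypothetical helper): start each service with the
# `service` command and report any that fail to start.
start_arax_services() {
    failed=""
    for svc in $ARAX_SERVICES; do
        if ! service "$svc" start; then
            failed="$failed $svc"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed to start:$failed" >&2
        return 1
    fi
}

# Usage inside the rtx1 container (as root):
#   start_arax_services
```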
- establish a remote terminal session in the instance:
ssh <user>@arax.ncats.io
You have to know what your Linux username on arax.ncats.io is; it may not be the one you use on your home institution systems or dev system. The rest of the steps below assume you are running commands in the bash shell in the host OS on arax.ncats.io.
- start the rtx1 Docker container:
sudo docker start rtx1
- start mysql inside the container:
sudo docker exec rtx1 service mysql start
- start the "autocomplete" service inside the container:
sudo docker exec -it rtx1 service RTX_Complete start
- start the production ARAX API inside the container:
sudo docker exec rtx1 service RTX_OpenAPI_production start
- (for any other ARAX API endpoints like "beta" or "devED", do the same as above but substituting the other endpoint name instead of "production")
devED
test
beta
devLM
NewFmt
- start apache2 inside the container:
sudo docker exec rtx1 service apache2 start
- point your browser at https://arax.ncats.io and run a test query. Also test out the autocompleter.
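When several API endpoints need to be started from the host OS, the per-endpoint docker exec command above can be repeated in a loop. The sketch below assumes the RTX_OpenAPI_<endpoint> naming shown in the steps; the function name start_endpoints is hypothetical.

```shell
# start_endpoints (hypothetical helper): from the host OS, start the named
# ARAX API endpoints inside the rtx1 container, stopping at the first failure.
start_endpoints() {
    for endpoint in "$@"; do
        sudo docker exec rtx1 service "RTX_OpenAPI_${endpoint}" start || return 1
    done
}

# Usage on the host OS of arax.ncats.io:
#   start_endpoints production beta test devED devLM
```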
Log into the arax.ncats.io instance.
Enter the rtx1 Docker container:
sudo docker exec -ti rtx1 bash
Kill all python processes (this causes all RTX services to stop working correctly since they run python):
killall python3
Then to restart, run:
service RTX_OpenAPI_production start
service RTX_OpenAPI_devED start
service RTX_Complete start
service RTX_OpenAPI_test start
service RTX_OpenAPI_beta start
service RTX_OpenAPI_devLM start
service RTX_OpenAPI_NewFmt start
Note that the last service is only relevant during the interim period where we are transitioning between TRAPI versions, and thus have separate ARAX endpoints for the previous TRAPI version (e.g., 1.1) and the new TRAPI version (e.g., 1.2).
Deploying changes to the endpoint /foo on arax.ncats.io (e.g., /beta), which is running branch currentbranch (e.g., master), involves (approximately) the following steps:
ssh arax.ncats.io
sudo docker exec -it rtx1 bash
su - rt
cd /mnt/data/orangeboard/foo/RTX
git status
Check that the only modifications to tracked files are in openapi.yaml, and then do:
git pull origin currentbranch
exit
service RTX_OpenAPI_foo restart
tail -f /tmp/RTX_OpenAPI_foo.elog
If you need to switch the branch that the endpoint /foo is on, say from currentbranch to otherbranch, the above steps would instead look something like this:
git pull origin currentbranch
git checkout otherbranch
git pull origin otherbranch
exit
service RTX_OpenAPI_foo restart
tail -f /tmp/RTX_OpenAPI_foo.elog
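The "only openapi.yaml is modified" check before pulling can be automated along these lines. This is a sketch under assumptions: the function name only_openapi_modified is hypothetical, it reads the output of git status --porcelain, it ignores untracked files, and it accepts any tracked path that ends in openapi.yaml.

```shell
# only_openapi_modified (hypothetical helper): reads `git status --porcelain`
# output on stdin and succeeds only if every changed tracked file has a path
# ending in openapi.yaml. Untracked files ("??" entries) are ignored.
only_openapi_modified() {
    while IFS= read -r line; do
        case "$line" in
            '') ;;                 # blank line: nothing to check
            '??'*) ;;              # untracked file: ignore
            *openapi.yaml) ;;      # expected local modification
            *) echo "unexpected change: $line" >&2; return 1 ;;
        esac
    done
}

# Usage in /mnt/data/orangeboard/foo/RTX (as the rt user):
#   git status --porcelain | only_openapi_modified && git pull origin currentbranch
```

If the check fails, the offending line is printed to standard error and the pull is skipped, so the working tree can be inspected by hand first.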
This process essentially consists of building a new KG2c and other downstream databases off of this new KG2 version, organizing the necessary build artifacts on arax.ncats.io, uploading them to ITRB's SFTP server, and making any necessary code changes to ensure ARAX is compatible with the new KG2 version.
See this GitHub issue template for the steps to roll out a new KG2 version. You can create a new issue from this template at: https://github.com/RTXteam/RTX/issues/new?template=kg2rollout.md.
On arax.ncats.io, we use Nginx as a TLS endpoint which proxies unencrypted HTTP requests to port 8080 on the host OS. We currently set the number of worker_connections to 10000.
For all ARAX services (both ITRB-deployed services and those that run on our team's development instance and are not ITRB-deployed), we use a central database server for storing records of ARAX queries and pointers to their result JSON in an S3 bucket. The database server is running on the on-demand EC2 instance arax-responses.rtx.ai in the us-east-1 AWS region. The server is running inside a Docker container 4394c6724a54 on that instance. Within the container, as root, you would run service mysql status to check the status. Note that the service is named mysql, not mysqld, as the following session shows:
service mysqld status
mysqld: unrecognized service
root@4394c6724a54:/# service mysql status
* /usr/bin/mysqladmin Ver 8.0.34-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu))
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Server version 8.0.34-0ubuntu0.20.04.1
Protocol version 10
Connection Localhost via UNIX socket
UNIX socket /var/run/mysqld/mysqld.sock
Uptime: 380 days 21 hours 41 min 7 sec
Threads: 27 Questions: 424280834 Slow queries: 820 Opens: 695 Flush tables: 3 Open tables: 465 Queries per second avg: 12.892
If you see these errors in the ARAX log:
2024-09-09T19:15:05.622014 ERROR: Unable to store response record in MySQL
and if they are recurring and reported by multiple users, MySQL may be down. You can check:
ssh <user>@arax-responses.rtx.ai
sudo docker exec 4394c6724a54 service mysql status
If MySQL is indeed down or not accepting connections, the recommended fix is to restart MySQL on arax-responses.rtx.ai:
ssh <user>@arax-responses.rtx.ai
sudo docker exec 4394c6724a54 service mysql restart
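To judge whether the storage error is recurring, it may help to count its occurrences in a log file. The sketch below is an illustration only: the function name count_mysql_store_errors is hypothetical, and the log file path passed to it is an assumption (for example, an endpoint log such as /tmp/RTX_OpenAPI_foo.elog, if that is where the error is being written).

```shell
# count_mysql_store_errors (hypothetical helper): print how many lines of
# the given log file contain the MySQL storage error quoted above.
count_mysql_store_errors() {
    grep -c 'ERROR: Unable to store response record in MySQL' "$1"
}

# Usage (the log path is an assumption):
#   count_mysql_store_errors /tmp/RTX_OpenAPI_foo.elog
```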
Translator services are required to gather web request telemetry data (on both the client side if the client is a Translator service and on the server side if the server is another Translator service) via OpenTelemetry and to deposit those telemetry data into a Jaeger data collector. ITRB-deployed ARAX and Plover2.0 services transmit their OpenTelemetry data to an ITRB-provided Jaeger service. But ARAX services on arax.ncats.io (our development instance) and our development Plover2.0 instances (when running) send their OpenTelemetry data to a Jaeger service on the EC2 instance jaeger.rtx.ai. Therefore, the jaeger.rtx.ai instance should be kept running at all times. All ITRB instances of ARAX and Plover2.0 send their telemetry data to jaeger-otel-agent.sri.