Installing Terminology Validators - onc-healthit/inferno-community GitHub Wiki
Terminology Support
2020-November-30 UMLS Auth Workaround
IMPORTANT: As of November 2020, UMLS login requires a federated login provider, such as Google, Microsoft, or https://login.gov. This change breaks the automated UMLS download step of the terminology build process outlined below.
As a temporary workaround, download https://download.nlm.nih.gov/umls/kss/2019AB/umls-2019AB-full.zip (note: this file is several GB in size), rename it to umls.zip, and place it in <inferno root>/tmp/terminology. The terminology processing should then skip the download step and continue normally. This also eliminates the need to create the .env file (outlined below).
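The workaround amounts to moving one file into place. A minimal bash sketch, assuming the archive was saved under its original name; the function name stage_umls_zip and the example paths are hypothetical, not part of Inferno:

```bash
# Hypothetical helper: move a manually downloaded UMLS 2019AB archive to
# <inferno root>/tmp/terminology/umls.zip, where the build looks for it.
stage_umls_zip() {
  local downloaded_zip="$1"   # e.g. ~/Downloads/umls-2019AB-full.zip (assumed location)
  local inferno_root="$2"     # path to your Inferno checkout
  local target_dir="$inferno_root/tmp/terminology"

  if [ ! -f "$downloaded_zip" ]; then
    echo "stage_umls_zip: file not found: $downloaded_zip" >&2
    return 1
  fi
  mkdir -p "$target_dir" || return
  # mv rather than cp avoids keeping two copies of a multi-GB file.
  mv "$downloaded_zip" "$target_dir/umls.zip"
}
```

For example: stage_umls_zip ~/Downloads/umls-2019AB-full.zip ~/src/inferno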
We hope to have this login system supported in the near future, which will re-enable the ability to do a completely automated terminology build from start to finish.
For more information on the UTS sign-in process changes, visit https://www.nlm.nih.gov/research/umls/uts-changes.html.
Terminology prerequisites
In order to validate terminologies, Inferno must be loaded with files generated from the Unified Medical Language System (UMLS). The UMLS is distributed by the National Library of Medicine (NLM) and requires an account to access.
Inferno provides rake tasks that simplify this process, as well as a Dockerfile and a docker-compose file that build the validators in a self-contained environment.
Prerequisites:
- A UMLS account
- A working Docker toolchain that has been assigned at least 10 GB of RAM (the Metathesaurus step requires 8 GB of RAM for its Java process)
- Note: the Docker terminology process will not run unless Docker has access to at least this much RAM.
- A copy of the Inferno repository, which contains the required Docker and Ruby files
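Before starting a multi-hour build, it can be worth confirming how much memory Docker actually has. A sketch in bash: docker info --format '{{.MemTotal}}' prints the total memory visible to the Docker engine in bytes. The helper names are hypothetical, and measuring 10 GB in decimal units (10^9 bytes) is an assumption chosen to leave slack for memory the engine reports as reserved.

```bash
# Assumed threshold: 10 GB in decimal units (10^9 bytes), which leaves some
# slack for memory reserved out of the configured allocation.
REQUIRED_DOCKER_BYTES=$((10 * 1000 * 1000 * 1000))

# Succeeds if the given byte count meets the threshold.
meets_memory_requirement() {
  [ "$1" -ge "$REQUIRED_DOCKER_BYTES" ]
}

# Prints the total memory visible to the Docker engine, in bytes.
docker_mem_bytes() {
  docker info --format '{{.MemTotal}}'
}

# Reports whether Docker has enough memory; returns non-zero if it does not.
check_docker_memory() {
  local bytes
  bytes="$(docker_mem_bytes)" || return
  if meets_memory_requirement "$bytes"; then
    echo "Docker has $bytes bytes of memory: OK"
  else
    echo "Docker has only $bytes bytes; assign at least 10 GB in Docker's settings" >&2
    return 1
  fi
}
```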
You can prebuild the terminology Docker image by running the following command:
docker-compose -f terminology_compose.yml build
Once the image is built, add your UMLS username and password to a file named .env at the root of the Inferno project. The file should look like this:
UMLS_USERNAME=<your UMLS username>
UMLS_PASSWORD=<your UMLS password>
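One way to create this file without typing the password into an editor or onto a command line is to write it from a prompt. A minimal bash sketch; the function name write_umls_env is hypothetical, and the values are written verbatim in the KEY=value format shown above, with no added quoting:

```bash
# Hypothetical helper: prompt for UMLS credentials and write them to an env file.
# The password is read without echo, so it does not appear on screen or in
# shell history.
write_umls_env() {
  local env_file="$1"   # e.g. <inferno root>/.env
  local username password
  read -rp "UMLS username: " username || return
  read -rsp "UMLS password: " password || return
  echo   # move to a new line after the silent password prompt
  # umask makes a newly created file owner-only; chmod also tightens an
  # existing file, whose mode a redirection would otherwise leave unchanged.
  ( umask 077 && printf 'UMLS_USERNAME=%s\nUMLS_PASSWORD=%s\n' "$username" "$password" > "$env_file" ) &&
    chmod 600 "$env_file"
}
```

Run from the Inferno root as write_umls_env .env.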
Once that file exists, run the terminology creation task with the following command:
docker-compose -f terminology_compose.yml up
This runs the terminology creation steps in order, using the UMLS credentials supplied in .env. These tasks may take several hours. If the task is cancelled partway through and then restarted, it resumes after the last completed step. Intermediate files are saved to tmp/terminology in the Inferno repository that the Docker Compose job is run from, and the validators are saved to resources/terminology/validators/bloom, where Inferno can use them for validation.
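After the job finishes, a quick check is whether anything landed in the validator directory named above. A bash sketch with a hypothetical function name; it only confirms that the directory exists and is non-empty, not that every validator was built correctly:

```bash
# Hypothetical check: succeeds if resources/terminology/validators/bloom
# exists under the given Inferno root and contains at least one entry.
validators_present() {
  local inferno_root="${1:-.}"
  local dir="$inferno_root/resources/terminology/validators/bloom"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]
}
```

If validators_present . fails, rerunning docker-compose -f terminology_compose.yml up should resume from the last completed step.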
Building the validators without Docker
To build the validators without Docker (on macOS or Linux), run the following commands from the Inferno repository root directory:
export UMLS_USERNAME=<your UMLS username>
export UMLS_PASSWORD=<your UMLS password>
./bin/run_terminology.sh
This runs all of the steps to create the validators on the local system rather than in a Docker container. It requires Ruby to be installed locally and bundle install to have been run in the Inferno root directory.
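A small bash sketch that checks for the tools named above before starting the local build. The function names are hypothetical, and the checked list (ruby, bundle) covers only the requirements stated here; run_terminology.sh may need other tools as well.

```bash
# Prints each named command that is not available, one per line.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Hypothetical wrapper: verify prerequisites, then install gems and run the
# build. Must be called from the Inferno repository root.
run_terminology_locally() {
  local missing
  missing="$(missing_commands ruby bundle)"
  if [ -n "$missing" ]; then
    echo "missing required commands: $missing" >&2
    return 1
  fi
  bundle install && ./bin/run_terminology.sh
}
```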
Manually creating the validators
If you want to manually walk through each step in the validator creation process, detailed instructions for each step are provided below:
Download FHIR ValueSet and CodeSystem resources
Download the FHIR ValueSet and CodeSystem definitions:
bundle exec rake terminology:download_program_terminology
Downloading the UMLS zip file
Inferno provides a task which attempts to download the UMLS .zip file for you:
bundle exec rake terminology:download_umls[username, password]
Note: quote the username and password to avoid problems with special characters. For example:
bundle exec rake terminology:download_umls['jsmith','hunter2!']
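Or, quoting the whole task name, which also stops shells such as zsh from treating the square brackets as a glob pattern: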
bundle exec rake 'terminology:download_umls[jsmith,hunter2!]'
This command requires a valid UMLS username and password. Inferno does not store this information and only uses it to download the necessary files during this step.
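When the credentials are already in shell variables, double-quoting the whole task name passes their contents through without re-expansion. A bash sketch; download_umls is a hypothetical wrapper, not an Inferno task:

```bash
# Hypothetical wrapper around the rake task. Double quotes keep the brackets
# from being globbed and pass the variables' contents (including ! or $)
# through unchanged.
download_umls() {
  local username="$1" password="$2"
  bundle exec rake "terminology:download_umls[${username},${password}]"
}
```

For example: read -rsp "UMLS password: " UMLS_PASSWORD; echo; download_umls jsmith "$UMLS_PASSWORD". Rake separates task arguments with commas, so a password that contains a comma may not reach the task intact; in that case, download the file manually instead.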
If this command fails, the Full Release file can be downloaded directly from the UMLS website (a UMLS account is still required). Inferno's terminology system currently supports the 2019AB UMLS release:
https://www.nlm.nih.gov/research/umls/licensedcontent/umlsarchives04.html#2019AB
Unzipping the UMLS files
The UMLS files must be unzipped before processing, and so must the MetamorphoSys utility (mmsys.zip) included in the UMLS distribution.
Inferno provides a task that attempts to unzip the files into the correct location for the following steps:
bundle exec rake terminology:unzip_umls
Users can also manually unzip the files. The mmsys.zip file should be unzipped to the same directory as the other downloaded files.
See https://www.nlm.nih.gov/research/umls/implementation_resources/metamorphosys/help.html#screens_tabs for more details.
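A sketch of the manual route using the Info-ZIP unzip command. The layout inside umls.zip is not described here, so the sketch locates mmsys.zip with find rather than assuming a path, and unzips it in the directory where it was found, on the assumption that the other release files sit alongside it. The function name is hypothetical.

```bash
# Hypothetical helper: unzip umls.zip in place, then unzip the MetamorphoSys
# archive (mmsys.zip) next to the files it was shipped with.
unzip_umls_manually() {
  local dir="$1"   # e.g. <inferno root>/tmp/terminology
  local mmsys
  unzip -o -q "$dir/umls.zip" -d "$dir" || return
  # Locate mmsys.zip rather than assuming its path inside the release.
  mmsys="$(find "$dir" -name mmsys.zip -print -quit)"
  if [ -z "$mmsys" ]; then
    echo "unzip_umls_manually: mmsys.zip not found under $dir" >&2
    return 1
  fi
  unzip -o -q "$mmsys" -d "$(dirname "$mmsys")"
}
```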
Creating a UMLS Subset
MetamorphoSys customizes and installs UMLS sources. Inferno provides a configuration file and a task to help run it:
bundle exec rake terminology:run_umls
MetamorphoSys can also be run manually.
Note: this step can take a long time to finish.
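Because this step runs for a long time, keeping a log and the elapsed time makes a failure easier to inspect afterward. A generic bash sketch; run_logged is a hypothetical helper, not part of Inferno:

```bash
# Hypothetical helper: run a command, copy its output to a log file, report
# the elapsed minutes on stderr, and return the command's own exit status.
run_logged() {
  local log_file="$1"; shift
  local start status
  start="$(date +%s)"
  "$@" 2>&1 | tee "$log_file"
  status="${PIPESTATUS[0]}"   # exit status of "$@", not of tee
  echo "finished in $(( ($(date +%s) - start) / 60 )) min (exit status $status)" >&2
  return "$status"
}
```

For example: run_logged tmp/run_umls.log bundle exec rake terminology:run_umls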
Loading the subset
Inferno loads the UMLS subset into a SQLite database, which it queries to create the terminology validators. A shell script in the project's bin directory creates the database automatically:
./bin/create_umls.sh
Creating the Terminology Validators
Once the UMLS database has been created, create the terminology validators for Inferno's use:
bundle exec rake terminology:create_vs_validators
Cleaning up
The UMLS distribution is large and is no longer needed by Inferno after processing.
Inferno provides a task that removes the umls.zip file, the unzipped distribution, and the installed subset:
bundle exec rake terminology:cleanup_umls
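For reference, the manual steps above can be chained in order, stopping at the first failure. This is a bash sketch with a hypothetical function name, meant to make the order explicit; ./bin/run_terminology.sh remains the provided way to automate the build. Skipping the download when umls.zip is already present mirrors the November 2020 workaround.

```bash
# Hypothetical sketch: run the manual terminology steps in order from the
# Inferno root, returning at the first failing step.
build_terminology_manually() {
  local need_download=0
  if [ ! -f tmp/terminology/umls.zip ]; then
    need_download=1
    if [ -z "${UMLS_USERNAME:-}" ] || [ -z "${UMLS_PASSWORD:-}" ]; then
      echo "set UMLS_USERNAME and UMLS_PASSWORD, or place umls.zip in tmp/terminology" >&2
      return 1
    fi
  fi
  bundle exec rake terminology:download_program_terminology || return
  if [ "$need_download" -eq 1 ]; then
    bundle exec rake "terminology:download_umls[${UMLS_USERNAME},${UMLS_PASSWORD}]" || return
  fi
  bundle exec rake terminology:unzip_umls || return
  bundle exec rake terminology:run_umls || return
  ./bin/create_umls.sh || return
  bundle exec rake terminology:create_vs_validators || return
  bundle exec rake terminology:cleanup_umls
}
```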