Installing Terminology Validators - onc-healthit/inferno-community GitHub Wiki

Terminology Support

2020-November-30 UMLS Auth Workaround

IMPORTANT: As of November 2020, UMLS login now requires the use of a federated login provider, such as Google, Microsoft, or https://login.gov. This change breaks the first step of the terminology build process outlined below.

As a temporary workaround, download https://download.nlm.nih.gov/umls/kss/2019AB/umls-2019AB-full.zip (note: this file is several GB in size), rename it umls.zip, and place it in <inferno root>/tmp/terminology. The terminology process should then skip the download step and continue normally. This also removes the need to create the .env file (outlined below).
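The rename-and-place part of the workaround can be sketched as a small shell helper. This is only an illustration: the function name `stage_umls_zip` and its arguments are hypothetical, and it assumes you have already downloaded the archive yourself after signing in.

```shell
# Hypothetical helper: copy an already-downloaded UMLS archive to the
# directory the terminology build reads from, under the name umls.zip.
# Usage: stage_umls_zip <downloaded zip> <inferno root>
stage_umls_zip() {
  local src="$1" inferno_root="$2"
  local dest_dir="$inferno_root/tmp/terminology"

  if [ ! -f "$src" ]; then
    echo "UMLS archive not found: $src" >&2
    return 1
  fi

  # Create tmp/terminology if it does not exist yet.
  mkdir -p "$dest_dir" || return 1
  cp "$src" "$dest_dir/umls.zip"
}
```

For example, `stage_umls_zip /path/to/umls-2019AB-full.zip /path/to/inferno` would leave the archive at `/path/to/inferno/tmp/terminology/umls.zip`; both paths here are placeholders.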

We hope to support this login system in the near future, which will restore the ability to run a completely automated terminology build from start to finish.

For more information on the UTS sign-in process changes, visit https://www.nlm.nih.gov/research/umls/uts-changes.html.

Terminology prerequisites

In order to validate terminologies, Inferno must be loaded with files generated from the Unified Medical Language System (UMLS). The UMLS is distributed by the National Library of Medicine (NLM) and requires an account to access.

Inferno provides some rake tasks which may make this process easier, as well as a Dockerfile and docker-compose file that will create the validators in a self-contained environment.

Prerequisites:

  • A UMLS account
  • A working Docker toolchain, which has been assigned at least 10GB of RAM (The Metathesaurus step requires 8GB of RAM for the Java process)
    • Note: the Docker terminology process will not run unless Docker has access to at least this much RAM.
  • A copy of the Inferno repository, which contains the required Docker and Ruby files
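The memory prerequisite can be checked before starting a long build. The sketch below is an assumption-laden illustration: it reads "10GB" as 10 GiB, the names `REQUIRED_BYTES` and `has_enough_memory` are hypothetical, and it relies on `docker info --format '{{.MemTotal}}'` reporting the memory available to Docker in bytes.

```shell
# Minimum memory from the prerequisites: 10GB, read here as 10 GiB
# (an assumption; the prerequisites do not distinguish GB from GiB).
REQUIRED_BYTES=$((10 * 1024 * 1024 * 1024))

# Succeeds if the given byte count meets the requirement.
has_enough_memory() {
  [ "$1" -ge "$REQUIRED_BYTES" ]
}

# Only query Docker if the CLI is installed and the daemon answers.
if command -v docker >/dev/null 2>&1 &&
   mem_bytes="$(docker info --format '{{.MemTotal}}' 2>/dev/null)" &&
   [ -n "$mem_bytes" ]; then
  if has_enough_memory "$mem_bytes"; then
    echo "Docker memory looks sufficient: $mem_bytes bytes"
  else
    echo "Docker reports only $mem_bytes bytes; check its memory setting" >&2
  fi
fi
```

Docker may report slightly less than the amount configured in its settings, so treat a near miss as a prompt to check the configuration, not as a definitive failure.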

You can prebuild the terminology Docker container by running the following command:

docker-compose -f terminology_compose.yml build

Once the container is built, add your UMLS username and password to a file named .env at the root of the Inferno project. The file should look like this:

UMLS_USERNAME=<your UMLS username>
UMLS_PASSWORD=<your UMLS password>
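Because the file holds a password, it may be worth creating it with restrictive permissions. The following is a minimal sketch; `write_umls_env` is a hypothetical helper, not something Inferno provides, and it overwrites any existing .env in the given directory.

```shell
# Hypothetical helper: write UMLS credentials to <inferno root>/.env.
# Usage: write_umls_env <inferno root> <username> <password>
write_umls_env() {
  local inferno_root="$1" username="$2" password="$3"
  local env_file="$inferno_root/.env"

  # umask 077 in a subshell makes a newly created file readable and
  # writable only by the current user, without changing the caller's umask.
  (
    umask 077
    printf 'UMLS_USERNAME=%s\nUMLS_PASSWORD=%s\n' "$username" "$password" > "$env_file"
  )
}
```

Single-quoting the password on the command line, as in `write_umls_env . jsmith 'hunter2!'`, keeps the shell from interpreting special characters.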

Once that file exists, you can run the terminology creation task with the following command:

docker-compose -f terminology_compose.yml up

This will run the terminology creation steps in order, using the UMLS credentials supplied in .env. These tasks may take several hours. If the creation task is cancelled in progress and restarted, it will restart after the last completed step. Intermediate files are saved to tmp/terminology in the Inferno repository that the Docker Compose job is run from, and the validators are saved to resources/terminology/validators/bloom, where Inferno can use them for validation.
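After a run that takes several hours, a quick check that the output directory was populated can be reassuring. This sketch only tests that the directory named above exists and is non-empty; `validators_present` is a hypothetical name, and the check says nothing about whether the validators themselves are complete.

```shell
# Succeeds if <inferno root>/resources/terminology/validators/bloom
# exists and contains at least one entry.
validators_present() {
  local inferno_root="$1"
  local out_dir="$inferno_root/resources/terminology/validators/bloom"
  [ -d "$out_dir" ] && [ -n "$(ls -A "$out_dir" 2>/dev/null)" ]
}

# Run from the Inferno repository root.
if validators_present .; then
  echo "validators found"
else
  echo "no validators found yet" >&2
fi
```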

Building the validators without Docker

To build the validators without using the provided Docker script (in a Mac or Linux environment), run the following commands, from the Inferno repository root directory:

export UMLS_USERNAME=<your UMLS username>
export UMLS_PASSWORD=<your UMLS password>
./bin/run_terminology.sh

This will run through all of the steps to create the validators on the local system, rather than in a Docker container. This step requires that Ruby be installed on your local system and that you have run bundle install in your Inferno root directory.
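A short pre-flight check can catch a missing tool before the script starts. The sketch below assumes that `ruby` and `bundle` are the command names on your system; `missing_commands` is a hypothetical helper.

```shell
# Print each named command that is not on PATH.
# Returns non-zero if at least one is missing.
missing_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      missing=1
    fi
  done
  return "$missing"
}

if missing_commands ruby bundle >/dev/null; then
  echo "ruby and bundle found"
else
  echo "missing tools; run: missing_commands ruby bundle" >&2
fi
```

This only confirms the tools are installed; it does not verify that bundle install has been run.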

Manually creating the validators

If you want to manually walk through each step in the validator creation process, detailed instructions for each step are provided below:

Download FHIR ValueSet and CodeSystem resources

Download the FHIR ValueSet and CodeSystem definitions:

bundle exec rake terminology:download_program_terminology

Downloading the UMLS zip file

Inferno provides a task which attempts to download the UMLS .zip file for you:

bundle exec rake terminology:download_umls[username, password]

Note: the username and password should be entered as quoted strings to avoid issues with special characters. For example:

bundle exec rake terminology:download_umls['jsmith','hunter2!']

Or

bundle exec rake 'terminology:download_umls[jsmith,hunter2!]'
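Both forms above hand rake the same single argument. When the credentials are already in environment variables, the task string can be assembled as in this sketch; `umls_download_task` is a hypothetical helper name, not part of Inferno.

```shell
# Hypothetical helper: build the rake task string from a username and password.
umls_download_task() {
  printf 'terminology:download_umls[%s,%s]' "$1" "$2"
}

# Double-quoting the command substitution passes the whole string to rake
# as one argument, so the shell does not interpret the brackets or
# characters such as "!" in the password:
#
#   bundle exec rake "$(umls_download_task "$UMLS_USERNAME" "$UMLS_PASSWORD")"
```

Rake separates task arguments on commas, so a password that itself contains a comma would likely be split and may not work with this approach.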

This command requires a valid UMLS username and password. Inferno does not store this information and only uses it to download the necessary files during this step.

If this command fails, or you do not have a UMLS account, the Full Release file can be downloaded directly from the UMLS website. Inferno's terminology system currently supports the 2019AB UMLS release.

https://www.nlm.nih.gov/research/umls/licensedcontent/umlsarchives04.html#2019AB

Unzipping the UMLS files

The UMLS files should be decompressed for processing and use. The metamorphoSys utility provided within the UMLS distribution must be unzipped as well.

Inferno provides a task which will attempt to unzip the files into the correct location for further operation:

bundle exec rake terminology:unzip_umls

Users can also manually unzip the files. The mmsys.zip file should be unzipped to the same directory as the other downloaded files.

See https://www.nlm.nih.gov/research/umls/implementation_resources/metamorphosys/help.html#screens_tabs for more details.

Creating a UMLS Subset

The metamorphoSys tool can customize and install UMLS sources. Inferno provides a configuration file and a task to help run the metamorphoSys tool.

bundle exec rake terminology:run_umls

The UMLS tool can also be manually executed.

Note: This step can take a while to finish.

Loading the subset

Inferno loads the UMLS subset into a SQLite database, which is then queried to create the terminology validators. A shell script is provided in the project's bin directory to create the database automatically:

./bin/create_umls.sh

Creating the Terminology Validators

Once the UMLS database has been created, the terminology validators can be created for Inferno's use:

bundle exec rake terminology:create_vs_validators

Cleaning up

The UMLS distribution is large and no longer required by Inferno after processing.

Inferno provides a utility which removes the umls.zip file, the unzipped distribution, and the installed subset:

bundle exec rake terminology:cleanup_umls
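The manual steps can be chained so that a failure stops the sequence. The sketch below is illustrative only: `run_terminology_steps` and the `DRY_RUN` variable are hypothetical, and it assumes umls.zip is already in place, so the download_umls task is left out. It must be run from the Inferno repository root.

```shell
# Run the manual terminology steps in the order described on this page.
# Stops at the first failing step. With DRY_RUN=1 the commands are
# printed but not executed.
run_terminology_steps() {
  local step
  for step in \
    "bundle exec rake terminology:download_program_terminology" \
    "bundle exec rake terminology:unzip_umls" \
    "bundle exec rake terminology:run_umls" \
    "./bin/create_umls.sh" \
    "bundle exec rake terminology:create_vs_validators" \
    "bundle exec rake terminology:cleanup_umls"
  do
    echo "==> $step"
    if [ "${DRY_RUN:-0}" != "1" ]; then
      # Unquoted on purpose: each step is a plain space-separated command.
      $step || return 1
    fi
  done
}
```

Running `DRY_RUN=1 run_terminology_steps` first prints the six commands without touching anything, which is a cheap way to confirm the order before committing to a build that can take hours.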