Migration QuickUMLS 1.3 to 1.4 - Georgetown-IR-Lab/QuickUMLS GitHub Wiki

LevelDB is dead, long live UnQLite!

When opening a QuickUMLS 1.3 installation with QuickUMLS 1.4, you'll see the following message:

>>> import quickumls
>>> matcher = quickumls.QuickUMLS('path/to/1.3/install')

[WARNING] This installation was created with QuickUMLS v.1.3 or earlier, which does not support multiple database backends. For now, I'll assume that leveldb was used as default, implicit assumption will change in future versions of QuickUMLS. More info here: https://github.com/Georgetown-IR-Lab/QuickUMLS/wiki/Migration-QuickUMLS-1.3-to-1.4

This message indicates that the installation was created with a version of QuickUMLS prior to 1.4. The warning was introduced because QuickUMLS now supports two database backends: LevelDB and UnQLite. Here's a brief summary of the similarities and differences between the two:

  • Both LevelDB and UnQLite are used as key-value stores in QuickUMLS for CUIs and semantic types. Both are written in low-level languages (C++ for LevelDB, C for UnQLite) and have Python bindings. Both have similar read and write speeds: in a test with 1 million strings from "0" to "999999", LevelDB completed all writes in 4.2 seconds, while UnQLite took 5.4 seconds; reading all keys back took 2.2 seconds for LevelDB and 2.4 seconds for UnQLite.
  • LevelDB was created by Sanjay Ghemawat and Jeff Dean at Google, and first released in 2011. QuickUMLS uses Python bindings by Google. Compared to UnQLite, LevelDB is more disk efficient; in the test above, it stored the 1M key values in just 808 kilobytes.
  • UnQLite was created by Chems Eddine Mrad at Symisc Systems, and first released in 2012. QuickUMLS uses Python bindings by Charles Leifer. UnQLite is less space efficient (using 75M to store the 1M key values mentioned above), but, unlike LevelDB, it is thread and process safe, meaning that multiple QuickUMLS objects backed by the same installation can be created at the same time.
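
The timing test mentioned above is not reproduced on this page, so the following is only a rough sketch of how such a comparison could be run. The function name `benchmark_store` is hypothetical, and storing each key as its own value is an assumption, since the original test does not say which values were written. The sketch accepts any object that supports item assignment and lookup; a plain `dict` stands in for a real backend here.

```python
import time


def benchmark_store(store, n=1_000_000):
    """Time n writes and n reads against a dict-like key-value store.

    Hypothetical helper: `store` only needs to support store[key] = value
    and store[key]. Keys are the strings "0" .. str(n - 1).
    """
    keys = [str(i) for i in range(n)]

    # Write phase: the value written for each key is an assumption.
    start = time.perf_counter()
    for key in keys:
        store[key] = key
    write_seconds = time.perf_counter() - start

    # Read phase: fetch every key back once.
    start = time.perf_counter()
    for key in keys:
        _ = store[key]
    read_seconds = time.perf_counter() - start

    return write_seconds, read_seconds


if __name__ == "__main__":
    # A plain dict is used as a stand-in backend for illustration only.
    writes, reads = benchmark_store({}, n=100_000)
    print(f"writes: {writes:.2f}s, reads: {reads:.2f}s")
```

To compare the actual backends, an object wrapping each database would be passed in place of the `dict`; timings will vary with hardware and library versions, so the figures quoted above should not be expected to reproduce exactly.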

Currently, the QuickUMLS client still defaults to LevelDB to maintain compatibility with previous installations, but it now raises a warning whenever it falls back on this default.

Remove LevelDB Warning

To remove the LevelDB warning due to an older installation, simply create a file named database_backend.flag in your QuickUMLS installation directory, and write leveldb in the file. Your installation directory should now look as follows:

$ ls -l
total 88K
drwxrwxr-x 4 ubuntu ubuntu 4.0K May 10 00:50 cui-semtypes.db
-rw-rw-r-- 1 ubuntu ubuntu    7 May 10 00:50 database_backend.flag
-rw-rw-r-- 1 ubuntu ubuntu    3 May 10 00:50 language.flag
drwxrwxr-x 2 ubuntu ubuntu  76K May 10 01:03 umls-simstring.db

This will stop the matcher from complaining during initialization.
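
As a convenience, the flag file described above could also be created from Python. This is a minimal sketch: the helper name `write_backend_flag` is hypothetical and not part of QuickUMLS, and the installation path is whatever directory you passed to `quickumls.QuickUMLS`. The file is written without a trailing newline, which matches the 7-byte size shown in the listing above.

```python
from pathlib import Path


def write_backend_flag(install_dir, backend="leveldb"):
    """Write database_backend.flag into a QuickUMLS installation directory.

    Hypothetical helper, not part of the QuickUMLS package.
    """
    flag_path = Path(install_dir) / "database_backend.flag"
    # No trailing newline: "leveldb" is exactly 7 bytes.
    flag_path.write_text(backend)
    return flag_path


# Example usage (path is a placeholder):
# write_backend_flag("path/to/1.3/install")
```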