CASI - sporedata/researchdesigneR GitHub Wiki

General description

The Clinical Abbreviation Sense Inventory (CASI) is a dataset for medical term disambiguation. It comprises 440 of the most frequently used abbreviations and acronyms, selected from 352,267 dictated clinical notes.

For each abbreviation and acronym, senses were drawn from 500 randomly selected instances within the clinical notes, giving 949 senses across the inventory. These senses were lexically aligned with three resources: the Unified Medical Language System (UMLS), Another Database of Abbreviations in Medline (ADAM), and Stedman's Medical Abbreviations, Acronyms & Symbols (4th edition).

A sense inventory (SI) is a collection of abbreviations and acronyms (short forms) paired with their potential meanings (long forms), along with other pertinent information about these terms.
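
To make the short form / long form structure concrete, here is a minimal Python sketch of a sense inventory as a nested mapping. The rows, counts, and the function names `build_sense_inventory` and `majority_sense` are hypothetical and chosen only for illustration; they do not reflect the actual CASI file layout or its annotated frequencies.

```python
from collections import defaultdict

# Hypothetical toy rows: (short form, long form, annotated instance count).
# The counts are made up for illustration and are NOT taken from CASI.
TOY_ROWS = [
    ("RA", "rheumatoid arthritis", 6),
    ("RA", "right atrium", 3),
    ("RA", "room air", 1),
    ("PT", "physical therapy", 4),
    ("PT", "prothrombin time", 1),
]


def build_sense_inventory(rows):
    """Group long forms under each short form with their relative frequency."""
    counts = defaultdict(dict)
    for short_form, long_form, n in rows:
        counts[short_form][long_form] = counts[short_form].get(long_form, 0) + n

    inventory = {}
    for short_form, senses in counts.items():
        total = sum(senses.values())
        inventory[short_form] = {lf: n / total for lf, n in senses.items()}
    return inventory


def majority_sense(inventory, short_form):
    """Return the most frequent long form recorded for a short form."""
    senses = inventory[short_form]
    return max(senses, key=senses.get)


inventory = build_sense_inventory(TOY_ROWS)
print(majority_sense(inventory, "RA"))
```

Picking the most frequent long form, as `majority_sense` does, is only a simple baseline for disambiguation; a real system would also use the context surrounding each abbreviation.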

Related publications

Data access

CASI Dataset