2010 i2b2 VA - sporedata/researchdesigneR GitHub Wiki

General description

The 2010 i2b2/VA dataset is a collection of de-identified clinical notes released for a shared task on extracting information from clinical narratives. It was developed for the 2010 challenge organized jointly by i2b2 (Informatics for Integrating Biology and the Bedside) and the Department of Veterans Affairs (VA), which aimed to foster research in natural language processing (NLP) of clinical data. The task had three parts: identifying medical concepts in clinical text, classifying assertions made about those concepts, and classifying the relations between them.

The 2010 i2b2/VA dataset helped advance NLP in healthcare by providing a shared benchmark for extracting structured information from unstructured clinical text. This kind of extraction supports several areas of healthcare technology, including more functional electronic health records (EHRs), clinical decision support tools, and large-scale medical research.

Dataset Categories

The 2010 i2b2/VA dataset consists of de-identified discharge summaries and other clinical notes obtained from hospital records. The notes were annotated with:

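  • Medical concepts (problems, treatments, tests).
  • Relations between these concepts (for example, whether a treatment improves or worsens a problem, or whether a test reveals one).
  • Assertions about the concepts (for a medical problem: present, absent, possible, conditional, hypothetical, or not associated with the patient).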

The dataset is notable for its real-world applicability: the notes were written by healthcare providers, so they capture the complexities and ambiguities of natural language in a medical context.
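To make the concept layer more concrete, here is a minimal Python sketch that parses one concept annotation. It assumes the line layout `c="<text>" <line>:<token> <line>:<token>||t="<type>"`, which is how the challenge's concept files are commonly described; check this against your own licensed copy of the data before relying on it. The names `Concept`, `parse_concept`, and `CONCEPT_RE` are hypothetical, and the sample line is made up for illustration rather than taken from the dataset.

```python
import re
from typing import NamedTuple

# ASSUMPTION: one concept annotation per line, laid out as
#   c="<text>" <start_line>:<start_token> <end_line>:<end_token>||t="<type>"
# Verify this against the files in your own copy of the dataset.
CONCEPT_RE = re.compile(
    r'^c="(?P<text>.*)" '
    r'(?P<sl>\d+):(?P<st>\d+) (?P<el>\d+):(?P<et>\d+)'
    r'\|\|t="(?P<label>[^"]+)"$'
)


class Concept(NamedTuple):
    """One annotated span (hypothetical container, not part of the dataset)."""
    text: str
    start_line: int
    start_token: int
    end_line: int
    end_token: int
    label: str  # expected to be "problem", "treatment", or "test"


def parse_concept(line: str) -> Concept:
    """Parse a single concept annotation line into a Concept."""
    match = CONCEPT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized concept line: {line!r}")
    return Concept(
        text=match["text"],
        start_line=int(match["sl"]),
        start_token=int(match["st"]),
        end_line=int(match["el"]),
        end_token=int(match["et"]),
        label=match["label"],
    )


# Made-up example line, not taken from the dataset.
example = parse_concept('c="chest pain" 12:3 12:4||t="problem"')
print(example.text, example.label)  # prints: chest pain problem
```

Assertion and relation annotations would need their own patterns, since they carry extra fields (an assertion value, or a second concept and a relation type); the sketch covers only the concept layer.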

Related publications