Support of coding schemes - QIICR/ProjectIssuesAndWiki GitHub Wiki
Objective
It will be important to provide a capability for the user to quickly locate codes while creating structured reports, introduce new codes, and easily create and share private coding schemes as necessary.
Background
DICOM supports a number of coding schemes. These include a subset of SNOMED-CT (the SRT coding scheme) and the DCM coding scheme, which holds codes that were not found in the commonly used terminologies (at least at the time they were introduced to the standard). A subset of SNOMED was selected for inclusion in DICOM because SNOMED provides several ways to encode the same thing (e.g., DICOM always prefers "Structure of [...]" codes over "Entire [...]" codes). In addition, the SNOMED codes included in the standard do not require any license for DICOM users.
While selecting a specific code, it is important to consider the context in which that code is defined in the standard. For example, (111462,DCM,"Solid mass") is defined in the context of ultrasound measurements. This is not obvious to a user who looks up the term in DICOM Part 16 Annex D, which lists the DCM definitions, unless they also search for the context groups in which the code is used, in this case CID 6064 "Ultrasound Findings for Breast". The Annex D definition, "A tumor or lesion", is also inadequate on its own, since it fails to mention the "solid" versus "cystic" distinction.
Several existing resources are available for looking up codes:
- UMLS Metathesaurus: allows searching several terminologies, such as RadLex, SNOMED, and MeSH. Has a nice autocompletion feature.
- BioPortal: includes DCM coding scheme (added by David), supports programmatic search, does not require login or license
Use cases we need to support
- Code exists in DICOM; the user needs to look up that code quickly, either constrained to a certain context or not. We could probably support this by programmatic interaction with the existing repositories, and could also run a separate instance for just the terms included in DICOM (the set of those codes is rather small).
- Code does not exist in DICOM; a suitable code needs to be located in the external resources. Possibly, this lookup should guide the user to select the relationship/group for that code.
- Code does not exist anywhere else either; a private coding scheme should be populated automatically in the background (Slicer default coding scheme, or under a Slicer/QIICR UID), and it should be possible to save, reload, and export that coding scheme for sharing with other users or for proposing amendments to the standard.
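As a rough illustration of the first and third use cases, the Python sketch below keeps a small in-memory list of codes, searches it with an optional context group constraint, and round-trips the list through JSON for saving and sharing. The `Code` and `CodeIndex` names and the JSON layout are assumptions made for this example; they are not part of DICOM or Slicer. The single entry is the "Solid mass" example from the Background section.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Code:
    value: str                   # Code Value
    scheme: str                  # Coding Scheme Designator, e.g. "DCM"
    meaning: str                 # Code Meaning
    definition: str = ""         # plain text definition
    context_groups: tuple = ()   # context group IDs (CIDs) that use the code


class CodeIndex:
    """Hypothetical in-memory store for DICOM or private codes."""

    def __init__(self, codes=()):
        self.codes = list(codes)

    def search(self, text, cid=None):
        # Case-insensitive substring match on Code Meaning,
        # optionally restricted to one context group.
        text = text.lower()
        return [
            c for c in self.codes
            if text in c.meaning.lower()
            and (cid is None or cid in c.context_groups)
        ]

    def to_json(self):
        # Tuples become JSON arrays; from_json() converts them back.
        return json.dumps([asdict(c) for c in self.codes], indent=2)

    @classmethod
    def from_json(cls, text):
        entries = json.loads(text)
        return cls(
            Code(**{**e, "context_groups": tuple(e["context_groups"])})
            for e in entries
        )


index = CodeIndex([
    Code("111462", "DCM", "Solid mass", "A tumor or lesion", (6064,)),
])
hits = index.search("solid", cid=6064)          # constrained to CID 6064
restored = CodeIndex.from_json(index.to_json())  # save and reload
```

The same structure could hold privately generated codes, so that exporting a private coding scheme amounts to writing out the JSON text.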
Considerations for generation of private codes
- A key issue is avoiding collisions of code value.
- A central service (e.g., registry) could be provided, but that assumes constant availability and willingness to share, as well as support for the resource.
- A random number based generation scheme with a low probability of collision could work (e.g., a 20 byte (160 bit) FIPS 140-2 random number would probably be sufficient; see http://docs.oracle.com/javase/7/docs/api/java/security/SecureRandom.html). However, such a long binary value does not fit in a DICOM 16 character Code Value string; even RADIX93 encoding (all printable 7 bit ASCII characters except backslash and space) would require 25 characters (see also http://dclunie.blogspot.com/2013/09/youre-gonna-need-bigger-field-not-radix.html). A 12 byte (96 bit) random number seems to be about the most that can fit in a 16 character string; with RADIX93 encoding it takes 15 characters. An RFC1924 (RADIX85) conversion also fits in 15 characters and might be a better choice, since it is defined in a published RFC (https://tools.ietf.org/html/rfc1924). The open question is whether a 96 bit random number is sufficient to prevent collisions (see http://preshing.com/20110504/hash-collision-probabilities/ and http://en.wikipedia.org/wiki/Birthday_problem).
- If one could generate such codes randomly, then the same Coding Scheme Designator (e.g., "99RANDOM96BITRFC1924", which would need shortening to fit the 16 character limit on Coding Scheme Designator) could be used by every instance following the same pattern.
- The generation should capture the Code Meaning to use (and check that it complies with the length and character constraints), as well as a plain text "definition" and, as a future extension, possibly relationships to "component" codes (perhaps in other schemes, e.g., isA, partOf, etc.).
- A combination of both local (random) generation and a central registry (to which one could optionally submit new codes) might be the best of both worlds, allowing offline use and minimal dependencies while still allowing for sharing.
- If these codes were ever to be hand transcribed, a check digit might be a good idea, to detect entry errors and give immediate feedback. Since a RADIX93 or RADIX85 representation of a 96 bit code takes 15 characters, there would be space in the 16 characters for an extra digit (e.g., LOINC codes are nnnnn-x where x is a check digit, see http://loinc.org/faq/getting-started/structure-of-loinc-codes-and-names/#loinc-code-structure).
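To make the arithmetic above concrete, here is a minimal Python sketch, under stated assumptions, that draws a 96 bit random number, converts it to 15 characters using the RFC 1924 alphabet, and appends one check character to fill the 16 character Code Value. The check character (sum of character indices modulo 85) is a hypothetical choice for illustration only; it is not the algorithm LOINC uses. It catches any single mistyped character but not transposed characters. The function names are likewise assumptions. `collision_probability` applies the usual birthday approximation, 1 - exp(-n(n-1) / 2^(bits+1)).

```python
import math
import secrets

# The 85 character alphabet listed in RFC 1924 (no space, no backslash).
RFC1924_ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)
CODE_BITS = 96
CODE_CHARS = 15  # ceil(96 / log2(85)); 85**15 is slightly above 2**96


def encode_base85(number, width=CODE_CHARS):
    """Convert a non-negative integer to a fixed-width base 85 string."""
    if not 0 <= number < 85 ** width:
        raise ValueError("number does not fit in the requested width")
    digits = []
    for _ in range(width):
        number, remainder = divmod(number, 85)
        digits.append(RFC1924_ALPHABET[remainder])
    return "".join(reversed(digits))


def check_char(body):
    """Hypothetical check character: sum of alphabet indices modulo 85."""
    total = sum(RFC1924_ALPHABET.index(ch) for ch in body)
    return RFC1924_ALPHABET[total % 85]


def new_code_value():
    """Return a 16 character candidate Code Value (15 random + 1 check)."""
    body = encode_base85(secrets.randbits(CODE_BITS))
    return body + check_char(body)


def is_valid(code):
    """Check length, alphabet, and the trailing check character."""
    return (
        len(code) == CODE_CHARS + 1
        and all(ch in RFC1924_ALPHABET for ch in code)
        and check_char(code[:-1]) == code[-1]
    )


def collision_probability(n_codes, bits=CODE_BITS):
    """Birthday approximation for n_codes uniform draws from 2**bits values."""
    return -math.expm1(-n_codes * (n_codes - 1) / (2 * 2 ** bits))


code = new_code_value()
```

Under this approximation, `collision_probability(10**9)` comes out at roughly 6e-12, so even a billion independently generated 96 bit codes would be very unlikely to collide. Whether that margin is acceptable remains the judgment call raised above.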