Use case "Retrospective data encoding" - QIICR/ProjectIssuesAndWiki GitHub Wiki

(raw notes under development)

Objective

To use DICOM to fully represent the datasets collected at Iowa sites, as outlined in the diagram below:

Details

Description of the sample dataset used for development

Use case diagram

An example dataset consists of data for 3 patients with head/neck cancer:

Each patient has a pre-treatment PET/CT scan (scan1) and one or more post-treatment scans (scan2, scan3, ...). In each scan, the tumors and hot lymph nodes were traced manually on the PET images by a radiation oncologist and stored as a labeled volume dataset. By convention, label 1 was used for the primary tumor, label 2 for the hottest lymph node, and labels 3, 4, 5, etc. for other uptake regions:

  • Patient 62: Initially had a primary tumor and 1 hot node. Nothing was identified in scans 2 and 3. In scan 4, one hot node is present.
  • Patient 71: Initially had a primary tumor and 6 hot nodes. In the second scan, the primary tumor consisted of two unconnected parts (both have label 1), and 1 hot node is visible. In the third scan, the primary tumor is gone but 1 hot node remains.
  • Patient 244: Initially had 1 tumor and three hot nodes; all of them were gone in the first post-treatment scan. In the second post-treatment scan, the patient showed a hot node in the lung adjacent to the heart. Note that the label used for this distant node does not correspond to the node with the same label in the pre-treatment scan.
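The labeling convention can be captured as a small lookup when converting label values into segments. This is a minimal sketch: the function name label_role is hypothetical, and it assumes label 0 is background. Because labels are assigned per scan, the mapping describes a role within one scan only and must not be used to link lesions across time points (patient 244 is the counterexample).

```python
def label_role(label: int) -> str:
    """Return the conventional role of a label value in a PT_regions volume.

    Hypothetical helper. Assumes 0 is background. The convention holds
    within one scan only: the same label in two scans need not denote the
    same lesion.
    """
    if label < 1:
        raise ValueError(f"label {label} is background or invalid")
    if label == 1:
        return "primary tumor"
    if label == 2:
        return "hottest lymph node"
    return "other uptake region"


for value in (1, 2, 3):
    print(value, label_role(value))
```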

In addition:

  • For one PET/CT dataset (patient244/scan3), a lymph node in the lung was contoured in the CT dataset (patient) by a technician.

  • For all scans, reference regions for uptake measurement (in the liver, aortic arch, and cerebellum) were identified by automated algorithms. The identified regions are stored as labeled volume datasets.

  • For each of the PET/CT scans, the related data is stored in a separate folder containing:
      ◦ PT.vtk ... the SUVbw-normalized PET scan
      ◦ CT.vtk ... the CT scan of the PET/CT scan
      ◦ PT_regions.nrrd ... labeled volume dataset with tracings of the tumors/lymph nodes in the PET scan
      ◦ CT_regions.nrrd ... only for patient244/scan3: labeled volume dataset (same size/resolution as the CT scan) with tracing of the tumors/lymph nodes in the CT scan
      ◦ cerebellum.nii.gz ... reference region identified for measuring uptake in the cerebellum
      ◦ aorta.nii.gz ... reference region identified for measuring uptake in the aortic arch
      ◦ liver.nii.gz ... reference region identified for measuring uptake in the liver
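Before conversion it can help to verify that each scan folder holds the expected files. A minimal sketch using only the standard library; it assumes the patient<ID>/scan<N> layout suggested by the path patient244/scan3, and the function name inventory is hypothetical. CT_regions.nrrd is treated as optional because it exists for one scan only.

```python
from pathlib import Path

# Expected per-scan contents, taken from the list above.
REQUIRED_FILES = {
    "PT.vtk": "SUVbw-normalized PET scan",
    "CT.vtk": "CT scan of the PET/CT",
    "PT_regions.nrrd": "tumor/lymph node tracings on PET",
    "cerebellum.nii.gz": "cerebellum reference region",
    "aorta.nii.gz": "aortic arch reference region",
    "liver.nii.gz": "liver reference region",
}
# Present only for patient244/scan3 in the sample dataset.
OPTIONAL_FILES = {"CT_regions.nrrd": "tumor/lymph node tracings on CT"}


def inventory(scan_dir):
    """Return (missing required files, optional files present), both sorted."""
    scan_dir = Path(scan_dir)
    missing = sorted(n for n in REQUIRED_FILES if not (scan_dir / n).is_file())
    optional = sorted(n for n in OPTIONAL_FILES if (scan_dir / n).is_file())
    return missing, optional
```

For patient244/scan3 the optional list would be expected to contain CT_regions.nrrd; for every other scan it should be empty.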

This dataset does NOT contain:

  • Quantitative indices derived from the data (e.g., CT-segmentation-based tumor volume, PET SUV average, average tracer uptake of a node relative to the average uptake in the liver reference region, etc.)
  • RECIST-based measurements, SUV_peak location, etc.
  • Co-registered contrast-enhanced CT scan with segmentations

Quantitative indices

See the Quantitative indices page: https://github.com/QIICR/ProjectIssuesAndWiki/wiki/Quantitative-Indices-Extension

Locations of the tumors

Specific locations for the 3 cases

  • patient 62: Tonsil
  • patient 71: Oropharynx
  • patient 244: Tonsil

List of possible primary sites

The codes for these sites have been defined in CID 7601 ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7601.html

  • Oral Cavity
  • Nasopharynx
  • Maxillary Sinus
  • Larynx
  • Lip
  • Supraglottis
  • Pharyngeal Tonsils
  • Hypopharynx
  • Glottis
  • Buccal Mucosa
  • Retromolar Trigone
  • Tonsil
  • Base of Tongue
  • Paranasal Sinus
  • Oral Tongue
  • Floor of Mouth
  • Oropharynx
  • Unknown Primary
  • Nasal Cavity
  • Pyriform Sinus
  • Uvula
  • Lower Alveolar Ridge
  • Salivary Gland

Notes from David

All suggestions below were implemented in https://github.com/QIICR/Iowa2DICOM as of March 2015.

For all generated objects, check the consistency of the various dates/times of the new instances.
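One way to make this check concrete, following the SEG rules stated below (Study Date/Time taken from the segmented images, Series Date/Time from when the segmentation was created), is a small validator. This is a sketch under assumptions: each object is represented as a dict from DICOM keyword to string value (DA as YYYYMMDD, TM as HHMMSS with optional fractional seconds, which are ignored), and the function names are hypothetical.

```python
from datetime import datetime


def _to_datetime(da: str, tm: str) -> datetime:
    """Combine DICOM DA (YYYYMMDD) and TM values, ignoring fractional seconds."""
    hhmmss = tm.split(".")[0].ljust(6, "0")  # "1415" -> "141500"
    return datetime.strptime(da + hhmmss, "%Y%m%d%H%M%S")


def check_derived_dates(source: dict, derived: dict) -> list:
    """Return a list of date/time problems for a derived object (SEG or SR)."""
    problems = []
    # Study Date/Time should be copied from the segmented images.
    for kw in ("StudyDate", "StudyTime"):
        if derived.get(kw) != source.get(kw):
            problems.append(f"{kw} differs from the source images")
    # Series Date/Time records when the derived series was created, so it
    # should not predate the study it belongs to.
    study = _to_datetime(source["StudyDate"], source["StudyTime"])
    series = _to_datetime(derived["SeriesDate"], derived["SeriesTime"])
    if series < study:
        problems.append("SeriesDate/SeriesTime precedes the source study")
    return problems
```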

SEG object

SEG object - attributes in the top-level data set

  • reader: Content Creator's Name (0070,0084)
  • time point: Clinical Trial Time Point ID (0012,0050) (LO) (NB: must be the same for all instances in the same STUDY, which is a reminder that the Study Date/Time should be that of the images that are segmented, not that of when the SEG was made)
  • session: Clinical Trial Series ID (0012,0071) (NB: must be the same for all instances in the SERIES, which is a reminder that the Series Date/Time should be that of when the segmentation (series) was created)

The definition of Clinical Trial Series ID is "An identifier of the series in the context of a clinical trial", which is very non-specific, but since we are not using it for anything else ...
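The two NB constraints above (one Clinical Trial Time Point ID per study, one Clinical Trial Series ID per series) can be checked across a batch of generated objects. A minimal sketch, assuming each instance is a dict keyed by standard DICOM keywords (StudyInstanceUID, SeriesInstanceUID, ClinicalTrialTimePointID, ClinicalTrialSeriesID); the function check_trial_ids is hypothetical.

```python
from collections import defaultdict

# (grouping attribute, attribute that must be constant within each group)
RULES = (
    ("StudyInstanceUID", "ClinicalTrialTimePointID"),  # (0012,0050)
    ("SeriesInstanceUID", "ClinicalTrialSeriesID"),    # (0012,0071)
)


def check_trial_ids(instances):
    """Report studies/series whose trial identifier is not constant."""
    problems = []
    for group_kw, value_kw in RULES:
        values_by_group = defaultdict(set)
        for ds in instances:
            # A missing attribute is recorded as None, so it also counts
            # as a differing value.
            values_by_group[ds[group_kw]].add(ds.get(value_kw))
        for uid in sorted(values_by_group):
            values = values_by_group[uid]
            if len(values) > 1:
                problems.append(
                    f"{value_kw} not constant in {group_kw} {uid}: "
                    f"{sorted(values, key=str)}"
                )
    return problems
```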

Algorithm encoding: per segment:

Question: how to encode anatomical location of the tumor?

A segment allows one to specify anatomy either as the property type, when there is no need for any other property type (e.g., to say that the segment is "liver"), or, if the property type needs to be something other than anatomy, in the Anatomic Region Sequence (e.g., to say that the segment is "primary tumor" or "tumor involvement of lymph node", and the anatomy is "base of tongue" or "parotid lymph node") (ftp://medical.nema.org/medical/dicom/current/output/chtml/part03/sect_C.8.20.4.html#table_C.8.20-4).

For primary tumor, send:

Segmented Property Category = (M-01000, SRT, "Morphologically Altered Structure")
Segmented Property Type = (M-80003, SRT, "Neoplasm, Primary") (from CID 7159 ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7159.html)
Anatomic Region = (T-53131, SRT, "base of tongue")

For lymph node, send:

Segmented Property Category = (M-01000, SRT, "Morphologically Altered Structure")
Segmented Property Type = (M-80006, SRT, "Neoplasm, Secondary") (from CID 7159 ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7159.html)
Anatomic Region = (T-C4140, SRT, "parotid lymph node")
>Anatomic Region Modifier = (G-A100, SRT, "Right")
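In a converter, each coded concept is just a (code value, coding scheme designator, code meaning) triple, and the modifier belongs to the anatomic region item, as the leading ">" indicates. A minimal, stdlib-only sketch of the two segment descriptions above; the classes and the render function are hypothetical and are not a pydicom or Iowa2DICOM API.

```python
from typing import NamedTuple, Optional, Tuple


class Code(NamedTuple):
    value: str
    scheme: str
    meaning: str


class SegmentDescription(NamedTuple):
    category: Code
    property_type: Code
    anatomic_region: Optional[Code] = None
    region_modifiers: Tuple[Code, ...] = ()  # nested under anatomic_region


MORPH_ALTERED = Code("M-01000", "SRT", "Morphologically Altered Structure")

PRIMARY = SegmentDescription(
    category=MORPH_ALTERED,
    property_type=Code("M-80003", "SRT", "Neoplasm, Primary"),
    anatomic_region=Code("T-53131", "SRT", "base of tongue"),
)

RIGHT_PAROTID_NODE = SegmentDescription(
    category=MORPH_ALTERED,
    property_type=Code("M-80006", "SRT", "Neoplasm, Secondary"),
    anatomic_region=Code("T-C4140", "SRT", "parotid lymph node"),
    region_modifiers=(Code("G-A100", "SRT", "Right"),),
)


def fmt(code: Code) -> str:
    return f'({code.value}, {code.scheme}, "{code.meaning}")'


def render(seg: SegmentDescription) -> list:
    """Lines in the notation used on this page."""
    lines = [
        f"Segmented Property Category = {fmt(seg.category)}",
        f"Segmented Property Type = {fmt(seg.property_type)}",
    ]
    if seg.anatomic_region is not None:
        lines.append(f"Anatomic Region = {fmt(seg.anatomic_region)}")
        lines += [f">Anatomic Region Modifier = {fmt(m)}" for m in seg.region_modifiers]
    return lines


print("\n".join(render(RIGHT_PAROTID_NODE)))
```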

Note that different levels of detail can be conveyed using the codes in CID 7600 (ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7600.html). Specifically, the following are of interest if the level of detail in the example above is not available:

  • SRT, T-C4000, lymph node
  • SRT, T-C4100, lymph node of head
  • SRT, T-C4004, lymph node of head and neck

For a background target such as the liver, cerebellum, etc., send:

Segmented Property Category = (T-D000A, SRT, "Anatomical Structure")
Segmented Property Type = (T-62000, SRT, "Liver") (from CID 4030 ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_4030.html or similar)

Alternatively, for a background target used to normalize quantitative indices, one could use the same pattern as for the primary tumor and the nodes, with the anatomy sent separately, a Category of (R-42018, SRT, "Spatial and Relational Concept") and a Type of (C94970, NCIt, "Reference Region"). The latter is not in CID 7165 (ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7165.html), so we should probably add it. Do NOT use (125040, DCM, "Background"), which means something else (ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/chapter_D.html#DCM_125040). For example:

Segmented Property Category = (R-42018, SRT, "Spatial and Relational Concept")
Segmented Property Type = (C94970, NCIt, "Reference Region")
Anatomic Region = (T-62000, SRT, "Liver")
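Both encodings of a reference region can be generated from the same anatomy code. The sketch below (stdlib only; the function reference_region and its explicit flag are hypothetical) returns dicts keyed by the DICOM attribute keywords, with a single Code standing in for each one-item code sequence. It refuses (125040, DCM, "Background"), since that concept means something else.

```python
from typing import NamedTuple


class Code(NamedTuple):
    value: str
    scheme: str
    meaning: str


ANATOMICAL_STRUCTURE = Code("T-D000A", "SRT", "Anatomical Structure")
SPATIAL_RELATIONAL = Code("R-42018", "SRT", "Spatial and Relational Concept")
REFERENCE_REGION = Code("C94970", "NCIt", "Reference Region")
BACKGROUND = Code("125040", "DCM", "Background")  # means something else; never use here


def reference_region(anatomy: Code, explicit: bool = True) -> dict:
    """Segment coding for a reference region located in `anatomy`.

    explicit=True:  Category/Type state "Reference Region"; the anatomy is
                    sent separately as the Anatomic Region.
    explicit=False: the anatomy itself is the Segmented Property Type.
    """
    if anatomy == BACKGROUND:
        raise ValueError("(125040, DCM, 'Background') is not a reference region")
    if explicit:
        return {
            "SegmentedPropertyCategoryCodeSequence": SPATIAL_RELATIONAL,
            "SegmentedPropertyTypeCodeSequence": REFERENCE_REGION,
            "AnatomicRegionSequence": anatomy,
        }
    return {
        "SegmentedPropertyCategoryCodeSequence": ANATOMICAL_STRUCTURE,
        "SegmentedPropertyTypeCodeSequence": anatomy,
    }


liver = Code("T-62000", "SRT", "Liver")
print(reference_region(liver))
```

Only the explicit form tells a receiving application that the segment is a reference region; the anatomy-only form just says "liver".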

This is probably the only way to convey the notion that this is a reference region in the SEG object.

SR object - in the content tree rather than the top-level data set

  • reader: in content tree (minimal use of Observation Context)
  CONTAINER Imaging Measurement Report (TID 1500 row 1)
  > CODE Observer Type = Person (t 1500 r 3; t 1001 r 1; t 1002 r 1)
  > PNAME Person Observer Name = "" (t 1500 r 3; t 1001 r 1; t 1002 r 2; t 1003 r 1)
  • time point: encoded in the content tree with each measurement (not at the top level, since cross-time-point comparisons can be in the same report)
  ... CONTAINER Measurement Group (TID 1410 or 1411 row 1)
  ... > TEXT Time Point = "" (t 1410/1411 r 4; t 1502 r 3)
  • session: there is no good standard place for this in the templates (we should submit a CP). I suggest using a private code at the measurement level rather than the entire report level (so that in the future we can have cross-session comparisons in the same report), e.g.:
  ... CONTAINER Measurement Group (TID 1410 or 1411 row 1)
  ... > TEXT (C67447, NCIt, "Activity Session") = "" (after t 1410/1411 r 4)
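The indented ">" notation maps naturally onto a tree of content items. The sketch below uses a hypothetical Item class (stdlib only; not the TID 1500 implementation in Iowa2DICOM) to build the observer items and a measurement group carrying the time point and the proposed session item, then prints them in the same notation. The intermediate containers between the report and its measurement groups are elided, as in the excerpts above, and the empty strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """One SR content item: value type, concept name, optional value, children."""
    value_type: str
    concept: str
    value: Optional[str] = None
    children: List["Item"] = field(default_factory=list)

    def lines(self, depth: int = 0) -> List[str]:
        # depth 0 has no marker; depth 1 is "> ", depth 2 is ">> ", ...
        prefix = ">" * depth + " " if depth else ""
        text = f"{prefix}{self.value_type} {self.concept}"
        if self.value is not None:
            text += f" = {self.value}"
        result = [text]
        for child in self.children:
            result.extend(child.lines(depth + 1))
        return result


report = Item("CONTAINER", "Imaging Measurement Report", children=[
    Item("CODE", "Observer Type", "Person"),
    Item("PNAME", "Person Observer Name", '""'),
])
# Measurement groups live further down the tree (elided as "..." above).
group = Item("CONTAINER", "Measurement Group", children=[
    Item("TEXT", "Time Point", '""'),
    Item("TEXT", '(C67447, NCIt, "Activity Session")', '""'),
])
print("\n".join(report.lines() + group.lines()))
```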

"NCIt" (yes, the t is lowercase) is the coding scheme designator for the NCI Thesaurus. C67447 is the best match I could find for the concept in the UMLS, though it is not specific to a "reporting session"; its definition is "Time, period or term devoted to some activity". We could define a more specific private code instead.

Alternatively (or additionally) we could add it at the report rather than measurement level (i.e., factored out when common), e.g.:

  CONTAINER Imaging Measurement Report (TID 1500 row 1)
  > TEXT (C67447, NCIt, "Activity Session") = "" (after t 1500 r 3 perhaps)

I am not sure whether we should also send in the SR the same top-level attributes that are used in the SEG; they would be redundant with the content tree, but I definitely think they should be in the tree. The Content Creator's Name is not in the standard SR IOD, but it would be harmless to add it. I do not think we should bother adding the Verifying Observer Sequence, Author Observer Sequence, or Participant Sequence at this late stage.

Also, we need to decide what date to use for the "Study" that the SR applies to. If each SR is about only one time point, then the SR (like the SEG) can use the Study Date of the images; if it contained cross-time-point content, that would not be the case. <- Indeed, the current measurements we are encoding are for a single time point.

As for anatomy in the SR, this is conveyed either with each measurement, or factored out into each measurement group, and encoded in (G-C0E3, SRT, "Finding Site") (+/- the (G-C171, SRT, "Laterality") modifier) (see TID 1419 ftp://medical.nema.org/medical/dicom/current/output/chtml/part16/chapter_A.html#sect_TID_1419).

To convey whether the measurement is of a primary or secondary tumor or a reference region, we could use a modifier of the NUM Measurement itself (TID 1419 row 6), though this is not ideal, since really we want to modify the lesion itself (i.e., the item of the Measurement Group that has a Tracking Identifier).

Another less palatable option would be to use the modifier of Finding Site, (G-A1F8, SRT, "Topographical modifier"), but this is intended to be used for things like proximal and distal, etc., i.e., to refine the description of the anatomy (not the intended use or other property).

This is a bit of a gap in our measurement SR templates, and we should probably submit a CP to add some sort of "finding type" as a sibling of the Tracking Identifier; I suggest we reuse (121071, DCM, "Finding") for this.

So:

For primary tumor, send:

  ... CONTAINER Measurement Group (TID 1411 row 1)
  ... > TEXT Tracking Identifier (TID 1419 row 2)
  ...
  ... > CODE Finding = (M-80003, SRT, "Neoplasm, Primary")  (new; needs CP to add)
  ...
  ... > CODE Measurement Method (TID 1419 row 1)
  ... > CODE Finding Site = (T-53131, SRT, "base of tongue") (TID 1419 row 2)
  ... > NUM ...
  ...

For lymph node, send:

  ... CONTAINER Measurement Group (TID 1411 row 1)
  ... > TEXT Tracking Identifier (TID 1419 row 2)
  ...
  ... > CODE Finding = (M-80006, SRT, "Neoplasm, Secondary")  (new; needs CP to add)
  ...
  ... > CODE Measurement Method (TID 1419 row 1)
  ... > CODE Finding Site = (T-C4140, SRT, "parotid lymph node") (TID 1419 row 2)
  ... >> CODE Laterality = (G-A100, SRT, "Right")
  ... > NUM ...
  ...

For background target for liver, cerebellum, etc., send:

  ... CONTAINER Measurement Group (TID 1411 row 1)
  ... > TEXT Tracking Identifier (TID 1419 row 2)
  ...
  ... > CODE Finding = (C94970, NCIt, "Reference Region")  (new; needs CP to add)
  ...
  ... > CODE Measurement Method (TID 1419 row 1)
  ... > CODE Finding Site = (T-62000, SRT, "Liver")
  ... > NUM ...
  ...
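The three variants differ only in the Finding code, the Finding Site and an optional Laterality modifier, so a converter could generate them from one function. A stdlib-only sketch with hypothetical names; the Finding item is the proposed concept that still needs a CP, and the Measurement Method and NUM items are omitted for brevity. The tracking identifier "node 2" is made up for illustration.

```python
from typing import List, Optional, Tuple

Code = Tuple[str, str, str]  # (code value, coding scheme designator, code meaning)


def fmt(code: Code) -> str:
    return f'({code[0]}, {code[1]}, "{code[2]}")'


def measurement_group(tracking_id: str, finding: Code, site: Code,
                      laterality: Optional[Code] = None) -> List[str]:
    """Return the group's header items in the page's '>' notation."""
    lines = [
        "CONTAINER Measurement Group",
        f'> TEXT Tracking Identifier = "{tracking_id}"',
        f"> CODE Finding = {fmt(finding)}",  # proposed item, needs a CP
        f"> CODE Finding Site = {fmt(site)}",
    ]
    if laterality is not None:
        # Laterality modifies the Finding Site, hence one level deeper.
        lines.append(f">> CODE Laterality = {fmt(laterality)}")
    return lines


node = measurement_group(
    "node 2",
    ("M-80006", "SRT", "Neoplasm, Secondary"),
    ("T-C4140", "SRT", "parotid lymph node"),
    laterality=("G-A100", "SRT", "Right"),
)
print("\n".join(node))
```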

UIDs

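2.25-style UIDs can easily be generated with pydicom; the value is random and differs on every call (in older releases, where the package was imported as dicom, the call was dicom.UID.generate_uid(None)):

      >>> from pydicom.uid import generate_uid
      >>> generate_uid(prefix=None)
      '2.25.31215762025423160614120088028604965760'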
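Without pydicom, the same kind of UID can be derived with the standard library: the 2.25 root is followed by the decimal integer value of a UUID (DICOM PS3.5, Annex B.2). A minimal sketch; the function name uuid_uid is hypothetical. A 128-bit integer has at most 39 decimal digits, so the result stays well under the 64-character UID limit.

```python
import uuid


def uuid_uid() -> str:
    """Derive a DICOM UID under the 2.25 root from a random (version 4) UUID."""
    return "2.25." + str(uuid.uuid4().int)


print(uuid_uid())
```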