Exploration of data-related issues arising from MONAI Label and Deploy
For example, one question that has come up: how to represent annotations on videos
Access and storage of data (from Evaluation and Benchmarking workgroup)
Call for additional interested community members to Data WG
Plan monthly meetings
Minutes
New member welcome, review of Data Working Group charter
Data Working Group regular meetings scheduled for 2nd Tuesday of each month in 2022, at 8am PT starting August 9
No new Data WG tagged issues or discussions on GitHub
Tabular data
We have to strike a balance between researchers bringing CSVs in tabular form and structured data coming from devices / instruments / modalities
Review of the types of data beyond imaging data, and how each is represented both in a research environment and ‘raw’ from the instrument
Aside from the data points themselves, the point in time of each clinical event is notable
“Data Module”
What if we had a module that represents data, that can be represented on disk, and can have data “loaded” into it with APIs? Something akin to a database, with an API that permits loading data via a CSV, or DICOM, or FHIR, or OMOP?
Promotes reusability as the data consistency could be guaranteed (e.g., MD5 checksum)
Data module could be “frozen” with a ledger with blockchain
Current state of the art is that specific datasets are themselves being assigned DOIs (“digital object identifiers”, similar to how a DOI identifies a paper in a journal)
Have we seen DOI-driven datasets used in MONAI as of yet?
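A minimal sketch of the “Data Module” idea, to make the discussion concrete. Everything here is hypothetical: the DataModule class and its load_csv, digest, freeze and verify methods are illustrative names only and are not part of MONAI. It shows only CSV loading and an MD5 digest over a canonical serialization; DICOM, FHIR and OMOP loaders and the ledger are left out.

```python
import csv
import hashlib
import io
import json


class DataModule:
    """Hypothetical data container (illustration only, not a MONAI API)."""

    def __init__(self):
        self.records = []          # list of dicts, one per loaded row
        self.frozen_digest = None  # set once the module is frozen

    def load_csv(self, text):
        """Append rows parsed from CSV text; refused once frozen."""
        if self.frozen_digest is not None:
            raise RuntimeError("module is frozen")
        reader = csv.DictReader(io.StringIO(text))
        self.records.extend(dict(row) for row in reader)

    def digest(self):
        """MD5 over a canonical JSON serialization of the records."""
        payload = json.dumps(
            self.records, sort_keys=True, separators=(",", ":")
        ).encode("utf-8")
        return hashlib.md5(payload).hexdigest()

    def freeze(self):
        """Record the current digest so later changes can be detected."""
        self.frozen_digest = self.digest()
        return self.frozen_digest

    def verify(self):
        """True only if frozen and the content still matches the digest."""
        return self.frozen_digest is not None and self.digest() == self.frozen_digest


module = DataModule()
module.load_csv("patient_id,heart_rate\np1,72\np2,80\n")
frozen = module.freeze()
```

MD5 is used because it is the checksum named in the discussion; it catches accidental changes, but a stronger hash such as SHA-256 (hashlib.sha256) would be the usual choice if deliberate tampering is a concern.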
Community engagement
Discussion on engaging the community through a survey to ensure their needs are being met
Sample questions
Supported data formats in MONAI?
How well are data formats working in MONAI? What are the gaps?
What datasets are predominant and what file formats are they using?
Do we need additional file formats?
What would you like to see next?
Is documentation on formats adequate?
ACTION ITEM: (Brad) Start a Google Doc to gather survey questions; Brad to share the doc by end of week, and MONAI Data WG to add suggestions; review and approve for August 9
OMOP
Good structure, and makes data portable; however OMOP can be quite large with a heavy implementation
Are there samples available that could be used to connect with MONAI?
Should we reach out to the OMOP community to see whether someone there would be interested in building a connector for MONAI?
Video annotation
Generally this means looking at one or many frames and marking whether specific targets are in the frame
There isn’t a “standard” interoperable representation geared toward medical videos
Annotations across a video could get to be quite large (e.g., a 60-minute video at 60 fps has 216,000 frames, so a manual segmentation covering the time a surgical tool is in frame could span tens of thousands of frames; is compression going to be an issue?)
What’s the expected behavior if MONAI Label connects to a video feed?
How does time play a role, particularly in synchronizing disparate streams of data? (E.g., heart rate and O2 saturation measured on an instrument form one stream and the video forms another; the two must be in sync even if they start at different times)
Mixed annotations outside of time could also be a factor; e.g., taking into account a CT study and a video at the same time
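The synchronization question above can be sketched as follows. This is an illustration under stated assumptions, not an agreed design: it assumes both streams carry timestamps in seconds against a shared clock, and the helper names frame_times and latest_sample_at are hypothetical. Each video frame is paired with the most recent instrument sample at or before the frame time, so the streams can start at different times.

```python
from bisect import bisect_right


def frame_times(start_s, fps, n_frames):
    """Timestamps (seconds on the shared clock) for each video frame."""
    return [start_s + i / fps for i in range(n_frames)]


def latest_sample_at(sample_times, sample_values, t):
    """Most recent sample at or before time t, or None if none exists yet.

    sample_times must be sorted in ascending order.
    """
    i = bisect_right(sample_times, t)
    return sample_values[i - 1] if i > 0 else None


# Video starts at t = 10.0 s; 2 fps is used to keep the example small.
video_t = frame_times(10.0, 2.0, 5)

# Heart-rate stream starts slightly later and samples once per second.
hr_t = [10.4, 11.4, 12.4]
hr_v = [72, 75, 74]

# One heart-rate value (or None) per video frame.
aligned = [latest_sample_at(hr_t, hr_v, t) for t in video_t]
```

Frames that precede the first instrument sample get None rather than a guessed value; whether to interpolate or to take the nearest sample instead is a design choice the group would still need to make.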
Next call
(Carole) Lead discussion on “Access and storage of data” from Evaluation and Benchmarking Workgroup
(Brad) Present draft survey and get approval for its use from the committee
(Group) Review of Data WG tagged issues and discussions on GitHub