Discuss desired data structure pipeline from MONAI 0.2 onward
Check in with the Data Working Group on additional scope needing definition
Minutes
Brad to prepare content for next week’s MONAI board meeting on activities / motion to adopt
Update from joint discussion with Evaluation, Reproducibility & Benchmarking workgroup
Effort to get challenge data structured properly for sharing
Data Working Group looks at the “what” of the data coming in
Difficult to isolate pure data properties related to the experiments
E.g., although the benchmarking group is looking at this problem via cross-validation, some data fields like “type of scanner” might fall outside the workgroup’s scope
Can DICOM help?
Potentially - there are fields that could be populated - but this isn’t a solution for all kinds of data - what about when we leave DICOM?
E.g., what about when it is NIfTI - no structured metadata is included, and there’s only a free-text header field limited to 80 characters.
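To illustrate the constraint, a minimal sketch of packing free-text metadata into an 80-byte field, as in the NIfTI-1 `descrip` header (the helper and the metadata string here are illustrative, not a full NIfTI writer):

```python
# Illustrative sketch: the NIfTI-1 header reserves a single free-text
# field ("descrip") of 80 bytes -- any richer metadata must be squeezed
# into it and is silently truncated.

DESCRIP_BYTES = 80  # size of the NIfTI-1 descrip field

def pack_descrip(text: str) -> bytes:
    """Encode free text into a fixed 80-byte, NUL-padded field."""
    raw = text.encode("ascii", errors="replace")[:DESCRIP_BYTES]
    return raw.ljust(DESCRIP_BYTES, b"\x00")

meta = ("scanner=Siemens 3T; protocol=T1w MPRAGE; site=hospital-A; "
        "subject=anon-0042; notes=motion artifacts in slices 10-14")
packed = pack_descrip(meta)
print(len(packed))  # always 80
print(packed.rstrip(b"\x00").decode("ascii"))  # note the lost tail
```

Anything past the 80th byte (here, the acquisition notes) is simply gone, which is why experiment-level metadata cannot live in the NIfTI header alone.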
Compilation task should give researchers the ability to extract data painlessly
Reproducible IO
I/O Working Group looking at emitting a file that can be used to repeat a training session
Look at MLflow (https://mlflow.org/) - it captures the entire environment and workflow down to the network architecture
Some networks may not need to be reproducible
Also consider factors that affect reproducibility like hardware and drivers
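As a sketch of the kind of environment capture MLflow performs (a hand-rolled stand-in using only the standard library, not MLflow’s API), one can snapshot the software and hardware facts that affect reproducibility:

```python
import json
import platform
import sys

def capture_environment() -> dict:
    """Snapshot factors that can affect run-to-run reproducibility."""
    return {
        "python": sys.version.split()[0],  # interpreter version
        "platform": platform.platform(),   # OS and kernel
        "machine": platform.machine(),     # CPU architecture
        "processor": platform.processor(), # may be empty on some OSes
        # A real pipeline would also record GPU model, driver version,
        # CUDA/cuDNN versions, and pinned package versions.
    }

snapshot = capture_environment()
print(json.dumps(snapshot, indent=2))
```

Writing such a snapshot out alongside a training run is the minimum needed to later ask whether a hardware or driver difference explains a non-reproducible result.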
Compilation pipeline - is it possible to detect when data was transformed “differently” from how someone else transformed the data?
E.g., a warning on "damaging the data"
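One way to detect a divergent transform (a hypothetical fingerprinting scheme, not an existing MONAI feature) is to hash the transformed output and compare it against a reference hash published with the experiment:

```python
import hashlib
import warnings

def fingerprint(data: bytes) -> str:
    """Stable hash of transformed data, used as a reference fingerprint."""
    return hashlib.sha256(data).hexdigest()

def check_transform(data: bytes, reference: str) -> bool:
    """Warn when data was transformed differently than the reference run."""
    actual = fingerprint(data)
    if actual != reference:
        warnings.warn(
            f"transform mismatch: {actual[:12]} != {reference[:12]} "
            "-- data may have been damaged"
        )
        return False
    return True

original = b"\x00\x01\x02\x03"
ref = fingerprint(original)
check_transform(original, ref)           # same transform: silent
check_transform(b"\x00\x02\x04\x06", ref)  # altered data: warns
```

This only flags byte-level divergence; tolerating benign differences (e.g. float rounding) would need a looser comparison than an exact hash.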
MSD in MONAI 0.2
Random seed feature - training data is separated into validation / cross-validation splits
Implemented "import MSD dataset" - this pulls the data and parses the JSON
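A minimal sketch of both points - parsing an MSD-style dataset.json and deriving a deterministic validation split from a random seed (the JSON fragment and the split helper are illustrative, not MONAI’s implementation):

```python
import json
import random

# An MSD-style dataset.json fragment ("training" holds image/label pairs).
DATASET_JSON = """
{
  "name": "Task09_Spleen",
  "training": [
    {"image": "imagesTr/spleen_01.nii.gz", "label": "labelsTr/spleen_01.nii.gz"},
    {"image": "imagesTr/spleen_02.nii.gz", "label": "labelsTr/spleen_02.nii.gz"},
    {"image": "imagesTr/spleen_03.nii.gz", "label": "labelsTr/spleen_03.nii.gz"},
    {"image": "imagesTr/spleen_04.nii.gz", "label": "labelsTr/spleen_04.nii.gz"}
  ]
}
"""

def split_training(records, val_frac=0.25, seed=0):
    """Deterministically separate training records into train/val sets."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_frac))
    return shuffled[n_val:], shuffled[:n_val]

records = json.loads(DATASET_JSON)["training"]
train, val = split_training(records, val_frac=0.25, seed=42)
print(len(train), len(val))  # 3 1
```

Because the seed fully determines the shuffle, re-running with the same seed reproduces the exact same train/validation partition.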