Minutes: Data Working Group, 30 Jul 2020

Agenda

Review MONAI GitHub wiki for minutes and notes
Review previous meeting minutes
Review discussions/presentations from joint working group and steering committee
Plan next steps

Minutes

The joint effort between the Data workgroup and the Evaluation, Reproducibility & Benchmarking workgroup is to model samples that come from challenges and papers
- E.g., review the surgical data examples from the papers referenced on the previous minutes page
- The Data workgroup should prototype the data structure and schema in FHIR
  - (Brad) Synthesize a FHIR resource based on the papers
- MONAI should evaluate which Python FHIR library can effectively convert "FHIR to Tensor"
  - (Brad) Look into Python libraries and share
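As a starting point for the "FHIR to Tensor" question, here is a minimal sketch that uses only the Python standard library and hand-written FHIR-style JSON. No particular Python FHIR library is assumed, since choosing one is still an open action item. The sketch pivots Observation resources into a per-patient feature matrix (one row per patient, one column per LOINC code). The bundle contents, patient IDs, choice of LOINC codes, and the `observations_to_matrix` helper are all hypothetical and only for illustration. A real pipeline would then hand the nested list to a tensor constructor such as `torch.tensor(matrix)`.

```python
import json
import math
from typing import Any, Dict, List, Sequence, Tuple

LOINC = "http://loinc.org"

# Hypothetical FHIR R4 Bundle with hand-written Observation resources
# (LOINC 8480-6 = systolic, 8462-4 = diastolic blood pressure).
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Observation", "status": "final",
                  "subject": {"reference": "Patient/p1"},
                  "code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
                  "valueQuantity": {"value": 120, "unit": "mmHg"}}},
    {"resource": {"resourceType": "Observation", "status": "final",
                  "subject": {"reference": "Patient/p1"},
                  "code": {"coding": [{"system": "http://loinc.org", "code": "8462-4"}]},
                  "valueQuantity": {"value": 80, "unit": "mmHg"}}},
    {"resource": {"resourceType": "Observation", "status": "final",
                  "subject": {"reference": "Patient/p2"},
                  "code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
                  "valueQuantity": {"value": 135, "unit": "mmHg"}}}
  ]
}
"""


def observations_to_matrix(
    bundle: Dict[str, Any], codes: Sequence[str]
) -> Tuple[List[str], List[List[float]]]:
    """Pivot Observation.valueQuantity values into a patients x codes matrix.

    Rows are sorted patient references; columns follow the order of `codes`.
    Missing measurements become NaN so downstream code can mask or impute them.
    """
    values: Dict[str, Dict[str, float]] = {}
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Observation" or "valueQuantity" not in res:
            continue  # skip Patient and other resources without a numeric value
        patient = res["subject"]["reference"]
        for coding in res["code"].get("coding", []):
            if coding.get("system") == LOINC and coding.get("code") in codes:
                values.setdefault(patient, {})[coding["code"]] = float(res["valueQuantity"]["value"])
    patients = sorted(values)
    matrix = [[values[p].get(c, math.nan) for c in codes] for p in patients]
    return patients, matrix


patients, matrix = observations_to_matrix(json.loads(BUNDLE_JSON), ["8480-6", "8462-4"])
print(patients)  # ['Patient/p1', 'Patient/p2']
print(matrix)    # [[120.0, 80.0], [135.0, nan]]
```

Filling gaps with NaN, rather than dropping patients, keeps the row count stable. It also leaves the decision about how to handle missing measurements to the training code.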
MLflow
- Explore MLflow as a potential model lifecycle management tool; details from the website (https://mlflow.org/) include:
  - MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
  - MLflow Tracking: Record and query experiments: code, data, config, and results
  - MLflow Projects: Package data science code in a format to reproduce runs on any platform
  - MLflow Models: Deploy machine learning models in diverse serving environments
  - Model Registry: Store, annotate, discover, and manage models in a central repository
- Question: How does MLflow intersect with MONAI?
- Question: Does MONAI plug into MLflow, and how does it integrate with the ecosystem?
- Which group should explore this? Possibly the reproducibility group
Integrations and Partners
- What about H2O.ai? H2O.ai has AutoML and reproducibility tooling
- Should MONAI form an ad hoc "partners" workgroup to look at integrations within the broader community (e.g., AWS)?
Feedback from engineering
- The dev team should look at slide 7 of the joint working group materials
- Should look at cross-validation: how to stratify the data and repeat the training workflow
  - Validation here means reporting the performance of the trained model
  - E.g., repeat validation 5 times to get an unbiased assessment of model quality
- Currently MONAI 0.2 only supports specifying a validation fraction of the data (the only parameter available for now); this needs to be expanded so users can define a fixed validation set rather than relying only on a random seed
- MSD (Medical Segmentation Decathlon): currently only the JSON file originally provided by the challenge organizers is used
  - Look at the proposal for a FHIR specification; e.g., a converter from MSD to a FHIR format is needed (could be a utility library)
  - Can the underlying data representation be normalized? Needs an experiment
- Evaluation: MSD is read-only, so how do you filter studies by predicate matching?
  - Use search terms to subset the data into a virtual collection (e.g., by patient age range)
  - What about holdout data? How can it be selected consistently?
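The cross-validation, fixed-split and holdout questions above can be made concrete with a small, dependency-free sketch. This is not existing MONAI 0.2 API, and all function names are hypothetical. It assumes each study is a plain dict with illustrative `id`, `age` and `label` fields. The MSD `dataset.json` itself carries no patient age, so that metadata would have to come from elsewhere, for example FHIR resources. The sketch does three things:

- It subsets studies into a virtual collection with a search predicate, without modifying the read-only source list.
- It assigns a holdout set by hashing each study ID. This hash-based split is one common technique, swapped in here for illustration. Holdout membership then depends on neither list order nor a random seed.
- It deals the remaining studies into stratified folds after a seeded shuffle, so the same seed always reproduces the same folds.

```python
import hashlib
import random
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence, Tuple

Study = Dict[str, Any]  # hypothetical study record: {"id": ..., "age": ..., "label": ...}


def virtual_collection(studies: Sequence[Study], predicate: Callable[[Study], bool]) -> List[Study]:
    """Select the studies matching a search predicate; the source collection is left untouched."""
    return [s for s in studies if predicate(s)]


def is_holdout(study_id: str, holdout_percent: int = 20) -> bool:
    """Stable holdout membership: depends only on the study ID, not on list order or a seed."""
    bucket = int(hashlib.sha256(study_id.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < holdout_percent


def stratified_folds(studies: Sequence[Study], k: int, seed: int) -> List[List[Study]]:
    """Shuffle each stratum with a fixed seed, then deal studies round-robin into k folds."""
    rng = random.Random(seed)
    by_label: Dict[Any, List[Study]] = defaultdict(list)
    for st in sorted(studies, key=lambda st: st["id"]):  # sort so input order does not matter
        by_label[st["label"]].append(st)
    folds: List[List[Study]] = [[] for _ in range(k)]
    dealt = 0
    for label in sorted(by_label, key=str):
        group = by_label[label]
        rng.shuffle(group)
        for st in group:
            folds[dealt % k].append(st)  # continue across strata so fold sizes differ by at most 1
            dealt += 1
    return folds


def cv_splits(folds: List[List[Study]]) -> List[Tuple[List[Study], List[Study]]]:
    """One (train, validation) pair per fold: validate on fold i, train on the rest."""
    return [
        ([st for j, fold in enumerate(folds) if j != i for st in fold], folds[i])
        for i in range(len(folds))
    ]


# Hypothetical study metadata, generated only to exercise the functions above.
studies = [
    {"id": f"case_{n:03d}", "age": 30 + n % 50, "label": "normal" if n % 3 == 0 else "tumor"}
    for n in range(60)
]
cohort = virtual_collection(studies, lambda st: 40 <= st["age"] <= 70)
holdout = [st for st in cohort if is_holdout(st["id"])]
development = [st for st in cohort if not is_holdout(st["id"])]
folds = stratified_folds(development, k=5, seed=0)
for train, val in cv_splits(folds):
    print(len(train), len(val))
```

Because the holdout is keyed on the study ID, adding new studies to the collection later does not move existing studies into or out of the holdout set. The seed only controls how the remaining studies are distributed across the k folds.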

Action Items

(Brad) Synthesize a FHIR resource based on the papers
(Brad) Look into Python libraries and share
(Brad) Share PowerPoint slides and create GitHub issues to track directions for growth