Minutes Data Working Group 30 Jul 2020 - Project-MONAI/MONAI GitHub Wiki
Agenda
Review MONAI GitHub wiki for minutes and notes
Review previous meeting minutes
Review discussions/presentations from joint working group and steering committee
Plan next steps
Minutes
The joint effort between the Data and the Evaluation, Reproducibility & Benchmarking workgroups is to model samples that come from challenges and papers
E.g., review the surgical data examples from papers referenced in previous minutes page
Data workgroup should create a prototype of structure and schema in FHIR
(Brad) Synthesize a FHIR resource based on the papers
MONAI should explore which Python FHIR library to adopt, i.e., one that can effectively convert "FHIR to Tensor"
(Brad) Look into Python libraries and share
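To make the "FHIR to Tensor" idea concrete, here is a minimal sketch using only the Python standard library. It assumes hand-written Patient and Observation resources as plain dicts; the example values, the `fhir_to_features` helper, and the choice of features are hypothetical and not a decision of the workgroup. A dedicated FHIR library would replace the dict handling.

```python
from datetime import date

# Hypothetical, hand-written example resources. The field names follow the
# public FHIR Patient and Observation definitions; the values are made up.
patient = {
    "resourceType": "Patient",
    "id": "example-1",
    "gender": "female",
    "birthDate": "1970-07-30",
}
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Body weight"},
    "subject": {"reference": "Patient/example-1"},
    "valueQuantity": {"value": 72.5, "unit": "kg"},
}

# Codes of the FHIR administrative-gender value set, used for one-hot encoding.
GENDER_CODES = ["male", "female", "other", "unknown"]


def age_in_years(birth_date: str, on: date) -> int:
    """Whole years between an ISO birth date and a reference date."""
    born = date.fromisoformat(birth_date)
    # Subtract one if the birthday has not yet occurred in the reference year.
    return on.year - born.year - ((on.month, on.day) < (born.month, born.day))


def fhir_to_features(patient: dict, observation: dict, on: date) -> list:
    """Flatten one patient and one observation into a numeric feature vector."""
    one_hot = [1.0 if patient.get("gender") == g else 0.0 for g in GENDER_CODES]
    age = float(age_in_years(patient["birthDate"], on))
    value = float(observation["valueQuantity"]["value"])
    return [age, value] + one_hot


features = fhir_to_features(patient, observation, on=date(2020, 7, 30))
print(features)
```

The resulting list is a flat numeric vector, so it could be handed to a tensor constructor such as `torch.tensor(features)` in a training pipeline.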
MLflow
Explore MLflow as a potential model lifecycle management tool; details from the website (https://mlflow.org/) include:
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
MLflow Tracking: Record and query experiments: code, data, config, and results
MLflow Projects: Package data science code in a format to reproduce runs on any platform
MLflow Models: Deploy machine learning models in diverse serving environments
Model Registry: Store, annotate, discover, and manage models in a central repository
Question: How does MLflow intersect with MONAI?
Question: Does MONAI plug into MLflow, and how does it integrate with the ecosystem?
Which group should explore this? Might be something for the reproducibility group
Integrations and Partners
What about H2O.ai? - H2O.ai has AutoML and reproducibility tooling
Should there be a "partners" ad-hoc workgroup for MONAI to look at integrations within the broader community? - e.g., what about AWS?
Feedback from engineering
Dev team should look at slide 7 of the joint working group content
Should look at cross-validation; how to stratify the data, repeat the training workflow
Validation here means reporting the results of the trained model
E.g., repeat validation 5 times - unbiased validation of model quality
MONAI 0.2 currently supports only a validation fraction of the data, which is the only parameter available for now - this needs to be expanded so users can generate a fixed set (not just one determined by a random seed)
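The cross-validation points above can be sketched as follows. This is a minimal, standard-library illustration of stratified k-fold splitting with optional repeats, written out as a fixed set that can be saved and shared. The function names, the 15-sample label list, and the manifest layout are assumptions for illustration and are not part of MONAI 0.2.

```python
import json
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed):
    """Assign each sample index to a fold, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so each fold gets a share of the class.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return [sorted(fold) for fold in folds]


def repeated_splits(labels, n_folds=5, n_repeats=1, seed=0):
    """Yield (repeat, fold, train_indices, val_indices) for every fold of every repeat."""
    for repeat in range(n_repeats):
        # A different seed per repeat gives a different partition each time.
        folds = stratified_folds(labels, n_folds, seed + repeat)
        for fold_id, val in enumerate(folds):
            train = sorted(i for other in folds if other is not val for i in other)
            yield repeat, fold_id, train, val


labels = [0] * 10 + [1] * 5  # hypothetical class labels for 15 samples
splits = list(repeated_splits(labels, n_folds=5, n_repeats=1, seed=0))

# Serialize the split so the same fixed set can be shared and reused.
manifest = json.dumps(
    [{"repeat": r, "fold": f, "train": t, "val": v} for r, f, t, v in splits]
)
```

Storing the index lists themselves, rather than only the seed, keeps the split stable even if the shuffling code or library version changes later.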
MSD (Medical Segmentation Decathlon) - currently used only via the JSON file originally provided by the challenge provider
Look at proposal of a FHIR specification; e.g., need a converter to take MSD to a FHIR format (could be a utility library)
Can the underlying data representation be normalized? Need to do experiment
Evaluation: MSD is read-only - how do you filter studies based on predictive match?
Search terms which would subset the data into a virtual collection (e.g., by patient age range)
What about holdout data? How can this be done consistently?
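The virtual-collection and holdout questions above could be explored with something like the sketch below. It assumes study metadata is available as simple records; the `studies` list, the field names, and both helper functions are hypothetical. The holdout assignment hashes the study ID, which is one possible way to keep the choice consistent across runs and machines without modifying the read-only source data.

```python
import hashlib

# Hypothetical study records; in practice these might be derived from a
# challenge's JSON file or from FHIR resources.
studies = [
    {"study_id": "s001", "patient_age": 34},
    {"study_id": "s002", "patient_age": 61},
    {"study_id": "s003", "patient_age": 47},
    {"study_id": "s004", "patient_age": 72},
]


def virtual_collection(records, min_age, max_age):
    """Return the IDs of studies whose patient age lies in [min_age, max_age]."""
    return [r["study_id"] for r in records if min_age <= r["patient_age"] <= max_age]


def is_holdout(study_id, fraction=0.2):
    """Deterministically assign a study to the holdout set from its ID alone."""
    digest = hashlib.sha256(study_id.encode("utf-8")).digest()
    # Compare the first 64 bits of the hash against the requested fraction.
    return int.from_bytes(digest[:8], "big") < fraction * 2**64


subset = virtual_collection(studies, min_age=40, max_age=65)
print(subset)
holdout = [r["study_id"] for r in studies if is_holdout(r["study_id"])]
```

Because the collection is only a list of IDs, the underlying dataset stays untouched, and the same search terms always reproduce the same subset.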
Action Items
(Brad) Synthesize a FHIR resource based on the papers
(Brad) Look into Python libraries and share
(Brad) Share PowerPoint slides and create GitHub issues to represent directions to grow