Model Development Guide

The purpose of this wiki page is to outline how the model development process works on this project.

Since the NHSBT images will be stored on Oracle cloud infrastructure and not downloadable to local storage, the model development has to be done on Oracle.

We have local copies of the NORIS images that we are currently using, but the model development should be done on Oracle anyway.

Model development approaches

The goal of the models is to predict the suitability of an organ for transplant based on an image of the organ. The organs we are currently working with are Livers and Kidneys.

The ground truth for an organ's suitability for transplant is obtained by asking medical professionals to score the images.

Once we have access to the NHSBT data, we will be able to do more powerful predictions based on patient outcomes.

The following diagram shows the broad ML modelling approaches we are taking to achieve this.

[Diagram: broad ML modelling approaches]

Notebook sessions on Oracle

Jupyter notebook sessions have been created on Oracle for this purpose.

Data Science (compartment) > OrQA Models (project)

There will be one notebook created per user, which will be linked to that user's GitHub account.

This will ensure that:

  • People commit their code under their own GitHub account
  • People do not accidentally overwrite each other's work because they are in the same notebook session

GitHub Repository

The git repository used for model development is: https://github.com/Organ-Quality-Assessment/Automated-Organ-Assessment

The following branches have been created:

  • main - production
  • dev - latest working version
  • [various other branches] - for features, bug fixes, or individuals' work, to be merged into dev when completed

Any new branches you create should be created from the dev branch. Run git pull on dev first so that your new branch starts from the latest version.

The structure of the repository is:

Model_training_template.ipynb
models/
  template.py

The Model_training_template.ipynb is a notebook that runs the model training and testing, including:

  • Getting images from Oracle storage
  • Getting scores from the database via the Strapi API
  • Splitting the data into a Training dataset and a Test dataset
  • Storing the model info & performance in a database

These components are expected to be the same for all models.
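The train/test split step can be illustrated with a minimal sketch. The source does not describe how the notebook actually performs the split, so the function name split_train_test, its parameters, and the image identifiers below are all hypothetical. The sketch only shows one common approach: a seeded shuffle, so that the same split can be reproduced across model runs.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items reproducibly and return (train, test).

    Hypothetical helper: the real notebook may split the data differently.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up image identifiers, for illustration only.
image_ids = [f"image_{i:03d}" for i in range(10)]
train_ids, test_ids = split_train_test(image_ids, test_fraction=0.2, seed=0)
```

Fixing the seed means every model is evaluated against the same test images, which keeps performance figures comparable between models.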

The models/template.py contains the model functions:

  • Image Processing
  • Feature Extraction
  • Model Training
  • Model Testing

These components are specific to the model being developed.

How to create a new model

To create a new model, add a new Python file to the models/ folder.

Copy the template.py to ensure it has the four functions:

  • image_processing
  • feature_extraction
  • model_training
  • model_testing

These functions take inputs in specific formats (e.g. images are handled with the PIL module) and must return outputs in specific formats (e.g. a numpy ndarray for the table of features).

Then edit these functions to create your model, importing any modules you need.
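As a rough illustration, a model file might look like the sketch below. Only the four function names, the use of PIL for images, and the numpy ndarray feature table come from this page. The argument lists, return types, the greyscale/resize step, the two toy features, and the least-squares model are assumptions made for the example; check template.py for the actual signatures before copying anything.

```python
import numpy as np
from PIL import Image


def image_processing(image, size=(64, 64)):
    """Convert a PIL image to greyscale and resize it (illustrative choices)."""
    return image.convert("L").resize(size)


def feature_extraction(images):
    """Return an ndarray feature table: one row per image, one column per feature."""
    rows = []
    for img in images:
        pixels = np.asarray(img, dtype=np.float64) / 255.0
        rows.append([pixels.mean(), pixels.std()])  # toy features: brightness, contrast
    return np.array(rows)


def model_training(features, scores):
    """Fit a linear least-squares model and return its coefficient vector."""
    design = np.column_stack([features, np.ones(len(features))])  # add intercept column
    coefficients, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coefficients


def model_testing(model, features, scores):
    """Return the mean absolute error of the model's predictions."""
    design = np.column_stack([features, np.ones(len(features))])
    predictions = design @ model
    return float(np.mean(np.abs(predictions - scores)))


# Tiny synthetic run: uniform grey images scored by a made-up rule.
raw_images = [Image.new("RGB", (10, 10), color=(v, v, v)) for v in (0, 64, 128, 255)]
processed = [image_processing(img) for img in raw_images]
features = feature_extraction(processed)
scores = 2.0 * features[:, 0] + 1.0
model = model_training(features, scores)
error = model_testing(model, features, scores)
```

Keeping the four functions independent of data loading and result storage is what lets the notebook run any file in models/ through the same workflow.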

GitHub Issues

Development work should always be associated with a GitHub issue so that all project work is tracked.

For example, you should create a new issue in the following cases:

  • You want to start developing a new model
  • There is a bug in your model that needs to be fixed
  • There is a bug in someone else's model that needs to be fixed
  • You have an idea on how to improve your model that you want to try out
  • You want to make changes to the model workflow
  • You want to change the way data is split into training/test sets
  • You want to record extra information about model runs (e.g. performance criteria)

When you create a new issue, you should describe what the issue is and the work required. You can optionally assign the issue to someone. Add it to the OrQA Newcastle project to ensure that it gets picked up and tracked alongside all other project work.

You can see the issues associated with a repository by clicking the "Issues" tab on the repository page in GitHub.

Project

All issues important to the project can be viewed in the project view on GitHub. This view is used to manage project work, so make sure your issues appear there.

The columns/lanes represent the status of the issue:

  • Triage
  • Todo
  • Todo sometime
  • In progress
  • Waiting for feedback
  • Done
