
Step by step "How to" guide

Romain F. Laine edited this page Apr 1, 2020 · 1 revision

WARNING!! This page is under construction. WARNING!!

This page describes the workflow of a typical notebook run-through in a step-by-step fashion. You can also refer to this video for a complete run-through.

Here are the different steps:

Preparing your data

  1. Decide what task you want to perform using ZeroCostDL4Mic: we currently provide published networks that can perform image segmentation, denoising, restoration and artificial labelling. Please refer to our main Wiki page for info on the currently implemented notebooks. You should also read our bioRxiv preprint on the general framework to understand whether that's the right tool for you.

  2. Now, you need a training dataset. You can either test the networks on the example training datasets that we provide or generate your own training dataset from your own data. The different pages of our Wiki describe how to acquire the data you need in order to train the different networks.

If you decide to simply test the networks on our example training datasets, you can follow the Zenodo links that we provide on our main Wiki page. You can download the whole dataset by clicking on the Download button on the Zenodo pages. (For U-net we do not provide a dataset ourselves but point to an available dataset originally generated for the ISBI 2012 Segmentation challenge; see the U-net page for details.)

  3. In order for Google Colab to have access to these data, they need to be uploaded to your Google Drive. You can do that simply by logging into your Google Drive account and using the New > Folder upload option. Please ensure that ALL the data are uploaded properly before proceeding further.
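Before starting a long training run, it is worth checking that every source image made it to the Drive along with its matching target. The sketch below is not part of the notebooks; the folder layout (one source folder, one target folder, matching file names) is an assumption that holds for the paired-image networks described here.

```python
# Minimal sketch (not from the notebooks): sanity-check an uploaded
# paired training dataset before training. Folder names are hypothetical.
from pathlib import Path

def check_paired_dataset(source_dir, target_dir):
    """Return the sorted list of file names present in BOTH folders,
    and print a warning for any file missing its counterpart."""
    source = {p.name for p in Path(source_dir).iterdir() if p.is_file()}
    target = {p.name for p in Path(target_dir).iterdir() if p.is_file()}
    for name in sorted(source ^ target):  # symmetric difference = unpaired
        print(f"Unpaired file: {name}")
    return sorted(source & target)
```

Running this on your two dataset folders before section 4 can save you from a training run that silently drops half-uploaded images.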

Getting started with the notebook and initialising the parameters

  1. You can directly open the Google Colab notebook from our main Wiki page by clicking on the corresponding Open In Colab badge. This will automatically open the notebook on Google Colab. Here, we've chosen to use the Dark mode of the notebook (white text on black background) for improved visibility.

On the left-hand side you will find a Table of contents that can be navigated by clicking on the different items. It is important that you read the 0. Before getting started section.

At that stage, you will not be able to make changes to the notebook unless you save a copy of it. This can easily be done by going to File and Save a copy in Drive.... If you are signed into a Google account, this will automatically save a copy of the notebook in your Google Drive, in the Colab Notebooks folder.

  2. Now that you're in the notebook, the first step is to check whether Google Colab will currently allow GPU access and, if so, of what type. You can do this by running the 1.1 Change the run-time section. The GPU type is shown on the last line of the cell output. Here, we have been allocated a Tesla P100-PCIE-16GB.

  3. Then, you need to give Google Colab access to the data that you uploaded to your Google Drive. This step is called mounting the drive and is executed in the 1.2 Mount your Google Drive section. You will need to be logged in and provide an authorisation code, accessible via the link provided in the notebook.
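A quick way to confirm the mount worked is to check that your data folder is actually visible from the notebook before moving on. The helper below is an illustrative sketch, not notebook code; the mount itself is done by the cell in section 1.2 (Colab's `google.colab.drive.mount` call), and the folder path you would pass in is whatever you chose when uploading.

```python
# Illustrative check (not from the notebooks): after mounting the drive,
# verify that a data folder exists and is non-empty before training.
from pathlib import Path

def data_is_visible(folder):
    """True if `folder` exists and contains at least one file."""
    p = Path(folder)
    return p.is_dir() and any(child.is_file() for child in p.iterdir())
```

If this returns False for a folder you just uploaded, re-check the upload and the mount before running any later section.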

Once mounted, the drive and all its content are accessible via the dedicated tab on the left of the page.

  4. Then, we're ready to install the network and the required dependencies on the virtual machine made available to us by Google Colab. This is done by running the cell corresponding to section 2 of the notebook.

After running section 2, if you already have a trained model and only wish to perform prediction using this trained model on new data, you can then jump to the 5. Use the network section.

  5. Section 3 then provides very important information and requires important user input to prepare for the network training. The first part of section 3 highlights and describes the different parameters that the user needs to provide in order to perform the training. This includes giving the location (path on the Google Drive) of the different parts of the training dataset (source and target for fully supervised networks such as U-net, Stardist, CARE and Label-free prediction, or only the source training dataset for self-supervised networks like Noise2Void). The other part of the user input is the training parameters, e.g. number of epochs, number of steps and batch size. If you are unsure about the meaning of these parameters, you can also refer to our Glossary page. Some parameters are currently considered advanced and can simply be left at their default values if the user wants to get started with a simple training session.

A typical section 3 user input will look like the figure below. Do not forget to run this cell so that the notebook takes your input into account.
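The kind of user input section 3 asks for can be sketched as a handful of named values plus a few sanity checks. The names and defaults below are illustrative, not the notebooks' actual variables, and the checks are the sort of thing worth doing before launching a long run.

```python
# Illustrative sketch of section 3 style user input; names and default
# values are hypothetical, not the notebooks' actual parameters.
training_params = {
    "number_of_epochs": 50,
    "number_of_steps": 100,
    "batch_size": 4,
    "percentage_validation": 10,  # "advanced": the default is usually fine
}

def validate_params(params):
    """Basic sanity checks before launching a (potentially long) training."""
    assert params["number_of_epochs"] > 0, "need at least one epoch"
    assert params["number_of_steps"] > 0, "need at least one step per epoch"
    assert params["batch_size"] > 0, "batch size must be positive"
    assert 0 < params["percentage_validation"] < 100, \
        "validation split must be a percentage strictly between 0 and 100"
    return True
```

Catching a zero batch size or an impossible validation split here costs seconds; catching it mid-training costs a GPU session.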

  6. The following section will initialise the network and perform the training. You can also visualise a randomly chosen image pair from your training dataset to check that the data were uploaded and mounted properly.

Training the model and performing quality control

  1. Here we go! Now you can start the training by running the Train the network section. In some cases, the network may throw minor warnings about TensorFlow versions; this is not an issue and can be ignored. We have enforced specific versions of TensorFlow in order to improve the reliability of the notebooks.

This step can take from a few minutes to a few hours depending on the network, the training parameters and the training dataset size, so be patient! The time taken by each epoch is sometimes displayed as an output of the network training. This can be used to estimate how long the training will take (here, shown as ~2 s/epoch over 400 epochs, it should take about 15 min, so yes, you have time for a coffee). The time taken for the training will be indicated once it's complete.
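The back-of-the-envelope estimate above is simply time-per-epoch times number of epochs; a tiny helper makes the arithmetic explicit (2 s/epoch over 400 epochs works out to about 13 minutes of pure training time, so the ~15 min figure above leaves a little room for overhead).

```python
# Estimate total training time from the per-epoch time printed during
# training (an illustrative helper, not notebook code).
def estimated_minutes(seconds_per_epoch, number_of_epochs):
    return seconds_per_epoch * number_of_epochs / 60
```

For example, `estimated_minutes(2, 400)` gives roughly 13 minutes.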

  2. After training is complete, and if you enabled the Visual assessment option in section 3, you will be presented with a prediction side-by-side with the target image for visual comparison. This particular image was set aside before starting the training and was therefore never seen by the network; this is a good way to see how the network behaves. A more quantitative assessment of the trained model can be performed later on if the user has made some quality control data available.

  3. Additionally, Stardist has a network optimisation step. Not all networks have this step, so please follow the notebook and you'll be alright.

  4. At that stage, the trained model will have been saved automatically in the mounted Google Drive, in the model folder chosen in section 3. You can download the trained model from your Google Drive for safekeeping or for sharing with another user (bearing in mind the limitations of using pre-trained models highlighted in our bioRxiv paper).

  5. Section 5 is very important! It allows you to perform some important quality control on the trained network. This can be evaluated using two complementary approaches: (1) observing the evolution of the loss function over training time, and (2) mapping errors on a quality control dataset (also often called the "test dataset" in the machine learning field), where the known ground truth can be compared to the prediction obtained from these trusted data. These data should not be included in the training dataset during training, otherwise the network performance will be over-estimated.

Below are examples of loss function curves over training time, for both the training and validation datasets. If this doesn't make sense, please see our Glossary page and also this review, which explains very well how to interpret the curves.
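The classic warning sign to look for in these curves is overfitting: training loss still falling while validation loss has started to rise. A simple sketch of that check, assuming you have the two loss histories as plain lists (the notebooks plot them rather than expose them this way):

```python
# Illustrative overfitting check on loss histories (not notebook code):
# flags the case where training loss keeps falling but validation loss
# has started rising, comparing means over two consecutive windows.
def looks_overfitted(train_loss, val_loss, window=3):
    if len(val_loss) < 2 * window or len(train_loss) < 2 * window:
        return False  # not enough epochs to judge
    val_recent    = sum(val_loss[-window:]) / window
    val_earlier   = sum(val_loss[-2 * window:-window]) / window
    train_recent  = sum(train_loss[-window:]) / window
    train_earlier = sum(train_loss[-2 * window:-window]) / window
    return val_recent > val_earlier and train_recent < train_earlier
```

If the check fires, the usual remedies discussed in the review apply: train for fewer epochs, add more training data, or use augmentation.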

The second stage of quality control is to see whether the network is able to generalise to unseen data. This is performed here by the user providing a set of data with known ground truth output. We use both Squared Error maps and SSIM maps and metrics in order to visually and quantitatively assess whether the model can provide accurate output from unseen data.
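The Squared Error map itself is simple: the per-pixel squared difference between the prediction and the ground truth, often summarised by its root mean (RMSE). A pure-Python sketch, with images as nested lists for illustration (the notebooks work on image arrays, and SSIM involves a more elaborate windowed computation not shown here):

```python
# Illustrative per-pixel Squared Error map and RMSE between a prediction
# and its ground truth; images are nested lists here for simplicity.
def squared_error_map(prediction, ground_truth):
    return [[(p - g) ** 2 for p, g in zip(prow, grow)]
            for prow, grow in zip(prediction, ground_truth)]

def rmse(prediction, ground_truth):
    se = squared_error_map(prediction, ground_truth)
    flat = [v for row in se for v in row]
    return (sum(flat) / len(flat)) ** 0.5
```

Bright regions in the SE map show exactly where the model's output diverges from the ground truth, which is what makes it a useful visual complement to the single RMSE number.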

More info about how to interpret the Squared Error and SSIM maps, as well as the associated metrics, is given below.

PLACEHOLDER FOR SE AND SSIM MAPS

Using the trained model to obtain predictions on new data

  1. The last step, available in section 6, is what you have been waiting for the whole time: the generation of predictions from unseen data using the trained model. This can be performed by giving the path to the Google Drive directory containing the new unseen data that you wish to process.
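Conceptually, this section loops over every image in the folder you point it at and writes a result per image. A sketch of that pattern, where `predict_fn` stands in for the trained model (the notebooks' actual prediction API differs per network), and the `predicted_` naming is hypothetical:

```python
# Conceptual sketch of section 6 (not notebook code): apply a stand-in
# prediction function to every image in a folder and record output names.
from pathlib import Path

def predict_folder(data_dir, result_dir, predict_fn):
    Path(result_dir).mkdir(parents=True, exist_ok=True)
    outputs = []
    for f in sorted(Path(data_dir).glob("*.tif")):
        predict_fn(f)  # run the trained model on this image
        # the real notebooks save the predicted image; we record its name
        outputs.append(f"predicted_{f.name}")
    return outputs
```

The key point the sketch illustrates is that every file in the input directory is processed, so make sure the folder contains only the images you actually want predictions for.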

The notebook will show you an example of the prediction output. Here, in this example, we trained a Stardist model and the notebook shows an overlay of the original input image with the mask image obtained from the prediction.

  2. The predictions obtained from the unseen data are now available in the Results folder chosen earlier and can therefore be downloaded from your Google Drive for further analysis.

Final notes

Many thanks for trying out ZeroCostDL4Mic! Whether you find it useful, intuitive, or difficult and buggy, we want to hear from you and always welcome constructive feedback. Feel free to report issues on this page, drop us an email, or simply tweet your results using #ZeroCostDL4Mic.