Data Analysis with Python - jeanollion/bacmman GitHub Wiki
This page shows how to combine Python (statistical analysis) with BACMMAN (image analysis). In particular, it is useful to generate subsets of objects of interest in Python and visualize them in BACMMAN.
Selections are subsets of segmented objects. They can be created either in BACMMAN or in Python. See this page for more details on how to use selections in BACMMAN.
This tutorial illustrates how to:
- Read measurements exported from BACMMAN
- Import selections created in BACMMAN
- Create selections in Python and export them to BACMMAN
PyBacmman version 0.6.1 or later is required (pip install --upgrade PyBacmman).
pip install PyBacmman matplotlib
import pandas as pd
import os
from pybacmman.dataset import Dataset
- A Dataset object is created from the path to the dataset folder, i.e. the folder that contains the configuration file.
- If this folder contains a configuration file, a valid Dataset object is created and the object class names are imported.
folder_path = "/storage/Images/Workshop" # change this path so that it points to the folder containing experiment folders
dsName = "MotherMachine"
ds = Dataset(os.path.join(folder_path, dsName))
print(ds)
MotherMachine oc=['Microchannels', 'Bacteria'] path=/storage/Images/Workshop/MotherMachine
- object class names can be modified
ds.set_object_class_name("Microchannels", "MC")
ds.set_object_class_name(-1, "Bact")
print(ds)
MotherMachine oc=['MC', 'Bact'] path=/storage/Images/Workshop/MotherMachine
- The next command reads the measurements exported from BACMMAN.
- 1 is the index of the object class (#1 = bacteria). Alternatively, the object class name can be used.
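An object class can thus be designated either by its index or by its name, and `-1` was used above to designate the last class. The sketch below shows one way such a lookup could work. `resolve_object_class` is a hypothetical helper written for illustration; it is not PyBacmman's code.

```python
def resolve_object_class(object_class, names):
    """Return the index of an object class given its index or its name."""
    if isinstance(object_class, str):
        return names.index(object_class)  # raises ValueError if the name is unknown
    # Negative indices count from the end, as with Python lists.
    idx = object_class if object_class >= 0 else len(names) + object_class
    if not 0 <= idx < len(names):
        raise IndexError(f"object class index out of range: {object_class}")
    return idx


names = ["MC", "Bact"]
# The three calls below all designate the same object class.
print(resolve_object_class(1, names),
      resolve_object_class("Bact", names),
      resolve_object_class(-1, names))
```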
data = ds.get_data(1)
print(data)
| | Position | PositionIdx | Indices | Frame | Idx | Time | BacteriaLineage | NextDivisionFrame | PreviousDivisionFrame | SizeRatio | ... | TrackErrorPrev | BacteriaCenterX | BacteriaCenterY | BacteriaCenterZ | GrowthRateArea | SizeAtBirthArea | Size | GrowthRateLength | SizeAtBirthLength | Length |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | dataset1_0-50 | 0 | 0-0-0 | 0 | 0 | 0.0 | A | 5.0 | NaN | NaN | ... | False | 0.7500 | 1.7311 | 0.0 | 0.031858 | 2.2556 | 2.2746 | 0.030178 | 2.8169 | 2.8022 |
1 | dataset1_0-50 | 0 | 0-0-1 | 0 | 1 | 0.0 | B | 4.0 | NaN | NaN | ... | False | 0.6717 | 4.4859 | 0.0 | 0.037086 | 2.3185 | 2.3102 | 0.029628 | 2.8055 | 2.7930 |
2 | dataset1_0-50 | 0 | 0-0-2 | 0 | 2 | 0.0 | C | NaN | NaN | NaN | ... | False | 0.7341 | 7.5325 | 0.0 | -0.021801 | 3.1801 | 2.1916 | -0.014285 | 3.4087 | 2.4818 |
3 | dataset1_0-50 | 0 | 0-0-3 | 0 | 3 | 0.0 | D | NaN | NaN | NaN | ... | False | 0.7297 | 10.0219 | 0.0 | -0.032620 | 2.7481 | 2.1441 | -0.029575 | 3.1067 | 2.5174 |
4 | dataset1_0-50 | 0 | 0-0-4 | 0 | 4 | 0.0 | E | NaN | NaN | NaN | ... | False | 0.6734 | 13.0442 | 0.0 | -0.095301 | 2.5061 | 2.1283 | -0.079399 | 3.0058 | 2.6062 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2782 | dataset1_0-50 | 0 | 49-14-1 | 49 | 1 | 196.0 | AHHHHHHHHT | NaN | 48.0 | 1.1099 | ... | False | 0.6600 | 3.8810 | 0.0 | NaN | NaN | 2.3973 | NaN | NaN | 2.4197 |
2783 | dataset1_0-50 | 0 | 49-14-2 | 49 | 2 | 196.0 | AHHHHHHHTH | NaN | 47.0 | 1.1534 | ... | False | 0.6646 | 6.7062 | 0.0 | 0.030377 | 2.2704 | 2.9155 | 0.030341 | 2.4078 | 3.0979 |
2784 | dataset1_0-50 | 0 | 49-14-3 | 49 | 3 | 196.0 | AHHHHHHHTT | NaN | 47.0 | 1.1162 | ... | False | 0.7094 | 9.4957 | 0.0 | 0.027377 | 1.9223 | 2.3933 | 0.024445 | 2.1540 | 2.6160 |
2785 | dataset1_0-50 | 0 | 49-14-4 | 49 | 4 | 196.0 | AHHHHHHTHH | NaN | 47.0 | 1.1523 | ... | False | 0.6683 | 12.4986 | 0.0 | 0.036595 | 2.5278 | 3.3823 | 0.028740 | 2.6439 | 3.2923 |
2786 | dataset1_0-50 | 0 | 49-14-5 | 49 | 5 | 196.0 | AHHHHHHTHT | NaN | 47.0 | NaN | ... | False | 0.6641 | 15.4414 | 0.0 | -0.037930 | 2.8649 | 1.9305 | -0.037824 | 2.8911 | 1.9508 |
2787 rows × 28 columns
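The `Indices` column in the table above looks like a dash-separated path: in the rows shown, the first number matches `Frame` and the last matches `Idx`, and the middle number is presumably the index of the parent microchannel. Below is a minimal sketch under that assumption, which is inferred from the displayed rows only. `parse_indices` is a hypothetical helper, not part of PyBacmman.

```python
def parse_indices(indices: str) -> dict:
    """Split an Indices string such as "49-14-3" into its components.

    Assumption (inferred from the rows shown above): the first number is
    the frame, the last one is the object index, and anything in between
    indexes the parent objects.
    """
    parts = [int(p) for p in indices.split("-")]
    if len(parts) < 2:
        raise ValueError(f"expected at least 'frame-idx', got {indices!r}")
    return {"frame": parts[0], "parents": parts[1:-1], "idx": parts[-1]}


print(parse_indices("49-14-3"))
# {'frame': 49, 'parents': [14], 'idx': 3}
```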
ax = data.GrowthRateArea.hist(bins=100, range=(-0.01, 0.05))
ax.set_xlabel("Growth Rate")
ax.set_title("Bacteria MutH Growth Rate")
- The next command reads the selections exported from BACMMAN.
- To export selections from BACMMAN, select them in the data browsing tab and run the menu command Run > Extract Selected Selections.
selection = ds.get_selections()
print(selection)
| | Position | PositionIdx | ObjectClassIdx | Indices | Frame | SelectionName |
---|---|---|---|---|---|---|
0 | dataset1_0-50 | 0 | 1 | 0-0-3 | 0 | bact |
1 | dataset1_0-50 | 0 | 1 | 0-0-2 | 0 | bact |
2 | dataset1_0-50 | 0 | 1 | 0-0-5 | 0 | bact |
3 | dataset1_0-50 | 0 | 1 | 0-0-4 | 0 | bact |
4 | dataset1_0-50 | 0 | 1 | 4-0-1 | 4 | bact |
... | ... | ... | ... | ... | ... | ... |
23 | dataset1_0-50 | 0 | 1 | 5-0-2 | 5 | bact |
24 | dataset1_0-50 | 0 | 1 | 1-0-3 | 1 | bact |
25 | dataset1_0-50 | 0 | 1 | 5-0-3 | 5 | bact |
26 | dataset1_0-50 | 0 | 1 | 1-0-2 | 1 | bact |
27 | dataset1_0-50 | 0 | 1 | 1-0-1 | 1 | bact |
A selection can be used to subset a dataframe:
- either through the function subset_by_DataFrame from pybacmman.pandas, using both the Position and Indices columns to identify an object (when several datasets are involved, see the DatasetList section, also use DatasetName),
- or directly by passing the selection name to the get_data method of the Dataset object.
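To make the matching rule of the first option concrete, the sketch below reproduces the idea with plain pandas on two tiny hand-made frames: an inner merge on `Position` and `Indices` keeps only the rows of the data that also appear in the selection. This illustrates the principle only and is not the implementation of `subset_by_DataFrame`; all values are made up.

```python
import pandas as pd

# Toy stand-ins for the measurement table and a selection (made-up values).
toy_data = pd.DataFrame({
    "Position": ["p0", "p0", "p0", "p1"],
    "Indices": ["0-0-0", "0-0-1", "1-0-0", "0-0-0"],
    "Length": [2.8, 2.7, 3.1, 2.5],
})
toy_selection = pd.DataFrame({
    "Position": ["p0", "p1"],
    "Indices": ["0-0-1", "0-0-0"],
})

# An object is identified by the (Position, Indices) pair, so merge on both.
keys = ["Position", "Indices"]
toy_subset = toy_data.merge(
    toy_selection[keys].drop_duplicates(), on=keys, how="inner"
)
print(toy_subset)
```

Matching on `Indices` alone would also keep the object `0-0-0` of position `p0`, which is not in the selection. This is why both columns are needed.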
from pybacmman.pandas import subset_by_DataFrame
subset = subset_by_DataFrame(data, selection, on=["Position", "Indices"])
print(f"number of rows in data: {data.shape[0]}, selection: {selection.shape[0]} subset: {subset.shape[0]}")
number of rows in data: 2787, selection: 28 subset: 28
subset2 = ds.get_data(1, "bact")
print(f"number of rows in data: {data.shape[0]}, selection: {selection.shape[0]} subset: {subset2.shape[0]}")
number of rows in data: 2787, selection: 28 subset: 28
- The next command creates a subset of the data containing only long cells and saves it as a selection directly into BACMMAN.
- BACMMAN should be open before executing this command.
- BACMMAN includes a Python gateway that listens for queries from Python. In the menu Misc > Python Gateway you can set the address, port and Python port; they must match the address, port and python_proxy_port arguments of the store_selection function. If an error occurs, it may be a communication problem between Java and Python.
- If BACMMAN is not open or a communication error occurs, a text file is saved instead, and BACMMAN will read it the next time it is opened.
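Since communication problems are a likely source of errors here, it can help to check whether anything is listening on the gateway port before storing a selection. The sketch below uses only the standard library. `gateway_is_reachable` is a hypothetical helper, not part of PyBacmman, and its defaults are the address and port used in this tutorial. It only checks that a TCP connection can be opened, not that the listener is actually BACMMAN.

```python
import socket


def gateway_is_reachable(address: str = "127.0.0.1", port: int = 25333,
                         timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (address, port) can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False


if gateway_is_reachable():
    print("Something is listening on the gateway port.")
else:
    print("Nothing is listening: is BACMMAN open, and do the ports match?")
```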
data_subset = data[data.Length>7]
print(f"number of objects: {data_subset.shape[0]}")
ds.store_selection(data_subset, 1, "long cells", address='127.0.0.1', port=25333, python_proxy_port=25334)
number of objects: 4
- DatasetList objects are designed to manipulate several datasets
- They can be created either by:
- passing a list of Dataset objects
- or by passing a folder path that contains several dataset folders.
- In the latter case, a filter can be passed in order to select only a subset of the existing datasets. The filter can be either a part of the dataset name or a function of the name.
- Retained object classes are the object classes common to all datasets
- get_data will concatenate the data from each dataset in the DatasetList object
- save_selection will save the corresponding objects for each dataset
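The two kinds of filter described above (a part of the dataset name, or a function of the name) can be pictured with the standard library alone. The sketch below is not how `DatasetList` is implemented: `select_dataset_names` is a hypothetical helper and the folder names are made up.

```python
import os
import tempfile


def select_dataset_names(folder_path, name_filter=None):
    """List sub-folder names, keeping those accepted by name_filter.

    name_filter may be None (keep all), a string (kept if contained in the
    name) or a callable that takes the name and returns a bool.
    """
    names = sorted(
        n for n in os.listdir(folder_path)
        if os.path.isdir(os.path.join(folder_path, n))
    )
    if name_filter is None:
        return names
    if isinstance(name_filter, str):
        return [n for n in names if name_filter in n]
    return [n for n in names if name_filter(n)]


# Demonstration on a temporary folder with made-up dataset names.
with tempfile.TemporaryDirectory() as root:
    for name in ("JL_run1", "FL17_run2", "other"):
        os.mkdir(os.path.join(root, name))
    print(select_dataset_names(root, "JL"))
    print(select_dataset_names(root, lambda f: "JL" in f or "FL17" in f))
```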
from pybacmman.dataset import DatasetList
name_filter = lambda f:"JL" in f or "FL17" in f
dsl = DatasetList(path = folder_path, filter=name_filter)
print(dsl)