Custom Analysis Part 5 - veeninglab/BactMAP GitHub Wiki
Cluster Analysis: Preparation
In the kymograph above, there is not a very clear moment in time where
DnaX dissociates and associates. This is rather unsurprising, since this
is an average of xx cells. I wanted to know if I could find some groups
of general behavior within these cells using cluster analysis methods.
I didn’t find it easy to find suitable cluster analysis methods for
time-based data. In the end, I decided to reduce the dimensions of my
data (keeping the time dimension) and use the R package TSclust, which
is dedicated to time-series clustering.
In the following sections, I use a few small scripts to get the data into the shape I want. This might be a little overwhelming. I am showing it mostly to give an idea of what you could do after you have used the standard BactMAP commands. None of this code is necessary to understand BactMAP, so feel free to skim through it.
Reduce the dimensions of the data
To detect different groups of DnaX localization, I group the cells by mean cell fluorescence over time. I think this is possible because I’m mostly interested in when DnaX associates and dissociates: there I would expect a drop in fluorescence. Of course, it is possible that the median cell fluorescence or the variance of the fluorescence within the cell is a better indicator.
#add percentage of division to cells_image
cells_m <- perc_Division(cells_image$rawdata_turned)$timelapse
#remove background fluorescence: background here is taken as the mean
cells_m$values <- cells_m$values - mean(cells_m$values)
#get mean fluorescence
cells_m <- unique(cells_m[,c("cell", "frame", "values", "percentage", "percentage_binned", "division")])
cells_mean <- aggregate(cells_m$values,
                        by=list("cell"=cells_m$cell,
                                "frame"=cells_m$frame,
                                "percentage"=cells_m$percentage,
                                "percentage_binned"=cells_m$percentage_binned,
                                "division"=cells_m$division),
                        FUN=mean)
#unique identifier for each cell per division cycle
cells_mean$celldiv <- paste(cells_mean$cell, cells_mean$division, sep="_")
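The text above notes that the median or the variance might be better indicators than the mean. As a rough sketch on made-up data, the three summaries could be computed side by side as follows. The data frame `toy` and the helper `summarise_by` are hypothetical stand-ins and not part of BactMAP; only base R is used.

```r
# Toy stand-in for cells_m: two cells, two frames, three pixel values each.
toy <- data.frame(
  cell   = rep(c(1, 2), each = 6),
  frame  = rep(rep(c(1, 2), each = 3), times = 2),
  values = c(1, 2, 9,  2, 2, 2,  0, 5, 10,  4, 4, 7)
)

# Hypothetical helper: summarise the pixel values per cell and frame.
summarise_by <- function(fun) {
  aggregate(values ~ cell + frame, data = toy, FUN = fun)
}

# One row per cell and frame, with the three summaries next to each other.
# The three calls use the same grouping, so their rows line up.
toy_summary <- summarise_by(mean)
names(toy_summary)[3] <- "mean"
toy_summary$median   <- summarise_by(median)$values
toy_summary$variance <- summarise_by(var)$values

print(toy_summary)
```

Swapping `FUN=mean` for `FUN=median` or `FUN=var` in the `aggregate()` call above would give the corresponding version of `cells_mean`; whether that separates the groups better would have to be tested on the real data.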
Amplification
Now I have a small dataframe cells_mean with a mean fluorescence
intensity (column x) for each cell per image frame. The next step is to
rescale the division time of every cell to percentages with the same
interval. I do this using the percentage of division, but I don’t want
to lose too much resolution by binning it into only 10 bins. Therefore I
amplify my datapoints (repeating each one 10 times) and bin them into
100 groups afterwards. Then I reshape the dataframe into wide format
(one row per cell/division cycle, one column per percentage bin), which
is the shape I use for clustering with TSclust.
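To make the amplify-and-bin idea concrete, here is a minimal sketch on a single made-up trace. The vector `trace` and its 25 frames are assumptions for illustration only; the real data are handled per cell/division cycle in the code that follows.

```r
# Hypothetical trace: one cell/division cycle observed for 25 frames.
trace <- seq(from = 100, to = 52, length.out = 25)

# Amplify: repeat every frame value 10 times, so 25 frames become 250 points.
amplified <- rep(trace, each = 10)

# Bin: cut the 250 positions into 100 equal-width bins along the cycle.
bins <- cut(seq_along(amplified), breaks = 100, labels = 1:100)

# Average the points that fall into each bin: one value per percent of the cycle.
binned <- tapply(amplified, bins, mean)

length(binned)
head(binned)
```

Because each frame is repeated before binning, a bin that straddles two frames gets a weighted average of both, which is what lets cycles of different lengths be compared on the same 100-point axis.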
#order cells by cell number and frame
cells_mean <- cells_mean[order(cells_mean$cell, cells_mean$frame),]
#get values per cell/division identifier in correct order
list_values <- lapply(unique(cells_mean$celldiv), function(y) cells_mean$x[cells_mean$celldiv==y])
#amplify these lists
list_values <- lapply(list_values, function(y) rep(y, each=10))
#identify each list of values
names(list_values) <- unique(cells_mean$celldiv)
##binning
list_bins <- lapply(c(1:length(list_values)),
                    function(y) data.frame("percentage"=cut(c(1:length(list_values[[y]])), breaks=100, labels=c(1:100)),
                                           "values"=list_values[[y]],
                                           "celldiv"=names(list_values)[[y]])
                    )
#combine bins to single points (by mean())
list_bins <- lapply(list_bins, function(y) aggregate(y$values,
                                                     by=list("percentage"=y$percentage,
                                                             "celldiv"=y$celldiv),
                                                     FUN=mean)
                    )
#turn list into dataframe
dataframe_bins <- do.call('rbind', list_bins)
colnames(dataframe_bins)[3] <- "values"
#make into wide format: one row per celldiv, one column per percentage bin (the shape used for TSclust)
dataframe_bins <- tidyr::spread(dataframe_bins, key=percentage, value=values)
#rownames into celldiv (needed by TSclust)
rownames(dataframe_bins) <- dataframe_bins$celldiv
#remove celldiv column
dataframe_bins$celldiv <- NULL
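One thing that may be worth checking before clustering: if a cell/division cycle was tracked for fewer than 10 frames, it has fewer than 100 amplified points, so some of the 100 bins stay empty and end up as NA in the wide table. The sketch below illustrates this on made-up data; `short` and `toy_wide` are hypothetical stand-ins, and the same `complete.cases()` check could be applied to `dataframe_bins`.

```r
# A short hypothetical trace: only 7 frames, so 70 amplified points.
short <- rep(c(5, 4, 3, 3, 2, 4, 5), each = 10)
short_bins <- cut(seq_along(short), breaks = 100, labels = 1:100)

# Number of the 100 bins that received no data point at all.
n_empty <- sum(table(short_bins) == 0)
print(n_empty)

# Toy stand-in for the wide table: rows are cell/division identifiers,
# columns are percentage bins (only two bins shown here).
toy_wide <- data.frame(`1` = c(1.0, 2.0), `2` = c(1.5, NA),
                       row.names = c("1_1", "2_1"), check.names = FALSE)

# Rows (cell/division cycles) that contain at least one missing bin.
incomplete <- rownames(toy_wide)[!complete.cases(toy_wide)]
print(incomplete)
```

How to deal with such rows (dropping them, or amplifying by a larger factor) is a judgment call that depends on the dataset.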