Custom Analysis Part 5 - veeninglab/BactMAP GitHub Wiki

Cluster Analysis: Preparation

In the kymograph above, there is no clear moment in time at which DnaX dissociates and associates. This is unsurprising, since the kymograph is an average of xx cells. I wanted to know whether cluster analysis could reveal groups of general behavior within these cells. Suitable cluster analysis methods for time-based data were not easy to find. In the end, I decided to reduce the dimensions of my data (keeping the time dimension) and use the R package TSclust, which is dedicated to time-series clustering.

In the following sections, I use a few small scripts to get the data into the shape I want. This might be a little overwhelming. I show it mostly to give an idea of what you can do after running the standard BactMAP commands. None of this code is necessary to understand BactMAP, so feel free to skim it.

Reduce the dimensions of the data

To detect different groups of DnaX localization, I group the cells by mean cell fluorescence over time. I think this works because I’m mostly interested in when DnaX associates and dissociates: at those moments I would expect a drop in fluorescence. Of course, the median cell fluorescence or the variance of the fluorescence within the cell might turn out to be better indicators.

#add percentage of division to cells_image
cells_m <- perc_Division(cells_image$rawdata_turned)$timelapse

#remove background fluorescence: the background is estimated here as the mean of all values
cells_m$values <- cells_m$values - mean(cells_m$values)

#get mean fluorescence
cells_m <- unique(cells_m[,c("cell", "frame", "values", "percentage", "percentage_binned", "division")])
cells_mean <- aggregate(cells_m$values,
                        by=list("cell"=cells_m$cell,
                                "frame"=cells_m$frame,
                                "percentage"=cells_m$percentage,
                                "percentage_binned"=cells_m$percentage_binned,
                                "division" = cells_m$division),
                        FUN=mean)

#unique identifier for each cell per division cycle
cells_mean$celldiv <- paste(cells_mean$cell, cells_mean$division, sep="_")
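
The text above mentions that the median or the variance might be better indicators than the mean. As a minimal sketch on made-up toy data (the data frame `toy` and its numbers are hypothetical, not BactMAP output), the same `aggregate()` pattern can produce any of these summaries by swapping `FUN`. Note that `aggregate()` stores the summary in a column named `x` when it is called this way.

```r
# Toy data (hypothetical): two cells, two frames, three pixel values each
toy <- data.frame(
  cell   = rep(c(1, 2), each = 6),
  frame  = rep(rep(c(1, 2), each = 3), times = 2),
  values = c(10, 12, 14,  4, 5, 6,  20, 22, 24,  8, 8, 11)
)

# Grouping variables shared by all three summaries
groups <- list("cell" = toy$cell, "frame" = toy$frame)

# Mean per cell and frame (same pattern as in the script above)
toy_mean <- aggregate(toy$values, by = groups, FUN = mean)

# Alternative indicators: median and variance per cell and frame
toy_median <- aggregate(toy$values, by = groups, FUN = median)
toy_var    <- aggregate(toy$values, by = groups, FUN = var)

# The summary value always ends up in the column "x"
print(toy_mean)
```

Whether one of these alternatives separates the groups better than the mean is something that would have to be checked on the real data.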

Amplification

Now I have a small dataframe cells_mean with the mean fluorescence intensity (column x) for each cell per image frame. The next step is to rescale the division time of every cell to percentages with the same interval. I do this using the percentage of division, but I don’t want to lose too much resolution by binning into only 10 bins. Therefore I amplify my datapoints (repeating each value 10 times) and bin them into 100 groups afterwards. Then I reshape the dataframe into wide format (one row per cell/division cycle, one column per percentage bin), which is the format I need for clustering with TSclust.
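
To make the amplify-then-bin idea concrete, here is a minimal toy sketch with made-up numbers. It assumes a single cell with only 3 frames and uses 10 bins instead of 100 to keep the output short; the vector `frame_means` is hypothetical. Repeating each value turns the series into a step function, so it can be cut into more bins than there are frames.

```r
# Hypothetical mean fluorescence of one cell over 3 frames
frame_means <- c(5, 8, 2)

# Amplify: repeat every value 10 times (3 points become 30)
amplified <- rep(frame_means, each = 10)

# Bin the 30 positions into 10 equal-width groups
bins <- cut(seq_along(amplified), breaks = 10, labels = 1:10)

# Combine each bin to a single point by taking the mean
binned <- tapply(amplified, bins, mean)

# Bins that straddle two frames get an intermediate value
print(binned)
```

The series now has 10 points regardless of how many frames the cell was tracked for, which is what makes cells with different division times comparable. The full script that follows applies the same steps with `each=10` and 100 bins.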

#order cells by cell number and frame
cells_mean <- cells_mean[order(cells_mean$cell, cells_mean$frame),]

#get values per cell/division identifier in correct order
list_values <- lapply(unique(cells_mean$celldiv), function(y) cells_mean$x[cells_mean$celldiv==y])

#amplify these lists
list_values <- lapply(list_values, function(y) rep(y, each=10))
#identify each list of values
names(list_values) <- unique(cells_mean$celldiv)

##binning
list_bins <- lapply(c(1:length(list_values)),
                    function(y) data.frame( "percentage"=cut(c(1:length(list_values[[y]])), breaks=100, labels=c(1:100)),
                                            "values"=list_values[[y]],
                                            "celldiv"=names(list_values)[[y]])
                    )

#combine bins to single points (by mean())
list_bins <- lapply(list_bins, function(y) aggregate(y$values,
                                                     by=list("percentage"=y$percentage,
                                                             "celldiv"=y$celldiv),
                                                     FUN=mean)
                    )

#turn list into dataframe
dataframe_bins <- do.call('rbind', list_bins)
colnames(dataframe_bins)[3] <- "values"

#make into wide format: one row per celldiv, one column per percentage bin (needed by TSclust)
dataframe_bins <- tidyr::spread(dataframe_bins, key=percentage, value=values)
#use celldiv as rownames (needed by TSclust)
rownames(dataframe_bins) <- dataframe_bins$celldiv
#remove celldiv column
dataframe_bins$celldiv <- NULL
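
To show what the final table should look like, here is a small base-R sketch on made-up data. It is not the method used above (which relies on `tidyr::spread`) but an alternative that uses `tapply()` to build the same kind of table: one row per cell/division identifier, one column per percentage bin. The data frame `toy_long` and its values are hypothetical, and only 3 bins are used.

```r
# Hypothetical binned data: two cell/division cycles, three percentage bins
toy_long <- data.frame(
  percentage = factor(rep(1:3, times = 2), levels = 1:3),
  celldiv    = rep(c("1_1", "2_1"), each = 3),
  values     = c(5, 7, 2, 4, 4, 9)
)

# Rows = celldiv, columns = percentage bin
toy_wide <- tapply(toy_long$values,
                   list(toy_long$celldiv, toy_long$percentage),
                   mean)
toy_wide <- as.data.frame(toy_wide)

# Sanity checks worth running before clustering:
# one row per series, one column per bin, and no missing values
print(dim(toy_wide))
print(anyNA(toy_wide))
print(toy_wide)
```

Checking for missing values is useful because a bin that receives no datapoints for some cell would show up as `NA` in the wide table.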
