004‐ Clustering - rezakj/iCellR GitHub Wiki

Clustering

We provide three functions to run the clustering method of your choice:

iclust (recommended): This function is optimized for iCellR and accepts PCA, UMAP, t-SNE, Destiny (diffusion map), PHATE, or KNetL map coordinates as input. It uses the Louvain algorithm to find communities in a k-nearest-neighbor (KNN) graph, similar to PhenoGraph (Levine et al., Cell, 2015). Unlike PhenoGraph, it weights the graph edges by distance (Euclidean by default) instead of by Jaccard similarity.
run.phenograph: A wrapper around Rphenograph, an R implementation of the PhenoGraph algorithm (Levine et al., Cell, 2015).
run.clustering: This function offers a wide range of options for exploring your data with different clustering, distance, and indexing methods. You can combine any options from the table below to experiment with different approaches and "flavors" of analysis.

| Clustering methods | Distance methods | Indexing methods |
|---|---|---|
| ward.D, ward.D2, single, complete, average, mcquitty, median, centroid, kmeans | euclidean, maximum, manhattan, canberra, binary, minkowski, or NULL | kl, ch, hartigan, ccc, scott, marriot, trcovw, tracew, friedman, rubin, cindex, db, silhouette, duda, pseudot2, beale, ratkowsky, ball, ptbiserial, gap, frey, mcclain, gamma, gplus, tau, dunn, hubert, sdindex, dindex, sdbw |
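The base R sketch below makes the difference between iclust and PhenoGraph concrete. It builds a KNN graph as an edge list from a simulated two-dimensional embedding and attaches two kinds of edge weights. The first is the Euclidean distance between neighbors, which is the weighting iclust is described as using. The second is the Jaccard similarity of the two cells' neighbor sets, which is the PhenoGraph weighting. It only illustrates graph construction. It is not iCellR's or Rphenograph's code, and it does not show the Louvain community-detection step that would run on this graph.

```r
# Illustrative sketch (not iCellR code): KNN graph with two edge-weighting schemes.
set.seed(1)
emb <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),   # 20 simulated cells, group 1
             matrix(rnorm(40, mean = 5), ncol = 2))   # 20 simulated cells, group 2
k <- 5

# Pairwise Euclidean distances between cells
d <- as.matrix(dist(emb, method = "euclidean"))
diag(d) <- Inf  # a cell is never its own neighbor

# Indices of the k nearest neighbors of every cell (one row per cell)
knn <- t(apply(d, 1, function(row) order(row)[1:k]))

# Edge list: one edge from each cell to each of its k neighbors
edges <- data.frame(from = rep(seq_len(nrow(emb)), each = k),
                    to   = as.vector(t(knn)))

# Weight 1 (iclust-style): distance between the two cells
edges$dist_weight <- d[cbind(edges$from, edges$to)]

# Weight 2 (PhenoGraph-style): Jaccard similarity of the two neighbor sets
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))
edges$jaccard_weight <- mapply(function(i, j) jaccard(knn[i, ], knn[j, ]),
                               edges$from, edges$to)
head(edges)
```

Distance weights grow as cells move apart, while Jaccard weights grow as two cells share more neighbors. The two schemes therefore emphasize different aspects of the same graph.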

Option 1: Conventional clustering based on the top PCs

Adjust the sensitivity parameter to get more or fewer clusters.

Lower sensitivity values = more clusters.
Higher sensitivity values = fewer clusters (inverse relationship).
Values of 100-150 generally work well for most data.

Use opt.pcs.plot(my.obj) to find the suggested optimal number of PCs. The top 10 PCs generally work well for most datasets, and we recommend 10 as the default.

my.obj <- iclust(my.obj, sensitivity = 150, data.type = "pca", dims=1:10)
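The dims argument selects which PCs go into the clustering, so dims = 1:10 means the first ten. This generic base R sketch uses simulated data and does not reproduce what opt.pcs.plot() computes. It runs a PCA, reports the proportion of variance explained by each PC, and keeps the first ten PC score columns.

```r
# Generic sketch (not iCellR code): keep the top PCs of a cell x gene matrix.
set.seed(2)
expr <- matrix(rnorm(100 * 50), nrow = 100, ncol = 50)  # 100 simulated cells x 50 genes

pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each PC (decreasing)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(var_explained, 10), 3)

# Equivalent of dims = 1:10: the first 10 columns of the PC score matrix
dims <- 1:10
top_pcs <- pca$x[, dims]
dim(top_pcs)  # cells x selected PCs
```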

Option 2: Clustering based on KNetL dimensions (or UMAP dimensions)

Conventionally, clustering is performed on PCA coordinates (usually the first 10 dimensions). However, iclust also lets you cluster on t-SNE, UMAP, or KNetL map dimensions instead. If you have fine-tuned your KNetL map and are confident in its results, we recommend clustering on the KNetL map.

Clustering can be one of the more challenging aspects of data analysis, and adjustments may be necessary based on marker genes. This might involve merging certain clusters, using gating tools (refer to our cell gating tools), or experimenting with different sensitivity values to identify a greater or smaller number of communities.
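Merging clusters ultimately means relabeling cells. This base R sketch works on a hypothetical vector of cluster assignments, not an iCellR object. It merges one cluster into another and then renumbers the labels so they stay consecutive.

```r
# Hypothetical cluster assignments for 10 cells (not an iCellR object)
clusters <- c(1, 1, 2, 3, 3, 3, 4, 4, 2, 1)

# Merge cluster 2 into cluster 4 (e.g. if both show the same marker genes)
merged <- ifelse(clusters == 2, 4, clusters)

# Renumber consecutively so labels run 1..K without gaps
merged <- as.integer(factor(merged))
merged
# 1 1 3 2 2 2 3 3 3 1
table(merged)
```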

my.obj <- iclust(my.obj, sensitivity = 150, data.type = "knetl")

# data.type could be umap or tsne, etc.

Other examples of using iclust:

my.obj <- iclust(my.obj, sensitivity = 150, data.type = "umap")

# or 

my.obj <- iclust(my.obj, sensitivity = 150, data.type = "tsne")

Or use run.phenograph instead of iclust:

my.obj <- run.phenograph(my.obj, k = 100, data.type = "pca", dims=1:10)

Alternatively, use the run.clustering function to pick and customize your adventure.

 my.obj <- run.clustering(my.obj, 
	clust.method = "kmeans", 
	dist.method = "euclidean",
	index.method = "silhouette",
	max.clust = 25,
	min.clust = 2,
	dims = 1:10)
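This sketch does not show run.clustering's internals. It only illustrates, in base R, the general idea of choosing the number of clusters with an index. It runs k-means for every k between a minimum and a maximum, scores each partition with the average silhouette width, and keeps the best. The data are simulated, and the helper avg_silhouette() was written for this example. It is not an iCellR function.

```r
# Base R sketch (not run.clustering's code): pick k for k-means by the
# average silhouette width, scanning k from min.clust to max.clust.
set.seed(3)
pcs <- rbind(cbind(rnorm(30, 0), rnorm(30, 0)),   # simulated group 1
             cbind(rnorm(30, 6), rnorm(30, 0)),   # simulated group 2
             cbind(rnorm(30, 0), rnorm(30, 6)))   # simulated group 3
d <- as.matrix(dist(pcs, method = "euclidean"))

# s(i) = (b - a) / max(a, b): a = mean distance to the cell's own cluster,
# b = smallest mean distance to any other cluster
avg_silhouette <- function(labels, d) {
  s <- numeric(length(labels))
  for (i in seq_along(labels)) {
    own <- labels == labels[i]
    own[i] <- FALSE
    if (!any(own)) next            # singleton cluster: s(i) stays 0
    a <- mean(d[i, own])
    other <- labels != labels[i]
    b <- min(tapply(d[i, other], labels[other], mean))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

min.clust <- 2
max.clust <- 6
ks <- min.clust:max.clust
scores <- sapply(ks, function(k) {
  fit <- kmeans(pcs, centers = k, nstart = 10)
  avg_silhouette(fit$cluster, d)
})
best_k <- ks[which.max(scores)]
data.frame(k = ks, silhouette = round(scores, 3))
best_k
```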

# To set the number of clusters manually instead of using the predicted optimal number, set min.clust and max.clust to the same value:
#my.obj <- run.clustering(my.obj, 
#	clust.method = "ward.D",
#	dist.method = "euclidean",
#	index.method = "ccc",
#	max.clust = 8,
#	min.clust = 8,
#	dims = 1:10)
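Setting min.clust and max.clust to the same value fixes the number of clusters. For hierarchical methods such as ward.D, the corresponding base R operation is cutting the tree at a fixed k with cutree(). The sketch below does this on simulated data. It illustrates the idea and is not run.clustering's code.

```r
# Base R sketch (not run.clustering's code): Ward hierarchical clustering with
# the number of clusters fixed at 8, analogous to min.clust = max.clust = 8.
set.seed(4)
pcs <- matrix(rnorm(200 * 10), ncol = 10)   # 200 simulated cells x 10 PCs

tree <- hclust(dist(pcs, method = "euclidean"), method = "ward.D")
clusters <- cutree(tree, k = 8)             # exactly 8 clusters, labeled 1..8
table(clusters)
```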

# More examples:

#my.obj <- run.clustering(my.obj, 
#	clust.method = "ward.D", 
#	dist.method = "euclidean",
#	index.method = "kl",
#	max.clust = 25,
#	min.clust = 2,
#	dims = 1:10)

Visualize the clustering results

# Plot the clusters on each embedding (in this example, clustering was done on the KNetL map):
# my.obj <- iclust(my.obj, sensitivity = 150, data.type = "knetl")

A <- cluster.plot(my.obj, plot.type = "pca", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)
B <- cluster.plot(my.obj, plot.type = "umap", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)
C <- cluster.plot(my.obj, plot.type = "tsne", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)
D <- cluster.plot(my.obj, plot.type = "knetl", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)

library(gridExtra)
grid.arrange(A, B, C, D)
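With anno.clust = TRUE, cluster labels are drawn on the plot. One common way to position such labels is at each cluster's median coordinates. The base R sketch below does this for a simulated two-dimensional embedding. The placement rule is an assumption made for illustration and is not necessarily what cluster.plot() does.

```r
# Illustrative sketch (not cluster.plot's code): label clusters at their medians.
set.seed(5)
coords <- rbind(cbind(rnorm(50, 0), rnorm(50, 0)),   # simulated cluster 1
                cbind(rnorm(50, 5), rnorm(50, 5)))   # simulated cluster 2
cluster <- rep(1:2, each = 50)

# Median position of each cluster, used as the label location
centers <- aggregate(data.frame(x = coords[, 1], y = coords[, 2]),
                     by = list(cluster = cluster), FUN = median)

plot(coords, col = cluster, pch = 16, cex = 0.5, xlab = "Dim 1", ylab = "Dim 2")
text(centers$x, centers$y, labels = centers$cluster, font = 2)
```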

Re-numbering clusters based on their distances (optional):

This step renumbers the clusters so that clusters with similar gene expression receive consecutive numbers.

This re-ordering can be visually beneficial when analyzing your heatmap after identifying marker genes. Similar cell communities will appear next to each other, making it easier to visually examine and compare them. Additionally, it can help in deciding which clusters may need merging or adjustment.

my.obj <- clust.ord(my.obj,top.rank = 500, how.to.order = "distance")
#my.obj <- clust.ord(my.obj,top.rank = 500, how.to.order = "random")
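Distance-based renumbering can be sketched in base R in three steps. First, average the expression of each cluster. Second, hierarchically cluster those average profiles. Third, renumber the clusters in the leaf order of the tree, so that similar clusters end up with consecutive numbers. This is a hypothetical illustration. It does not reproduce clust.ord's exact procedure, including how top.rank selects genes.

```r
# Hypothetical sketch of distance-based cluster renumbering (not clust.ord's code).
set.seed(6)
expr <- matrix(rnorm(60 * 20), nrow = 60)          # 60 simulated cells x 20 genes
clusters <- rep(1:4, each = 15)                    # original cluster labels
# Make clusters 1 and 3 similar by raising genes 1-5 in both
expr[clusters %in% c(1, 3), 1:5] <- expr[clusters %in% c(1, 3), 1:5] + 4

# Mean expression profile of each cluster (rows named "1".."4")
profiles <- rowsum(expr, clusters) / as.vector(table(clusters))

# Cluster the profiles; the dendrogram leaf order keeps similar clusters adjacent
tree <- hclust(dist(profiles))
old_order <- as.integer(rownames(profiles)[tree$order])

# New label = position of the old cluster in the leaf order
new_labels <- match(clusters, old_order)
data.frame(old = old_order, new = seq_along(old_order))
```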

Re-plot

A <- cluster.plot(my.obj, plot.type = "pca", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)
B <- cluster.plot(my.obj, plot.type = "umap", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)
C <- cluster.plot(my.obj, plot.type = "tsne", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)
D <- cluster.plot(my.obj, plot.type = "knetl", interactive = FALSE, cell.size = 0.5, cell.transparency = 1, anno.clust = TRUE)

library(gridExtra)
grid.arrange(A, B, C, D)

Cluster QC

clust.stats.plot(my.obj, plot.type = "box.mito", interactive = FALSE)

clust.stats.plot(my.obj, plot.type = "box.gene", interactive = FALSE)
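Judging by their names, the box.mito and box.gene plots show the per-cluster distributions of mitochondrial content and of the number of detected genes. The base R sketch below builds a comparable view from simulated per-cell metrics. The data frame qc and its columns are hypothetical. A cluster with an unusually high mitochondrial fraction or very few detected genes may consist of low-quality cells.

```r
# Illustrative sketch (not iCellR code): per-cluster QC distributions.
set.seed(7)
qc <- data.frame(
  cluster = factor(rep(1:3, each = 40)),
  mito    = c(runif(40, 0, 0.05), runif(40, 0, 0.05), runif(40, 0.1, 0.3)),  # fraction of mito reads
  n_genes = c(rpois(40, 2000), rpois(40, 1800), rpois(40, 500))            # genes detected per cell
)

par(mfrow = c(1, 2))
boxplot(mito ~ cluster, data = qc, main = "Mito fraction by cluster")
boxplot(n_genes ~ cluster, data = qc, main = "Genes detected by cluster")

# Per-cluster medians, to flag outlying clusters numerically
tapply(qc$mito, qc$cluster, median)
tapply(qc$n_genes, qc$cluster, median)
```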