007‐ Imputation & gene correlations - rezakj/iCellR GitHub Wiki

Run Data Imputation

Data imputation is the process of inferring and filling in missing values in a dataset. It is often applied to single-cell RNA-seq data, where dropout events (zero or missing expression values) are common. Imputation is generally not recommended as a default step, because it can introduce noise into your analysis. When applied carefully, however, it can improve downstream analyses such as clustering, visualization, and differential gene expression by producing a more complete and coherent dataset.

Impute based on PCA, UMAP, or KNetL:

my.obj <- run.impute(my.obj, dims = 1:10, nn = 10, data.type = "pca")

# or 

my.obj <- run.impute(my.obj, nn = 10, data.type = "knetl")
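
To make the idea behind the `nn` and `data.type` arguments more concrete, here is a minimal base-R sketch of neighbor-averaging imputation on toy data. It is an illustration under stated assumptions, not iCellR's internal implementation: the function `knn_impute`, the matrices `expr` and `embedding`, and the choice of a plain mean over Euclidean nearest neighbors are all hypothetical.

```r
# Toy illustration of neighbor-averaging imputation (NOT iCellR's internal code).
set.seed(1)

# Hypothetical expression matrix: 5 genes (rows) x 6 cells (columns)
expr <- matrix(rpois(30, lambda = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6)))

# Hypothetical 2-D embedding of the cells (a stand-in for PCA or KNetL coordinates)
embedding <- matrix(rnorm(12), nrow = 6,
                    dimnames = list(colnames(expr), c("dim1", "dim2")))

knn_impute <- function(expr, embedding, nn = 3) {
  # Cell-by-cell Euclidean distances in the embedding
  d <- as.matrix(dist(embedding))
  imputed <- matrix(0, nrow = nrow(expr), ncol = ncol(expr),
                    dimnames = dimnames(expr))
  for (i in seq_len(ncol(expr))) {
    # The cell itself is at distance 0, so it comes first;
    # this selects the cell plus its nn - 1 nearest neighbors
    neighbors <- order(d[i, ])[seq_len(nn)]
    # Replace each gene's value with the mean over that neighborhood
    imputed[, i] <- rowMeans(expr[, neighbors, drop = FALSE])
  }
  imputed
}

imputed <- knn_impute(expr, embedding, nn = 3)
```

In this sketch a larger `nn` smooths more aggressively: with `nn = 1` each cell keeps its own values, and with `nn` equal to the number of cells every cell receives the global gene means.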

Gene-Gene Correlation

Gene-gene correlation refers to the relationship or association between the expression levels of two genes across cells or samples. It helps identify patterns of co-expression, which can provide insights into cell type identification, biological pathways, regulatory networks, or functional relationships.
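
As a rough illustration of what such a correlation measures, the base-R sketch below computes a correlation coefficient between two simulated genes, first across all cells and then within a subset of clusters. Everything here is hypothetical toy data (`gene_a`, `gene_b`, `cluster`), and Pearson correlation is an assumption; the text above does not state which coefficient `gg.cor` reports.

```r
# Toy illustration of gene-gene correlation (simulated data, not iCellR output).
set.seed(42)
n_cells <- 200

# Hypothetical cluster labels and two simulated, co-expressed genes
cluster <- sample(1:3, n_cells, replace = TRUE)
gene_a <- rnorm(n_cells, mean = cluster)
gene_b <- 0.8 * gene_a + rnorm(n_cells, sd = 0.5)

# Pearson correlation across all cells
cor_all <- cor(gene_a, gene_b, method = "pearson")

# Same correlation restricted to cells in clusters 3 and 2
keep <- cluster %in% c(3, 2)
cor_subset <- cor(gene_a[keep], gene_b[keep], method = "pearson")

# cor.test() additionally reports a p-value and a confidence interval
res <- cor.test(gene_a, gene_b, method = "pearson")
```

Restricting to a subset of cells, as `keep` does here, can change the coefficient noticeably, which is why it can be useful to compare the whole dataset against individual clusters or conditions.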

Plots

# main (non-imputed) data, all cells
A <- gg.cor(my.obj, 
	interactive = FALSE, 
	gene1 = "GNLY",
	gene2 = "NKG7", 
	conds = NULL,
	clusts = NULL,
	data.type = "main")

# imputed data, all cells
B <- gg.cor(my.obj, 
	interactive = FALSE, 
	gene1 = "GNLY",
	gene2 = "NKG7", 
	conds = NULL,
	clusts = NULL,
	data.type = "imputed")

# imputed data, clusters 3 and 2 only
C <- gg.cor(my.obj, 
	interactive = FALSE, 
	gene1 = "GNLY",
	gene2 = "NKG7", 
	conds = NULL,
	clusts = c(3,2),
	data.type = "imputed")

# imputed data, condition "WT" only
D <- gg.cor(my.obj, 
	interactive = FALSE, 
	gene1 = "GNLY",
	gene2 = "NKG7", 
	conds = c("WT"),
	clusts = NULL,
	data.type = "imputed")

# arrange the four plots in one grid (grid.arrange comes from gridExtra)
library(gridExtra)
grid.arrange(A,B,C,D)