Figure 1 - RegnerM2015/scENDO_scOVAR_2020 GitHub Wiki

scRNA-seq processing workflow

The input to this workflow is the filtered feature-barcode matrix generated by cellranger for each patient tumor sample. The output is one Seurat object per patient tumor sample, saved as an RDS file. Each sample is processed with essentially the same script, which performs the following tasks:

/scRNA-seq Processing Scripts/Individual_Samples/Patient*[1-11]_scRNA-seq.R

  1. Preprocessing & QC
  2. Feature selection, dimension reduction and clustering
  3. inferCNV
  4. SingleR reference-based annotation of cell types
  5. Save the Seurat object as an RDS file

After each patient tumor sample has been processed, the following script combines the individual Seurat objects into one multi-sample Seurat object representing the full cohort of 11 patient samples. The inputs to this script are the individual patient Seurat objects. The output is a fully processed multi-sample (full cohort) Seurat object. The following tasks are performed on this multi-sample Seurat object:

/scRNA-seq Processing Scripts/Full_Cohort/Patients1-11_scRNA-seq.R

  1. Re-normalize and re-scale
  2. Feature selection, dimension reduction and clustering
  3. Retain inferred CNVs from individual sample processing
  4. Assign cell type labels to clusters based on the majority label within each cluster
  5. Verify SingleR cell type labels with cell type gene signatures from PanglaoDB using Seurat's AddModuleScore()
  6. Differential expression analysis using Seurat's FindAllMarkers()
  7. Save the Seurat object as an RDS file
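
Step 4 above (majority-label assignment) can be illustrated with a small, language-neutral sketch. The repository scripts are written in R; the Python below is not taken from them. The function name `majority_labels` and the toy data are hypothetical, and the tie-breaking behavior is an assumption, since the workflow description does not say how ties are handled.

```python
from collections import Counter, defaultdict


def majority_labels(cluster_ids, cell_labels):
    """Map each cluster ID to the most frequent per-cell label within it."""
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, cell_labels):
        by_cluster[cluster][label] += 1
    # most_common(1) returns [(label, count)]; how ties are resolved is
    # left to Counter and is an assumption of this sketch.
    return {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}


# Hypothetical toy data: one cluster ID and one reference-based label per cell.
clusters = [0, 0, 0, 1, 1, 1, 1]
labels = ["Fibroblast", "Fibroblast", "T cell",
          "T cell", "T cell", "Macrophage", "T cell"]

cluster_to_label = majority_labels(clusters, labels)
# Every cell inherits the majority label of its cluster.
relabeled = [cluster_to_label[c] for c in clusters]
```

The effect is that a few cells whose individual label disagrees with their neighbors take on the label of the cluster they belong to.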

scATAC-seq processing workflow

The R package ArchR was used extensively for the scATAC-seq analysis. The input to this script is the ATAC fragments file generated by cellranger-atac for each patient tumor sample. The output is a multi-sample ArchR project (including cells from all 11 patients), saved as an RDS file, that contains 1) a 500 bp genomic tile matrix, 2) an estimated gene activity matrix, 3) an inferred gene expression matrix, and 4) a peak matrix.

/scATAC-seq Processing Scripts/Full_Cohort/Patients1-11_scATAC-seq.R

  1. Preprocessing & QC
  2. Feature selection, dimension reduction and label transfer using scRNA-seq cell type subcluster labels
  3. Plot intermediate UMAP plots
  4. Peak calling within each inferred cell type subcluster
  5. Plot intermediate heatmaps
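
To give a rough idea of what a 500 bp genomic tile matrix holds, the sketch below bins fragment ends into fixed-width tiles and counts them per cell barcode. ArchR builds this matrix itself; this is a toy illustration in Python, not ArchR's implementation. The fragment tuples and the function name `tile_counts` are hypothetical, and coordinate conventions are deliberately glossed over.

```python
from collections import Counter

TILE_SIZE = 500  # bp, matching the tile width named in the text


def tile_counts(fragments, tile_size=TILE_SIZE):
    """Count fragment ends per (barcode, chromosome, tile index).

    Assumes each fragment contributes its two ends as insertion sites.
    """
    counts = Counter()
    for chrom, start, end, barcode in fragments:
        for pos in (start, end):
            counts[(barcode, chrom, pos // tile_size)] += 1
    return counts


# Hypothetical fragments: (chromosome, start, end, cell barcode).
fragments = [
    ("chr1", 120, 480, "cellA"),    # both ends fall in tile 0
    ("chr1", 450, 730, "cellA"),    # ends fall in tile 0 and tile 1
    ("chr1", 1010, 1300, "cellB"),  # both ends fall in tile 2
]

counts = tile_counts(fragments)
```

Each key of `counts` corresponds to one entry of a cell-by-tile matrix; tiles with no insertions are simply absent, which is why such matrices are stored in sparse form.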

Helpful graphic of scRNA-seq/scATAC-seq processing workflow:

Note that the last five steps are performed later on.

Figure 1 plotting

The inputs to this script are 1) the multi-sample (full cohort) Seurat object and 2) the multi-sample ArchR project. The outputs are the UMAP plots and proportion bar charts presented in Figure 1.

/Figure_1/Figure_1.R

  1. Plot scRNA-seq/scATAC-seq UMAP plots colored by sample
  2. Plot scRNA-seq/scATAC-seq UMAP plots colored by cell type
  3. Plot proportion bar charts for scRNA-seq and scATAC-seq showing the contribution of each patient to each cell type subcluster
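
The quantity behind the proportion bar charts in step 3 can be sketched as follows: for each cell type subcluster, the fraction of its cells that came from each patient. This is a minimal Python illustration rather than the code in Figure_1.R; the function name `patient_proportions` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def patient_proportions(cell_types, patients):
    """For each cell type, return the fraction of its cells from each patient."""
    counts = defaultdict(Counter)
    for cell_type, patient in zip(cell_types, patients):
        counts[cell_type][patient] += 1
    return {
        cell_type: {
            patient: n / sum(per_patient.values())
            for patient, n in per_patient.items()
        }
        for cell_type, per_patient in counts.items()
    }


# Hypothetical toy data: one cell type label and one patient ID per cell.
cell_types = ["Epithelial", "Epithelial", "Epithelial", "Epithelial",
              "T cell", "T cell"]
patients = ["Patient_1", "Patient_1", "Patient_1", "Patient_2",
            "Patient_2", "Patient_2"]

props = patient_proportions(cell_types, patients)
```

Within each cell type the fractions sum to 1, so each bar in a stacked chart has the same total height and only the patient composition varies.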
