Figure 3 - RegnerM2015/scENDO_scOVAR_2020 GitHub Wiki
The analyses performed for Figure 3, showcasing the Endometrioid Endometrial Cancer (EEC) cohort of patients, closely follow the workflows presented in the previous wiki pages for Figure 1 and Figure 2.
scRNA-seq processing workflow
After processing each patient tumor sample, we run the following script, which combines the Seurat objects for Patients 1-5 into one multi-sample Seurat object representing the EEC cohort. The inputs are the individual patient Seurat objects; the output is a fully processed multi-sample (EEC cohort) Seurat object. The script performs the following tasks on the multi-sample Seurat object:
/scRNA-seq Processing Scripts/EEC_Cohort/Patients1-5_scRNA-seq.R
- Re-normalize and re-scale
- Feature selection, dimension reduction and clustering
- Retain inferred CNVs from individual sample processing
- Assign cell type labels to clusters based on the majority label within each cluster
- Verify SingleR cell type labels with cell type gene signatures from PanglaoDB using Seurat's AddModuleScore()
- Differential expression analysis (Seurat's FindAllMarkers)
- Save Seurat object as rds object
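The majority-label step in the list above can be illustrated with a short sketch. The processing script itself is written in R and is not reproduced here; the Python below only shows the logic, and the function name `majority_label_per_cluster` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def majority_label_per_cluster(cluster_ids, cell_labels):
    """Map each cluster to the most common per-cell label within it."""
    labels_by_cluster = defaultdict(list)
    for cluster, label in zip(cluster_ids, cell_labels):
        labels_by_cluster[cluster].append(label)
    # most_common(1) returns the label with the highest count in the cluster
    return {
        cluster: Counter(labels).most_common(1)[0][0]
        for cluster, labels in labels_by_cluster.items()
    }

# Toy data (hypothetical): one cluster ID and one per-cell label for each cell.
clusters = [0, 0, 0, 1, 1, 1, 1]
labels = ["Epithelial", "Epithelial", "Fibroblast",
          "T cell", "T cell", "Macrophage", "T cell"]
cluster_to_label = majority_label_per_cluster(clusters, labels)
```

In this sketch a tie is resolved in favor of the label seen first in the cluster; how the actual script handles ties is not stated on this page.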
scATAC-seq processing workflow
The R package ArchR was used extensively for the scATAC-seq analysis. The starting input for this script is the ATAC fragments file from each patient tumor sample generated by cellranger-atac. The output is a multi-sample ArchR Project (including cells from all 5 patients) saved as an rds object that contains 1) a 500 bp genomic tile matrix, 2) an estimated gene activity matrix, 3) an inferred gene expression matrix, and 4) a peak matrix.
/scATAC-seq Processing Scripts/EEC_Cohort/Patients1-5_scATAC-seq.R
- Preprocessing & QC
- Feature selection, dimension reduction and label transferring using scRNA-seq cell type subcluster labels
- Plot intermediate UMAP plots
- Peak calling within each inferred cell type subcluster
- Plot intermediate heatmaps
Helpful graphic of scRNA-seq/scATAC-seq processing workflow:
Note that the last five steps were not performed in the EEC cohort analysis.
Peak-to-gene correlation analysis with empirically-derived false discovery rate (eFDR) in the EEC cohort
The starting input for this workflow is the multi-sample ArchR Project (including cells from all 5 patients) saved as an rds object that contains 1) a 500 bp genomic tile matrix, 2) an estimated gene activity matrix, 3) an inferred gene expression matrix, and 4) a peak matrix. We generated this multi-sample ArchR Project using /scATAC-seq Processing Scripts/EEC_Cohort/Patients1-5_scATAC-seq.R, presented earlier in this wiki page. The output of this script is a new ArchR project that contains all peak-to-gene associations, including those that are not statistically significant, along with a heatmap showing that distal peak accessibility is tightly linked to inferred gene expression. We also write out a peak-to-gene metadata table that lists the peak coordinates, peak type (distal, promoter, intronic, exonic), gene name, correlation value, p-value, variance measures, etc.
/PeaktoGeneLink_Analysis/EEC_Cohort/EEC_Cohort_P2G.R
- Run peak-to-gene correlation analysis under the permuted null condition 100 times and record how many associations have a p-value <= alpha threshold
- Run peak-to-gene correlation analysis with the observed data and record how many associations have a p-value <= alpha threshold
- Screen for statistically significant distal peak-to-gene links (p-value <= alpha threshold) at the estimated eFDR
- Plot distal peak-to-gene heatmap
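The counting steps above can be combined into an eFDR estimate. This page does not state the exact formula used in the script, so the sketch below assumes one common definition: the mean number of associations passing the alpha threshold across the permuted null runs, divided by the number passing in the observed data. The function name `empirical_fdr` and the toy p-values are hypothetical, and the toy uses 4 null runs rather than 100 to keep it short.

```python
def empirical_fdr(observed_pvalues, null_pvalue_sets, alpha):
    """Estimate eFDR as (mean null hits) / (observed hits) at a given alpha.

    Assumed definition; the source script may compute it differently.
    """
    if not null_pvalue_sets:
        raise ValueError("at least one permuted null run is required")
    observed_hits = sum(p <= alpha for p in observed_pvalues)
    if observed_hits == 0:
        return 0.0  # no discoveries, so no false discoveries by convention
    null_hits = [sum(p <= alpha for p in run) for run in null_pvalue_sets]
    mean_null_hits = sum(null_hits) / len(null_hits)
    return min(1.0, mean_null_hits / observed_hits)

# Toy p-values (hypothetical): one observed run and four permuted null runs.
observed = [0.001, 0.002, 0.004, 0.2, 0.6, 0.9, 0.03, 0.008]
null_runs = [
    [0.005, 0.3, 0.7, 0.4, 0.9, 0.2, 0.6, 0.5],
    [0.2, 0.4, 0.8, 0.1, 0.3, 0.9, 0.7, 0.6],
    [0.009, 0.5, 0.6, 0.3, 0.2, 0.8, 0.4, 0.7],
    [0.3, 0.6, 0.2, 0.9, 0.5, 0.4, 0.8, 0.1],
]
efdr = empirical_fdr(observed, null_runs, alpha=0.01)
```

Here four observed associations pass alpha = 0.01 and the null runs average 0.5 passing associations, so the estimate is 0.125.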
Screening for cancer-specific peak-to-gene links in the EEC cohort
The heatmap of distal peak-to-gene links allowed us to visualize which peak-to-gene links were enriched in the cancer cell populations. To determine which of the cancer-enriched distal peak-to-gene links are "cancer-specific," we carried out a genomic coordinate overlap analysis against a series of normal reference epigenomic profiles. H3K27ac (putative enhancer mark) peaks measured from the normal ovarian surface epithelium and fallopian tube secretory epithelium were downloaded from GSE68104. We also overlapped the cancer-enriched peaks against all ENCODE candidate cis-regulatory elements (cCREs), which include elements active in normal epithelial tissue. Cancer-enriched peaks that did not overlap with any of these profiles were deemed "cancer-specific," and their corresponding gene(s) established a set of putative cancer-specific regulatory relationships.
The script introduced above, /PeaktoGeneLink_Analysis/EEC_Cohort/EEC_Cohort_P2G.R, also performs this overlap analysis after computing the peak-to-gene correlations.
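The overlap screen described above amounts to keeping only those cancer-enriched peaks that intersect none of the reference intervals. The following is a minimal sketch of that filter, not the authors' R implementation: the function names, the half-open (chromosome, start, end) convention, and all coordinates are assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def cancer_specific_peaks(cancer_peaks, reference_sets):
    """Keep peaks that overlap no interval in any reference set."""
    reference = [interval for ref in reference_sets for interval in ref]
    return [
        peak for peak in cancer_peaks
        if not any(overlaps(peak, ref) for ref in reference)
    ]

# Toy coordinates (hypothetical), standing in for cancer-enriched peaks,
# normal-tissue H3K27ac peaks, and ENCODE cCREs.
cancer_enriched = [("chr1", 100, 600), ("chr1", 5000, 5500), ("chr2", 100, 600)]
h3k27ac_normal = [("chr1", 550, 900)]
encode_ccres = [("chr2", 50, 120), ("chr3", 100, 600)]

specific = cancer_specific_peaks(cancer_enriched, [h3k27ac_normal, encode_ccres])
```

The pairwise scan is quadratic and only suitable for small examples; genome-scale peak sets are usually handled with an interval-tree based tool such as the Bioconductor GenomicRanges package.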
Figure 3 plotting
The starting inputs for this script are 1) the multi-sample (EEC cohort) Seurat object, 2) the multi-sample ArchR project with EEC peak-to-gene associations, and 3) the EEC peak-to-gene link metadata. The outputs are the UMAP plots, proportion bar charts, expression boxplots for the gene of interest, and the ATAC browser track for the gene of interest presented in Figure 3.
/Figure_3/Figure_3.R
- Plot scRNA-seq/scATAC-seq UMAP plots colored by sample
- Plot scRNA-seq/scATAC-seq UMAP plots colored by cell type
- Plot proportion bar charts for scRNA-seq and scATAC-seq showing the contribution of each patient to each cell type subcluster
- Plot scATAC-seq browser track for gene of interest and corresponding scRNA-seq expression in box plot
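The values behind the proportion bar charts listed above are the fraction of cells in each cell type subcluster that come from each patient. A minimal sketch of that calculation follows; it is not taken from Figure_3.R, and the function name `patient_proportions` and the toy cell annotations are hypothetical.

```python
from collections import Counter, defaultdict

def patient_proportions(cell_types, patients):
    """For each cell type, return the fraction of its cells from each patient."""
    counts = defaultdict(Counter)
    for cell_type, patient in zip(cell_types, patients):
        counts[cell_type][patient] += 1
    return {
        cell_type: {
            patient: n / sum(per_patient.values())
            for patient, n in per_patient.items()
        }
        for cell_type, per_patient in counts.items()
    }

# Toy annotations (hypothetical): one cell type and one patient ID per cell.
cell_types = ["Epithelial"] * 4 + ["Fibroblast"] * 2
patients = ["P1", "P1", "P2", "P3", "P1", "P2"]
proportions = patient_proportions(cell_types, patients)
```

Each inner dictionary sums to 1, so it maps directly onto one stacked bar per cell type subcluster.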
Interested in more exciting research in cancer genomics? Visit https://www.thefrancolab.org/ to learn more!