Figure 5 - RegnerM2015/scBreast_scRNA_scATAC_2024 GitHub Wiki
scRNA-seq processing
To analyze the scRNA-seq cells from all cell line samples (HCC1143, SUM149PT, MCF7, and T47D), we merged the Seurat objects from all four cell line samples into one multi-sample Seurat object. This multi-sample scRNA-seq dataset was then re-processed (normalization, feature selection, dimensionality reduction, etc.).
The Slurm and R scripts below were used to create this multi-sample scRNA-seq dataset comprising HCC1143, SUM149PT, MCF7, and T47D cells:
- scripts/CellLine_Samples_scRNA-Merge_And_ReCluster.sh
- scripts/CellLine_Samples_scRNA-Merge_And_ReCluster.R
scATAC-seq processing and integration with scRNA-seq
To analyze the scATAC-seq cells from all cell line samples (HCC1143, SUM149PT, MCF7, and T47D), we screened for scATAC-seq cells derived from cell line samples (excluding patient samples). We then carried out dimensionality reduction and gene scoring before transferring cell line identity labels and gene expression profiles from the multi-sample scRNA-seq dataset using Seurat's CCA-based cross-modality integration approach. Note that we leveraged the ground truth identities of scATAC-seq cells to assess the performance of the integration procedure by calculating the percentage of scATAC-seq cells correctly assigned to their true cell line identity. After the integration procedure, we called peaks in the scATAC-seq data to identify possible regulatory elements located in regions of accessible chromatin.
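The accuracy check described above (the percentage of scATAC-seq cells assigned to their true cell line) can be sketched as follows. This is a minimal illustration in Python, not the code in the repository's R scripts; the function name `label_transfer_accuracy` and the toy labels are assumptions made for the example.

```python
from collections import Counter


def label_transfer_accuracy(true_labels, predicted_labels):
    """Return (overall %, {cell line: %}) of cells assigned to their true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one cell is required")
    # Number of cells per true cell line, and how many of those were matched.
    total = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    per_line = {line: 100.0 * correct[line] / n for line, n in total.items()}
    overall = 100.0 * sum(correct.values()) / len(true_labels)
    return overall, per_line


# Toy labels for illustration only (not data from the study).
true_labels = ["MCF7", "MCF7", "T47D", "T47D", "HCC1143", "SUM149PT"]
predicted_labels = ["MCF7", "T47D", "T47D", "T47D", "HCC1143", "SUM149PT"]
overall, per_line = label_transfer_accuracy(true_labels, predicted_labels)
```

Reporting the per-cell-line percentages alongside the overall value makes it easier to see whether errors concentrate in one cell line, for example between the two Luminal lines.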
The Slurm and R scripts below were used to screen for scATAC-seq cell line cells, followed by dimensionality reduction, gene scoring, integration with scRNA-seq, and peak calling:
- scripts/CellLine_Samples_scATAC-Subset_GeneScoring_DimReduc_TransferLabels_CallPeaks.sh
- scripts/CellLine_Samples_scATAC-Subset_GeneScoring_DimReduc_TransferLabels_CallPeaks.R
Peak-to-gene association analysis in Basal-like and Luminal subtype cell lines
Similar to the first phase of the differential peak-to-gene association analysis performed in the Basal-like and Luminal subtype patient analyses, we performed independent peak-to-gene association analyses in the Basal-like subtype (HCC1143, SUM149PT) and Luminal subtype cell lines (MCF7, T47D). Again, we constructed metacells for each cell line sample, and within each condition (Basal-like/Luminal), we fitted the following linear mixed-effects model for every peak located within 500 kb of each gene:
gene expression ~ peak accessibility + (1|cell line)
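Written out for a single peak-gene pair, the formula above corresponds to a random-intercept model. The notation below is ours, and the normality assumptions are the conventional ones for linear mixed-effects models rather than something stated on this page: $y_{ij}$ is the expression of the gene in metacell $i$ of cell line $j$, $x_{ij}$ is the accessibility of the peak in that metacell, $\beta_1$ is the fixed effect of accessibility (the effect size of the association), and $u_j$ is the random intercept for cell line $j$.

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```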
The Slurm and R scripts below were used to generate the metacells for each cell line sample and perform the peak-to-gene association analysis within each condition (Basal-like/Luminal):
- scripts/scLME_update-metacells-SingFits_OLS-CellLines.sh
- scripts/scLME_update-metacells-SingFits_OLS-CellLines.R
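The 500 kb window that determines which peak-gene pairs are tested can be illustrated with a short Python sketch. It assumes that distance is measured from a single anchor coordinate per gene (for example, the transcription start site) to the peak midpoint; the page does not state how distance is defined, so the actual R script may differ. The names `candidate_pairs`, `GENE_A`, and `GENE_B` are hypothetical.

```python
from bisect import bisect_left, bisect_right

WINDOW_BP = 500_000  # 500 kb, as stated in the text


def candidate_pairs(genes, peaks, window=WINDOW_BP):
    """List (gene, peak) pairs whose peak midpoint lies within `window` bp of the gene anchor.

    genes: {gene_name: (chrom, anchor_bp)}  (anchor choice is an assumption)
    peaks: iterable of (chrom, start, end)
    """
    # Group peaks by chromosome and sort them by midpoint for binary search.
    grouped = {}
    for chrom, start, end in peaks:
        grouped.setdefault(chrom, []).append(((start + end) // 2, (chrom, start, end)))
    index = {}
    for chrom, entries in grouped.items():
        entries.sort()
        index[chrom] = ([mid for mid, _ in entries], [peak for _, peak in entries])

    pairs = []
    for gene, (chrom, anchor) in genes.items():
        mids, sorted_peaks = index.get(chrom, ([], []))
        lo = bisect_left(mids, anchor - window)
        hi = bisect_right(mids, anchor + window)
        pairs.extend((gene, peak) for peak in sorted_peaks[lo:hi])
    return pairs


# Toy coordinates for illustration only.
genes = {"GENE_A": ("chr1", 1_000_000), "GENE_B": ("chr2", 200_000)}
peaks = [
    ("chr1", 400_000, 400_500),      # too far upstream of GENE_A
    ("chr1", 600_000, 600_500),      # within 500 kb of GENE_A
    ("chr1", 1_499_000, 1_499_500),  # within 500 kb of GENE_A
    ("chr1", 1_600_000, 1_600_500),  # too far downstream of GENE_A
    ("chr2", 100_000, 100_500),      # within 500 kb of GENE_B
]
pairs = candidate_pairs(genes, peaks)
```

Each pair returned this way would then be fitted with the model above, one fit per peak-gene pair.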
Figure 5 plotting
To visualize the scRNA-seq and scATAC-seq cells in the cell line analysis, we generated a UMAP plot for each modality, color-coded by cell line (as performed in the Basal-like and Luminal subtype patient analyses).
We visualized the results from the peak-to-gene association analysis by generating proportion bar charts showing the genomic distribution and ENCODE annotation status for Basal-specific, Luminal-specific, and shared peak-to-gene associations (as performed in the Basal-like and Luminal subtype patient analyses).
Next, we compared the peak-to-gene associations identified in the patient analyses (in vivo) with those identified in the cell line analysis (in vitro). More specifically, we identified putative enhancer-target gene pairs in each setting by screening for significant peak-to-gene associations with effect sizes greater than zero. We performed this analysis by subtype (Basal-like BC in vitro vs. in vivo and Luminal BC in vitro vs. in vivo) and visualized the overlaps of unique enhancer-regulated genes between in vitro and in vivo subtype-specific BC cells as Venn diagrams.
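The screening and overlap steps can be sketched in Python as follows. The record keys (`peak`, `gene`, `effect`, `p_adj`) and the 0.05 significance cutoff are assumptions for illustration; the page does not state the threshold or the column names used in the R scripts.

```python
def enhancer_regulated_genes(associations, alpha=0.05):
    """Unique genes with at least one significant, positive peak-to-gene association.

    `alpha` is a hypothetical cutoff; the source does not state the threshold.
    """
    return {
        a["gene"]
        for a in associations
        if a["p_adj"] < alpha and a["effect"] > 0
    }


def venn_counts(in_vitro_genes, in_vivo_genes):
    """Sizes of the three regions of a two-set Venn diagram."""
    return {
        "in_vitro_only": len(in_vitro_genes - in_vivo_genes),
        "shared": len(in_vitro_genes & in_vivo_genes),
        "in_vivo_only": len(in_vivo_genes - in_vitro_genes),
    }


# Toy associations for illustration only.
in_vitro = [
    {"peak": "P1", "gene": "G1", "effect": 0.4, "p_adj": 0.01},
    {"peak": "P2", "gene": "G2", "effect": -0.3, "p_adj": 0.001},  # negative effect
    {"peak": "P3", "gene": "G3", "effect": 0.2, "p_adj": 0.20},    # not significant
    {"peak": "P4", "gene": "G4", "effect": 0.5, "p_adj": 0.03},
]
in_vivo = [
    {"peak": "P5", "gene": "G1", "effect": 0.3, "p_adj": 0.02},
    {"peak": "P6", "gene": "G5", "effect": 0.6, "p_adj": 0.001},
]
vitro_genes = enhancer_regulated_genes(in_vitro)
vivo_genes = enhancer_regulated_genes(in_vivo)
counts = venn_counts(vitro_genes, vivo_genes)
```

Because the comparison is made on unique genes, a gene linked to several enhancers is counted once per setting.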
Similar to the Basal-like and Luminal subtype patient analyses, we performed a gene set over-representation/enrichment analysis of the shared enhancer-regulated genes between in vitro and in vivo subtype-specific BC cells.
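A common way to test over-representation is a one-sided hypergeometric test, sketched below in Python. The page does not say which test or tool was used, so this shows the general technique and not necessarily the authors' method; the function name and the toy numbers are assumptions.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: number of background genes
    successes:  background genes that belong to the gene set
    draws:      number of genes in the query list (e.g. shared enhancer-regulated genes)
    k:          query genes that belong to the gene set
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


# Toy numbers: 20 background genes, 5 in the gene set, 4 query genes, 3 of them in the set.
p_value = hypergeom_upper_tail(k=3, population=20, successes=5, draws=4)
```

In practice the p-values from many gene sets would also need a multiple-testing correction.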
We also investigated the distributions of linked genes per enhancer, and vice versa, for in vitro and in vivo subtype-specific BC cells. We visualized these distributions in histograms and proportion bar charts showing the proportions of enhancers linked to three or more genes and the proportions of genes linked to three or more enhancers for in vitro and in vivo subtype-specific BC cells.
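The two distributions and the "three or more" proportions can be computed from a list of enhancer-gene pairs, as in this Python sketch. The function name and the toy pairs are assumptions for illustration and are not taken from the repository's scripts.

```python
from collections import Counter, defaultdict


def linkage_distributions(pairs, threshold=3):
    """Summarize genes per enhancer and enhancers per gene.

    pairs: iterable of (enhancer, gene)
    Returns two (histogram, proportion) tuples, where `histogram` maps a link
    count to the number of enhancers (or genes) with that count, and
    `proportion` is the fraction with at least `threshold` links.
    """
    genes_per_enhancer = defaultdict(set)
    enhancers_per_gene = defaultdict(set)
    for enhancer, gene in pairs:
        genes_per_enhancer[enhancer].add(gene)
        enhancers_per_gene[gene].add(enhancer)

    def summarize(mapping):
        counts = [len(linked) for linked in mapping.values()]
        histogram = Counter(counts)
        proportion = sum(c >= threshold for c in counts) / len(counts) if counts else 0.0
        return histogram, proportion

    return summarize(genes_per_enhancer), summarize(enhancers_per_gene)


# Toy enhancer-gene pairs for illustration only.
pairs = [
    ("E1", "G1"), ("E1", "G2"), ("E1", "G3"),
    ("E2", "G1"),
    ("E3", "G1"), ("E3", "G4"),
]
(enh_hist, enh_prop), (gene_hist, gene_prop) = linkage_distributions(pairs)
```

Sets are used so that a duplicated pair does not inflate the link counts.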
The Slurm and R scripts below were used to generate all of the visualizations described above: