Figure 4 - RegnerM2015/scBreast_scRNA_scATAC_2024 GitHub Wiki

scRNA-seq processing

To analyze the Luminal BC cells from Patients 7-15 alongside the mature luminal cells from true normal Patients 1-4, we subset these cells from the multi-patient scRNA-seq dataset shown in main Figure 1. The resulting subset was re-processed (normalization, feature selection, dimensionality reduction, etc.) and clustered, as in the Basal-like subtype analysis.
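The subsetting rule described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the R code used in the repository: the metadata field names (`patient`, `cell_type`), the cell type labels, and the barcodes are all hypothetical.

```python
# Hypothetical per-cell metadata; field names, labels, and barcodes are illustrative only.
cells = [
    {"barcode": "AAAC-1", "patient": 2, "cell_type": "Mature luminal"},
    {"barcode": "AAAG-1", "patient": 2, "cell_type": "Fibroblast"},
    {"barcode": "AACT-1", "patient": 9, "cell_type": "Luminal BC"},
    {"barcode": "AAGT-1", "patient": 9, "cell_type": "T cell"},
    {"barcode": "ACGT-1", "patient": 5, "cell_type": "Basal-like BC"},
]

NORMAL_PATIENTS = set(range(1, 5))    # Patients 1-4 (true normal)
LUMINAL_PATIENTS = set(range(7, 16))  # Patients 7-15 (Luminal BC)


def keep_cell(cell):
    """Return True for cells that belong in the Luminal subtype analysis."""
    if cell["patient"] in NORMAL_PATIENTS:
        return cell["cell_type"] == "Mature luminal"
    if cell["patient"] in LUMINAL_PATIENTS:
        return cell["cell_type"] == "Luminal BC"
    return False


subset = [c for c in cells if keep_cell(c)]
kept_barcodes = [c["barcode"] for c in subset]
```

Only the cells passing `keep_cell` would then go through normalization, feature selection, dimensionality reduction, and clustering again, so that those steps reflect variation within the subset rather than within the full dataset.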

The Slurm and R scripts below were used to perform this subsetting operation followed by re-processing and clustering:

scATAC-seq processing and integration with scRNA-seq

To analyze the Luminal BC cells from Patients 7-15 alongside the mature luminal cells from true normal Patients 1-4, we subset these cells from the multi-patient scATAC-seq dataset shown in main Figure 1. After subsetting, we performed dimensionality reduction in preparation for integration with the matching scRNA-seq dataset. After integration, we called peaks in the scATAC-seq data. Each of these steps followed the Basal-like subtype analysis.

The Slurm and R scripts below were used to perform subsetting, dimensionality reduction, integration with scRNA-seq, and peak calling:

Differential peak-to-gene association analysis

As in the Basal-like subtype analysis, we carried out the differential peak-to-gene association analysis between Luminal BC cells and normal mature luminal cells.

The Slurm and R scripts below were used to generate the metacells for each patient and perform both phases of the differential peak-to-gene association analysis:

Differential gene expression, differential peak accessibility, and Figure 4 plotting

As in the Basal-like subtype analysis, differential gene expression and differential peak accessibility were tested at the pseudobulk level (to overcome pseudoreplication bias) using the DESeq2 R package.
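To make the pseudobulk idea concrete, here is a minimal Python sketch of the aggregation step. It assumes that cells are grouped by patient and that raw counts are summed per gene; the patient labels, gene names, and counts are made up for illustration, and the actual aggregation in the repository's R scripts may differ.

```python
from collections import defaultdict

# Hypothetical raw counts: one (patient, {gene: count}) entry per cell.
cell_counts = [
    ("Patient_1", {"ESR1": 0, "KRT18": 3}),
    ("Patient_1", {"ESR1": 1, "KRT18": 2}),
    ("Patient_7", {"ESR1": 4, "KRT18": 1}),
    ("Patient_7", {"ESR1": 6, "KRT18": 0}),
    ("Patient_7", {"ESR1": 5, "KRT18": 2}),
]


def pseudobulk(cells):
    """Sum raw counts across all cells of each patient, gene by gene."""
    totals = defaultdict(lambda: defaultdict(int))
    for patient, counts in cells:
        for gene, n in counts.items():
            totals[patient][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {patient: dict(genes) for patient, genes in totals.items()}


bulk = pseudobulk(cell_counts)
```

After aggregation each patient contributes a single count profile, so the unit of replication in the test is the patient rather than the individual cell. Treating cells from the same patient as independent replicates is what gives rise to pseudoreplication bias. DESeq2 expects un-normalized counts as input, which is why the sketch sums raw counts rather than averaging normalized values.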

To visualize the scRNA-seq and scATAC-seq cells in the Luminal subtype analysis, we generated a UMAP plot for each modality, as in the Basal-like subtype analysis.

The remaining visualizations also followed the Basal-like subtype analysis. We visualized the results from the first phase of the differential peak-to-gene association analysis with proportion bar charts (shown in Supplement). For the second phase, we plotted scatter plots of effect sizes for the significant differential peak-to-gene associations (shown in Supplement). We also generated the cancer-specific peak-to-gene association heatmap and performed the gene set over-representation/enrichment analysis.
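For readers unfamiliar with gene set over-representation analysis, one common formulation is a one-sided hypergeometric test. The sketch below assumes that formulation; the source does not state which test or tool the repository's scripts use, and the function name and example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric.

    N: number of genes in the background
    K: number of background genes that belong to the gene set
    n: number of genes in the query list (e.g. genes linked to cancer-specific peaks)
    k: number of query genes that belong to the gene set
    """
    total = comb(N, n)
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Toy example: 20 background genes, 5 in the set, 4 queried, 2 overlapping.
p_value = hypergeom_upper_tail(N=20, K=5, n=4, k=2)
```

A small p-value means the overlap between the query list and the gene set is larger than expected by chance given the background. In practice the p-values across many gene sets would also need multiple-testing correction.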

As in the Basal-like subtype analysis, we showed a specific example of a cancer-specific peak-to-gene association by visualizing the browser track of chromatin accessibility and generating pseudobulk gene expression dot plots from the scRNA-seq data. We also plotted a metacell scatter plot of chromatin accessibility at the peak against the inferred level of gene expression.
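The relationship shown in a metacell scatter plot can be summarized with a correlation coefficient across metacells. The sketch below uses Pearson correlation as an assumed summary statistic; the source does not specify the statistic behind the peak-to-gene associations, and the metacell values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical values, one entry per metacell.
peak_accessibility = [0.1, 0.4, 0.5, 0.9]   # accessibility at the peak
gene_expression = [1.0, 2.0, 2.5, 4.0]      # inferred expression of the linked gene

r = pearson(peak_accessibility, gene_expression)
```

Each point in the scatter plot corresponds to one metacell, so a strong positive `r` means metacells with higher accessibility at the peak also tend to have higher inferred expression of the linked gene.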

The Slurm and R scripts below were used to perform the differential gene expression and peak accessibility testing and to generate all of the visualizations described above: