quality control - WXlab-NJMU/scrna-recom GitHub Wiki
Quality Control
Tools: Seurat, DoubletFinder, SoupX
Quality control using Seurat
Thresholds:
- barcodes: max.genes, min.genes, max.mt, max.hb
- genes: max.counts, min.counts, and min.cells
Two modes are supported:
- set common thresholds for a group of samples with command-line options
- specify thresholds for a single sample in the csv file
Usage
quality-control.R <csv> <outdir> <project> [options]
scRNA-seq quality control using Seurat
positional arguments:
csv csv file listing sample, path, and qc thresholds (specific to a single sample)
outdir output result folder
project project name
flags:
-h, --help show this help message and exit
optional arguments:
--max.genes nFeature_RNA maximum [default: 5000]
--min.genes nFeature_RNA minimum [default: 200]
--max.counts nCount_RNA maximum [default: 40000]
--min.counts nCount_RNA minimum [default: 500]
--min.cells cell minimum [default: 3]
--max.mt percent of maximum mt genes [default: 20]
--max.hb percent of maximum hb genes [default: 10]
csv format
- sample: sample name
- path: Cell Ranger matrix folder containing genes.tsv, barcodes.tsv, and matrix.mtx
- qc: thresholds specific to this sample; join multiple parameters with &
sample,path,qc
ctrl,examples/ctrl,min.genes=10&min.counts=10&max.mt=15
stim,examples/stim,min.genes=10&min.counts=10
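As a rough illustration of how a `qc` value is structured, the shell sketch below splits one field on `&` and then on the first `=`. The helper name `parse_qc` is hypothetical and not part of scrna-recom, and this is not necessarily how `quality-control.R` reads the column internally.

```shell
# parse_qc is a hypothetical helper, not part of scrna-recom.
# It prints one "name value" pair per line for a qc field such as
# "min.genes=10&min.counts=10&max.mt=15".
parse_qc() {
  old_ifs=$IFS
  IFS='&'   # split the field on "&" only
  set -f    # disable globbing while the field is word-split
  for pair in $1; do
    # Text before the first "=" is the name, the rest is the value.
    printf '%s %s\n' "${pair%%=*}" "${pair#*=}"
  done
  set +f
  IFS=$old_ifs
}

parse_qc 'min.genes=10&min.counts=10&max.mt=15'
```

Each printed name corresponds to one of the option names listed under "optional arguments" (without the leading `--`).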
Examples
# testdata
## edit the paths in `qc.input.csv` to absolute paths
# run
quality-control.R examples/qc.input.csv ~/test/qc pbmc
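Editing the paths by hand is fine for two samples; for longer sample sheets, a small shell sketch like the one below can rewrite the `path` column. The helper name `abs_paths` and the output file name in the usage comment are assumptions for illustration, not part of scrna-recom. The sketch assumes it is run from the directory the relative paths refer to.

```shell
# abs_paths is a hypothetical helper, not part of scrna-recom.
# It reads "sample,path,qc" rows from stdin and prints them with
# relative paths resolved against the current directory.
abs_paths() {
  while IFS=, read -r sample path qc || [ -n "$sample" ]; do
    case $path in
      path|/*) ;;  # header row or already absolute: leave unchanged
      *) path=$(cd "$path" && pwd) || return 1 ;;
    esac
    printf '%s,%s,%s\n' "$sample" "$path" "$qc"
  done
}

# Usage (output file name is only an example):
#   abs_paths < examples/qc.input.csv > qc.input.abs.csv
```

The function stops with a non-zero status if a listed folder does not exist, so a mistyped path is caught before `quality-control.R` is run.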
Outputs
qc
├── pbmc.qc.rds # seurat object after qc
├── pbmc.qc.stat.csv # qc statistics
├── pbmc.qc_after.pdf # plot after qc
├── pbmc.qc_before.pdf # plot before qc
├── ctrl.barcodes.csv # sample barcodes after qc
└── stim.barcodes.csv # sample barcodes after qc


Remove doublet using DoubletFinder
Usage
remove-doublet.R <input> <outdir> <project> [options]
scRNA-seq Doublet Removal using DoubletFinder
positional arguments:
input input seurat rds file
outdir output result folder
project project name
flags:
-h, --help show this help message and exit
optional arguments:
-d, --dims npcs in Seurat::RunPCA, default is 50 [default: 50]
-n, --nfeatures number of variable features to use for scaledata and
pca, default is 2000 [default: 2000]
Examples
remove-doublet.R examples/input.rds ~/tests/doublet-removal test
Outputs
remove-doublet
├── pbmc.dedoublet.rds # seurat object after doublet removal
├── pbmc.dedoublet.stat.csv # statistics
├── ctrl.dedoublet.dims=30.after.rds # sample ctrl after doublet removal
├── ctrl.dedoublet.dims=30.pdf # sample ctrl figures in doublet removal
├── ctrl.dedoublet.dims=30.stat.csv # sample ctrl statistics in doublet removal
├── stim.dedoublet.dims=30.after.rds # sample stim after doublet removal
├── stim.dedoublet.dims=30.pdf # sample stim figures in doublet removal
└── stim.dedoublet.dims=30.stat.csv # sample stim statistics in doublet removal


Remove background RNA using SoupX
Usage
remove-background.R <raw> <filtered> <outdir> <project>
scRNA-seq Background RNA Removal using SoupX
positional arguments:
raw cellranger raw_feature_bc_matrix folder
filtered cellranger filtered_feature_bc_matrix folder
outdir output result folder
project project name
flags:
-h, --help show this help message and exit
Examples
# testdata
wget https://cf.10xgenomics.com/samples/cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz
tar -zxvf pbmc4k_raw_gene_bc_matrices.tar.gz
wget https://cf.10xgenomics.com/samples/cell-exp/2.1.0/pbmc4k/pbmc4k_filtered_gene_bc_matrices.tar.gz
tar -zxvf pbmc4k_filtered_gene_bc_matrices.tar.gz
# run
remove-background.R ./raw_gene_bc_matrices/GRCh38 ./filtered_gene_bc_matrices/GRCh38 ~/test/background-removal pbmc4k
Outputs
├── pbmc4k.bkremoval.SoupX.pdf # features
├── soupx_filtered_matrix # count matrix after soupx
│   ├── barcodes.tsv
│   ├── genes.tsv
│   └── matrix.mtx
└── clustering # cluster information
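To confirm that a run produced a complete matrix folder, a quick check along these lines may help. The helper name `check_matrix_dir` is hypothetical and not part of scrna-recom. It only assumes the three file names listed above and that `barcodes.tsv` holds one barcode per line.

```shell
# check_matrix_dir is a hypothetical helper, not part of scrna-recom.
# It verifies that a folder holds barcodes.tsv, genes.tsv and matrix.mtx,
# then reports the number of barcodes (one per line in barcodes.tsv).
check_matrix_dir() {
  for f in barcodes.tsv genes.tsv matrix.mtx; do
    [ -f "$1/$f" ] || { echo "missing: $1/$f" >&2; return 1; }
  done
  echo "cells: $(wc -l < "$1/barcodes.tsv" | tr -d '[:space:]')"
}

# Usage, with the output folder from the example above:
#   check_matrix_dir ~/test/background-removal/soupx_filtered_matrix
```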
