WIKI single‐cell‐DNA - gustaveroussy/single-cell-DNA GitHub Wiki
Welcome to the single-cell-DNA wiki!
Pipeline Goal:
Perform single-cell DNA-seq analysis, from FASTQ files to figures, for Mission Bio Tapestri data.
Steps available:
- Alignment
- Preprocessing (filtering out poor-quality variants, CNV and cells)
- SNV_CNV (normalization, dimension reduction and clustering)
- PROTEIN (normalization, dimension reduction and clustering)
- ALL (combining DNA-seq analysis & proteomic analysis)
- Phylogeny (reconstruction of mutation events)

Usage
Usage on Flamingo, the Gustave Roussy (GR) computing cluster
:heavy_exclamation_mark: If you have already used the single-cell RNA-seq pipeline, the procedure is identical.
- create the parameter file according to your needs (see below for how to configure it)
- set the path_to_configfile variable to the path of this file
- run the snakemake command
module load singularity
path_to_configfile="<path/to/your_configfile.yaml>"
path_to_pipeline="<path/to/single-cell-dna-seq>"
snakemake --profile ${path_to_pipeline}/profiles/local -s ${path_to_pipeline}/Snakefile --configfile ${path_to_configfile}
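
For scripted launches, the same invocation can be assembled in Python. This is only a sketch of the command shown above: `build_snakemake_command` is a hypothetical helper name, not something shipped with the pipeline, and the two paths are placeholders.

```python
from pathlib import PurePosixPath


def build_snakemake_command(path_to_pipeline, path_to_configfile):
    """Return the argv list equivalent to the shell command shown above.

    Hypothetical helper for illustration; it is not part of the pipeline.
    """
    pipeline = PurePosixPath(path_to_pipeline)
    return [
        "snakemake",
        "--profile", str(pipeline / "profiles" / "local"),
        "-s", str(pipeline / "Snakefile"),
        "--configfile", str(path_to_configfile),
    ]


# Placeholder paths, to be replaced with your own locations.
cmd = build_snakemake_command(
    "/path/to/single-cell-dna-seq", "/path/to/your_configfile.yaml"
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a session in which `module load singularity` has already been executed.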
Configuration
1. steps & alignment: choose the steps to run
| name | description | example | default value | possible value |
|---|---|---|---|---|
| steps | steps to run | [Aligment,preprocessing,SNV_CNV,ALL,phylogeny] | NA | Aligment,preprocessing,SNV_CNV,PROTEIN,ALL,phylogeny |
| tmp | temporary directory | /tmp | NA | NA |
| sample | sample(s) to run | [sample_1,sample_2] | NA | NA |
| reference_genome_path | path of the reference genome | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa","/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v3/hg19/ucsc_hg19.fa" |
| reference_genome | reference genome release | "hg19" | "hg19" | "hg19" |
| type_analysis | type of analysis to run: dna or dna+protein | "dna+protein" | NA | "dna","dna+protein" |
| panel_path | path of your panel of variants | "</your/path/to/panel/file/location>" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/Myeloid" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/Myeloid","<your/path/panel/location>" |
| panel_protein_path | path of the reference fasta for protein | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein/","/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v3/panels/protein/" |
| design_file | path to the design file used to create the proper yaml file for alignment | "/your/path/to/panel/file/location" | NA | NA |
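
A quick sanity check of these entries can catch typos before a run is submitted. The sketch below assumes the YAML file has already been loaded into a Python dict with `steps`, `sample` and `type_analysis` as top-level keys (an assumption about the file layout); `check_main_config` is a hypothetical name, and the allowed values are copied from the table above.

```python
# Allowed values copied from the "possible value" column of the table above
# (the spelling "Aligment" follows the table).
ALLOWED_STEPS = {"Aligment", "preprocessing", "SNV_CNV", "PROTEIN", "ALL", "phylogeny"}
ALLOWED_ANALYSES = {"dna", "dna+protein"}


def check_main_config(config):
    """Return a list of human-readable problems found in the config dict.

    Hypothetical helper; the real pipeline may validate differently.
    """
    problems = []
    unknown = [s for s in config.get("steps", []) if s not in ALLOWED_STEPS]
    if unknown:
        problems.append(f"unknown steps: {unknown}")
    if not config.get("sample"):
        problems.append("no sample given")
    if config.get("type_analysis") not in ALLOWED_ANALYSES:
        problems.append("type_analysis must be 'dna' or 'dna+protein'")
    return problems


example = {
    "steps": ["Aligment", "preprocessing", "SNV_CNV"],
    "sample": ["sample_1"],
    "type_analysis": "dna",
}
print(check_main_config(example))
```

An empty list means none of the three checks failed; it does not guarantee that the paths in the file exist.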
2. filtering: remove poor-quality variants & preprocess your data
| name | description | example | default value | possible value |
|---|---|---|---|---|
| filter_na | filter out missing values | True | False | True/False |
| filter_na_percent | remove variants whose percentage of missing values is greater than or equal to this threshold | 35 | 25 | any integer |
| predict_missing_value | predict missing values of variants with KNN | True | False | True/False |
| filtering_variants | set of filters used to remove poor-quality variants & cells (see 2.1) | NA | NA | NA |
| max_vaf_percent | filter out variants whose mean VAF is greater than or equal to this value | 95 | 95 | any integer |
| whitelist | variants that must be kept (even if their quality is poor) | ["chr20:33868702:T/C"] | NA | NA |
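
To make the `filter_na_percent` and `whitelist` rules concrete, here is a minimal pure-Python sketch. It assumes that a missing value is a no-call (NGT = 3, as in section 2.1) and that whitelisted variants bypass this filter; `filter_missing` and the variant ids other than the whitelist example are made up for illustration, and the pipeline's own implementation may differ.

```python
NO_CALL = 3  # genotype code for a missing call (NGT = 3)


def filter_missing(ngt_by_variant, filter_na_percent=25, whitelist=()):
    """Keep variants whose share of missing calls is below the threshold.

    ngt_by_variant maps a variant id to its per-cell NGT codes.
    Whitelisted variants are always kept (assumption based on the table).
    """
    kept = {}
    for variant, calls in ngt_by_variant.items():
        if not calls:
            continue  # nothing to evaluate for this variant
        missing_percent = 100 * sum(c == NO_CALL for c in calls) / len(calls)
        # Variants at or above the threshold are removed, unless whitelisted.
        if variant in whitelist or missing_percent < filter_na_percent:
            kept[variant] = calls
    return kept


ngt = {
    "chr20:33868702:T/C": [3, 3, 3, 1],  # 75% missing, but whitelisted
    "chr1:100:A/G": [0, 1, 3, 0],        # 25% missing: removed (>= threshold)
    "chr2:200:C/T": [0, 0, 1, 2],        # 0% missing: kept
}
kept = filter_missing(ngt, filter_na_percent=25, whitelist=["chr20:33868702:T/C"])
print(sorted(kept))
```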
2.1. filtering: filtering_variants
| name | description | example | default value | possible value |
|---|---|---|---|---|
| min_dp | The minimum depth (DP) for the call to be considered | 10 | 10 | any integer |
| min_gq | The minimum genotype quality (GQ) for the call to be considered | 30 | 30 | any integer |
| vaf_ref | All reference calls (NGT = 0) with VAF > vaf_ref are converted to no calls (NGT = 3) | 5 | 5 | any integer |
| vaf_het | All heterozygous calls (NGT = 1) with VAF < vaf_het are converted to no calls (NGT = 3) | 35 | 35 | any integer |
| vaf_hom | All homozygous calls (NGT = 2) with VAF < vaf_hom are converted to no calls (NGT = 3) | 95 | 95 | any integer |
| min_mut_prct_cells | The minimum percent of the total cells in which the variant should be mutated | 1 | 1 | any integer |
| min_prct_cells | The minimum percent of the total cells in which the variant should be present | 50 | 50 | any integer |
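
The rules in this table can be read as a per-call filter followed by a per-variant filter. The sketch below is an illustration of that reading, not the pipeline's code: the function names are hypothetical, "present" is assumed to mean "genotyped (not a no-call)", "mutated" is assumed to mean heterozygous or homozygous, and thresholds are assumed to be inclusive minimums.

```python
REF, HET, HOM, NO_CALL = 0, 1, 2, 3  # NGT codes used in the table above


def filter_call(ngt, dp, gq, vaf, min_dp=10, min_gq=30,
                vaf_ref=5, vaf_het=35, vaf_hom=95):
    """Return the NGT code of one call after the per-call rules."""
    if dp < min_dp or gq < min_gq:
        return NO_CALL  # call not considered: too shallow or low quality
    if ngt == REF and vaf > vaf_ref:
        return NO_CALL  # reference call with too much alternate signal
    if ngt == HET and vaf < vaf_het:
        return NO_CALL  # heterozygous call with too little alternate signal
    if ngt == HOM and vaf < vaf_hom:
        return NO_CALL  # homozygous call with too little alternate signal
    return ngt


def keep_variant(calls, min_prct_cells=50, min_mut_prct_cells=1):
    """Apply the two per-variant rules to a list of filtered NGT codes.

    Assumes "present" = not a no-call and "mutated" = HET or HOM.
    """
    n = len(calls)
    present = 100 * sum(c != NO_CALL for c in calls) / n
    mutated = 100 * sum(c in (HET, HOM) for c in calls) / n
    return present >= min_prct_cells and mutated >= min_mut_prct_cells


# One variant across four cells: (NGT, DP, GQ, VAF) per cell, made-up values.
raw = [(0, 25, 60, 1), (1, 30, 45, 48), (1, 12, 35, 20), (2, 8, 90, 99)]
filtered = [filter_call(*call) for call in raw]
print(filtered, keep_variant(filtered))
```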
3. SNV: snv_norm_dimred
| name | description | example | default value | possible value |
|---|---|---|---|---|
| method_dimred | dimension reduction method for the variant matrix (fa or pca) | pca | pca | fa,pca |
| max_dims | maximum dimensions for the dimension reduction | 6 | 6 | any integer |
| clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
3. CNV: cnv_norm_dimred
| name | description | example | default value | possible value |
|---|---|---|---|---|
| max_dims | maximum dimensions for the dimension reduction | 6 | 6 | any integer |
| clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
4. PROTEIN: prot_norm_dimred
| name | description | example | default value | possible value |
|---|---|---|---|---|
| normalization | normalization method to correct noise | DSB | CLR | CLR,DSB,asinh,NSP |
| clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
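
As an illustration of what the default `CLR` option refers to, here is a textbook centered log-ratio transform for one cell. It is a sketch under stated assumptions: the pseudocount of 1 and the function name `clr` are choices made for this example, and the pipeline's own CLR implementation may differ in its details.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's antibody counts.

    Each count is log-transformed (after adding a pseudocount so that zeros
    are defined) and the cell's mean log value is subtracted.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


cell = [120, 3, 0, 45]  # made-up read counts for four antibodies
normalized = clr(cell)
print([round(v, 3) for v in normalized])
```

By construction the transformed values of a cell sum to zero, so they express each antibody relative to the cell's overall signal rather than in absolute counts.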
5. ALL: all_norm_dimred
| name | description | example | default value | possible value |
|---|---|---|---|---|
| snv | SNV parameters to keep in order to combine multi-omics data (see 5.1) | NA | NA | NA |
| cnv | CNV parameters to keep in order to combine multi-omics data (see 5.2) | NA | NA | NA |
| prot | Protein parameters to keep in order to combine multi-omics data (see 5.3) | NA | NA | NA |
| variants_of_interest | list of variants of interest used to label the data | ["EIF6:20:33868702:T:C","TP53:17:7577559:G:T"] | NA | |
| chr_of_interest | list of chromosomes to focus on | ["5","17","7"] | NA | any list of chromosome numbers |
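
The `variants_of_interest` entries appear to follow a `GENE:CHROM:POS:REF:ALT` pattern, while the `whitelist` example in section 2 uses `chrCHROM:POS:REF/ALT`. The sketch below parses the first form and rewrites it in the second; the field order is inferred from the examples rather than documented, and both function names are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_of_interest(text):
    """Split 'GENE:CHROM:POS:REF:ALT' (field order inferred from the examples)."""
    gene, chrom, pos, ref, alt = text.split(":")
    return Variant(gene, chrom, int(pos), ref, alt)


def to_whitelist_id(variant):
    """Format a Variant like the whitelist example 'chr20:33868702:T/C'."""
    return f"chr{variant.chrom}:{variant.pos}:{variant.ref}/{variant.alt}"


variants = [
    parse_variant_of_interest(v)
    for v in ["EIF6:20:33868702:T:C", "TP53:17:7577559:G:T"]
]
for v in variants:
    print(v.gene, to_whitelist_id(v))
```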
5.1. all_norm_dimred - snv
| name | description | example | default value | possible value |
|---|---|---|---|---|
| method_dimred | reduction method to keep | pca | pca | fa,pca |
| dims | number of dimensions to keep | 6 | 6 | any integer |
| clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
| res | resolution for clustering to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float; graph-community and hdbscan take an integer |
| predict_missing_value | boolean to predict missing values using the KNN method | True | False | True/False |
5.2. all_norm_dimred - cnv
| name | description | example | default value | possible value |
|---|---|---|---|---|
| method_dimred | reduction method to keep | pca | pca | pca |
| dims | number of dimensions to keep | 6 | 6 | any integer |
| clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
| res | resolution for clustering to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float; graph-community and hdbscan take an integer |
5.3. all_norm_dimred - prot
| name | description | example | default value | possible value |
|---|---|---|---|---|
| normalization | normalization method to keep | CLR | CLR | CLR,DSB,asinh,NSP |
| method_dimred | reduction method to keep | pca | pca | fa,pca |
| dims | number of dimensions to keep | 6 | 6 | any integer |
| clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
| res | resolution for clustering to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float; graph-community and hdbscan take an integer |
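
The type expected for `res` in sections 5.1 to 5.3 depends on `clustering_method`, which is easy to get wrong. A minimal check might look like the sketch below; `res_is_valid` is a hypothetical helper, and accepting a whole number such as `1` where a float is expected is a leniency chosen for this example.

```python
FLOAT_RES_METHODS = {"leiden", "dbscan"}
INT_RES_METHODS = {"graph-community", "hdbscan"}


def res_is_valid(clustering_method, res):
    """Check that res has the numeric type the clustering method expects."""
    if isinstance(res, bool):
        return False  # bool is a subclass of int in Python; reject it
    if clustering_method in FLOAT_RES_METHODS:
        # Assumption: integers are tolerated where a float is expected.
        return isinstance(res, (int, float))
    if clustering_method in INT_RES_METHODS:
        return isinstance(res, int)
    raise ValueError(f"unknown clustering_method: {clustering_method!r}")


for method, res in [("leiden", 0.5), ("hdbscan", 0.5), ("graph-community", 10)]:
    print(method, res, res_is_valid(method, res))
```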
6. phylogeny
| name | description | example | default value | possible value |
|---|---|---|---|---|
| phylogeny_method | list of methods to use for reconstructing mutation events | ["COMPASS","infSCITE"] | NA | COMPASS,infSCITE,BiTSC2 |
6.1. phylogeny - COMPASS
| name | description | example | default value | possible value |
|---|---|---|---|---|
| bool_cnv | include CNV in the reconstruction of mutation events | 1 | 0 | 0,1 |
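
Since the table lists three methods but the note at the end of this page says infSCITE and BiTSC2 are not implemented yet, it can help to separate what will actually run from what is still pending. The helper below is a hypothetical sketch of such a check, not part of the pipeline.

```python
KNOWN_METHODS = {"COMPASS", "infSCITE", "BiTSC2"}
# Per the note at the end of this page, these two are not available yet.
NOT_YET_IMPLEMENTED = {"infSCITE", "BiTSC2"}


def split_phylogeny_methods(methods):
    """Return (runnable, pending) lists; raise on names outside the table."""
    unknown = [m for m in methods if m not in KNOWN_METHODS]
    if unknown:
        raise ValueError(f"unknown phylogeny_method: {unknown}")
    runnable = [m for m in methods if m not in NOT_YET_IMPLEMENTED]
    pending = [m for m in methods if m in NOT_YET_IMPLEMENTED]
    return runnable, pending


runnable, pending = split_phylogeny_methods(["COMPASS", "infSCITE"])
print("runnable:", runnable, "pending:", pending)
```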
What's coming next?
- infSCITE and BiTSC2 are not implemented yet, but they will be added soon
- The pipeline currently uses mosaic 2.4.1; it will be updated to 3.0.1
Questions
Don't hesitate to contact the bioinformatics platform at [email protected] or [email protected] if you have any questions or suggestions.