Command line arguments for source data clustering - nicococo/scRNA GitHub Wiki

Setting up the Source Dataset

scRNA-source.sh

Input and output files:

| Command line arguments | Description |
| --- | --- |
| --fname | Source data (TSV file) |
| --fgene-ids | Source gene ids (TSV file) |
| --fout | Result files will use this prefix |
| --flabels | (optional) Source cluster labels (TSV file) |
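
As a rough illustration of how the file arguments fit together, the sketch below assembles a command line in Python without running it. The file names (`src_data.tsv`, `src_gene_ids.tsv`, `src_labels.tsv`) and the output prefix `results/src` are hypothetical placeholders, and the helper `build_source_command` is not part of the package; it only assumes the script is callable as `scRNA-source.sh`.

```python
import shlex


def build_source_command(fname, fgene_ids, fout, flabels=None):
    """Assemble the argument list for scRNA-source.sh (nothing is executed)."""
    cmd = [
        "scRNA-source.sh",
        "--fname", fname,          # source data (TSV)
        "--fgene-ids", fgene_ids,  # source gene ids (TSV)
        "--fout", fout,            # prefix for all result files
    ]
    if flabels is not None:
        # Optional: source cluster labels (TSV)
        cmd += ["--flabels", flabels]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_source_command(
    "src_data.tsv", "src_gene_ids.tsv", "results/src", flabels="src_labels.tsv"
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` if the script is installed; leaving out `flabels` simply omits the optional argument.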

Data pre-processing (SC3-inspired gene/cell filtering arguments):

| Command line arguments | Description |
| --- | --- |
| --min_expr_genes | (Cell filter) Minimum number of expressed genes (default 2000) |
| --non_zero_threshold | (Cell/gene filter) Threshold for zero expression per gene (default 1.0) |
| --perc_consensus_genes | (Gene filter) Filter genes that coincide across a percentage of cells, given as a fraction (default 0.98) |
| --no-cell-filter | Disable cell filter |
| --no-gene-filter | Disable gene filter |
| --no-transform | Disable log2(x+1) data transformation |
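
To make the effect of these arguments more concrete, here is a minimal NumPy sketch of SC3-style filtering followed by the log2(x+1) transformation. It is an illustration under stated assumptions, not necessarily the package's actual code: the matrix is assumed to be genes x cells, a gene is assumed to count as expressed when its value is at least the threshold, and the order (cell filter, gene filter, transform) is also an assumption. The function names are hypothetical.

```python
import numpy as np


def cell_filter(data, min_expr_genes=2000, non_zero_threshold=1.0):
    """Indices of cells (columns) with at least min_expr_genes expressed genes."""
    expressed_per_cell = np.sum(data >= non_zero_threshold, axis=0)
    return np.where(expressed_per_cell >= min_expr_genes)[0]


def gene_filter(data, perc_consensus_genes=0.98, non_zero_threshold=1.0):
    """Indices of genes (rows) that are neither almost never nor almost always expressed."""
    n_cells = data.shape[1]
    expressed_per_gene = np.sum(data >= non_zero_threshold, axis=1)
    lower = n_cells * (1.0 - perc_consensus_genes)
    upper = n_cells * perc_consensus_genes
    return np.where((expressed_per_gene >= lower) & (expressed_per_gene <= upper))[0]


def log2_transform(data):
    """The log2(x+1) transformation that --no-transform disables."""
    return np.log2(data + 1.0)


# Toy genes x cells matrix; min_expr_genes is lowered to 2 to suit its size.
data = np.array([
    [0, 0, 0, 0, 0],  # never expressed
    [5, 3, 7, 2, 9],  # expressed in every cell
    [4, 0, 6, 0, 0],
    [0, 3, 0, 0, 8],
], dtype=float)

cells = cell_filter(data, min_expr_genes=2)   # cell 3 has only one expressed gene
filtered = data[:, cells]
genes = gene_filter(filtered)                 # drops the all-zero and the ubiquitous gene
transformed = log2_transform(filtered[genes, :])
print(cells, genes, transformed.shape)
```

In this toy run, cells 0, 1, 2 and 4 and genes 2 and 3 survive, leaving a 2 x 4 matrix.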

Test settings: the software tests every value given in --cluster-range and stores the results for each value separately.

| Command line arguments | Description |
| --- | --- |
| --cluster-range | Comma-separated list of cluster counts to test (default 6,7,8) |
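
The following sketch shows one way a comma-separated value such as `6,7,8` could be parsed and looped over, keeping one result per cluster count. The helper names and the placeholder clustering function are hypothetical; how the tool names or stores its per-value result files is not specified here.

```python
def parse_cluster_range(value="6,7,8"):
    """Turn a comma-separated string such as "6,7,8" into a list of ints."""
    return [int(token) for token in value.split(",") if token.strip()]


def run_for_each_k(cluster_range, cluster_fn):
    """Call cluster_fn once per cluster count and keep the results apart, keyed by k."""
    return {k: cluster_fn(k) for k in parse_cluster_range(cluster_range)}


# Placeholder standing in for the actual clustering step.
results = run_for_each_k("6,7,8", lambda k: f"placeholder result for k={k}")
print(sorted(results))  # [6, 7, 8]
```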

NMF-related parameters:

| Command line arguments | Description |
| --- | --- |
| --nmf_alpha | Regularization strength (default 1.0) |
| --nmf_l1 | L1 regularization impact [0,1] (default 0.75) |
| --nmf_max_iter | Maximum number of iterations (default 4000) |
| --nmf_rel_err | Relative error threshold that must be reached for convergence (default 1e-3) |
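
One way to read these parameters is sketched below. The exact objective and stopping rule are not stated on this page, so this assumes the common elastic-net convention, in which `alpha` is the overall penalty strength and `l1` mixes L1 against L2 regularization, and it assumes that iteration stops once a relative change falls below the threshold or the iteration limit is hit. Both functions are hypothetical helpers, not the package's implementation.

```python
import numpy as np


def elastic_net_nmf_objective(X, W, H, alpha=1.0, l1=0.75):
    """0.5 * ||X - WH||_F^2 plus an elastic-net penalty on W and H (assumed form)."""
    fit = 0.5 * np.sum((X - W @ H) ** 2)
    l1_penalty = alpha * l1 * (np.abs(W).sum() + np.abs(H).sum())
    l2_penalty = 0.5 * alpha * (1.0 - l1) * (np.sum(W ** 2) + np.sum(H ** 2))
    return fit + l1_penalty + l2_penalty


def should_stop(iteration, rel_change, max_iter=4000, rel_err=1e-3):
    """Assumed stopping rule: iteration limit reached or relative change small enough."""
    return iteration >= max_iter or rel_change < rel_err


# Tiny rank-1 example where W @ H reproduces X exactly, so only the penalty remains.
X = np.ones((2, 2))
W = np.ones((2, 1))
H = np.ones((1, 2))
objective = elastic_net_nmf_objective(X, W, H)
print(objective)  # 3.5 = 0 (fit) + 3.0 (L1 part) + 0.5 (L2 part)
```

With `l1` close to 1 the penalty favors sparse factors; with `l1` close to 0 it acts mostly as an L2 shrinkage term.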

Additional arguments:

| Command line arguments | Description |
| --- | --- |
| --no-tsne | Skip the t-SNE plots, which can be quite time-consuming |