Module : saturation filtering - ComputationalSystemsBiology/Single-cell-RNA-seq GitHub Wiki

Module : saturation-filtering

This module removes low-quality cells from the dataset based on each cell's saturation level, i.e., whether the cell has been sequenced deeply enough to capture most of its detectable features.

  • Internal name : saturation-filtering

  • Available : local mode

  • Input Ports :

    • matrix : initial count matrix (tsv)
    • cells : initial cells metadata (tsv)
    • genes : genes metadata (tsv)
  • Output Ports :

    • completcellsoutput : initial cells metadata (tsv), completed with quality metrics
    • matrixoutput : filtered count matrix (tsv)
    • cellsoutput : filtered cells metadata (tsv)
  • Optional parameters :

| Parameter | Type | Description | Default value |
| --- | --- | --- | --- |
| detection_threshold | int | Minimum number of reads for a feature to be considered detected | 10 |
| expression_option | string | Type of features to consider (Endogenous, Nuclear or All) | Endogenous |
| saturation_threshold | float | Minimum saturation level to keep a cell (0.7 seems appropriate for most experiments) | 0.7 |
| fit_threshold | float | Minimum goodness of fit of the saturation model to keep a cell | 0.97 |
| rounding | int | Magnitude of total count rounding when resampling reads (10000 is appropriate for SMART-Seq data, 100 is better for Drop-Seq data) | 10000 |
| n_cores | int | Number of threads used for the saturation analysis | 2 |
| prop_mt | float | Maximum proportion of reads mapping to mitochondrial features | 0.1 |
| prop_sp | float | Maximum proportion of reads mapping to exogenous features | 0.5 |
| nb_filters | int | Minimum number of failed filters that triggers a cell's removal | 1 |
  • Configuration example
<step id="QC" skip="false">
	<module>saturation-filtering</module>
	<parameters>
		<parameter>
			<name>fit_threshold</name>
			<value>0.97</value>
		</parameter>
		<parameter>
			<name>rounding</name>
			<value>1000</value>
		</parameter>
		<parameter>
			<name>n_cores</name>
			<value>12</value>
		</parameter>
	</parameters>
</step>
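The optional parameters suggest that each cell is checked against several independent criteria, with nb_filters setting how many failures lead to removal. The Python sketch below illustrates one plausible way to combine them. The CellQC class and the function names are hypothetical, and the exact comparisons (strict or not) are assumptions rather than the module's actual code.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Hypothetical per-cell quality metrics (field names are illustrative)."""
    saturation: float  # saturation level in [0, 1]
    fit: float         # goodness of fit (R-squared) of the saturation model
    prop_mt: float     # proportion of reads on mitochondrial features
    prop_sp: float     # proportion of reads on exogenous features


def count_failures(cell, saturation_threshold=0.7, fit_threshold=0.97,
                   prop_mt=0.1, prop_sp=0.5):
    """Number of criteria the cell fails (defaults copied from the parameter table)."""
    failures = 0
    failures += cell.saturation < saturation_threshold  # not saturated enough
    failures += cell.fit < fit_threshold                # poor model fit (misfit)
    failures += cell.prop_mt > prop_mt                  # too many mitochondrial reads
    failures += cell.prop_sp > prop_sp                  # too many exogenous reads
    return failures


def keep_cell(cell, nb_filters=1, **thresholds):
    """Assumption: a cell is removed once it fails at least nb_filters criteria."""
    return count_failures(cell, **thresholds) < nb_filters


good = CellQC(saturation=0.85, fit=0.99, prop_mt=0.03, prop_sp=0.1)
bad = CellQC(saturation=0.55, fit=0.99, prop_mt=0.15, prop_sp=0.1)
print(keep_cell(good), keep_cell(bad), keep_cell(bad, nb_filters=3))  # True False True
```

With nb_filters = 1, a single failed criterion is enough to remove a cell; raising it makes the filtering more permissive.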

Interpreting output files

Saturation Plots

For each cell, the module produces a saturation plot. Briefly, starting from the cell's counts, the module resamples reads (i.e., draws random subsets without replacement) at increasing depths and counts the number of distinct features detected in each subset. It then fits a Michaelis-Menten model to these measured values. The measured values are plotted in blue and the fitted model in brown. Two main profiles can be observed:

Unsaturated

[Figure: saturation plot of an unsaturated cell]

Saturated

[Figure: saturation plot of a saturated cell]
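The resampling step described above can be sketched in Python as follows. The sketch assumes a cell is given as a list of integer read counts per feature, that subsample sizes are multiples of a fixed step (loosely analogous to the rounding parameter, whose exact role is not detailed here), and that a feature counts as detected once it has at least detection_threshold reads in the subsample. The function name is hypothetical, and the counts argument of random.sample requires Python 3.9 or later.

```python
import random
from collections import Counter


def saturation_curve(counts, step=10000, detection_threshold=10, seed=0):
    """Subsample a cell's reads without replacement at increasing depths.

    counts: integer read counts per feature for a single cell.
    Returns (depths, detected): subsample sizes, and the number of features
    with at least detection_threshold reads in each subsample.
    """
    rng = random.Random(seed)
    total = sum(counts)
    if total == 0:
        return [], []
    features = list(range(len(counts)))
    # Depths step, 2*step, ... below the library size, then the full library.
    depths = list(range(step, total, step)) + [total]
    detected = []
    for depth in depths:
        # Each sampled read is labelled with the index of the feature it maps to.
        reads = rng.sample(features, k=depth, counts=counts)
        per_feature = Counter(reads)
        detected.append(sum(1 for n in per_feature.values() if n >= detection_threshold))
    return depths, detected


depths, detected = saturation_curve([100, 50, 5, 0], step=20)
print(depths)  # [20, 40, 60, 80, 100, 120, 140, 155]
```

At full depth, detected[-1] is 2 here: the feature with only 5 reads never reaches the detection threshold of 10. Each depth is sampled independently, so the points are not strictly nested.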

The module excludes unsaturated cells and keeps saturated ones: for a saturated cell, most of the detectable features have already been captured, whereas an unsaturated cell would still reveal new features with additional reads, so part of its information is missing.
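A Michaelis-Menten curve, detected(depth) = Vmax * depth / (Km + depth), can be fitted to the resampling points and used to score saturation. The pure-Python sketch below fits it by least squares. For a fixed Km the model is linear in Vmax, which therefore has a closed form, so only Km is searched on a log-spaced grid. Defining the saturation level as the fitted value at full depth divided by Vmax, i.e. depth / (Km + depth), and comparing it to saturation_threshold is an assumption about how the module scores cells, and the function names are hypothetical.

```python
import math


def fit_michaelis_menten(depths, detected, n_grid=400):
    """Least-squares fit of detected = vmax * depth / (km + depth); returns (vmax, km).

    For a fixed km the optimal vmax has a closed form (linear least squares),
    so km is chosen by scanning a log-spaced grid around the observed depths.
    """
    lo = math.log(min(depths)) - 5
    hi = math.log(max(depths)) + 5
    best = None
    for i in range(n_grid + 1):
        km = math.exp(lo + (hi - lo) * i / n_grid)
        g = [d / (km + d) for d in depths]
        vmax = sum(y * gi for y, gi in zip(detected, g)) / sum(gi * gi for gi in g)
        sse = sum((y - vmax * gi) ** 2 for y, gi in zip(detected, g))
        if best is None or sse < best[0]:
            best = (sse, vmax, km)
    return best[1], best[2]


def saturation_level(total_reads, km):
    """Assumed score: fitted features at full depth divided by vmax."""
    return total_reads / (km + total_reads)


def is_saturated(depths, detected, saturation_threshold=0.7):
    """Assumes max(depths) is the cell's full library size."""
    _, km = fit_michaelis_menten(depths, detected)
    return saturation_level(max(depths), km) >= saturation_threshold
```

With this score, a cell whose library size is several times its fitted Km is considered saturated, while a cell still on the near-linear part of its curve is not.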

Evaluation of model fitting and misfit removal

To evaluate the model's goodness of fit, the module estimates the proportion of variance in the measured values that is explained by the model (i.e., the R-squared of a linear model taking the measured values as input and the values predicted by the Michaelis-Menten model as output). The results are shown in a goodness-of-fit plot:

[Figure: goodness-of-fit plot]
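The goodness-of-fit score described above can be computed as follows. This sketch assumes the linear model includes an intercept. In that case its R-squared equals the squared Pearson correlation between measured and predicted values, so it does not matter which of the two is taken as the input. The function name is hypothetical.

```python
def goodness_of_fit(measured, predicted):
    """R-squared of a simple linear regression (with intercept) between
    measured and predicted values, i.e. their squared Pearson correlation."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy * sxy / (sxx * syy)


print(goodness_of_fit([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(goodness_of_fit([1, 2, 3], [1, 3, 2]))        # 0.25
```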

Since the model is expected to fit most cells well, we expect the distribution of these goodness-of-fit values to be "nearly Gaussian":

[Figure: distribution of goodness-of-fit values after misfit removal]

Misfits are expected to appear as outliers with lower values (see the distribution plot below).

[Figure: distribution of goodness-of-fit values for all cells]

We observed that misfits are associated with low-quality cells. Removing them with a threshold on the goodness of fit (fit_threshold, typically 0.97) yields good filtering results.
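Applying fit_threshold amounts to a simple cut on the per-cell goodness of fit. In the hypothetical sketch below, the cell identifiers and R-squared values are illustrative inputs, not real data. Comparing the spread of the values before and after removal mirrors the two density plots above.

```python
import statistics


def remove_misfits(fit_by_cell, fit_threshold=0.97):
    """Keep cells whose goodness of fit reaches fit_threshold.

    fit_by_cell: mapping of cell identifier -> R-squared of its saturation fit.
    Returns (kept, removed): kept cells with their scores, and sorted removed ids.
    """
    kept = {cell: r2 for cell, r2 in fit_by_cell.items() if r2 >= fit_threshold}
    removed = sorted(cell for cell in fit_by_cell if cell not in kept)
    return kept, removed


# Illustrative values only: three well-fitted cells and one misfit.
fits = {"cell_1": 0.995, "cell_2": 0.991, "cell_3": 0.62, "cell_4": 0.988}
kept, removed = remove_misfits(fits)
print(removed)  # ['cell_3']
# Removing the low outlier tightens the distribution of scores.
print(statistics.stdev(list(fits.values())) > statistics.stdev(list(kept.values())))  # True
```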

The module also plots the goodness of fit as a function of the maximum (Vmax) of each cell's fitted Michaelis-Menten model, before and after outlier removal. This allows visual inspection of the process.

[Figures: goodness of fit versus Michaelis-Menten maximum, for all cells and after misfit removal]

Scatter Plots

After filtering, the module produces two scatter plots showing cells by number of reads (x-axis) and number of detected features (y-axis).

[Figure: features versus reads, all cells]

The first one shows all cells; the cells removed by the filtering are shown in red.

[Figure: features versus reads, cells remaining after filtering]

The second one shows only the cells remaining after filtering. At this stage, the cells should be distributed like a mixture of Gaussians, i.e., they can be enclosed in a small number of ellipses.
