Calculating the RNA expression for each sample - labbces/sugarcane_RNAome GitHub Wiki
RNA quantification against the pan-transcriptome reference
I developed a Snakemake pipeline to automate the process of generating an RNA-Seq expression matrix from raw RNA-Seq datasets. The Snakemake pipeline was executed with this bash script.
[!NOTE] This pipeline was executed to generate quantification files (`quant.sf`) for samples from 54 contrasting genotypes (as present in the three selected papers mentioned previously). The reference for quantification was the sugarcane pan-transcriptome clustered with CD-HIT using `-c 1`: `CD-HIT_48_genotypes_transcriptome_salmonInx`.
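To illustrate how per-sample `quant.sf` files can be combined into an expression matrix, here is a minimal Python sketch. It is not the code used in the pipeline: the function names `read_tpm` and `build_matrix` are hypothetical, and the choice of TPM as the expression value and of 0.0 for transcripts missing from a sample are assumptions made for the example. It relies only on the standard Salmon `quant.sf` layout (tab-separated, with `Name` and `TPM` columns).

```python
import csv


def read_tpm(quant_path):
    """Return {transcript_name: TPM} parsed from one Salmon quant.sf file."""
    with open(quant_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def build_matrix(quant_files):
    """Combine per-sample TPM values into a transcript-by-sample matrix.

    quant_files maps a sample identifier to the path of its quant.sf file.
    Returns (samples, matrix), where matrix maps each transcript name to a
    list of TPM values in the same order as samples.
    """
    samples = sorted(quant_files)
    per_sample = {s: read_tpm(quant_files[s]) for s in samples}
    # Union of transcript names across all samples.
    transcripts = sorted(set().union(*per_sample.values()))
    # Assumption: a transcript absent from a sample is reported as 0.0.
    matrix = {
        t: [per_sample[s].get(t, 0.0) for s in samples] for t in transcripts
    }
    return samples, matrix
```

Calling `build_matrix({"sampleA": "sampleA/quant.sf", "sampleB": "sampleB/quant.sf"})` (hypothetical paths) would return the sorted sample names and one row of TPM values per transcript.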
The following directed acyclic graph (DAG) represents the workflow to calculate the RNA expression matrix for each genotype (e.g. SP80-3280).
[!NOTE] Please refer to the Snakefile and the `config.yaml` configuration file for complete details on the implementation and the parameters used in each rule. To run the pipeline, make sure you have a properly configured configuration file. To use my bash script to execute the pipeline, you must ensure that `config.yaml` is present in your directory, along with the `$genotype_samples.csv` file (e.g. `Q200_samples.csv`). The latter contains the SRA/ERR accession identifiers for the raw data associated with each genotype. Notably, the Snakefile identifies the genotype's name by extracting it from the `$genotype_samples.csv` file.
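As a rough illustration of the naming convention described above, the following Python sketch derives a genotype name from a `$genotype_samples.csv` path and reads the accession identifiers it lists. It is not taken from the actual Snakefile. The helper names `genotype_from_csv` and `read_accessions` are hypothetical, the genotype is assumed to be everything before the `_samples.csv` suffix of the file name, and the file is assumed to hold one accession per line in its first comma-separated field, without a header.

```python
from pathlib import Path

# Assumed naming convention, inferred from the example Q200_samples.csv.
SUFFIX = "_samples.csv"


def genotype_from_csv(csv_path):
    """Return the genotype name encoded in a <genotype>_samples.csv file name."""
    name = Path(csv_path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"{name!r} does not end with {SUFFIX!r}")
    return name[: -len(SUFFIX)]


def read_accessions(csv_path):
    """Return the accession identifiers listed in the samples file.

    Assumption: one accession per line, in the first comma-separated field,
    with no header row. Blank lines are skipped.
    """
    accessions = []
    with open(csv_path) as handle:
        for line in handle:
            field = line.split(",")[0].strip()
            if field:
                accessions.append(field)
    return accessions
```

For example, `genotype_from_csv("Q200_samples.csv")` returns `"Q200"`, and `genotype_from_csv("SP80-3280_samples.csv")` returns `"SP80-3280"`.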