Calculating the RNA expression for each sample - labbces/sugarcane_RNAome GitHub Wiki

RNA quantification against the pan-transcriptome reference

I developed a Snakemake pipeline that automates generating an RNA-Seq expression matrix from raw RNA-Seq datasets. The pipeline was executed with this bash script.

[!NOTE] This pipeline was executed to generate quantification files (quant.sf) for samples from 54 contrasting genotypes (as present in the three selected papers mentioned previously). The reference for quantification was the pan-transcriptome of sugarcane clustered with CD-HIT using -c 1: CD-HIT_48_genotypes_transcriptome_salmonInx.
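To show how per-sample quant.sf files can be combined into an expression matrix, here is a minimal Python sketch. It is not taken from the pipeline itself. It relies only on the standard Salmon quant.sf layout (tab-separated, with `Name` and `TPM` columns). The function names `read_tpm` and `build_matrix` and the sample labels are hypothetical.

```python
import csv


def read_tpm(quant_path):
    """Return {transcript_name: TPM} from one Salmon quant.sf file."""
    with open(quant_path, newline="") as handle:
        # quant.sf is tab-separated with a header row:
        # Name, Length, EffectiveLength, TPM, NumReads
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def build_matrix(quant_files):
    """Combine per-sample quant.sf files into a TPM matrix.

    quant_files maps a sample label (hypothetical, e.g. an accession)
    to the path of its quant.sf file.
    Returns (samples, matrix), where matrix maps each transcript name
    to a list of TPM values ordered like `samples`.
    """
    samples = sorted(quant_files)
    per_sample = {s: read_tpm(quant_files[s]) for s in samples}
    # All samples quantified against the same index should list the same
    # transcripts; taking the union is only a safeguard.
    transcripts = sorted(set().union(*(d.keys() for d in per_sample.values())))
    matrix = {
        t: [per_sample[s].get(t, 0.0) for s in samples]
        for t in transcripts
    }
    return samples, matrix
```

The sketch keeps TPM values only. Swapping `row["TPM"]` for `row["NumReads"]` would give an estimated-count matrix instead.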

The following directed acyclic graph (DAG) represents the workflow to calculate the RNA expression matrix for each genotype (e.g. SP80-3280).

[Figure: runSalmon_SnakefileDAG]

[!NOTE] Please refer to the Snakefile and the config.yaml configuration file for complete details on the implementation and the parameters used in each rule. To run the pipeline, make sure you have a properly configured configuration file. To use my bash script to execute the pipeline, config.yaml must be present in your directory, along with the $genotype_samples.csv file (e.g. Q200_samples.csv). This file contains the SRA/ERR accession identifiers for the raw data associated with each genotype. Notably, the Snakefile identifies the genotype's name automatically by extracting it from the $genotype_samples.csv file.
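The genotype lookup described above can be sketched in Python, the language Snakefiles build on. This is an illustration under assumptions, not the repository's code. It assumes the genotype name is the part of the file name before `_samples.csv`, and that the CSV holds one accession per row in its first column with no header. The actual Snakefile may differ on both points. The helper names are hypothetical.

```python
import csv
from pathlib import Path

# Naming pattern taken from the text, e.g. Q200_samples.csv
SUFFIX = "_samples.csv"


def genotype_from_filename(csv_path):
    """Return the genotype name encoded in a <genotype>_samples.csv path."""
    name = Path(csv_path).name
    if not name.endswith(SUFFIX) or name == SUFFIX:
        raise ValueError(f"expected <genotype>{SUFFIX}, got {name!r}")
    return name[: -len(SUFFIX)]


def read_accessions(csv_path):
    """Return the accession identifiers listed in the samples file.

    Assumption: one accession per row, first column, no header row.
    Blank rows are skipped.
    """
    with open(csv_path, newline="") as handle:
        return [
            row[0].strip()
            for row in csv.reader(handle)
            if row and row[0].strip()
        ]
```

For example, `genotype_from_filename("Q200_samples.csv")` returns `"Q200"`, and a name such as `SP80-3280_samples.csv` yields `"SP80-3280"` because only the fixed suffix is removed.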