Configuration - LappalainenLab/RNApipeline GitHub Wiki

The Configuration File

All of the handlers in RNApipeline rely on the information stored in the config file. To edit the config file, open it in a text editor such as vim or Visual Studio Code, then follow the instructions in the file to fill in the relevant information. Ideally, a collaborator's output can be reproduced using only their config file, the raw samples, and the same version of RNApipeline.

Common Variables

These are parameters that are used by multiple handlers.

Variable	Function	Handlers
`RAW_SAMPLES`	The list of raw samples to be processed. This should be a plain text file with one file path per line	Quality_Assessment, Sequence_Trimming
`OUT_DIR`	The output directory for all results and intermediate files. Each handler writes to its own subdirectory, `${OUT_DIR}/Name_of_Handler`	All
`PROJECT`	A name for the current project. This is used to name the batch submissions	All
`EMAIL`	An email address used to receive notifications when a batch submission begins execution, finishes, or is aborted	All
`REF_GEN`	The full file path to the reference genome for your samples. All samples to be processed must use the same reference genome	Read_Mapping, SAM_Processing
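The common variables could be filled in as in the minimal sketch below. It assumes the config file uses plain Bash `NAME=value` assignments, as the `${OUT_DIR}` notation above suggests. Every path, the project name, and the email address are hypothetical placeholders. The `make_sample_list` helper is also hypothetical and not part of RNApipeline: it writes one absolute path per line for every FASTQ file in a directory, which matches the format `RAW_SAMPLES` expects.

```bash
# Common variables (hypothetical values; replace with your own).
RAW_SAMPLES="/scratch/myproject/raw_samples.txt"
OUT_DIR="/scratch/myproject/results"
PROJECT="myproject"
EMAIL="user@example.com"
REF_GEN="/scratch/references/genome.fa"

# Hypothetical helper (not part of RNApipeline), meant to be run once outside
# the config file: list every .fastq or .fastq.gz file directly inside
# directory $1 as an absolute path, sorted, one per line, into file $2.
make_sample_list() {
    local dir
    dir="$(cd "$1" && pwd)" || return 1
    find "$dir" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) \
        | sort > "$2"
}
```

For example, `make_sample_list /scratch/myproject/raw "${RAW_SAMPLES}"` would write the list that `RAW_SAMPLES` points to.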

Quality_Assessment

Variable	Function
`QA_QSUB`	QSub settings for batch submission

Sequence_Trimming

Variable	Function
`ST_QSUB`	QSub settings for batch submission
`FORWARD_NAMING`	Shared suffix for forward reads. Example: If your files are named `sample1_R1.fastq` and `sample2_R1.fastq`, then `FORWARD_NAMING=_R1.fastq`
`REVERSE_NAMING`	Shared suffix for reverse reads. Example: If your files are named `sample1_R2.fastq` and `sample2_R2.fastq`, then `REVERSE_NAMING=_R2.fastq`
`ADAPTERS`	A plain text or FASTA file with the adapter sequences. These sequences depend on the sequencing technology and platform; the most common adapters for each platform are available online
`PHRED64`	Use the phred64 scale instead of the phred33 quality scale

Note: If you have single-end samples, set FORWARD_NAMING and REVERSE_NAMING to values that do not match any of your sample names. If none of your samples match either suffix, Sequence_Trimming automatically treats the samples as single-end.
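The single-end rule in the note above amounts to a suffix check over the sample list. The following is a minimal Bash sketch of that rule, not the handler's actual code; the function name `detect_mode` is hypothetical, and it only reports whether any path ends in the `FORWARD_NAMING` or `REVERSE_NAMING` suffix.

```bash
# Hypothetical sketch (not RNApipeline's code) of the single-end rule:
# print "single" when no path in the list ends with either suffix,
# otherwise print "paired".
#   $1: sample list (one path per line)
#   $2: FORWARD_NAMING, e.g. _R1.fastq
#   $3: REVERSE_NAMING, e.g. _R2.fastq
detect_mode() {
    local list="$1" fwd="$2" rev="$3"
    local n_fwd=0 n_rev=0 path
    # "|| [[ -n $path ]]" also reads a final line without a trailing newline.
    while IFS= read -r path || [[ -n "$path" ]]; do
        [[ -z "$path" ]] && continue
        # Quoting the suffix makes it match literally at the end of the path.
        if [[ "$path" == *"$fwd" ]]; then n_fwd=$((n_fwd + 1)); fi
        if [[ "$path" == *"$rev" ]]; then n_rev=$((n_rev + 1)); fi
    done < "$list"
    if (( n_fwd == 0 && n_rev == 0 )); then
        echo "single"
    else
        echo "paired"
    fi
}
```

For example, `detect_mode "${RAW_SAMPLES}" _R1.fastq _R2.fastq` prints `single` for a list containing only `sample1.fastq` and `sample2.fastq`, because neither suffix matches.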

Read_Mapping

Variable	Function
`RM_QSUB`	QSub settings for batch submission
`TRIMMED_LIST`	A list of adapter-trimmed or quality-trimmed samples to map. This will be `${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt` if using Sequence_Trimming
`FORWARD_TRIMMED`	Shared suffix for forward reads. This will be `_forward_paired.fastq.gz` if using Sequence_Trimming
`REVERSE_TRIMMED`	Shared suffix for reverse reads. This will be `_reverse_paired.fastq.gz` if using Sequence_Trimming
`SINGLES_TRIMMED`	Shared suffix for single reads. This will be `_trimmed.fastq.gz` if using Sequence_Trimming
`REF_IND`	Directory containing the STAR index built from `REF_GEN`

Note: If running single-end samples, set FORWARD_TRIMMED and REVERSE_TRIMMED to values that do not match any of your sample names. If running paired-end samples, do the same for SINGLES_TRIMMED.
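When Read_Mapping consumes Sequence_Trimming output, the values from the table above can be copied directly into the config file, as in this sketch. It again assumes Bash assignment syntax; the `OUT_DIR`, `PROJECT`, `REF_GEN`, and `REF_IND` values are hypothetical. The `build_star_index` function is a hypothetical wrapper around STAR's `--runMode genomeGenerate`; RNApipeline may expect further index options (for example, splice-junction annotations) that are not covered here.

```bash
# Common variables (hypothetical values), defined earlier in the same file.
OUT_DIR="/scratch/myproject/results"
PROJECT="myproject"
REF_GEN="/scratch/references/genome.fa"

# Read_Mapping inputs when chaining from Sequence_Trimming (values from the table).
TRIMMED_LIST="${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt"
FORWARD_TRIMMED="_forward_paired.fastq.gz"
REVERSE_TRIMMED="_reverse_paired.fastq.gz"
SINGLES_TRIMMED="_trimmed.fastq.gz"
REF_IND="/scratch/references/star_index"   # hypothetical location

# Hypothetical helper (not part of RNApipeline): build a STAR index for
# REF_GEN into REF_IND. Requires STAR on PATH; $1 is the thread count.
build_star_index() {
    local threads="${1:-4}"
    mkdir -p "${REF_IND}" || return 1
    STAR --runThreadN "${threads}" \
         --runMode genomeGenerate \
         --genomeDir "${REF_IND}" \
         --genomeFastaFiles "${REF_GEN}"
}
```

Because the config file is evaluated in order, `OUT_DIR` and `PROJECT` must be assigned before `TRIMMED_LIST` expands them.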

SAM_Processing

Variable	Function
`SP_QSUB`	QSub settings for batch submission
`MAPPED_LIST`	A list of full file paths to the read-mapped samples. This will be `${OUT_DIR}/Read_Mapping/${PROJECT}_Mapped.txt` if using Read_Mapping
`INDEX_TYPE`	Index format for the final BAM files: either BAI or CSI
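A sketch of the SAM_Processing settings follows. The practical difference between the two index formats is that BAI cannot index positions beyond 2^29 - 1 (about 512 Mbp) on a single reference sequence, while CSI can, so CSI is the safer choice if any sequence in `REF_GEN` is that long. The exact spelling `INDEX_TYPE` accepts is assumed here, not taken from RNApipeline, and whether the handler calls samtools is not stated on this page; the `index_flag` helper is hypothetical and only maps the setting to the real `samtools index` options `-b` (BAI) and `-c` (CSI).

```bash
# Common variables (hypothetical values), defined earlier in the same file.
OUT_DIR="/scratch/myproject/results"
PROJECT="myproject"

# SAM_Processing inputs when chaining from Read_Mapping.
MAPPED_LIST="${OUT_DIR}/Read_Mapping/${PROJECT}_Mapped.txt"
INDEX_TYPE="CSI"   # assumed spelling; check the comments in your config file

# Hypothetical helper (not part of RNApipeline): translate INDEX_TYPE into
# the matching `samtools index` option (-b for BAI, -c for CSI).
index_flag() {
    case "$1" in
        BAI|bai) printf '%s\n' "-b" ;;
        CSI|csi) printf '%s\n' "-c" ;;
        *)       printf 'Unknown INDEX_TYPE: %s\n' "$1" >&2; return 1 ;;
    esac
}
```

With `INDEX_TYPE="CSI"`, running `samtools index "$(index_flag "${INDEX_TYPE}")" sample.bam` would write `sample.bam.csi`.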

Quantify_Summarize

Variable	Function
`QS_QSUB`	QSub settings for batch submission
`BAM_LIST`	A list of full file paths to the finished BAM files. This will be `${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt` if using SAM_Processing
`REF_ANN`	Annotations for the reference genome; must be in GTF or GFF3 format
`STRUCTURAL`	Annotations denoting regions of structural RNA; must be in GTF, GFF3, or BED format
`DSTAT_EXPR`	Reference expression table for DSTAT calculations
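The Quantify_Summarize inputs can be sketched the same way. The paths for `REF_ANN`, `STRUCTURAL`, and `DSTAT_EXPR` are hypothetical, and the expected layout of the `DSTAT_EXPR` table is not described on this page. The `has_extension` helper is a hypothetical pre-flight check, not part of RNApipeline: it only looks at file names, so it neither validates file contents nor accepts compressed files.

```bash
# Hypothetical helper (not part of RNApipeline): succeed if file name $1
# ends in "." followed by any of the remaining arguments (case-sensitive).
has_extension() {
    local file="$1" ext
    shift
    for ext in "$@"; do
        if [[ "$file" == *."$ext" ]]; then
            return 0
        fi
    done
    return 1
}

# Common variables (hypothetical values), defined earlier in the same file.
OUT_DIR="/scratch/myproject/results"
PROJECT="myproject"

# Quantify_Summarize inputs when chaining from SAM_Processing.
BAM_LIST="${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt"
REF_ANN="/scratch/references/annotation.gtf"
STRUCTURAL="/scratch/references/structural_rna.bed"
DSTAT_EXPR="/scratch/references/dstat_reference_expression.txt"

# Warn early if the annotation files do not carry an accepted extension.
has_extension "$REF_ANN" gtf gff3 || echo "REF_ANN should be GTF or GFF3" >&2
has_extension "$STRUCTURAL" gtf gff3 bed || echo "STRUCTURAL should be GTF, GFF3, or BED" >&2
```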