Configuration - LappalainenLab/RNApipeline GitHub Wiki

The Configuration File

All of the handlers in RNApipeline rely on the information stored in the config file. To edit the config file, open it in a text editor such as vim or Visual Studio Code and follow the instructions inside it to fill in the relevant information. Ideally, a collaborator's output can be reproduced using only their config file, their raw samples, and the same version of RNApipeline.

Common Variables

These parameters are used by more than one handler.

| Variable | Function | Handlers |
| --- | --- | --- |
| RAW_SAMPLES | The list of raw samples to be processed. This should be a plain text file with one file path per line. | Quality_Assessment, Sequence_Trimming |
| OUT_DIR | The output directory for all results and intermediate files. The final directory structure will look like ${OUT_DIR}/Name_of_Handler. | All |
| PROJECT | A name for the current project. This is used to name the batch submissions. | All |
| EMAIL | An email address that receives notifications when a batch submission begins execution, finishes, or is aborted. | All |
| REF_GEN | The full file path to the reference genome for your samples. All samples to be processed must use the same reference genome. | Read_Mapping, SAM_Processing |
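
As a rough sketch, the common variables might be filled in as shown below. This assumes the config file uses shell-style `VARIABLE=value` assignments, which the `${OUT_DIR}` notation suggests but this page does not state outright. Every path, the project name, and the email address are hypothetical placeholders, and `make_sample_list` is an illustrative helper, not part of RNApipeline.

```shell
# Illustrative helper (not part of RNApipeline).
# make_sample_list DIR OUT: write the path of every FASTQ file under DIR
# to OUT, one per line. Pass an absolute DIR to get absolute paths.
make_sample_list() {
    find "$1" -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) | sort > "$2"
}

# Hypothetical values; substitute your own.
RAW_SAMPLES=/path/to/project/raw_samples.txt
OUT_DIR=/path/to/project/output
PROJECT=my_project
EMAIL=user@example.org
REF_GEN=/path/to/reference/genome.fa
```

With these placeholders, `make_sample_list /path/to/raw_reads "$RAW_SAMPLES"` would produce the one-path-per-line file that RAW_SAMPLES expects.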

Quality_Assessment

| Variable | Function |
| --- | --- |
| QA_QSUB | QSub settings for batch submission |

Sequence_Trimming

| Variable | Function |
| --- | --- |
| ST_QSUB | QSub settings for batch submission |
| FORWARD_NAMING | Shared suffix for forward reads. Example: if your files are named sample1_R1.fastq and sample2_R1.fastq, then FORWARD_NAMING=_R1.fastq |
| REVERSE_NAMING | Shared suffix for reverse reads. Example: if your files are named sample1_R2.fastq and sample2_R2.fastq, then REVERSE_NAMING=_R2.fastq |
| ADAPTERS | A plain text or FASTA file with the adapter sequences. These sequences depend on the technology and platform used for sequencing, but the most common adapters for various platforms can be found online. |
| PHRED64 | Use the phred64 quality scale instead of the phred33 quality scale |

Note: If you have single-end samples, set FORWARD_NAMING and REVERSE_NAMING to values that do not match any of your sample names. If none of your samples match the forward or reverse naming suffixes, Sequence_Trimming automatically assumes that the samples are single-end.
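
The suffix matching described in the note can be checked before submitting a job. The sketch below is an illustration only, not the pipeline's own code: `count_suffix_matches` is a hypothetical helper that counts how many paths in a sample list end with a given suffix. If both counts come back as zero, the samples would be treated as single-end under the rule above.

```shell
# Illustration only; not the pipeline's actual implementation.
# count_suffix_matches LIST SUFFIX: print how many paths in LIST end with SUFFIX.
# The body runs in a subshell, so its variables do not leak into the caller.
count_suffix_matches() (
    count=0
    while IFS= read -r path || [ -n "$path" ]; do
        case "$path" in
            *"$2") count=$((count + 1)) ;;
        esac
    done < "$1"
    printf '%s\n' "$count"
)

# Suffixes taken from the examples in the table above.
FORWARD_NAMING=_R1.fastq
REVERSE_NAMING=_R2.fastq

# Usage, with RAW_SAMPLES pointing at your sample list:
# count_suffix_matches "$RAW_SAMPLES" "$FORWARD_NAMING"
# count_suffix_matches "$RAW_SAMPLES" "$REVERSE_NAMING"
```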

Read_Mapping

| Variable | Function |
| --- | --- |
| RM_QSUB | QSub settings for batch submission |
| TRIMMED_LIST | A list of adapter-trimmed or quality-trimmed samples to map. This will be ${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt if using Sequence_Trimming |
| FORWARD_TRIMMED | Shared suffix for forward reads. This will be _forward_paired.fastq.gz if using Sequence_Trimming |
| REVERSE_TRIMMED | Shared suffix for reverse reads. This will be _reverse_paired.fastq.gz if using Sequence_Trimming |
| SINGLES_TRIMMED | Shared suffix for single reads. This will be _trimmed.fastq.gz if using Sequence_Trimming |
| REF_IND | Directory with the STAR reference index for REF_GEN |

Note: If running single-end samples, set FORWARD_TRIMMED and REVERSE_TRIMMED to values that do not match any of your sample names. If running paired-end samples, do the same for SINGLES_TRIMMED.
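
For a paired-end run that continues from Sequence_Trimming, the Read_Mapping variables might look like the sketch below. It assumes shell-style assignments in which `${OUT_DIR}` and `${PROJECT}` are expanded; the paths, the project name, and the `_NOT_USED` placeholder are hypothetical, and any value that matches no sample name should serve the same purpose.

```shell
# Hypothetical values for the common variables.
OUT_DIR=/path/to/project/output
PROJECT=my_project

# Defaults when the input comes from Sequence_Trimming (see the table above).
TRIMMED_LIST=${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt
FORWARD_TRIMMED=_forward_paired.fastq.gz
REVERSE_TRIMMED=_reverse_paired.fastq.gz

# Paired-end run: a placeholder chosen so that it matches no sample name.
SINGLES_TRIMMED=_NOT_USED

# Placeholder directory holding the STAR index built from REF_GEN.
REF_IND=/path/to/reference/star_index
```

For a single-end run the roles reverse: SINGLES_TRIMMED would carry the real suffix (`_trimmed.fastq.gz` after Sequence_Trimming) and the forward and reverse variables would hold the non-matching placeholders.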

SAM_Processing

| Variable | Function | Method |
| --- | --- | --- |
| SP_QSUB | QSub settings for batch submission | |
| MAPPED_LIST | A list of full file paths to the read-mapped samples. This will be ${OUT_DIR}/Read_Mapping/${PROJECT}_Mapped.txt if using Read_Mapping | |
| INDEX_TYPE | Generate either BAI or CSI indices for the final BAM file | |
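
To make the INDEX_TYPE choice concrete, the sketch below maps the two index types onto the corresponding `samtools index` options (`-b` for BAI, `-c` for CSI). This page does not say which tool builds the indices or what values INDEX_TYPE accepts, so the use of samtools, the literal values `BAI` and `CSI`, and the `index_flag` helper are all assumptions made for illustration.

```shell
# Illustration only. Assumes INDEX_TYPE is set to the literal string BAI or CSI.
# index_flag TYPE: print the samtools index option that produces that index type.
index_flag() {
    case "$1" in
        BAI) printf '%s\n' '-b' ;;
        CSI) printf '%s\n' '-c' ;;
        *)   printf 'unknown INDEX_TYPE: %s\n' "$1" >&2; return 1 ;;
    esac
}

INDEX_TYPE=CSI

# Usage (requires samtools; sample.bam is a placeholder):
# samtools index "$(index_flag "$INDEX_TYPE")" sample.bam
```

BAI indices cannot address positions beyond 2^29 bases (about 537 Mbp) on a single reference sequence, so CSI is the usual choice for genomes with very long chromosomes.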

Quantify_Summarize

| Variable | Function |
| --- | --- |
| QS_QSUB | QSub settings for batch submission |
| BAM_LIST | A list of full file paths to the finished BAM files. This will be ${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt if using SAM_Processing |
| REF_ANN | Annotations for the reference genome; must be in GTF or GFF3 format |
| STRUCTURAL | Annotations denoting regions of structural RNA; must be in GTF, GFF3, or BED format |
| DSTAT_EXPR | Reference expression table for DSTAT calculations |
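
Because each handler reads a list file written by the previous step, it can help to confirm that every listed path exists before submitting a job. The following is a minimal sketch of such a check; `missing_from_list` is a hypothetical helper, not part of RNApipeline, and the values of `OUT_DIR` and `PROJECT` are placeholders.

```shell
# Illustrative helper (not part of RNApipeline).
# missing_from_list LIST: print each path in LIST that does not exist.
# Exit status is 1 if any path is missing, 0 otherwise. Blank lines are skipped.
missing_from_list() (
    status=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue
        if [ ! -e "$path" ]; then
            printf '%s\n' "$path"
            status=1
        fi
    done < "$1"
    exit "$status"
)

# Hypothetical values; BAM_LIST follows the default location given in the table above.
OUT_DIR=/path/to/project/output
PROJECT=my_project
BAM_LIST=${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt

# Usage:
# missing_from_list "$BAM_LIST" || echo "some BAM files are missing" >&2
```

The same check applies unchanged to RAW_SAMPLES, TRIMMED_LIST, and MAPPED_LIST, since all of them are plain text files with one path per line.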