Configuration - LappalainenLab/RNApipeline GitHub Wiki
The Configuration File
All of the handlers in sequence handling rely on the information stored in the config file.
To edit the config file, open it in a text editor of your choice, such as vim or Visual Studio Code.
Follow the instructions in the config file to insert all the relevant information.
Ideally, you should be able to reproduce a collaborator's output using only their config file, the raw samples, and the same version of RNApipeline.
Common Variables
These parameters are used by multiple handlers.

| Variable | Function | Handlers |
| --- | --- | --- |
| RAW_SAMPLES | The list of raw samples to be processed. This should be a plain text file with one file path per line | Quality_Assessment, Sequence_Trimming |
| OUT_DIR | The output directory for all results and intermediate files. Final directory structure will look like ${OUT_DIR}/Name_of_Handler | All |
| PROJECT | A name for the current project. This is used to name the batch submissions | All |
| EMAIL | An email address used to receive notifications when a batch submission begins execution, finishes, or is aborted | All |
| REF_GEN | The full file path to the reference genome for your samples. All samples to be processed must use the same reference genome | Read_Mapping, SAM_Processing |

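As a rough illustration of the common variables, the sketch below builds a sample list with one file path per line and then sets the shared values. It assumes the config file uses shell-style `NAME=value` assignments, as the examples on this page suggest. Every path, the project name, and the email address are hypothetical placeholders, and the temporary directory exists only so the snippet can run anywhere.

```shell
#!/bin/bash
# Sketch only: all paths and values below are hypothetical placeholders.

# Create a throwaway directory with two empty FASTQ files so the example runs anywhere
demo_dir="$(mktemp -d)"
touch "${demo_dir}/sample1_R1.fastq" "${demo_dir}/sample1_R2.fastq"

# Build the sample list: a plain text file with one file path per line
find "${demo_dir}" -name '*.fastq' | sort > "${demo_dir}/raw_samples.txt"

# Common variables as they might appear in the config file
RAW_SAMPLES="${demo_dir}/raw_samples.txt"
OUT_DIR="${demo_dir}/results"
PROJECT=demo_project
EMAIL=user@example.com
REF_GEN=/path/to/reference.fa   # hypothetical; must be a full file path

# Results for a given handler end up in ${OUT_DIR}/Name_of_Handler
echo "${OUT_DIR}/Quality_Assessment"
```

In a real project, point `find` at the directory holding your raw reads and replace the placeholder values with your own.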

| Variable | Function |
| --- | --- |
| QA_QSUB | QSub settings for batch submission |


| Variable | Function |
| --- | --- |
| ST_QSUB | QSub settings for batch submission |
| FORWARD_NAMING | Shared suffix for forward reads. Example: If your files are named sample1_R1.fastq and sample2_R1.fastq, then FORWARD_NAMING=_R1.fastq |
| REVERSE_NAMING | Shared suffix for reverse reads. Example: If your files are named sample1_R2.fastq and sample2_R2.fastq, then REVERSE_NAMING=_R2.fastq |
| ADAPTERS | A plain text or FASTA file with the adapter sequences. These sequences will depend on the technology and platform used for sequencing, but most common adapters for various platforms can be found online |
| PHRED64 | Use the phred64 scale instead of the phred33 quality scale |


Note: If you have single-end samples, leave FORWARD_NAMING and REVERSE_NAMING filled with values that do not match your samples. If none of your samples match the forward or reverse naming suffixes, Sequence_Trimming will automatically assume that the samples are single-end.
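The suffix rule in the note above can be pictured with a small sketch. It is not the pipeline's actual code: `classify_sample` is a hypothetical helper, and the file names are made up. It only illustrates how a file name that matches neither suffix falls through to the single-end case.

```shell
#!/bin/bash
# Sketch only: illustrates suffix matching, not RNApipeline's actual logic.
FORWARD_NAMING=_R1.fastq
REVERSE_NAMING=_R2.fastq

# classify_sample is a hypothetical helper introduced for this example
classify_sample() {
    case "$1" in
        *"${FORWARD_NAMING}") echo forward ;;   # name ends with the forward suffix
        *"${REVERSE_NAMING}") echo reverse ;;   # name ends with the reverse suffix
        *) echo single ;;                       # matches neither suffix
    esac
}

classify_sample sample1_R1.fastq
classify_sample sample1_R2.fastq
classify_sample sample3.fastq
```

Because the match is on the end of the file name, the suffix values should include the file extension, as in the `_R1.fastq` example in the table.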

| Variable | Function |
| --- | --- |
| RM_QSUB | QSub settings for batch submission |
| TRIMMED_LIST | A list of adapter-trimmed or quality-trimmed samples to read map. This will be ${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt if using Sequence_Trimming |
| FORWARD_TRIMMED | Shared suffix for forward reads. This will be _forward_paired.fastq.gz if using Sequence_Trimming |
| REVERSE_TRIMMED | Shared suffix for reverse reads. This will be _reverse_paired.fastq.gz if using Sequence_Trimming |
| SINGLES_TRIMMED | Shared suffix for single reads. This will be _trimmed.fastq.gz if using Sequence_Trimming |
| REF_IND | Directory with STAR reference index for REF_GEN |


Note: If running single-end samples, leave FORWARD_TRIMMED and REVERSE_TRIMMED filled with values that do not match your samples. If running paired-end samples, leave SINGLES_TRIMMED filled with values that do not match your samples.
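For single-end data trimmed with Sequence_Trimming, the settings described above might look like the following sketch. The `OUT_DIR` and `PROJECT` values and the two `_UNUSED_...` suffixes are hypothetical placeholders chosen so that they match no real file. The remaining values come from the table.

```shell
#!/bin/bash
# Sketch only: OUT_DIR, PROJECT, and the _UNUSED_ suffixes are hypothetical placeholders.
OUT_DIR=/path/to/output
PROJECT=demo_project

# List written by Sequence_Trimming, used as input for read mapping
TRIMMED_LIST="${OUT_DIR}/Sequence_Trimming/${PROJECT}_trimmed.txt"

# Single-end run: the paired suffixes are set to values that match no sample
FORWARD_TRIMMED=_UNUSED_forward.fastq.gz
REVERSE_TRIMMED=_UNUSED_reverse.fastq.gz
SINGLES_TRIMMED=_trimmed.fastq.gz

echo "${TRIMMED_LIST}"
```

For a paired-end run the roles flip: `FORWARD_TRIMMED` and `REVERSE_TRIMMED` take the `_forward_paired.fastq.gz` and `_reverse_paired.fastq.gz` suffixes, and `SINGLES_TRIMMED` gets the non-matching placeholder.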

| Variable | Function | Method |
| --- | --- | --- |
| SP_QSUB | QSub settings for batch submission | |
| MAPPED_LIST | A list of full file paths to the read-mapped samples. This will be ${OUT_DIR}/Read_Mapping/${PROJECT}_Mapped.txt if using Read_Mapping | |
| INDEX_TYPE | Generate either BAI or CSI indices for final BAM file | |

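A quick sanity check before submitting a job can catch a mistyped index type. The sketch below is an assumption-laden illustration: `check_index_type` is a hypothetical helper rather than part of RNApipeline, the uppercase spellings `BAI` and `CSI` are assumed to be the accepted values (check the comments in the config file itself), and the `OUT_DIR` and `PROJECT` values are placeholders.

```shell
#!/bin/bash
# Sketch only: OUT_DIR and PROJECT values are hypothetical placeholders.
OUT_DIR=/path/to/output
PROJECT=demo_project

# List written by Read_Mapping, used as input for SAM processing
MAPPED_LIST="${OUT_DIR}/Read_Mapping/${PROJECT}_Mapped.txt"

# Assumed spelling of the accepted values; confirm against the config file comments
INDEX_TYPE=BAI

# check_index_type is a hypothetical helper, not part of RNApipeline
check_index_type() {
    case "$1" in
        BAI|CSI) return 0 ;;
        *) echo "INDEX_TYPE must be BAI or CSI, got: $1" >&2; return 1 ;;
    esac
}

# Print the list path only if the index type is valid
check_index_type "${INDEX_TYPE}" && echo "${MAPPED_LIST}"
```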

| Variable | Function |
| --- | --- |
| QS_QSUB | QSub settings for batch submission |
| BAM_LIST | A list of full file paths to the finished BAM files. This will be ${OUT_DIR}/SAM_Processing/${PROJECT}_bams.txt if using SAM_Processing |
| REF_ANN | Annotations for reference genome; must be in GTF or GFF3 format |
| STRUCTURAL | Annotations denoting regions of structural RNA; must be in GTF, GFF3, or BED format |
| DSTAT_EXPR | Reference expression table for DSTAT calculations |