Usage scripts - MeryemAk/dRNASeq GitHub Wiki

This usage guide provides instructions on how to use the dRNASeq analysis pipeline. It assumes that you have successfully installed all necessary dependencies as described in the Installation Guide.

Pipeline Overview

The dRNASeq pipeline is designed for the analysis of Nanopore RNA sequencing data, with the goal of characterizing both host and microbial gene expression. The pipeline includes the following steps:

Raw Data Processing – Quality control and trimming of raw Nanopore reads.
Alignment – Mapping reads to reference genomes (human, Candida albicans, and selected bacteria).
Counting – Gene-level quantification using featureCounts.
Taxonomic Classification – Assigning taxonomic labels to unmapped reads using Kraken2.
Differential Analysis – Identifying differentially expressed genes (not yet implemented).

Note: This usage guide explains how to run the scripts. For detailed parameters and logic, please refer to the code and any additional documentation provided in the repository.

Getting Started

1. Activate the Conda Environment

Before running the pipeline, activate the Conda environment:

```bash
conda activate dRNAseq
```

(To deactivate the environment later, use `conda deactivate`.)
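A quick pre-flight check can catch a missing environment before a long run fails. The sketch below relies on `CONDA_DEFAULT_ENV`, which Conda sets to the name of the active environment. The function names `check_env` and `check_tools` are hypothetical helpers, not part of the repository, and the tool list in the example comment is only assumed from the tools named in this guide.

```bash
#!/usr/bin/env bash
# Sketch only: check_env and check_tools are hypothetical helpers.

# Succeeds if the named Conda environment is the active one.
check_env() {
    local expected="${1:-dRNAseq}"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Conda environment '$expected' is not active (current: '${CONDA_DEFAULT_ENV:-none}')" >&2
        return 1
    fi
    echo "Conda environment '$expected' is active"
}

# Succeeds only if every tool given as an argument is found on PATH.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (assumed tool list):
#   check_env dRNAseq && check_tools minimap2 featureCounts kraken2
```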

2. Navigate to the Scripts Directory

Change into the directory where the pipeline scripts are located:

```bash
cd /path/to/your/cloned/dRNASeq/scripts      # Replace with the actual path to the scripts directory on your system
```

Running the Pipeline

Step 1. Merging

Merge all FASTQ files from a single barcode using the 2.merge.sh script:

./2.merge.sh <input_directory>     # Replace <input_directory> with the path to your raw FASTQ files.
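To illustrate what per-barcode merging involves, here is a minimal sketch. It assumes the input directory holds one subdirectory per barcode (for example `barcode01/`), each with gzipped FASTQ files. That layout and the function names `merge_barcode` and `merge_all` are assumptions for illustration; the actual script may work differently.

```bash
#!/usr/bin/env bash
# Sketch only: assumes <input_dir>/barcodeNN/*.fastq.gz (hypothetical layout).

# merge_barcode <barcode_dir> <output_dir>
# Writes <output_dir>/<barcode name>.fastq.gz
merge_barcode() {
    local barcode_dir="${1%/}"
    local out_dir="$2"
    local name
    name="$(basename "$barcode_dir")"
    mkdir -p "$out_dir"
    # Concatenated gzip streams form a valid gzip file,
    # so the inputs do not need to be decompressed first.
    cat "$barcode_dir"/*.fastq.gz > "$out_dir/$name.fastq.gz"
}

# merge_all <input_dir> <output_dir>
# Merges every barcode subdirectory found in <input_dir>.
merge_all() {
    local input_dir="${1%/}"
    local out_dir="$2"
    local dir
    for dir in "$input_dir"/barcode*/; do
        [ -d "$dir" ] || continue   # skip if the glob matched nothing
        merge_barcode "$dir" "$out_dir"
    done
}

# Example: merge_all /path/to/raw_fastq merged
```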

Step 2. Quality Control (QC)

Perform QC on the merged FASTQ files:

./3.qc.sh

Step 3. Trimming

Trim full-length Nanopore cDNA reads using Pychopper:

./4.trimming.sh

(Optional: Re-run 3.qc.sh after trimming to assess post-trimming quality. Back up the original QC reports before rerunning the script, as they will be overwritten!)
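Backing up the first round of QC reports could look like the sketch below. The report directory is passed as an argument because this guide does not name its location, and the `_pre_trimming` suffix and the function name `backup_qc_reports` are hypothetical choices.

```bash
#!/usr/bin/env bash
# Sketch only: directory names are placeholders, not the pipeline's real paths.

# backup_qc_reports <report_dir> [backup_dir]
# Copies <report_dir> to <report_dir>_pre_trimming unless a backup already exists.
backup_qc_reports() {
    local report_dir="${1%/}"
    local backup_dir="${2:-${report_dir}_pre_trimming}"
    if [ ! -d "$report_dir" ]; then
        echo "No report directory at $report_dir" >&2
        return 1
    fi
    if [ -e "$backup_dir" ]; then
        # Refuse to overwrite an earlier backup.
        echo "Backup already exists at $backup_dir" >&2
        return 1
    fi
    cp -r "$report_dir" "$backup_dir"
}

# Example: backup_qc_reports /path/to/qc_reports && ./3.qc.sh
```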

Step 4. Mapping

The mapping process follows a cascading approach. First, reads are aligned to the human reference genome using minimap2. Any reads that do not align are then remapped to the Candida albicans genome. The remaining unmapped reads are finally mapped to a reference set of bacterial genomes associated with bacterial vaginosis. To run the mapping step, execute:

./5.minimap2.py
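The cascade described above can be sketched with minimap2 and samtools as follows. This is not the code in 5.minimap2.py: the function names, output file names, and preset choices (`splice` for the eukaryotic genomes, `map-ont` for the bacterial set) are assumptions made for illustration, and samtools is assumed to be installed. At each stage, reads with the unmapped flag (SAM flag 4) are written to a FASTQ file that feeds the next stage.

```bash
#!/usr/bin/env bash
# Sketch only: illustrates the cascade, not the repository's implementation.

# map_and_split <reads.fastq> <reference.fa> <preset> <prefix>
# Produces <prefix>.bam (mapped reads) and <prefix>.unmapped.fastq.
map_and_split() {
    local reads="$1" ref="$2" preset="$3" prefix="$4"
    minimap2 -ax "$preset" "$ref" "$reads" | samtools sort -o "$prefix.all.bam" -
    # -F 4 keeps only mapped reads; -f 4 keeps only unmapped reads.
    samtools view -b -F 4 -o "$prefix.bam" "$prefix.all.bam"
    samtools fastq -f 4 "$prefix.all.bam" > "$prefix.unmapped.fastq"
}

# cascade_map <reads.fastq> <out_dir> <human.fa> <candida.fa> <bacteria.fa>
cascade_map() {
    local reads="$1" out="$2" human="$3" candida="$4" bacteria="$5"
    mkdir -p "$out"
    map_and_split "$reads" "$human" splice "$out/human"
    map_and_split "$out/human.unmapped.fastq" "$candida" splice "$out/candida"
    map_and_split "$out/candida.unmapped.fastq" "$bacteria" map-ont "$out/bacteria"
}

# Example (all paths are placeholders):
#   cascade_map sample.fastq mapping human.fa candida.fa bacteria.fa
```

After the last stage, `bacteria.unmapped.fastq` holds the reads that matched none of the references, which is the natural input for the optional Kraken2 step.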

Step 5. Counting

Gene-level read counts are generated with featureCounts. Run the 6.counting.sh script:

./6.counting.sh
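For orientation, a featureCounts call for long reads might look like the sketch below. The wrapper name `count_genes` and the file names are hypothetical, and the options used by 6.counting.sh may differ. `-L` switches featureCounts to long-read counting, which runs single-threaded, so no thread option is passed.

```bash
#!/usr/bin/env bash
# Sketch only: count_genes is a hypothetical wrapper.

# count_genes <annotation.gtf> <output.txt> <bam> [<bam> ...]
count_genes() {
    local annotation="$1" output="$2"
    shift 2
    # -L: long-read counting mode (Nanopore/PacBio)
    # -a: annotation file, -o: output count table
    featureCounts -L -a "$annotation" -o "$output" "$@"
}

# Example (placeholder paths):
#   count_genes human.gtf human_counts.txt mapping/human.bam
```

If an annotation uses feature types or attribute names other than the defaults (`exon` and `gene_id`), they can be set with featureCounts' `-t` and `-g` options.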

Step 6. Taxonomic Classification (Optional)

Classify the unmapped reads with Kraken2:

./7.kraken.sh
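As a rough sketch, a Kraken2 run on the leftover reads could look like the block below. The wrapper name `classify_unmapped`, the output file names, and the default of 4 threads are assumptions; the database path must point to a Kraken2 database you have built or downloaded.

```bash
#!/usr/bin/env bash
# Sketch only: classify_unmapped is a hypothetical wrapper.

# classify_unmapped <kraken2_db> <reads.fastq> <output_prefix>
# Writes <prefix>.kraken2.report.txt (per-taxon summary)
# and <prefix>.kraken2.out.txt (per-read assignments).
classify_unmapped() {
    local db="$1" reads="$2" prefix="$3"
    kraken2 --db "$db" --threads "${THREADS:-4}" \
        --report "$prefix.kraken2.report.txt" \
        --output "$prefix.kraken2.out.txt" \
        "$reads"
}

# Example (placeholder paths):
#   classify_unmapped /path/to/kraken2_db unmapped.fastq sample01
```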