3. Reads Processing - labbces/SpliceScape GitHub Wiki

Filtered Reads Processing

Reads processing refers to three main steps: 1) Downloading reads; 2) Checking the integrity of downloaded reads; and 3) Cleaning downloaded sequences with BBDuk. All these processes are organized in a Nextflow script called reads_processing.nf. To execute this script, you will need the SRR identifiers for the reads and the specification of several variables detailed below. The final output of this process is a set of cleaned reads.

reads_processing.nf

The reads_processing.nf Nextflow script is designed to automate downloading and cleaning RNAseq reads. The script performs three main tasks:

Downloading RNAseq reads (fastq.gz files) for specified SRA accessions.
Checking the integrity of downloaded reads.
Cleaning reads with BBDuk.
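The integrity check in the second step can be illustrated in Python. This is a minimal sketch, not the pipeline's own code (the pipeline uses `md5sum`); the function names `md5_of_file` and `is_intact` are hypothetical and chosen only for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large fastq.gz files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_md5):
    """Compare the computed digest with the expected one, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A download would be accepted only if `is_intact(path, expected_md5)` returns `True`, where `expected_md5` is the checksum reported alongside the download URL.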

Requirements:

Category	Requirements
Software	Nextflow
Tools	`ffq` (for fetching metadata), `wget` (for downloading), `md5sum` (for checksum verification), `BBDuk` (for cleaning reads)
Input	SRA accession as a parameter (`params.reads`), reference file for BBDuk (`params.rref`), and various BBDuk parameters (`params.minlength`, `params.trimq`, `params.k`, `params.maxmem`)
Dependencies	Python3 (for `download_from_json.py` script)
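To show how the `ffq` metadata can feed the download and checksum steps, here is a minimal sketch of extracting download targets from `ffq --ftp` JSON output. The contents of `download_from_json.py` are not shown on this page, so this is not that script. The record layout (`url`, `filename`, `md5`, `filetype`) is assumed from the ffq README, and the accession, URLs, and checksums in the example are made up.

```python
import json

# Hypothetical `ffq --ftp` output; field names are assumed from the ffq README,
# and every value below is invented for illustration.
EXAMPLE_FFQ_JSON = """
[
  {"accession": "SRR0000001", "filename": "SRR0000001_1.fastq.gz",
   "filetype": "fastq", "filenumber": 1,
   "md5": "00000000000000000000000000000001",
   "urltype": "ftp", "url": "ftp://example.org/SRR0000001_1.fastq.gz"},
  {"accession": "SRR0000001", "filename": "SRR0000001_2.fastq.gz",
   "filetype": "fastq", "filenumber": 2,
   "md5": "00000000000000000000000000000002",
   "urltype": "ftp", "url": "ftp://example.org/SRR0000001_2.fastq.gz"},
  {"accession": "SRR0000001", "filename": "SRR0000001.bam",
   "filetype": "bam", "filenumber": 1,
   "md5": "00000000000000000000000000000003",
   "urltype": "ftp", "url": "ftp://example.org/SRR0000001.bam"}
]
"""


def fastq_targets(ffq_json_text):
    """Return (url, filename, md5) tuples for the fastq records only."""
    records = json.loads(ffq_json_text)
    return [
        (record["url"], record["filename"], record["md5"])
        for record in records
        if record.get("filetype") == "fastq"
    ]


if __name__ == "__main__":
    for url, filename, md5 in fastq_targets(EXAMPLE_FFQ_JSON):
        print(filename, md5, url)
```

Each tuple carries what the later steps need: the URL for `wget` and the expected checksum for `md5sum`.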

Arguments:

Argument	Function	Required	Default	Example
`--reads`	SRA accession to be processed. This is required for the workflow.	Yes	`"SRR28642269"`	`SRR28642269`
`--bbduk`	Path to the BBDuk executable.	Yes	`"/Storage/progs/bbmap_35.85/bbduk2.sh"`	`/path/to/bbduk.sh`
`--rref`	Path to the reference file for BBDuk.	Yes	`"/Storage/progs/Trimmomatic-0.38/adapters/NexteraPE-PE.fa"`	`/path/to/reference.fa`
`--minlength`	Minimum length of reads to keep after trimming.	No	`60`	`60`
`--trimq`	Quality trimming threshold.	No	`20`	`20`
`--k`	K-mer length to find contaminants.	No	`27`	`27`
`--maxmem`	Maximum memory allocation for BBDuk in GB.	No	`20`	`20`
`-log`	Nextflow option (single dash, placed before `run`) that sets the path where the .log file is saved.	No	`None`	`/absolute/path/to/directory/log/nextflow.log`
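The BBDuk arguments in the table above could translate into a command line roughly as follows. This is a sketch under assumptions, not the command used inside reads_processing.nf: the function name `build_bbduk_command` is hypothetical, the `rref=` flag is assumed from the parameter name and the `bbduk2.sh` default, and `qtrim=rl` is an assumed setting added so that `trimq` takes effect. Defaults are taken from the table.

```python
def build_bbduk_command(bbduk, rref, reads_in, reads_out,
                        minlength=60, trimq=20, k=27, maxmem=20):
    """Assemble a BBDuk argument list from the pipeline parameters."""
    return [
        bbduk,                     # --bbduk: path to the BBDuk executable
        f"-Xmx{maxmem}g",          # --maxmem: Java heap size in GB
        f"in={reads_in}",
        f"out={reads_out}",
        f"rref={rref}",            # --rref: reference file (assumed flag name)
        f"k={k}",                  # --k: k-mer length
        "qtrim=rl",                # assumption: trim both ends by quality
        f"trimq={trimq}",          # --trimq: quality threshold
        f"minlength={minlength}",  # --minlength: shortest read to keep
    ]


if __name__ == "__main__":
    command = build_bbduk_command(
        "/path/to/bbduk.sh", "/path/to/reference.fa",
        "SRR28642269.fastq.gz", "SRR28642269.clean.fastq.gz",
    )
    print(" ".join(command))
```

Returning a list rather than a single string makes it straightforward to pass the command to `subprocess.run` without shell quoting issues.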

Run example:

nextflow -log /absolute/path/to/directory/log/nextflow.log run reads_processing.nf --reads "SRR28642268"
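Because `--reads` takes a single SRA accession, processing several accessions means one invocation per accession. The sketch below only builds the command lines in the form shown above; the helper name `nextflow_commands` and the one-log-file-per-accession naming are assumptions made for this example.

```python
import shlex


def nextflow_commands(accessions, log_dir, script="reads_processing.nf"):
    """Build one `nextflow run` command (as an argument list) per accession."""
    commands = []
    for accession in accessions:
        commands.append([
            "nextflow",
            "-log", f"{log_dir}/{accession}.log",  # assumed per-accession log name
            "run", script,
            "--reads", accession,
        ])
    return commands


if __name__ == "__main__":
    for command in nextflow_commands(
        ["SRR28642268", "SRR28642269"], "/absolute/path/to/directory/log"
    ):
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(command))
```

Each printed line can be run as is, or the argument lists can be passed directly to `subprocess.run`.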

Errors

An (apparently temporary) error can occur with ffq. It was resolved by following the guidance in the issue "sample search gets 'NoneType' object has no attribute 'text'" (#73).

Error

[2024-08-06 11:45:03,206] INFO Parsing run SRR6188822
[2024-08-06 11:45:04,491] ERROR 404 Client Error: for url: https://www.ebi.ac.uk/ena/browser/api/xml/SRR6188822/
[2024-08-06 11:45:04,492] ERROR Provided accession is invalid
usage: ffq [-h] [-o OUT] [-l LEVEL] [--ftp] [--aws] [--gcp] [--ncbi] [--split] [--verbose] [--version] IDs [IDs ...]
ffq: error: For possible failure modes, please see https://github.com/pachterlab/ffq#failure-modes

Solution

pip install git+https://github.com/pachterlab/ffq@devel

Acknowledgment

This script is based on the work developed in nextflow_practice. Part of it is based on the nf-core bbmap_bbduk module.