3. Reads Processing - labbces/SpliceScape GitHub Wiki

Filtered Reads Processing

Reads processing consists of three main steps: 1) downloading reads; 2) checking the integrity of the downloaded reads; and 3) cleaning the downloaded sequences with BBDuk. All three steps are implemented in a single Nextflow script, reads_processing.nf. To run it, you need the SRR identifier (SRA run accession) of the reads to process and values for several parameters, described below. The final output is a set of cleaned reads.

reads_processing.nf

The reads_processing.nf Nextflow script automates downloading and cleaning RNA-seq reads. It performs three main tasks:

  1. Downloading RNA-seq reads (fastq.gz files) for the specified SRA accession.
  2. Checking the integrity of downloaded reads.
  3. Cleaning reads with BBDuk.
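The download step relies on ffq metadata and the download_from_json.py helper, whose code is not reproduced on this page. As a rough illustration only, the sketch below assumes the metadata was produced with `ffq --ftp`, which we expect to yield a JSON list of file records carrying `url` and `md5` fields (and usually `filename`). The function names are hypothetical and this is not the project's download_from_json.py. It turns each record into a wget command and into a line in the `<hash>  <filename>` format that `md5sum -c` reads.

```python
import json
import shlex
from pathlib import PurePosixPath
from urllib.parse import urlparse


def parse_ffq_records(json_text):
    """Return (url, md5, filename) tuples from ffq-style JSON.

    ASSUMPTION: the JSON is a list of file records with "url" and "md5" keys,
    as we expect from `ffq --ftp`. If "filename" is absent, it is taken from
    the last component of the URL path.
    """
    files = []
    for record in json.loads(json_text):
        url = record["url"]
        name = record.get("filename") or PurePosixPath(urlparse(url).path).name
        files.append((url, record["md5"], name))
    return files


def wget_commands(files):
    # One wget call per file; -O sets the local output file name.
    return [["wget", "-O", name, url] for url, _, name in files]


def md5sum_check_lines(files):
    # md5sum -c expects "<hex digest><two spaces><filename>" on each line.
    return "".join(f"{md5}  {name}\n" for _, md5, name in files)


if __name__ == "__main__":
    # Made-up placeholder record, not real SRA metadata.
    example = json.dumps([{
        "url": "ftp://example.org/SRR0000001_1.fastq.gz",
        "md5": "d41d8cd98f00b204e9800998ecf8427e",
    }])
    files = parse_ffq_records(example)
    for cmd in wget_commands(files):
        print(shlex.join(cmd))
    print(md5sum_check_lines(files), end="")
```

Writing the check lines to a file (for example checksums.md5) and running `md5sum -c checksums.md5` in the download directory reports OK or FAILED for each file.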
  • Requirements:
| Category | Requirements |
|---|---|
| Software | Nextflow |
| Tools | ffq (for fetching metadata), wget (for downloading), md5sum (for checksum verification), BBDuk (for cleaning reads) |
| Input | SRA accession (params.reads), path to the BBDuk executable (params.bbduk), reference file for BBDuk (params.rref), and BBDuk settings (params.minlength, params.trimq, params.k, params.maxmem) |
| Dependencies | Python 3 (for the download_from_json.py script) |
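The integrity check is listed as using md5sum. For readers who want the same comparison inside Python (for instance in a helper script), here is a minimal sketch built on the standard hashlib module. It streams the file in chunks, so a large fastq.gz file is never loaded into memory at once. The function names and the verify_md5.py script name are illustrative and not part of the pipeline.

```python
import hashlib
import sys
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """True when the file's digest matches the expected one (case-insensitive)."""
    return file_md5(path) == expected.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python verify_md5.py <file> <expected_md5>
    matched = verify_md5(Path(sys.argv[1]), sys.argv[2])
    print(f"{sys.argv[1]}: {'OK' if matched else 'FAILED'}")
    sys.exit(0 if matched else 1)
```

The expected digest would come from the accession's metadata (for example the md5 field that ffq reports).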
  • Arguments:
| Argument | Function | Required | Default | Example |
|---|---|---|---|---|
| --reads | SRA accession to be processed. | Yes | "SRR28642269" | SRR28642269 |
| --bbduk | Path to the BBDuk executable. | Yes | "/Storage/progs/bbmap_35.85/bbduk2.sh" | /path/to/bbduk.sh |
| --rref | Path to the reference file for BBDuk. | Yes | "/Storage/progs/Trimmomatic-0.38/adapters/NexteraPE-PE.fa" | /path/to/reference.fa |
| --minlength | Minimum read length to keep after trimming. | No | 60 | 60 |
| --trimq | Quality-trimming threshold (Phred score). | No | 20 | 20 |
| --k | K-mer length used to detect contaminants. | No | 27 | 27 |
| --maxmem | Maximum memory allocated to BBDuk, in GB. | No | 20 | 20 |
| -log | Path of the Nextflow log file. This is a Nextflow option (single dash), so it goes before run. | No | .nextflow.log in the launch directory | /absolute/path/to/directory/log/nextflow.log |
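To make the BBDuk parameters above more concrete, here is a hedged sketch of how they could map onto a paired-end bbduk.sh call. The mapping is our assumption, not taken from reads_processing.nf: --maxmem becomes the Java heap flag -Xmx<N>g, --trimq is paired with qtrim=rl (BBDuk applies trimq only when quality trimming is enabled), and no ktrim= option is set, so BBDuk would discard reads containing reference k-mers rather than trimming them. The file names and the function name are hypothetical.

```python
import shlex


def bbduk_command(bbduk, in1, in2, out1, out2, rref,
                  minlength=60, trimq=20, k=27, maxmem=20):
    """Assemble a paired-end bbduk.sh argument list (hypothetical mapping).

    Defaults mirror the documented defaults of --minlength, --trimq, --k
    and --maxmem.
    """
    return [
        bbduk,
        f"-Xmx{maxmem}g",               # Java heap limit from --maxmem (GB)
        f"in={in1}", f"in2={in2}",      # raw read pair
        f"out={out1}", f"out2={out2}",  # cleaned read pair
        f"ref={rref}",                  # adapter/contaminant reference (--rref)
        f"k={k}",                       # k-mer length for reference matching (--k)
        "qtrim=rl",                     # enable quality trimming on both ends
        f"trimq={trimq}",               # quality threshold (--trimq)
        f"minlength={minlength}",       # discard reads shorter than this (--minlength)
    ]


if __name__ == "__main__":
    cmd = bbduk_command(
        "/path/to/bbduk.sh",
        "SRR28642268_1.fastq.gz", "SRR28642268_2.fastq.gz",
        "SRR28642268_1.clean.fastq.gz", "SRR28642268_2.clean.fastq.gz",
        "/path/to/reference.fa",
    )
    print(shlex.join(cmd))
```

Building the command as a list (rather than one string) avoids shell-quoting problems with paths; shlex.join is used only to print it.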
  • Run example:
nextflow -log /absolute/path/to/directory/log/nextflow.log run reads_processing.nf --reads "SRR28642268"
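The run example processes a single accession. When several accessions need processing, a small launcher can build one command per accession. The sketch below is hypothetical (the function names and the accession check are ours): it rejects strings that do not look like SRA, ENA or DDBJ run accessions (SRR, ERR or DRR followed by digits), places the Nextflow option -log before run, and appends pipeline parameters after the script name.

```python
import re
import shlex
import subprocess

# Run accessions: SRR (NCBI SRA), ERR (ENA) or DRR (DDBJ) followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def nextflow_command(accession, log_path=None, **params):
    """Build one `nextflow run reads_processing.nf` argument list.

    Nextflow options such as -log must precede `run`; pipeline parameters
    (--reads, --bbduk, --rref, ...) follow the script name.
    """
    if not RUN_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    cmd = ["nextflow"]
    if log_path is not None:
        cmd += ["-log", log_path]
    cmd += ["run", "reads_processing.nf", "--reads", accession]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_all(accessions, log_dir, **params):
    # Sequential runs, one log file per accession; stops at the first failure.
    for acc in accessions:
        cmd = nextflow_command(acc, f"{log_dir}/{acc}.log", **params)
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

For example, `run_all(["SRR28642268", "SRR28642269"], "/absolute/path/to/directory/log", bbduk="/path/to/bbduk.sh", rref="/path/to/reference.fa")` would launch the two runs one after the other.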

Errors

An apparently temporary error occurred with ffq. It was resolved by following the guidance in ffq GitHub issue #73, "sample search gets 'NoneType' object has no attribute 'text'".

Error

[2024-08-06 11:45:03,206] INFO Parsing run SRR6188822
[2024-08-06 11:45:04,491] ERROR 404 Client Error:  for url: https://www.ebi.ac.uk/ena/browser/api/xml/SRR6188822/
[2024-08-06 11:45:04,492] ERROR Provided accession is invalid
usage: ffq [-h] [-o OUT] [-l LEVEL] [--ftp] [--aws] [--gcp] [--ncbi] [--split] [--verbose] [--version] IDs [IDs ...]
ffq: error: For possible failure modes, please see https://github.com/pachterlab/ffq#failure-modes

Solution

Install the development (devel) branch of ffq directly from GitHub:

pip install git+https://github.com/pachterlab/ffq@devel

Acknowledgment

This script is based on the work developed in nextflow_practice. Part of it is based on the nf-core bbmap_bbduk module.