3. Reads Processing - labbces/SpliceScape GitHub Wiki
Filtered Reads Processing
Reads processing consists of three main steps: 1) downloading reads; 2) checking the integrity of the downloaded reads; and 3) cleaning the downloaded sequences with BBDuk. All of these steps are organized in a Nextflow script called reads_processing.nf. To execute this script, you need the SRR identifiers of the reads and values for the parameters detailed below. The final output of this process is a set of cleaned reads.
reads_processing.nf
The reads_processing.nf Nextflow script is designed to automate downloading and cleaning RNAseq reads. The script performs three main tasks:
- Downloading RNAseq reads (fastq.gz files) for specified SRA accessions.
- Checking the integrity of downloaded reads.
- Cleaning reads with BBDuk.
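The integrity check in the second task can be illustrated with a minimal Python sketch. It assumes that the metadata fetched for each run provides a `filename` and an `md5` field per file (as in the JSON produced by `ffq --ftp`); the function names `md5_of_file` and `verify_records` are hypothetical and are not taken from reads_processing.nf or download_from_json.py, which rely on md5sum.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_records(records, directory):
    """Map each filename to True if the file exists and its MD5 matches."""
    results = {}
    for record in records:
        path = Path(directory) / record["filename"]
        results[record["filename"]] = (
            path.is_file() and md5_of_file(path) == record["md5"]
        )
    return results


if __name__ == "__main__":
    # Demo with a small fake "download" instead of a real fastq.gz file.
    with tempfile.TemporaryDirectory() as tmp:
        payload = b"@read1\nACGT\n+\nIIII\n"
        (Path(tmp) / "demo_1.fastq.gz").write_bytes(payload)
        metadata = [
            # Assumed record layout: one entry per file with its expected MD5.
            {"filename": "demo_1.fastq.gz", "md5": hashlib.md5(payload).hexdigest()},
            {"filename": "demo_2.fastq.gz", "md5": "0" * 32},  # never downloaded
        ]
        print(verify_records(metadata, tmp))
```

A file that is missing or whose checksum differs is reported as `False`, so a caller can decide whether to retry the download before cleaning.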
- Requirements:
Category | Requirements |
---|---|
Software | Nextflow |
Tools | ffq (for fetching metadata), wget (for downloading), md5sum (for checksum verification), BBDuk (for cleaning reads) |
Input | SRA accession as a parameter (params.reads), reference file for BBDuk (params.rref), and various BBDuk parameters (params.minlength, params.trimq, params.k, params.maxmem) |
Dependencies | Python3 (for download_from_json.py script) |
- Arguments:
Argument | Function | Required | Default | Example |
---|---|---|---|---|
--reads | SRA accession to be processed. This is required for the workflow. | Yes | "SRR28642269" | SRR28642269 |
--bbduk | Path to the BBDuk executable. | Yes | "/Storage/progs/bbmap_35.85/bbduk2.sh" | /path/to/bbduk.sh |
--rref | Path to the reference file for BBDuk. | Yes | "/Storage/progs/Trimmomatic-0.38/adapters/NexteraPE-PE.fa" | /path/to/reference.fa |
--minlength | Minimum length of reads to keep after trimming. | No | 60 | 60 |
--trimq | Quality trimming threshold. | No | 20 | 20 |
--k | K-mer length to find contaminants. | No | 27 | 27 |
--maxmem | Maximum memory allocation for BBDuk in GB. | No | 20 | 20 |
-log | Define path to save .log files. | No | None | /absolute/path/to/directory/log/nextflow.log |
- Run example:
nextflow -log /absolute/path/to/directory/log/nextflow.log run reads_processing.nf --reads "SRR28642268"
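When several accessions need to be processed, the command above can be assembled programmatically. The following is a minimal sketch, not part of the SpliceScape repository: the helper `build_command` and the accession pattern (SRR, ERR, or DRR followed by digits) are assumptions made for illustration, and only the options documented in the table above are used.

```python
import re
import shlex

# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def build_command(accession, log_path=None, script="reads_processing.nf", **params):
    """Build the argument list for one Nextflow run of reads_processing.nf."""
    if not RUN_ACCESSION.match(accession):
        raise ValueError(f"not a run accession: {accession!r}")
    command = ["nextflow"]
    if log_path:
        # -log is placed before "run", as in the run example above.
        command += ["-log", log_path]
    command += ["run", script, "--reads", accession]
    # Extra keyword arguments become script parameters, e.g. minlength=60.
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    return command


if __name__ == "__main__":
    for acc in ["SRR28642268", "SRR28642269"]:
        cmd = build_command(acc, log_path=f"log/{acc}.log", minlength=60, trimq=20)
        print(shlex.join(cmd))
        # To launch for real: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.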
Errors
An (apparently temporary) error occurred with ffq. It was resolved by following the guidance in the issue "sample search gets 'NoneType' object has no attribute 'text'" (#73).
Error
[2024-08-06 11:45:03,206] INFO Parsing run SRR6188822
[2024-08-06 11:45:04,491] ERROR 404 Client Error: for url: https://www.ebi.ac.uk/ena/browser/api/xml/SRR6188822/
[2024-08-06 11:45:04,492] ERROR Provided accession is invalid
usage: ffq [-h] [-o OUT] [-l LEVEL] [--ftp] [--aws] [--gcp] [--ncbi] [--split] [--verbose] [--version] IDs [IDs ...]
ffq: error: For possible failure modes, please see https://github.com/pachterlab/ffq#failure-modes
Solution
pip install git+https://github.com/pachterlab/ffq@devel
Acknowledgment
This script is based on the work developed in nextflow_practice. Part of it is based on the nf-core bbmap_bbduk module.