Trimming - SabaLab/RNASeq_Scripts GitHub Wiki
Usage
trimBatch.py
Runs every raw read file in a folder through trimming with cutadapt. It can also summarize the average read length and the number of reads in the raw read files (with -c) and then in the trimmed read files once trimming finishes.
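The read summary described above can be sketched as follows. This is a minimal illustration, not the code trimBatch.py uses: the function names `open_fastq` and `summarize_fastq` are hypothetical, and the sketch assumes standard four-line FASTQ records, optionally gzip-compressed.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz compression transparently."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def summarize_fastq(path):
    """Return (read_count, mean_read_length) for a four-line-per-record FASTQ file.

    Hypothetical helper: the wiki does not show how trimBatch.py computes this.
    """
    n_reads = 0
    total_bases = 0
    with open_fastq(path) as handle:
        for line_no, line in enumerate(handle):
            # In a four-line FASTQ record the sequence is the second line.
            if line_no % 4 == 1:
                n_reads += 1
                total_bases += len(line.strip())
    mean_len = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_len
```

Running `summarize_fastq` on a raw file and on its trimmed counterpart gives the before and after numbers to compare.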
Usage: trimBatch.py [options] inputPath
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-c, --count-raw When set script will count/summarize the raw files
first
--cutadapt-path=CUTADAPTPATH
set cutadapt path
--cutadapt-q=CUTADAPTQ
set cutadapt -q to trim based on quality score
--cutadapt-m=CUTADAPTM
set cutadapt -m to set the minimum read length.
--cutadapt-M=CUTADAPTM
set cutadapt -M to set the maximum read length.
-a ADAPTA, --cutadapt-a=ADAPTA
set cutadapt -a set 3' adapter for reads or read1 if
paired
-A ADAPTA, --cutadapt-A=ADAPTA
set cutadapt -A set 3' adapter for reads or read2 if
paired
-b ADAPTB, --cutadapt-b=ADAPTB
set cutadapt -b set 5' adapter for reads or read1 if
paired
-B ADAPTB, --cutadapt-B=ADAPTB
set cutadapt -B set 5' adapter for reads or read2 if
paired
-g ADAPTG, --cutadapt-g=ADAPTG
set cutadapt -g set 3' or 5' adapter for reads or
read1 if paired
-G ADAPTG, --cutadapt-G=ADAPTG
set cutadapt -G set 3' or 5' adapter for reads or
read2 if paired
-P, --paired pass rsem the --paired-end parameter and look for
paired end files to pass appropriate paired end files
to rsem
-U, --unpaired pass rsem appropriate unpaired files/parameters
-d SAMPLEDELIM, --delim=SAMPLEDELIM
A delimiter to detect the end of the sample label,
default is _L00 to parse everything before the lane as
the sample name.
-p MAXP set number of processes to run at once.
--pair-prefix=PAIRPREFIX
A prefix before the paired label part of the file name
for paired reads, defaults to _R and assumes _R1 -
first read and _R2 is second read
-o OUTPUT, --output=OUTPUT
The output folder. Output will go to a folder with
the extracted Sample Name in this location.
-i INSUFFIX, --input-suffix=INSUFFIX
The suffix to look for in the input files, e.g. .fq.gz
or .fastq
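To illustrate how --input-suffix, --delim and --pair-prefix might interact, here is a minimal sketch that groups file names by sample label and read number. It is an assumption about the behavior described in the help text, not the script's actual code; `group_reads` and the file names in the usage note are hypothetical.

```python
import re
from collections import defaultdict


def group_reads(filenames, suffix=".fastq.gz", delim="_L00", pair_prefix="_R"):
    """Group file names by sample label and split them into read 1 / read 2.

    Hypothetical helper mirroring the documented defaults for -d and --pair-prefix.
    """
    samples = defaultdict(lambda: {"R1": [], "R2": []})
    pair_re = re.compile(re.escape(pair_prefix) + r"([12])")
    for name in sorted(filenames):
        if not name.endswith(suffix):
            continue  # ignore files that do not match the input suffix
        # Everything before the first occurrence of the delimiter is the sample label.
        sample = name.split(delim, 1)[0]
        # Look for the pair label (e.g. _R1 or _R2) after the sample label.
        match = pair_re.search(name[len(sample):])
        if match is None:
            continue  # no pair label found; this sketch skips such files
        samples[sample]["R" + match.group(1)].append(name)
    return dict(samples)
```

For example, `group_reads(["S1_L001_R1_001.fastq.gz", "S1_L001_R2_001.fastq.gz"])` places both files under the sample label `S1`, one as read 1 and one as read 2.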
Example
/usr/local/scripts/trimBatch.py -P -c -p 6 -o /data/hi-seq/HRDP.Liver.totalRNA.2018-01-10/trimmedReads/v2 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -i .fastq.gz -d _R /data/hi-seq/HRDP.Liver.totalRNA.2018-01-10/rawReads
-P -- paired-end reads
-c -- count rawReads
-p 6 -- run 6 processes at once, so 6 samples go through cutadapt concurrently
-o /data/hi-seq/HRDP.Liver.totalRNA.2018-01-10/trimmedReads/v2 -- output path
-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG -- pass the adapter sequence for trimming with the cutadapt -a parameter for read1.
-A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -- pass the adapter sequence for trimming with the cutadapt -A parameter for read2.
-i .fastq.gz -- process files in the inputDir that end in .fastq.gz
-d _R -- use everything before _R in the file name as the sample name
inputDir (/data/hi-seq/HRDP.Liver.totalRNA.2018-01-10/rawReads) -- the input directory containing the raw reads to count and trim.
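For orientation, the sketch below assembles the kind of paired-end cutadapt command that the options above would plausibly translate to for a single sample. The wiki does not show the command trimBatch.py actually runs, so treat this as an assumption: `build_cutadapt_cmd` and the file names are hypothetical, while `-a`, `-A`, `-q`, `-m`, `-o` and `-p` are standard cutadapt options.

```python
import shlex


def build_cutadapt_cmd(read1, read2, out1, out2, adapter_r1, adapter_r2,
                       cutadapt_path="cutadapt", quality=None, min_length=None):
    """Assemble a paired-end cutadapt command as an argument list (hypothetical helper)."""
    cmd = [cutadapt_path, "-a", adapter_r1, "-A", adapter_r2]
    if quality is not None:
        cmd += ["-q", str(quality)]      # quality trimming threshold
    if min_length is not None:
        cmd += ["-m", str(min_length)]   # discard reads shorter than this
    # -o is the read 1 output, -p the read 2 output; inputs come last.
    cmd += ["-o", out1, "-p", out2, read1, read2]
    return cmd


cmd = build_cutadapt_cmd(
    read1="rawReads/sample1_R1.fastq.gz",                # hypothetical input names
    read2="rawReads/sample1_R2.fastq.gz",
    out1="trimmedReads/sample1/sample1_R1.fastq.gz",     # hypothetical output layout
    out2="trimmedReads/sample1/sample1_R2.fastq.gz",
    adapter_r1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG",
    adapter_r2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT",
)
print(shlex.join(cmd))  # shell-quoted form for inspection; pass cmd to subprocess.run to execute
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.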