RSEM - SabaLab/RNASeq_Scripts GitHub Wiki
Usage
runRSEM_batch.py
Run an entire folder of trimmed/cleaned samples through RSEM against either a static or a strain-specific index.
Usage: runRSEM_batch.py [options] inputPath indexPath indexFile numProcessors
Options:
--version show program's version number and exit
-h, --help show this help message and exit
-a, --input-sample-dir
When set, input is assumed to be in
/path/sample/unmapped.end1.fq etc. The default assumes
/path/sample.fq
--rsem-time pass rsem the --time parameter
--rsem-seedLen=RSEMSEEDLEN
pass rsem the --seed-length parameter and value
--rsem-seed=RSEMSEED pass rsem the --seed parameter and value
--rsem-bowtie2 pass rsem the --bowtie2 parameters
--rsem-noBam pass rsem the --no-bam-output parameter
--rsem-fwProb=RSEMFWPROB
pass rsem the --forward-prob parameter and value
-P, --paired pass rsem the --paired-end parameter and look for
paired end files to pass appropriate paired end files
to rsem
-U, --unpaired pass rsem appropriate unpaired files/parameters
-d SAMPLEDELIM, --delim=SAMPLEDELIM
A delimiter to detect the end of the sample label,
default is _L00 to parse everything before the lane as
the sample name.
--pair-prefix=PAIRPREFIX
A prefix before the paired label part of the file name
for paired reads, defaults to _R and assumes _R1 -
first read and _R2 is second read
-o OUTPUT, --output=OUTPUT
The output folder. Output will go to a folder with
the extracted Sample Name in this location.
-i INSUFFIX, --input-suffix=INSUFFIX
The suffix to look for in the input files. ex .fq.gz
or .fastq
-s, --index-ssg When set the beginning of the filename is expected to
denote strain. The strain will be parsed and used to
align to a strain specific genome.
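The -d/--delim and --pair-prefix options describe how file names are split into a sample label and a read-pair label. The sketch below illustrates that naming convention only; it is not taken from runRSEM_batch.py. The function names, the example file name, and the fallback when the delimiter is missing are all assumptions.

```python
import os


def parse_sample_name(file_name, delim="_L00"):
    """Return everything before the first occurrence of delim.

    Assumption: if delim is absent, the whole base name is used as the label.
    """
    base = os.path.basename(file_name)
    idx = base.find(delim)
    return base if idx == -1 else base[:idx]


def mate_file_name(file_name, pair_prefix="_R"):
    """Given a first-read file name, return the expected second-read name.

    Replaces the last "<pair_prefix>1" with "<pair_prefix>2"; returns None
    when the file name carries no first-read label.
    """
    first, second = pair_prefix + "1", pair_prefix + "2"
    if first not in file_name:
        return None
    head, _, tail = file_name.rpartition(first)
    return head + second + tail


if __name__ == "__main__":
    # Hypothetical Illumina-style file name, used for illustration only.
    r1 = "BXH2_1_L001_R1_001.fq.gz"
    print(parse_sample_name(r1))
    print(mate_file_name(r1))
```

With the defaults, everything before _L00 becomes the sample name, and the _R1/_R2 labels distinguish the two reads of a pair.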
Example
/usr/local/scripts/runRSEM_batch.py -a --rsem-bowtie2 --rsem-noBam --rsem-fwProb 0.0 -P -s -i .fq.gz -o /data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/RSEM_test/ /data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/cleanedReads/rn6.v1/ /data/rn6/index/rsem.ssg.ens/ .rn6.spikes 16
-a input files are in per-sample folders, e.g. /data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/cleanedReads/rn6.v1/Sample/unmapped.end1.fq.gz
-P paired-end input; .../Sample/unmapped.end2.fq.gz is also used as the R2 input
-s use strain-specific genomes; the initial part of the file name, delimited by either _ or -, must match a strain name in the index location
-i look for files with the suffix .fq.gz; set this to match other naming such as .fastq.gz or .fq (please don't use uncompressed FASTQ files)
-o output folder
--rsem-bowtie2 RSEM parameter to align with bowtie2
--rsem-noBam RSEM parameter to skip writing the BAM file
--rsem-fwProb 0.0 RSEM parameter for strandedness (a forward probability of 0.0 means reads come from the reverse strand)
Input Folder (inputPath) where the input files are located
Index Folder (indexPath) where the RSEM index files are located
Index Suffix (indexFile) for strain-specific alignment, the common suffix of the index file names, ex .rn6.spikes for BXH2.rn6.spikes, BN.rn6.spikes, etc.
Number of Processors (numProcessors) number of processors to use, 16 in this example
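To make the example concrete, here is a minimal sketch of how a strain-specific index and an RSEM command line might be assembled from these arguments. It is an illustration under stated assumptions, not the script's actual code: the function names are hypothetical, the static-index case (joining the index folder and index file directly) is a guess, and the sample label BXH2_1 is invented. The RSEM flags used (-p, --bowtie2, --no-bam-output, --forward-prob, --paired-end) are standard rsem-calculate-expression options.

```python
import os
import re


def parse_strain(sample_name):
    """Return the leading part of the name, up to the first _ or -."""
    return re.split(r"[_-]", sample_name, maxsplit=1)[0]


def resolve_index(index_path, index_file, sample_name, strain_specific):
    """Build the RSEM reference name for a sample.

    Strain specific: <index_path>/<strain><index_file>, e.g. BXH2.rn6.spikes.
    Static (assumption): <index_path>/<index_file>.
    """
    if strain_specific:
        return os.path.join(index_path, parse_strain(sample_name) + index_file)
    return os.path.join(index_path, index_file)


def build_rsem_command(r1, r2, reference, out_prefix, threads,
                       forward_prob=None, bowtie2=False, no_bam=False):
    """Assemble a paired-end rsem-calculate-expression argument list."""
    cmd = ["rsem-calculate-expression", "-p", str(threads)]
    if bowtie2:
        cmd.append("--bowtie2")
    if no_bam:
        cmd.append("--no-bam-output")
    if forward_prob is not None:
        cmd += ["--forward-prob", str(forward_prob)]
    cmd += ["--paired-end", r1, r2, reference, out_prefix]
    return cmd


if __name__ == "__main__":
    sample = "BXH2_1"  # hypothetical sample folder name
    in_dir = "/data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/cleanedReads/rn6.v1/"
    out_dir = "/data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/RSEM_test/"
    ref = resolve_index("/data/rn6/index/rsem.ssg.ens/", ".rn6.spikes",
                        sample, strain_specific=True)
    cmd = build_rsem_command(
        os.path.join(in_dir, sample, "unmapped.end1.fq.gz"),
        os.path.join(in_dir, sample, "unmapped.end2.fq.gz"),
        ref, os.path.join(out_dir, sample, sample), threads=16,
        forward_prob=0.0, bowtie2=True, no_bam=True)
    print(" ".join(cmd))
```

Returning the command as a list keeps it safe to hand to subprocess.run without shell quoting; printing it first is a cheap way to check the paths before launching a long batch.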