RSEM - SabaLab/RNASeq_Scripts GitHub Wiki

Usage

runRSEM_batch.py

Run an entire folder of trimmed/cleaned samples through RSEM against either a static or a strain-specific index.

Usage: runRSEM_batch.py [options] inputPath indexPath indexFile numProcessors  
  
 Options:  
  --version             show program's version number and exit  
  -h, --help            show this help message and exit  
  -a, --input-sample-dir  
                        When set input is assumed to be in  
                        /path/sample/unmapped.end1.fq etc.  default assumes  
                        /path/sample.fq  
  --rsem-time           pass rsem the --time parameter  
  --rsem-seedLen=RSEMSEEDLEN  
                        pass rsem the --seed-length parameter and value  
  --rsem-seed=RSEMSEED  pass rsem the --seed parameter and value  
  --rsem-bowtie2        pass rsem the --bowtie2 parameter
  --rsem-noBam          pass rsem the --no-bam-output parameter
  --rsem-fwProb=RSEMFWPROB
                        pass rsem the --forward-prob parameter and value
  -P, --paired          pass rsem the --paired-end parameter and look for
                        paired-end files to pass to rsem
  -U, --unpaired        pass rsem appropriate unpaired files/parameters
  -d SAMPLEDELIM, --delim=SAMPLEDELIM
                        A delimiter to detect the end of the sample label,
                        default is _L00 to parse everything before the lane as
                        the sample name.
  --pair-prefix=PAIRPREFIX
                        A prefix before the paired label part of the file name
                        for paired reads, defaults to _R and assumes _R1 is
                        the first read and _R2 is the second read
  -o OUTPUT, --output=OUTPUT
                        The output folder.  Output will go to a folder with
                        the extracted Sample Name in this location.
  -i INSUFFIX, --input-suffix=INSUFFIX
                        The suffix to look for in the input files.  ex .fq.gz
                        or .fastq
  -s, --index-ssg       When set the beginning of the filename is expected to
                        denote strain.  The strain will be parsed and used to
                        align to a strain specific genome.

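The file-name conventions behind `--delim`, `--pair-prefix` and `--input-suffix` can be illustrated with a small sketch. This is not the script's own code: `parse_sample` and `group_reads` are hypothetical helpers, and the file names are made up. It only assumes the defaults stated in the help text above (`_L00` ends the sample label, `_R1`/`_R2` mark the two reads).

```python
import re


def parse_sample(filename, delim="_L00", pair_prefix="_R", suffix=".fq.gz"):
    """Split a read file name into (sample_name, read_number).

    read_number is 1 or 2 when a pair label is found, otherwise None.
    Hypothetical helper; the real script may parse names differently.
    """
    if not filename.endswith(suffix):
        raise ValueError("%s does not end with %s" % (filename, suffix))
    stem = filename[: -len(suffix)]
    # Everything before the delimiter is treated as the sample label.
    head, sep, tail = stem.partition(delim)
    # Look for the pair label after the sample label when the delimiter exists.
    pattern = re.escape(pair_prefix) + r"([12])(?!\d)"
    match = re.search(pattern, tail if sep else stem)
    read = int(match.group(1)) if match else None
    return head, read


def group_reads(filenames, **kwargs):
    """Group file names as {sample: {read_number: [files...]}}."""
    samples = {}
    for name in sorted(filenames):
        sample, read = parse_sample(name, **kwargs)
        samples.setdefault(sample, {}).setdefault(read, []).append(name)
    return samples


if __name__ == "__main__":
    files = ["BNLx_1_L001_R2.fq.gz", "BNLx_1_L001_R1.fq.gz"]
    print(group_reads(files))
```

Files from several lanes of the same sample end up in the same list, which matches the idea of treating everything before the lane as the sample name.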
Example

/usr/local/scripts/runRSEM_batch.py -a --rsem-bowtie2 --rsem-noBam --rsem-fwProb 0.0 -P -s -i .fq.gz -o /data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/RSEM_test/ /data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/cleanedReads/rn6.v1/ /data/rn6/index/rsem.ssg.ens/ .rn6.spikes 16

-a: input files are expected at /data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/cleanedReads/rn6.v1/Sample/unmapped.end1.fq.gz
-P: paired input; .../Sample/unmapped.end2.fq.gz is also used as the R2 input
-s: use strain-specific genomes; the initial part of the file name, delimited by either _ or -, must match a strain name in the index location
-i: look for files with this suffix (.fq.gz here), which allows for naming differences such as .fastq.gz or .fq (please don't use uncompressed fastq files)
-o: output folder
--rsem-bowtie2: RSEM parameter to align with bowtie2
--rsem-noBam: RSEM parameter to skip BAM output
--rsem-fwProb 0.0: RSEM parameter for strandedness
Input Folder: where the input is located
Index Folder: where the RSEM index files are located
Index Suffix: for strain-specific alignment, the common suffix of the index file names, e.g. .rn6.spikes for BXH2.rn6.spikes, BN.rn6.spikes, etc.
Number of Processors: how many processors to use
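To make the strain-specific lookup concrete, here is a rough sketch of how the example above might translate into a single `rsem-calculate-expression` call for one sample. It is an illustration under assumptions, not the script's actual implementation: `strain_index` and `build_command` are hypothetical helpers, the sample name `BXH2_1` is invented, and the output prefix inside the sample folder is a guess. The RSEM options used (`--bowtie2`, `--no-bam-output`, `--forward-prob`, `--paired-end`, `-p`) are the ones named in the option list.

```python
import os
import re


def strain_index(sample, index_path, index_suffix):
    """Return the index prefix for a sample whose name starts with its strain.

    The strain is taken as the part of the name before the first _ or -.
    """
    strain = re.split(r"[_-]", sample, maxsplit=1)[0]
    return os.path.join(index_path, strain + index_suffix)


def build_command(sample, r1, r2, index_path, index_suffix, out_dir, threads):
    """Assemble an rsem-calculate-expression argument list for paired reads."""
    reference = strain_index(sample, index_path, index_suffix)
    # Assumption: results are written to <out_dir>/<sample>/<sample>.*
    output_prefix = os.path.join(out_dir, sample, sample)
    return [
        "rsem-calculate-expression",
        "--bowtie2",                # --rsem-bowtie2
        "--no-bam-output",          # --rsem-noBam
        "--forward-prob", "0.0",    # --rsem-fwProb 0.0
        "-p", str(threads),         # number of processors
        "--paired-end", r1, r2,     # -P
        reference,
        output_prefix,
    ]


if __name__ == "__main__":
    in_dir = "/data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/cleanedReads/rn6.v1/"
    sample = "BXH2_1"  # hypothetical sample folder name
    cmd = build_command(
        sample,
        os.path.join(in_dir, sample, "unmapped.end1.fq.gz"),
        os.path.join(in_dir, sample, "unmapped.end2.fq.gz"),
        "/data/rn6/index/rsem.ssg.ens/",
        ".rn6.spikes",
        "/data/hi-seq/HRDP.Brain.totalRNA.2017-09-01/RSEM_test/",
        16,
    )
    print(" ".join(cmd))
```

Building the command as a list keeps each argument separate, so it could be handed to `subprocess.run` without shell quoting concerns.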