Running AltAnalyze on an HPC - nsalomonis/altanalyze GitHub Wiki

Introduction

To run AltAnalyze from the command line, first install AltAnalyze and make sure the source code is in the main AltAnalyze program directory. This is already the case if you downloaded the Python source code or the Linux version; otherwise, copy the contents of the "Source_code" folder to the parent AltAnalyze directory. Before running any of the commands below, open a command prompt and change to the directory containing the AltAnalyze source code. The instructions below are written for an LSF cluster.

Generate BAM files from FASTQ files using STAR - ideally with strand predictions

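#!/bin/bash
# Usage: ./STARhg38.sh <read1.fastq.gz>   (prints an LSF job script to stdout)
FASTQ1=$1
# Derive the mate file by swapping _read1 for _read2 in the file name
FASTQ2=${FASTQ1/_read1/_read2}
SAMPLE=$(basename "$FASTQ1" .fastq.gz)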

DIR=$(pwd)

cat <<EOF
#BSUB -L /bin/bash
#BSUB -W 10:00
#BSUB -n 4
#BSUB -R "span[ptile=4]"
#BSUB -M 98000
#BSUB -J $SAMPLE

cd $DIR
module load STAR/2.6.1

STAR --genomeDir /data/Hs/Grch38-STAR-index --readFilesIn $FASTQ1 $FASTQ2 --readFilesCommand gunzip -c --outFileNamePrefix $DIR/$SAMPLE --runThreadN 4 --outSAMstrandField intronMotif --outSAMtype BAM SortedByCoordinate --sjdbGTFfile /data/Hs/Star-Index-GRCH38/Homo_sapiens.GRCh38.85.gtf --limitBAMsortRAM 97417671648
EOF

### Run as: for i in *_read1_*.fastq.gz; do ./STARhg38.sh $i | bsub; done
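Because the script above derives the second read file by name substitution, a missing or misnamed mate only surfaces once the job is running. A minimal pre-flight sketch follows, assuming the same `_read1`/`_read2` naming convention. The function name `check_pair` is hypothetical and not part of STAR or AltAnalyze, and it uses `sed` for the same first-occurrence substitution as `${FASTQ1/_read1/_read2}` so that it also runs under a plain `sh`.

```shell
# Hypothetical pre-flight helper (not part of AltAnalyze or STAR).
# Prints "sample<TAB>read1<TAB>read2" for a read-1 FASTQ, or returns 1
# if the expected mate file does not exist.
check_pair() {
    cp_fastq1="$1"
    # Same first-occurrence substitution as ${FASTQ1/_read1/_read2}
    cp_fastq2=$(printf '%s\n' "$cp_fastq1" | sed 's/_read1/_read2/')
    # Same sample name the job script passes to "#BSUB -J"
    cp_sample=$(basename "$cp_fastq1" .fastq.gz)
    if [ ! -f "$cp_fastq2" ]; then
        echo "missing mate for $cp_fastq1: $cp_fastq2" >&2
        return 1
    fi
    printf '%s\t%s\t%s\n' "$cp_sample" "$cp_fastq1" "$cp_fastq2"
}

# Example: for i in *_read1_*.fastq.gz; do check_pair "$i" || break; done
```

Running the example loop before the `bsub` loop lists every pair that would be submitted and stops at the first read-1 file without a mate.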

Downloading and installing a species specific database (human)

module load python/2.7.5
python AltAnalyze.py --species Hs --update Official --version EnsMart100 --additional all
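The BED export step later in this page reads `Hs_Ensembl_exon.txt` from the installed database, so it can be worth confirming that the file exists before submitting jobs. The sketch below assumes the directory layout seen in that step (`AltDatabase/<version>/ensembl/<species>/`); both function names are hypothetical and are not AltAnalyze commands.

```shell
# Hypothetical check (not an AltAnalyze command).
# Arguments: AltAnalyze directory, database version, species code.
# Builds the exon annotation path using the layout assumed from the
# BED export commands on this page.
exon_reference_path() {
    printf '%s/AltDatabase/%s/ensembl/%s/%s_Ensembl_exon.txt\n' "$1" "$2" "$3" "$3"
}

# Succeeds only if the exon annotation file exists and is non-empty.
database_installed() {
    [ -s "$(exon_reference_path "$1" "$2" "$3")" ]
}

# Example:
#   database_installed /data/AltAnalyze EnsMart100 Hs || echo "database not found" >&2
```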

Exporting a Junction and Intron BED reference file for BedTools

#!/bin/bash
# Usage: ./BAMtoBEDhg38.sh <sample.bam>   (prints an LSF job script to stdout)
BAM=$1
SAMPLE=$(basename "$BAM" .bam)
DIR=$(pwd)

cat <<EOF
#BSUB -L /bin/bash
#BSUB -W 10:00
#BSUB -n 2
#BSUB -R "span[ptile=2]"
#BSUB -M 16000
#BSUB -J $SAMPLE

cd $DIR
module load python/2.7.5
module load samtools

#Export exon-exon junction counts
python /data/AltAnalyze/import_scripts/BAMtoJunctionBED.py --i $BAM --species Hs --r /data/AltAnalyze/AltDatabase/EnsMart100/ensembl/Hs/Hs_Ensembl_exon.txt

#Export exon-intron junction counts
python /data/AltAnalyze/import_scripts/BAMtoExonBED.py --i $BAM --r /data/AltAnalyze/AltDatabase/EnsMart100/ensembl/Hs/Hs_Ensembl_exon.txt --s Hs

EOF
### Run as: for i in *.bam; do ./BAMtoBEDhg38.sh $i | bsub; done

Create Sample Groups and Comparison Files

See the instructions here. The file names must match the expname used below, in the form groups.<expname>.txt and comps.<expname>.txt (for example, groups.TumorsAndControls.txt and comps.TumorsAndControls.txt). Ideally, store these files in the same directory as the BAM files to allow automated SashimiPlot creation.
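As an illustration only, the two files could be written from the shell as sketched below. The layout shown (tab-delimited, three columns in the groups file for sample name, group number and group name; two group numbers per line in the comps file) is an assumption that should be checked against the linked instructions. The function name and all sample and group names are hypothetical, and the sample names must match whatever names AltAnalyze derives from your input files.

```shell
# Sketch only: file layout is an assumption, sample/group names are made up.
# Arguments: target directory, experiment name (the --expname value).
write_design_files() {
    wd_dir="$1"
    wd_exp="$2"

    # groups.<expname>.txt: sample name, group number, group name
    printf '%s\t%s\t%s\n' \
        "tumor01.bed"   1 "Tumor" \
        "tumor02.bed"   1 "Tumor" \
        "control01.bed" 2 "Control" \
        "control02.bed" 2 "Control" > "$wd_dir/groups.$wd_exp.txt"

    # comps.<expname>.txt: one comparison per line, as two group numbers
    printf '%s\t%s\n' 1 2 > "$wd_dir/comps.$wd_exp.txt"
}

# Example: write_design_files /data/experiment TumorsAndControls
```

Taking the experiment name as a parameter keeps the file names and the `--expname` value in step, which is the consistency requirement described above.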

Perform Differential Gene and Splicing Analyses


cat <<EOF
#BSUB -L /bin/bash
#BSUB -W 60:00
#BSUB -n 4
#BSUB -R "span[ptile=4]"
#BSUB -M 96000

module load python/2.7.5
module load R

python /data/AltAnalyze/AltAnalyze.py --species Hs --platform RNASeq --bedDir "/data/experiment/" --groupdir "/data/experiment/groups.TumorsAndControls.txt" --compdir "/data/experiment/comps.TumorsAndControls.txt" --output "/data/experiment" --expname "TumorsAndControls" --GEelitefold 1.5 --GEelitepval 0.05 --GEeliteptype "adjp" --multiProcessing yes

EOF
### Run as: ./AltAnalyze.sh | bsub

The primary outputs of AltAnalyze include the following (file and folder locations in parentheses):

  1. Gene expression quantification as gene-level junction RPKMs (ExpressionInput/exp.TumorsAndControls-steady-state.txt)
  2. Junction-level counts (ExpressionInput/counts.TumorsAndControls.txt)
  3. Differential expression analysis results (ExpressionOutput/DATASET-TumorsAndControls.txt)
  4. Gene-set and pathway enrichment results from GO-Elite (GO-Elite)
  5. Transcriptional regulatory networks (GO-Elite/regulated/networks)
  6. Alternative splicing PSI values (AltResults/AlternativeOutput/Hs_RNASeq_top_alt_junctions-PSI_EventAnnotation.txt)
  7. Differential splicing results (AltResults/AlternativeOutput/Events-dPSI_0.1_rawp)
  8. MarkerGenes (DataPlots/MarkerFinder)
  9. QC results (DataPlots)
  10. SashimiPlots (SashimiPlots)
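Once the job finishes, a quick way to see whether the main results were produced is to test for a few of the paths in the list above. The sketch below assumes those paths are relative to the `--output` directory and that the species and platform are `Hs` and `RNASeq`, as in the example command; the function name `missing_outputs` is hypothetical.

```shell
# Hypothetical post-run check: print each expected output that is absent.
# Assumes the listed paths are relative to the --output directory.
# Arguments: output directory, experiment name (the --expname value).
missing_outputs() {
    mo_out="$1"
    mo_exp="$2"
    for mo_rel in \
        "ExpressionInput/exp.${mo_exp}-steady-state.txt" \
        "ExpressionInput/counts.${mo_exp}.txt" \
        "ExpressionOutput/DATASET-${mo_exp}.txt" \
        "GO-Elite" \
        "AltResults/AlternativeOutput/Hs_RNASeq_top_alt_junctions-PSI_EventAnnotation.txt" \
        "DataPlots"
    do
        # Report the relative path when neither a file nor a folder exists
        [ -e "$mo_out/$mo_rel" ] || echo "$mo_rel"
    done
}

# Example: missing_outputs /data/experiment TumorsAndControls
```

An empty result means every checked path exists; it does not validate the contents of the files.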