RNA Seq Data Analysis Tutorials - ricket-sjtu/bioinformatics GitHub Wiki

RNA-seq has become a central technology for characterizing the transcriptome of a sample. Its three major applications are quantification of gene/transcript expression, identification of novel transcripts, and detection of fusion transcripts.

RNA-seq data analysis approaches fall into three categories, depending on which reference resources are available:

  • Model-based approaches, which use both a reference genome and a reference transcriptome.
  • Semi-model approaches, which use only a reference genome.
  • Non-model approaches, which use neither a reference genome nor a reference transcriptome (e.g. de novo assembly).

1. ALIGNMENT: Short read mapping

Accurate mapping of RNA-seq reads to the reference genome or transcriptome is a critical step for downstream transcript assembly, isoform detection, quantification, and fusion detection. Aligner performance is usually assessed with three metrics: running speed, sensitivity, and specificity.
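Speed is simply wall-clock time, but sensitivity and specificity are usually estimated on simulated reads whose true origin is known. The sketch below is a minimal illustration under stated assumptions: the input dictionaries and the `tolerance` parameter are hypothetical, a read counts as correct when it maps to the true chromosome within `tolerance` bases of the true position, and "specificity" is reported as the fraction of mapped reads that are correct (precision), which is one common convention in aligner benchmarks rather than the only one.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, leftmost position); hypothetical format


def mapping_accuracy(
    truth: Dict[str, Position],
    reported: Dict[str, Optional[Position]],
    tolerance: int = 5,
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for a set of simulated reads.

    truth:    read ID -> true origin of the simulated read
    reported: read ID -> position reported by the aligner, or None if unmapped
    """
    mapped = correct = 0
    for read_id, (chrom, pos) in truth.items():
        hit = reported.get(read_id)  # reads missing from the output count as unmapped
        if hit is None:
            continue
        mapped += 1
        if hit[0] == chrom and abs(hit[1] - pos) <= tolerance:
            correct += 1
    sensitivity = correct / len(truth) if truth else 0.0  # correct / all reads
    precision = correct / mapped if mapped else 0.0       # correct / mapped reads
    return sensitivity, precision


truth = {"r1": ("chr1", 100), "r2": ("chr1", 500), "r3": ("chr2", 42), "r4": ("chr3", 7)}
reported = {"r1": ("chr1", 102), "r2": ("chr2", 500), "r3": None, "r4": ("chr3", 7)}
print(mapping_accuracy(truth, reported))  # (0.5, 0.6666666666666666)
```

Here r2 is mapped to the wrong chromosome and r3 is unmapped, so two of four reads are correct (sensitivity 0.5) and two of three mapped reads are correct (precision 2/3).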

1.1 Algorithms

  • Hash-based
  • Burrows-Wheeler transform (BWT)-based
  • FM-index
  • Graph FM-index
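BWT-based aligners such as Bowtie2 search reads against an FM-index built on the Burrows-Wheeler transform of the reference. The toy sketch below is for illustration only: it builds the BWT by sorting all rotations (quadratic memory, whereas real indexers use suffix-array construction and sampled occurrence tables) and counts exact occurrences of a pattern with FM-index backward search. The class and method names are hypothetical.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform: last column of the sorted rotations of text + '$'."""
    if "$" in text:
        raise ValueError("'$' is reserved as the end-of-text sentinel")
    s = text + "$"  # '$' sorts before A, C, G, T in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


class FMIndex:
    """Toy FM-index supporting exact-match counting via backward search."""

    def __init__(self, text: str) -> None:
        self.last = bwt(text)
        counts = Counter(self.last)
        # first[c]: number of characters in the text that sort before c (the "C" table)
        self.first = {}
        total = 0
        for ch in sorted(counts):
            self.first[ch] = total
            total += counts[ch]
        # occ[c][i]: occurrences of c in last[:i] (full table; real tools sample it)
        self.occ = {ch: [0] * (len(self.last) + 1) for ch in counts}
        for i, c in enumerate(self.last):
            for ch, table in self.occ.items():
                table[i + 1] = table[i] + (c == ch)

    def count(self, pattern: str) -> int:
        """Number of exact occurrences of pattern, matched right to left."""
        lo, hi = 0, len(self.last)  # current range of sorted rotations
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("ACGTACGTTACG")
print(index.count("ACG"))  # 3
```

Backward search narrows a range of sorted rotations one character at a time, so the cost depends on the pattern length rather than the reference length; that property is what makes BWT/FM-index aligners fast.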

1.2 Mapping tools

  1. Bowtie2
  2. TopHat2
  3. HISAT2
  4. Kallisto
  5. Salmon
  6. Sailfish
  7. SeqMap
  8. STAR
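Kallisto, Salmon and Sailfish appear alongside the aligners, but Kallisto, for example, does not compute base-level alignments: its pseudoalignment only asks which transcripts a read is compatible with. The toy sketch below illustrates that idea by intersecting, over the read's k-mers, the sets of transcripts that contain each k-mer. It is a simplification under stated assumptions (a plain dictionary instead of Kallisto's transcriptome de Bruijn graph, k-mers absent from the index are simply skipped, no reverse-complement handling), not Kallisto's implementation; the function names and toy sequences are made up.

```python
from collections import defaultdict
from typing import Dict, Set


def build_kmer_index(transcripts: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer to the set of transcripts that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)


def pseudoalign(read: str, index: Dict[str, Set[str]], k: int) -> Set[str]:
    """Return the transcripts compatible with every indexed k-mer of the read."""
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer not in the index (e.g. a sequencing error): skipped here
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no transcript explains all k-mers seen so far
    return compatible or set()


transcripts = {"tx1": "AAAACCCC", "tx2": "AAAAGGGG"}  # made-up toy sequences
idx = build_kmer_index(transcripts, k=3)
print(sorted(pseudoalign("AAAAC", idx, k=3)))  # ['tx1']
print(sorted(pseudoalign("AAAA", idx, k=3)))   # ['tx1', 'tx2']
```

A read that is compatible with several transcripts (like "AAAA" here) is not resolved at this stage; that ambiguity is left to the quantification model.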

2. MODEL: Estimation of gene and transcript expression

  1. BitSeq
  2. Cufflinks
  3. HTSeq
  4. IsoEM
  5. Kallisto
  6. RSEM
  7. rSeq
  8. Sailfish
  9. Salmon
  10. STAR
  11. StringTie
  12. eXpress
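Several of the tools above (RSEM, eXpress and Kallisto among them) resolve reads that are compatible with more than one transcript using an expectation-maximization (EM) algorithm, and report abundances such as TPM. The sketch below is a deliberately simplified EM under stated assumptions: each read is already reduced to the set of transcripts it is compatible with (for example, by pseudoalignment), raw transcript lengths stand in for the effective lengths real tools use, and there is no fragment-length, bias, or error model. Function names, transcript names and counts are hypothetical.

```python
from typing import Dict, List, Set


def em_abundance(read_classes: List[Set[str]], lengths: Dict[str, float],
                 n_iter: int = 200) -> Dict[str, float]:
    """Estimate the fraction of reads originating from each transcript.

    read_classes: for each read, the set of transcripts it is compatible with.
    lengths: transcript lengths (raw lengths here; real tools use effective lengths).
    """
    reads = [c for c in read_classes if c]  # drop reads compatible with nothing
    names = sorted(lengths)
    alpha = {t: 1.0 / len(names) for t in names}  # start from uniform read fractions
    for _ in range(n_iter):
        assigned = dict.fromkeys(names, 0.0)
        # E-step: split each read among its compatible transcripts,
        # in proportion to abundance per unit length
        for compatible in reads:
            weights = {t: alpha[t] / lengths[t] for t in compatible}
            total = sum(weights.values())
            for t, w in weights.items():
                assigned[t] += w / total
        # M-step: new read fractions are the expected read counts over all reads
        alpha = {t: assigned[t] / len(reads) for t in names}
    return alpha


def to_tpm(alpha: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert read fractions to transcripts per million (TPM)."""
    rates = {t: alpha[t] / lengths[t] for t in alpha}
    scale = 1e6 / sum(rates.values())
    return {t: r * scale for t, r in rates.items()}


# Toy data: 3 reads unique to A, 1 unique to B, 4 ambiguous between A and B
classes = [{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 4
lengths = {"A": 1000.0, "B": 1000.0}
alpha = em_abundance(classes, lengths)
print({t: round(a, 3) for t, a in alpha.items()})  # {'A': 0.75, 'B': 0.25}
print({t: round(v) for t, v in to_tpm(alpha, lengths).items()})  # {'A': 750000, 'B': 250000}
```

The ambiguous reads end up split 3:1 in favour of A because the unique reads already suggest A is three times as abundant; dividing by length before rescaling to one million is what turns read fractions into TPM.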

3. DEA: Differential expression analysis

  1. Ballgown
  2. baySeq
  3. BitSeq
  4. Cuffdiff
  5. DESeq2
  6. EBSeq
  7. edgeR (exact test)
  8. limma with voom or variance-stabilizing (VST) transformation
  9. NBPSeq
  10. NOISeqBIO
  11. SAMseq
  12. sleuth
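Count-based methods normalize for sequencing depth before testing. DESeq2, for instance, estimates one size factor per sample with the median-of-ratios method (edgeR uses TMM by default instead). The plain-Python sketch below covers only that normalization step, on a made-up count matrix; the dispersion estimation and negative binomial testing that DESeq2 performs afterwards are not shown, and the function names are hypothetical.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors; counts is a genes x samples matrix."""
    n_samples = len(counts[0])
    # Only genes with a non-zero count in every sample have a usable geometric mean
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # log ratio of each gene's count to its geometric mean across samples
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: List[List[int]], factors: List[float]) -> List[List[float]]:
    """Divide each sample's counts by its size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1
counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(counts)
print([round(f, 4) for f in factors])  # [0.7071, 1.4142]
```

The last gene is excluded from the estimate because of its zero count; after dividing by the size factors, the first three genes have equal normalized counts in both samples.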

4. FUSION: Fusion detection

Detecting transcript-level fusion events requires two kinds of evidence: discordant (spanning) read pairs whose mates map to different genes or genomic regions, and split (junction) reads that span the fusion breakpoint itself.
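The two kinds of evidence can be combined as in the toy sketch below. It assumes reads have already been aligned and annotated with the gene that each mate or read segment falls in; the input format, gene names, and support thresholds are hypothetical, and gene order within a pair is ignored. Real fusion callers apply many additional filters (for example on breakpoint position and read orientation) that are not modeled here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

GenePair = Tuple[str, str]


def fusion_candidates(
    pairs: Iterable[GenePair],
    split_reads: Iterable[GenePair],
    min_pairs: int = 2,
    min_split: int = 1,
) -> Dict[GenePair, Tuple[int, int]]:
    """Return {gene pair: (spanning pairs, junction reads)} for supported candidates.

    pairs:       (gene hit by mate 1, gene hit by mate 2) for each aligned read pair
    split_reads: (gene of one segment, gene of the other segment) for each split read
    """
    def tally(records: Iterable[GenePair]) -> Counter:
        counts: Counter = Counter()
        for g1, g2 in records:
            if g1 != g2:  # only evidence that joins two different genes
                counts[tuple(sorted((g1, g2)))] += 1  # orientation ignored here
        return counts

    spanning = tally(pairs)
    junction = tally(split_reads)
    return {
        key: (spanning[key], junction[key])
        for key in spanning.keys() | junction.keys()
        if spanning[key] >= min_pairs and junction[key] >= min_split
    }


pairs = ([("GENE_A", "GENE_B")] * 3 + [("GENE_B", "GENE_A")]
         + [("GENE_C", "GENE_C")] * 10 + [("GENE_D", "GENE_E")])
split_reads = [("GENE_A", "GENE_B")] * 2 + [("GENE_D", "GENE_E")]
print(fusion_candidates(pairs, split_reads))  # {('GENE_A', 'GENE_B'): (4, 2)}
```

GENE_D/GENE_E is rejected because it has only one spanning pair, which illustrates why callers usually require both kinds of support.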

5. WORKFLOW

5.0 PREPROCESSING

  • FastQC (read quality control)
  • Trimmomatic (adapter and quality trimming)
  • Cutadapt (adapter and quality trimming)

5.1 Tuxedo and alternative pipelines

  • TopHat2 + Cufflinks + Cuffdiff
  • HISAT2 + StringTie + Ballgown
  • Bowtie2 + RSEM + edgeR
  • Kallisto + sleuth

5.2 Genome/transcriptome datasets

5.3 RNA-seq data

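  • GEO (Gene Expression Omnibus)
  • SRA (Sequence Read Archive)
  • Download the data with prefetch from the SRA Toolkit (e.g. prefetch -v SRR3126346), or transfer it with Aspera's ascp: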
  • $ASPERA/bin/ascp -i /root/.aspera/connect/etc/asperaweb_id_dsa.putty -pQTk1 -l 300m [email protected]:data/sracloud/srapub/SRR3126346 /root/ncbi/public/sra/SRR3126346.sra
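Once downloaded, the reads go through preprocessing with tools such as FastQC, Trimmomatic and Cutadapt. To illustrate what quality trimming does, the sketch below implements the BWA-style 3' trimming rule described in the Cutadapt documentation: subtract the cutoff from each base quality, sum from each position to the 3' end, and cut where that sum is smallest. It assumes Phred+33-encoded qualities; the function names are hypothetical and this is not Cutadapt's or Trimmomatic's code.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def quality_trim_index(scores: List[int], cutoff: int) -> int:
    """Index at which to cut the 3' end (len(scores) means keep the whole read)."""
    best_index, best_sum = len(scores), 0  # empty suffix: sum 0, no trimming
    running = 0
    for i in range(len(scores) - 1, -1, -1):
        running += scores[i] - cutoff  # sum of (quality - cutoff) from i to the 3' end
        if running < best_sum:  # strict comparison: on ties keep the longer read
            best_index, best_sum = i, running
    return best_index


def trim_read(seq: str, quality: str, cutoff: int = 20) -> Tuple[str, str]:
    """Trim low-quality bases from the 3' end of a read."""
    cut = quality_trim_index(phred_scores(quality), cutoff)
    return seq[:cut], quality[:cut]


scores = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
quality = "".join(chr(q + 33) for q in scores)
print(trim_read("ACGTACGTAC", quality, cutoff=10)[0])  # ACGT
```

Unlike a simple "cut at the first base below the cutoff" rule, the running sum tolerates an isolated good base (the Q11 here) inside a low-quality tail, so the whole tail is removed.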

REFERENCE