RNA Seq Data Analysis Tutorials - ricket-sjtu/bioinformatics GitHub Wiki
RNA-seq plays a pivotal role in characterizing the transcriptome of a given sample. Its three major applications are quantification of gene and transcript expression, identification of novel transcripts, and detection of fusion transcripts.
RNA-seq data analysis approaches fall into three categories:
- Model-based approaches with both reference genome and transcriptome information.
- Semi-model approaches with only the reference genome information.
- Non-model approaches without reference genome and transcriptome information.
1. ALIGNMENT: Short read mapping
Accurate mapping of RNA-seq reads to the reference genome or transcriptome is a critical step for the downstream analyses: transcript assembly, isoform detection, quantification, and fusion detection. Running speed, sensitivity, and specificity are the three essential metrics for assessing mapper performance.
1.1 Algorithms
- Hash-based
- Burrows-Wheeler-Transform (BWT)-based
- FM-index
- Graph FM-index
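To make the BWT and FM-index entries above concrete, here is a minimal Python sketch of exact matching by backward search. It is an illustration only: the function names (`build_fm_index`, `find_exact`) are made up for this example, the suffix array is built by naive sorting, and the occurrence table is stored in full. Real mappers use compressed, sampled structures and also handle mismatches, gaps, and spliced reads.

```python
def build_fm_index(reference):
    """Build a toy FM-index (suffix array, BWT, C table, Occ table)."""
    text = reference + "$"  # '$' is the end marker and sorts before A, C, G, T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    # C[c]: number of characters in the text that are strictly smaller than c.
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, bwt, c_table, occ


def find_exact(read, index):
    """Return sorted 0-based reference positions where `read` matches exactly."""
    sa, bwt, c_table, occ = index
    lo, hi = 0, len(bwt)  # half-open interval of matching suffix-array rows
    for ch in reversed(read):  # backward search: extend the match to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_fm_index("GATTACAGATTACA")
    print(find_exact("ATTA", index))
```

The search cost depends on the read length rather than on the reference length, which is the main reason BWT-based indexes suit short-read mapping.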
1.2 Mapping tools
- Bowtie2
- TopHat2
- HISAT2
- Kallisto
- Salmon
- Sailfish
- SeqMap
- STAR
2. MODEL: Estimation of gene and transcript expression
- BitSeq
- Cufflinks
- HTSeq
- IsoEM
- Kallisto
- RSEM
- rSeq
- Sailfish
- Salmon
- STAR
- StringTie
- eXpress
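As a rough illustration of what an expression estimate looks like once reads have been assigned, the Python sketch below converts per-gene read counts into TPM (transcripts per million). It covers only the final normalization step. Several of the tools listed above (for example RSEM and Kallisto) additionally use expectation-maximization to distribute ambiguously mapped reads, which is not shown here. The function name `tpm` and the toy numbers are assumptions made for this example.

```python
def tpm(counts, lengths_bp):
    """Convert read counts to TPM.

    counts:     dict gene -> number of reads assigned to the gene
    lengths_bp: dict gene -> (effective) transcript length in base pairs
    """
    # Reads per base: corrects for longer transcripts collecting more reads.
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Scale so that the values of one sample sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    toy_counts = {"geneA": 100, "geneB": 100}
    toy_lengths = {"geneA": 1000, "geneB": 2000}
    print(tpm(toy_counts, toy_lengths))
```

In the toy input both genes receive the same number of reads, but geneA is half as long, so it ends up with twice the TPM of geneB.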
3. DEA: Differential Expression Analysis
- Ballgown
- baySeq
- BitSeq
- Cuffdiff
- DESeq2
- EBSeq
- edgeR: Exact test
- limma with vst/voom transformation
- NBPseq
- NOISeqBIO
- SAMseq
- Sleuth
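The packages above fit statistical models to count data and report significance. The Python sketch below shows only the effect-size part of that idea: library-size normalization to counts per million (CPM) followed by a log2 fold change between two groups. It is not what any of the listed tools computes internally, it performs no statistical test, and the function names and the pseudocount of 1 are assumptions for illustration.

```python
import math


def counts_per_million(counts):
    """Scale one sample's raw counts by its library size."""
    total = sum(counts.values())
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2(treated / control) of mean CPM per gene.

    control, treated: lists of dicts (one dict of gene -> raw count per sample).
    The pseudocount avoids division by zero for unexpressed genes.
    """
    def mean_cpm(samples):
        cpms = [counts_per_million(s) for s in samples]
        return {g: sum(c[g] for c in cpms) / len(cpms) for g in cpms[0]}

    ctrl, trt = mean_cpm(control), mean_cpm(treated)
    return {
        g: math.log2((trt[g] + pseudocount) / (ctrl[g] + pseudocount))
        for g in ctrl
    }


if __name__ == "__main__":
    control = [{"g1": 100, "g2": 900}]
    treated = [{"g1": 400, "g2": 600}]
    print(log2_fold_changes(control, treated))
```

A real analysis needs biological replicates and a dispersion estimate before any fold change can be called significant, which is what the tools in this list provide.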
4. Fusion Detection
To detect transcript-level fusion events, we examine both paired-end reads whose mates map aberrantly to different genomic regions and single reads that span the fusion junction.
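The two kinds of evidence described above can be sketched in Python as follows. This is a simplified illustration, not the method of any particular fusion caller: it assumes each mate has already been aligned and annotated with the gene of every aligned segment (one gene for a contiguous alignment, two for a read split across a junction). The input layout, the function names, and the BCR/ABL1 toy data are assumptions for this example.

```python
from collections import Counter


def gene_pair(genes):
    """Order-independent key for a pair of genes."""
    return tuple(sorted(set(genes)))


def fusion_evidence(read_pairs):
    """Count supporting reads per candidate gene pair.

    read_pairs: iterable of (mate1_genes, mate2_genes), each a tuple of the
    gene names hit by that mate's aligned segments.
    Returns (discordant, junction) Counters keyed by gene pair.
    """
    discordant = Counter()  # mates map to two different genes
    junction = Counter()    # a single read spans the fusion junction
    for mate1_genes, mate2_genes in read_pairs:
        split_mates = [g for g in (mate1_genes, mate2_genes) if len(set(g)) == 2]
        if split_mates:
            for genes in split_mates:
                junction[gene_pair(genes)] += 1
        elif len(set(mate1_genes) | set(mate2_genes)) == 2:
            discordant[gene_pair(mate1_genes + mate2_genes)] += 1
    return discordant, junction


if __name__ == "__main__":
    pairs = [
        (("BCR",), ("ABL1",)),         # discordant pair
        (("BCR", "ABL1"), ("ABL1",)),  # mate 1 spans the junction
        (("BCR",), ("BCR",)),          # ordinary pair, no fusion evidence
    ]
    discordant, junction = fusion_evidence(pairs)
    print(dict(discordant), dict(junction))
```

Candidates supported by both discordant pairs and junction-spanning reads are generally more credible than those supported by only one kind of evidence.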
4. WORKFLOW
4.0 PREPROCESSING
- FastQC
- Trimmomatic
- Cutadapt
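For readers unfamiliar with what quality trimming does, the Python sketch below decodes Phred+33 quality strings and removes low-quality bases from the 3' end of a read. It is a simplified stand-in, similar in spirit to Trimmomatic's TRAILING step, and does not reproduce the algorithms of the tools listed above. The function names and the threshold of 20 are assumptions for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_trailing(sequence, quality_string, min_quality=20):
    """Drop bases from the 3' end while their quality is below min_quality."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


if __name__ == "__main__":
    # 'I' encodes Q40, '#' encodes Q2, '!' encodes Q0 in Phred+33.
    print(trim_trailing("ACGTACGT", "IIIII#!#"))
```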
4.1 Tuxedo and related pipelines
- TopHat2 + Cufflinks + Cuffdiff
- HISAT2 + StringTie + Ballgown
- Bowtie2 + RSEM + edgeR
- Kallisto + Sleuth
4.2 Genome/transcriptome datasets
- Homo_sapiens genome
- Homo_sapiens transcriptome
- Build index
- Gapped Alignment
- Estimation of the abundance
- Differential expression analysis
- Enrichment analysis
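The first three steps (index building, gapped alignment, abundance estimation) can be sketched for the HISAT2 + StringTie route as a Python function that only assembles the command lines. All file names, the index prefix `genome_index`, and the thread count are placeholders assumed for this example, and the options shown are a minimal subset, so check each tool's manual before running anything.

```python
import shlex


def hisat2_stringtie_commands(genome_fasta, annotation_gtf, reads_1, reads_2,
                              sample, threads=4, index_prefix="genome_index"):
    """Return the command lines for one paired-end sample as lists of arguments."""
    t = str(threads)
    return [
        # 1. Build the HISAT2 index from the genome FASTA.
        ["hisat2-build", genome_fasta, index_prefix],
        # 2. Spliced (gapped) alignment; --dta tailors output for transcript assemblers.
        ["hisat2", "-p", t, "--dta", "-x", index_prefix,
         "-1", reads_1, "-2", reads_2, "-S", f"{sample}.sam"],
        # 3. Coordinate-sort the alignments into BAM for StringTie.
        ["samtools", "sort", "-@", t, "-o", f"{sample}.bam", f"{sample}.sam"],
        # 4. Estimate abundances for annotated transcripts (-e) and write
        #    Ballgown input tables (-B).
        ["stringtie", f"{sample}.bam", "-p", t, "-G", annotation_gtf,
         "-e", "-B", "-o", f"{sample}.gtf"],
    ]


if __name__ == "__main__":
    commands = hisat2_stringtie_commands(
        "genome.fa", "annotation.gtf", "sample_1.fastq.gz", "sample_2.fastq.gz",
        sample="sample",
    )
    for cmd in commands:
        print(shlex.join(cmd))  # dry run: print instead of executing
```

To execute the steps rather than print them, each list can be passed to `subprocess.run(cmd, check=True)`. Keeping the commands as argument lists avoids shell quoting problems with unusual file names.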
4.3 RNA-seq data
- GEO
- SRA
- Download the data with prefetch:
prefetch -v SRR3126346
- Or download with Aspera ascp:
$ASPERA/bin/ascp -i /root/.aspera/connect/etc/asperaweb_id_dsa.putty -pQTk1 -l 300m [email protected]:data/sracloud/srapub/SRR3126346 /root/ncbi/public/sra/SRR3126346.sra
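When many runs have to be fetched, the download step can be scripted. The Python sketch below assembles SRA Toolkit command lines for one accession: `prefetch` to download the run and `fasterq-dump` to convert it to FASTQ. The function name, the output directory `fastq`, and the accession pattern check are assumptions for this example, and nothing is executed unless the commands are passed to `subprocess.run`.

```python
import re


def sra_download_commands(accession, outdir="fastq"):
    """Return SRA Toolkit commands to download a run and convert it to FASTQ."""
    # Run accessions look like SRR/ERR/DRR followed by digits.
    if not re.fullmatch(r"[SED]RR\d+", accession):
        raise ValueError(f"not a run accession: {accession!r}")
    return [
        ["prefetch", accession],
        # --split-files writes the two mates of paired-end runs to separate files.
        ["fasterq-dump", "--split-files", "--outdir", outdir, accession],
    ]


if __name__ == "__main__":
    for cmd in sra_download_commands("SRR3126346"):
        print(" ".join(cmd))
```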
REFERENCE
- Conesa A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016
- Oshlack A. et al. From RNA-seq reads to differential expression results. Genome Biol. 2010
- Dobin A. and Gingeras T.R. Mapping RNA-seq Reads with STAR. Curr Protoc Bioinformatics. 2015
- Kumar S. et al. Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data. Sci Rep. 2016