2__Bioinformatics - xinshuaiqi/My_books GitHub Wiki

Xinshuai Qi's Summary and Notes on Bioinformatics

-- by Xinshuai Qi

[TOC]

(last update on 12-4-2017)

Transcriptome Assembly

Raw read cleaning
De novo Assembly
- Trinity
  - 279 citations since 2011
- Velvet
  - 6837 citations since 2008
- SOAPdenovoTrans
  - 359 citations since 2014
Reference-based Assembly
- Samtools
- VCFtools
- BCFtools
- Picard
  - manipulates high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.

Genome assembly

ABySS
- paper
- 2458 citations since 2009
ALLPATHS-LG
- designed for at least two short-read libraries at high coverage; does not support polyploids
- graph-based results
- from the Computational Research and Development group at the Broad Institute
- 773 citations since 2008

PacBio assembly

FALCON
HGAP
- developed by PacBio
PBJelly

improvement

mummer4

mummer3 manual

mummerplots with ggplot2

evaluation of the quality

Genome

QUAST
- paper
- 879 citations since 2013
BUSCO (Benchmarking Universal Single-Copy Orthologs)
REAPR
- focuses on the error rate
- paper
GAGE (Genome Assembly Gold-standard Evaluations)
- data quality is more important than the assembler
- results differ widely between assemblers

paper: A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies

string-based assemblers
overlap-layout-consensus (OLC) assemblers
De Bruijn graph-based assemblers: good for large short-read datasets
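
To make the De Bruijn graph idea above concrete, here is a minimal sketch in Python. It is not how any particular assembler is implemented; the function name `build_de_bruijn` and the toy reads are assumptions made for illustration. Each k-mer in a read becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: {(k-1)-mer prefix: [(k-1)-mer suffixes]}."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix node to its suffix node.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy reads that overlap by four bases.
reads = ["ACGTC", "CGTCA"]
graph = build_de_bruijn(reads, k=3)
for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

Edges seen in both reads appear twice in a node's successor list, which is a crude stand-in for the k-mer coverage that real assemblers track. Real tools also handle reverse complements, sequencing errors, and graph simplification, none of which is attempted here.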

Transcriptome

transrate
detonate

polyploid SNP calling

polycat

evaluate the quality of assembly

Genome Evolution and Genomics

OrthoFinder
Circos
SnpEff
Gene Ontology
LeafJ
LTR retriever
Omictools

Phylogenetics

RAxML
PAML
BEAST
TreeMix
Detecting trait-dependent evolutionary rate shifts in sequence sites
[ABBA/BABA test](http://www.popgen.dk/angsd/index.php/Abbababa)

Population Genetics and phylogeography

Population genetics and genomics in R

STRUCTURE
PCA and smartPCA
Provean
BAD-Mutations
HAPMIX
∂a∂i
DIY-ABC
fastSimCoal
SLiM2
TCS
Ecological Niche Modeling
- WorldClim
SMC++ github
- a program for estimating the size history of populations from whole genome sequence data.
ABBA-BABA test
- also called the D-statistic
- tests for ancient admixture
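
As a rough illustration of the D-statistic, the sketch below counts ABBA and BABA site patterns for four aligned sequences with the assumed topology (((P1, P2), P3), Outgroup) and computes D = (ABBA - BABA) / (ABBA + BABA). The function name `d_statistic` and the toy sequences are hypothetical; the outgroup allele is treated as ancestral, and only biallelic sites are counted.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Return (n_abba, n_baba, D) for four aligned, equal-length sequences."""
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # keep biallelic sites only
        if a == o and b == c and b != o:
            abba += 1  # P1 ancestral, P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P2 ancestral, P1 and P3 share the derived allele
    total = abba + baba
    d = (abba - baba) / total if total else float("nan")
    return abba, baba, d


# Hypothetical toy alignment (one character per site).
p1 = "ATTCAGC"
p2 = "AATCGGT"
p3 = "AATTGCC"
og = "ATCCAGT"
abba, baba, d = d_statistic(p1, p2, p3, og)
print(abba, baba, round(d, 3))
```

With no gene flow, ABBA and BABA patterns are expected in equal numbers, so D should be close to zero. Judging whether an observed D differs from zero requires a significance test (commonly a block jackknife), which this sketch does not include.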

RNASeq

wiki tools for RNASeq https://en.wikipedia.org/wiki/List_of_RNA-Seq_bioinformatics_tools

RNASeq Course

FASTQ trimming

Sickle
SnoWhite
Trimmomatic

De novo assembly

Velvet
Trinity
SOAPdenovoTrans

evaluation:

DETONATE score
TransRate
Ultra-conserved elements (UCEs)

RNA-Seq aligner => generate SAM file RNASeq Course

Differential Expression

TopHat
Cufflinks
STAR RNA-Seq aligner
- to use Cufflinks downstream, the appropriate flags must be set when running STAR.
HISAT
- the successor to TopHat
- uses hierarchical indexing: a global index plus a large set of small local indexes, rather than one global index alone
- built on Bowtie2
  - uses the FM-index
salmon
RSEM
HyLiTE: gene expression in hybrids or polyploids
- paper
Subread, limma and edgeR
kallisto and sleuth by pachterlab

HISAT, StringTie and Ballgown
- A replacement for the older TopHat and Cufflinks pipeline.

HISAT vs STAR vs TopHat (https://plus.google.com/+MarkZiemann1/posts/FcoyDzJ7khU): results are basically similar

Samtools: SAM to BAM

evaluation of RNA-Seq alignment

mapped reads %
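
A minimal sketch of how the mapped-read percentage can be derived from a SAM file, assuming plain-text SAM input. The function name `mapped_fraction` and the toy records are made up for illustration. It relies on the SAM FLAG field (column 2): bit 0x4 marks an unmapped read, and secondary (0x100) and supplementary (0x800) records are skipped so each read is counted once.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of primary SAM records that are mapped."""
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:
            mapped += 1  # bit 0x4 unset means the read is mapped
    return mapped / total if total else 0.0


# Hypothetical toy SAM content: two mapped reads, one unmapped, one secondary hit.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r3\t16\tchr1\t200\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t256\tchr2\t50\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]
frac = mapped_fraction(toy_sam)
print(f"mapped reads: {100 * frac:.1f}%")
```

In practice a tool such as `samtools flagstat` reports mapping statistics directly from a BAM file; the sketch only shows what the percentage means.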

Functional analysis (enrichment, co-expression)

Functional visualization --Guangchuang Yu UHK
clusterProfiler

eQTL

eQTL mapping using RNA-Seq data

Enrichment and Coexpression

Alternative splicing

classification
- Skipped exon
  - A-B-C
  - A-_-C
- Alternative 5' splice site
  - A-C
  - B-C
- Alternative 3' splice site
  - A-B
  - A-C
- Mutually exclusive exons
  - A-B-D
  - A-C-D
- Retained intron
  - A-B
  - A-(intron)-B
Steps:
- exon reads GDE
- isoforms GDE
- junctions

Tools

RSEM
- aligns reads to transcripts using Bowtie
- outputs isoform-level expression estimates
DEXSeq
- Analyzing RNA-seq data for differential exon usage with the "DEXSeq" package
Cufflinks
MATS
SpliceR

Application of RNA-Seq in Diagnostics

Translating RNA sequencing into clinical diagnostics: opportunities and challenges

Genetic testing: The diagnostic power of RNA-seq

Examples using RNA-Seq for Diagnosis:

novel pathway in PAH

New Directions of RNA-Seq analysis

RNA-Seq in different tissue
Different time
single-cell RNA-Seq
integrate RNA-Seq with GWAS

Quantitative Genetics, GWAS, and Statistics

- PLINK ([tutorial](http://zzz.bwh.harvard.edu/plink/tutorial.shtml#t6))

QQ plot: evaluates whether your statistical model is reasonable
GWAS training by CBI
- GWAS Adjusting for Covariates and Stratification
Candidate Gene Association Study

The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest and phenotypes or disease states. This is in contrast to genome-wide association studies (GWAS), which scan the entire genome for common genetic variation.

GWAS local visualization

ascertainment bias: make sure to use clearly defined phenotypes for cases and controls
population stratification:
- subtle ancestral differences between cases and controls can produce spurious gene~ethnicity associations
  - Using Principal Components Analysis (PCA) as a Surrogate for Genetic Ancestry
  - adjusting for principal components of genetic ancestry.
- gender, environment
Bonferroni correction 5×10^-7
- Standard Bonferroni correction
- Test each SNP at the α* = α/m1 level
- Where m1 = number of markers tested
- Assuming m1 = 500,000, a Bonferroni-corrected threshold of α* = 0.05/500,000 = 1×10^-7
- Conservative when the tests are correlated
HWE: For a rare disease (or no/modest genetic effects), genotype frequencies in controls should (nearly) follow Hardy-Weinberg equilibrium (HWE)
imputation: Using LD and HapMap/1000 Genomes to Impute Untyped SNPs
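
The Bonferroni arithmetic and the HWE check in controls can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and genotype counts are assumptions, and the HWE check here is a plain one-degree-of-freedom chi-square goodness-of-fit test, whereas GWAS software often uses an exact test instead.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level: alpha* = alpha / m1."""
    return alpha / n_tests


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing genotype counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Threshold from the notes: 0.05 / 500,000 markers.
threshold = bonferroni_threshold(0.05, 500_000)
print(f"alpha* = {threshold:.1e}")

# Hypothetical control genotype counts (AA, AB, BB).
chi2_in_hwe = hwe_chi_square(360, 480, 160)  # matches p = 0.6 expectations
chi2_off = hwe_chi_square(400, 400, 200)     # deficit of heterozygotes
print(round(chi2_in_hwe, 3), round(chi2_off, 3))
```

A statistic above 3.84 (the 5% critical value for one degree of freedom) suggests departure from HWE at that SNP. In controls this is often treated as a sign of genotyping error, although GWAS quality control typically applies a much stricter cutoff.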

Phenotyping

plantCV
