1__ML in genomics - xinshuaiqi/My_books GitHub Wiki
Machine Learning in genomics
Wiki: Machine learning in bioinformatics
Entry level knowledge:
Basic
Morvan Python (莫烦python; Python basics + machine learning)
TensorFlow official documentation, Chinese edition
[Keras Documentation](https://keras.io/) and its Chinese documentation
People in ML
Geoffrey Hinton, Google, "godfather of deep learning"
University of Toronto Computer Science
https://en.wikipedia.org/wiki/Geoffrey_Hinton
Yann LeCun, Facebook, "father of CNNs"
https://en.wikipedia.org/wiki/Yann_LeCun
Yoshua Bengio
Canada; ANN; DL
Yair Weiss
Advanced knowledge
Useful resources:
A list of deep learning implementations in biology: link
A curated list of awesome deep learning applications in the field of computational biology link
**Practical Deep Learning For Coders**
Udemy: Deep Learning A-Z™: Hands-On Artificial Neural Networks (paid)
YouTube
Deep learning for genomics - Anshul Kundaje
James Zou: "Deep learning for genomics: Introduction and examples"
James Zou Teaching
CS 329M Algorithms of advanced machine learning. (website) https://canvas.stanford.edu/courses/51037
CS 273B Deep learning in genomics and bio-medicine. (website) https://canvas.stanford.edu/courses/66218/
See my GitHub for the downloaded course material.
Anshul Kundaje Teaching
Tutorials: https://sites.google.com/site/anshulkundaje/inotes
Chinese in ML
Yifei Chen
https://scholar.google.com/citations?user=ygovHrsAAAAJ&hl=en
Department of Computer Science, University of California, Irvine, CA 92697, USA
Jian Zhou
http://www.princeton.edu/~jzthree/
James Zou
https://sites.google.com/site/jamesyzou/home
Papers:
Evolutionarily informed deep learning methods: Predicting transcript abundance from DNA sequence
Deep Learning for Genomics: A Concise Overview
Deep learning for biology
Deep Learning for Population Genetic Inference
* jointly inferring natural selection and demography
* We find many regions of the genome that have experienced hard sweeps, and fewer under selection on standing variation (soft sweep) or balancing selection.
Deep learning for computational biology
C Angermueller, T Pärnamaa, L Parts… - Molecular systems …, 2016 - msb.embopress.org
# the most accurate prediction of gene expression levels is currently made from a broad set of epigenetic features using sparse linear models (Karlic et al, 2010; Cheng et al, 2011) or random forests (Li et al, 2015)
Most of these applications can be described within the canonical machine learning workflow, which involves four steps:
* data cleaning and pre‐processing,
* feature extraction,
* model fitting and
* evaluation
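The four steps above can be sketched end to end on a toy problem. This is only an illustration under stated assumptions: the sequences and labels are invented, GC content stands in for a hand-crafted feature, and the "model" is a single threshold. None of it comes from the paper.

```python
from collections import Counter

def clean(records):
    """Step 1: upper-case sequences and drop any with non-ACGT symbols."""
    cleaned = []
    for seq, label in records:
        seq = seq.strip().upper()
        if seq and set(seq) <= set("ACGT"):
            cleaned.append((seq, label))
    return cleaned

def gc_content(seq):
    """Step 2: one hand-crafted feature, the fraction of G and C bases."""
    counts = Counter(seq)
    return (counts["G"] + counts["C"]) / len(seq)

def fit_threshold(features, labels):
    """Step 3: fit a one-parameter model, the midpoint between class means."""
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, features):
    return [1 if f > threshold else 0 for f in features]

def accuracy(predictions, labels):
    """Step 4: fraction of correct predictions on held-out data."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Invented toy data: label 1 = GC-rich, label 0 = AT-rich.
train = [("GCGCGGCC", 1), ("ggccgcga", 1), ("ATATTTAA", 0),
         ("TTAATACA", 0), ("ACGTNNNN", 1)]
test = [("GGGCCCAT", 1), ("AATTTTAC", 0)]

train = clean(train)                      # the record containing N is dropped
x_train = [gc_content(s) for s, _ in train]
y_train = [y for _, y in train]
threshold = fit_threshold(x_train, y_train)

test = clean(test)
x_test = [gc_content(s) for s, _ in test]
y_test = [y for _, y in test]
test_accuracy = accuracy(predict(threshold, x_test), y_test)
print(f"threshold={threshold:.3f} accuracy={test_accuracy:.2f}")
```

The feature-extraction function is the part that deep networks aim to learn from the raw sequence instead of having it written by hand.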
A major recent advance in machine learning is automating this critical step (feature extraction) by learning a suitable representation of the data with deep artificial neural networks.
Briefly, a deep neural network takes the raw data at the lowest (input) layer and transforms them into increasingly abstract feature representations by successively combining outputs from the preceding layer in a data‐driven manner, encapsulating highly complicated functions in the process
Network size: depth (number of layers) × width (number of units per layer).
The backpropagation algorithm ultimately enabled efficient training of neural networks using stochastic gradient descent.
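A minimal sketch of backpropagation with stochastic gradient descent, in plain Python. The architecture (one tanh hidden layer, a sigmoid output, cross-entropy loss), the toy OR data, the learning rate and the step count are all assumptions chosen for illustration, not taken from the paper.

```python
import math
import random

def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    """Input -> tanh hidden layer -> sigmoid output probability."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    z = sum(w * hi for w, hi in zip(p["W2"], h)) + p["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))

def loss(p, x, y):
    """Binary cross-entropy for one example."""
    _, out = forward(p, x)
    return -(y * math.log(out) + (1 - y) * math.log(1 - out))

def backward(p, x, y):
    """Backpropagation: chain rule applied from the output back to the input."""
    h, out = forward(p, x)
    dz = out - y  # dL/dz for sigmoid output with cross-entropy loss
    dh = [dz * w * (1 - hi * hi) for w, hi in zip(p["W2"], h)]  # through tanh
    return {
        "W2": [dz * hi for hi in h],
        "b2": dz,
        "W1": [[d * xi for xi in x] for d in dh],
        "b1": dh,
    }

def sgd_step(p, grads, lr):
    """One stochastic gradient descent update, applied in place."""
    p["W1"] = [[w - lr * g for w, g in zip(w_row, g_row)]
               for w_row, g_row in zip(p["W1"], grads["W1"])]
    p["b1"] = [b - lr * g for b, g in zip(p["b1"], grads["b1"])]
    p["W2"] = [w - lr * g for w, g in zip(p["W2"], grads["W2"])]
    p["b2"] -= lr * grads["b2"]

# Toy data: logical OR of two inputs.
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
params = init_params(n_in=2, n_hidden=4)
loss_before = sum(loss(params, x, y) for x, y in data) / len(data)

rng = random.Random(1)
for _ in range(5000):
    x, y = rng.choice(data)  # "stochastic": one random example per step
    sgd_step(params, backward(params, x, y), lr=0.5)

loss_after = sum(loss(params, x, y) for x, y in data) / len(data)
print(f"mean loss before={loss_before:.3f} after={loss_after:.3f}")
```

In practice a framework such as TensorFlow or Keras computes these gradients automatically; the hand-written version only makes the chain rule visible.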
Other DNN variants:
convolutional neural networks, which are widely used for **modelling images**
recurrent neural networks for **sequential data**
restricted Boltzmann machines and autoencoders for **unsupervised learning**
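For DNA, the convolutional layer mentioned above is usually a set of filters slid along a one-hot encoded sequence. The sketch below shows that core operation in plain Python. The filter is set by hand to reward the motif "TATA", which is an assumption for illustration; in a real CNN the filter weights are learned.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values, ordered A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def conv1d(encoded, kernel):
    """Valid-mode 1D convolution (cross-correlation) along sequence positions."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(sum(w * x
                          for k_row, x_row in zip(kernel, window)
                          for w, x in zip(k_row, x_row)))
    return scores

def relu(values):
    return [max(0.0, v) for v in values]

# Hand-set filter: +1 where the base matches the motif, -1 otherwise.
motif = "TATA"
kernel = [[1.0 if b == m else -1.0 for b in BASES] for m in motif]

sequence = "GGTATACC"
activations = relu(conv1d(one_hot(sequence), kernel))
best_position = max(range(len(activations)), key=activations.__getitem__)
print(activations, best_position)  # [0.0, 0.0, 4.0, 0.0, 0.0] 2
```

The activation peaks at position 2, where the motif starts, which is why such filters are often read as motif detectors.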
A comprehensive background on the technical details can be found in the more specialized literature (Bengio, 2012; Bengio et al, 2013; Deng, 2014; Schmidhuber, 2015; Goodfellow et al, 2016).
Machine learning applications in genetics and genomics
MW Libbrecht, WS Noble - Nature Reviews Genetics, 2015 - nature.com
Machine Learning in Genomics – Current Efforts and Future Applications
Deep learning meets genome biology
An interview with Brendan Frey about realizing new possibilities in genomic medicine.
TRAINING AND APPLYING GENOMIC DEEP LEARNING MODELS
Opportunities and obstacles for deep learning in biology and medicine
T Ching, DS Himmelstein… - Journal of The …, 2018 - rsif.royalsocietypublishing.org
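# Abstract (Evaluation of convolutionary neural networks modeling of DNA sequences using ordinal versus one-hot encoding method):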
Convolutionary neural network (CNN) is a popular choice for supervised DNA motif prediction due to its excellent performances. To employ CNN, the input DNA sequences are required to be encoded as numerical values and represented as either vectors or multi-dimensional matrices. This paper evaluated a simple and more compact ordinal encoding method versus the popular one-hot encoding for DNA sequences.
We compared the performances of both encoding methods using three sets of datasets enriched with DNA motifs. We found that
1. the ordinal encoding performs comparable to the one-hot method but with significant reduction in training time.
2. In addition, the one-hot encoding performances were rather consistent across various datasets but would require suitable CNN configuration to perform well.
3. The ordinal encoding with matrix representation performed best in some of the evaluated datasets.
This study implied that the performances of CNN for DNA motif discovery depends on the suitable design of the sequence encoding and representation. The good performances of the ordinal encoding method demonstrates that there are still rooms for improvement for the one-hot encoding method.
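The two encodings compared in the abstract can be sketched as follows. The ordinal values (A=0.25, C=0.50, G=0.75, T=1.00, unknown=0) are an assumption here and should be checked against the paper. `to_matrix` is a hypothetical helper showing one way to turn the ordinal vector into a matrix representation.

```python
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0],
           "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}
ORDINAL = {"A": 0.25, "C": 0.50, "G": 0.75, "T": 1.00}  # assumed values

def one_hot_encode(seq):
    """One row of four 0/1 values per base; unknown bases become all zeros."""
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]

def ordinal_encode(seq):
    """One number per base; unknown bases become 0.0."""
    return [ORDINAL.get(base, 0.0) for base in seq.upper()]

def to_matrix(vector, n_cols):
    """Reshape a flat vector into rows of n_cols, zero-padding the last row."""
    padded = vector + [0.0] * (-len(vector) % n_cols)
    return [padded[i:i + n_cols] for i in range(0, len(padded), n_cols)]

seq = "ACGTNA"
oh = one_hot_encode(seq)
od = ordinal_encode(seq)
od_matrix = to_matrix(od, n_cols=4)

print(len(oh) * 4, "values for one-hot vs", len(od), "for ordinal")
print(od_matrix)
```

The ordinal vector holds a quarter as many values as the one-hot matrix, which is consistent with the shorter training time the abstract reports. It also imposes an artificial ordering on the bases.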
Nice examples
By Lex Flagel
The Unreasonable Effectiveness of Convolutional Neural Networks in Population Genetic Inference https://www.biorxiv.org/content/biorxiv/early/2018/10/22/336073.full.pdf https://github.com/flag0010/pop_gen_cnn
Evaluation of convolutionary neural networks modeling of DNA sequences using ordinal versus one-hot encoding method
"Whole-genome deep learning analysis reveals causal role of noncoding mutations in autism", Zhou et al 2018 https://t.co/gv8KodjhDr
Deep learning of genomic variation and regulatory network data
A Telenti, C Lippert, PC Chang… - Human molecular …, 2018 - academic.oup.com
Books
Hands-On Machine Learning with Scikit-Learn and TensorFlow
The Elements of Statistical Learning
Tools
One_Hot_Encoder
Nucleus: a TensorFlow toolkit for genomics (TensorFlow Dev Summit 2018)
[Video](https://www.youtube.com/watch?v=7wi9NdGh9oI&feature=share)
start-up
- Verily: formerly known as Google Life Sciences
https://www.deepgenomics.com/ Deep Genomics predicts the effects of a particular mutation based on its analyses of hundreds of thousands of examples of other mutations, even if there is not already a record of what those mutations do.
develop a database that provides predictions for how more than 300 million genetic variations could affect a genetic code
track record of publishing at top machine learning conferences (NIPS, ICML, ICLR) or has applied deep learning to genomics in a top life sciences journal.
Overall start-up field in genomics
https://medicalfuturist.com/top-companies-genomics
- Personal genomics
- Pharmacogenomics
- Genomics combined with Artificial Intelligence
- Precision Oncology/Medicine
- Genetic ancestry
- Sequencing tech
- Edico Genome: sequenced a human genome in just 26 hours in 2015
- built DRAGEN, the first data processor designed solely for genome sequencing
- CRISPR gene editing
The Incredible Convergence Of Deep Learning And Genomics.pdf https://github.com/xinshuaiqi/My_books/blob/master/The%20Incredible%20Convergence%20Of%20Deep%20Learning%20And%20Genomics.pdf