Day 1 - QCB-Collaboratory/W17.MachineLearning GitHub Wiki
- Slides are available here
- The video is available here
Here is a (static) Jupyter Notebook with all commands from the first day.
The live version does not require an account or any software installed on your own computer. The notebook usually takes a few minutes to become ready (about 3 minutes on average, and less than 10), but once it is on your screen it runs smoothly.
- In-class practice for Decision Trees and Random Forests.
- Click here to download the data used in the Decision Tree practice.
- Breast Cancer Wisconsin (Diagnostic) Data Set, available from the UCI Machine Learning Repository.
- Original paper that published this dataset.
- The video from slide 82, published by Mack et al. in Nature Communications, is available here.
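For readers who want to try the same kind of exercise outside the class notebook, here is a minimal sketch. It assumes Python with scikit-learn, whose `load_breast_cancer` helper bundles a copy of the Breast Cancer Wisconsin (Diagnostic) data. The tree depth, number of trees, and train/test split are arbitrary illustrative choices and are not taken from the workshop material.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Features are measurements of cell nuclei; labels are malignant (0) or benign (1).
X, y = load_breast_cancer(return_X_y=True)

# Hold out 25% of the samples for testing (an assumed split, fixed for reproducibility).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A single shallow decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
tree_accuracy = tree.score(X_test, y_test)

# A random forest: many trees, each trained on a bootstrap sample of the training set.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_accuracy = forest.score(X_test, y_test)

print(f"Decision tree test accuracy: {tree_accuracy:.3f}")
print(f"Random forest test accuracy: {forest_accuracy:.3f}")
```

Comparing the two scores on held-out data is a quick way to see whether averaging many trees helps on this dataset.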
To make Jupyter work with the R language, you need to install an R kernel. A kernel is the interface between the Notebook and the language.
The kernel for R is called IRkernel. If you use the Anaconda distribution, you can install it by following this link. If you do not use Anaconda, you need to install it from the IRkernel website.
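After installing, one way to confirm that Jupyter can see the R kernel is to inspect the output of `jupyter kernelspec list --json`. The sketch below assumes the kernel was registered under the name `ir`, which is the default used by IRkernel's `installspec()`. The helper `has_kernel` and the sample JSON (including its paths) are hypothetical and only illustrate the shape of the output.

```python
import json
import shutil
import subprocess


def has_kernel(kernelspec_json: str, name: str = "ir") -> bool:
    """Return True if `name` is listed in `jupyter kernelspec list --json` output."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return name in specs


# Hypothetical sample of what Jupyter prints; the paths are made up.
sample = json.dumps({
    "kernelspecs": {
        "python3": {"resource_dir": "/example/kernels/python3",
                    "spec": {"language": "python"}},
        "ir": {"resource_dir": "/example/kernels/ir",
               "spec": {"language": "R"}},
    }
})
print("R kernel in sample:", has_kernel(sample))

# On a machine where the `jupyter` command is available, query the real list.
if shutil.which("jupyter"):
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode == 0:
        try:
            print("R kernel installed:", has_kernel(result.stdout))
        except ValueError:
            # The output was not plain JSON (for example, warnings were mixed in).
            print("Could not parse the kernelspec list.")
```

If the R kernel is missing from the list, rerunning the registration step from the IRkernel instructions is usually the first thing to check.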
Below is a list of great example notebooks to use as inspiration for your own work. Because all of these notebooks are publicly available, you can download them and open them locally to examine them. If you want even more notebooks, check out this gallery of notebooks provided by the Jupyter project.
Genomics and NGS
- Python for Bioinformatics, associated with the book by Tiago Antao
- Data Analysis on Gene-genome Correlation using Regression Models, by Abella et al.
- Using t-SNE to visualize Gold Standard, by Nico Chaves
- Reproducible Genomic Interpretation Tools for Translational Medicine: Application to An N of 1 Case Study, by C. Mazzaferro and K. Fisch
- An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study, by Zichen Wang and Avi Ma'ayan
- Lung Cancer Post-Translational Modification and Gene Expression Regulation, by the Ma'ayan Lab
- Clustering methods applied to TCGA Ovarian Cancer Coexpression Matrix, by Brin Rosenthal
- 5 Analyzing Core Diversity, by Amanda Birmingham
Cell and molecular biology
- Autoencoders for calcium fluorescence, by Benjamin Bolte
- Calcium Imaging Segmentation with Neural Networks, by Alex Klibisz
- Analysis of time lapse images of plates with growing colonies, by Jorge Riveros Vergara
- Interlab Study for iGEM 2015, by the Brazil-USP team
Ecology and evolutionary biology
- Reverse Ecology of Uncultivated Freshwater Actinobacteria, by Joshua Hamilton
- Example of feature extraction from images, by Ben Weinstein
Data visualization
- Amazing notebook with 21 examples of plots using various Python libraries
- Visualization of Gene Expression using Clustergrammer
- 2010 US Census data, by James Bednar
- Overview of Plotly's Python API
- Overview of Bokeh, an interactive visualization library for Python
- Visualizing complex valued functions
Generic data analysis and introductory notebooks
- Introduction to Data Analysis used as training material during the 2015 iGEM competition.
- Overview of Pandas, by Hedaro.com
- Example of Machine Learning with the Iris dataset.
If you want to learn more about object orientation in Python, the resources below will help you get started.
- An Introduction to Programming for Bioscientists: A Python-Based Primer, by Ekmekci et al.
- Introduction to classes in Python, by Udacity
- Object-oriented programming for scientists, by Tjelvar Olsson
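As a small taste of what these resources cover, here is a minimal sketch of a Python class that bundles data with the operations acting on it. The `DnaSequence` class and its methods are hypothetical and are not taken from any of the linked material.

```python
class DnaSequence:
    """A DNA sequence together with a few operations on it."""

    # Class attribute shared by all instances: the base-pairing rules.
    COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

    def __init__(self, name: str, bases: str) -> None:
        # Instance attributes: each object keeps its own name and bases.
        self.name = name
        self.bases = bases.upper()

    def gc_content(self) -> float:
        """Fraction of bases that are G or C (0.0 for an empty sequence)."""
        if not self.bases:
            return 0.0
        gc_count = sum(base in "GC" for base in self.bases)
        return gc_count / len(self.bases)

    def reverse_complement(self) -> "DnaSequence":
        """Return a new DnaSequence holding the reverse complement."""
        complemented = "".join(self.COMPLEMENT[b] for b in reversed(self.bases))
        return DnaSequence(f"{self.name}_rc", complemented)


seq = DnaSequence("example", "atgcgc")
print(seq.gc_content())
print(seq.reverse_complement().bases)
```

Keeping the sequence and its methods in one object means that code using it does not need to know how the bases are stored.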