[TOC]
# Machine Learning A-Z
Simple linear regression models the relationship between two continuous (quantitative) variables: y = b0 + b1*x1. In R:
regressor = lm(formula = Salary ~ YearsExperience,
               data = training_set)
summary(regressor)
The significance of the coefficients indicates how strongly X is associated with Y.
- X: independent variable
- Y: dependent variable
- Coefficients (least squares coefficients, LSC): estimated from the observed data
- Residual standard error: the average amount that the response deviates from the true regression line.
  - An absolute measure of the lack of fit of the model.
- R²: the proportion of variance explained (PVE) by the regression; ranges over [0, 1].
  - Close to 0: either the model is wrong, or the inherent error σ² is high.
Assumptions of linear regression
- Linearity
- Homoscedasticity (homogeneity of variance: whether the variance is the same across samples)
- Multivariate normality
- Independence of errors
- Lack of multicollinearity
For example, NYC and CA are encoded as 0/1 dummy variables. You cannot include both dummy variables in a multiple linear regression because of the dummy variable trap: always omit one dummy variable (use n - 1 dummies for n categories), as in the sketch below.
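A minimal sketch of the n - 1 encoding with pandas (the 'State' column and its values are made-up example data):

```python
import pandas as pd

# toy data: one categorical column and one numeric column (made-up values)
df = pd.DataFrame({'State': ['NY', 'CA', 'NY', 'FL'],
                   'Profit': [192261.8, 191792.1, 182901.9, 166187.9]})

# drop_first=True omits one dummy column, avoiding the dummy variable trap
X = pd.get_dummies(df, columns=['State'], drop_first=True)
print(X.columns.tolist())   # ['Profit', 'State_FL', 'State_NY']; 'CA' is the omitted baseline
```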
Why select variables?
- Garbage in, garbage out.
- You have to be able to explain the model, so keep only the right variables.
Five methods of building a model:
- All-in: include all variables.
- Backward elimination (stepwise regression), the fastest of the five (see the sketch below):
  1. Select a significance level, e.g. 0.05.
  2. Fit the model with all possible predictors.
  3. Remove the predictor with the highest p-value if that p-value is > 0.05.
  4. Re-fit the model without that variable, then return to step 3.
  5. Stop when all remaining p-values are below 0.05.
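A rough sketch of this loop with statsmodels (not the course's exact code; X is assumed to be a NumPy matrix of predictors that already includes a column of ones for the intercept, y the target vector):

```python
import numpy as np
import statsmodels.api as sm

def backward_elimination(X, y, sl=0.05):
    """Repeatedly drop the predictor with the highest p-value above sl."""
    cols = list(range(X.shape[1]))
    while True:
        model = sm.OLS(y, X[:, cols]).fit()
        worst = int(np.argmax(model.pvalues))
        if model.pvalues[worst] > sl:
            del cols[worst]            # remove the least significant predictor
        else:
            return model, cols         # all remaining p-values <= sl
```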
- Forward selection (stepwise regression):
  1. Select a significance level, e.g. 0.05.
  2. Fit all simple linear regression models and keep the one whose variable has the smallest p-value.
  3. Add one more variable at a time (two-variable, three-variable, ... models), each time keeping the new variable with the smallest p-value.
  4. When the best new variable's p-value is > 0.05, stop: the previous model is the best model.
- Bidirectional elimination (stepwise regression):
  1. Select significance levels to enter and to stay, e.g. 0.05.
  2. Forward selection: add a new variable.
  3. Backward elimination: remove variables that are no longer significant.
  4. Alternate forward and backward steps.
  5. Stop when no new variables can enter and no old variables can exit.
- All possible models (score comparison):
  1. Select a goodness-of-fit criterion (e.g. the Akaike information criterion).
  2. Construct all possible models: 2^n - 1 of them (10 columns => 1023 models).
  3. Select the one with the best criterion value.

Note: y = b0 + b1*x1 + b2*x1^2 is still linear regression; "linear" refers to the coefficients (they enter to the first power), not to the powers of the variables.
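For example, a quick scikit-learn sketch of fitting that quadratic model with an ordinary linear regressor (toy data, made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(1, 11).reshape(-1, 1)             # toy predictor x1
y = 2 + 3 * X.ravel() + 0.5 * X.ravel() ** 2    # toy quadratic response

# expand x1 into [x1, x1^2]; the model stays linear in b0, b1, b2
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
lin_reg = LinearRegression().fit(X_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)        # roughly 2 and [3, 0.5]
```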
CART (Classification And Regression Trees)
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences.
qxs: after the splits partition the data into subsets, each subset gets a distinct mean value, which becomes the prediction for that subset.
*[qxs]: Xinshuai Qi
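A minimal decision tree regression sketch in scikit-learn (toy, made-up data; each leaf predicts the mean of its subset):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1], [2], [3], [4], [5], [6]])     # toy feature
y = np.array([1.1, 1.0, 5.2, 5.0, 9.1, 9.0])     # toy target

# min_samples_leaf=2 keeps at least two points per subset (leaf)
regressor = DecisionTreeRegressor(min_samples_leaf=2, random_state=0).fit(X, y)
print(regressor.predict([[2.5], [5.5]]))         # leaf means: ~1.05 and ~9.05
```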
Random Forest: ensemble learning, similar to bootstrapping.
- Pick K points at random from the whole dataset.
- Build a decision tree on that subset.
- Build another decision tree, and another...
- With N trees, take the average of the N predictions (see the sketch below).
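A matching Random Forest regression sketch in scikit-learn (same toy data as the decision tree sketch; n_estimators is the number N of trees):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.1, 1.0, 5.2, 5.0, 9.1, 9.0])

# 10 trees, each grown on a bootstrap sample; predictions are averaged
regressor = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
print(regressor.predict([[2.5]]))    # average of the 10 trees' predictions
```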
R-squared:
- SS_res = Σ(y_i - ŷ_i)²
- SS_tot = Σ(y_i - ȳ)²
- R² = 1 - SS_res / SS_tot
Goodness of fit:
- How good is your regression line compared with the average line?
- The closer R² is to 1, the better your model.
Adjusted R-squared:
- Problem with R²: adding more variables always increases R².
- Adjusted R² penalizes you for adding additional variables: Adj. R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the sample size and p is the number of predictors.
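A small helper sketch computing both quantities by hand (function and variable names are placeholders, not from the course):

```python
import numpy as np

def r_squared(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)           # sum of squared residuals
    ss_tot = np.sum((y - np.mean(y)) ** 2)       # total sum of squares
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_pred, p):
    n = len(y)                                   # n observations, p predictors
    r2 = r_squared(y, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra predictors
```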
K-Nearest Neighbors (K-NN): estimate the probability that a new point belongs to class 1 or 0.
- Choose the number of neighbors K.
- Find the K nearest neighbors of the new point using Euclidean distance.
- Count how many of those nearest neighbors fall in each group.
- Assign the new point to the group with the most nearest neighbors.
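A minimal scikit-learn sketch of these steps (assuming X_train, y_train, X_test already exist from a train/test split; k = 5 is an example choice):

```python
from sklearn.neighbors import KNeighborsClassifier

# metric='minkowski' with p=2 is the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)       # X_train, y_train assumed to exist
y_pred = classifier.predict(X_test)    # each point gets the majority class of its 5 neighbors
```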
SVM (Support Vector Machine): find the best decision boundary by finding the support vectors; the line in the middle of the margin is called the maximum margin hyperplane (maximum margin classifier). Merits:
- It learns from the extreme, boundary cases: the apples that look most like oranges, and the oranges that look most like apples.
Kernel SVM:
- When the data is not linearly separable,
- map it to a higher dimension (e.g. 3D) where it becomes separable;
- this mapping can be highly compute-intensive,
- so a kernel (e.g. the Gaussian RBF kernel) is used instead.
Types of Kernel Function
- Gaussian RBF Kernel
- Sigmoid Kernel
- Polynomial Kernel
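A minimal kernel SVM sketch in scikit-learn (assuming X_train, y_train, X_test from a prior split; the RBF kernel is one of the options listed above):

```python
from sklearn.svm import SVC

# Gaussian RBF kernel: implicitly maps the data to a higher-dimensional space
classifier = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=0)
classifier.fit(X_train, y_train)       # X_train, y_train assumed to exist
y_pred = classifier.predict(X_test)
```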
Why "Naive"?
-
Bayes requires the variables are independent, but in many cases, that is not true. Thus the assumptions are "naive".
-
P (Drives | X ) = P (X |drives) * P(drives) / P(x)
-
poster probability: P (Drives | X ) ; PP 样本X中,事件发生 ("1") 的概率
-
likelihood: P (X |drives); sample X are people who drive
-
prior probability: 在总样本中,事件发生 ("1") 的概率
-
P(X) 样本占总体的比例 可以忽略,当你compare 0 和 1的概率.
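A minimal Naive Bayes sketch in scikit-learn (the Gaussian variant, assuming continuous features and an existing train/test split):

```python
from sklearn.naive_bayes import GaussianNB

# applies Bayes' theorem with the "naive" independence assumption per feature
classifier = GaussianNB()
classifier.fit(X_train, y_train)       # X_train, y_train assumed to exist
y_pred = classifier.predict(X_test)
```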
Hierarchical clustering: results are very similar to (or the same as) K-means clustering.
Two types of hierarchical clustering:
- Agglomerative (bottom-up):
  - Each point starts as its own cluster, N clusters in total.
  - Merge the two closest clusters into one: N - 1 clusters.
  - Repeat until you reach only one cluster.
- Divisive: the reverse of agglomerative.
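A minimal agglomerative clustering sketch in scikit-learn (5 clusters and Ward linkage are arbitrary example choices; X is your feature matrix):

```python
from sklearn.cluster import AgglomerativeClustering

# bottom-up: start with one cluster per point and merge the closest clusters
hc = AgglomerativeClustering(n_clusters=5, linkage='ward')
y_hc = hc.fit_predict(X)               # X assumed to exist; returns a cluster label per point
```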
Reinforcement learning (also called online learning): used to solve interacting problems where the data observed up to time t is considered to decide which action to take at time t + 1.
Deep learning: a class of algorithms that attempt high-level abstraction of data by using multiple processing layers with complex structures or built from multiple non-linear transformations.
- use a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input.
- not a new thing, but need a LOT of data
- Geoffrey Hinton: the godfather of deep learning. [Check his videos on YouTube]
- Mimic human brain
- Artificial Neural Network:
- Input layer:
  - the columns (features) of the data.
- Hidden layer:
  - weights the inputs, like regression / least squares;
  - each neuron weights the input layer differently;
  - the combination of the weighted decisions of all these neurons provides a powerful output layer.
- Output layer
Activation functions (applied to the weighted sum of the inputs):
- Threshold (yes/no step function)
- Sigmoid (commonly used in the output layer)
  - similar to the logistic regression trend line
- Rectifier / ReLU (commonly used in the hidden layers)
  - piecewise-linear shape: __/
- Hyperbolic tangent (tanh)
  - similar to sigmoid / logistic, but ranges over (-1, 1)
- Once you get the output value, compare it with the actual value, then feed back to adjust the weight of each neuron (via the cost function).
- Cost function: the gap between the prediction and the actual value.
- To reduce the cost, adjust the weights of each neuron.
- Then feed the rows in again, and again, and again.
- There are many cost functions (see any list of cost functions).
Gradient descent: minimize the cost function in a very efficient way (like a ball in a bowl that finally comes to rest at the bottom).
If the cost surface has multiple local minima, plain (batch) gradient descent may miss the global minimum => use stochastic gradient descent.
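A toy illustration of the "ball in a bowl" idea, assuming a one-dimensional cost (w - 3)^2 (made-up numbers, not the course's code):

```python
# gradient descent on cost(w) = (w - 3)^2; the minimum is at w = 3
learning_rate = 0.1
w = 10.0                         # arbitrary starting weight
for step in range(100):
    gradient = 2 * (w - 3)       # derivative of the cost with respect to w
    w -= learning_rate * gradient
print(w)                         # close to 3; stochastic GD would update per observation instead
```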
**Steps of training a deep neural network:**
- Randomly initialise the weights to small numbers close to 0.
- => Forward propagation: calculate the errors.
- <= Back propagation: adjust all the weights at the same time.
- Repeat the two propagation steps after each observation (reinforcement learning) or after a batch of observations (batch learning).
- When the whole training set has passed through the network, that is one epoch. Redo more epochs.
Example problem (churn modelling): many customers left the bank in the past 6 months. Given a sample of 10,000 customers and whether or not each of them left, predict who will leave next, and why.
GPUs are better than CPUs for ANNs and deep learning; they are good at parallel computation.
- Theano
- TensorFlow
- Keras (build deep learning models on top of Theano or TensorFlow in a few lines)
from keras.models import Sequential
from keras.layers import Dense

classifier = Sequential()
# input layer + first hidden layer: 11 input features, 6 hidden units
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
# output layer: 1 unit with sigmoid for a binary outcome
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
- Rule of thumb: use (number of input nodes + number of output nodes) / 2 as the number of units in a hidden layer.
- The second hidden layer does not need to be given input_dim.
- If the dependent variable (y) has more than two categories, use softmax instead of sigmoid as the output activation (see the sketch below).
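For example (a sketch, assuming the dependent variable had three categories), the output layer and loss of the model above would become:

```python
# 3 categories => 3 output units with softmax and categorical cross-entropy
classifier.add(Dense(units = 3, kernel_initializer = 'uniform', activation = 'softmax'))
classifier.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
```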
CNN (Convolutional Neural Networks); applications include:
- image recognition
- self-driving cars
Yann LeCun, a student of Geoffrey Hinton, is the godfather of CNNs.
Steps (finding features in the image):
- Use a feature detector / filter (3x3, 5x5, or 7x7) that slides over the image and measures how well each patch matches, creating a feature map. The feature map is smaller than the image.
- Create many feature maps; together they form a convolutional layer.
- Apply pooling (downsampling) to each feature map of the convolutional layer.
- Repeat: add another convolutional layer and pooling layer.
- Flatten the resulting matrices into a vector, which becomes the input layer of an ANN.
- Add the ANN: input layer => fully connected layer => output layer.
- The rectifier (ReLU) is applied to increase non-linearity in the image (play with this link). A minimal Keras sketch follows below.
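A minimal Keras sketch of these steps (the 64x64 RGB input size and 32 filters are arbitrary example choices, not the course's exact values):

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

classifier = Sequential()
# convolution (32 feature detectors of 3x3) + ReLU, then pooling
classifier.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
# second convolutional + pooling layer
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPooling2D(pool_size=(2, 2)))
# flattening, then the fully connected ANN on top
classifier.add(Flatten())
classifier.add(Dense(units=128, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```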
Why reduce to two independent variables? To visualize the results better.
There are two types of dimensionality reduction techniques:
1. Feature Selection: Backward Elimination, Forward Selection, Bidirectional Elimination, Score Comparison and more. We covered these techniques in Part 2 - Regression.
2. Feature Extraction:
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Kernel PCA
- Quadratic Discriminant Analysis (QDA)
LDA (wiki): a linear combination of features that characterizes or separates two or more classes of objects or events. Udemy: from the n independent variables, LDA extracts p (< n) new independent variables that best separate the classes of the dependent variable. LDA is a supervised model (it uses the class labels), unlike PCA.
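A minimal scikit-learn sketch of both extraction techniques (assuming X_train, X_test, y_train exist and y_train has at least three classes, since LDA can extract at most n_classes - 1 components):

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

# PCA: unsupervised, keeps the directions of maximum variance
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# LDA: supervised, uses the class labels to maximise class separation
lda = LDA(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)
```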
Evaluate model performance and improve it.
Model selection techniques include:
k-Fold Cross Validation: split the training data into K folds and iterate over them (similar in spirit to bootstrapping), then report the mean and standard deviation of the model's accuracy.
Applying k-Fold Cross Validation
from sklearn.model_selection import cross_val_score
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10)
accuracies.mean()
accuracies.std()
cv is the number of folds (K).
in R
library(caret)
library(e1071)  # for svm()
folds = createFolds(training_set$Purchased, k = 10)
cv = lapply(folds, function(x) {
training_fold = training_set[-x, ]
test_fold = training_set[x, ]
classifier = svm(formula = Purchased ~ .,
data = training_fold,
type = 'C-classification',
kernel = 'radial')
y_pred = predict(classifier, newdata = test_fold[-3])
cm = table(test_fold[, 3], y_pred)
accuracy = (cm[1,1] + cm[2,2]) / (cm[1,1] + cm[2,2] + cm[1,2] + cm[2,1])
return(accuracy)
})
accuracy = mean(as.numeric(cv))
Grid Search: improve model performance by finding the optimal values of the hyperparameters, and answer questions such as which model is best, linear or non-linear.
Identify the parameters you want to tune in your model fitting, such as 'C' and 'kernel', then use k-fold cross validation to estimate the accuracy of each combination.
# Applying Grid Search to find the best model and the best parameters
from sklearn.model_selection import GridSearchCV
parameters = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
{'C': [1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}]
grid_search = GridSearchCV(estimator = classifier,
param_grid = parameters,
scoring = 'accuracy',
cv = 10,
n_jobs = -1)
# n_jobs = -1: use all CPU cores, useful when working on a large dataset
grid_search = grid_search.fit(X_train, y_train)
best_accuracy = grid_search.best_score_
best_parameters = grid_search.best_params_
The course ends with a bonus section dedicated to one of the most powerful and increasingly popular machine learning models: XGBoost.
High performance on large datasets.
# Fitting XGBoost to the Training set
from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, y_train)
- robot walking
- spoken and written word
- translation
- book categories, reviews
Python libraries:
- NumPy: data wrangling
- SciPy: data wrangling
- Pandas: data wrangling
- Matplotlib: visualization
- Seaborn: visualization
- Bokeh: visualization
- Plotly: visualization
- SciKit-Learn: machine learning
- Keras: machine learning
- TensorFlow: machine learning
- Scrapy: data scraping
- NLTK: NLP (natural language processing)
- Gensim: NLP
- Statsmodels: statistics
More:
- PyTorch: a deep learning framework; Tensors; Deep Neural Networks
R packages for machine learning (with download counts):
- e1071 Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, etc. (142479 downloads)
- rpart Recursive Partitioning and Regression Trees. (135390)
- igraph A collection of network analysis tools. (122930)
- nnet Feed-forward Neural Networks and Multinomial Log-Linear Models. (108298)
- randomForest Breiman and Cutler's random forests for classification and regression. (105375)
- caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. (87151)
- kernlab Kernel-based Machine Learning Lab. (62064)
- glmnet Lasso and elastic-net regularized generalized linear models. (56948)
- ROCR Visualizing the performance of scoring classifiers. (51323)
- gbm Generalized Boosted Regression Models. (44760)
- party A Laboratory for Recursive Partitioning. (43290)
- arules Mining Association Rules and Frequent Itemsets. (39654)
- tree Classification and regression trees. (27882)
- klaR Classification and visualization. (27828)
- RWeka R/Weka interface. (26973)
- ipred Improved Predictors. (22358)
- lars Least Angle Regression, Lasso and Forward Stagewise. (19691)
- earth Multivariate Adaptive Regression Spline Models. (15901)
- CORElearn Classification, regression, feature evaluation and ordinal evaluation. (13856)
- mboost Model-Based Boosting. (13078)
buckler lab machine learning paper
TensorFlow https://www.tensorflow.org/
- TensorFlow
- TensorFlow official documentation (Chinese translation)
- TensorFlow resources in Chinese
- https://github.com/xinshuaiqi/TensorFlow-Course
- Keras documentation (Chinese)
- TensorFlow's Eager API
- TensorBoard basics
- TensorBoard advanced
Underfitting: there is still room for improvement on the test data. Possible reasons:
- the model is not powerful enough
- it is over-regularized
- it simply has not been trained long enough
How to avoid overfitting?
- more training data
- If that is not possible, use regularization: put constraints on the quantity and type of information your model can store, so it focuses on the most prominent patterns.
- weight regularization
- dropout
Always keep this in mind: deep learning models tend to be good at fitting to the training data, but the real challenge is generalization, not fitting.
L1 regularization, where the cost added is proportional to the absolute value of the weights coefficients (i.e. to what is called the "L1 norm" of the weights).
L2 regularization, where the cost added is proportional to the square of the value of the weights coefficients (i.e. to what is called the "L2 norm" of the weights). L2 regularization is also called weight decay in the context of neural networks. Don't let the different name confuse you: weight decay is mathematically the exact same as L2 regularization.
In Keras this is passed to a layer, e.g.:
keras.layers.Dense(16, kernel_regularizer=keras.regularizers.l2(0.001), activation=tf.nn.relu)
Dropout: randomly "dropping out" (i.e. setting to zero) a number of output features of the layer during training, e.g.:
model = keras.Sequential([
    keras.layers.Dense(16, activation=tf.nn.relu, input_shape=(NUM_WORDS,)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(16, activation=tf.nn.relu),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
Computational Network Toolkit (CNTK) is Microsoft's open-source deep learning toolkit.
- Official getting-started tutorial: https://github.com/Microsoft/CNTK/wiki/Tutorial (these notes mainly follow the examples there)
- Official paper (the CNTK Book, about 150 pages): https://research.microsoft.com/pubs/226641/CNTKBook-20160217..pdf (I use it as a dictionary and search it when I run into problems)
- [Keras docs in Chinese](https://keras-cn.readthedocs.io/en/latest/)
Further reading (in Chinese): "From TensorFlow to Theano: a side-by-side comparison of seven deep learning frameworks"; "Comparing ten deep learning frameworks: TensorFlow is the most popular but not necessarily the best"; [A ranking of 23 deep learning libraries](https://www.jiqizhixin.com/articles/2017-10-24-6).