Regression - niranjv/ml-notes GitHub Wiki
- Linear Methods
  - Linear Regression
  - Logistic Regression
  - Stochastic Gradient Descent
- Shrinkage Methods
  - Ridge Regression
  - LASSO
  - LARS
- Derived Input Methods
  - Principal Components Regression
  - Partial Least Squares Regression
- Non-linear Methods
  - MARS
  - Polynomial Regression
  - LOESS
  - Splines
  - Generalized Additive Models (GAMs)
  - Isotonic Regression
- References
- If features are correlated, run PCA first and then regress on a few PCs
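The PCA-then-regress idea above can be sketched with NumPy alone. This is a minimal illustration, not a reference implementation: the helper name `pcr_fit`, the synthetic data, and the choice of two components are all assumptions made for the example.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Regress y on the top principal components of X (hypothetical helper)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                # shape (n_features, n_components)
    Z = Xc @ V                             # component scores
    gamma, *_ = np.linalg.lstsq(Z, y - y_mean, rcond=None)
    beta = V @ gamma                       # coefficients in original feature space
    intercept = y_mean - x_mean @ beta
    return beta, intercept


# Synthetic data (assumed): the first two features are nearly identical.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.01 * rng.normal(size=n),
    z + 0.01 * rng.normal(size=n),
    rng.normal(size=n),
])
y = X[:, 0] + X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)

beta, intercept = pcr_fit(X, y, n_components=2)
pred = X @ beta + intercept
print("coefficients:", beta, "intercept:", intercept)
```

Keeping every component reproduces ordinary least squares; dropping the low-variance components is what stabilizes the fit when features are correlated.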
- Focus here is only on 1-d linear ordered isotonic regression, not general isotonic regression
- Non-parametric regression method to fit a non-decreasing function to data
- Similar to inexact smoothing splines except that monotonicity instead of smoothness is used to remove noise
- Fit a free-form line to a set of data such that the line is non-decreasing everywhere and minimizes MSE on the data
- No assumptions about target function (e.g., linearity like in a linear model)
- Can be weighted or unweighted (all weights must be > 0); no contradictory constraints
- Non-parametric
- Fast
- Simple
- Points at ends of intervals can be noisy
- Works best when n > 10,000 (can smooth outcome to improve performance when n < 10,000)
- For fitting non-parametric model to data that is expected to be ordered
- Improve calibration of probabilistic classifier - correct probabilities output by classifiers like random forests, boosted trees, SVMs, etc. (but not neural networks which are well calibrated)
- Calibration of recommendation models
- Non-metric multi-dimensional scaling - isotonic regression is used to find distance as a function of item-item similarity
- Linear ordered isotonic regression is solved using Pool Adjacent Violators Algorithm (PAVA)
- scikit-learn: IsotonicRegression
- R: isoreg, Iso package, isotone package, isoMDS
- Spark: IsotonicRegression, IsotonicRegressionModel
- Ad Click Prediction: a View from the Trenches, KDD 2013
- Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine, ICML 2010
- Predicting Good Probabilities With Supervised Learning
- Fastest Isotonic Regression Algorithms
- Kaggle: Give Me Some Credit contest
- Platt Scaling - Calibration of a probabilistic classifier using a sigmoid function. Better than isotonic regression for n < 5,000. Slower than isotonic regression for all n.
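The Pool Adjacent Violators Algorithm mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `pava` is hypothetical, the targets `y` are assumed to be already sorted by their x values, and the fit is returned only at the input points (no interpolation between them).

```python
def pava(y, weights=None):
    """Weighted least-squares non-decreasing fit to y (hypothetical helper)."""
    n = len(y)
    if weights is None:
        weights = [1.0] * n
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be > 0")

    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for value, w in zip(y, weights):
        blocks.append([float(value), float(w), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            pooled = (mean1 * w1 + mean2 * w2) / total
            blocks.append([pooled, total, n1 + n2])

    # Expand block means back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


y = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
print(pava(y))  # [1.0, 2.5, 2.5, 3.5, 3.5, 5.0]
```

Each out-of-order pair is replaced by its weighted mean, which is why the fitted line is piecewise constant wherever the raw data decreases.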
Notes:
- Ridge regression and LASSO are forms of penalized estimation. They introduce bias into the estimates of model parameters in order to reduce the variance of those estimates. They can have lower MSE than OLS when multicollinearity is present. These methods are used mainly for prediction rather than inference, since it is difficult to account for the bias.
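The bias-variance trade-off in the note above can be illustrated with a small simulation. This is a rough sketch under assumed settings: the helper `ridge_fit`, the penalty `alpha=1.0`, the true coefficients, and the degree of collinearity are all chosen for illustration, and the outcome depends on those choices.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data; alpha=0 reduces to OLS."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)


def param_mse(estimates, truth):
    """Mean squared distance between estimated and true coefficients."""
    return float(np.mean(np.sum((estimates - truth) ** 2, axis=1)))


rng = np.random.default_rng(0)
true_beta = np.array([1.0, 1.0])
ols_est, ridge_est = [], []

# Repeat the experiment on fresh samples with two nearly collinear features.
for _ in range(500):
    z = rng.normal(size=50)
    X = np.column_stack([z, z + 0.05 * rng.normal(size=50)])
    y = X @ true_beta + rng.normal(size=50)
    ols_est.append(ridge_fit(X, y, alpha=0.0))
    ridge_est.append(ridge_fit(X, y, alpha=1.0))

ols_est = np.array(ols_est)
ridge_est = np.array(ridge_est)
print("OLS parameter MSE:  ", param_mse(ols_est, true_beta))
print("Ridge parameter MSE:", param_mse(ridge_est, true_beta))
```

In this setup the OLS estimates swing widely from sample to sample because the two features are almost interchangeable, while the penalty keeps the ridge estimates close together at the cost of a small bias.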
- scikit-learn Generalized Linear Models
- Wikipedia - Linear Regression
- Numerical Python, Robert Johansson, APress, 2015