Andrew NG ML Course Notes - sakthiram/100DaysOfMLCode GitHub Wiki

Week 1

Machine Learning
- Arthur Samuel described it as: "the field of study that gives computers the ability to learn without being explicitly programmed."
- Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
  - Example: playing checkers. E = the experience of playing many games of checkers. T = the task of playing checkers. P = the probability that the program will win the next game.
Supervised & Unsupervised learning
- Supervised: The correct answer y(i) is given for each training example x(i). The cost function measures how far the predictions deviate from these answers.
  - Regression: Mapping input variables (features) to a continuous output using a hypothesis function.
  - Classification: Mapping input variables to discrete categories
- Unsupervised: No correct answers are given. The data is clustered/grouped based on relationships among the input variables (features).
[Linear Regression] Why is the cost function the sum of SQUARES, rather than the sum of absolute values or cubes?
- It isn't the only possible cost function, but it has many nice properties.
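- Overestimates (+) and underestimates (-) are penalized equally, because squaring removes the sign of the error. (Cubes keep the sign, so positive and negative errors could cancel.)
- Big errors get penalized more than small ones.
- The squaring function is smooth, and its derivative is linear in the error, which is nice for optimization. (The absolute value is not differentiable at 0.)
- For linear regression the squared-error cost is convex, so any local minimum is the global minimum, and gradient descent with a suitably small learning rate converges to it.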
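The first two properties can be checked numerically. The following is a minimal sketch, assuming a single-feature hypothesis h(x) = theta0 + theta1 * x and the 1/(2 * m) scaling discussed further down; the function name `squared_error_cost` and the toy data are made up for illustration.

```python
def squared_error_cost(theta0, theta1, xs, ys):
    """Squared-error cost for the assumed hypothesis h(x) = theta0 + theta1 * x."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y  # prediction minus correct answer
        total += error ** 2                # squaring removes the sign
    return total / (2 * m)

# Toy data that lies exactly on y = 2x (hypothetical values).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect = squared_error_cost(0.0, 2.0, xs, ys)   # every error is 0
over = squared_error_cost(1.0, 2.0, xs, ys)      # every error is +1
under = squared_error_cost(-1.0, 2.0, xs, ys)    # every error is -1
big = squared_error_cost(2.0, 2.0, xs, ys)       # every error is +2

print(perfect, over, under, big)
```

With this data, `over` and `under` come out equal, and doubling every error (`big`) makes the cost four times larger rather than two.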
[Linear Regression] Why can’t I use 4th powers in the cost function? Don’t they have the nice properties of squares?
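- Distance in Cartesian coordinates is sqrt(sum of squares of the x, y, ... distances from the origin), so the sum of squared errors is a squared Euclidean distance (distance = error).
- Even when the axes (x, y, ...) are rotated, the sum of squares stays the same for a given point.
- 4th powers lack this property: the sum of 4th powers of the coordinates changes when the axes are rotated.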
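A small numeric check of the rotation argument, as a sketch only: the helper names `rotate` and `sum_of_powers` and the example point are assumptions chosen for illustration. Rotating the point (1, 0) by 45 degrees is equivalent to rotating the axes the other way.

```python
import math

def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians about the origin."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def sum_of_powers(point, p):
    """Sum of |coordinate|**p over all coordinates of the point."""
    return sum(abs(v) ** p for v in point)

error = (1.0, 0.0)                     # hypothetical error vector
rotated = rotate(error, math.pi / 4)   # same vector seen from rotated axes

squares_before = sum_of_powers(error, 2)
squares_after = sum_of_powers(rotated, 2)
fourth_before = sum_of_powers(error, 4)
fourth_after = sum_of_powers(rotated, 4)

print(squares_before, squares_after)   # equal up to floating-point rounding
print(fourth_before, fourth_after)     # differ: 1.0 versus roughly 0.5
```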
Why does 1/(2 * m) make the math easier?
- When we differentiate the cost to calculate the gradient, the exponent 2 inside the sum produces a factor of 2. This factor cancels the 1/2, giving a slightly simpler formula.
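The cancellation can be written out explicitly. This is a sketch assuming the single-feature hypothesis $h_\theta(x) = \theta_0 + \theta_1 x$ and a squared-error cost over $m$ training examples; the symbols $J$, $h_\theta$ and $\theta_j$ are not defined in these notes and are assumed here.

```latex
\begin{align}
J(\theta_0, \theta_1)
  &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \\
\frac{\partial J}{\partial \theta_0}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right)
   = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \\
\frac{\partial J}{\partial \theta_1}
  &= \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
   = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
\end{align}
```

Scaling the cost by a positive constant does not move its minimum, so the extra 1/2 changes only the size of the gradient, not the parameters that minimize the cost.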
Linear Algebra Reference
- Khan Academy tutorials
