Andrew NG ML Course Notes - sakthiram/100DaysOfMLCode GitHub Wiki

Week 1

  • Machine Learning

    • Arthur Samuel described it as: "the field of study that gives computers the ability to learn without being explicitly programmed."
    • Tom Mitchell provides a more modern definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
      • Example: playing checkers. E = the experience of playing many games of checkers; T = the task of playing checkers; P = the probability that the program will win the next game.
  • Supervised & Unsupervised learning

    • Supervised: The correct answer y(i) is given for each training sample x(i). The cost function measures how far the predictions deviate from these answers.
      • Regression: Mapping input variables (features) to a continuous output using a hypothesis function.
      • Classification: Mapping input variables to discrete categories
    • Unsupervised: Clustering/grouping of data based on relationships among input variables (features)
  • [Linear Regression] Why is the cost function a sum of SQUARES, rather than a sum of absolute values or cubes?

    • It isn't the only possible cost function, but it has many nice properties.
    • Overestimates (+) and Underestimates (-) are punished equally because of squaring.
    • Big errors get punished more than small ones.
    • The squaring function is smooth and yields linear forms after differentiation (nice for optimization).
    • For linear regression the squared-error cost is convex, so any local minimum is the global minimum, and gradient descent with a suitably small learning rate converges to it.
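The properties listed above can be checked numerically. The following is a minimal sketch, assuming a one-feature hypothesis h(x) = theta0 + theta1 * x and the 1/(2 * m) scaling used in these notes; the function name `squared_error_cost` and the toy data are made up for illustration and are not from the course material.

```python
import numpy as np

def squared_error_cost(theta0, theta1, x, y):
    """J = 1/(2m) * sum((h(x_i) - y_i)^2), with h(x) = theta0 + theta1 * x."""
    m = len(x)
    residuals = theta0 + theta1 * x - y
    return float(np.sum(residuals ** 2) / (2 * m))

# Toy data (hypothetical) generated by y = 2x, so the best slope is 2.0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# A slope that is too high and one that is too low by the same amount
# produce the same cost: over- and underestimates are punished equally.
cost_over = squared_error_cost(0.0, 2.5, x, y)
cost_under = squared_error_cost(0.0, 1.5, x, y)

# Doubling every error (slope off by 1.0 instead of 0.5) quadruples the cost:
# big errors are punished more than small ones.
cost_big = squared_error_cost(0.0, 3.0, x, y)

# Convexity spot check along one line in parameter space:
# the cost at the midpoint is no larger than the average of the endpoint costs.
cost_mid = squared_error_cost(0.0, 2.25, x, y)
is_convex_here = cost_mid <= (cost_under + cost_big) / 2

print(cost_over, cost_under, cost_big, is_convex_here)
```

The midpoint comparison only samples one pair of points, so it illustrates convexity rather than proving it.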
  • [Linear Regression] Why can’t I use 4th powers in the cost function? Don’t they have the nice properties of squares?

    • Distance in Cartesian coordinates is sqrt(sum of squares of the x, y, ... distances from the origin), and here the distance plays the role of the error.
    • Even when the axes (x, y, ...) are rotated, the sum of squares stays the same for a given point.
    • 4th powers lack this property: the sum of 4th powers changes when the axes are rotated.
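The rotation argument can be illustrated with a small numerical sketch. It assumes a 2-D error vector and a 45 degree rotation chosen purely for illustration; the helper `rotate` is a hypothetical name, not something from the course.

```python
import numpy as np

def rotate(point, angle_rad):
    """Rotate a 2-D point counterclockwise about the origin by angle_rad."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    return rotation @ point

# An error vector lying along the x axis, then the same vector after a 45 degree rotation.
error = np.array([1.0, 0.0])
rotated = rotate(error, np.pi / 4)

# The sum of squares is the squared Euclidean length, which rotation preserves.
sum_sq_before = float(np.sum(error ** 2))
sum_sq_after = float(np.sum(rotated ** 2))

# The sum of 4th powers depends on how the vector lines up with the axes.
sum_4th_before = float(np.sum(error ** 4))
sum_4th_after = float(np.sum(rotated ** 4))

print(sum_sq_before, sum_sq_after)    # equal up to floating-point rounding
print(sum_4th_before, sum_4th_after)  # these differ
```

With these inputs the sum of squares stays at 1 while the sum of 4th powers drops from 1 to about 0.5, so the 4th-power measure of the same error depends on the choice of axes.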
  • Why does 1/(2 * m) make the math easier?

    • When we differentiate the cost to calculate the gradient, the exponent inside the sum produces a factor of 2. This cancels the 1/2 in 1/(2 * m), giving a slightly simpler formula.
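One way to see the cancellation is to compare the simplified gradient, (1/m) * sum((h(x_i) - y_i) * x_i), with a finite-difference estimate of the cost's slope. This is a sketch under assumptions: the vectorized form, the function names `cost`, `gradient` and `numerical_gradient`, and the toy data are all illustrative choices, not taken from the notes.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y
    return float(residuals @ residuals / (2 * m))

def gradient(theta, X, y):
    """Analytic gradient: the 2 from the square cancels the 1/2, leaving 1/m."""
    m = len(y)
    return X.T @ (X @ theta - y) / m

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite-difference estimate of the gradient, one parameter at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

# Toy data (hypothetical): the first column of ones is the intercept feature.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.5, 1.0])

analytic = gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
print(analytic, numeric, np.allclose(analytic, numeric, atol=1e-6))
```

If the cost were scaled by 1/m instead of 1/(2 * m), the analytic gradient would carry an extra factor of 2. The minimizer would be the same, only the formula would be slightly less tidy.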
  • Linear Algebra Reference