Supervised VS Unsupervised Learning - Statistics-and-Machine-Learning-with-R/Statistical-Methods-and-Machine-Learning-in-R GitHub Wiki
We will focus on understanding the basic difference between two very common types of machine learning:
- Supervised Learning
- Unsupervised Learning
Before we dig into the technical part, I’ll take a simple example of how a small baby learns things. Suppose we show two pictures to a baby and tell the baby that the first picture is an apple and the second picture is a banana. While learning these two things, the baby keeps in mind that if the color is red and the shape is a circle, then it is an apple, and if the color is yellow and the shape is not a circle, then it is a banana. That’s how a baby learns.
Then we show a third picture and ask the baby whether the fruit is an apple or a banana. The moment we show the third picture, the baby identifies it: “Yeah, it’s a banana :)”. Because we have already labeled the first two pictures into two categories, the baby knows what an apple is and what a banana is. This is how supervised learning works.
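The baby example can be sketched in a few lines of code. This is only an illustration, written in Python with made-up names: the two features (`color`, `is_round`) come from the story above, while the matching rule (pick the labeled picture that shares the most attributes) is an assumption chosen for simplicity, not a specific library algorithm.

```python
# Labeled training examples: (color, is_round) -> fruit name.
training = [
    (("red", True), "apple"),
    (("yellow", False), "banana"),
]

def predict(features):
    """Return the label of the training example sharing the most attributes."""
    def matches(example):
        known_features = example[0]
        return sum(a == b for a, b in zip(known_features, features))
    best_features, best_label = max(training, key=matches)
    return best_label

# The third picture: yellow and not round.
third_picture = ("yellow", False)
guess = predict(third_picture)
print(guess)  # prints "banana"
```

The labels attached to the first two pictures are what make this supervised: without them, the code would have nothing to return.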
The basic idea of supervised learning is that your data provides examples of situations and, for each example, specifies an outcome. The machine then uses this training data to build a model that can predict the outcome for new data based on past examples. Let’s consider a simple data set of recently sold houses.
Our first example house could be 3,125 sqft with 5 bedrooms and 3 baths, and we might tell the algorithm that this house sold for $530,000. Next we might provide an example of a 2,100 sqft house with 4 bedrooms and 2 baths that sold for $460,000, and likewise a 1,200 sqft house with 3 bedrooms and 1.5 baths that sold for $250,000. After we have trained the machine on these examples, we ask it to predict the price of another house that has 6 bedrooms and 4 baths. The important thing about supervised learning is that its data has a very specific structure, described below.
We have rows of data, each of which is an example of something we are using to train the model. Each row has a column with a known outcome, which we refer to as the “label”. In the house example above, the price is the label. If the label is categorical, the model is known as a “classification” model; if the label is numeric, it is known as a “regression” model. We can use the algorithms below for supervised learning:
- Logistic Regression
- Model/Ensemble
- Time series
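The house example can be written out as rows with a label column and a very small regression. This is a sketch under stated assumptions, written in Python for illustration: it uses only the bedroom count as the predictor (the new house in the text has no square footage given), it fits a straight line by ordinary least squares, and the names `houses`, `fit_line`, and `predicted_price` are hypothetical. Three rows are far too few for a realistic model.

```python
# Training rows from the text: each row is one example, "price" is the label.
houses = [
    {"sqft": 3125, "bedrooms": 5, "baths": 3.0, "price": 530_000},
    {"sqft": 2100, "bedrooms": 4, "baths": 2.0, "price": 460_000},
    {"sqft": 1200, "bedrooms": 3, "baths": 1.5, "price": 250_000},
]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Assumption: bedrooms is the only feature used for this illustration.
bedrooms = [h["bedrooms"] for h in houses]
prices = [h["price"] for h in houses]
slope, intercept = fit_line(bedrooms, prices)

# Predict the price of the new house with 6 bedrooms.
predicted_price = slope * 6 + intercept
print(round(predicted_price))
```

Because the label (price) is numeric, this is a regression. Had the label been a category such as "sold" or "not sold", the same row structure would describe a classification problem.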
Unsupervised learning algorithms work on data that has no label. They are used to group cases based on similar attributes or on naturally occurring trends, patterns, or relationships in the data. Typical uses include:
- Finding similar instances (clustering)
- Anomaly detection (finding outliers/noise)
- Finding correlation, significant differences or similarities between variables or groups of instances
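The first two uses in the list above, clustering and anomaly detection, can be sketched as follows. This is an illustration only, written in Python: the house sizes are hypothetical values with no prices attached, `kmeans_1d` is a bare-bones one-dimensional k-means with an assumed evenly spread starting point, and `find_outliers` uses a simple standard-deviation rule with an assumed threshold of 2.

```python
from statistics import mean, pstdev

def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters (k >= 2) with a basic k-means loop."""
    ordered = sorted(values)
    # Assumed starting point: k centers spread evenly across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def find_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > threshold]

# Hypothetical, unlabeled house sizes in sqft (no prices attached).
sizes = [1200, 1250, 1300, 3000, 3100, 3125]
centers, clusters = kmeans_1d(sizes, k=2)
print(centers)
print(clusters)

# Hypothetical sizes with one unusual entry.
outliers = find_outliers([1200, 1250, 1300, 1280, 1220, 9000])
print(outliers)
```

Note that no label appears anywhere in this code: the groups and the unusual value are found from the sizes alone.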