9.4.3. Density-based Clustering
DBSCAN
Density-based clustering
- Spherical-shape clusters
- Arbitrary-shape clusters
Most traditional clustering techniques, such as k-Means, hierarchical, and fuzzy clustering, can be used to group data in an unsupervised way. However, when applied to tasks with arbitrarily shaped clusters, or clusters within clusters, traditional techniques might not achieve good results: elements in the same cluster might not share enough similarity, or the performance may be poor.
k-Means vs Density-based clustering
- k-Means assigns every point to a cluster, even points that do not belong to any
- Density-based clustering locates regions of high density and separates outliers
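The contrast can be sketched in a few lines of plain Python. This is only an illustration, not full k-Means or DBSCAN: the points, the two centroids, the radius of 1.0, and the threshold of 3 neighbors are all hypothetical values chosen for the example. It shows the nearest-centroid assignment step forcing an isolated point into a cluster, while a simple neighbor count flags the same point as lying in a low-density region.

```python
import math

# Hypothetical 2-D data: two tight groups plus one isolated point.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (10.0, 0.0)]  # the isolated point

# Hypothetical centroids, standing in for the result of k-Means with k = 2.
centroids = [(1.0, 1.0), (5.0, 5.0)]

def nearest_centroid(p):
    """Index of the closest centroid (the k-Means assignment step)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

def neighbor_count(p, radius):
    """Number of points, including p itself, within `radius` of p."""
    return sum(1 for q in points if math.dist(p, q) <= radius)

# k-Means style: every point receives a cluster label, the isolated one too.
kmeans_labels = [nearest_centroid(p) for p in points]

# Density style (simplified): points with too few neighbors are set aside.
low_density = [p for p in points if neighbor_count(p, radius=1.0) < 3]

print(kmeans_labels)
print(low_density)
```

Here the isolated point at (10.0, 0.0) still receives a k-Means label, whereas the neighbor count singles it out as the only low-density point.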
Question
Which of the following are characteristics of density-based clustering?
- Density-based clustering algorithms are proper for arbitrary shape clusters.
- Density-based clustering algorithms have no notion of outliers.
- Density-based clustering algorithms locate regions of high density that are separated from one another by regions of low density.

Correct: the first and third statements. Density-based clustering does have a notion of outliers.
DBSCAN for class identification
DBSCAN is particularly effective for tasks such as class identification in a spatial context. Its key strength is that it can find clusters of any arbitrary shape without being affected by noise. For example, this map shows the locations of weather stations in Canada. DBSCAN can be used here to find groups of stations that show the same weather conditions. It not only finds clusters of different arbitrary shapes, it also finds the denser parts of the data while ignoring less dense areas, or noise.
What is DBSCAN?
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  - Is one of the most common clustering algorithms
  - Works based on density of objects
- R (Radius of neighborhood)
  - A radius such that, if the neighborhood contains enough points, we call it a dense area
- M (Min number of neighbors)
  - The minimum number of data points we want in a neighborhood to define a cluster
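As a rough sketch of how R and M work together, the snippet below counts the points inside a neighborhood of radius R and compares the count with M. The function names `neighbors_within` and `is_dense` and the sample points are hypothetical. The count is assumed to include the center point itself, which is one common convention; other treatments count only the other points.

```python
import math

def neighbors_within(points, center, R):
    """Return every point whose distance to `center` is at most R.

    If `center` is itself in `points`, it is included in the result.
    """
    return [q for q in points if math.dist(center, q) <= R]

def is_dense(points, center, R, M):
    """True if the R-neighborhood of `center` holds at least M points."""
    return len(neighbors_within(points, center, R)) >= M

# Hypothetical data: a small group around the origin and one far-away point.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (5, 5)]

R, M = 2.0, 6
dense_at_origin = is_dense(points, (0, 0), R, M)   # 6 points within R
dense_far_away = is_dense(points, (5, 5), R, M)    # only the point itself

print(dense_at_origin, dense_far_away)
```

A larger R or a smaller M makes more neighborhoods count as dense, so the two parameters are usually tuned together.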
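DBSCAN algorithm (example: R = 2 units, M = 6)
- Core point: a point with at least M points within its R-neighborhood
- Border point: a point with fewer than M points in its neighborhood, but within R of a core point
- Next core point: repeat the check for every remaining point
- Outlier: a point that is neither a core point nor reachable from one
- Identify all points: label each point as core, border, or outlier
- Clusters: connect neighboring core points, together with their border points, into one cluster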
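The core, border, and outlier labelling can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `region_query` and `dbscan` are hypothetical, neighbor counts include the point itself, outliers receive the label -1, and the sample data uses R = 1.5 and M = 4 (rather than the slide values) to keep the point list short.

```python
import math

NOISE = -1  # label assumed here for outliers

def region_query(points, i, R):
    """Indices of all points within distance R of points[i] (i included)."""
    return [j for j in range(len(points))
            if math.dist(points[i], points[j]) <= R]

def dbscan(points, R, M):
    """Return (labels, kinds): a cluster id per point and its point type."""
    n = len(points)
    neighborhoods = [region_query(points, i, R) for i in range(n)]
    is_core = [len(nb) >= M for nb in neighborhoods]
    labels = [None] * n
    cluster_id = 0
    for i in range(n):
        # Only an unlabeled core point can start a new cluster.
        if labels[i] is not None or not is_core[i]:
            continue
        labels[i] = cluster_id
        frontier = [i]  # core points whose neighborhoods are still to expand
        while frontier:
            j = frontier.pop()
            for k in neighborhoods[j]:
                if labels[k] is None:
                    labels[k] = cluster_id
                    if is_core[k]:  # border points join but do not expand
                        frontier.append(k)
        cluster_id += 1
    # Anything never reached from a core point is an outlier.
    labels = [NOISE if lab is None else lab for lab in labels]
    kinds = ["core" if is_core[i]
             else "border" if labels[i] != NOISE
             else "outlier"
             for i in range(n)]
    return labels, kinds

# Hypothetical data: two square groups, one border point, one outlier.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2.2, 1),
          (8, 8), (9, 8), (8, 9), (9, 9), (5, 4)]
labels, kinds = dbscan(points, R=1.5, M=4)
print(labels)
print(kinds)
```

The number of clusters is never passed in. It falls out of how many separate groups of connected core points the data contains.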
Advantages of DBSCAN
- Arbitrarily shaped clusters
- Robust to outliers
- Does not require specification of the number of clusters