9.4.3. Density-based Clustering - sj50179/IBM-Data-Science-Professional-Certificate GitHub Wiki

DBSCAN

Density-based clustering

Spherical-shape clusters
Arbitrary-shape clusters

Traditional clustering techniques such as k-Means, hierarchical, and fuzzy clustering can group data in an unsupervised way. However, when applied to data with arbitrarily shaped clusters or clusters within clusters, they may not achieve good results: elements in the same cluster might not share enough similarity, or the clustering may perform poorly.
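As a rough illustration of the "clusters within clusters" case, the sketch below runs k-Means and DBSCAN on scikit-learn's synthetic `make_circles` data (a small ring inside a larger one) and scores each labeling against the true rings with the adjusted Rand index. The data set and the parameter values (`eps=0.2`, `min_samples=5`) are illustrative choices for this toy data, not values taken from the course.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Synthetic "cluster within a cluster": a small ring nested inside a larger ring.
X, y_true = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# k-Means is told there are 2 clusters; DBSCAN only gets density parameters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true rings,
# values near 0 mean the labeling is no better than a random split.
km_ari = adjusted_rand_score(y_true, km_labels)
db_ari = adjusted_rand_score(y_true, db_labels)
print(f"k-Means ARI: {km_ari:.2f}, DBSCAN ARI: {db_ari:.2f}")
```

On this data, k-Means tends to cut both rings in half with a straight boundary, while DBSCAN follows each ring because the points along it stay densely connected.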

k-Means vs Density-based clustering

k-Means assigns every point to a cluster, even points that do not belong to any
Density-based clustering locates regions of high density and separates out outliers
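A minimal sketch of this difference, assuming scikit-learn is available: one far-away point is appended to the synthetic `make_moons` data as a hypothetical outlier. k-Means must put it in one of its two clusters, whereas DBSCAN labels it -1, which is scikit-learn's marker for noise. The `eps` and `min_samples` values are illustrative assumptions for this toy data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons

# Two interleaved half-moons plus one far-away point acting as an outlier.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)
X = np.vstack([X, [[5.0, 5.0]]])

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# k-Means has no notion of noise: the outlier still receives a cluster id (0 or 1).
print("k-Means label of outlier:", km_labels[-1])
# DBSCAN gives points outside every dense region the label -1.
print("DBSCAN label of outlier:", db_labels[-1])
```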

Question

Which of the following are the characteristics of density-based clustering?

Density-based clustering algorithms are suitable for arbitrary-shaped clusters.
~~Density-based clustering algorithms have no notion of outliers.~~
Density-based clustering algorithms locate regions of high density that are separated from one another by regions of low density.

Correct

DBSCAN for class identification

DBSCAN is particularly effective for tasks like class identification in a spatial context. A notable attribute of DBSCAN is that it can find clusters of arbitrary shape without being affected by noise. For example, consider a map of weather station locations in Canada: DBSCAN can be used to find groups of stations that show similar weather conditions. It not only finds clusters of arbitrary shape but also picks out the denser parts of the data while ignoring less dense areas, that is, noise.
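One possible way to run this kind of spatial grouping is sketched below with scikit-learn's `DBSCAN` and the haversine (great-circle) metric. The station coordinates are synthetic and hypothetical, not the course's weather-station data set, and the 50 km radius and `min_samples=5` are arbitrary assumptions. scikit-learn's haversine metric expects `[latitude, longitude]` in radians, so `eps` is a distance expressed as an angle (kilometres divided by the Earth's radius).

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

rng = np.random.default_rng(0)
# Hypothetical station coordinates (lat, lon in degrees): two dense groups
# plus one isolated station. Synthetic values, not real station data.
group_a = rng.normal(loc=[45.5, -73.6], scale=0.2, size=(20, 2))
group_b = rng.normal(loc=[49.3, -123.1], scale=0.2, size=(20, 2))
isolated = np.array([[60.0, -100.0]])
coords_deg = np.vstack([group_a, group_b, isolated])

# Assumed neighborhood radius of 50 km, converted to radians on the unit sphere.
eps_km = 50.0
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=5,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(coords_deg))
print(labels)
```

Using a great-circle distance keeps the radius meaningful in kilometres; plain Euclidean distance on raw degrees would overstate east-west distances at higher latitudes, where a degree of longitude covers less ground.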


What is DBSCAN?

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- Is one of the most common clustering algorithms
- Works based on the density of objects
R (Radius of neighborhood)
- A radius R such that, if the neighborhood contains enough points, we call it a dense area
M (Min number of neighbors)
- The minimum number of data points we want in a neighborhood to define a cluster
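To make the roles of R and M concrete, here is a minimal from-scratch sketch of the DBSCAN idea in NumPy. The function name `dbscan` and the toy data are hypothetical; the sketch assumes Euclidean distance and counts the point itself toward M. A point whose R-neighborhood holds at least M points is treated as dense and grows a cluster; a point reached only from a dense neighbor joins that cluster, and every other point is labeled -1 (noise).

```python
import numpy as np

def dbscan(X, R, M):
    """Label each row of X with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(X)
    # Pairwise Euclidean distances; fine for small toy data sets.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= R) for i in range(n)]
    # Dense area: the R-neighborhood (the point itself included) holds >= M points.
    is_dense = np.array([len(nb) >= M for nb in neighbors])

    labels = np.full(n, -1)
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1 or not is_dense[i]:
            continue
        # Start a new cluster at this unlabeled dense point and grow it outward.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    # Only dense points keep expanding the cluster;
                    # non-dense points join it as its edge.
                    if is_dense[q]:
                        frontier.append(q)
        cluster_id += 1
    return labels

# Two tight groups of four points and one isolated point.
X = np.array([[0, 0], [0, 0.5], [0.5, 0], [0.5, 0.5],
              [10, 10], [10, 10.5], [10.5, 10], [10.5, 10.5],
              [5, 5]], dtype=float)
print(dbscan(X, R=1.0, M=3).tolist())  # → [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In scikit-learn, R and M correspond to the `eps` and `min_samples` parameters of `sklearn.cluster.DBSCAN`, whose `min_samples` also counts the point itself.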