9.4.3. Density-based Clustering - sj50179/IBM-Data-Science-Professional-Certificate GitHub Wiki

DBSCAN

Density-based clustering

  • Spherical-shape clusters

  • Arbitrary-shape clusters

Traditional clustering techniques such as k-Means, hierarchical, and fuzzy clustering can group data in an unsupervised way. However, when applied to data with arbitrarily shaped clusters or clusters within clusters, these techniques may not achieve good results: elements in the same cluster might not share enough similarity, or the overall performance may be poor.
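A quick way to see this is to compare k-Means and DBSCAN on scikit-learn's synthetic two-moons data set, whose clusters are not spherical. This is a sketch assuming scikit-learn is installed; `eps=0.2` and `min_samples=5` are hand-picked for this particular data, and the adjusted Rand index (1.0 means perfect agreement with the true labels) is just one convenient way to score the result.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: clusters that are clearly not spherical.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# k-Means is told there are 2 clusters, but it can only draw a straight boundary between centroids.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN follows the dense regions instead (eps and min_samples chosen by hand for this data).
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print("k-Means ARI:", adjusted_rand_score(y_true, km_labels))
print("DBSCAN ARI: ", adjusted_rand_score(y_true, db_labels))
```

k-Means cuts across the moons, so its score is typically well below DBSCAN's on this data.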

k-Means vs Density-based clustering

  • k-Means assigns every point to a cluster, even points that do not belong to any

  • Density-based clustering locates regions of high density and separates outliers
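The difference in how outliers are handled can be sketched with scikit-learn on hypothetical data: two compact groups of points plus one isolated point. The group positions, spreads, and DBSCAN parameters below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

rng = np.random.default_rng(0)
# Two tight groups of points plus one isolated point far from both.
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
outlier = np.array([[10.0, -10.0]])
X = np.vstack([group_a, group_b, outlier])

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# k-Means has no "noise" label, so the isolated point is forced into one of the clusters.
print("k-Means label of the outlier:", km_labels[-1])
# DBSCAN gives points that are not in any dense region the label -1.
print("DBSCAN label of the outlier:", db_labels[-1])
```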

Question

Which of the following are the characteristics of density-based clustering?

  • Density-based clustering algorithms are suitable for arbitrarily shaped clusters.
  • Density-based clustering algorithms have no notion of outliers.
  • Density-based clustering algorithms locate regions of high density that are separated from one another by regions of low density.

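Answer: the first and third statements are correct. The second is false: density-based algorithms do have a notion of outliers, since points in low-density regions are labeled as noise rather than forced into a cluster.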

DBSCAN for class identification

DBSCAN is particularly effective for tasks like class identification in a spatial context. A useful attribute of the DBSCAN algorithm is that it can find clusters of arbitrary shape without being affected by noise. For example, consider a map of the locations of weather stations in Canada: DBSCAN can be used to find groups of stations that show similar weather conditions. It not only finds arbitrarily shaped clusters, it also finds the denser parts of the data while ignoring less dense areas, or noise.
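As a sketch of spatial clustering, the snippet below groups hypothetical weather-station coordinates by location only (a real analysis would also use the weather measurements). It assumes scikit-learn's `DBSCAN` with the `haversine` metric, which expects `[latitude, longitude]` in radians and an `eps` expressed as an angle, so a distance in kilometres is divided by the Earth's radius. The coordinates, the 50 km radius, and `min_samples=3` are illustrative choices, not real station data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical station coordinates (latitude, longitude in degrees), NOT real station data.
stations_deg = np.array([
    [43.65, -79.38], [43.70, -79.42], [43.60, -79.50], [43.75, -79.30],      # around Toronto
    [49.28, -123.12], [49.25, -123.10], [49.20, -123.00], [49.32, -123.07],  # around Vancouver
    [63.75, -68.52],                                                         # one remote station
])

EARTH_RADIUS_KM = 6371.0
eps_km = 50.0  # assumed neighborhood radius R, in kilometres

# The haversine metric works on [lat, lon] in radians; eps is an angle (distance / Earth radius).
db = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=3,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(stations_deg))
print(labels)
```

The Toronto and Vancouver groups become clusters 0 and 1, and the isolated station is labeled -1 (noise) instead of being forced into a cluster.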

What is DBSCAN?

  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)

    • Is one of the most common clustering algorithms
    • Works based on density of objects
  • R (Radius of neighborhood)

    • If the neighborhood of radius R around a point contains enough data points, we call it a dense area

  • M (Min number of neighbors)

    • The minimum number of data points a neighborhood must contain to count as dense (and so to define a cluster)
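In scikit-learn's `DBSCAN`, R corresponds to the `eps` parameter and M to `min_samples` (which counts the point itself). The sketch below uses a hypothetical 3 by 3 grid of points with spacing 1 plus one distant point; with R = 1.5 and M = 5, the center and edge points of the grid are dense enough to be core points.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: a 3 by 3 grid with spacing 1, plus one far-away point.
grid = [[x, y] for x in range(3) for y in range(3)]
X = np.array(grid + [[5, 5]], dtype=float)

R = 1.5  # radius of the neighborhood -> eps
M = 5    # minimum number of neighbors -> min_samples (the point itself is counted)

db = DBSCAN(eps=R, min_samples=M).fit(X)
print("core points:", db.core_sample_indices_)
print("labels:     ", db.labels_)  # -1 marks points left out of every cluster
```

The corner points have only four points (themselves included) within 1.5 units, so they are not core points, but they still join the cluster because they lie within R of a core point; the distant point gets the label -1.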

DBSCAN algorithm - core point

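A data point is a core point if its neighborhood of radius R contains at least M data points.

Example parameters: R = 2 units, M = 6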
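The core-point rule can be written compactly. Here D denotes the data set, and the neighborhood is assumed to include the point itself (the same convention scikit-learn uses for `min_samples`); both the symbol D and this counting convention are assumptions rather than part of the original notes.

```latex
N_R(p) = \{\, q \in D : \lVert p - q \rVert \le R \,\}
\qquad
p \text{ is a core point} \iff \lvert N_R(p) \rvert \ge M
```

With R = 2 units and M = 6, a point is a core point when at least six data points, itself included, lie within 2 units of it.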

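DBSCAN algorithm - border points

A data point is a border point if its neighborhood contains fewer than M data points but it is reachable from some core point, meaning it lies within distance R of a core point.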

DBSCAN algorithm - next core point

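The same test is then applied to the next point. If its neighborhood also contains at least M points, it is another core point, and any non-core points within R of it are border points.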

DBSCAN algorithm - outliers

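An outlier is a point that is neither a core point nor close enough to a core point to be reachable from it, that is, it is not within distance R of any core point.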

DBSCAN algorithm - identify all points

Repeating this for every point labels each one as a core point, a border point, or an outlier.
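The three point types can be sketched directly from these definitions with NumPy. `classify_points` is a hypothetical helper written for illustration; it assumes Euclidean distance, counts a point as part of its own neighborhood, and treats a distance of exactly R as inside the neighborhood. The toy data use the R = 2 units, M = 6 setting.

```python
import numpy as np


def classify_points(X, R, M):
    """Label each point 'core', 'border' or 'outlier' following the DBSCAN definitions.

    Assumptions: Euclidean distance, a point belongs to its own neighborhood,
    and a distance equal to R counts as inside the neighborhood.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    in_radius = dists <= R
    # Core: at least M points (itself included) within radius R.
    is_core = in_radius.sum(axis=1) >= M
    # Within R of at least one core point (used to detect border points).
    near_core = (in_radius & is_core[None, :]).any(axis=1)

    kinds = []
    for core, near in zip(is_core, near_core):
        if core:
            kinds.append("core")
        elif near:
            kinds.append("border")   # too sparse itself, but reachable from a core point
        else:
            kinds.append("outlier")  # neither dense nor reachable
    return kinds


# Hypothetical toy data on a line: six tightly packed points, one nearby point, one far away.
X = [[0.0, 0.0], [0.3, 0.0], [0.6, 0.0], [0.9, 0.0],
     [1.2, 0.0], [1.5, 0.0], [3.0, 0.0], [10.0, 0.0]]
print(classify_points(X, R=2.0, M=6))
```

The point at 3.0 has only three points within 2 units, so it is not a core point, but the core point at 1.5 is within reach, making it a border point; the point at 10.0 is an outlier.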

DBSCAN algorithm - clusters?

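Core points that are neighbors of each other (within distance R) are connected and placed in the same cluster. A cluster is formed by at least one core point, all core points reachable from it, and all of their border points. Outliers are not assigned to any cluster.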
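Putting the pieces together, a minimal from-scratch DBSCAN can grow each cluster outward from an unassigned core point with a breadth-first search. The `dbscan` function below is a hypothetical teaching sketch, not a library API: it assumes Euclidean distance, counts a point as part of its own neighborhood, and builds the full n by n distance matrix, whereas practical implementations such as scikit-learn's `DBSCAN` rely on neighbor-search structures. A border point within R of core points from two clusters goes to whichever cluster reaches it first.

```python
from collections import deque

import numpy as np


def dbscan(X, R, M):
    """Return one cluster id per point, or -1 for outliers (minimal illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Indices of all points within R of each point (the point itself included).
    neighbors = [np.flatnonzero(dists[i] <= R) for i in range(n)]
    is_core = np.array([len(nb) >= M for nb in neighbors])

    labels = np.full(n, -1)  # -1 = not assigned to any cluster (yet)
    cluster_id = 0
    for start in range(n):
        # Each new cluster is seeded by a core point that has no cluster yet.
        if not is_core[start] or labels[start] != -1:
            continue
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            p = queue.popleft()
            if not is_core[p]:
                continue  # border points join the cluster but do not extend it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    queue.append(q)
        cluster_id += 1
    return labels


# Hypothetical data on a line: two dense groups, one border point, one isolated point.
xs = (0.0, 0.3, 0.6, 0.9, 1.2, 1.5, 3.0, 10.0, 10.3, 10.6, 10.9, 11.2, 11.5, 20.0)
X = [[x, 0.0] for x in xs]
print(dbscan(X, R=2.0, M=6))
```

No number of clusters is passed in: the two clusters emerge from the density structure, and the point at 20.0 stays at -1.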

Advantages of DBSCAN

  1. Arbitrarily shaped clusters
  2. Robust to outliers
  3. Does not require specification of the number of clusters