Disjoint Sets - rFronteddu/general_wiki GitHub Wiki

Disjoint sets are used to group distinct elements into a collection of non-overlapping sets. Key operations include:

  1. Finding the Set:: Determine the unique set containing a given element
  2. Uniting Sets: Merge two sets into a single set.

Disjoint-set operations

A disjoint-set data structure maintains a collection of disjoint sets, where each set has a representative, or "root". The primary operations are:

  • MAKE_SET(x): Creates a new set with x as its sole member and representative. x must not already be in another set.
  • UNION(x, y): Merges the sets containing x and y. Typically, the elements of one set are merged into the other.
  • FIND_SET(x): Returns the representative of the set containing x.

Each UNION operation reduces the number of sets by one.

Complexity Analysis

  • let n be the number of MAKE-SET operations,
  • let m the total number of MAKE-SET, UNION, and FIND-SET operations. Note $m \geq n$, and UNION operations are at most n-1.

Connected Components Algorithm

To find the connected components in a graph G:

  1. Initialize makeSet(v) for each vertex v.
  2. Process edges: if findset(u) != findSet(v) then union(u, v)

To check if two vertices u and v are in the same connected component, use sameComponent(u, v), which returns findSet(u) == findSet(v)

Linked List representation of disjoint sets

In a linked-list implementation of disjoint-sets:

  • union operation: Append the smaller list to the larger one.

  • time complexity: The worst-case running time is O(m+ n log n) because each UNION updates representatives and handles lists efficiently.

  • code

Disjoint-set forests

In a optimized implementation:

  • representation: sets are represented by rooted trees (forests). Each node points to its parent, and the root is its own parent.
  • operations:
  • MAKE_SET(x): creates a tree with x as the root.
  • FIND_SET(x): Follows parent pointers to the root.
  • UNION(x, y): Makes the root of one tree point to the root of the other.

Heuristics for Efficiency

  1. Union by Rank: keep track of the rank (approximate height) of each tree. During UNION, attach the smaller rank tree to the larger one, and increase the rank if they are equal.
  2. Path Compression: During FIND_SET, make all nodes on the path point directly to the root, reducing future lookup times.
MAKE-SET(x)
    x.p = x
    x.rank = 0

UNION(x, y)
    LINK(FIND-SET(x), FIND-SET(y))

LINK(x, y)
    if x.rank > y.rank
        y.p = x
    else 
        x.p = y
        if x.rank == y.rank
            y.rank = y.rank + 1

FIND-SET(x)
    if x != x.p
        x.p = FIND-SET(x.p) // Path compression
    return x.p