SerialReads

Union-Find (Disjoint-Set Union): Theory & Python Implementation

May 29, 2025

1 Why Union-Find?

Dynamic connectivity problems ask: “Are x and y in the same group?” Typical use-cases include:

Domain Example
Graph algorithms Kruskal’s MST, connected-components
Percolation / image segmentation Label connected pixels/regions
Dynamic network sims “Do these hosts share a VLAN after N cable changes?”

We need three operations:


2 Data representation

A forest of rooted trees encoded in two parallel arrays

Array Meaning After make(n)
parent[i] Immediate ancestor (root points to itself) i
rank[i] or size[i] Height estimate or element count—stored only on roots 0 / 1

We’ll show both variants; pick the one that best fits your needs.


3 Baseline algorithms (no optimisations)

find(x):
    while parent[x] ≠ x:
        x ← parent[x]
    return x

union(x, y):
    rx ← find(x);  ry ← find(y)
    if rx == ry: return False
    parent[rx] ← ry
    return True

Worst-case cost: O(n) – a chain can grow as tall as n − 1. Good for tiny inputs or white-board demos, terrible at scale.


4 Two classic heuristics = near-O(1)

Heuristic Where used 1-liner intuition
Path compression inside find “While climbing, re-point every visited node straight to the root.”
Union-by-rank / size inside union “Attach the shorter/smaller tree under the taller/larger root.”

Add both and Tarjan proved the amortised complexity over any sequence of m ops on n elements is O((n + m) · α(n)) where α(n) (inverse Ackermann) < 5 for any feasible n. Effectively constant-time.


5 Python 3 implementation

class DSUPlain:
    """Bare-bones Union–Find without optimisations (O(n) worst case)."""

    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

5.2 Optimised (path compression + union-by-size)

class DSU:
    """Disjoint-Set Union with path compression + union-by-size."""
    __slots__ = ("parent", "size")          # saves memory in CPython

    def __init__(self, n: int):
        self.parent = list(range(n))
        self.size   = [1] * n               # size[i] valid only if i is root

    def find(self, x: int) -> int:
        """Iterative path compression (stack-safe)."""
        root = x
        while self.parent[root] != root:    # climb to root
            root = self.parent[root]

        # second pass: compress
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: int, b: int) -> bool:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False                    # already unified

        # attach smaller under larger
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra                 # ensure ra is larger
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

    # handy helpers
    def connected(self, a: int, b: int) -> bool:
        return self.find(a) == self.find(b)

    def set_size(self, x: int) -> int:
        return self.size[self.find(x)]

Swap in a rank array (initialised to 0) if you only need height heuristics:

# inside union:
if self.rank[ra] < self.rank[rb]:
    self.parent[ra] = rb
elif self.rank[ra] > self.rank[rb]:
    self.parent[rb] = ra
else:
    self.parent[rb] = ra
    self.rank[ra] += 1

6 Mini smoke test

d = DSU(10)
assert d.union(1, 2)
assert d.union(3, 4)
assert d.connected(1, 2)
assert not d.connected(1, 3)
d.union(2, 3)
assert d.connected(1, 4)
assert d.set_size(1) == 4
print("all good ✓")

7 When to choose which variant

Scenario Use plain DSU Use path compression only Use both heuristics
≤ 1 k ops, quick demo ✔︎
Coding-interview task (n ≤ 1e5) ✔︎ (okay) ✔︎ (safer)
Production Kruskal on 1e7 edges ✔︎ absolutely

8 Key take-aways


Happy coding—your Disjoint-Set Union is now production-ready!

data-structures-and-algorithms