Topic Overview

Disjoint Set (Union-Find)

Master the Union-Find data structure for efficiently managing disjoint sets. Learn path compression and union by rank optimizations.

Intermediate18 min read

Disjoint Set (Union-Find)

Why This Matters

Union-Find (also called Disjoint Set Union) is a data structure that efficiently manages a collection of disjoint (non-overlapping) sets. It's deceptively simple but incredibly powerful for solving graph problems.

Union-Find matters because it solves a fundamental problem: "Are two elements in the same set?" and "Merge two sets." These operations appear in many algorithms: Kruskal's algorithm for minimum spanning trees, detecting cycles in graphs, finding connected components, and image processing (connected pixel regions).

In interviews, Union-Find problems test your ability to recognize when a problem is about connectivity. Many graph problems that seem complex become simple with Union-Find. Real-world systems use Union-Find: network connectivity (are two nodes connected?), image segmentation (which pixels belong to the same region?), and social networks (are two people in the same friend group?).

What Engineers Usually Get Wrong

Most engineers think "Union-Find is just a simple data structure, I'll implement it naively." But naive implementations are O(n) per operation, making them too slow for large inputs. The key optimizations—path compression and union by rank—reduce this to nearly O(1) amortized per operation.

Engineers also forget that Union-Find is specifically for disjoint sets. If you need to track which elements are in which set, or if sets can overlap, Union-Find isn't the right tool. It's designed for the "are they connected?" problem, not general set operations.

Another common mistake is not recognizing when to use Union-Find. Many graph problems can be solved with DFS/BFS, but Union-Find is often simpler and faster for connectivity queries. If you're repeatedly asking "are these two nodes connected?", Union-Find is likely the right choice.

How This Breaks Systems in the Real World

A network monitoring service was checking if servers were connected. It used DFS to traverse the network graph for each connectivity query. With 10,000 servers and 1,000 queries per second, the service spent 80% of its CPU time doing DFS traversals. The service became unresponsive during traffic spikes.

The fix? Use Union-Find to precompute connected components. Build the Union-Find structure once, then each query is O(1) amortized. But the real lesson is: when you have many connectivity queries, Union-Find is faster than repeated graph traversals.

Another story: A social network was finding friend groups using a naive Union-Find implementation (no optimizations). With 1 million users, each union operation took O(n) time. Building the friend groups took hours. The fix? Add path compression and union by rank. The same operation completed in minutes. But the real lesson is: optimizations matter. O(n) vs O(α(n)) is the difference between usable and unusable.


What is Union-Find?

Union-Find maintains a collection of disjoint sets. Each set has a representative element (usually the root). The data structure supports two main operations:

  1. Find(x): Find the representative of the set containing x
  2. Union(x, y): Merge the sets containing x and y

Key Concepts

  • Disjoint Sets: Sets that don't share any elements
  • Representative: One element chosen to represent each set (usually the root)
  • Parent Array: Tracks the parent of each element (forms a tree structure)
  • Path Compression: Optimization that flattens the tree during find operations
  • Union by Rank: Optimization that keeps trees balanced during union operations

Basic Implementation

Naive Implementation

1class UnionFind {
2 private parent: number[];
3
4 constructor(size: number) {
5 this.parent = new Array(size);
6 // Initially, each element is its own parent
7 for (let i = 0; i < size; i++) {
8 this.parent[i] = i;
9 }
10 }
11
12 // Find root of x (naive - O(n) worst case)
13 find(x: number): number {
14 parentx x

Optimized Implementation

Path Compression

Flatten the tree during find operations by making all nodes point directly to the root.

1class UnionFind {
2 private parent: number[];
3
4 constructor(size: number) {
5 this.parent = new Array(size);
6 for (let i = 0; i < size; i++) {
7 this.parent[i] = i;
8 }
9 }
10
11 // Find with path compression
12 find(x: number): number {
13 if (thisparentx x

Union by Rank

Keep trees balanced by always attaching the smaller tree to the larger tree's root.

1class UnionFind {
2 private parent: number[];
3 private rank: number[]; // Approximate height of tree
4
5 constructor(size: number) {
6 this.parent = new Array(size);
7 this.rank = new Array(size).fill(0);
8 for (let i = 0; i < size; i++) {
9 this.parent[i i

Time Complexity

OperationNaiveWith Path CompressionWith Both Optimizations
FindO(n)O(α(n)) amortizedO(α(n)) amortized
UnionO(n)O(α(n)) amortizedO(α(n)) amortized
ConnectedO(n)O(α(n)) amortizedO(α(n)) amortized

α(n) (Inverse Ackermann Function):

  • Extremely slow-growing function
  • For practical purposes (n ≤ 2^65536), α(n) ≤ 4
  • Effectively constant time

Applications

1. Kruskal's Algorithm (Minimum Spanning Tree)

Use Union-Find to detect cycles when building MST.

1function kruskalMST(edges: number[][], n: number): number[][] {
2 // Sort edges by weight
3 edges.sort((a, b) => a[2] - b[2]);
4
5 const uf = new UnionFind(n);
6 const mst: number[][] = [];
7
8 for (const [u, v weight edges

2. Number of Connected Components

Count how many separate connected components exist in a graph.

1function countComponents(n: number, edges: number[][]): number {
2 const uf = new UnionFind(n);
3
4 // Union all connected nodes
5 for (const [u, v] of edges) {
6 uf.union(u, v);
7 }
8
9 // Count distinct roots
10 const roots = new Set<number>();
11 for (let i = 0; i n i

3. Detecting Cycle in Undirected Graph

1function hasCycle(n: number, edges: number[][]): boolean {
2 const uf = new UnionFind(n);
3
4 for (const [u, v] of edges) {
5 if (uf.connected(u, v)) {
6 return true; // Cycle detected
7 }
8 uf.union(u, v);
9 }
10
11 return false;

Common Pitfalls

  • Forgetting optimizations: Naive implementation is too slow for large inputs
  • Not using path compression: Trees become unbalanced, operations become slow
  • Not using union by rank: Can create long chains, degrading performance
  • Using for overlapping sets: Union-Find only works for disjoint sets
  • Not recognizing the pattern: Many connectivity problems can use Union-Find

Examples

Example 1: Friend Circles

Find number of friend circles (connected components) in a social network.

1function findCircleNum(isConnected: number[][]): number {
2 const n = isConnected.length;
3 const uf = new UnionFind(n);
4
5 // Union all connected cities
6 for (let i = 0; i < n; i++) {
7 for (let j = i + 1; j < n; j++) {
8 if (isConnected[i][j] === 1) {
9 ufi j

Interview Questions

Beginner

Q: What is Union-Find used for?

A: Union-Find efficiently manages disjoint sets. It supports two operations: find(x) to find which set contains x, and union(x, y) to merge two sets. It's commonly used for connectivity problems in graphs, like finding connected components or detecting cycles.


Intermediate

Q: Explain path compression and union by rank. Why are they important?

A:

  • Path compression: During find, make all nodes point directly to the root, flattening the tree. This speeds up future finds.
  • Union by rank: When merging sets, attach the smaller tree to the larger tree's root, keeping trees balanced.

Together, these optimizations reduce time complexity from O(n) to O(α(n)) amortized per operation, where α(n) is effectively constant for practical values of n.


Senior

Q: How would you modify Union-Find to track the size of each set?

A: Add a size array that tracks the number of elements in each set. Initialize all sizes to 1. During union, add the smaller set's size to the larger set's size. Update the root's size. This allows O(1) queries for set size while maintaining O(α(n)) union complexity.

class UnionFindWithSize extends UnionFind {
  private size: number[];
  
  constructor(n: number) {
    super(n);
    this.size = new Array(n).fill(1);
  }
  
  union(x: number, y: number): void {
    const rootX = this.find(x);
    const rootY = this.find(y);
    if (rootX === rootY) return;
    
    if (this.rank[rootX] < this.rank[rootY]) {
      this.parent[rootX] = rootY;
      this.size[rootY] += this.size[rootX];
    } else {
      this.parent[rootY] = rootX;
      this.size[rootX] += this.size[rootY];
      if (this.rank[rootX] === this.rank[rootY]) {
        this.rank[rootX]++;
      }
    }
  }
  
  getSize(x: number): number {
    return this.size[this.find(x)];
  }
}

  • Union-Find manages disjoint sets with O(α(n)) amortized operations
  • Path compression flattens trees during find operations
  • Union by rank keeps trees balanced during union operations
  • Use for connectivity problems: connected components, cycle detection, MST
  • Optimizations are essential: naive implementation is too slow
  • α(n) is effectively constant for practical values of n
  • Recognize the pattern: "are they connected?" suggests Union-Find

Key Takeaways

Union-Find manages disjoint sets with O(α(n)) amortized operations

Path compression flattens trees during find operations

Union by rank keeps trees balanced during union operations

Use for connectivity problems: connected components, cycle detection, MST

Optimizations are essential: naive implementation is too slow

α(n) is effectively constant for practical values of n

Recognize the pattern: "are they connected?" suggests Union-Find


About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.