Topic Overview
Disjoint Set (Union-Find)
Master the Union-Find data structure for efficiently managing disjoint sets. Learn path compression and union by rank optimizations.
Disjoint Set (Union-Find)
Why This Matters
Union-Find (also called Disjoint Set Union) is a data structure that efficiently manages a collection of disjoint (non-overlapping) sets. It's deceptively simple but incredibly powerful for solving graph problems.
Union-Find matters because it solves a fundamental problem: "Are two elements in the same set?" and "Merge two sets." These operations appear in many algorithms: Kruskal's algorithm for minimum spanning trees, detecting cycles in graphs, finding connected components, and image processing (connected pixel regions).
In interviews, Union-Find problems test your ability to recognize when a problem is about connectivity. Many graph problems that seem complex become simple with Union-Find. Real-world systems use Union-Find: network connectivity (are two nodes connected?), image segmentation (which pixels belong to the same region?), and social networks (are two people in the same friend group?).
What Engineers Usually Get Wrong
Most engineers think "Union-Find is just a simple data structure, I'll implement it naively." But naive implementations are O(n) per operation, making them too slow for large inputs. The key optimizations—path compression and union by rank—reduce this to nearly O(1) amortized per operation.
Engineers also forget that Union-Find is specifically for disjoint sets. If you need to track which elements are in which set, or if sets can overlap, Union-Find isn't the right tool. It's designed for the "are they connected?" problem, not general set operations.
Another common mistake is not recognizing when to use Union-Find. Many graph problems can be solved with DFS/BFS, but Union-Find is often simpler and faster for connectivity queries. If you're repeatedly asking "are these two nodes connected?", Union-Find is likely the right choice.
How This Breaks Systems in the Real World
A network monitoring service was checking if servers were connected. It used DFS to traverse the network graph for each connectivity query. With 10,000 servers and 1,000 queries per second, the service spent 80% of its CPU time doing DFS traversals. The service became unresponsive during traffic spikes.
The fix? Use Union-Find to precompute connected components. Build the Union-Find structure once, then each query is O(1) amortized. But the real lesson is: when you have many connectivity queries, Union-Find is faster than repeated graph traversals.
Another story: A social network was finding friend groups using a naive Union-Find implementation (no optimizations). With 1 million users, each union operation took O(n) time. Building the friend groups took hours. The fix? Add path compression and union by rank. The same operation completed in minutes. But the real lesson is: optimizations matter. O(n) vs O(α(n)) is the difference between usable and unusable.
What is Union-Find?
Union-Find maintains a collection of disjoint sets. Each set has a representative element (usually the root). The data structure supports two main operations:
- Find(x): Find the representative of the set containing x
- Union(x, y): Merge the sets containing x and y
Key Concepts
- Disjoint Sets: Sets that don't share any elements
- Representative: One element chosen to represent each set (usually the root)
- Parent Array: Tracks the parent of each element (forms a tree structure)
- Path Compression: Optimization that flattens the tree during find operations
- Union by Rank: Optimization that keeps trees balanced during union operations
Basic Implementation
Naive Implementation
1class UnionFind {2 private parent: number[];34 constructor(size: number) {5 this.parent = new Array(size);6 // Initially, each element is its own parent7 for (let i = 0; i < size; i++) {8 this.parent[i] = i;9 }10 }1112 // Find root of x (naive - O(n) worst case)13 find(x: number): number {14 parentx x
Optimized Implementation
Path Compression
Flatten the tree during find operations by making all nodes point directly to the root.
1class UnionFind {2 private parent: number[];34 constructor(size: number) {5 this.parent = new Array(size);6 for (let i = 0; i < size; i++) {7 this.parent[i] = i;8 }9 }1011 // Find with path compression12 find(x: number): number {13 if (thisparentx x
Union by Rank
Keep trees balanced by always attaching the smaller tree to the larger tree's root.
1class UnionFind {2 private parent: number[];3 private rank: number[]; // Approximate height of tree45 constructor(size: number) {6 this.parent = new Array(size);7 this.rank = new Array(size).fill(0);8 for (let i = 0; i < size; i++) {9 this.parent[i i
Time Complexity
| Operation | Naive | With Path Compression | With Both Optimizations |
|---|---|---|---|
| Find | O(n) | O(α(n)) amortized | O(α(n)) amortized |
| Union | O(n) | O(α(n)) amortized | O(α(n)) amortized |
| Connected | O(n) | O(α(n)) amortized | O(α(n)) amortized |
α(n) (Inverse Ackermann Function):
- Extremely slow-growing function
- For practical purposes (n ≤ 2^65536), α(n) ≤ 4
- Effectively constant time
Applications
1. Kruskal's Algorithm (Minimum Spanning Tree)
Use Union-Find to detect cycles when building MST.
1function kruskalMST(edges: number[][], n: number): number[][] {2 // Sort edges by weight3 edges.sort((a, b) => a[2] - b[2]);45 const uf = new UnionFind(n);6 const mst: number[][] = [];78 for (const [u, v weight edges
2. Number of Connected Components
Count how many separate connected components exist in a graph.
1function countComponents(n: number, edges: number[][]): number {2 const uf = new UnionFind(n);34 // Union all connected nodes5 for (const [u, v] of edges) {6 uf.union(u, v);7 }89 // Count distinct roots10 const roots = new Set<number>();11 for (let i = 0; i n i
3. Detecting Cycle in Undirected Graph
1function hasCycle(n: number, edges: number[][]): boolean {2 const uf = new UnionFind(n);34 for (const [u, v] of edges) {5 if (uf.connected(u, v)) {6 return true; // Cycle detected7 }8 uf.union(u, v);9 }1011 return false;
Common Pitfalls
- Forgetting optimizations: Naive implementation is too slow for large inputs
- Not using path compression: Trees become unbalanced, operations become slow
- Not using union by rank: Can create long chains, degrading performance
- Using for overlapping sets: Union-Find only works for disjoint sets
- Not recognizing the pattern: Many connectivity problems can use Union-Find
Examples
Example 1: Friend Circles
Find number of friend circles (connected components) in a social network.
1function findCircleNum(isConnected: number[][]): number {2 const n = isConnected.length;3 const uf = new UnionFind(n);45 // Union all connected cities6 for (let i = 0; i < n; i++) {7 for (let j = i + 1; j < n; j++) {8 if (isConnected[i][j] === 1) {9 ufi j
Interview Questions
Beginner
Q: What is Union-Find used for?
A: Union-Find efficiently manages disjoint sets. It supports two operations: find(x) to find which set contains x, and union(x, y) to merge two sets. It's commonly used for connectivity problems in graphs, like finding connected components or detecting cycles.
Intermediate
Q: Explain path compression and union by rank. Why are they important?
A:
- Path compression: During
find, make all nodes point directly to the root, flattening the tree. This speeds up future finds. - Union by rank: When merging sets, attach the smaller tree to the larger tree's root, keeping trees balanced.
Together, these optimizations reduce time complexity from O(n) to O(α(n)) amortized per operation, where α(n) is effectively constant for practical values of n.
Senior
Q: How would you modify Union-Find to track the size of each set?
A: Add a size array that tracks the number of elements in each set. Initialize all sizes to 1. During union, add the smaller set's size to the larger set's size. Update the root's size. This allows O(1) queries for set size while maintaining O(α(n)) union complexity.
class UnionFindWithSize extends UnionFind {
private size: number[];
constructor(n: number) {
super(n);
this.size = new Array(n).fill(1);
}
union(x: number, y: number): void {
const rootX = this.find(x);
const rootY = this.find(y);
if (rootX === rootY) return;
if (this.rank[rootX] < this.rank[rootY]) {
this.parent[rootX] = rootY;
this.size[rootY] += this.size[rootX];
} else {
this.parent[rootY] = rootX;
this.size[rootX] += this.size[rootY];
if (this.rank[rootX] === this.rank[rootY]) {
this.rank[rootX]++;
}
}
}
getSize(x: number): number {
return this.size[this.find(x)];
}
}
- Union-Find manages disjoint sets with O(α(n)) amortized operations
- Path compression flattens trees during find operations
- Union by rank keeps trees balanced during union operations
- Use for connectivity problems: connected components, cycle detection, MST
- Optimizations are essential: naive implementation is too slow
- α(n) is effectively constant for practical values of n
- Recognize the pattern: "are they connected?" suggests Union-Find
- Graphs - Union-Find applications in graph algorithms
- Trees - Union-Find uses tree structure internally
- Graph Traversal (DFS/BFS) - Alternative approach for connectivity
- Minimum Spanning Tree - Kruskal's algorithm uses Union-Find
Key Takeaways
Union-Find manages disjoint sets with O(α(n)) amortized operations
Path compression flattens trees during find operations
Union by rank keeps trees balanced during union operations
Use for connectivity problems: connected components, cycle detection, MST
Optimizations are essential: naive implementation is too slow
α(n) is effectively constant for practical values of n
Recognize the pattern: "are they connected?" suggests Union-Find
What's next?