Topic Overview

Consistent Hashing

Learn consistent hashing for distributed systems. Understand how it enables efficient data distribution, handles node failures, and minimizes data movement during rebalancing.

Consistent hashing is a distributed hashing technique that minimizes data movement when nodes are added or removed from a distributed system. It's widely used in distributed caches, databases, and load balancers.


Problem with Traditional Hashing

Simple hash function:

hash(key) % num_nodes = node_index

Problem when nodes change:

Initial: 3 nodes
  Key "user1" → hash("user1") % 3 = 1 → Node 1
  Key "user2" → hash("user2") % 3 = 2 → Node 2
  Key "user3" → hash("user3") % 3 = 0 → Node 0

Add Node 3: Now 4 nodes
  Key "user1" → hash("user1") % 4 = 1 → Node 1 (same)
  Key "user2" → hash("user2") % 4 = 2 → Node 2 (same)
  Key "user3" → hash("user3") % 4 = 3 → Node 3 (MOVED!)

Result: In general, roughly n/(n+1) of all keys are remapped when going from n to n+1 nodes (O(n) data movement); this tiny example just happens to understate it

Issues:

  • Massive remapping: Adding/removing nodes remaps most keys
  • Inefficient: High data movement cost
  • Load imbalance: Uneven distribution
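
The scale of the problem is easy to verify empirically. A minimal sketch (the key count and node counts are arbitrary) that counts how many keys change nodes when a cluster grows from 3 to 4 nodes under mod-N hashing:

import hashlib

def node_for(key, num_nodes):
    # hash(key) % num_nodes, with MD5 for a hash that is stable across runs
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % num_nodes

keys = [f"user{i}" for i in range(100_000)]
moved = sum(1 for k in keys if node_for(k, 3) != node_for(k, 4))
print(f"{moved / len(keys):.0%} of keys moved")  # ~75%, i.e. n/(n+1)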

Consistent Hashing Solution

Hash Ring

Concept: Map both nodes and keys to a circle (ring) using hash function.

Hash Ring (0 to 2^32-1, wrapping around):

  0 ──→ Node A ──→ Node B ──→ Node C ──→ 2^32-1 ──→ (back to 0)

  Each key hashes to a position on the ring and is stored on
  the first node at or after that position (clockwise).

How it works:

  1. Hash nodes: Hash each node to position on ring
  2. Hash keys: Hash each key to position on ring
  3. Assign key to node: Key belongs to first node clockwise

Example

Hash Ring:
  
  Node A: hash("nodeA") = 100
  Node B: hash("nodeB") = 200
  Node C: hash("nodeC") = 300
  
  Key "user1": hash("user1") = 150 → Node B (first node >= 150)
  Key "user2": hash("user2") = 250 → Node C (first node >= 250)
  Key "user3": hash("user3") = 50 → Node A (wraps around, first node >= 50)

Adding a node:

Add Node D: hash("nodeD") = 175
  
  Key "user1": 150 → Node D (was Node B, only this key moves!)
  Key "user2": 250 → Node C (unchanged)
  Key "user3": 50 → Node A (unchanged)
  
Result: Only keys in the arc between Node A (100) and Node D (175) move, from Node B to Node D (minimal remapping)
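
The same measurement on a hash ring shows the difference. A minimal sketch (node names and key count are arbitrary; no virtual nodes yet) that counts reassignments when a fourth node joins:

import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

def owner(ring, key):
    # ring is a sorted list of (position, node); pick the first
    # node clockwise, wrapping past the end back to ring[0]
    positions = [p for p, _ in ring]
    idx = bisect.bisect_left(positions, h(key))
    return ring[idx % len(ring)][1]

ring3 = sorted((h(n), n) for n in ("nodeA", "nodeB", "nodeC"))
ring4 = sorted(ring3 + [(h("nodeD"), "nodeD")])

keys = [f"user{i}" for i in range(100_000)]
moved = sum(1 for k in keys if owner(ring3, k) != owner(ring4, k))
print(f"{moved / len(keys):.0%} of keys moved")  # ~1/(n+1) in expectation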

Virtual Nodes

Problem: Basic consistent hashing can cause load imbalance.

Solution: Use virtual nodes (multiple hash positions per physical node).

Physical Node A:
  Virtual Node A1: hash("A-vnode-1") = 100
  Virtual Node A2: hash("A-vnode-2") = 400
  Virtual Node A3: hash("A-vnode-3") = 700

Physical Node B:
  Virtual Node B1: hash("B-vnode-1") = 200
  Virtual Node B2: hash("B-vnode-2") = 500
  Virtual Node B3: hash("B-vnode-3") = 800

Benefits:

  • Better load distribution: More even key distribution
  • Easier rebalancing: Move virtual nodes, not physical nodes
  • Fault tolerance: Failure spreads across virtual nodes
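
The effect on balance is just as easy to measure. A minimal simulation (node names, virtual-node counts, and key count are arbitrary) comparing the key share per physical node with 1 versus 100 virtual nodes:

import bisect
import hashlib
from collections import Counter

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2**32)

def distribution(vnodes_per_node):
    # Build the ring: each physical node appears at several positions
    ring = sorted(
        (h(f"{node}-vnode-{i}"), node)
        for node in ("A", "B", "C")
        for i in range(vnodes_per_node)
    )
    positions = [p for p, _ in ring]
    counts = Counter()
    for k in range(100_000):
        idx = bisect.bisect_left(positions, h(f"key{k}"))
        counts[ring[idx % len(ring)][1]] += 1  # first node clockwise
    return counts

print(distribution(1))    # often heavily skewed
print(distribution(100))  # each node close to one third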

Examples

Basic Consistent Hashing

import bisect
import hashlib

class ConsistentHash:
    def __init__(self, nodes=None, replicas=3):
        self.replicas = replicas  # Virtual nodes per physical node
        self.ring = {}
        self.sorted_keys = []
        
        if nodes:
            for node in nodes:
                self.add_node(node)
    
    def _hash(self, key):
        """Hash function (MD5, then mod 2^32)"""
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)
    
    def add_node(self, node):
        """Add node to hash ring"""
        for i in range(self.replicas):
            virtual_key = f"{node}-{i}"
            hash_value = self._hash(virtual_key)
            self.ring[hash_value] = node
            self.sorted_keys.append(hash_value)
        
        self.sorted_keys.sort()
    
    def remove_node(self, node):
        """Remove node from hash ring"""
        for i in range(self.replicas):
            virtual_key = f"{node}-{i}"
            hash_value = self._hash(virtual_key)
            del self.ring[hash_value]
            self.sorted_keys.remove(hash_value)
    
    def get_node(self, key):
        """Get node for key (first node clockwise on the ring)"""
        if not self.ring:
            return None
        
        hash_value = self._hash(key)
        
        # Binary search for the first ring position >= hash_value;
        # the modulo wraps around to the start of the ring
        idx = bisect.bisect_left(self.sorted_keys, hash_value)
        return self.ring[self.sorted_keys[idx % len(self.sorted_keys)]]

# Usage
ch = ConsistentHash(nodes=['node1', 'node2', 'node3'], replicas=3)

# Get node for key
node = ch.get_node('user123')
print(f"Key 'user123' → {node}")

# Add node
ch.add_node('node4')
node = ch.get_node('user123')
print(f"After adding node4, 'user123' → {node}")

# Remove node
ch.remove_node('node2')
node = ch.get_node('user123')
print(f"After removing node2, 'user123' → {node}")

Distributed Cache with Consistent Hashing

class DistributedCache:
    def __init__(self, nodes):
        self.hash_ring = ConsistentHash(nodes, replicas=100)
        self.nodes = {node: {} for node in nodes}
    
    def get(self, key):
        """Get value from cache"""
        node = self.hash_ring.get_node(key)
        return self.nodes[node].get(key)
    
    def set(self, key, value):
        """Set value in cache"""
        node = self.hash_ring.get_node(key)
        self.nodes[node][key] = value
    
    def add_node(self, node):
        """Add new node"""
        self.hash_ring.add_node(node)
        self.nodes[node] = {}
        
        # Rebalance: Move keys from other nodes
        self.rebalance()
    
    def remove_node(self, node):
        """Remove node and reassign its keys"""
        # Capture the node's data before deleting it
        items_to_move = list(self.nodes[node].items())
        del self.nodes[node]
        self.hash_ring.remove_node(node)
        
        # Reassign each key to its new owner on the ring
        for key, value in items_to_move:
            new_node = self.hash_ring.get_node(key)
            self.nodes[new_node][key] = value
    
    def rebalance(self):
        """Rebalance after a node addition: move only misplaced keys"""
        for node, store in self.nodes.items():
            for key in list(store.keys()):
                correct_node = self.hash_ring.get_node(key)
                if correct_node != node:
                    self.nodes[correct_node][key] = store.pop(key)
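
Usage might look like this (node names and the cached value are illustrative):

cache = DistributedCache(['cache1', 'cache2', 'cache3'])
cache.set('user123', {'name': 'Alice'})
print(cache.get('user123'))  # {'name': 'Alice'}

cache.add_node('cache4')     # triggers rebalance()
print(cache.get('user123'))  # still retrievable after rebalancing

cache.remove_node('cache2')  # its keys are reassigned to survivors
print(cache.get('user123'))  # still retrievable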

Common Pitfalls

  • Not using virtual nodes: Causes load imbalance. Fix: Use virtual nodes (100-200 per physical node)
  • Poor hash function: Causes clustering on the ring. Fix: Use a hash with good distribution (MD5/SHA-1 are fine for placement even though they are broken for security; non-cryptographic hashes like MurmurHash or xxHash are faster)
  • Not handling node failures: Data becomes unreachable when a node fails. Fix: Implement replication and health checks (see the sketch after this list)
  • Incorrect key assignment: Keys assigned to the wrong node. Fix: Verify the hash function and the clockwise lookup, including the wraparound case
  • Not rebalancing: Keys not moved after node changes. Fix: Implement rebalancing logic
  • Hash collisions: Two virtual nodes landing on the same position. Fix: Use a large hash space (2^32 or 2^64) and deterministic tie-breaking
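
Replication can be layered on top of the ConsistentHash class above by continuing clockwise past the primary. A minimal sketch, assuming a get_nodes helper and a replica count of 3 (neither is part of the class above):

import bisect

def get_nodes(ch, key, count):
    """Return up to `count` distinct physical nodes for `key`,
    walking clockwise from the key's position (replica owners)."""
    if not ch.ring:
        return []
    start = bisect.bisect_left(ch.sorted_keys, ch._hash(key))
    owners = []
    for i in range(len(ch.sorted_keys)):
        position = ch.sorted_keys[(start + i) % len(ch.sorted_keys)]
        node = ch.ring[position]
        if node not in owners:
            owners.append(node)
        if len(owners) == count:
            break
    return owners

# Store each key on its primary plus the next distinct nodes clockwise
print(get_nodes(ch, 'user123', 3))  # e.g. ['node3', 'node1', 'node4']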

Interview Questions

Beginner

Q: What is consistent hashing and why is it used in distributed systems?

A:

Consistent Hashing is a distributed hashing technique that minimizes data movement when nodes are added or removed.

Why used:

  1. Minimal remapping: Only keys near added/removed node move
  2. Efficient scaling: Easy to add/remove nodes
  3. Load balancing: Distributes keys evenly across nodes
  4. Fault tolerance: Handles node failures gracefully

How it works:

  1. Hash ring: Map nodes and keys to circle (0 to 2^32-1)
  2. Key assignment: Key belongs to first node clockwise
  3. Node changes: Only nearby keys remap (minimal movement)

Example:

Hash Ring:
  Node A: 100
  Node B: 200
  Node C: 300
  
  Key "user1": 150 → Node B
  Key "user2": 250 → Node C
  
Add Node D: 175
  Key "user1": 150 → Node D (only this key moves!)
  Key "user2": 250 → Node C (unchanged)

Benefits vs traditional hashing:

  • Traditional: hash(key) % n → Most keys remap when n changes
  • Consistent: Only keys near the changed node remap: about K/n of the keys move instead of nearly all of them (with 1M keys on 10 nodes, adding an 11th moves about 91K keys versus roughly 900K under mod hashing)

Intermediate

Q: Explain how consistent hashing handles node additions and removals. What are virtual nodes and why are they used?

A:

Node Addition:

Before: Node A (100), Node B (200), Node C (300)
  Key "user1" (150) → Node B

Add Node D (175):
  Key "user1" (150) → Node D (moves from Node B)
  Other keys: Unchanged
  
Result: Only keys between Node A (100) and Node D (175) move, from Node B to Node D

Node Removal:

Before: Node A (100), Node B (200), Node C (300)
  Key "user1" (150) → Node B

Remove Node B:
  Key "user1" (150) → Node C (moves to next node)
  Other keys: Unchanged
  
Result: Only keys on removed node move

Virtual Nodes:

Problem: Basic consistent hashing can cause load imbalance.

Solution: Each physical node has multiple virtual nodes on ring.

Physical Node A:
  Virtual A1: 100
  Virtual A2: 400
  Virtual A3: 700

Physical Node B:
  Virtual B1: 200
  Virtual B2: 500
  Virtual B3: 800

Benefits:

  1. Better distribution: More even key distribution
  2. Easier rebalancing: Move virtual nodes, not physical
  3. Fault tolerance: Failure spreads across virtual nodes

Example:

Without virtual nodes:
  Node A: 50% of keys
  Node B: 30% of keys
  Node C: 20% of keys (imbalanced!)

With virtual nodes (100 per node):
  Node A: ~33% of keys
  Node B: ~33% of keys
  Node C: ~33% of keys (balanced!)

Senior

Q: Design a distributed key-value store using consistent hashing that handles 100 million keys across 1000 nodes. How do you handle replication, rebalancing, and node failures?

A:

// Sketch assumptions: SortedMap<number, string> is an ordered-map
// utility with set/delete/size plus ceiling/first/next lookups;
// Node wraps per-node storage with read/write/delete, getKeys,
// and getKeyCount; checkReplication is a helper that counts
// healthy replicas for a node's keys.

import { createHash } from 'crypto';

interface HealthStatus {
  totalKeys: number;
  nodeLoads: Map<string, number>;
  replicationStatus: Map<string, number>;
}

// 1. Consistent Hashing with Virtual Nodes
class ConsistentHashRing {
  private ring: SortedMap<number, string>; // hash position → node
  private virtualNodes: number;
  private hashFunction: string;
  
  constructor(config: { virtualNodes: number, hashFunction: string }) {
    this.virtualNodes = config.virtualNodes;
    this.hashFunction = config.hashFunction;
    this.ring = new SortedMap();
  }
  
  private hash(key: string): number {
    // First 8 hex chars of the digest → 32-bit ring position
    const hex = createHash(this.hashFunction).update(key).digest('hex');
    return parseInt(hex.slice(0, 8), 16);
  }
  
  addNode(node: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      const virtualKey = `${node}-vnode-${i}`;
      this.ring.set(this.hash(virtualKey), node);
    }
  }
  
  removeNode(node: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.ring.delete(this.hash(`${node}-vnode-${i}`));
    }
  }
  
  getNodes(key: string, count: number): string[] {
    const nodes: string[] = [];
    
    // Start at the first virtual node clockwise from the key
    let entry = this.ring.ceiling(this.hash(key)) ?? this.ring.first();
    
    // Walk the ring, skipping virtual nodes of already-chosen
    // physical nodes, until `count` distinct nodes are found
    // (or the whole ring has been visited)
    for (let steps = 0; entry && steps < this.ring.size; steps++) {
      if (!nodes.includes(entry.value)) {
        nodes.push(entry.value);
        if (nodes.length === count) break;
      }
      entry = this.ring.next(entry) ?? this.ring.first(); // wrap around
    }
    
    return nodes;
  }
}

class DistributedKeyValueStore {
  private hashRing: ConsistentHashRing;
  private nodes: Map<string, Node>;
  private replicationFactor: number = 3;
  
  constructor(nodeIds: string[]) {
    this.hashRing = new ConsistentHashRing({
      virtualNodes: 200, // 200 virtual nodes per physical node
      hashFunction: 'sha256'
    });
    
    this.nodes = new Map();
    for (const node of nodeIds) {
      this.addNode(node);
    }
  }
  
  // 2. Replication: write to the primary and every replica owner
  async set(key: string, value: string): Promise<void> {
    const owners = this.hashRing.getNodes(key, this.replicationFactor);
    await Promise.all(
      owners.map(node => this.nodes.get(node)!.write(key, value))
    );
  }
  
  async get(key: string): Promise<string | null> {
    const owners = this.hashRing.getNodes(key, this.replicationFactor);
    
    // Read from the primary, falling back to replicas on failure
    for (const node of owners) {
      try {
        const value = await this.nodes.get(node)!.read(key);
        if (value !== null) {
          return value;
        }
      } catch (error) {
        continue; // Try the next replica
      }
    }
    
    return null;
  }
  
  // 3. Node Addition (Rebalancing)
  async addNode(node: string): Promise<void> {
    this.hashRing.addNode(node);
    this.nodes.set(node, new Node(node));
    
    // Rebalance: move only the keys that now belong elsewhere
    await this.rebalance();
  }
  
  async rebalance(): Promise<void> {
    const keysToMove: Array<{ key: string, from: string, to: string }> = [];
    
    for (const [nodeId, node] of this.nodes.entries()) {
      for (const key of node.getKeys()) {
        const owners = this.hashRing.getNodes(key, this.replicationFactor);
        if (!owners.includes(nodeId)) {
          keysToMove.push({ key, from: nodeId, to: owners[0] });
        }
      }
    }
    
    for (const { key, from, to } of keysToMove) {
      const value = await this.nodes.get(from)!.read(key);
      await this.nodes.get(to)!.write(key, value);
      await this.nodes.get(from)!.delete(key);
    }
  }
  
  // 4. Node Failure Handling: the failed node is unreachable, so
  // restore the replication factor from the surviving replicas
  async handleNodeFailure(failedNode: string): Promise<void> {
    this.hashRing.removeNode(failedNode);
    this.nodes.delete(failedNode);
    
    // Each surviving node re-checks its keys: if it is still an
    // owner, it copies the key to the other (possibly new) owners
    // so every key returns to full replication
    for (const [nodeId, node] of this.nodes.entries()) {
      for (const key of node.getKeys()) {
        const owners = this.hashRing.getNodes(key, this.replicationFactor);
        if (!owners.includes(nodeId)) continue;
        
        const value = await node.read(key);
        for (const owner of owners) {
          if (owner !== nodeId) {
            await this.nodes.get(owner)!.write(key, value);
          }
        }
      }
    }
  }
  
  // 5. Health Monitoring
  async monitorHealth(): Promise<HealthStatus> {
    const stats: HealthStatus = {
      totalKeys: 0,
      nodeLoads: new Map<string, number>(),
      replicationStatus: new Map<string, number>()
    };
    
    for (const [nodeId, node] of this.nodes.entries()) {
      const keyCount = node.getKeyCount();
      stats.totalKeys += keyCount;
      stats.nodeLoads.set(nodeId, keyCount);
      
      // Check replication health for this node's keys
      const replicationCount = await this.checkReplication(nodeId);
      stats.replicationStatus.set(nodeId, replicationCount);
    }
    
    return stats;
  }
}

Features:

  1. Virtual nodes: 200 per physical node for even distribution
  2. Replication: 3 replicas per key for fault tolerance
  3. Rebalancing: Only move keys that should move (minimal data movement)
  4. Node failure: Automatic replication from other replicas
  5. Health monitoring: Track load, replication status
  6. Scalability: Handles 100M keys across 1000 nodes

Key Takeaways

  • Consistent hashing: Minimizes data movement when nodes added/removed
  • Hash ring: Map nodes and keys to circle, assign key to first node clockwise
  • Virtual nodes: Multiple hash positions per physical node for better distribution
  • Node addition: Only keys between the new node and its predecessor on the ring move (from the successor to the new node)
  • Node removal: Only keys on removed node move to next node
  • Replication: Store key on multiple nodes (primary + replicas) for fault tolerance
  • Rebalancing: Move only keys that should move (minimal data movement)
  • Best practices: Use virtual nodes (100-200), implement replication, handle failures gracefully

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.