
Replication Strategies

Learn database replication strategies: master-slave, master-master, synchronous vs asynchronous replication, and handling replication lag.

Database replication keeps copies of the same data on multiple servers to provide high availability, read scalability, and disaster recovery. Each replication strategy trades off consistency, write latency, and availability differently.


What is Replication?

Replication creates copies of data on multiple servers:

  • High availability: If one server fails, the others can keep serving requests
  • Performance: Reads are spread across replicas
  • Disaster recovery: A copy of the data survives the loss of a server or data center
  • Geographic distribution: Users are served from a nearby server

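Topologies:

  • Master-Slave (Primary-Replica): One master accepts writes; multiple replicas serve reads
  • Master-Master (Multi-Master): Multiple masters, each accepting writes

Propagation modes (apply to either topology):

  • Synchronous: The master waits for replicas to confirm a write before acknowledging it
  • Asynchronous: The master acknowledges a write without waiting for replicas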

Master-Slave Replication

Master-Slave has one master (writes) and multiple slaves (reads).

Architecture

Master (Primary)
  ├─ Handles all writes
  ├─ Records each change in a log
  └─ Streams changes to slaves

Slave 1 (Replica)
  ├─ Read operations
  └─ Receive updates from master

Slave 2 (Replica)
  ├─ Read operations
  └─ Receive updates from master

How it Works

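1. Client writes to the master
2. Master records the change in its replication log (e.g., MySQL binlog, PostgreSQL WAL)
3. The change is shipped to the slaves
4. Each slave applies the change in log order
5. Reads can go to any slave (or to the master)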
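The five steps above can be modeled as an append-only change log: the master appends each write, and each slave replays entries from the last position it applied. This is a minimal in-memory sketch; `ChangeLog`, `Master`, `Slave`, and `pull` are hypothetical names, and real systems ship the log over the network instead of sharing a Python list.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Append-only list of (key, value) changes; entry i is log position i + 1."""
    entries: list = field(default_factory=list)

    def append(self, key, value):
        self.entries.append((key, value))
        return len(self.entries)  # log position of the new entry


@dataclass
class Master:
    log: ChangeLog = field(default_factory=ChangeLog)
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        # Steps 1-2: apply the write locally, then record it in the log
        self.data[key] = value
        return self.log.append(key, value)


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries this slave has applied

    def pull(self, log):
        # Steps 3-4: fetch entries after our position and apply them in order
        for key, value in log.entries[self.applied:]:
            self.data[key] = value
            self.applied += 1


master = Master()
slave = Slave()
master.write("user:1", "alice")
master.write("user:1", "alicia")
print(slave.data.get("user:1"))  # None: the slave has not pulled yet (lag)
slave.pull(master.log)
print(slave.data.get("user:1"), slave.applied)  # alicia 2
```

Tracking a per-slave position is what lets a slave that was offline resume where it stopped instead of copying the whole database again.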

Advantages

  • Simple: One node owns all writes, so there are no write conflicts
  • Read scaling: Reads are distributed across slaves
  • Backup: Backups can be taken from a slave without loading the master
  • Failover: A slave can be promoted to master if the master fails
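When promoting a slave after a failure, a common rule is to choose the slave that has applied the most of the master's log, because with asynchronous replication any entries it lacks are lost on promotion. A minimal sketch, assuming each slave reports its last applied log position; the input format and `pick_promotion_candidate` are hypothetical:

```python
def pick_promotion_candidate(slave_positions):
    """Return the slave with the highest applied log position.

    slave_positions maps slave name -> last applied log position.
    Ties are broken by name so repeated runs pick the same slave.
    """
    if not slave_positions:
        raise ValueError("no slaves available to promote")
    return max(slave_positions, key=lambda name: (slave_positions[name], name))


positions = {"slave1": 120, "slave2": 118, "slave3": 120}
master_position = 123  # last position the failed master had written
best = pick_promotion_candidate(positions)
print(best, master_position - positions[best])  # slave3 3 (entries lost)
```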

Disadvantages

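  • Single point of failure for writes: If the master fails, writes are unavailable until a slave is promoted
  • Replication lag: With asynchronous replication, slaves may be behind, so reads can be stale
  • Write bottleneck: All writes go to one server, so write throughput is limited by the master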

Master-Master Replication

Master-master replication has multiple masters, and each of them can accept writes.

Architecture

Master 1
  ├─ Accepts writes
  └─ Replicates to Master 2

Master 2
  ├─ Accepts writes
  └─ Replicates to Master 1

Advantages

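  • No single point of failure for writes: If one master fails, the others still accept writes
  • Write scaling: Writes can go to any master, although every master still applies every write, so total write capacity does not grow linearly
  • Geographic distribution: Masters in different regions give users low-latency local writes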

Disadvantages

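  • Conflict resolution: Concurrent writes to the same data on different masters conflict and must be resolved
  • Complexity: Harder to operate and reason about than master-slave
  • Consistency: Typically only eventual consistency, since masters usually replicate to each other asynchronously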
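One common conflict-resolution policy is last-writer-wins (LWW): each write carries a timestamp, and when two masters hold different versions of a key, the later timestamp wins, with the accepting master's ID as a tiebreaker. This is a minimal sketch assuming loosely synchronized clocks; `Version`, `resolve_lww`, and `merge_replica_state` are hypothetical names. LWW silently discards the losing write, which is why some systems merge values or surface conflicts to the application instead.

```python
from typing import NamedTuple


class Version(NamedTuple):
    value: str
    timestamp: float  # wall-clock time assigned by the accepting master
    node_id: str      # which master accepted the write (tiebreaker)


def resolve_lww(a: Version, b: Version) -> Version:
    """Last-writer-wins: keep the version with the later (timestamp, node_id)."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


def merge_replica_state(local: dict, incoming: dict) -> dict:
    """Apply changes from another master, resolving per-key conflicts."""
    merged = dict(local)
    for key, version in incoming.items():
        merged[key] = resolve_lww(merged[key], version) if key in merged else version
    return merged


# Both masters updated the same key concurrently
us_east = {"cart:42": Version("3 items", 1700000000.5, "us-east")}
europe = {"cart:42": Version("2 items", 1700000001.0, "europe")}
print(merge_replica_state(us_east, europe)["cart:42"].value)  # 2 items
print(merge_replica_state(europe, us_east)["cart:42"].value)  # 2 items
```

Because the rule is deterministic, both masters converge on the same value regardless of the order in which they receive each other's changes.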

Synchronous vs Asynchronous

Synchronous Replication

The master waits for every replica to acknowledge before confirming the write:

Write → Master
Master → Replica 1 (wait for ack)
Master → Replica 2 (wait for ack)
Master → Client: Success (after all acks)

Characteristics:

  • Strong consistency: Every acknowledged write is already on all replicas
  • Slower: Write latency includes the round trip to the slowest replica
  • Lower availability: If any replica is down or unreachable, writes block or fail

Asynchronous Replication

The master confirms the write without waiting for replicas:

Write → Master
Master → Client: Success (immediately)
Master → Replica 1 (async)
Master → Replica 2 (async)

Characteristics:

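  • Faster: Write latency does not depend on replicas
  • Eventual consistency: Replicas may lag behind the master
  • Availability: A replica being down doesn't block writes
  • Durability risk: Writes that were acknowledged but not yet replicated are lost if the master fails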
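The difference between the two modes comes down to when the master acknowledges the write. This is a minimal asyncio simulation, not a database client: `Replica`, `Primary`, and the delays are hypothetical, with `asyncio.sleep` standing in for network and apply time.

```python
import asyncio


class Replica:
    """Simulated replica that needs `delay` seconds to receive and apply a change."""

    def __init__(self, name, delay):
        self.name = name
        self.delay = delay
        self.data = {}

    async def apply(self, key, value):
        await asyncio.sleep(self.delay)  # stands in for network + apply time
        self.data[key] = value


class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self._background = set()  # keep references to in-flight async tasks

    async def write(self, key, value, synchronous):
        self.data[key] = value
        sends = [r.apply(key, value) for r in self.replicas]
        if synchronous:
            # Acknowledge only after every replica has applied the write
            await asyncio.gather(*sends)
        else:
            # Acknowledge immediately; replicas apply the write in the background
            for send in sends:
                task = asyncio.create_task(send)
                self._background.add(task)
                task.add_done_callback(self._background.discard)
        return "success"


async def demo():
    replicas = [Replica("r1", 0.01), Replica("r2", 0.05)]
    primary = Primary(replicas)

    await primary.write("x", 1, synchronous=True)
    after_sync = [r.data.get("x") for r in replicas]

    await primary.write("y", 2, synchronous=False)
    right_after_async = [r.data.get("y") for r in replicas]

    await asyncio.sleep(0.2)  # give the background tasks time to finish
    later = [r.data.get("y") for r in replicas]
    return after_sync, right_after_async, later


print(asyncio.run(demo()))  # ([1, 1], [None, None], [2, 2])
```

In the asynchronous case, if the primary failed during that window, `y` would be lost even though the client was told the write succeeded.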

Replication Lag

Replication lag is the delay between a write committing on the master and that write being applied on a replica.
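Lag can be expressed in log positions (how many changes the replica still has to apply) or in time (how far behind its newest applied change is). A minimal sketch, assuming each log entry records the master's commit timestamp; `ReplicaStatus` and its fields are hypothetical, though real databases expose similar numbers (for example, PostgreSQL's `pg_stat_replication` view reports `replay_lag`).

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    master_position: int        # last log position written on the master
    applied_position: int       # last log position applied on the replica
    master_commit_time: float   # master commit time of its newest entry (seconds)
    applied_commit_time: float  # master commit time of the newest applied entry (seconds)

    def lag_entries(self) -> int:
        """Number of changes the replica still has to apply."""
        return self.master_position - self.applied_position

    def lag_seconds(self) -> float:
        """How far behind the replica is, measured on the master's clock."""
        if self.applied_position >= self.master_position:
            return 0.0  # fully caught up
        return self.master_commit_time - self.applied_commit_time


status = ReplicaStatus(master_position=1050, applied_position=1000,
                       master_commit_time=1700000012.0, applied_commit_time=1700000009.5)
print(status.lag_entries(), status.lag_seconds())  # 50 2.5
```

Using the master's commit timestamps on both sides avoids comparing clocks on different machines.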

Causes:

  • Network latency: Slow or congested network between master and replica
  • High write load: The master produces changes faster than they can be shipped
  • Replica performance: The replica can't apply changes as fast as they arrive

Problems:

  • Stale reads: A read from a replica may return outdated data
  • Inconsistent data: Different replicas may return different results for the same query

Solutions:

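  • Read from master: Route critical reads (e.g., a user viewing data they just wrote) to the master
  • Monitor lag: Alert when lag exceeds a threshold, and stop routing reads to a replica that is too far behind
  • Optimize replication: Faster network, better replica hardware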
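These routing rules can be combined in one place. This sketch, with hypothetical names throughout, sends critical reads to the master, sends a user's reads to the master for a short window after that user writes (so they see their own write), and falls back to the master when measured lag exceeds a threshold. `replica_lag` is assumed to come from your lag monitoring, and the 1-second and 5-second defaults are placeholders.

```python
import time


class ReadRouter:
    """Hypothetical router choosing 'master' or 'replica' for each read."""

    def __init__(self, replica_lag, max_lag=1.0, write_window=5.0, clock=time.monotonic):
        self.replica_lag = replica_lag  # callable returning current lag in seconds
        self.max_lag = max_lag          # placeholder staleness threshold
        self.write_window = write_window
        self.clock = clock
        self.last_write = {}            # user_id -> time of that user's last write

    def record_write(self, user_id):
        self.last_write[user_id] = self.clock()

    def target(self, user_id, critical=False):
        if critical:
            return "master"  # critical reads always go to the master
        since_write = self.clock() - self.last_write.get(user_id, float("-inf"))
        if since_write < self.write_window:
            return "master"  # the replica may not have this user's write yet
        if self.replica_lag() > self.max_lag:
            return "master"  # replica too far behind: fall back to the master
        return "replica"


now = [100.0]  # fake clock for the example
router = ReadRouter(replica_lag=lambda: 0.2, clock=lambda: now[0])
router.record_write("alice")
print(router.target("alice"), router.target("bob"))  # master replica
now[0] += 10
print(router.target("alice"))  # replica
```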

Examples

Master-Slave Replication

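import asyncio
import itertools


class Database:
    """Minimal in-memory stand-in for a database server (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    async def write(self, data):
        self.rows.update(data)  # data: dict of key -> value

    apply = write  # slaves apply replicated changes the same way

    async def read(self, key):
        return self.rows.get(key)


class MasterSlaveReplication:
    def __init__(self):
        self.master = Database('master')
        self.slaves = [
            Database('slave1'),
            Database('slave2'),
            Database('slave3')
        ]
        self._slave_cycle = itertools.cycle(self.slaves)  # round-robin order
        self._pending = set()  # strong references so background tasks aren't garbage-collected

    async def write(self, data):
        """Write to master, replicate to slaves"""
        # Write to master
        await self.master.write(data)

        # Replicate to slaves (async): the caller does not wait for these
        for slave in self.slaves:
            task = asyncio.create_task(self.replicate(slave, data))
            self._pending.add(task)
            task.add_done_callback(self._pending.discard)

    def select_slave(self):
        """Round-robin selection across slaves"""
        return next(self._slave_cycle)

    async def read(self, query):
        """Read from slave (load balance); may be stale until replication finishes"""
        slave = self.select_slave()
        return await slave.read(query)

    async def replicate(self, slave, data):
        """Replicate data to slave"""
        await slave.apply(data)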

Synchronous Replication

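import asyncio


class SynchronousReplication:
    def __init__(self, master, replicas):
        # master/replicas: any objects with async write()/apply() methods
        self.master = master
        self.replicas = replicas

    async def write(self, data):
        """Write with synchronous replication"""
        # Write to master
        await self.master.write(data)

        # Send to all replicas in parallel and wait for every acknowledgement;
        # if any replica fails, gather() raises and 'success' is never returned
        await asyncio.gather(*(replica.apply(data) for replica in self.replicas))

        # All replicas updated
        return 'success'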

Common Pitfalls

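  • Replication lag: Not monitoring lag. Fix: Monitor lag, read from master for critical reads
  • Split-brain: Two nodes both believe they are the master, typically after a network partition. Fix: Elect the master with a consensus algorithm or majority quorum, and fence off the old master
  • Not handling conflicts: Concurrent writes on different masters in master-master setups. Fix: Define a conflict resolution strategy (e.g., last-writer-wins or application-level merge)
  • Single point of failure: Master fails, no writes. Fix: Automatic failover, or multiple masters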
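Consensus elects one master, but an old master that was cut off may not know it has been replaced. A common complement is a fencing token (an epoch number): each newly elected master gets a higher epoch, and storage rejects writes that carry an older one. A minimal sketch; `FencedStore`, `StaleEpochError`, and the epoch values are hypothetical, and in practice the epoch would come from the consensus system (for example, a Raft term).

```python
class StaleEpochError(Exception):
    pass


class FencedStore:
    """Storage that only accepts writes from the newest master it has seen."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            # A deposed master is still trying to write: reject it
            raise StaleEpochError(f"epoch {epoch} < current {self.highest_epoch}")
        self.highest_epoch = epoch
        self.data[key] = value


store = FencedStore()
store.write(epoch=1, key="balance", value=100)  # original master, epoch 1
store.write(epoch=2, key="balance", value=90)   # new master elected with epoch 2
try:
    store.write(epoch=1, key="balance", value=50)  # old master returns after a partition
except StaleEpochError as e:
    print("rejected:", e)  # rejected: epoch 1 < current 2
print(store.data["balance"])  # 90
```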

Interview Questions

Beginner

Q: What is database replication and what are the different strategies?

A:

Database replication creates copies of data on multiple servers.

Why used:

  • High availability: If one server fails, the others remain available
  • Performance: Distribute read load
  • Disaster recovery: Data survives the loss of a server

Strategies:

1. Master-Slave (Primary-Replica):

Master: Handles all writes, replicates to slaves
Slaves: Handle reads, receive updates from master

2. Master-Master (Multi-Master):

Multiple masters: All can accept writes
Replicate to each other

3. Synchronous:

Wait for all replicas to acknowledge
Strong consistency, slower

4. Asynchronous:

Don't wait for replicas
Faster, eventual consistency

Example:

Master-Slave:
  Write → Master → Replicate → Slaves
  Read → Slaves (load balanced)

Intermediate

Q: Explain master-slave vs master-master replication. What are the trade-offs?

A:

Master-Slave (Primary-Replica):

Architecture:

Master: All writes, replicates to slaves
Slaves: Reads only, receive updates

Advantages:

  • Simple: Clear master-slave relationship
  • Read scaling: Distribute reads
  • Backup: Slaves for backup
  • Failover: Promote slave to master

Disadvantages:

  • Single point of failure: Master fails, no writes
  • Replication lag: Slaves may be behind
  • Write bottleneck: All writes to master

Master-Master (Multi-Master):

Architecture:

Master 1: Accepts writes, replicates to Master 2
Master 2: Accepts writes, replicates to Master 1

Advantages:

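  • No single point of failure: If one master fails, the others still accept writes
  • Write scaling: Writes can go to any master (though each master still applies every write)
  • Geographic distribution: Masters in different regions

Disadvantages:

  • Conflict resolution: Concurrent writes to the same data must be resolved
  • Complexity: More complex to operate
  • Consistency: Usually eventual consistency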

When to use:

  • Master-Slave: Simple, read-heavy workloads
  • Master-Master: Writes must stay available across regions or data centers, and the application can tolerate conflict resolution

Senior

Q: Design a replication system for a global database handling millions of writes per day. How do you handle replication lag, failover, and ensure consistency?

A:

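// Hypothetical types for this sketch; a real system would back them with
// database connections and a replication protocol.
interface Data {
  key: string;
  value: unknown;
  timestamp: number; // commit time (ms) on the master that accepted the write
  origin: string;    // region of that master, used as a conflict tiebreaker
}

interface DbNode {
  region: string;
  apply(data: Data): Promise<void>;
  read(key: string): Promise<Data | undefined>;
  getLastAppliedTime(): Promise<number>; // master commit time of last applied change
  isHealthy(): Promise<boolean>;
}

interface Master extends DbNode {
  write(data: Data): Promise<void>;
  getLastWriteTime(): Promise<number>;
  changesSince(time: number): Promise<Data[]>;
}

interface Replica extends DbNode {
  promote(): Promise<Master>;
}

// Last-writer-wins ordering: later timestamp wins, region breaks ties
function isNewer(a: Data, b: Data): boolean {
  return a.timestamp !== b.timestamp ? a.timestamp > b.timestamp : a.origin > b.origin;
}

// 1. Multi-Master Replication
class ReplicationManager {
  constructor(public masters: Master[], public replicas: Replica[]) {}

  async write(data: Data, region: string): Promise<void> {
    // Write to local master; the client is acknowledged once this commits
    await this.getMaster(region).write(data);

    // Replicate to other masters and local replicas (async): not awaited, so
    // cross-region latency stays off the write path; failures are only logged
    const pending = [
      ...this.masters.filter(m => m.region !== region).map(m => this.replicate(m, data)),
      ...this.localReplicas(region).map(r => r.apply(data)),
    ];
    void Promise.all(pending).catch(err => console.error('Replication failed', err));
  }

  getMaster(region: string): Master {
    const master = this.masters.find(m => m.region === region);
    if (!master) throw new Error(`No master for region ${region}`);
    return master;
  }

  localReplicas(region: string): Replica[] {
    return this.replicas.filter(r => r.region === region);
  }

  async replicate(master: Master, data: Data): Promise<void> {
    // Conflict resolution (last-writer-wins, one possible policy):
    // skip the change if the target already holds a newer version
    const existing = await master.read(data.key);
    if (existing && isNewer(existing, data)) return;
    await master.apply(data);
    // Forward to that master's replicas so they mirror it
    await Promise.all(this.localReplicas(master.region).map(r => r.apply(data)));
  }
}

// 2. Replication Lag Monitoring (one monitor per region)
class LagMonitor {
  constructor(
    private master: Master,
    private replicas: Replica[],
    private thresholdMs = 1000, // 1 second
  ) {}

  async monitorLag(): Promise<void> {
    for (const replica of this.replicas) {
      const lag = await this.measureLag(replica);
      if (lag > this.thresholdMs) {
        this.alert(`High replication lag in ${replica.region}: ${lag}ms`);
      }
    }
  }

  async measureLag(replica: Replica): Promise<number> {
    // Both values are master commit times, so no cross-host clock comparison is needed
    const masterTime = await this.master.getLastWriteTime();
    const replicaTime = await replica.getLastAppliedTime();
    return masterTime - replicaTime;
  }

  private alert(message: string): void {
    console.warn(message); // placeholder for a paging or metrics integration
  }
}

// 3. Failover
class FailoverManager {
  constructor(
    private replication: ReplicationManager,
    private updateRouting: (oldMaster: Master, newMaster: Master) => Promise<void>,
  ) {}

  async handleMasterFailure(failedMaster: Master): Promise<void> {
    // Detect failure: re-check health so a transient blip doesn't trigger promotion
    if (await failedMaster.isHealthy()) return;
    const { masters, replicas } = this.replication;
    const slot = masters.indexOf(failedMaster);
    if (slot === -1) return; // already replaced

    // Promote the most caught-up replica in the same region
    const candidates = this.replication.localReplicas(failedMaster.region);
    if (candidates.length === 0) throw new Error(`No replica in ${failedMaster.region}`);
    const applied = await Promise.all(candidates.map(r => r.getLastAppliedTime()));
    const best = candidates[applied.indexOf(Math.max(...applied))];
    const newMaster = await best.promote();
    masters[slot] = newMaster;
    replicas.splice(replicas.indexOf(best), 1);

    // Update routing so clients in this region write to the new master
    await this.updateRouting(failedMaster, newMaster);

    // Replicate from other masters. Simplification: assumes changesSince(t)
    // returns every change the peer committed after t; LWW makes re-applying
    // duplicates harmless. A production system would track per-peer log positions.
    const since = await newMaster.getLastAppliedTime();
    for (const peer of masters.filter(m => m !== newMaster)) {
      for (const change of await peer.changesSince(since)) {
        await this.replication.replicate(newMaster, change);
      }
    }
  }
}

// 4. Consistency: route each read according to its requirements
class ConsistencyManager {
  constructor(private lagMonitor: LagMonitor, private maxStalenessMs = 1000) {}

  async chooseReadTarget(master: Master, replica: Replica, critical: boolean): Promise<DbNode> {
    if (critical) return master; // critical reads go to the master
    const lag = await this.lagMonitor.measureLag(replica);
    return lag <= this.maxStalenessMs ? replica : master; // skip replicas that are too stale
  }
}

class GlobalReplicationSystem {
  readonly replicationManager: ReplicationManager;
  readonly failoverManager: FailoverManager;
  readonly lagMonitors: LagMonitor[];

  // Multi-master for high availability: one master per region (e.g. 'us-east',
  // 'us-west', 'europe'), each with replicas in the same region
  constructor(
    masters: Master[],
    replicas: Replica[],
    updateRouting: (oldMaster: Master, newMaster: Master) => Promise<void>,
  ) {
    this.replicationManager = new ReplicationManager(masters, replicas);
    this.failoverManager = new FailoverManager(this.replicationManager, updateRouting);
    this.lagMonitors = masters.map(
      m => new LagMonitor(m, this.replicationManager.localReplicas(m.region)),
    );
  }
}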

Features:

  1. Multi-master: High availability, write scaling
  2. Lag monitoring: Track and alert on lag
  3. Failover: Automatic master promotion
  4. Consistency: Route reads based on requirements

Key Takeaways

  • Replication: Create copies of data on multiple servers
  • Master-Slave: One master (writes), multiple slaves (reads)
  • Master-Master: Multiple masters, all can write
  • Synchronous: Wait for replicas (strong consistency, slower)
  • Asynchronous: Don't wait (faster, eventual consistency)
  • Replication lag: Delay between master and replica
  • Best practices: Monitor lag, handle failover, ensure consistency

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.