Replication Strategies
Learn database replication strategies: master-slave, master-master, synchronous vs asynchronous replication, and handling replication lag.
Database replication creates copies of data across multiple servers for high availability, performance, and disaster recovery. Different replication strategies offer different trade-offs.
What is Replication?
Replication creates copies of data on multiple servers:
- High availability: If one server fails, the others can keep serving requests
- Performance: Distribute read load across replicas
- Disaster recovery: Copies survive the loss of a server or site (accidental deletes replicate too, so replicas complement rather than replace backups)
- Geographic distribution: Serve users from nearby servers
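Types:
- Master-Slave (Primary-Replica): One master, multiple replicas
- Master-Master (Multi-Master): Multiple masters
Either topology can also be:
- Synchronous: The master waits for replicas to acknowledge a write before confirming it
- Asynchronous: The master confirms the write without waiting for replicas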
Master-Slave Replication
Master-Slave has one master (writes) and multiple slaves (reads).
Architecture
Master (Primary)
├─ Write operations
├─ Replicate to slaves
└─ Handle all writes
Slave 1 (Replica)
├─ Read operations
└─ Receive updates from master
Slave 2 (Replica)
├─ Read operations
└─ Receive updates from master
How it Works
1. Write to master
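2. Master records the change in its replication log (e.g. the binlog in MySQL, the write-ahead log in PostgreSQL)
3. Master sends the change to slaves
4. Slaves apply the change in log order
5. Reads can go to any replica (or to the master when fresh data is required)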
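The five steps can be modeled in a few lines of Python. This is a minimal in-memory sketch, not how any particular database implements replication: `Primary`, `Replica`, and `pull` are hypothetical names, and the integer log position stands in for a binlog coordinate or WAL position. Each replica remembers how far it has applied and fetches only the entries after that point.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical master: applies writes locally and appends them to a change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered (key, value) changes

    def write(self, key, value):
        self.data[key] = value          # 1. write to master
        self.log.append((key, value))   # 2. record the change in the log
        return len(self.log)            # log position of this write


@dataclass
class Replica:
    """Hypothetical replica: tracks how many log entries it has applied."""
    data: dict = field(default_factory=dict)
    applied: int = 0

    def pull(self, primary):
        # 3. fetch changes after our position, 4. apply them in log order
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def read(self, key):
        # 5. serve reads from the local, possibly stale, copy
        return self.data.get(key)


primary = Primary()
replicas = [Replica(), Replica()]
primary.write("user:1", "alice")
replicas[0].pull(primary)
primary.write("user:1", "alicia")
print(replicas[0].read("user:1"), replicas[1].read("user:1"))  # alice None
```

The second replica has not pulled yet, and the first one missed the latest write: that gap is the replication lag discussed below.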
Advantages
- Simple: Clear master-slave relationship
- Read scaling: Distribute reads across replicas
- Backup: Backups can be taken from a slave without adding load to the master
- Failover: Promote slave to master on failure
Disadvantages
- Single point of failure: If the master fails, writes stop until a slave is promoted
- Replication lag: With asynchronous replication, slaves may be slightly behind the master
- Write bottleneck: All writes go to master
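When the master fails, a common choice is to promote the healthy replica that has applied the most of the master's log, since it is missing the fewest recent writes. The sketch below assumes a hypothetical input shape (replica name mapped to a health flag and an applied log position); real failover tools gather this information in their own ways.

```python
def pick_promotion_candidate(replicas):
    """Return the healthy replica that is furthest ahead in the master's log.

    `replicas` maps name -> (is_healthy, applied_position); this shape is an assumption.
    """
    healthy = {name: pos for name, (ok, pos) in replicas.items() if ok}
    if not healthy:
        raise RuntimeError("no healthy replica to promote")
    # The furthest-ahead replica loses the fewest recent writes when promoted
    return max(healthy, key=healthy.get)


replicas = {
    "slave1": (True, 980),
    "slave2": (True, 1000),
    "slave3": (False, 1005),  # most up to date, but unreachable
}
print(pick_promotion_candidate(replicas))  # slave2
```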
Master-Master Replication
Master-Master has multiple masters, all of which can accept writes.
Architecture
Master 1
├─ Accepts writes
└─ Replicates to Master 2
Master 2
├─ Accepts writes
└─ Replicates to Master 1
Advantages
- No single point of failure: Multiple masters
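- Write scaling: Writes can go to any master, although every write must still be applied on every master, so total write capacity does not grow linearly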
- Geographic distribution: Masters in different regions
Disadvantages
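- Conflict resolution: Concurrent writes to the same data on different masters must be reconciled
- Complexity: More complex than master-slave
- Consistency: Usually only eventual consistency, because masters replicate to each other asynchronously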
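One common (and lossy) conflict-resolution strategy is last-write-wins (LWW): each write carries a timestamp, the later write is kept, and a node ID breaks ties so that every master converges on the same value regardless of the order in which it sees the two writes. This is a sketch with a hypothetical `Version` record; systems may instead use vector clocks, CRDTs, or application-level merges, and LWW silently discards the losing write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # time of the write (wall clock or hybrid logical clock)
    node_id: str      # master that accepted the write


def resolve_lww(a: Version, b: Version) -> Version:
    """Keep the later write; break timestamp ties by node_id so all masters agree."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


# The same key was updated concurrently on two masters
us_east = Version("blue", timestamp=100.0, node_id="us-east")
europe = Version("green", timestamp=100.5, node_id="europe")

# Each master applies the other's write through the same rule, so both converge
print(resolve_lww(us_east, europe).value)  # green
print(resolve_lww(europe, us_east).value)  # green
```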
Synchronous vs Asynchronous
Synchronous Replication
The master waits for all replicas to acknowledge before confirming the write:
Write → Master
Master → Replica 1 (wait for ack)
Master → Replica 2 (wait for ack)
Master → Client: Success (after all acks)
Characteristics:
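- Strong consistency: Every acknowledged write has already reached all replicas, so promoting a replica loses no acknowledged writes
- Slower: Write latency includes the round trip to the slowest replica
- Availability: If any replica is down, writes block or fail (some databases offer semi-synchronous or quorum variants that wait for only some replicas)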
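The latency cost can be worked out directly. Assuming the master contacts replicas in parallel, a write that needs k acknowledgements completes when the k-th fastest replica responds: all replicas for full synchronous replication, fewer for a partial variant, none for asynchronous. `commit_latency` is a hypothetical helper and the round-trip times are illustrative numbers, not measurements.

```python
def commit_latency(local_ms, replica_rtts_ms, acks_required):
    """Client-visible write latency, assuming replicas are contacted in parallel.

    acks_required == len(replica_rtts_ms) models full synchronous replication,
    a smaller positive value models waiting for only some replicas,
    and 0 models asynchronous replication.
    """
    if not 0 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 0 and the number of replicas")
    if acks_required == 0:
        return local_ms  # asynchronous: acknowledge right after the local write
    # The write completes when the acks_required-th fastest replica responds
    return local_ms + sorted(replica_rtts_ms)[acks_required - 1]


rtts_ms = [2, 5, 80]  # illustrative: two nearby replicas, one in a distant region
print(commit_latency(1, rtts_ms, acks_required=3))  # 81 (wait for all)
print(commit_latency(1, rtts_ms, acks_required=1))  # 3 (wait for one)
print(commit_latency(1, rtts_ms, acks_required=0))  # 1 (asynchronous)
```

A replica that is down behaves like an infinite round trip: `commit_latency(1, [2, 5, float("inf")], 3)` returns `inf`, which is the availability problem described above, while waiting for a single acknowledgement still returns 3.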
Asynchronous Replication
The master confirms the write without waiting for replicas:
Write → Master
Master → Client: Success (immediately)
Master → Replica 1 (async)
Master → Replica 2 (async)
Characteristics:
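- Faster: The client does not wait for replicas
- Eventual consistency: Replicas may be slightly behind
- Availability: A replica being down doesn't block writes
- Durability risk: If the master fails before a change reaches any replica, that acknowledged write is lost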
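Because the master acknowledges before replicas have the change, a write confirmed just before a master crash may never reach any replica. The toy simulation below makes that concrete with hypothetical `Node`, `async_write`, and `ship` helpers: an in-memory outbox stands in for the background replication stream.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


def async_write(master, outbox, key, value):
    """Apply on the master, queue the change for replicas, and acknowledge immediately."""
    master.data[key] = value
    outbox.append((key, value))  # shipped to replicas later, in the background
    return "success"             # the client is told the write succeeded


def ship(outbox, replica):
    """Background replication: drain queued changes to the replica."""
    while outbox:
        key, value = outbox.pop(0)
        replica.data[key] = value


master, replica, outbox = Node("master"), Node("replica"), []
async_write(master, outbox, "order:1", "paid")
ship(outbox, replica)                           # replicated before the crash
async_write(master, outbox, "order:2", "paid")  # acknowledged to the client...
# ...but the master crashes before ship() runs again, and the replica is promoted
new_master = replica
print(sorted(new_master.data))  # ['order:1']; order:2 was acknowledged but lost
```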
Replication Lag
Replication lag is the delay between a write committing on the master and that write becoming visible on a replica.
Causes:
- Network latency: Slow network between master and replica
- High write load: Master can't replicate fast enough
- Replica performance: Replica can't apply changes fast enough
Problems:
- Stale reads: Read from replica may be outdated
- Inconsistent data: Different replicas have different data
Solutions:
- Read from master: For critical reads
- Monitor lag: Alert if lag too high
- Optimize replication: Faster network, better hardware
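"Read from master for critical reads" and "monitor lag" can be combined into position-based routing: remember the log position of a client's last write and serve its reads only from replicas that have applied at least that position, falling back to the master otherwise. This is a hypothetical sketch: `ReadRouter`, `ReplicaStatus`, integer positions, and the threshold measured in log entries are all assumptions, and how a database reports replica positions varies by system.

```python
from dataclasses import dataclass


@dataclass
class ReplicaStatus:
    name: str
    applied_position: int  # last log position this replica has applied


class ReadRouter:
    """Hypothetical router: use a replica only if it is fresh enough for the read."""

    def __init__(self, master_name, max_lag_entries):
        self.master_name = master_name
        self.max_lag_entries = max_lag_entries  # alert threshold, in log entries

    def choose(self, replicas, min_position=0):
        # Replicas that already contain the client's own last write
        fresh = [r for r in replicas if r.applied_position >= min_position]
        if not fresh:
            return self.master_name  # no replica has caught up: read from the master
        # Prefer the most up-to-date replica
        return max(fresh, key=lambda r: r.applied_position).name

    def lagging(self, replicas, master_position):
        """Names of replicas whose lag exceeds the threshold (candidates for an alert)."""
        return [r.name for r in replicas
                if master_position - r.applied_position > self.max_lag_entries]


replicas = [ReplicaStatus("r1", 120), ReplicaStatus("r2", 95)]
router = ReadRouter("master", max_lag_entries=10)
print(router.choose(replicas, min_position=118))       # r1
print(router.choose(replicas, min_position=125))       # master
print(router.lagging(replicas, master_position=130))   # ['r2']
```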
Examples
Master-Slave Replication
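import asyncio
import itertools


class Database:
    """Minimal in-memory stand-in for a database node (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    async def write(self, data):
        self.rows.append(data)

    async def apply(self, data):
        self.rows.append(data)

    async def read(self, query):
        # query is a predicate function over rows in this sketch
        return [row for row in self.rows if query(row)]


class MasterSlaveReplication:
    def __init__(self):
        self.master = Database('master')
        self.slaves = [Database('slave1'), Database('slave2'), Database('slave3')]
        self._next_slave = itertools.cycle(self.slaves)  # round-robin order
        self._pending = set()  # keep task references so they are not garbage-collected

    async def write(self, data):
        """Write to master, replicate to slaves"""
        # Write to master
        await self.master.write(data)
        # Replicate to slaves (async): the caller does not wait for these tasks
        for slave in self.slaves:
            task = asyncio.create_task(self.replicate(slave, data))
            self._pending.add(task)
            task.add_done_callback(self._pending.discard)

    async def read(self, query):
        """Read from slave (load balance); the result may be stale"""
        slave = self.select_slave()
        return await slave.read(query)

    def select_slave(self):
        # Round robin; a least-loaded policy would need per-slave load metrics
        return next(self._next_slave)

    async def replicate(self, slave, data):
        """Replicate data to slave"""
        await slave.apply(data)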
Synchronous Replication
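import asyncio


class SynchronousReplication:
    def __init__(self, master, replicas):
        # master and replicas expose async write()/apply(), like Database above
        self.master = master
        self.replicas = replicas

    async def write(self, data):
        """Write with synchronous replication"""
        # Write to master (simplification: a real system keeps the write
        # uncommitted on the master until the replicas confirm)
        await self.master.write(data)
        # Wait for all replicas to apply the change; if any replica raises,
        # gather() propagates the error and the client never sees 'success'
        await asyncio.gather(*(replica.apply(data) for replica in self.replicas))
        # All replicas updated
        return 'success'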
Common Pitfalls
- Replication lag: Not monitoring lag. Fix: Monitor lag, read from master for critical reads
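- Split-brain: Two or more nodes each believe they are the primary and accept conflicting writes. Fix: Elect the primary with a consensus algorithm (e.g. Raft or Paxos) and fence off the old primary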
- Not handling conflicts: Master-master conflicts. Fix: Conflict resolution strategy
- Single point of failure: Master fails, no writes. Fix: Automatic failover, multiple masters
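A common companion to consensus-based election is a fencing token: every promotion increments an epoch number, and the storage layer rejects writes carrying an older epoch, so a deposed master that still believes it is primary cannot overwrite newer data. `FencedStore`, `StaleEpochError`, and the epoch values are hypothetical names for illustration.

```python
class StaleEpochError(Exception):
    pass


class FencedStore:
    """Hypothetical storage layer that only accepts writes from the newest primary."""

    def __init__(self):
        self.highest_epoch = 0
        self.data = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            raise StaleEpochError(f"epoch {epoch} < {self.highest_epoch}")
        self.highest_epoch = epoch  # remember the newest primary seen so far
        self.data[key] = value


store = FencedStore()
store.write(1, "balance", 100)  # old primary, epoch 1
store.write(2, "balance", 80)   # new primary promoted with epoch 2
try:
    store.write(1, "balance", 120)  # old primary wakes up, still thinks it leads
except StaleEpochError as err:
    print("rejected:", err)  # rejected: epoch 1 < 2
print(store.data["balance"])  # 80
```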
Interview Questions
Beginner
Q: What is database replication and what are the different strategies?
A:
Database replication creates copies of data on multiple servers.
Why used:
- High availability: If one server fails, the others remain available
- Performance: Distribute read load
- Disaster recovery: Data backed up
Strategies:
1. Master-Slave (Primary-Replica):
Master: Handles all writes, replicates to slaves
Slaves: Handle reads, receive updates from master
2. Master-Master (Multi-Master):
Multiple masters: All can accept writes
Replicate to each other
3. Synchronous:
Wait for all replicas to acknowledge
Strong consistency, slower
4. Asynchronous:
Don't wait for replicas
Faster, eventual consistency
Example:
Master-Slave:
Write → Master → Replicate → Slaves
Read → Slaves (load balanced)
Intermediate
Q: Explain master-slave vs master-master replication. What are the trade-offs?
A:
Master-Slave (Primary-Replica):
Architecture:
Master: All writes, replicates to slaves
Slaves: Reads only, receive updates
Advantages:
- Simple: Clear master-slave relationship
- Read scaling: Distribute reads
- Backup: Slaves for backup
- Failover: Promote slave to master
Disadvantages:
- Single point of failure: Master fails, no writes
- Replication lag: Slaves may be behind
- Write bottleneck: All writes to master
Master-Master (Multi-Master):
Architecture:
Master 1: Accepts writes, replicates to Master 2
Master 2: Accepts writes, replicates to Master 1
Advantages:
- No single point of failure: Multiple masters
- Write scaling: Writes to any master (though each write is still applied on every master)
- Geographic distribution: Masters in different regions
Disadvantages:
- Conflict resolution: Writes to same data
- Complexity: More complex
- Consistency: Usually eventual consistency
When to use:
- Master-Slave: Simple, read-heavy workloads
- Master-Master: Writes must be accepted in several regions or survive a master failure, and conflicts can be resolved
Senior
Q: Design a replication system for a global database handling millions of writes per day. How do you handle replication lag, failover, and ensure consistency?
A:
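// Node interfaces are illustrative assumptions, not a specific database's API.
interface Data { key: string; value: unknown; timestamp: number; }

interface Master {
  region: string;
  write(data: Data): Promise<void>;
  apply(data: Data): Promise<void>;
  getLastWriteTime(): Promise<number>;
}

interface Replica {
  region: string;
  getLastAppliedTime(): Promise<number>;
}

// 1. Multi-Master Replication
abstract class ReplicationManager {
  constructor(protected masters: Master[]) {}

  async write(data: Data, region: string): Promise<void> {
    const master = this.getMaster(region);
    // Write to local master; the client is acknowledged after this step
    await master.write(data);
    // Replicate to other masters and local replicas in the background (async).
    // allSettled never rejects, so a slow or failed region does not fail the write;
    // failed deliveries must be retried (not shown).
    void Promise.allSettled([
      ...this.masters
        .filter(m => m.region !== region)
        .map(m => this.replicate(m, data)),
      this.replicateToReplicas(region, data),
    ]);
  }

  async replicate(master: Master, data: Data): Promise<void> {
    // Conflict resolution
    const conflict = await this.detectConflict(master, data);
    if (conflict) {
      await this.resolveConflict(master, data, conflict);
    } else {
      await master.apply(data);
    }
  }

  protected getMaster(region: string): Master {
    const master = this.masters.find(m => m.region === region);
    if (!master) throw new Error(`No master in region ${region}`);
    return master;
  }

  protected abstract detectConflict(master: Master, data: Data): Promise<Data | null>;
  protected abstract resolveConflict(master: Master, data: Data, conflict: Data): Promise<void>;
  protected abstract replicateToReplicas(region: string, data: Data): Promise<void>;
}

// 2. Replication Lag Monitoring
class LagMonitor {
  constructor(
    private masters: Master[],
    private replicas: Replica[],
    private thresholdMs = 1000, // 1 second
    private alert: (message: string) => void = console.warn,
  ) {}

  async monitorLag(): Promise<void> {
    for (const replica of this.replicas) {
      const lag = await this.measureLag(replica);
      if (lag > this.thresholdMs) {
        this.alert(`High replication lag in ${replica.region}: ${lag}ms`);
      }
    }
  }

  async measureLag(replica: Replica): Promise<number> {
    // Compare against the master of the replica's own region
    const master = this.masters.find(m => m.region === replica.region);
    if (!master) throw new Error(`No master in region ${replica.region}`);
    const masterTime = await master.getLastWriteTime();
    const replicaTime = await replica.getLastAppliedTime();
    return masterTime - replicaTime;
  }
}

// 3. Failover
abstract class FailoverManager {
  async handleMasterFailure(failedMaster: Master): Promise<void> {
    // Detect failure (confirm from several observers before acting, to avoid split-brain)
    if (!(await this.healthCheck(failedMaster))) {
      // Promote replica to master
      const newMaster = await this.promoteReplica(failedMaster.region);
      // Update routing
      await this.updateRouting(failedMaster, newMaster);
      // Replicate from other masters
      await this.catchUp(newMaster);
    }
  }

  protected abstract healthCheck(master: Master): Promise<boolean>;
  protected abstract promoteReplica(region: string): Promise<Master>;
  protected abstract updateRouting(oldMaster: Master, newMaster: Master): Promise<void>;
  protected abstract catchUp(newMaster: Master): Promise<void>;
}

// 4. Consistency
class ConsistencyManager {
  constructor(private maxLagMs = 1000) {}

  // Critical reads go to the master; non-critical reads go to a replica
  // unless its measured lag exceeds the tolerated staleness
  routeRead(critical: boolean, replicaLagMs: number): 'master' | 'replica' {
    return critical || replicaLagMs > this.maxLagMs ? 'master' : 'replica';
  }
}

class GlobalReplicationSystem {
  constructor(
    // Multi-master for high availability (e.g. 'us-east', 'us-west', 'europe'),
    // with replicas in each region
    private masters: Master[],
    private replicas: Replica[],
    private replicationManager: ReplicationManager,
    private failoverManager: FailoverManager,
    private lagMonitor = new LagMonitor(masters, replicas),
    private consistencyManager = new ConsistencyManager(),
  ) {}

  write(data: Data, region: string): Promise<void> {
    return this.replicationManager.write(data, region);
  }

  async checkHealth(): Promise<void> {
    await this.lagMonitor.monitorLag();
    for (const master of this.masters) {
      await this.failoverManager.handleMasterFailure(master);
    }
  }

  routeRead(critical: boolean, replicaLagMs: number): 'master' | 'replica' {
    return this.consistencyManager.routeRead(critical, replicaLagMs);
  }
}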
Features:
- Multi-master: High availability, write scaling
- Lag monitoring: Track and alert on lag
- Failover: Automatic master promotion
- Consistency: Route reads based on requirements
Key Takeaways
- Replication: Create copies of data on multiple servers
- Master-Slave: One master (writes), multiple slaves (reads)
- Master-Master: Multiple masters, all can write
- Synchronous: Wait for replicas (strong consistency, slower)
- Asynchronous: Don't wait (faster, eventual consistency)
- Replication lag: Delay between master and replica
- Best practices: Monitor lag, handle failover, ensure consistency