Database Topic
Data Replication
Master database replication strategies for high availability and performance. Essential for distributed systems and database interviews.
Data replication is the process of copying and maintaining database objects across multiple database servers to improve availability, performance, and fault tolerance.
Why Replicate?
High Availability
If one server fails, others can continue serving requests.
Primary Server (fails)
↓
Secondary Server (takes over)
Performance
Distribute read load across multiple servers.
Read Request → Load Balancer → Replica 1, 2, or 3
Write Request → Primary Server
Geographic Distribution
Place data closer to users.
US Users → US Replica
EU Users → EU Replica
Asia Users → Asia Replica
Disaster Recovery
Backup copies in different locations.
Replication Models
Master-Slave (Primary-Replica)
One primary server accepts writes, replicas handle reads.
Primary (Writes + Reads)
↓ (replication)
Replica 1 (Reads only)
Replica 2 (Reads only)
Replica 3 (Reads only)
Characteristics:
- Simple to understand
- Single point of failure (primary)
- Asynchronous or synchronous replication
- Replicas can lag behind primary
Use case: Read scaling, backup
Master-Master (Multi-Master)
Multiple servers can accept writes.
Master 1 (Writes + Reads) ←→ Master 2 (Writes + Reads)
↓ ↓
Replica 1 Replica 2
Characteristics:
- No single point of failure
- More complex conflict resolution
- Can have write conflicts
- Requires careful design
Use case: Multi-region deployments, high availability
Replication Methods
Statement-Based Replication (SBR)
Replicate SQL statements.
-- Primary executes:
UPDATE users SET balance = balance - 100 WHERE id = 1;
-- Replica executes same statement:
UPDATE users SET balance = balance - 100 WHERE id = 1;
Pros:
- Compact (only SQL statements)
- Easy to understand
Cons:
- Non-deterministic functions (NOW(), RAND())
- Stored procedures/triggers can cause issues
Row-Based Replication (RBR)
Replicate changed rows.
Primary: Row (id=1, balance=900) changed to (id=1, balance=800)
Replica: Apply same row change
Pros:
- More reliable (exact data changes)
- Handles non-deterministic functions
Cons:
- Larger log size
- More network bandwidth
Mixed Replication
Uses statement-based by default, switches to row-based when needed.
Synchronization Modes
Synchronous Replication
Primary waits for replica to confirm before committing.
1. Primary writes to log
2. Primary sends to replica
3. Replica writes to log
4. Replica confirms to primary
5. Primary commits transaction
Pros:
- No data loss if primary fails
- Strong consistency
Cons:
- Slower writes (waits for replica)
- Replica failure blocks primary
Use case: Financial systems, critical data
Asynchronous Replication
Primary doesn't wait for replica confirmation.
1. Primary writes to log
2. Primary commits transaction
3. Primary sends to replica (async)
4. Replica applies changes
Pros:
- Fast writes
- Primary not blocked by replica issues
Cons:
- Possible data loss if primary fails
- Replica lag (eventual consistency)
Use case: Most web applications, read scaling
Semi-Synchronous Replication
Primary waits for at least one replica (not all).
Primary waits for at least 1 of N replicas to confirm
Balance: Between sync and async
Replication Topologies
Single Replica
Primary → Replica
Simple, but single point of failure for replica.
Chain Replication
Primary → Replica 1 → Replica 2 → Replica 3
Reduces load on primary, but longer replication chain.
Star Topology
Primary
/ | \
Replica1 Replica2 Replica3
All replicas get updates directly from primary.
Tree Topology
Primary
/ \
Replica1 Replica2
| |
Replica3 Replica4
Hierarchical structure, good for geographic distribution.
Consistency Models
Strong Consistency
All replicas have same data at all times.
Write to Primary → Immediately visible on all replicas
Challenge: Requires synchronous replication, slower
Eventual Consistency
Replicas will converge to same state, but may differ temporarily.
Write to Primary → Eventually visible on all replicas
Read from Replica → May see slightly stale data
Challenge: Application must handle stale reads
Read Your Writes
User always sees their own writes, even if reading from replica.
User writes → Primary
User reads → Primary (or replica with user's writes)
Implementation: Route user's reads to primary or wait for replica to catch up
Common Patterns
Read Replicas
Use replicas for read-heavy workloads.
# Write to primary
db.write(user_data) # Goes to primary
# Read from replica
user = replica_db.read(user_id) # Goes to replica
Failover
Automatically promote replica to primary if primary fails.
1. Monitor primary health
2. Detect failure
3. Promote replica to primary
4. Update DNS/load balancer
5. Application reconnects to new primary
Geographic Replication
Replicate to different regions.
US Primary → EU Replica → Asia Replica
Benefits:
- Lower latency for local users
- Disaster recovery
- Compliance (data residency)
Database-Specific Examples
PostgreSQL Streaming Replication
-- On primary: Enable WAL archiving
wal_level = replica
max_wal_senders = 3
-- On replica: Configure replication
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
MySQL Replication
-- On primary: Enable binary logging
[mysqld]
log-bin=mysql-bin
server-id=1
-- On replica: Configure replication
CHANGE MASTER TO
MASTER_HOST='primary.example.com',
MASTER_USER='replicator',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=154;
MongoDB Replica Set
// Initialize replica set
rs.initiate({
_id: "rs0",
members: [
{ _id: 0, host: "primary:27017" },
{ _id: 1, host: "replica1:27017" },
{ _id: 2, host: "replica2:27017" }
]
});
Monitoring Replication
Key Metrics
- Replication lag: How far behind replicas are
- Throughput: Replication rate (bytes/second)
- Errors: Replication failures
- Health: Replica status (up/down)
Monitoring Queries
-- PostgreSQL: Check replication lag
SELECT
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as lag_bytes
FROM pg_stat_replication;
-- MySQL: Check replication status
SHOW SLAVE STATUS\G
Best Practices
- Monitor lag: Set up alerts for replication lag
- Test failover: Regularly test failover procedures
- Backup replicas: Use replicas for backups to reduce primary load
- Geographic distribution: Place replicas close to users
- Consistent backups: Backup from replicas, not primary
- Handle stale reads: Design application to tolerate eventual consistency
Common Challenges
Replication Lag
Replicas fall behind primary, causing stale reads.
Solutions:
- Monitor and alert on lag
- Route critical reads to primary
- Use read-your-writes consistency
Split-Brain
Multiple primaries think they're the primary.
Solutions:
- Use quorum-based election
- Proper failover procedures
- Network partitioning handling
Write Conflicts
In multi-master setups, simultaneous writes to same data.
Solutions:
- Conflict resolution strategies
- Application-level conflict handling
- Last-write-wins or merge strategies
Trade-offs
Synchronous Replication
- Pros: No data loss, strong consistency
- Cons: Slower writes, primary blocked by replica issues
Asynchronous Replication
- Pros: Fast writes, primary not blocked
- Cons: Possible data loss, eventual consistency
Multiple Replicas
- Pros: High availability, read scaling
- Cons: More complexity, more resources, consistency challenges
Interview Questions
1. Beginner Question
Q: What is database replication, and why is it important?
A: Database replication is the process of copying and maintaining database data across multiple servers.
Why important:
- High availability: If one server fails, others can continue serving
- Performance: Distribute read load across multiple servers
- Geographic distribution: Place data closer to users
- Disaster recovery: Backup copies in different locations
Example:
Primary Server (handles writes)
↓ (replication)
Replica 1 (handles reads)
Replica 2 (handles reads)
Replica 3 (handles reads)
Benefits:
- Reads can be distributed across replicas (read scaling)
- If primary fails, a replica can be promoted (high availability)
- Replicas can be in different regions (lower latency)
2. Intermediate Question
Q: Explain the difference between synchronous and asynchronous replication. When would you use each?
A:
Synchronous replication:
- Primary waits for replica to confirm before committing
- Pros: No data loss if primary fails, strong consistency
- Cons: Slower writes (waits for replica), primary blocked if replica fails
- Use when: Financial systems, critical data where consistency > performance
Asynchronous replication:
- Primary doesn't wait for replica confirmation
- Pros: Fast writes, primary not blocked by replica issues
- Cons: Possible data loss if primary fails, eventual consistency (replica lag)
- Use when: Most web applications, read scaling, performance > strong consistency
Example:
-- Synchronous: Primary waits
Primary: Write → Send to Replica → Wait for confirmation → Commit
-- If replica is slow, write is slow
-- Asynchronous: Primary doesn't wait
Primary: Write → Commit → Send to Replica (async)
-- Write is fast, but replica may lag behind
Trade-off: Consistency vs. Performance. Choose based on requirements.
3. Senior-Level System Question
Q: Design a globally distributed database system for a social media platform with 1B users. The system needs to support: low latency worldwide, high availability, and handle 10M writes/second. How would you implement replication?
A:
Multi-region replication strategy:
-
Regional primaries with cross-region replication:
US Primary → US Replicas (3) ↓ (async replication) EU Primary → EU Replicas (3) ↓ (async replication) Asia Primary → Asia Replicas (3) -
Sharding by user_id:
# Route users to their region def get_user_shard(user_id): region = get_user_region(user_id) # Based on signup location return f"{region}_primary" -
Replication topology:
- Within region: Synchronous replication (low latency, strong consistency)
- Cross-region: Asynchronous replication (high latency, eventual consistency)
-
Read strategy:
def read_user_data(user_id): user_region = get_user_region(user_id) # Read from local replica (low latency) replica = get_replica(user_region) return replica.query_user(user_id) -
Write strategy:
def write_user_data(user_id, data): user_region = get_user_region(user_id) primary = get_primary(user_region) # Write to local primary (synchronous within region) primary.write_user(user_id, data) # Replicate to other regions (asynchronous) async_replicate_to_other_regions(user_id, data) -
Conflict resolution:
# Last-write-wins with vector clocks def resolve_conflict(user_data_us, user_data_eu): if user_data_us.timestamp > user_data_eu.timestamp: return user_data_us else: return user_data_eu -
Failover strategy:
# Automatic failover within region def handle_primary_failure(region): # Promote replica to primary new_primary = elect_replica(region) # Update DNS/load balancer update_routing(region, new_primary) # Notify other regions notify_regions_of_failover(region, new_primary) -
Monitoring:
- Replication lag (alert if > 5 seconds)
- Primary health (heartbeat monitoring)
- Cross-region latency
- Conflict rates
Trade-offs:
- Consistency: Eventual consistency across regions (acceptable for social media)
- Latency: Low latency within region, higher across regions
- Complexity: Multi-region replication adds operational complexity
- Cost: Multiple replicas and cross-region bandwidth
Key Takeaways
- Replication provides high availability—if one server fails, others can continue
- Master-slave (primary-replica) is most common—primary handles writes, replicas handle reads
- Synchronous replication ensures no data loss but slows writes
- Asynchronous replication is faster but may lose data if primary fails
- Read replicas distribute read load and improve performance
- Geographic replication places data closer to users for lower latency
- Replication lag is inevitable with async replication—design for eventual consistency
- Failover requires automatic promotion of replica to primary
- Monitor replication lag and set up alerts for high lag
- Choose based on requirements—consistency vs. performance trade-offs
- Multi-master replication is complex but provides no single point of failure
- Replication topologies (chain, star, tree) have different trade-offs
Keep exploring
Database concepts build on each other. Explore related topics to deepen your understanding of how data systems work.