← Back to databases

Database Topic

Data Replication

Master database replication strategies for high availability and performance. Essential for distributed systems and database interviews.

Data replication is the process of copying and maintaining database objects across multiple database servers to improve availability, performance, and fault tolerance.


Why Replicate?

High Availability

If one server fails, others can continue serving requests.

Primary Server (fails)
Secondary Server (takes over)

Performance

Distribute read load across multiple servers.

Read Request → Load Balancer → Replica 1, 2, or 3
Write Request → Primary Server

Geographic Distribution

Place data closer to users.

US Users → US Replica
EU Users → EU Replica
Asia Users → Asia Replica

Disaster Recovery

Backup copies in different locations.


Replication Models

Master-Slave (Primary-Replica)

One primary server accepts writes, replicas handle reads.

Primary (Writes + Reads)
    ↓ (replication)
Replica 1 (Reads only)
Replica 2 (Reads only)
Replica 3 (Reads only)

Characteristics:

  • Simple to understand
  • Single point of failure (primary)
  • Asynchronous or synchronous replication
  • Replicas can lag behind primary

Use case: Read scaling, backup

Master-Master (Multi-Master)

Multiple servers can accept writes.

Master 1 (Writes + Reads) ←→ Master 2 (Writes + Reads)
    ↓                           ↓
Replica 1                    Replica 2

Characteristics:

  • No single point of failure
  • More complex conflict resolution
  • Can have write conflicts
  • Requires careful design

Use case: Multi-region deployments, high availability


Replication Methods

Statement-Based Replication (SBR)

Replicate SQL statements.

-- Primary executes:
UPDATE users SET balance = balance - 100 WHERE id = 1;

-- Replica executes same statement:
UPDATE users SET balance = balance - 100 WHERE id = 1;

Pros:

  • Compact (only SQL statements)
  • Easy to understand

Cons:

  • Non-deterministic functions (NOW(), RAND())
  • Stored procedures/triggers can cause issues

Row-Based Replication (RBR)

Replicate changed rows.

Primary: Row (id=1, balance=900) changed to (id=1, balance=800)
Replica: Apply same row change

Pros:

  • More reliable (exact data changes)
  • Handles non-deterministic functions

Cons:

  • Larger log size
  • More network bandwidth

Mixed Replication

Uses statement-based by default, switches to row-based when needed.


Synchronization Modes

Synchronous Replication

Primary waits for replica to confirm before committing.

1. Primary writes to log
2. Primary sends to replica
3. Replica writes to log
4. Replica confirms to primary
5. Primary commits transaction

Pros:

  • No data loss if primary fails
  • Strong consistency

Cons:

  • Slower writes (waits for replica)
  • Replica failure blocks primary

Use case: Financial systems, critical data

Asynchronous Replication

Primary doesn't wait for replica confirmation.

1. Primary writes to log
2. Primary commits transaction
3. Primary sends to replica (async)
4. Replica applies changes

Pros:

  • Fast writes
  • Primary not blocked by replica issues

Cons:

  • Possible data loss if primary fails
  • Replica lag (eventual consistency)

Use case: Most web applications, read scaling

Semi-Synchronous Replication

Primary waits for at least one replica (not all).

Primary waits for at least 1 of N replicas to confirm

Balance: Between sync and async


Replication Topologies

Single Replica

Primary → Replica

Simple, but single point of failure for replica.

Chain Replication

Primary → Replica 1 → Replica 2 → Replica 3

Reduces load on primary, but longer replication chain.

Star Topology

        Primary
       /   |   \
Replica1 Replica2 Replica3

All replicas get updates directly from primary.

Tree Topology

        Primary
       /       \
Replica1      Replica2
    |             |
Replica3      Replica4

Hierarchical structure, good for geographic distribution.


Consistency Models

Strong Consistency

All replicas have same data at all times.

Write to Primary → Immediately visible on all replicas

Challenge: Requires synchronous replication, slower

Eventual Consistency

Replicas will converge to same state, but may differ temporarily.

Write to Primary → Eventually visible on all replicas
Read from Replica → May see slightly stale data

Challenge: Application must handle stale reads

Read Your Writes

User always sees their own writes, even if reading from replica.

User writes → Primary
User reads → Primary (or replica with user's writes)

Implementation: Route user's reads to primary or wait for replica to catch up


Common Patterns

Read Replicas

Use replicas for read-heavy workloads.

# Write to primary
db.write(user_data)  # Goes to primary

# Read from replica
user = replica_db.read(user_id)  # Goes to replica

Failover

Automatically promote replica to primary if primary fails.

1. Monitor primary health
2. Detect failure
3. Promote replica to primary
4. Update DNS/load balancer
5. Application reconnects to new primary

Geographic Replication

Replicate to different regions.

US Primary → EU Replica → Asia Replica

Benefits:

  • Lower latency for local users
  • Disaster recovery
  • Compliance (data residency)

Database-Specific Examples

PostgreSQL Streaming Replication

-- On primary: Enable WAL archiving
wal_level = replica
max_wal_senders = 3

-- On replica: Configure replication
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'

MySQL Replication

-- On primary: Enable binary logging
[mysqld]
log-bin=mysql-bin
server-id=1

-- On replica: Configure replication
CHANGE MASTER TO
  MASTER_HOST='primary.example.com',
  MASTER_USER='replicator',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;

MongoDB Replica Set

// Initialize replica set
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "primary:27017" },
    { _id: 1, host: "replica1:27017" },
    { _id: 2, host: "replica2:27017" }
  ]
});

Monitoring Replication

Key Metrics

  • Replication lag: How far behind replicas are
  • Throughput: Replication rate (bytes/second)
  • Errors: Replication failures
  • Health: Replica status (up/down)

Monitoring Queries

-- PostgreSQL: Check replication lag
SELECT 
    client_addr,
    state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) as lag_bytes
FROM pg_stat_replication;

-- MySQL: Check replication status
SHOW SLAVE STATUS\G

Best Practices

  1. Monitor lag: Set up alerts for replication lag
  2. Test failover: Regularly test failover procedures
  3. Backup replicas: Use replicas for backups to reduce primary load
  4. Geographic distribution: Place replicas close to users
  5. Consistent backups: Backup from replicas, not primary
  6. Handle stale reads: Design application to tolerate eventual consistency

Common Challenges

Replication Lag

Replicas fall behind primary, causing stale reads.

Solutions:

  • Monitor and alert on lag
  • Route critical reads to primary
  • Use read-your-writes consistency

Split-Brain

Multiple primaries think they're the primary.

Solutions:

  • Use quorum-based election
  • Proper failover procedures
  • Network partitioning handling

Write Conflicts

In multi-master setups, simultaneous writes to same data.

Solutions:

  • Conflict resolution strategies
  • Application-level conflict handling
  • Last-write-wins or merge strategies

Trade-offs

Synchronous Replication

  • Pros: No data loss, strong consistency
  • Cons: Slower writes, primary blocked by replica issues

Asynchronous Replication

  • Pros: Fast writes, primary not blocked
  • Cons: Possible data loss, eventual consistency

Multiple Replicas

  • Pros: High availability, read scaling
  • Cons: More complexity, more resources, consistency challenges

Interview Questions

1. Beginner Question

Q: What is database replication, and why is it important?

A: Database replication is the process of copying and maintaining database data across multiple servers.

Why important:

  • High availability: If one server fails, others can continue serving
  • Performance: Distribute read load across multiple servers
  • Geographic distribution: Place data closer to users
  • Disaster recovery: Backup copies in different locations

Example:

Primary Server (handles writes)
    ↓ (replication)
Replica 1 (handles reads)
Replica 2 (handles reads)
Replica 3 (handles reads)

Benefits:

  • Reads can be distributed across replicas (read scaling)
  • If primary fails, a replica can be promoted (high availability)
  • Replicas can be in different regions (lower latency)

2. Intermediate Question

Q: Explain the difference between synchronous and asynchronous replication. When would you use each?

A:

Synchronous replication:

  • Primary waits for replica to confirm before committing
  • Pros: No data loss if primary fails, strong consistency
  • Cons: Slower writes (waits for replica), primary blocked if replica fails
  • Use when: Financial systems, critical data where consistency > performance

Asynchronous replication:

  • Primary doesn't wait for replica confirmation
  • Pros: Fast writes, primary not blocked by replica issues
  • Cons: Possible data loss if primary fails, eventual consistency (replica lag)
  • Use when: Most web applications, read scaling, performance > strong consistency

Example:

-- Synchronous: Primary waits
Primary: Write → Send to Replica → Wait for confirmation → Commit
-- If replica is slow, write is slow

-- Asynchronous: Primary doesn't wait
Primary: WriteCommit → Send to Replica (async)
-- Write is fast, but replica may lag behind

Trade-off: Consistency vs. Performance. Choose based on requirements.

3. Senior-Level System Question

Q: Design a globally distributed database system for a social media platform with 1B users. The system needs to support: low latency worldwide, high availability, and handle 10M writes/second. How would you implement replication?

A:

Multi-region replication strategy:

  1. Regional primaries with cross-region replication:

    US Primary → US Replicas (3)
         ↓ (async replication)
    EU Primary → EU Replicas (3)
         ↓ (async replication)
    Asia Primary → Asia Replicas (3)
    
  2. Sharding by user_id:

    # Route users to their region
    def get_user_shard(user_id):
        region = get_user_region(user_id)  # Based on signup location
        return f"{region}_primary"
    
  3. Replication topology:

    • Within region: Synchronous replication (low latency, strong consistency)
    • Cross-region: Asynchronous replication (high latency, eventual consistency)
  4. Read strategy:

    def read_user_data(user_id):
        user_region = get_user_region(user_id)
        
        # Read from local replica (low latency)
        replica = get_replica(user_region)
        return replica.query_user(user_id)
    
  5. Write strategy:

    def write_user_data(user_id, data):
        user_region = get_user_region(user_id)
        primary = get_primary(user_region)
        
        # Write to local primary (synchronous within region)
        primary.write_user(user_id, data)
        
        # Replicate to other regions (asynchronous)
        async_replicate_to_other_regions(user_id, data)
    
  6. Conflict resolution:

    # Last-write-wins with vector clocks
    def resolve_conflict(user_data_us, user_data_eu):
        if user_data_us.timestamp > user_data_eu.timestamp:
            return user_data_us
        else:
            return user_data_eu
    
  7. Failover strategy:

    # Automatic failover within region
    def handle_primary_failure(region):
        # Promote replica to primary
        new_primary = elect_replica(region)
        
        # Update DNS/load balancer
        update_routing(region, new_primary)
        
        # Notify other regions
        notify_regions_of_failover(region, new_primary)
    
  8. Monitoring:

    • Replication lag (alert if > 5 seconds)
    • Primary health (heartbeat monitoring)
    • Cross-region latency
    • Conflict rates

Trade-offs:

  • Consistency: Eventual consistency across regions (acceptable for social media)
  • Latency: Low latency within region, higher across regions
  • Complexity: Multi-region replication adds operational complexity
  • Cost: Multiple replicas and cross-region bandwidth

Key Takeaways

  • Replication provides high availability—if one server fails, others can continue
  • Master-slave (primary-replica) is most common—primary handles writes, replicas handle reads
  • Synchronous replication ensures no data loss but slows writes
  • Asynchronous replication is faster but may lose data if primary fails
  • Read replicas distribute read load and improve performance
  • Geographic replication places data closer to users for lower latency
  • Replication lag is inevitable with async replication—design for eventual consistency
  • Failover requires automatic promotion of replica to primary
  • Monitor replication lag and set up alerts for high lag
  • Choose based on requirements—consistency vs. performance trade-offs
  • Multi-master replication is complex but provides no single point of failure
  • Replication topologies (chain, star, tree) have different trade-offs

Keep exploring

Database concepts build on each other. Explore related topics to deepen your understanding of how data systems work.