Replication Lag

Learn about replication lag in distributed databases and how to handle it.

Replication lag is the delay between when data is written to the primary and when it's available on replicas.


What is Replication Lag?

Definition: The time between a write being committed on the primary and that same write being applied on a given replica.

Causes:

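  • Network latency: Changes take time to travel from the primary to each replica
  • Replica processing time: The replica has to apply each change, often while also serving reads
  • High write load: The primary produces changes faster than replicas can apply them
  • Network congestion: Queued or retransmitted traffic delays the replication stream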

Impact

Stale reads: A read served by a replica may return old data

Inconsistency: Different replicas may return different data for the same query

User experience: Users may not see their own writes immediately
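
To make the stale-read case concrete, here is a minimal in-memory sketch. The Primary and Replica classes and the catchUp method are hypothetical stand-ins, not a real database API. The replica only sees writes when catchUp is called, which plays the role of replication lag.

```typescript
// Hypothetical in-memory model, for illustration only.
type LogEntry = { key: string; value: string };

class Primary {
  readonly log: LogEntry[] = [];
  private data = new Map<string, string>();

  write(key: string, value: string): void {
    this.data.set(key, value);
    this.log.push({ key, value }); // replicas consume this log later
  }

  read(key: string): string | undefined {
    return this.data.get(key);
  }
}

class Replica {
  private data = new Map<string, string>();
  private applied = 0; // number of log entries applied so far

  // Apply every log entry this replica has not seen yet.
  catchUp(primary: Primary): void {
    for (const entry of primary.log.slice(this.applied)) {
      this.data.set(entry.key, entry.value);
    }
    this.applied = primary.log.length;
  }

  read(key: string): string | undefined {
    return this.data.get(key);
  }
}

const primary = new Primary();
const replica = new Replica();

primary.write("name", "Ada");
replica.catchUp(primary);
primary.write("name", "Grace"); // committed on the primary, not yet replicated

const staleValue = replica.read("name");    // "Ada": a stale read
const currentValue = primary.read("name");  // "Grace"
replica.catchUp(primary);
const caughtUpValue = replica.read("name"); // "Grace"
```

Between the second write and the second catchUp call, the replica answers with the old value. That window is the lag the rest of this topic tries to measure and work around.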


Measuring Lag

class ReplicationMonitor {
  async measureLag(): Promise<number> {
    // Get primary's last committed timestamp
    const primaryTime = await this.primary.getLastCommittedTime();
    
    // Get replica's last applied timestamp
    const replicaTime = await this.replica.getLastAppliedTime();
    
    // Lag in milliseconds, clamped at zero because the two clocks may be skewed
    return Math.max(0, primaryTime - replicaTime);
  }

  async checkLag(): Promise<void> {
    const lag = await this.measureLag();
    
    if (lag > this.threshold) {
      this.alert('High replication lag', lag);
    }
  }
}
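
Timestamps taken on two machines can disagree because of clock skew, so some setups compare positions in the replication log instead of wall-clock times. The following is a minimal sketch of that idea, assuming each node can report a monotonically increasing log position. The LogPositionSource interface, its currentPosition method, and the stub values are hypothetical.

```typescript
// Hypothetical interface: any node that can report how far it is in the replication log.
interface LogPositionSource {
  currentPosition(): number;
}

// Lag expressed as the number of log entries the replica still has to apply.
function positionLag(primary: LogPositionSource, replica: LogPositionSource): number {
  // Clamp at zero: the two positions are sampled at slightly different moments.
  return Math.max(0, primary.currentPosition() - replica.currentPosition());
}

function isLagging(lagEntries: number, thresholdEntries: number): boolean {
  return lagEntries > thresholdEntries;
}

// Usage with fixed stub values
const primaryNode: LogPositionSource = { currentPosition: () => 1200 };
const replicaNode: LogPositionSource = { currentPosition: () => 1150 };
const lagEntries = positionLag(primaryNode, replicaNode); // 50
const shouldAlert = isLagging(lagEntries, 100);           // false
```

A position difference says how much work is pending, not how long it will take. Turning it into a time estimate requires knowing how fast the replica applies entries.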

Handling Lag

Read Your Writes

Route a user's reads to the primary for a short window after that user writes, so they always see their own changes.

class ReadYourWrites {
  private userWrites: Map<string, number> = new Map();

  async write(userId: string, data: any): Promise<void> {
    await this.primary.write(data);
    this.userWrites.set(userId, Date.now());
  }

  async read(userId: string, key: string): Promise<Data> {
    const lastWrite = this.userWrites.get(userId);
    const lag = await this.getReplicationLag();
    
    // If user wrote recently, read from primary
    if (lastWrite && (Date.now() - lastWrite) < lag) {
      return await this.primary.read(key);
    }
    
    // Otherwise, read from replica
    return await this.replica.read(key);
  }
}
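
The time-window check above depends on an accurate, current lag estimate. An alternative sketch tracks log positions instead: remember the position of each user's latest write, and use a replica only once it has applied at least that position. This assumes the primary returns a position for every write and the replica can report its applied position. PositionTrackingRouter and its methods are hypothetical names.

```typescript
type ReadTarget = "primary" | "replica";

// Hypothetical router that remembers the log position of each user's latest write.
class PositionTrackingRouter {
  private lastWritePosition = new Map<string, number>();

  // Call after a successful write with the position the primary reported for it.
  recordWrite(userId: string, position: number): void {
    const previous = this.lastWritePosition.get(userId) ?? 0;
    this.lastWritePosition.set(userId, Math.max(previous, position));
  }

  // The replica is safe for this user once it has applied their latest write.
  chooseTarget(userId: string, replicaAppliedPosition: number): ReadTarget {
    const required = this.lastWritePosition.get(userId) ?? 0;
    return replicaAppliedPosition >= required ? "replica" : "primary";
  }
}

// Usage
const router = new PositionTrackingRouter();
router.recordWrite("user-1", 42);
const beforeCatchUp = router.chooseTarget("user-1", 41); // "primary"
const afterCatchUp = router.chooseTarget("user-1", 42);  // "replica"
const otherUser = router.chooseTarget("user-2", 0);      // "replica"
```

Because the decision is based on what the replica has actually applied, it does not need a safety margin on top of the measured lag.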

Monotonic Reads

Ensure a user never sees data older than what they have already read, even when successive reads hit different replicas.

class MonotonicReads {
  private userReadTimestamps: Map<string, number> = new Map();

  async read(userId: string, key: string): Promise<Data> {
    const lastRead = this.userReadTimestamps.get(userId) || 0;
    const replicaTime = await this.replica.getLastAppliedTime();
    
    // Read from the replica only if it is at least as fresh as this user's last read
    if (replicaTime >= lastRead) {
      const data = await this.replica.read(key);
      this.userReadTimestamps.set(userId, replicaTime);
      return data;
    }
    
    // Otherwise, wait until the replica reaches lastRead
    // (falling back to the primary is the alternative)
    await this.waitForReplica(replicaTime, lastRead);
    return await this.replica.read(key);
  }
}
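
Another way to get monotonic reads, without tracking timestamps, is to send each user's reads to the same replica every time, for example by hashing the user ID. A minimal sketch, assuming a fixed list of replicas; hashString and pickReplica are hypothetical helpers, and the hash is a simple illustrative one rather than a recommended choice.

```typescript
// Simple deterministic string hash (illustrative only), kept in unsigned 32-bit range.
function hashString(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Always map the same user to the same replica so their reads never move backwards.
function pickReplica<T>(userId: string, replicas: readonly T[]): T {
  if (replicas.length === 0) {
    throw new Error("no replicas available");
  }
  return replicas[hashString(userId) % replicas.length] as T;
}

// Usage
const replicaNames = ["replica-a", "replica-b", "replica-c"];
const firstChoice = pickReplica("alice", replicaNames);
const secondChoice = pickReplica("alice", replicaNames); // same replica as firstChoice
```

The guarantee only holds while the replica list is stable. If the chosen replica fails or the list changes, the user is rerouted and may briefly see older data unless a timestamp check like the one above is also applied.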

Examples

Database Replication

class ReplicatedDatabase {
  async handleLag(): Promise<void> {
    const lag = await this.measureLag();
    
    if (lag > 1000) { // 1 second
      // Route critical reads to primary
      this.usePrimaryForCritical = true;
    } else {
      // Can use replicas
      this.usePrimaryForCritical = false;
    }
  }
}
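
With a single threshold, routing can flip back and forth when lag hovers around 1 second. One possible refinement is to use two thresholds: switch critical reads to the primary above the higher one, and switch back only after lag drops below the lower one. The sketch below assumes this two-threshold approach. LagAwareRouter, observeLag, route, and the 500 ms exit threshold are hypothetical choices, not part of the example above.

```typescript
type ReadTarget = "primary" | "replica";

// Hypothetical lag-aware router with separate enter and exit thresholds.
class LagAwareRouter {
  private usePrimaryForCritical = false;

  constructor(
    private readonly enterThresholdMs: number = 1000,
    private readonly exitThresholdMs: number = 500,
  ) {}

  // Feed each new lag measurement into the router.
  observeLag(lagMs: number): void {
    if (lagMs > this.enterThresholdMs) {
      this.usePrimaryForCritical = true;
    } else if (lagMs < this.exitThresholdMs) {
      this.usePrimaryForCritical = false;
    }
    // Between the two thresholds the previous decision is kept.
  }

  // Only critical reads are moved; everything else stays on replicas.
  route(isCritical: boolean): ReadTarget {
    return isCritical && this.usePrimaryForCritical ? "primary" : "replica";
  }
}

// Usage
const lagRouter = new LagAwareRouter();
lagRouter.observeLag(1500);
const criticalDuringLag = lagRouter.route(true);     // "primary"
const nonCriticalDuringLag = lagRouter.route(false); // "replica"
```

Keeping non-critical reads on replicas matters here: moving all traffic to the primary during a lag spike adds load to the node that is already producing changes faster than replicas can apply them.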

Common Pitfalls

  • Ignoring lag: Users see stale data. Fix: Measure lag and take it into account when routing reads
  • Not routing critical reads: Important reads go to a stale replica. Fix: Send reads that must be current to the primary
  • No lag monitoring: You don't know when lag is high. Fix: Monitor continuously and alert on a threshold
  • Assuming zero lag: Asynchronous replicas always have some lag. Fix: Design for lag

Interview Questions

Beginner

Q: What is replication lag and why does it matter?

A: Replication lag is the delay between when data is written to the primary database and when it's available on replicas.

Why it matters:

  • Stale reads: Reading from a replica may return old data
  • Inconsistency: Different replicas may return different data for the same query
  • User experience: Users may not see their own writes immediately, which is confusing

Example: A user updates their profile, but after a refresh the old data appears because the read was served by a replica that had not yet applied the update.


Intermediate

Q: How do you handle replication lag in a read replica setup?

A:

Strategies:

  1. Read your writes: Route user's reads to primary after their writes
  2. Monotonic reads: Ensure a user never sees data older than what they have already read
  3. Lag-aware routing: Route critical reads to primary if lag is high
  4. Wait for replication: Wait for replica to catch up before reading

Implementation:

  • Track user's last write time
  • If recent write, read from primary
  • Otherwise, read from replica
  • Monitor lag and adjust routing
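
The "wait for replication" strategy can be sketched as a polling loop with a timeout. This is a minimal illustration, assuming the caller can ask the replica for its applied position. The waitForReplica signature, the sleep helper, and the default timeout and poll interval are hypothetical.

```typescript
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the replica has applied targetPosition, or give up after timeoutMs.
// Resolves to true if the replica caught up, false on timeout.
async function waitForReplica(
  getAppliedPosition: () => Promise<number>,
  targetPosition: number,
  timeoutMs: number = 500,
  pollIntervalMs: number = 20,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if ((await getAppliedPosition()) >= targetPosition) {
      return true;
    }
    if (Date.now() >= deadline) {
      return false;
    }
    await sleep(pollIntervalMs);
  }
}
```

A caller would typically fall back to the primary when the function resolves to false, so a slow replica costs a bounded delay rather than a failed read.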

Senior

Q: Design a system that handles replication lag for a social media platform. Users post content that must be visible to followers. How do you ensure consistency while maintaining performance?

A:

Design:

class SocialMediaReplication {
  async postContent(userId: string, content: Content): Promise<void> {
    // Write to primary
    await this.primary.write(content);
    
    // Track user's last write
    this.userWrites.set(userId, Date.now());
    
    // Async replication to replicas
    this.replicateAsync(content);
  }

  async getFeed(userId: string): Promise<Content[]> {
    const lastWrite = this.userWrites.get(userId);
    const lag = await this.getReplicationLag();
    
    // If user posted recently, read from primary to see their own posts
    if (lastWrite && (Date.now() - lastWrite) < lag * 2) {
      return await this.primary.getFeed(userId);
    }
    
    // Otherwise, read from replica (faster, may miss very recent posts)
    return await this.replica.getFeed(userId);
  }

  async getPost(postId: string, viewerId: string): Promise<Post> {
    // Try the replica first; a very recent post may not be there yet
    // (assumes replica.getPost returns null or undefined for a missing post)
    const post = await this.replica.getPost(postId);
    
    // Not replicated yet, or the viewer is the author: read from primary
    // so authors always get read-your-writes
    if (!post || post.authorId === viewerId) {
      return await this.primary.getPost(postId);
    }
    
    // Others can use the copy already fetched from the replica
    return post;
  }

  // Handle high lag
  async handleHighLag(): Promise<void> {
    const lag = await this.measureLag();
    
    if (lag > 5000) { // 5 seconds
      // Route more reads to primary
      this.primaryReadRatio = 0.5; // 50% to primary
      
      // Alert operations
      this.alert('High replication lag', lag);
    }
  }
}

Optimizations:

  • Caching: Cache recent posts to reduce replica load
  • Fan-out: Write to follower timelines on post (not on read)
  • Eventual consistency: Accept that followers may see posts slightly later
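
The fan-out optimization can be illustrated with a small in-memory sketch. It assumes timelines are stored per user as lists of post IDs, newest first. TimelineStore and its methods are hypothetical, and a real system would persist timelines and handle very large follower counts differently.

```typescript
// Hypothetical in-memory fan-out-on-write model.
class TimelineStore {
  private followers = new Map<string, Set<string>>(); // authorId -> follower IDs
  private timelines = new Map<string, string[]>();    // userId -> post IDs, newest first

  follow(followerId: string, authorId: string): void {
    const current = this.followers.get(authorId) ?? new Set<string>();
    current.add(followerId);
    this.followers.set(authorId, current);
  }

  // Fan-out on write: push the post ID into the author's and every follower's timeline.
  post(authorId: string, postId: string): void {
    const followerIds = Array.from(this.followers.get(authorId) ?? new Set<string>());
    for (const userId of [authorId, ...followerIds]) {
      const timeline = this.timelines.get(userId) ?? [];
      timeline.unshift(postId);
      this.timelines.set(userId, timeline);
    }
  }

  // Reading a feed is a single lookup, with no per-author queries at read time.
  getTimeline(userId: string): string[] {
    return [...(this.timelines.get(userId) ?? [])];
  }
}

// Usage
const store = new TimelineStore();
store.follow("bob", "alice");
store.post("alice", "post-1");
store.post("alice", "post-2");
const bobFeed = store.getTimeline("bob"); // ["post-2", "post-1"]
```

The trade-off is more work at write time in exchange for cheap reads, which suits feeds that are read far more often than they are written.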

Key Takeaways

  • Replication lag is inevitable with asynchronous replication: Network and processing delays cause lag
  • Monitor lag: Continuously measure and alert on high lag
  • Read-your-writes: Route user's reads to primary after their writes
  • Lag-aware routing: Adjust read routing based on lag
  • Trade-offs: Consistency vs performance (primary vs replica reads)
  • Design for lag: Assume replicas are always slightly behind

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.