Topic Overview

Clock Synchronization (NTP, Lamport)

Learn how distributed systems synchronize clocks and order events using NTP and Lamport clocks.

In distributed systems, nodes have independent clocks that drift apart. Clock synchronization ensures nodes agree on time or can order events correctly.


The Problem

Physical clocks drift: Even with atomic clocks, clocks drift at different rates.

No global time: In distributed systems, there's no single source of truth for time.

Event ordering: Need to determine if event A happened before event B across nodes.


Physical Clock Synchronization: NTP

Network Time Protocol (NTP) synchronizes physical clocks across the network.

How NTP Works

  1. Client requests time from NTP server
  2. Server responds with timestamp
  3. Client calculates offset and adjusts clock
  4. Stratum hierarchy: Servers sync with higher-stratum servers
class NTPClient {
  async synchronize(): Promise<void> {
    const server = this.selectNTPServer();
    
    const t0 = this.getLocalTime(); // Client send time
    const response = await server.getTime();
    const t1 = this.getLocalTime(); // Client receive time
    
    // Server timestamps: t2 (receive), t3 (send)
    const { t2, t3 } = response;
    
    // Calculate offset and delay
    const offset = ((t2 - t0) + (t3 - t1)) / 2;
    const delay = (t1 - t0) - (t3 - t2);
    
    // Adjust clock
    this.adjustClock(offset);
  }
}

NTP Stratum

  • Stratum 0: Atomic clocks, GPS
  • Stratum 1: Servers directly connected to stratum 0
  • Stratum 2: Servers sync with stratum 1
  • And so on...

Logical Clocks: Lamport Clocks

Lamport clocks provide a logical ordering of events without synchronized physical clocks.

Lamport Timestamps

Each node maintains a counter that:

  • Increments before each event
  • Sends counter value with messages
  • Updates to max(local, received) + 1 on receive
class LamportClock {
  private counter: number = 0;

  tick(): number {
    this.counter++;
    return this.counter;
  }

  send(message: Message): Message {
    this.counter++;
    return {
      ...message,
      timestamp: this.counter
    };
  }

  receive(message: Message): void {
    this.counter = Math.max(this.counter, message.timestamp) + 1;
  }

  compare(timestamp1: number, timestamp2: number): 'before' | 'after' | 'concurrent' {
    if (timestamp1 < timestamp2) return 'before';
    if (timestamp1 > timestamp2) return 'after';
    return 'concurrent';
  }
}

Properties

  • Happens-before relation: If A โ†’ B (causally), then L(A) < L(B)
  • Converse not true: L(A) < L(B) doesn't mean A โ†’ B (may be concurrent)
  • Total ordering: Can break ties using node ID

Vector Clocks

Vector clocks improve on Lamport clocks by detecting concurrent events.

Vector Timestamps

Each node maintains a vector of counters (one per node).

class VectorClock {
  private vector: Map<string, number> = new Map();
  private myNodeId: string;

  tick(): Map<string, number> {
    const current = this.vector.get(this.myNodeId) || 0;
    this.vector.set(this.myNodeId, current + 1);
    return new Map(this.vector);
  }

  send(message: Message): Message {
    this.tick();
    return {
      ...message,
      vector: new Map(this.vector)
    };
  }

  receive(message: Message): void {
    const receivedVector = message.vector;
    
    // Update: max(local[i], received[i]) for all i
    for (const [nodeId, timestamp] of receivedVector) {
      const local = this.vector.get(nodeId) || 0;
      this.vector.set(nodeId, Math.max(local, timestamp));
    }
    
    // Increment own counter
    this.tick();
  }

  compare(v1: Map<string, number>, v2: Map<string, number>): 'before' | 'after' | 'concurrent' | 'equal' {
    let v1Less = false;
    let v2Less = false;

    const allNodes = new Set([...v1.keys(), ...v2.keys()]);

    for (const nodeId of allNodes) {
      const t1 = v1.get(nodeId) || 0;
      const t2 = v2.get(nodeId) || 0;

      if (t1 < t2) v1Less = true;
      if (t2 < t1) v2Less = true;
    }

    if (v1Less && !v2Less) return 'before';
    if (v2Less && !v1Less) return 'after';
    if (!v1Less && !v2Less) return 'equal';
    return 'concurrent';
  }
}

Vector Clock Properties

  • Causality detection: V(A) < V(B) iff A causally precedes B
  • Concurrency detection: If neither V(A) < V(B) nor V(B) < V(A), events are concurrent
  • More accurate than Lamport clocks but requires O(n) space per timestamp

Examples

Distributed Event Ordering

class EventOrdering {
  private lamport: LamportClock = new LamportClock();
  private events: Event[] = [];

  async processEvent(event: Event): Promise<void> {
    const timestamp = this.lamport.tick();
    this.events.push({ ...event, timestamp });
    
    // Send to other nodes
    await this.broadcast({ ...event, timestamp });
  }

  async receiveEvent(event: Event): Promise<void> {
    this.lamport.receive(event);
    this.events.push(event);
    
    // Sort by Lamport timestamp
    this.events.sort((a, b) => a.timestamp - b.timestamp);
  }
}

Causality Detection with Vector Clocks

class CausalOrdering {
  private vector: VectorClock = new VectorClock();
  private pending: Map<string, Message> = new Map();

  async send(message: Message): Promise<void> {
    const vector = this.vector.send(message);
    await this.broadcast({ ...message, vector });
  }

  async receive(message: Message): Promise<void> {
    // Check if we can deliver (all causally preceding messages received)
    if (this.canDeliver(message)) {
      this.deliver(message);
      this.deliverPending();
    } else {
      // Store for later
      this.pending.set(message.id, message);
    }
  }

  canDeliver(message: Message): boolean {
    const myVector = this.vector.getVector();
    
    // Check if all events that causally precede this message have been received
    for (const [nodeId, timestamp] of message.vector) {
      const myTimestamp = myVector.get(nodeId) || 0;
      if (timestamp > myTimestamp + 1) {
        // Missing a message from this node
        return false;
      }
    }
    
    return true;
  }
}

Common Pitfalls

  • Assuming physical clocks are synchronized: They drift, use logical clocks for ordering
  • Using Lamport clocks for concurrency detection: Can't detect concurrent events, use vector clocks
  • Not handling clock skew: NTP can have errors, use logical clocks for critical ordering
  • Vector clock size: Grows with number of nodes. Fix: Use bounded vectors or hierarchical clocks
  • Not updating on receive: Must update clock when receiving messages
  • Ignoring network delays: NTP calculations must account for network latency
  • Clock rollover: Clocks can wrap around, handle overflow

Interview Questions

Beginner

Q: Why do we need clock synchronization in distributed systems?

A: Distributed systems need clock synchronization because:

  • Event ordering: Determine if event A happened before event B across nodes
  • Causality: Understand cause-and-effect relationships
  • Consistency: Ensure operations happen in correct order
  • Debugging: Timestamps help debug distributed systems
  • Scheduling: Coordinate timed operations across nodes

Challenge: Physical clocks drift and aren't perfectly synchronized, so we use logical clocks (Lamport, Vector) for ordering.


Intermediate

Q: Compare Lamport clocks and Vector clocks. When would you use each?

A:

Lamport Clocks:

  • Single counter per node
  • O(1) space per timestamp
  • Total ordering: Can order all events
  • Limitation: Can't detect concurrent events (L(A) < L(B) doesn't mean A โ†’ B)

Vector Clocks:

  • Vector of counters (one per node)
  • O(n) space per timestamp
  • Causality detection: V(A) < V(B) iff A causally precedes B
  • Concurrency detection: Can detect if events are concurrent

When to use:

Lamport clocks:

  • Simple event ordering (don't need to detect concurrency)
  • Large-scale systems (space efficient)
  • When total ordering is sufficient

Vector clocks:

  • Need to detect concurrent events
  • Need precise causality tracking
  • Conflict resolution (detect concurrent writes)
  • Small to medium clusters (space overhead acceptable)

Example: Database replication - use vector clocks to detect concurrent writes that need conflict resolution.


Senior

Q: Design a distributed system that maintains causal ordering of messages. How do you handle clock synchronization, message delivery, and ensure no message is delivered before its causally preceding messages?

A:

Architecture:

  • Vector clocks for causality tracking
  • Message buffering for out-of-order messages
  • Delivery guarantees to ensure causal ordering

Design:

class CausalMessageSystem {
  private vector: VectorClock;
  private pending: Map<string, PendingMessage> = new Map();
  private delivered: Set<string> = new Set();

  async send(message: Message): Promise<void> {
    // Increment vector clock and attach to message
    const vector = this.vector.send(message);
    
    const causalMessage = {
      ...message,
      id: this.generateId(),
      vector: new Map(vector),
      sender: this.myNodeId
    };

    // Broadcast to all nodes
    await this.broadcast(causalMessage);
  }

  async receive(message: CausalMessage): Promise<void> {
    if (this.delivered.has(message.id)) {
      return; // Already delivered
    }

    // Check if we can deliver (causally ready)
    if (this.isCausallyReady(message)) {
      await this.deliver(message);
      this.deliverPending(); // Try to deliver pending messages
    } else {
      // Buffer for later
      this.pending.set(message.id, {
        message,
        receivedAt: Date.now()
      });
    }
  }

  isCausallyReady(message: CausalMessage): boolean {
    const myVector = this.vector.getVector();
    const msgVector = message.vector;

    // For each node, check if we've received all messages that causally precede this one
    for (const [nodeId, timestamp] of msgVector) {
      if (nodeId === message.sender) {
        // For sender, we must have received all messages up to timestamp-1
        const myTimestamp = myVector.get(nodeId) || 0;
        if (timestamp > myTimestamp + 1) {
          return false; // Missing a message from sender
        }
      } else {
        // For other nodes, we must have received at least up to their timestamp
        const myTimestamp = myVector.get(nodeId) || 0;
        if (timestamp > myTimestamp) {
          return false; // Missing messages from this node
        }
      }
    }

    return true;
  }

  async deliver(message: CausalMessage): Promise<void> {
    // Update vector clock
    this.vector.receive(message);
    
    // Mark as delivered
    this.delivered.add(message.id);
    
    // Remove from pending
    this.pending.delete(message.id);
    
    // Deliver to application
    await this.application.onMessage(message);
  }

  async deliverPending(): Promise<void> {
    // Try to deliver any pending messages that are now causally ready
    const ready = Array.from(this.pending.values())
      .filter(p => this.isCausallyReady(p.message));

    for (const { message } of ready) {
      await this.deliver(message);
    }
  }

  // Handle node failures and message retransmission
  async handleNodeFailure(nodeId: string): Promise<void> {
    // Remove failed node from vector clock
    this.vector.removeNode(nodeId);
    
    // Clean up pending messages from failed node
    for (const [id, pending] of this.pending) {
      if (pending.message.sender === nodeId) {
        this.pending.delete(id);
      }
    }
  }

  // Periodic cleanup of old delivered message IDs
  cleanup(): void {
    // Keep only recent delivered IDs (sliding window)
    const maxAge = 3600000; // 1 hour
    // Implementation depends on ID format
  }
}

Optimizations:

  • Bounded vectors: For large clusters, use hierarchical or compressed vectors
  • Garbage collection: Periodically clean up old message IDs
  • Piggybacking: Include vector clock in all messages (even acks)
  • Batching: Batch multiple messages with single vector update

Handling Network Issues:

  • Retransmission: Retry failed messages
  • Timeout: Detect missing messages, request retransmission
  • Partition handling: Buffer messages during partition, deliver when partition heals

Monitoring:

  • Track pending message count
  • Measure delivery latency
  • Detect causality violations (should never happen)
  • Monitor vector clock size

Key Takeaways

  • Physical clocks drift: Use NTP for approximate synchronization, but don't rely on it for ordering
  • Lamport clocks: Provide total ordering with O(1) space, but can't detect concurrency
  • Vector clocks: Detect causality and concurrency with O(n) space per timestamp
  • Causal ordering: Messages must be delivered after all causally preceding messages
  • Message buffering: Store out-of-order messages until causally ready
  • NTP hierarchy: Stratum-based hierarchy for scalable time synchronization
  • Trade-offs: Lamport (simple, space-efficient) vs Vector (accurate, space-intensive)
  • Use logical clocks for event ordering in distributed systems, not physical clocks

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.