Topic Overview

Clock Synchronization (NTP, Lamport)

Learn how distributed systems synchronize clocks and order events using NTP and Lamport clocks.

Intermediate11 min read

In distributed systems, nodes have independent clocks that drift apart. Clock synchronization ensures nodes agree on time or can order events correctly.


The Problem

Physical clocks drift: Even with atomic clocks, clocks drift at different rates.

No global time: In distributed systems, there's no single source of truth for time.

Event ordering: Need to determine if event A happened before event B across nodes.


Physical Clock Synchronization: NTP

Network Time Protocol (NTP) synchronizes physical clocks across the network.

How NTP Works

  1. Client requests time from NTP server
  2. Server responds with timestamp
  3. Client calculates offset and adjusts clock
  4. Stratum hierarchy: Servers sync with higher-stratum servers
1class NTPClient {
2 async synchronize(): Promise<void> {
3 const server = this.selectNTPServer();
4
5 const t0 = this.getLocalTime(); // Client send time
6 const response = await server.getTime();
7 const t1 = this.getLocalTime(); // Client receive time
8
9 // Server timestamps: t2 (receive), t3 (send)
10 const { t2, t3 } = response;

NTP Stratum

  • Stratum 0: Atomic clocks, GPS
  • Stratum 1: Servers directly connected to stratum 0
  • Stratum 2: Servers sync with stratum 1
  • And so on...

Logical Clocks: Lamport Clocks

Lamport clocks provide a logical ordering of events without synchronized physical clocks.

Lamport Timestamps

Each node maintains a counter that:

  • Increments before each event
  • Sends counter value with messages
  • Updates to max(local, received) + 1 on receive
1class LamportClock {
2 private counter: number = 0;
3
4 tick(): number {
5 this.counter++;
6 return this.counter;
7 }
8
9 send(message: Message): Message {
10 this.counter++;
11 return {
12 ...message,
13 timestamp: this.counter
14 };
15 }
16
17 receive(message: Message): void {

Properties

  • Happens-before relation: If A → B (causally), then L(A) < L(B)
  • Converse not true: L(A) < L(B) doesn't mean A → B (may be concurrent)
  • Total ordering: Can break ties using node ID

Vector Clocks

Vector clocks improve on Lamport clocks by detecting concurrent events.

Vector Timestamps

Each node maintains a vector of counters (one per node).

1class VectorClock {
2 private vector: Map<string, number> = new Map();
3 private myNodeId: string;
4
5 tick(): Map<string, number> {
6 const current = this.vector.get(this.myNodeId) || 0;
7 this.vector.set(this.myNodeId, current + 1);
8 return new Map(thisvector

Vector Clock Properties

  • Causality detection: V(A) < V(B) iff A causally precedes B
  • Concurrency detection: If neither V(A) < V(B) nor V(B) < V(A), events are concurrent
  • More accurate than Lamport clocks but requires O(n) space per timestamp

Examples

Distributed Event Ordering

1class EventOrdering {
2 private lamport: LamportClock = new LamportClock();
3 private events: Event[] = [];
4
5 async processEvent(event: Event): Promise<void> {
6 const timestamp = this.lamport.tick();
7 this.events.push({ ...event, timestamp });
8
9 // Send to other nodes
10 await this.broadcast({ event timestamp

Causality Detection with Vector Clocks

1class CausalOrdering {
2 private vector: VectorClock = new VectorClock();
3 private pending: Map<string, Message> = new Map();
4
5 async send(message: Message): Promise<void> {
6 const vector = this.vector.send(message);
7 await this.broadcast({ ...message, vector });
8 }
9
10 async receivemessage Message

Common Pitfalls

  • Assuming physical clocks are synchronized: They drift, use logical clocks for ordering
  • Using Lamport clocks for concurrency detection: Can't detect concurrent events, use vector clocks
  • Not handling clock skew: NTP can have errors, use logical clocks for critical ordering
  • Vector clock size: Grows with number of nodes. Fix: Use bounded vectors or hierarchical clocks
  • Not updating on receive: Must update clock when receiving messages
  • Ignoring network delays: NTP calculations must account for network latency
  • Clock rollover: Clocks can wrap around, handle overflow

Interview Questions

Beginner

Q: Why do we need clock synchronization in distributed systems?

A: Distributed systems need clock synchronization because:

  • Event ordering: Determine if event A happened before event B across nodes
  • Causality: Understand cause-and-effect relationships
  • Consistency: Ensure operations happen in correct order
  • Debugging: Timestamps help debug distributed systems
  • Scheduling: Coordinate timed operations across nodes

Challenge: Physical clocks drift and aren't perfectly synchronized, so we use logical clocks (Lamport, Vector) for ordering.


Intermediate

Q: Compare Lamport clocks and Vector clocks. When would you use each?

A:

Lamport Clocks:

  • Single counter per node
  • O(1) space per timestamp
  • Total ordering: Can order all events
  • Limitation: Can't detect concurrent events (L(A) < L(B) doesn't mean A → B)

Vector Clocks:

  • Vector of counters (one per node)
  • O(n) space per timestamp
  • Causality detection: V(A) < V(B) iff A causally precedes B
  • Concurrency detection: Can detect if events are concurrent

When to use:

Lamport clocks:

  • Simple event ordering (don't need to detect concurrency)
  • Large-scale systems (space efficient)
  • When total ordering is sufficient

Vector clocks:

  • Need to detect concurrent events
  • Need precise causality tracking
  • Conflict resolution (detect concurrent writes)
  • Small to medium clusters (space overhead acceptable)

Example: Database replication - use vector clocks to detect concurrent writes that need conflict resolution.


Senior

Q: Design a distributed system that maintains causal ordering of messages. How do you handle clock synchronization, message delivery, and ensure no message is delivered before its causally preceding messages?

A:

Architecture:

  • Vector clocks for causality tracking
  • Message buffering for out-of-order messages
  • Delivery guarantees to ensure causal ordering

Design:

1class CausalMessageSystem {
2 private vector: VectorClock;
3 private pending: Map<string, PendingMessage> = new Map();
4 private delivered: Set<string> = new Set();
5
6 async send(message: Message): Promise<void> {
7 // Increment vector clock and attach to message
8 const vector = this.vector.send(message);
9
10 const causalMessage = {
11 ...message,

Optimizations:

  • Bounded vectors: For large clusters, use hierarchical or compressed vectors
  • Garbage collection: Periodically clean up old message IDs
  • Piggybacking: Include vector clock in all messages (even acks)
  • Batching: Batch multiple messages with single vector update

Handling Network Issues:

  • Retransmission: Retry failed messages
  • Timeout: Detect missing messages, request retransmission
  • Partition handling: Buffer messages during partition, deliver when partition heals

Monitoring:

  • Track pending message count
  • Measure delivery latency
  • Detect causality violations (should never happen)
  • Monitor vector clock size

  • Physical clocks drift: Use NTP for approximate synchronization, but don't rely on it for ordering

  • Lamport clocks: Provide total ordering with O(1) space, but can't detect concurrency

  • Vector clocks: Detect causality and concurrency with O(n) space per timestamp

  • Causal ordering: Messages must be delivered after all causally preceding messages

  • Message buffering: Store out-of-order messages until causally ready

  • NTP hierarchy: Stratum-based hierarchy for scalable time synchronization

  • Trade-offs: Lamport (simple, space-efficient) vs Vector (accurate, space-intensive)

  • Use logical clocks for event ordering in distributed systems, not physical clocks

  • Leader Election - Ordering events during leader elections

  • Gossip Protocol - Information dissemination with causal ordering

  • Distributed Transactions - Ordering transaction events

  • Partition Tolerance - Handling clock synchronization during partitions

  • Distributed Logging - Correlating logs with timestamps

Key Takeaways

Physical clocks drift: Use NTP for approximate synchronization, but don't rely on it for ordering

Lamport clocks: Provide total ordering with O(1) space, but can't detect concurrency

Vector clocks: Detect causality and concurrency with O(n) space per timestamp

Causal ordering: Messages must be delivered after all causally preceding messages

Message buffering: Store out-of-order messages until causally ready

NTP hierarchy: Stratum-based hierarchy for scalable time synchronization

Trade-offs: Lamport (simple, space-efficient) vs Vector (accurate, space-intensive)

Use logical clocks for event ordering in distributed systems, not physical clocks


About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.