Topic Overview
Clock Synchronization (NTP, Lamport)
Learn how distributed systems synchronize clocks and order events using NTP and Lamport clocks.
In distributed systems, nodes have independent clocks that drift apart. Clock synchronization ensures nodes agree on time or can order events correctly.
The Problem
Physical clocks drift: Even with atomic clocks, clocks drift at different rates.
No global time: In distributed systems, there's no single source of truth for time.
Event ordering: Need to determine if event A happened before event B across nodes.
Physical Clock Synchronization: NTP
Network Time Protocol (NTP) synchronizes physical clocks across the network.
How NTP Works
- Client requests time from NTP server
- Server responds with timestamp
- Client calculates offset and adjusts clock
- Stratum hierarchy: Servers sync with higher-stratum servers
1class NTPClient {2 async synchronize(): Promise<void> {3 const server = this.selectNTPServer();45 const t0 = this.getLocalTime(); // Client send time6 const response = await server.getTime();7 const t1 = this.getLocalTime(); // Client receive time89 // Server timestamps: t2 (receive), t3 (send)10 const { t2, t3 } = response;
NTP Stratum
- Stratum 0: Atomic clocks, GPS
- Stratum 1: Servers directly connected to stratum 0
- Stratum 2: Servers sync with stratum 1
- And so on...
Logical Clocks: Lamport Clocks
Lamport clocks provide a logical ordering of events without synchronized physical clocks.
Lamport Timestamps
Each node maintains a counter that:
- Increments before each event
- Sends counter value with messages
- Updates to max(local, received) + 1 on receive
1class LamportClock {2 private counter: number = 0;34 tick(): number {5 this.counter++;6 return this.counter;7 }89 send(message: Message): Message {10 this.counter++;11 return {12 ...message,13 timestamp: this.counter14 };15 }1617 receive(message: Message): void {
Properties
- Happens-before relation: If A → B (causally), then L(A) < L(B)
- Converse not true: L(A) < L(B) doesn't mean A → B (may be concurrent)
- Total ordering: Can break ties using node ID
Vector Clocks
Vector clocks improve on Lamport clocks by detecting concurrent events.
Vector Timestamps
Each node maintains a vector of counters (one per node).
1class VectorClock {2 private vector: Map<string, number> = new Map();3 private myNodeId: string;45 tick(): Map<string, number> {6 const current = this.vector.get(this.myNodeId) || 0;7 this.vector.set(this.myNodeId, current + 1);8 return new Map(thisvector
Vector Clock Properties
- Causality detection: V(A) < V(B) iff A causally precedes B
- Concurrency detection: If neither V(A) < V(B) nor V(B) < V(A), events are concurrent
- More accurate than Lamport clocks but requires O(n) space per timestamp
Examples
Distributed Event Ordering
1class EventOrdering {2 private lamport: LamportClock = new LamportClock();3 private events: Event[] = [];45 async processEvent(event: Event): Promise<void> {6 const timestamp = this.lamport.tick();7 this.events.push({ ...event, timestamp });89 // Send to other nodes10 await this.broadcast({ event timestamp
Causality Detection with Vector Clocks
1class CausalOrdering {2 private vector: VectorClock = new VectorClock();3 private pending: Map<string, Message> = new Map();45 async send(message: Message): Promise<void> {6 const vector = this.vector.send(message);7 await this.broadcast({ ...message, vector });8 }910 async receivemessage Message
Common Pitfalls
- Assuming physical clocks are synchronized: They drift, use logical clocks for ordering
- Using Lamport clocks for concurrency detection: Can't detect concurrent events, use vector clocks
- Not handling clock skew: NTP can have errors, use logical clocks for critical ordering
- Vector clock size: Grows with number of nodes. Fix: Use bounded vectors or hierarchical clocks
- Not updating on receive: Must update clock when receiving messages
- Ignoring network delays: NTP calculations must account for network latency
- Clock rollover: Clocks can wrap around, handle overflow
Interview Questions
Beginner
Q: Why do we need clock synchronization in distributed systems?
A: Distributed systems need clock synchronization because:
- Event ordering: Determine if event A happened before event B across nodes
- Causality: Understand cause-and-effect relationships
- Consistency: Ensure operations happen in correct order
- Debugging: Timestamps help debug distributed systems
- Scheduling: Coordinate timed operations across nodes
Challenge: Physical clocks drift and aren't perfectly synchronized, so we use logical clocks (Lamport, Vector) for ordering.
Intermediate
Q: Compare Lamport clocks and Vector clocks. When would you use each?
A:
Lamport Clocks:
- Single counter per node
- O(1) space per timestamp
- Total ordering: Can order all events
- Limitation: Can't detect concurrent events (L(A) < L(B) doesn't mean A → B)
Vector Clocks:
- Vector of counters (one per node)
- O(n) space per timestamp
- Causality detection: V(A) < V(B) iff A causally precedes B
- Concurrency detection: Can detect if events are concurrent
When to use:
Lamport clocks:
- Simple event ordering (don't need to detect concurrency)
- Large-scale systems (space efficient)
- When total ordering is sufficient
Vector clocks:
- Need to detect concurrent events
- Need precise causality tracking
- Conflict resolution (detect concurrent writes)
- Small to medium clusters (space overhead acceptable)
Example: Database replication - use vector clocks to detect concurrent writes that need conflict resolution.
Senior
Q: Design a distributed system that maintains causal ordering of messages. How do you handle clock synchronization, message delivery, and ensure no message is delivered before its causally preceding messages?
A:
Architecture:
- Vector clocks for causality tracking
- Message buffering for out-of-order messages
- Delivery guarantees to ensure causal ordering
Design:
1class CausalMessageSystem {2 private vector: VectorClock;3 private pending: Map<string, PendingMessage> = new Map();4 private delivered: Set<string> = new Set();56 async send(message: Message): Promise<void> {7 // Increment vector clock and attach to message8 const vector = this.vector.send(message);910 const causalMessage = {11 ...message,
Optimizations:
- Bounded vectors: For large clusters, use hierarchical or compressed vectors
- Garbage collection: Periodically clean up old message IDs
- Piggybacking: Include vector clock in all messages (even acks)
- Batching: Batch multiple messages with single vector update
Handling Network Issues:
- Retransmission: Retry failed messages
- Timeout: Detect missing messages, request retransmission
- Partition handling: Buffer messages during partition, deliver when partition heals
Monitoring:
- Track pending message count
- Measure delivery latency
- Detect causality violations (should never happen)
- Monitor vector clock size
-
Physical clocks drift: Use NTP for approximate synchronization, but don't rely on it for ordering
-
Lamport clocks: Provide total ordering with O(1) space, but can't detect concurrency
-
Vector clocks: Detect causality and concurrency with O(n) space per timestamp
-
Causal ordering: Messages must be delivered after all causally preceding messages
-
Message buffering: Store out-of-order messages until causally ready
-
NTP hierarchy: Stratum-based hierarchy for scalable time synchronization
-
Trade-offs: Lamport (simple, space-efficient) vs Vector (accurate, space-intensive)
-
Use logical clocks for event ordering in distributed systems, not physical clocks
-
Leader Election - Ordering events during leader elections
-
Gossip Protocol - Information dissemination with causal ordering
-
Distributed Transactions - Ordering transaction events
-
Partition Tolerance - Handling clock synchronization during partitions
-
Distributed Logging - Correlating logs with timestamps
Key Takeaways
Physical clocks drift: Use NTP for approximate synchronization, but don't rely on it for ordering
Lamport clocks: Provide total ordering with O(1) space, but can't detect concurrency
Vector clocks: Detect causality and concurrency with O(n) space per timestamp
Causal ordering: Messages must be delivered after all causally preceding messages
Message buffering: Store out-of-order messages until causally ready
NTP hierarchy: Stratum-based hierarchy for scalable time synchronization
Trade-offs: Lamport (simple, space-efficient) vs Vector (accurate, space-intensive)
Use logical clocks for event ordering in distributed systems, not physical clocks
What's next?