Topic Overview
Clock Synchronization (NTP, Lamport)
Learn how distributed systems synchronize clocks and order events using NTP and Lamport clocks.
In distributed systems, nodes have independent clocks that drift apart. Clock synchronization ensures nodes agree on time or can order events correctly.
The Problem
Physical clocks drift: Every clock runs at a slightly different rate (quartz oscillators especially, atomic clocks far less), so node clocks diverge over time.
No global time: In distributed systems, there's no single source of truth for time.
Event ordering: Need to determine if event A happened before event B across nodes; the sketch below shows how even a small clock skew can get this wrong.
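As a minimal sketch of why this matters (the 50 ms skew is a made-up value, not a measurement): two nodes timestamp events with their own wall clocks, and even a small skew can reverse the apparent order of two causally related events.

// Hypothetical illustration: node B's wall clock runs 50 ms behind node A's.
const skewMs: Record<'A' | 'B', number> = { A: 0, B: -50 };

// Each node's "local clock" is real time plus its fixed skew.
function wallClock(node: 'A' | 'B'): number {
  return Date.now() + skewMs[node];
}

// Event 1 happens on A; A's message then triggers event 2 on B.
const t1 = wallClock('A');
const t2 = wallClock('B'); // causally later than t1...
console.log(t2 < t1);      // ...yet its physical timestamp is earlier: true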
Physical Clock Synchronization: NTP
Network Time Protocol (NTP) synchronizes physical clocks across the network.
How NTP Works
- Client sends a request to an NTP server, recording its local send time
- Server replies with its own receive and transmit timestamps
- Client records the reply's arrival time, computes the clock offset and round-trip delay, and adjusts its clock
- Stratum hierarchy: each server synchronizes with a server at a lower stratum number (closer to the reference clock)
class NTPClient {
  async synchronize(): Promise<void> {
    const server = this.selectNTPServer();

    const t0 = this.getLocalTime();          // client send time
    const response = await server.getTime();
    const t1 = this.getLocalTime();          // client receive time

    // Server timestamps: t2 = server receive time, t3 = server send time
    const { t2, t3 } = response;

    // Clock offset: average of the two one-way time differences
    const offset = ((t2 - t0) + (t3 - t1)) / 2;
    // Round-trip network delay, excluding server processing time;
    // useful for discarding samples taken over slow or asymmetric paths
    const delay = (t1 - t0) - (t3 - t2);

    // Apply the offset (real NTP daemons slew the clock gradually rather than stepping it)
    this.adjustClock(offset);
  }
}
NTP Stratum
- Stratum 0: Atomic clocks, GPS
- Stratum 1: Servers directly connected to stratum 0
- Stratum 2: Servers sync with stratum 1
- And so on, up to stratum 15 (stratum 16 means unsynchronized); a rough server-selection sketch follows
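As a rough illustration of how a client might use the hierarchy (the ServerSample shape and pickServer helper are hypothetical, not part of any NTP library), a client typically prefers a reachable server with the lowest stratum number, breaking ties by measured round-trip delay:

interface ServerSample {
  host: string;
  stratum: number;   // 1 = directly attached to a reference clock
  delayMs: number;   // measured round-trip delay
}

// Prefer the lowest stratum; among equal strata, prefer the lowest measured delay.
function pickServer(samples: ServerSample[]): ServerSample | undefined {
  return [...samples].sort(
    (a, b) => a.stratum - b.stratum || a.delayMs - b.delayMs
  )[0];
}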
Logical Clocks: Lamport Clocks
Lamport clocks provide a logical ordering of events without synchronized physical clocks.
Lamport Timestamps
Each node maintains a single counter and:
- increments the counter before each local event
- attaches the current counter value to every message it sends
- on receiving a message, sets the counter to max(local, received) + 1
class LamportClock {
  private counter: number = 0;

  // Local event: increment the counter and use the new value as the event's timestamp
  tick(): number {
    this.counter++;
    return this.counter;
  }

  // Send: increment the counter, then attach it to the outgoing message
  send(message: Message): Message {
    this.counter++;
    return {
      ...message,
      timestamp: this.counter
    };
  }

  // Receive: jump ahead of both the local counter and the sender's timestamp
  receive(message: Message): void {
    this.counter = Math.max(this.counter, message.timestamp) + 1;
  }

  // Note: timestamp1 < timestamp2 does NOT imply causality, and equal timestamps
  // do not prove concurrency; Lamport clocks alone cannot detect concurrent events.
  // Ties are usually broken by node ID to obtain a total order.
  compare(timestamp1: number, timestamp2: number): 'before' | 'after' | 'tie' {
    if (timestamp1 < timestamp2) return 'before';
    if (timestamp1 > timestamp2) return 'after';
    return 'tie';
  }
}
Properties
- Happens-before relation: If A → B (causally), then L(A) < L(B)
- Converse not true: L(A) < L(B) doesn't mean A → B (the events may be concurrent)
- Total ordering: Break ties using node ID to get a total order (see the sketch below)
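A small sketch of that tie-breaking rule (the StampedEvent shape is made up for illustration): order events by Lamport timestamp first, then by node ID, which gives a deterministic total order that respects happens-before.

interface StampedEvent {
  timestamp: number; // Lamport timestamp
  nodeId: string;    // used only to break ties
}

// Total order: by timestamp, then lexicographically by node ID.
function totalOrder(a: StampedEvent, b: StampedEvent): number {
  return a.timestamp - b.timestamp || a.nodeId.localeCompare(b.nodeId);
}

// Two events with the same Lamport timestamp are still deterministically ordered.
const sorted = [
  { timestamp: 3, nodeId: 'node-b' },
  { timestamp: 3, nodeId: 'node-a' },
  { timestamp: 1, nodeId: 'node-c' },
].sort(totalOrder);
// => node-c (1), node-a (3), node-b (3)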
Vector Clocks
Vector clocks improve on Lamport clocks by detecting concurrent events.
Vector Timestamps
Each node maintains a vector of counters (one per node).
class VectorClock {
  private vector: Map<string, number> = new Map();
  private myNodeId: string;

  constructor(myNodeId: string) {
    this.myNodeId = myNodeId;
  }

  // Local event: increment this node's own entry
  tick(): Map<string, number> {
    const current = this.vector.get(this.myNodeId) || 0;
    this.vector.set(this.myNodeId, current + 1);
    return new Map(this.vector);
  }

  // Snapshot of the current vector (used by the delivery checks further below)
  getVector(): Map<string, number> {
    return new Map(this.vector);
  }

  // Send: increment own entry, then attach a copy of the whole vector
  send(message: Message): Message {
    this.tick();
    return {
      ...message,
      vector: new Map(this.vector)
    };
  }

  // Receive: take the element-wise max with the incoming vector, then increment own entry
  receive(message: Message): void {
    const receivedVector = message.vector;
    // Update: max(local[i], received[i]) for all i
    for (const [nodeId, timestamp] of receivedVector) {
      const local = this.vector.get(nodeId) || 0;
      this.vector.set(nodeId, Math.max(local, timestamp));
    }
    // Increment own counter
    this.tick();
  }

  // V(A) < V(B) iff every entry of A is <= the matching entry of B and at least one is strictly less
  compare(v1: Map<string, number>, v2: Map<string, number>): 'before' | 'after' | 'concurrent' | 'equal' {
    let v1Less = false;
    let v2Less = false;
    const allNodes = new Set([...v1.keys(), ...v2.keys()]);
    for (const nodeId of allNodes) {
      const t1 = v1.get(nodeId) || 0;
      const t2 = v2.get(nodeId) || 0;
      if (t1 < t2) v1Less = true;
      if (t2 < t1) v2Less = true;
    }
    if (v1Less && !v2Less) return 'before';
    if (v2Less && !v1Less) return 'after';
    if (!v1Less && !v2Less) return 'equal';
    return 'concurrent';
  }
}
Vector Clock Properties
- Causality detection: V(A) < V(B) iff A causally precedes B
- Concurrency detection: If neither V(A) < V(B) nor V(B) < V(A), events are concurrent
- More precise than Lamport clocks, but requires O(n) space per timestamp; a short usage sketch follows
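A quick usage sketch of the compare() method above (the 'observer' instance and node names are illustrative): two replicas that each accept a write before exchanging state produce vectors that compare as concurrent, which is exactly the signal that conflict resolution is needed.

// Replica A and replica B each accept one write before exchanging anything.
const clock = new VectorClock('observer'); // any instance works; compare() is stateless

const writeAtA = new Map([['A', 1], ['B', 0]]);
const writeAtB = new Map([['A', 0], ['B', 1]]);

console.log(clock.compare(writeAtA, writeAtB)); // 'concurrent' -> needs conflict resolution

// After B merges A's write and writes again, B's new vector dominates A's old one.
const laterAtB = new Map([['A', 1], ['B', 2]]);
console.log(clock.compare(writeAtA, laterAtB)); // 'before'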
Examples
Distributed Event Ordering
class EventOrdering {
  private lamport: LamportClock = new LamportClock();
  private events: Event[] = [];

  async processEvent(event: Event): Promise<void> {
    // Local event: stamp it with the incremented Lamport counter
    const timestamp = this.lamport.tick();
    this.events.push({ ...event, timestamp });
    // Send to other nodes with the timestamp attached
    await this.broadcast({ ...event, timestamp });
  }

  async receiveEvent(event: Event): Promise<void> {
    // Advance the local clock past the sender's timestamp
    this.lamport.receive(event);
    this.events.push(event);
    // Sort by Lamport timestamp; break ties by sender ID if a strict total order is needed
    this.events.sort((a, b) => a.timestamp - b.timestamp);
  }
}
Causality Detection with Vector Clocks
class CausalOrdering {
  private vector: VectorClock;
  private pending: Map<string, Message> = new Map();

  constructor(private myNodeId: string) {
    this.vector = new VectorClock(myNodeId);
  }

  async send(message: Message): Promise<void> {
    // Stamp the message with this node's vector clock before broadcasting
    const stamped = this.vector.send(message);
    await this.broadcast(stamped);
  }

  // deliver() and deliverPending() are as in the fuller design shown in the Senior answer below
  async receive(message: Message): Promise<void> {
    // Deliver only if all causally preceding messages have already been delivered
    if (this.canDeliver(message)) {
      this.deliver(message);
      this.deliverPending(); // buffered messages may now be deliverable
    } else {
      // Store for later
      this.pending.set(message.id, message);
    }
  }

  canDeliver(message: Message): boolean {
    const myVector = this.vector.getVector();
    // Check that every message that causally precedes this one has been delivered
    for (const [nodeId, timestamp] of message.vector) {
      const myTimestamp = myVector.get(nodeId) || 0;
      if (nodeId === message.sender) {
        // From the sender we need everything up to timestamp - 1
        if (timestamp > myTimestamp + 1) return false;
      } else {
        // From other nodes we need at least up to their timestamp
        if (timestamp > myTimestamp) return false;
      }
    }
    return true;
  }
}
Common Pitfalls
- Assuming physical clocks are synchronized: They drift, use logical clocks for ordering
- Using Lamport clocks for concurrency detection: Can't detect concurrent events, use vector clocks
- Not handling clock skew: NTP can have errors, use logical clocks for critical ordering
- Vector clock size: Grows with number of nodes. Fix: Use bounded vectors or hierarchical clocks
- Not updating on receive: Must update clock when receiving messages
- Ignoring network delays: NTP calculations must account for network latency
- Clock rollover: Clocks can wrap around, handle overflow
Interview Questions
Beginner
Q: Why do we need clock synchronization in distributed systems?
A: Distributed systems need clock synchronization because:
- Event ordering: Determine if event A happened before event B across nodes
- Causality: Understand cause-and-effect relationships
- Consistency: Ensure operations happen in correct order
- Debugging: Timestamps help debug distributed systems
- Scheduling: Coordinate timed operations across nodes
Challenge: Physical clocks drift and aren't perfectly synchronized, so we use logical clocks (Lamport, Vector) for ordering.
Intermediate
Q: Compare Lamport clocks and Vector clocks. When would you use each?
A:
Lamport Clocks:
- Single counter per node
- O(1) space per timestamp
- Total ordering: Can order all events (with node-ID tie-breaking)
- Limitation: Can't detect concurrent events (L(A) < L(B) doesn't mean A → B)
Vector Clocks:
- Vector of counters (one per node)
- O(n) space per timestamp
- Causality detection: V(A) < V(B) iff A causally precedes B
- Concurrency detection: Can detect if events are concurrent
When to use:
Lamport clocks:
- Simple event ordering (don't need to detect concurrency)
- Large-scale systems (space efficient)
- When total ordering is sufficient
Vector clocks:
- Need to detect concurrent events
- Need precise causality tracking
- Conflict resolution (detect concurrent writes)
- Small to medium clusters (space overhead acceptable)
Example: Database replication - use vector clocks to detect concurrent writes that need conflict resolution (sketched below).
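A hedged sketch of that replication example (the VersionedValue shape and reconcile helper are made up for illustration): each replica tags a value with the vector clock at write time, and a read that sees two versions whose clocks compare as concurrent must surface both for conflict resolution, much as Dynamo-style stores keep siblings.

interface VersionedValue {
  value: string;
  vector: Map<string, number>; // vector clock captured at write time
}

// Decide what a reader should do with two versions of the same key.
// The clock parameter is only used for its compare() helper from the VectorClock class above.
function reconcile(a: VersionedValue, b: VersionedValue, clock: VectorClock): VersionedValue[] {
  const order = clock.compare(a.vector, b.vector);
  if (order === 'before') return [b];  // b supersedes a
  if (order === 'after') return [a];   // a supersedes b
  if (order === 'equal') return [a];   // same version
  return [a, b];                       // concurrent writes: surface both for conflict resolution
}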
Senior
Q: Design a distributed system that maintains causal ordering of messages. How do you handle clock synchronization, message delivery, and ensure no message is delivered before its causally preceding messages?
A:
Architecture:
- Vector clocks for causality tracking
- Message buffering for out-of-order messages
- Delivery guarantees to ensure causal ordering
Design:
class CausalMessageSystem {
  private vector: VectorClock;
  private pending: Map<string, PendingMessage> = new Map();
  private delivered: Set<string> = new Set();

  constructor(
    private myNodeId: string,
    private application: { onMessage(message: CausalMessage): Promise<void> }
  ) {
    this.vector = new VectorClock(myNodeId);
  }

  async send(message: Message): Promise<void> {
    // Increment the vector clock and attach a copy of it to the message
    const vector = this.vector.tick();
    const causalMessage = {
      ...message,
      id: this.generateId(),
      vector: new Map(vector),
      sender: this.myNodeId
    };
    // Broadcast to all nodes
    await this.broadcast(causalMessage);
  }

  async receive(message: CausalMessage): Promise<void> {
    if (this.delivered.has(message.id)) {
      return; // Already delivered (duplicate or retransmission)
    }
    // Check if we can deliver (causally ready)
    if (this.isCausallyReady(message)) {
      await this.deliver(message);
      await this.deliverPending(); // Try to deliver pending messages
    } else {
      // Buffer for later
      this.pending.set(message.id, {
        message,
        receivedAt: Date.now()
      });
    }
  }

  isCausallyReady(message: CausalMessage): boolean {
    const myVector = this.vector.getVector();
    const msgVector = message.vector;
    // For each node, check that every message that causally precedes this one has been delivered
    for (const [nodeId, timestamp] of msgVector) {
      const myTimestamp = myVector.get(nodeId) || 0;
      if (nodeId === message.sender) {
        // From the sender we must have delivered all messages up to timestamp - 1
        if (timestamp > myTimestamp + 1) {
          return false; // Missing an earlier message from the sender
        }
      } else {
        // From every other node we must have delivered at least up to its timestamp
        if (timestamp > myTimestamp) {
          return false; // Missing messages from this node
        }
      }
    }
    return true;
  }

  async deliver(message: CausalMessage): Promise<void> {
    // Merge the message's vector into the local clock
    this.vector.receive(message);
    // Mark as delivered
    this.delivered.add(message.id);
    // Remove from pending
    this.pending.delete(message.id);
    // Deliver to the application
    await this.application.onMessage(message);
  }

  async deliverPending(): Promise<void> {
    // Keep sweeping the buffer: delivering one message can make others causally ready
    let progress = true;
    while (progress) {
      progress = false;
      for (const { message } of Array.from(this.pending.values())) {
        if (this.isCausallyReady(message)) {
          await this.deliver(message);
          progress = true;
        }
      }
    }
  }

  // Handle node failures and message retransmission
  async handleNodeFailure(nodeId: string): Promise<void> {
    // Remove the failed node from the vector clock (removeNode is assumed as a VectorClock extension)
    this.vector.removeNode(nodeId);
    // Clean up pending messages from the failed node
    for (const [id, pending] of this.pending) {
      if (pending.message.sender === nodeId) {
        this.pending.delete(id);
      }
    }
  }

  // Periodic cleanup of old delivered message IDs
  cleanup(): void {
    // Keep only recent delivered IDs (sliding window)
    const maxAge = 3600000; // 1 hour
    // Implementation depends on the ID format (e.g. IDs that embed a creation timestamp)
  }
}
Optimizations:
- Bounded vectors: For large clusters, use hierarchical or compressed vectors
- Garbage collection: Periodically clean up old message IDs
- Piggybacking: Include vector clock in all messages (even acks)
- Batching: Batch multiple messages with single vector update
Handling Network Issues:
- Retransmission: Retry failed messages
- Timeout: Detect missing messages, request retransmission
- Partition handling: Buffer messages during partition, deliver when partition heals
Monitoring:
- Track pending message count
- Measure delivery latency
- Detect causality violations (should never happen)
- Monitor vector clock size
Key Takeaways
- Physical clocks drift: Use NTP for approximate synchronization, but don't rely on it for ordering
- Lamport clocks: Provide a total ordering (with node-ID tie-breaking) using O(1) space, but can't detect concurrency
- Vector clocks: Detect causality and concurrency with O(n) space per timestamp
- Causal ordering: Messages must be delivered after all causally preceding messages
- Message buffering: Store out-of-order messages until causally ready
- NTP hierarchy: Stratum-based hierarchy for scalable time synchronization
- Trade-offs: Lamport (simple, space-efficient) vs Vector (accurate, space-intensive)
- Use logical clocks for event ordering in distributed systems, not physical clocks