Two-Phase Commit (2PC)
Learn the Two-Phase Commit protocol for achieving atomicity in distributed transactions.
Two-Phase Commit (2PC) is an atomic commitment protocol that ensures all participants in a distributed transaction either commit or abort together, maintaining atomicity.
Overview
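2PC ensures atomicity: either all nodes commit the transaction, or all abort. One coordinator drives the protocol and the participants vote. It is a blocking protocol: a participant that has voted "yes" must wait for the coordinator's decision before it can release its resources.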
Protocol Phases
Phase 1: Prepare (Voting)
- Coordinator sends "prepare" message to all participants
- Each participant:
  - Writes transaction to log (prepare record)
  - Votes "yes" (ready to commit) or "no" (must abort)
  - Sends vote to coordinator
- Coordinator collects votes
Phase 2: Commit/Abort (Decision)
- If all vote "yes":
  - Coordinator writes "commit" to log
  - Sends "commit" to all participants
  - Participants commit and send acknowledgment
- If any votes "no":
  - Coordinator writes "abort" to log
  - Sends "abort" to all participants
  - Participants abort and send acknowledgment
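The two phases can be traced with a small in-memory simulation. This is a minimal sketch, not a production design: `SimParticipant`, `canCommit`, and `runTwoPhaseCommit` are hypothetical names, plain arrays stand in for durable logs, and there is no network or failure handling.

```typescript
type Vote = 'yes' | 'no';
type Decision = 'commit' | 'abort';

// Hypothetical in-memory participant; `canCommit` stands in for real local work.
interface SimParticipant {
  name: string;
  canCommit: boolean;
  log: string[]; // stand-in for a durable log
}

function runTwoPhaseCommit(
  coordinatorLog: string[],
  participants: SimParticipant[],
): Decision {
  // Phase 1: each participant records its vote in its log before replying.
  const votes: Vote[] = participants.map((p) => {
    const vote: Vote = p.canCommit ? 'yes' : 'no';
    p.log.push(vote === 'yes' ? 'prepared' : 'abort');
    return vote;
  });

  // Phase 2: the coordinator logs the decision before announcing it.
  const decision: Decision = votes.every((v) => v === 'yes') ? 'commit' : 'abort';
  coordinatorLog.push(decision);
  for (const p of participants) {
    // A participant that voted "no" has already aborted locally.
    if (p.log[p.log.length - 1] !== 'abort') {
      p.log.push(decision);
    }
  }
  return decision;
}

// Usage: one "no" vote forces every participant to abort.
const orders: SimParticipant = { name: 'orders', canCommit: true, log: [] };
const payments: SimParticipant = { name: 'payments', canCommit: false, log: [] };
const coordinatorLog: string[] = [];
const outcome = runTwoPhaseCommit(coordinatorLog, [orders, payments]);
// outcome is 'abort'; orders.log is ['prepared', 'abort']; payments.log is ['abort']
```

Note the ordering: each log entry is written before the corresponding message would be sent, which is what makes recovery possible after a crash.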
Implementation
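// Assumed shape of the durable log; any store with synchronous-on-disk writes would do.
interface TransactionLog {
  writePrepare(): Promise<void>;
  writeCommit(): Promise<void>;
  writeAbort(): Promise<void>;
}

type Vote = 'yes' | 'no';

class TwoPhaseCommitCoordinator {
  async executeTransaction(participants: Participant[]): Promise<boolean> {
    // Phase 1: Prepare. Collect a vote from every participant.
    const votes = await Promise.all(
      participants.map(p => this.preparePhase(p))
    );
    // Phase 2: Commit only on a unanimous "yes"; otherwise abort.
    // A production coordinator durably logs the decision before sending it.
    if (votes.every(v => v === 'yes')) {
      await this.commitPhase(participants);
      return true;
    } else {
      await this.abortPhase(participants);
      return false;
    }
  }

  async preparePhase(participant: Participant): Promise<Vote> {
    try {
      return await participant.prepare();
    } catch (error) {
      return 'no'; // An unreachable or failing participant counts as a "no" vote
    }
  }

  async commitPhase(participants: Participant[]): Promise<void> {
    // Once the decision is commit, a failed delivery must be retried, never reversed
    await Promise.all(participants.map(p => p.commit()));
  }

  async abortPhase(participants: Participant[]): Promise<void> {
    await Promise.all(participants.map(p => p.abort()));
  }
}

class Participant {
  private state: 'initial' | 'prepared' | 'committed' | 'aborted' = 'initial';

  constructor(private log: TransactionLog) {}

  async prepare(): Promise<Vote> {
    try {
      // Perform transaction work (but don't commit yet)
      const canCommit = await this.doWork();
      if (!canCommit) {
        await this.abort();
        return 'no';
      }
      // The prepare record must be durable before voting "yes"
      await this.log.writePrepare();
      this.state = 'prepared';
      return 'yes';
    } catch (error) {
      await this.abort();
      return 'no';
    }
  }

  async commit(): Promise<void> {
    // Only a prepared participant may commit; a repeated commit is a no-op
    if (this.state === 'prepared') {
      await this.log.writeCommit();
      await this.finalizeCommit();
      this.state = 'committed';
    }
  }

  async abort(): Promise<void> {
    // Abort is legal before or after voting, and is a no-op once finished
    if (this.state === 'prepared' || this.state === 'initial') {
      await this.log.writeAbort();
      await this.rollback();
      this.state = 'aborted';
    }
  }

  // Placeholders for resource-specific logic; override them in a real participant
  protected async doWork(): Promise<boolean> { return true; }
  protected async finalizeCommit(): Promise<void> {}
  protected async rollback(): Promise<void> {}
}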
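The `state` field of the participant above behaves like a small state machine, and writing the legal transitions down as a table makes illegal ones (such as committing without having prepared) easy to reject. The sketch below is illustrative only; `TxState`, `TxEvent`, and `nextState` are hypothetical names.

```typescript
type TxState = 'initial' | 'prepared' | 'committed' | 'aborted';
type TxEvent = 'vote-yes' | 'vote-no' | 'commit' | 'abort';

// Legal transitions of a 2PC participant. Anything missing is illegal.
const transitions: Record<TxState, Partial<Record<TxEvent, TxState>>> = {
  initial: { 'vote-yes': 'prepared', 'vote-no': 'aborted', abort: 'aborted' },
  prepared: { commit: 'committed', abort: 'aborted' },
  committed: { commit: 'committed' }, // a duplicate commit message is harmless
  aborted: { abort: 'aborted' }, // a duplicate abort message is harmless
};

function nextState(state: TxState, event: TxEvent): TxState {
  const next = transitions[state][event];
  if (next === undefined) {
    throw new Error(`illegal transition: ${event} in state ${state}`);
  }
  return next;
}
```

Treating duplicate `commit` and `abort` messages as no-ops keeps the handlers idempotent, which matters because a coordinator may resend its decision after a timeout or restart.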
Examples
Database Replication with 2PC
class ReplicatedDatabase {
  async writeTransaction(data: any): Promise<void> {
    // The node handling the write acts as coordinator
    const replicas = this.getAllReplicas();
    // Phase 1: Prepare on all replicas. A rejected prepare counts as a "no" vote,
    // so one unreachable replica cannot leave the others stuck in the prepared state.
    const votes = await Promise.all(
      replicas.map(replica => replica.prepareWrite(data).catch(() => false))
    );
    if (votes.every(v => v)) {
      // Phase 2: Commit on all replicas
      await Promise.all(replicas.map(replica => replica.commitWrite(data)));
    } else {
      // Phase 2: Abort on all replicas
      await Promise.all(replicas.map(replica => replica.abortWrite()));
      throw new Error('Transaction aborted');
    }
  }
}
Common Pitfalls
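- Coordinator failure: Participants that have voted "yes" block until they learn the decision. Fix: Log the decision durably so a restarted or newly elected coordinator can resend it, or use 3PC
- Network partition: Cannot proceed if participants are unreachable. Fix: Abort if the partition hits before the decision, otherwise retry delivery until it heals; majority-based commit or eventual consistency avoid the wait but give up 2PC's all-or-nothing guarantee
- Not logging state: Cannot recover from failures. Fix: Write all state to durable log before sending any message that depends on it
- Blocking behavior: Participants wait for coordinator. Fix: Use timeouts before voting (a participant that has not voted may abort on its own), keep transactions short, consider alternative patterns
- Single point of failure: Coordinator is critical. Fix: Use coordinator replication or alternative protocols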
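Timeouts deserve care, because what a participant may safely do when the coordinator goes quiet depends on whether it has already voted. A minimal sketch of that rule follows; `ParticipantState`, `TimeoutAction`, and `onCoordinatorTimeout` are hypothetical names chosen for illustration.

```typescript
type ParticipantState = 'initial' | 'prepared' | 'committed' | 'aborted';
type TimeoutAction = 'abort-unilaterally' | 'ask-for-decision' | 'nothing';

// What a participant may safely do when it stops hearing from the coordinator.
function onCoordinatorTimeout(state: ParticipantState): TimeoutAction {
  if (state === 'initial') {
    // No vote sent yet, so the coordinator cannot have decided to commit.
    return 'abort-unilaterally';
  }
  if (state === 'prepared') {
    // A "yes" vote is a promise: the outcome may already be commit,
    // so the participant must keep its locks and find out the decision.
    return 'ask-for-decision';
  }
  // Already committed or aborted: the transaction is finished locally.
  return 'nothing';
}
```

The `prepared` case is the source of 2PC's blocking behavior: aborting there could contradict a commit that other participants have already applied.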
Interview Questions
Beginner
Q: What is Two-Phase Commit and how does it work?
A: Two-Phase Commit (2PC) is a protocol that ensures all participants in a distributed transaction either all commit or all abort.
How it works:
- Phase 1 (Prepare): Coordinator asks all participants if they can commit. Participants vote yes/no.
- Phase 2 (Commit/Abort):
- If all vote yes: Coordinator tells everyone to commit
- If any votes no: Coordinator tells everyone to abort
Goal: Atomicity - all nodes agree on the outcome.
Intermediate
Q: What are the limitations of 2PC? How would you address them?
A:
Limitations:
- Blocking: If coordinator fails, participants block waiting for decision
- Single point of failure: Coordinator is critical
- Not partition-tolerant: Requires all nodes to be reachable
- High latency: Multiple round trips (prepare + commit)
- Synchronous: All participants must respond
Solutions:
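- Timeouts: A participant that has not yet voted can time out and abort; one that has voted "yes" must wait for the decision or ask its peers
- 3PC: Three-Phase Commit reduces blocking (adds pre-commit phase), but it assumes reliable failure detection and can still produce inconsistent outcomes under network partitions
- Saga pattern: Use compensating transactions for eventual consistency
- Paxos/Raft: Replicate the coordinator's state with a consensus algorithm so a replacement can finish the transaction
- Majority commit: Commit if majority agrees (gives up all-or-nothing atomicity)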
When to use: Short transactions, strong consistency required, all nodes must agree.
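To make the Saga alternative concrete: instead of holding every participant in a prepared state, each step commits locally, and a failure triggers compensating actions for the steps that already completed, in reverse order. The sketch below is a simplified, synchronous illustration; `SagaStep` and `runSaga` are hypothetical names, and real sagas also need durable progress tracking and retries.

```typescript
interface SagaStep {
  name: string;
  run: () => void; // commits locally; may throw on failure
  compensate: () => void; // semantically undoes a completed run
}

function runSaga(steps: SagaStep[]): 'completed' | 'compensated' {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      done.push(step);
    } catch {
      // Undo the completed steps, most recent first.
      for (const finished of done.reverse()) {
        finished.compensate();
      }
      return 'compensated';
    }
  }
  return 'completed';
}
```

Unlike 2PC, other transactions can observe the intermediate state between a step and its compensation, which is the consistency trade-off the pattern accepts.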
Senior
Q: Design a fault-tolerant 2PC system. How do you handle coordinator failures, participant failures, and network partitions? How do you ensure no data loss?
A:
Fault-Tolerant 2PC Design:
// Sketch: helpers such as electCoordinator, prepareWithTimeout, commitAll,
// abortAll, commitMajority and queryDecision are assumed to exist elsewhere.
class FaultTolerant2PC {
  private coordinator: Coordinator | null = null;
  private participants: Participant[] = [];
  // Assumed opt-in flag for the majority-commit branch below
  private allowMajorityCommit = false;

  constructor(private log: DurableLog) {}

  // Assumes Transaction exposes an `id`; commitAll/abortAll take a transaction id
  async executeWithFaultTolerance(transaction: Transaction): Promise<void> {
    // Elect or select coordinator
    this.coordinator = await this.electCoordinator();
    // Phase 1: Prepare with timeout
    const prepareResults = await Promise.allSettled(
      this.participants.map(p =>
        this.prepareWithTimeout(p, transaction, 5000)
      )
    );
    // A timeout or rejection counts as a "no" vote
    const votes = prepareResults.map(r =>
      r.status === 'fulfilled' && r.value === 'yes'
    );
    const allYes = votes.every(v => v);
    const majorityYes = votes.filter(v => v).length > this.participants.length / 2;
    // Decision based on fault tolerance level.
    // The decision should be written to the durable log before it is sent.
    if (allYes) {
      await this.commitAll(transaction.id);
    } else if (majorityYes && this.allowMajorityCommit) {
      // Majority commit is no longer 2PC: participants that voted "no" or
      // timed out diverge from the rest, so atomicity is lost
      await this.commitMajority(transaction, votes);
    } else {
      await this.abortAll(transaction.id);
    }
  }

  async handleCoordinatorFailure(transactionId: string): Promise<void> {
    // Detect coordinator failure
    if (!await this.isCoordinatorAlive()) {
      // Participants that have not voted can time out and abort;
      // prepared participants need a new coordinator to recover
      const newCoordinator = await this.electNewCoordinator();
      await newCoordinator.recoverTransaction(transactionId);
    }
  }

  async recoverTransaction(transactionId: string): Promise<void> {
    // Read transaction state from log
    const state = await this.log.readTransactionState(transactionId);
    if (state.phase === 'prepared') {
      // Coordinator failed after prepare with no decision logged.
      // Query participants for their state; if one is unreachable this
      // rejects and recovery must be retried.
      const participantStates = await Promise.all(
        this.participants.map(p => p.getTransactionState(transactionId))
      );
      const anyCommitted = participantStates.some(s => s === 'committed');
      const allPrepared = participantStates.every(s => s === 'prepared');
      if (anyCommitted || allPrepared) {
        // A commit already reached someone, or nobody has aborted: commit
        await this.commitAll(transactionId);
      } else {
        // Some participant aborted or never voted, so nobody can have committed
        await this.abortAll(transactionId);
      }
    }
    // If the log already holds a decision, recovery only needs to resend it
  }

  // Participant recovery
  async participantRecovery(transactionId: string): Promise<void> {
    const state = await this.log.readLocalState(transactionId);
    if (state === 'prepared') {
      // Was prepared but didn't receive commit/abort
      // Query coordinator or other participants
      const decision = await this.queryDecision(transactionId);
      if (decision === 'commit') {
        await this.commit();
      } else if (decision === 'abort') {
        await this.abort();
      }
      // Otherwise the outcome is still unknown: stay prepared and retry later
    }
  }
}
Handling Failures:
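- Coordinator failure:
  - Participants that have not voted can time out and abort
  - Participants that voted "yes" wait, or a new coordinator is elected to recover
  - New coordinator reads log, queries participants, makes decision
- Participant failure:
  - Before voting: coordinator treats the missing vote as "no" and aborts
  - After voting "yes": coordinator keeps retrying the decision until the participant acknowledges
  - Failed participant recovers from log on restart
  - Can query coordinator or other participants for decision
- Network partition:
  - If the coordinator cannot reach every participant during prepare, it aborts
  - Participants that voted "yes" and are cut off from the coordinator block until the partition heals
  - Letting a majority partition proceed (if configured) departs from 2PC and requires resolving conflicts when the partition heals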
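When a prepared participant cannot reach the coordinator, it can ask its peers what they know. The rule below is a sketch of the classic cooperative termination idea, under the assumption that only reachable peers are included in the input; `PeerState`, `Outcome`, and `cooperativeTermination` are hypothetical names.

```typescript
type PeerState = 'initial' | 'prepared' | 'committed' | 'aborted';
type Outcome = 'commit' | 'abort' | 'blocked';

// Decide what a prepared participant can conclude from its reachable peers.
function cooperativeTermination(reachablePeerStates: PeerState[]): Outcome {
  // A peer that committed proves the coordinator decided commit.
  if (reachablePeerStates.some((s) => s === 'committed')) {
    return 'commit';
  }
  // A peer that aborted proves the decision was abort. A peer that never
  // voted means the coordinator cannot have decided commit, so abort is safe.
  if (reachablePeerStates.some((s) => s === 'aborted' || s === 'initial')) {
    return 'abort';
  }
  // Every reachable peer is prepared too: nobody knows the decision,
  // so the participant must keep waiting.
  return 'blocked';
}
```

The `blocked` result is unavoidable in plain 2PC: if everyone reachable is prepared and the coordinator is down, no participant has enough information to decide safely.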
Data Loss Prevention:
- Durable logging: Write all state to persistent log before responding
- Write-ahead log (WAL): Log before applying changes
- Replication: Replicate coordinator log for high availability
- Quorum: Replicate the coordinator's decision to a majority of log replicas before announcing it
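The logging rules above pay off at restart, when a node rebuilds its state purely from the log. The following is a minimal sketch of that replay, assuming a hypothetical record format (`LogRecord`) and an in-memory array standing in for the persistent log; `recover` is likewise an illustrative name.

```typescript
type LogRecord =
  | { kind: 'prepare'; txId: string; key: string; value: string }
  | { kind: 'commit'; txId: string }
  | { kind: 'abort'; txId: string };

// Replay the log after a crash: apply committed writes, drop aborted ones,
// and report transactions that were prepared but never resolved.
function recover(wal: LogRecord[]): { data: Record<string, string>; inDoubt: string[] } {
  const pending: Record<string, { key: string; value: string }> = {};
  const data: Record<string, string> = {};
  for (const rec of wal) {
    if (rec.kind === 'prepare') {
      pending[rec.txId] = { key: rec.key, value: rec.value };
    } else {
      const write = pending[rec.txId];
      if (rec.kind === 'commit' && write !== undefined) {
        data[write.key] = write.value;
      }
      delete pending[rec.txId];
    }
  }
  return { data, inDoubt: Object.keys(pending) };
}
```

Transactions returned in `inDoubt` are exactly the ones for which the node must ask the coordinator or its peers for the decision before releasing any locks.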
Key Takeaways
- 2PC ensures atomicity: All participants commit or all abort
- Two phases: Prepare (voting) and Commit/Abort (decision)
- Blocking protocol: Participants wait for coordinator
- Coordinator is critical: Single point of failure
- Not partition-tolerant: Requires all nodes reachable
- Use for: Short transactions, strong consistency, all-or-nothing requirements
- Alternatives: 3PC (less blocking), Saga (eventual consistency), Paxos/Raft (better fault tolerance)