Two-Phase Commit (2PC)

Learn the Two-Phase Commit protocol for achieving atomicity in distributed transactions.

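Two-Phase Commit (2PC) is an atomic commitment protocol: a coordinator ensures that all participants in a distributed transaction either commit or abort together. It is often described as a consensus protocol, but unlike Paxos or Raft it cannot make progress while the coordinator is unavailable.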


Overview

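2PC ensures atomicity: either all nodes commit the transaction, or all abort. One node acts as coordinator and the rest are participants. It is a blocking protocol: a participant that has voted "yes" must wait for the coordinator's decision before it can commit, abort, or release its locks.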


Protocol Phases

Phase 1: Prepare (Voting)

  1. Coordinator sends "prepare" message to all participants
  2. Each participant:
    • Does the transaction's work without committing, then force-writes a prepare record to its durable log
    • Votes "yes" (ready to commit) or "no" (must abort)
    • Sends vote to coordinator
  3. Coordinator collects votes
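
The voting round above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `VotingParticipant`, `voteWithTimeout`, and `collectVotes` are hypothetical names, and counting a timeout or an error as a "no" vote is a convention assumed here.

```typescript
type Vote = "yes" | "no";

// Hypothetical participant interface, assumed for this sketch.
interface VotingParticipant {
  prepare(): Promise<Vote>;
}

// Ask one participant to prepare; a timeout or an error counts as "no".
function voteWithTimeout(p: VotingParticipant, ms: number): Promise<Vote> {
  return new Promise<Vote>((resolve) => {
    const timer = setTimeout(() => resolve("no"), ms);
    Promise.resolve()
      .then(() => p.prepare())
      .then(
        (vote) => {
          clearTimeout(timer);
          resolve(vote);
        },
        () => {
          clearTimeout(timer);
          resolve("no");
        }
      );
  });
}

// Phase 1: send "prepare" to every participant and wait for all votes.
function collectVotes(participants: VotingParticipant[], ms: number): Promise<Vote[]> {
  return Promise.all(participants.map((p) => voteWithTimeout(p, ms)));
}
```

Because every individual vote resolves rather than rejects, the coordinator always receives one vote per participant and can apply the unanimity rule in Phase 2.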

Phase 2: Commit/Abort (Decision)

  1. If all vote "yes":

    • Coordinator writes "commit" to log
    • Sends "commit" to all participants
    • Participants commit and send acknowledgment
  2. If any participant votes "no" or does not respond in time:

    • Coordinator writes "abort" to log
    • Sends "abort" to all participants
    • Participants abort and send acknowledgment
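
The decision rule and the log-before-send ordering can be sketched as below. The names `decide` and `runPhaseTwo` are hypothetical, the in-memory array stands in for a durable log, and treating an empty vote list as an abort is an assumption of this sketch.

```typescript
type Vote = "yes" | "no";
type Decision = "commit" | "abort";

// Commit only on a unanimous "yes"; any "no" (or no votes at all) aborts.
function decide(votes: Vote[]): Decision {
  return votes.length > 0 && votes.every((v) => v === "yes") ? "commit" : "abort";
}

// Phase 2: record the decision first, then announce it.
// `log` stands in for a durable log and `send` for the network layer.
function runPhaseTwo(
  votes: Vote[],
  log: Decision[],
  send: (decision: Decision) => void
): Decision {
  const decision = decide(votes);
  log.push(decision); // a real coordinator would force this write to disk
  send(decision);
  return decision;
}
```

Logging before sending matters for recovery: if the coordinator crashes after the log write, it can re-send the same decision on restart instead of guessing.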

Implementation

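// Assumed shape of the durable log used by the coordinator and participants.
// Each write resolves only once the record is on stable storage.
interface TransactionLog {
  writePrepare(): Promise<void>;
  writeCommit(): Promise<void>;
  writeAbort(): Promise<void>;
}

class TwoPhaseCommitCoordinator {
  constructor(private log: TransactionLog) {}

  async executeTransaction(participants: Participant[]): Promise<boolean> {
    // Phase 1: Prepare
    const votes = await Promise.all(
      participants.map(p => this.preparePhase(p))
    );

    // Phase 2: Commit or Abort. The decision is logged before it is sent,
    // so a restarted coordinator can re-send it.
    if (votes.every(v => v === 'yes')) {
      await this.log.writeCommit();
      await this.commitPhase(participants);
      return true;
    } else {
      await this.log.writeAbort();
      await this.abortPhase(participants);
      return false;
    }
  }

  async preparePhase(participant: Participant): Promise<'yes' | 'no'> {
    try {
      return await participant.prepare();
    } catch {
      return 'no'; // An unreachable or failing participant counts as a "no" vote
    }
  }

  // A production coordinator would retry until every participant acknowledges;
  // in this sketch a failed delivery rejects the returned promise.
  async commitPhase(participants: Participant[]): Promise<void> {
    await Promise.all(participants.map(p => p.commit()));
  }

  async abortPhase(participants: Participant[]): Promise<void> {
    await Promise.all(participants.map(p => p.abort()));
  }
}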

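class Participant {
  private state: 'initial' | 'prepared' | 'committed' | 'aborted' = 'initial';

  constructor(private log: TransactionLog) {}

  async prepare(): Promise<'yes' | 'no'> {
    try {
      // Perform transaction work (but don't commit yet)
      const canCommit = await this.doWork();

      if (canCommit) {
        // Write prepare record to log (durable) before voting "yes",
        // so the vote survives a crash
        await this.log.writePrepare();
        this.state = 'prepared';
        return 'yes';
      } else {
        await this.rollback(); // Undo the uncommitted work
        this.state = 'aborted';
        return 'no';
      }
    } catch {
      this.state = 'aborted';
      return 'no';
    }
  }

  async commit(): Promise<void> {
    // Only a prepared participant may commit; repeated calls are no-ops
    if (this.state === 'prepared') {
      await this.log.writeCommit();
      await this.finalizeCommit();
      this.state = 'committed';
    }
  }

  async abort(): Promise<void> {
    if (this.state === 'prepared' || this.state === 'initial') {
      await this.log.writeAbort();
      await this.rollback();
      this.state = 'aborted';
    }
  }

  // Placeholders for the resource-specific work; replace with real logic
  protected async doWork(): Promise<boolean> { return true; }
  protected async finalizeCommit(): Promise<void> {}
  protected async rollback(): Promise<void> {}
}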

Examples

Database Replication with 2PC

class ReplicatedDatabase {
  async writeTransaction(data: unknown): Promise<void> {
    // The node running this method acts as the coordinator.
    // getAllReplicas and the replica methods are assumed to exist.
    const replicas = this.getAllReplicas();

    // Phase 1: Prepare on all replicas. allSettled waits for every replica,
    // so a rejected prepare counts as a "no" vote instead of skipping the abort.
    const results = await Promise.allSettled(
      replicas.map(replica => replica.prepareWrite(data))
    );
    const allYes = results.every(
      r => r.status === 'fulfilled' && Boolean(r.value)
    );

    if (allYes) {
      // Phase 2: Commit on all replicas
      await Promise.all(replicas.map(replica => replica.commitWrite(data)));
    } else {
      // Phase 2: Abort on all replicas
      await Promise.all(replicas.map(replica => replica.abortWrite()));
      throw new Error('Transaction aborted');
    }
  }
}

Common Pitfalls

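  • Coordinator failure: Participants that voted "yes" block, holding their locks, until a decision arrives. Fix: Log the decision durably so a restarted or newly elected coordinator can finish the transaction, let participants that have not yet voted time out and abort, or use 3PC
  • Network partition: The protocol cannot complete while any participant is unreachable. Fix: Retry delivery once the partition heals, replicate each node with a majority-based consensus protocol, or accept eventual consistency
  • Not logging state: A node that crashes cannot tell how it voted or what was decided. Fix: Write prepare records and decisions to a durable log before sending any message that depends on them
  • Blocking behavior: Prepared participants hold locks while they wait, which stalls other transactions. Fix: Keep transactions short, use timeouts before voting, and consider alternative patterns
  • Single point of failure: Coordinator is critical. Fix: Use coordinator replication or alternative protocols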
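
Why timeouts alone do not remove blocking can be shown with a small sketch. The state names follow the `Participant` class above; `onCoordinatorTimeout` and the action names are hypothetical and only illustrate which reactions are safe when the coordinator goes silent.

```typescript
type ParticipantState = "initial" | "prepared" | "committed" | "aborted";
type TimeoutAction = "abort-locally" | "keep-waiting" | "nothing-to-do";

// Decide what a participant may safely do when the coordinator stops responding.
function onCoordinatorTimeout(state: ParticipantState): TimeoutAction {
  switch (state) {
    case "initial":
      // Has not voted "yes", so the coordinator cannot have decided to commit.
      return "abort-locally";
    case "prepared":
      // Voted "yes": the outcome is unknown, so neither commit nor abort is safe.
      return "keep-waiting";
    case "committed":
    case "aborted":
      // The outcome is already known locally.
      return "nothing-to-do";
  }
}
```

The "prepared" case is the blocking window: the participant has to keep its locks and wait, or ask the coordinator or its peers for the outcome.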

Interview Questions

Beginner

Q: What is Two-Phase Commit and how does it work?

A: Two-Phase Commit (2PC) is a protocol that ensures the participants in a distributed transaction either all commit or all abort.

How it works:

  1. Phase 1 (Prepare): Coordinator asks all participants if they can commit. Participants vote yes/no.
  2. Phase 2 (Commit/Abort):
    • If all vote yes: Coordinator tells everyone to commit
    • If any votes no: Coordinator tells everyone to abort

Goal: Atomicity - all nodes agree on the outcome.


Intermediate

Q: What are the limitations of 2PC? How would you address them?

A:

Limitations:

  1. Blocking: If coordinator fails, participants block waiting for decision
  2. Single point of failure: Coordinator is critical
  3. Not partition-tolerant: Requires all nodes to be reachable
  4. High latency: Multiple round trips (prepare + commit)
  5. Synchronous: All participants must respond

Solutions:

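  1. Timeouts: A participant that has not yet voted can time out and abort on its own; one that has voted "yes" must keep waiting or ask the coordinator or its peers for the outcome
  2. 3PC: Three-Phase Commit reduces blocking by adding a pre-commit phase, but it can still produce inconsistent outcomes under network partitions
  3. Saga pattern: Use compensating transactions for eventual consistency
  4. Paxos/Raft: Use consensus algorithms to replicate the coordinator's log and each participant for better fault tolerance
  5. Majority commit: Commit if a majority agrees. This gives up atomicity, so it is no longer 2PC, and nodes that did not commit must be repaired afterwards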

When to use: Short transactions, strong consistency required, all nodes must agree.
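
To make the Saga alternative concrete, here is a minimal sketch of compensating transactions. `SagaStep` and `runSaga` are hypothetical names, and the steps are synchronous only to keep the example short; real sagas are usually asynchronous and persist their progress.

```typescript
// Hypothetical step shape: each action has a compensating action that undoes it.
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Run steps in order; on failure, compensate the completed steps in reverse.
// Returns true if every step succeeded.
function runSaga(steps: SagaStep[]): boolean {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      completed.push(step);
    } catch {
      for (const done of completed.reverse()) {
        done.compensate();
      }
      return false;
    }
  }
  return true;
}
```

Unlike 2PC, each step commits locally as soon as it runs, so other transactions can observe intermediate state before a compensation undoes it. That is the consistency trade-off a saga makes in exchange for not blocking.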


Senior

Q: Design a fault-tolerant 2PC system. How do you handle coordinator failures, participant failures, and network partitions? How do you ensure no data loss?

A:

Fault-Tolerant 2PC Design:

// Sketch only. Helper methods that are called but not shown (electCoordinator,
// prepareWithTimeout, commitAll, commitMajority, abortAll, isCoordinatorAlive,
// electNewCoordinator, queryDecision) are assumed to exist, as are the
// Coordinator, Participant, Transaction, and DurableLog types.
class FaultTolerant2PC {
  private coordinator: Coordinator | null = null;
  private participants: Participant[] = [];
  // Off by default: committing on a majority gives up atomicity
  private allowMajorityCommit = false;

  constructor(private log: DurableLog) {}

  async executeWithFaultTolerance(transaction: Transaction): Promise<void> {
    // Elect or select coordinator
    this.coordinator = await this.electCoordinator();

    // Phase 1: Prepare with timeout
    const prepareResults = await Promise.allSettled(
      this.participants.map(p =>
        this.prepareWithTimeout(p, transaction, 5000)
      )
    );

    // Check results: a timeout or error counts as a "no" vote
    const votes = prepareResults.map(r =>
      r.status === 'fulfilled' && r.value === 'yes'
    );

    const allYes = votes.every(v => v);
    const majorityYes = votes.filter(v => v).length > this.participants.length / 2;

    // Decision based on fault tolerance level. commitAll and abortAll are
    // assumed to log the decision durably before sending it, and Transaction
    // is assumed to expose an `id` field.
    if (allYes) {
      await this.commitAll(transaction.id);
    } else if (majorityYes && this.allowMajorityCommit) {
      // Majority commit is no longer 2PC: participants that voted "no" or
      // timed out will not apply the write and must be repaired afterwards
      await this.commitMajority(transaction.id, votes);
    } else {
      await this.abortAll(transaction.id);
    }
  }

  async handleCoordinatorFailure(transactionId: string): Promise<void> {
    // Detect coordinator failure
    if (!await this.isCoordinatorAlive()) {
      // Participants that have not voted can time out and abort;
      // prepared participants need a new coordinator to recover
      const newCoordinator = await this.electNewCoordinator();
      await newCoordinator.recoverTransaction(transactionId);
    }
  }

  async recoverTransaction(transactionId: string): Promise<void> {
    // Read transaction state from log
    const state = await this.log.readTransactionState(transactionId);

    if (state.phase === 'prepared') {
      // Coordinator failed after prepare without logging a decision.
      // Query participants for their state; this blocks until all answer.
      const participantStates = await Promise.all(
        this.participants.map(p => p.getTransactionState(transactionId))
      );

      const noneAborted = participantStates.every(
        s => s === 'prepared' || s === 'committed'
      );
      if (noneAborted) {
        // Everyone voted "yes" and nobody aborted, so commit is safe.
        // If any participant already committed, commit is the only safe outcome.
        await this.commitAll(transactionId);
      } else {
        // Someone aborted or never prepared, so no commit was ever sent
        await this.abortAll(transactionId);
      }
    }
  }

  // Participant recovery
  async participantRecovery(participant: Participant, transactionId: string): Promise<void> {
    const state = await this.log.readLocalState(transactionId);

    if (state === 'prepared') {
      // Was prepared but didn't receive commit/abort
      // Query coordinator or other participants
      const decision = await this.queryDecision(transactionId);

      if (decision === 'commit') {
        await participant.commit();
      } else if (decision === 'abort') {
        await participant.abort();
      }
      // Otherwise the outcome is still unknown: stay prepared and retry later.
      // Aborting here could contradict a commit that was already decided.
    }
  }
}

Handling Failures:

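  1. Coordinator failure:

    • Participants that have not voted time out and abort
    • Participants that voted "yes" stay blocked until a decision arrives
    • A new coordinator is elected, reads the log, queries participants, and completes the transaction
  2. Participant failure:

    • Before voting: the coordinator treats the missing vote as "no" and aborts
    • After voting "yes": the coordinator keeps its decision and retries delivery until the participant acknowledges
    • The failed participant recovers from its log on restart and queries the coordinator or other participants for the decision
  3. Network partition:

    • Standard 2PC blocks until every participant is reachable again
    • Majority partition can proceed only if majority commit is enabled, which gives up atomicity
    • Minority partition blocks or aborts, and conflicts are resolved when the partition heals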

Data Loss Prevention:

  • Durable logging: Write all state to persistent log before responding
  • Write-ahead log (WAL): Log before applying changes
  • Replication: Replicate coordinator log for high availability
  • Quorum: Announce a commit decision only after a majority of the coordinator's log replicas have stored it
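
How a restarted node uses its durable log can be sketched as a replay over the records for one transaction. The record shape and the `recoverState` function are assumptions made for this illustration; a real log would also carry the data needed to redo or undo the work.

```typescript
type RecordKind = "prepare" | "commit" | "abort";

// Hypothetical log record, assumed for this sketch.
interface LogRecord {
  txId: string;
  kind: RecordKind;
}

type RecoveredState = "initial" | "prepared" | "committed" | "aborted";

// Replay the durable log to find where a transaction stood at crash time.
function recoverState(log: LogRecord[], txId: string): RecoveredState {
  let state: RecoveredState = "initial";
  for (const record of log) {
    if (record.txId !== txId) continue;
    if (record.kind === "prepare") state = "prepared";
    else if (record.kind === "commit") state = "committed";
    else state = "aborted";
  }
  return state;
}
```

A result of "prepared" means the node voted "yes" but never learned the outcome, so it has to ask the coordinator or its peers before doing anything else. A result of "initial" means it never voted and can abort safely.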

Key Takeaways

  • 2PC ensures atomicity: All participants commit or all abort
  • Two phases: Prepare (voting) and Commit/Abort (decision)
  • Blocking protocol: Participants wait for coordinator
  • Coordinator is critical: Single point of failure
  • Not partition-tolerant: Requires all nodes reachable
  • Use for: Short transactions, strong consistency, all-or-nothing requirements
  • Alternatives: 3PC (less blocking), Saga (eventual consistency), Paxos/Raft (better fault tolerance)

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.