Topic Overview

Consensus Algorithms (Raft, Paxos)

Learn how distributed systems achieve consensus among nodes using Raft and Paxos algorithms.

Senior · 15 min read

Consensus is the problem of getting multiple nodes in a distributed system to agree on a value, even in the presence of failures. Raft and Paxos are two fundamental consensus algorithms.


What is Consensus?

Consensus algorithms ensure that:

  • Agreement: All non-faulty nodes agree on the same value
  • Validity: The agreed value was proposed by some node
  • Termination: All nodes eventually decide on a value
  • Integrity: A node decides at most one value

Use cases: Distributed databases, configuration management, leader election, state machine replication.


Raft Algorithm

Raft is designed to be understandable while providing the same fault tolerance as Paxos.

Key Concepts

Leader-based: One leader handles all client requests and replicates to followers.

Terms: Time periods numbered sequentially. Each term has at most one leader.

Log replication: Leader appends entries to its log, replicates to followers.

Safety: Leader only commits entries that are replicated to majority.

States

  1. Follower: Passive, responds to leader heartbeats
  2. Candidate: Seeking votes to become leader
  3. Leader: Handles client requests, replicates log
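
These three states form a small state machine. A minimal sketch of the transitions (event names here are illustrative, not from the Raft paper):

```typescript
type RaftState = 'follower' | 'candidate' | 'leader';

// Events that drive Raft state transitions (illustrative names)
type RaftEvent =
  | 'electionTimeout'  // no heartbeat from a leader in time
  | 'wonElection'      // candidate gathered a majority of votes
  | 'sawHigherTerm';   // any message with a newer term demotes the node

function transition(state: RaftState, event: RaftEvent): RaftState {
  switch (event) {
    case 'electionTimeout':
      // Followers and candidates start (or restart) an election
      return state === 'leader' ? state : 'candidate';
    case 'wonElection':
      return state === 'candidate' ? 'leader' : state;
    case 'sawHigherTerm':
      // A higher term always forces a node back to follower
      return 'follower';
  }
}
```

For example, `transition('follower', 'electionTimeout')` yields `'candidate'`, and any state collapses to `'follower'` on `'sawHigherTerm'`.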

Leader Election

class RaftNode {
  private state: 'follower' | 'candidate' | 'leader' = 'follower';
  private currentTerm: number = 0;
  private votedFor: number | null = null;
  private log: LogEntry[] = [];
  private commitIndex: number = 0;

  async startElection(): Promise<void> {
    this.state = 'candidate';
    this.currentTerm++;
    this.votedFor = this.id; // vote for ourselves

    // Request votes from peers; a majority (including our own vote) wins
    const votes = await Promise.all(
      this.peers.map(peer => this.requestVote(peer))
    );
    const granted = votes.filter(v => v.voteGranted).length + 1;
    if (granted > Math.floor((this.peers.length + 1) / 2)) {
      this.state = 'leader';
    }
  }
}

Log Replication

async appendEntry(entry: LogEntry): Promise<void> {
  this.log.push({ ...entry, term: this.currentTerm });

  // Replicate to followers
  const responses = await Promise.all(
    this.followers.map(follower => this.replicateLog(follower))
  );

  // Commit if a majority (followers plus the leader itself) acknowledged
  const ackCount = responses.filter(r => r.success).length + 1;
  if (ackCount > Math.floor((this.followers.length + 1) / 2)) {
    this.commitIndex = this.log.length - 1;
  }
}

Paxos Algorithm

Paxos is the original consensus algorithm, more complex but highly fault-tolerant.

Phases

Phase 1 (Prepare):

  1. Proposer sends prepare(n) with proposal number n
  2. Acceptor responds with promise not to accept proposals < n
  3. If majority promise, proceed

Phase 2 (Accept):

  1. Proposer sends accept(n, v) with value v
  2. Acceptor accepts if n is at least the highest number it has promised
  3. If majority accept, value is chosen

Implementation

class PaxosNode {
  private promisedNumber: number = 0;
  private acceptedNumber: number = 0;
  private acceptedValue: any = null;

  async prepare(proposalNumber: number): Promise<PromiseResponse> {
    if (proposalNumber > this.promisedNumber) {
      this.promisedNumber = proposalNumber;
      return {
        promised: true,
        acceptedNumber: this.acceptedNumber,
        acceptedValue: this.acceptedValue
      };
    }
    return { promised: false };
  }
}
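
Phase 2 is symmetric: the acceptor takes the proposal iff it has not promised a strictly higher number. A sketch of the accept handler (the `AcceptResponse` shape and `PaxosAcceptor` name are assumptions, mirroring `PromiseResponse` above):

```typescript
interface AcceptResponse { accepted: boolean; }

class PaxosAcceptor {
  promisedNumber = 0;
  acceptedNumber = 0;
  acceptedValue: unknown = null;

  accept(proposalNumber: number, value: unknown): AcceptResponse {
    // Accept iff no strictly higher proposal has been promised
    if (proposalNumber >= this.promisedNumber) {
      this.promisedNumber = proposalNumber;
      this.acceptedNumber = proposalNumber;
      this.acceptedValue = value;
      return { accepted: true };
    }
    return { accepted: false };
  }
}
```

Note the `>=` here versus the strict `>` in prepare: a proposer may accept against its own promise, but any newer promise fences out older proposals.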

Examples

Raft: Distributed Key-Value Store

class DistributedKVStore {
  private raft: RaftNode;
  private state: Map<string, string> = new Map();

  async set(key: string, value: string): Promise<void> {
    const entry: LogEntry = {
      type: 'SET',
      key,
      value,
      term: this.raft.currentTerm
    };

    // Resolves once the entry is committed on a majority
    await this.raft.appendEntry(entry);
    this.state.set(key, value);
  }

  get(key: string): string | undefined {
    return this.state.get(key);
  }
}

Paxos: Configuration Management

class ConfigManager {
  async proposeConfig(config: Config): Promise<void> {
    let proposalNumber = this.generateProposalNumber();
    let value = config;

    while (true) {
      // Phase 1: Prepare
      const promises = await Promise.all(
        this.acceptors.map(a => a.prepare(proposalNumber))
      );
      const promised = promises.filter(p => p.promised);
      if (promised.length <= Math.floor(this.acceptors.length / 2)) {
        proposalNumber = this.generateProposalNumber();
        continue; // no majority; retry with a higher number
      }
      // Paxos requires adopting the highest-numbered value already accepted
      const prior = promised.filter(p => p.acceptedValue != null)
        .sort((a, b) => b.acceptedNumber - a.acceptedNumber)[0];
      if (prior) value = prior.acceptedValue;

      // Phase 2: Accept
      const accepts = await Promise.all(
        this.acceptors.map(a => a.accept(proposalNumber, value))
      );
      if (accepts.filter(r => r.accepted).length > Math.floor(this.acceptors.length / 2)) {
        return; // value chosen by a majority
      }
      proposalNumber = this.generateProposalNumber();
    }
  }
}

Common Pitfalls

  • Split-brain in Raft: During a partition, a deposed leader may keep serving until it sees a higher term. Fix: Require majority votes and majority commits, use an odd number of nodes
  • Paxos complexity: Hard to understand and implement correctly. Fix: Use Raft for most cases, or use existing libraries
  • Not handling concurrent proposals: Multiple proposers can conflict. Fix: Use unique proposal numbers, leader-based approach
  • Ignoring log consistency: Follower logs must match leader. Fix: Check last log term/index during election
  • Not committing safely: Committing before majority can cause inconsistency. Fix: Only commit after majority acknowledgment
  • Term confusion: Old terms can cause issues. Fix: Always check and update terms, reject stale messages
  • Not handling failures: Algorithm must work with node failures. Fix: Require majority, handle timeouts gracefully
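
Several of these pitfalls (term confusion, stale leaders, stale messages) reduce to one rule: compare terms on every incoming message. A minimal sketch of that check, with illustrative names:

```typescript
interface Message { term: number; }

class TermGuard {
  currentTerm = 0;

  // Returns true if the message should be processed, false if stale
  handle(msg: Message): boolean {
    if (msg.term < this.currentTerm) {
      return false; // message from an old term: reject
    }
    if (msg.term > this.currentTerm) {
      // Newer term: adopt it (a real node would also step down to follower)
      this.currentTerm = msg.term;
    }
    return true;
  }
}
```

Every RPC handler in a Raft node (vote requests, append entries, heartbeats) would run this check first.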

Interview Questions

Beginner

Q: What is consensus and why is it needed in distributed systems?

A: Consensus is the problem of getting multiple nodes to agree on a value. It's needed because:

  • Coordination: Multiple nodes need to agree on shared state (e.g., database value, configuration)
  • Consistency: Ensures all nodes see the same data
  • Fault tolerance: System continues working even if some nodes fail
  • Ordering: Agree on the order of operations (critical for state machines)

Without consensus, nodes might have different views of the system, leading to inconsistencies and conflicts.


Intermediate

Q: Compare Raft and Paxos. When would you choose each?

A:

Raft:

  • Simplicity: Designed to be understandable, easier to implement
  • Leader-based: Single leader handles all requests, simpler model
  • Strong leader: Leader has full authority, no conflicts
  • Use when: Building new systems, need understandable consensus, want easier debugging

Paxos:

  • Complexity: More complex, harder to understand and implement
  • No leader: Any node can propose, more flexible
  • Proven: Original consensus algorithm, well-studied
  • Use when: Need maximum flexibility, building on existing Paxos infrastructure

Comparison:

  • Understandability: Raft wins (designed for this)
  • Performance: Similar (both require majority)
  • Fault tolerance: Similar (both handle minority failures)
  • Flexibility: Paxos wins (no leader requirement)

Recommendation: Use Raft for most cases. Only use Paxos if you need leaderless consensus or are building on existing Paxos systems.


Senior

Q: Design a distributed database using Raft for consensus. How do you handle read operations, write operations, and ensure linearizability? How do you handle network partitions?

A:

Architecture:

  • Raft cluster: 5 nodes (3 is the minimum to tolerate one failure; 5 tolerates two)
  • Leader: Handles all writes, replicates to followers
  • Reads: Can go to leader (strong consistency) or followers (eventual consistency)

Write Operations:

class DistributedDB {
  async write(key: string, value: string): Promise<void> {
    if (this.raft.state !== 'leader') {
      throw new Error('Not leader, redirect to leader');
    }

    const entry: LogEntry = {
      type: 'WRITE',
      key,
      value,
      term: this.raft.currentTerm,
      index: this.raft.log.length
    };

    // Resolves only after the entry is committed on a majority
    await this.raft.appendEntry(entry);
    this.stateMachine.set(key, value);
  }
}

Read Operations:

// Option 1: Read from leader (strong consistency)
async read(key: string): Promise<string> {
  if (this.raft.state !== 'leader') {
    // Redirect to leader
    const leader = await this.findLeader();
    return await leader.read(key);
  }

  // Leader can serve reads directly (linearizable)
  return this.stateMachine.get(key);
}

// Option 2: Read from any follower (eventual consistency, faster)
async readAny(key: string): Promise<string> {
  // Served from the local state machine, which may lag the leader
  return this.stateMachine.get(key);
}

Linearizability:

  • Writes: Always go through leader, committed only after majority
  • Reads: Read from leader for linearizability, or use read leases
  • Sequence numbers: Each operation gets sequence number, maintain ordering
  • Fencing tokens: Use tokens to prevent stale reads
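
Fencing tokens can be pictured as follows (a sketch, not a full lease protocol; class and method names are illustrative): the store rejects any request carrying a token older than the newest one it has seen, so a deposed leader holding an old token cannot corrupt state.

```typescript
class FencedStore {
  private highestToken = 0;
  private data = new Map<string, string>();

  // Each new leader obtains a strictly larger token from the consensus layer;
  // requests from deposed leaders arrive with smaller tokens and are fenced off.
  write(token: number, key: string, value: string): boolean {
    if (token < this.highestToken) {
      return false; // request from a deposed leader: reject
    }
    this.highestToken = token;
    this.data.set(key, value);
    return true;
  }

  read(key: string): string | undefined {
    return this.data.get(key);
  }
}
```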

Network Partitions:

  1. Detection: Heartbeat timeouts, no majority responses
  2. Minority partition: Cannot elect leader, becomes read-only
  3. Majority partition: Continues operating, can elect new leader
  4. Merge handling:
    • Compare terms, highest term wins
    • Leader with higher term forces followers to update
    • Resolve conflicts using last-write-wins or application logic

Implementation:

class PartitionAwareRaft {
  async handlePartition(): Promise<void> {
    // Check if we have majority
    const responses = await Promise.allSettled(
      this.nodes.map(n => this.sendHeartbeat(n))
    );

    const aliveCount = responses.filter(r => r.status === 'fulfilled').length;
    const quorum = Math.floor(this.nodes.length / 2) + 1;

    if (aliveCount < quorum) {
      // Minority side: step down and stop accepting writes
      this.state = 'follower';
    }
  }
}

Optimizations:

  • Batching: Batch multiple writes in single log entry
  • Read replicas: Use followers for read scaling (with consistency trade-offs)
  • Snapshotting: Periodically snapshot state, truncate log
  • Compression: Compress log entries to reduce network traffic
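
Snapshotting, for example, amounts to folding the committed prefix of the log into a single state snapshot and truncating what was folded in (an illustrative shape, not the InstallSnapshot RPC from the Raft paper):

```typescript
interface Entry { key: string; value: string; }

class SnapshottingLog {
  snapshot = new Map<string, string>(); // state up to lastIncludedIndex
  lastIncludedIndex = -1;               // absolute index covered by the snapshot
  entries: Entry[] = [];                // entries after lastIncludedIndex

  // Fold all committed entries into the snapshot, then truncate the log
  compact(commitIndex: number): void {
    const committed = commitIndex - this.lastIncludedIndex;
    for (const e of this.entries.slice(0, committed)) {
      this.snapshot.set(e.key, e.value);
    }
    this.entries = this.entries.slice(committed);
    this.lastIncludedIndex = commitIndex;
  }
}
```

A lagging follower can then be caught up by shipping the snapshot plus the short remaining log, instead of replaying the entire history.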


Related Topics

  • Leader Election - How nodes elect a leader for coordination

  • Fault Tolerance - Handling node failures in distributed systems

  • Partition Tolerance - CAP theorem and handling network partitions

  • Two-Phase Commit (2PC) - Another consensus protocol for distributed transactions

  • Distributed Transactions - Maintaining ACID properties across nodes

Key Takeaways

Consensus ensures agreement among distributed nodes on a value, critical for consistency

Raft is simpler than Paxos, designed for understandability while maintaining fault tolerance

Paxos is more flexible but complex, allows any node to propose

Leader-based approach (Raft) simplifies consensus but creates single point of coordination

Majority requirement ensures fault tolerance - system works with up to (n-1)/2 failures

Log replication ensures all nodes have same sequence of operations

Terms/epochs prevent stale leaders and ensure safety

Network partitions require quorum - minority partition cannot make progress

Linearizability requires reads to go through leader or use read leases

Choose Raft for most use cases due to simplicity, use Paxos only if you need leaderless consensus


About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.