Topic Overview

Consensus Algorithms (Raft, Paxos)

Learn how distributed systems achieve consensus among nodes using Raft and Paxos algorithms.

Senior15 min read

Consensus is the problem of getting multiple nodes in a distributed system to agree on a value, even in the presence of failures. Raft and Paxos are two fundamental consensus algorithms.

What is Consensus?

Consensus algorithms ensure that:

Agreement: All non-faulty nodes agree on the same value
Validity: The agreed value was proposed by some node
Termination: All nodes eventually decide on a value
Integrity: A node decides at most one value

Use cases: Distributed databases, configuration management, leader election, state machine replication.

Raft Algorithm

Raft is designed to be understandable while providing the same fault tolerance as Paxos.

Key Concepts

Leader-based: One leader handles all client requests and replicates to followers.

Terms: Time periods numbered sequentially. Each term has at most one leader.

Log replication: Leader appends entries to its log, replicates to followers.

Safety: Leader only commits entries that are replicated to majority.

States

Follower: Passive, responds to leader heartbeats
Candidate: Seeking votes to become leader
Leader: Handles client requests, replicates log

Leader Election

1class RaftNode {
2  private state: 'follower' | 'candidate' | 'leader' = 'follower';
3  private currentTerm: number = 0;
4  private votedFor: number | null = null;
5  private log: LogEntry[] = [];
6  private commitIndex: number = 0;
7
8  async startElection(): Promise<void> {
9    this.state = 'candidate';
10    this.currentTerm

Log Replication

1async appendEntry(entry: LogEntry): Promise<void> {
2  this.log.push({ ...entry, term: this.currentTerm });
3
4  // Replicate to followers
5  const responses = await Promise.all(
6    this.followers.map(follower => this.replicateLog(follower))
7  );
8
9  // Commit if majority acknowledged
10  const ackCount = responses.filter(r => r.success)length

Paxos Algorithm

Paxos is the original consensus algorithm, more complex but highly fault-tolerant.

Phases

Phase 1 (Prepare):

Proposer sends prepare(n) with proposal number n
Acceptor responds with promise not to accept proposals < n
If majority promise, proceed

Phase 2 (Accept):

Proposer sends accept(n, v) with value v
Acceptor accepts if n >= any promised number
If majority accept, value is chosen

Implementation

1class PaxosNode {
2  private promisedNumber: number = 0;
3  private acceptedNumber: number = 0;
4  private acceptedValue: any = null;
5
6  async prepare(proposalNumber: number): Promise<PromiseResponse> {
7    if (proposalNumber > this.promisedNumber) {
8      this.promisedNumber = proposalNumber;
9      return {
10        promised: true,
11        acceptedNumber: this.acceptedNumber,
12        acceptedValue: acceptedValue

Examples

Raft: Distributed Key-Value Store

1class DistributedKVStore {
2  private raft: RaftNode;
3  private state: Map<string, string> = new Map();
4
5  async set(key: string, value: string): Promise<void> {
6    const entry: LogEntry = {
7      type: 'SET',
8      key,
9      value,
10      term: this.raft.currentTerm
11    };
12
13    await this.raft.appendEntryentry

Paxos: Configuration Management

1class ConfigManager {
2  async proposeConfig(config: Config): Promise<void> {
3    let proposalNumber = this.generateProposalNumber();
4    let value = config;
5
6    while (true) {
7      // Phase 1: Prepare
8      const promises = await Promise.all(
9        this.acceptors.map(a => a.prepare(proposalNumber))
10      );
11
12      const majority = promises.p  ppromisedlength

Common Pitfalls

Split-brain in Raft: Network partition can cause multiple leaders. Fix: Require majority votes, use odd number of nodes
Paxos complexity: Hard to understand and implement correctly. Fix: Use Raft for most cases, or use existing libraries
Not handling concurrent proposals: Multiple proposers can conflict. Fix: Use unique proposal numbers, leader-based approach
Ignoring log consistency: Follower logs must match leader. Fix: Check last log term/index during election
Not committing safely: Committing before majority can cause inconsistency. Fix: Only commit after majority acknowledgment
Term confusion: Old terms can cause issues. Fix: Always check and update terms, reject stale messages
Not handling failures: Algorithm must work with node failures. Fix: Require majority, handle timeouts gracefully

Interview Questions

Beginner

Q: What is consensus and why is it needed in distributed systems?

A: Consensus is the problem of getting multiple nodes to agree on a value. It's needed because:

Coordination: Multiple nodes need to agree on shared state (e.g., database value, configuration)
Consistency: Ensures all nodes see the same data
Fault tolerance: System continues working even if some nodes fail
Ordering: Agree on the order of operations (critical for state machines)

Without consensus, nodes might have different views of the system, leading to inconsistencies and conflicts.

Intermediate

Q: Compare Raft and Paxos. When would you choose each?

Raft:

Simplicity: Designed to be understandable, easier to implement
Leader-based: Single leader handles all requests, simpler model
Strong leader: Leader has full authority, no conflicts
Use when: Building new systems, need understandable consensus, want easier debugging

Paxos:

Complexity: More complex, harder to understand and implement
No leader: Any node can propose, more flexible
Proven: Original consensus algorithm, well-studied
Use when: Need maximum flexibility, building on existing Paxos infrastructure

Comparison:

Understandability: Raft wins (designed for this)
Performance: Similar (both require majority)
Fault tolerance: Similar (both handle minority failures)
Flexibility: Paxos wins (no leader requirement)

Recommendation: Use Raft for most cases. Only use Paxos if you need leaderless consensus or are building on existing Paxos systems.

Senior

Q: Design a distributed database using Raft for consensus. How do you handle read operations, write operations, and ensure linearizability? How do you handle network partitions?

Architecture:

Raft cluster: 5 nodes (3-node minimum, 5 for better fault tolerance)
Leader: Handles all writes, replicates to followers
Reads: Can go to leader (strong consistency) or followers (eventual consistency)

Write Operations:

1class DistributedDB {
2  async write(key: string, value: string): Promise<void> {
3    if (this.raft.state !== 'leader') {
4      throw new Error('Not leader, redirect to leader');
5    }
6
7    const entry: LogEntry = {
8      type: 'WRITE',
9      key,
10      value,
11      term: this.raft.currentTerm,
12      index: this.raft.log.length

Read Operations:

1// Option 1: Read from leader (strong consistency)
2async read(key: string): Promise<string> {
3  if (this.raft.state !== 'leader') {
4    // Redirect to leader
5    const leader = await this.findLeader();
6    return await leader.read(key);
7  }
8
9  // Leader can serve reads directly (linearizable)
10  return this.stateMachine.get(key);
11}
12
13// Option 2: Read from followers (eventual consistency, faster)
14 key

Linearizability:

Writes: Always go through leader, committed only after majority
Reads: Read from leader for linearizability, or use read leases
Sequence numbers: Each operation gets sequence number, maintain ordering
Fencing tokens: Use tokens to prevent stale reads

Network Partitions:

Detection: Heartbeat timeouts, no majority responses
Minority partition: Cannot elect leader, becomes read-only
Majority partition: Continues operating, can elect new leader
Merge handling:
- Compare terms, highest term wins
- Leader with higher term forces followers to update
- Resolve conflicts using last-write-wins or application logic

Implementation:

1class PartitionAwareRaft {
2  async handlePartition(): Promise<void> {
3    // Check if we have majority
4    const responses = await Promise.allSettled(
5      this.nodes.map(n => this.sendHeartbeat(n))
6    );
7
8    const aliveCount = responses.filter(r => r.status === 'fulfilled').length;
9    const quorum = Math.floor(this.nodes.length / 2

Optimizations:

Batching: Batch multiple writes in single log entry
Read replicas: Use followers for read scaling (with consistency trade-offs)
Snapshotting: Periodically snapshot state, truncate log
Compression: Compress log entries to reduce network traffic

Consensus ensures agreement among distributed nodes on a value, critical for consistency
Raft is simpler than Paxos, designed for understandability while maintaining fault tolerance
Paxos is more flexible but complex, allows any node to propose
Leader-based approach (Raft) simplifies consensus but creates single point of coordination
Majority requirement ensures fault tolerance - system works with up to (n-1)/2 failures
Log replication ensures all nodes have same sequence of operations
Terms/epochs prevent stale leaders and ensure safety
Network partitions require quorum - minority partition cannot make progress
Linearizability requires reads to go through leader or use read leases
Choose Raft for most use cases due to simplicity, use Paxos only if you need leaderless consensus
Leader Election - How nodes elect a leader for coordination
Fault Tolerance - Handling node failures in distributed systems
Partition Tolerance - CAP theorem and handling network partitions
Two-Phase Commit (2PC) - Another consensus protocol for distributed transactions
Distributed Transactions - Maintaining ACID properties across nodes

Key Takeaways

Consensus ensures agreement among distributed nodes on a value, critical for consistency

Raft is simpler than Paxos, designed for understandability while maintaining fault tolerance

Paxos is more flexible but complex, allows any node to propose

Leader-based approach (Raft) simplifies consensus but creates single point of coordination

Majority requirement ensures fault tolerance - system works with up to (n-1)/2 failures

Log replication ensures all nodes have same sequence of operations

Terms/epochs prevent stale leaders and ensure safety

Network partitions require quorum - minority partition cannot make progress

Linearizability requires reads to go through leader or use read leases

Choose Raft for most use cases due to simplicity, use Paxos only if you need leaderless consensus

Consensus Algorithms (Raft, Paxos)

What is Consensus?

Raft Algorithm

Key Concepts

States

Leader Election

Log Replication

Paxos Algorithm

Phases

Implementation

Examples

Raft: Distributed Key-Value Store

Paxos: Configuration Management

Common Pitfalls

Interview Questions

Beginner

Intermediate

Senior

Key Takeaways

Related Topics