Topic Overview

CAP Theorem

Master the CAP theorem: understand why you can only guarantee two of Consistency, Availability, and Partition tolerance. Learn real-world trade-offs and examples.

The CAP theorem states that in a distributed system, you can only guarantee two out of three properties: Consistency, Availability, and Partition tolerance. Understanding CAP is crucial for designing distributed systems.


What is CAP Theorem?

CAP Theorem (also called Brewer's theorem) states:

In a distributed system, you can only guarantee two out of three:

  • Consistency: All nodes see the same data simultaneously
  • Availability: System remains operational
  • Partition tolerance: System continues despite network failures

Key insight: Network partitions are inevitable in distributed systems, so you must choose between Consistency and Availability when a partition occurs.


The Three Properties

Consistency (C)

Definition: All nodes see the same data at the same time.

Characteristics:

  • Linearizability: All reads see the latest write
  • Strong consistency: No stale data
  • Synchronous replication: All replicas updated before response

Example:

Write: Node A → x = 5
Read: Node B → x = 5 (immediately, not stale)
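
To make the synchronous-replication idea concrete, here is a minimal TypeScript sketch: the write is acknowledged only after every replica has applied it, which is why a read on Node B immediately sees x = 5. (The Replica interface and replicas array are illustrative assumptions, not any specific database's API.)

// Hypothetical replica interface, for illustration only
interface Replica {
  write(key: string, value: number): Promise<void>;
  read(key: string): Promise<number>;
}

// Strongly consistent write: acknowledge only after EVERY replica has applied it,
// so a subsequent read from any node sees the latest value
async function consistentWrite(replicas: Replica[], key: string, value: number): Promise<void> {
  await Promise.all(replicas.map(replica => replica.write(key, value)));
  // Only now does the client get an acknowledgment
}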

Availability (A)

Definition: System remains operational and responds to requests.

Characteristics:

  • No downtime: The system always responds
  • Non-error responses: Every request receives a meaningful (possibly stale) response rather than an error or a timeout
  • All nodes: Every non-failing node must respond

Example:

Request → System (always responds, even if data might be stale)
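
In code, "always responds" roughly means the node answers from whatever local state it has, even if replication is lagging behind. A minimal sketch (localStore is a hypothetical in-memory map, not a real database API):

// An available node answers every request from its local state,
// even if that state has not yet received the latest replicated write
const localStore = new Map<string, number>();

function handleRead(key: string): { value: number | undefined; possiblyStale: boolean } {
  // Never refuse the request; the value may simply be stale
  return { value: localStore.get(key), possiblyStale: true };
}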

Partition Tolerance (P)

Definition: System continues operating despite network failures.

Characteristics:

  • Network partitions: Nodes can't communicate
  • Continues operating: System doesn't fail completely
  • Inevitable: Network partitions will happen

Example:

Network split: Node A ↔ Node B (can't communicate)
System continues: Both nodes still respond
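
A common way a node tolerates this in practice is to treat a peer that stops answering within a timeout as unreachable and keep serving its own side of the split. A hedged sketch (pingPeer and the 1-second timeout are assumptions for illustration):

// Treat a peer that does not answer within a timeout as partitioned,
// then keep operating instead of failing completely
async function isPeerReachable(pingPeer: () => Promise<void>, timeoutMs = 1000): Promise<boolean> {
  try {
    await Promise.race([
      pingPeer(),
      new Promise((_, reject) => setTimeout(() => reject(new Error('timeout')), timeoutMs)),
    ]);
    return true;
  } catch {
    return false; // unreachable: this node keeps responding on its side of the partition
  }
}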

CAP Combinations

CA (Consistency + Availability)

Trade-off: No partition tolerance

Characteristics:

  • Strong consistency
  • High availability
  • Cannot handle network partitions

Example:

  • Single-node database
  • Traditional RDBMS (without replication)
  • Not practical for distributed systems (partitions are inevitable)

Why not practical:

  • Network partitions are inevitable
  • Distributed systems must handle partitions
  • CA systems fail completely during partitions

CP (Consistency + Partition Tolerance)

Trade-off: Sacrifice availability during partitions

Characteristics:

  • Strong consistency
  • Handles network partitions
  • Unavailable during partitions (until partition resolves)

Example:

  • MongoDB (with strong consistency)
  • HBase
  • ZooKeeper
  • Traditional RDBMS (with synchronous replication)

Behavior during partition:

Partition occurs:
  - System maintains consistency
  - Some nodes become unavailable
  - System waits for partition to resolve
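
CP systems typically enforce this with a majority quorum: with N replicas, a write must be acknowledged by W nodes and a read must contact R nodes such that W + R > N (for example N = 3, W = 2, R = 2), so every read overlaps at least one node holding the latest write. A minimal sketch of such a quorum write, using a hypothetical QuorumReplica interface:

interface QuorumReplica {
  write(key: string, value: string): Promise<void>;
}

// Quorum write over N replicas: succeed only if a majority (W = floor(N/2) + 1) acknowledges.
// During a partition that cuts this node off from the majority, the write fails instead of
// being accepted, i.e. the system gives up availability to preserve consistency.
async function quorumWrite(replicas: QuorumReplica[], key: string, value: string): Promise<void> {
  const needed = Math.floor(replicas.length / 2) + 1;
  const results = await Promise.allSettled(replicas.map(r => r.write(key, value)));
  const acks = results.filter(r => r.status === 'fulfilled').length;
  if (acks < needed) {
    throw new Error(`Quorum not reached: ${acks}/${needed} acknowledgments`);
  }
}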

AP (Availability + Partition Tolerance)

Trade-off: Sacrifice consistency (eventual consistency)

Characteristics:

  • High availability
  • Handles network partitions
  • Eventual consistency (may serve stale data)

Example:

  • Cassandra
  • DynamoDB
  • CouchDB
  • DNS

Behavior during partition:

Partition occurs:
  - All nodes remain available
  - May serve stale data
  - Consistency achieved eventually (when partition resolves)
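
In code, AP behavior usually amounts to "apply the write locally, acknowledge immediately, replicate later". A minimal sketch (localData, pendingReplication, and this whole shape are illustrative, not a specific database's API):

// AP-style write: apply locally and acknowledge right away,
// then replicate in the background once peers are reachable again
const localData = new Map<string, string>();
const pendingReplication: Array<{ key: string; value: string }> = [];

function apWrite(key: string, value: string): void {
  localData.set(key, value);                 // the node stays available
  pendingReplication.push({ key, value });   // replayed to peers after the partition heals
}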

Real-World Examples

CP System: MongoDB

// MongoDB with strong consistency
// During partition: Some nodes unavailable

// Write with write concern
db.collection.insertOne(
  { data: "value" },
  { writeConcern: { w: "majority" } }
);

// Behavior:
// - Waits for majority of nodes to acknowledge
// - If partition: Some nodes unavailable
// - Trade-off: Consistency over availability

AP System: Cassandra

// Cassandra with eventual consistency
// During partition: All nodes available, may serve stale data

// Driver setup (example settings; 'items' is a placeholder table in a 'demo' keyspace)
import cassandra from 'cassandra-driver';

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'demo'
});

// Write with consistency level ONE
const query = "INSERT INTO items (id, data) VALUES (1, 'value')";
await client.execute(query, [], { consistency: cassandra.types.consistencies.one });

// Behavior:
// - Waits for only one replica to acknowledge (fast)
// - Replicates to the remaining replicas asynchronously
// - During partition: All nodes available
// - Trade-off: Availability over consistency
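
Cassandra also makes this trade-off tunable per query: a QUORUM read overlaps the latest quorum write but costs latency, while ONE is fastest and may return stale data. A hedged example reusing the client above (the items table is the same placeholder):

// Stricter read: a quorum of replicas must answer (closer to CP for this query)
const strictRead = await client.execute(
  "SELECT data FROM items WHERE id = 1",
  [],
  { consistency: cassandra.types.consistencies.quorum }
);

// Fast read: any single replica answers (AP behavior, possibly stale)
const fastRead = await client.execute(
  "SELECT data FROM items WHERE id = 1",
  [],
  { consistency: cassandra.types.consistencies.one }
);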

CA System: Single-Node Database

-- Single-node PostgreSQL
-- No partition tolerance (single node)

INSERT INTO items (id, data) VALUES (1, 'value');
-- Always consistent and available
-- But: Not distributed, can't handle network partitions

CAP Theorem Misconceptions

Misconception 1: "You must choose one"

Reality: You choose which two to prioritize. Partition tolerance is usually required, so you choose between CP and AP.

Misconception 2: "CAP applies to all operations"

Reality: The forced trade-off only applies while a network partition is happening. When there is no partition, a well-designed system can provide both consistency and availability.

Misconception 3: "AP means no consistency"

Reality: AP systems use eventual consistency: replicas converge on the same data once the partition heals.
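
One concrete way replicas converge after the partition heals is a merge rule such as last-writer-wins, sketched below (the timestamps and this merge rule are illustrative; real systems may also use vector clocks or CRDTs):

// Last-writer-wins merge: the replica value with the newer timestamp survives
interface VersionedValue { value: string; timestampMs: number; }

function mergeReplicas(a: VersionedValue, b: VersionedValue): VersionedValue {
  return a.timestampMs >= b.timestampMs ? a : b;
}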

Misconception 4: "CP means always unavailable"

Reality: CP systems are unavailable only during partitions; when there is no partition, they remain fully available.


CAP in Practice

Choosing CP vs AP

Choose CP when:

  • Strong consistency required: Financial transactions, inventory
  • Can tolerate downtime: During partitions, some nodes unavailable
  • Examples: Banking systems, inventory management

Choose AP when:

  • High availability critical: Social media, content delivery
  • Can tolerate eventual consistency: User profiles, recommendations
  • Examples: Social networks, CDNs, DNS

Hybrid Approaches

Many systems use both CP and AP for different operations:

class HybridSystem {
  // CP for critical operations
  async transferMoney(from: Account, to: Account, amount: number) {
    // Strong consistency required
    await this.cpDatabase.transaction(async (tx) => {
      await tx.debit(from, amount);
      await tx.credit(to, amount);
    });
  }
  
  // AP for non-critical operations
  async updateProfile(userId: string, profile: Profile) {
    // Eventual consistency acceptable
    await this.apDatabase.update(userId, profile);
  }
}

Common Pitfalls

  • Ignoring partition tolerance: Assuming partitions won't happen. Fix: Always design for partitions
  • Choosing CA: Not practical for distributed systems. Fix: Choose CP or AP
  • Misunderstanding AP: Thinking AP means no consistency. Fix: AP uses eventual consistency
  • Not considering use case: Choosing CP when AP is better (or vice versa). Fix: Understand requirements
  • Assuming all operations same: Different operations may need different CAP. Fix: Use hybrid approach
  • Not handling partitions: Not planning for partition scenarios. Fix: Design partition handling

Interview Questions

Beginner

Q: What is the CAP theorem? Explain the three properties.

A:

CAP Theorem states that in a distributed system, you can only guarantee two out of three properties:

  1. Consistency (C): All nodes see the same data at the same time

    • Strong consistency: All reads see latest write
    • No stale data
  2. Availability (A): System remains operational and responds to requests

    • No downtime
    • All non-failing nodes respond
  3. Partition Tolerance (P): System continues operating despite network failures

    • Handles network partitions
    • Inevitable in distributed systems

Key insight: Network partitions are inevitable, so you must choose between Consistency and Availability during partitions.

Combinations:

  • CA: Consistency + Availability (not practical for distributed systems)
  • CP: Consistency + Partition tolerance (sacrifice availability)
  • AP: Availability + Partition tolerance (sacrifice consistency)

Example:

  • CP: MongoDB (strong consistency, unavailable during partitions)
  • AP: Cassandra (high availability, eventual consistency)

Intermediate

Q: Explain the difference between CP and AP systems. When would you choose each?

A:

CP (Consistency + Partition Tolerance):

Characteristics:

  • Strong consistency: All nodes see same data
  • Handles network partitions
  • Unavailable during partitions (until partition resolves)

Behavior during partition:

Partition: Node A ↔ Node B (can't communicate)
  - System maintains consistency
  - Some nodes become unavailable
  - Waits for partition to resolve

Examples:

  • MongoDB (with strong consistency)
  • HBase
  • ZooKeeper
  • Traditional RDBMS (synchronous replication)

Choose CP when:

  • Strong consistency required (financial transactions, inventory)
  • Can tolerate downtime during partitions
  • Data integrity critical

AP (Availability + Partition Tolerance):

Characteristics:

  • High availability: System always responds
  • Handles network partitions
  • Eventual consistency (may serve stale data)

Behavior during partition:

Partition: Node A ↔ Node B (can't communicate)
  - All nodes remain available
  - May serve stale data
  - Consistency achieved eventually (when partition resolves)

Examples:

  • Cassandra
  • DynamoDB
  • CouchDB
  • DNS

Choose AP when:

  • High availability critical (social media, content delivery)
  • Can tolerate eventual consistency (user profiles, recommendations)
  • Downtime not acceptable

Decision Matrix:

  • Financial transactions: CP (consistency critical)
  • Social media: AP (availability critical)
  • Inventory: CP (consistency critical)
  • Content delivery: AP (availability critical)

Senior

Q: Design a distributed system that handles both CP and AP requirements. How do you implement different consistency levels for different operations?

A:

class HybridCAPSystem {
  private cpStore: CPStore; // Strong consistency
  private apStore: APStore; // Eventual consistency
  private coordinator: Coordinator;
  
  constructor() {
    // CP store for critical operations
    this.cpStore = new CPStore({
      consistency: 'strong',
      replication: 'synchronous'
    });
    
    // AP store for non-critical operations
    this.apStore = new APStore({
      consistency: 'eventual',
      replication: 'asynchronous'
    });
    
    this.coordinator = new Coordinator();
  }
  
  // 1. CP Operations (Strong Consistency)
  async transferMoney(from: Account, to: Account, amount: number): Promise<void> {
    // Critical operation: Requires strong consistency
    await this.cpStore.transaction(async (tx) => {
      // Read with strong consistency
      const fromBalance = await tx.read(from.id, { consistency: 'strong' });
      const toBalance = await tx.read(to.id, { consistency: 'strong' });
      
      // Validate
      if (fromBalance < amount) {
        throw new Error('Insufficient funds');
      }
      
      // Write with strong consistency
      await tx.write(from.id, fromBalance - amount, { consistency: 'strong' });
      await tx.write(to.id, toBalance + amount, { consistency: 'strong' });
      
      // Wait for majority acknowledgment (CP)
      await tx.commit({ w: 'majority' });
    });
  }
  
  // 2. AP Operations (Eventual Consistency)
  async updateProfile(userId: string, profile: Profile): Promise<void> {
    // Non-critical operation: Eventual consistency acceptable
    await this.apStore.write(userId, profile, {
      consistency: 'eventual',
      replication: 'asynchronous'
    });
    
    // Returns immediately (AP)
    // Replicates asynchronously
  }
  
  // 3. Tunable Consistency
  async readData(key: string, consistencyLevel: 'strong' | 'eventual'): Promise<Data> {
    if (consistencyLevel === 'strong') {
      // CP: Strong consistency, may be unavailable during partition
      return await this.cpStore.read(key, { consistency: 'strong' });
    } else {
      // AP: Eventual consistency, always available
      return await this.apStore.read(key, { consistency: 'eventual' });
    }
  }
  
  // 4. Partition Handling
  async handlePartition(): Promise<void> {
    // Detect partition
    const partition = await this.detectPartition();
    
    if (partition) {
      // CP operations: Block until partition resolves
      this.cpStore.setUnavailable();
      
      // AP operations: Continue with eventual consistency
      this.apStore.continueWithEventualConsistency();
      
      // Monitor partition resolution
      await this.monitorPartitionResolution();
    }
  }
  
  // 5. Consistency Levels per Operation
  getConsistencyLevel(operation: string): 'strong' | 'eventual' {
    const cpOperations = ['transfer', 'payment', 'inventory'];
    const apOperations = ['profile', 'recommendation', 'cache'];
    
    if (cpOperations.includes(operation)) {
      return 'strong'; // CP
    } else {
      return 'eventual'; // AP
    }
  }
}

// 6. CP Store Implementation (a separate class: TypeScript does not allow class declarations inside a class body)
class CPStore {
  async read(key: string, options: { consistency: 'strong' }): Promise<Data> {
    // Read from a majority of nodes
    const nodes = this.getMajorityNodes();
    const results = await Promise.all(nodes.map(node => node.read(key)));
    
    // Return the latest value (by timestamp)
    return this.getLatest(results);
  }
  
  async write(key: string, value: Data, options: { consistency: 'strong' }): Promise<void> {
    // Write to a majority of nodes and wait for acknowledgment (CP)
    // If partition: some nodes are unreachable and the operation blocks
    const nodes = this.getMajorityNodes();
    await Promise.all(nodes.map(node => node.write(key, value)));
  }
}

// 7. AP Store Implementation (also a separate class)
class APStore {
  async read(key: string, options: { consistency: 'eventual' }): Promise<Data> {
    // Read from any available node (AP)
    // May return stale data (eventual consistency), but always available
    const node = this.getAvailableNode();
    return await node.read(key);
  }
  
  async write(key: string, value: Data, options: { consistency: 'eventual' }): Promise<void> {
    // Write to any available node and return immediately (AP)
    const node = this.getAvailableNode();
    await node.write(key, value);
    
    // Replicate asynchronously; consistency is achieved eventually
    this.replicateAsync(key, value);
  }
}

Features:

  1. Hybrid approach: CP for critical, AP for non-critical
  2. Tunable consistency: Different consistency levels per operation
  3. Partition handling: CP blocks, AP continues
  4. Operation-specific: Choose CAP per operation type
  5. Monitoring: Track consistency, availability, partitions

Key Takeaways

  • CAP Theorem: In distributed systems, you can only guarantee two of Consistency, Availability, Partition tolerance
  • Partition tolerance: Usually required (partitions are inevitable), so choose CP or AP
  • CP systems: Strong consistency, unavailable during partitions (MongoDB, HBase)
  • AP systems: High availability, eventual consistency (Cassandra, DynamoDB)
  • CA systems: Not practical for distributed systems (partitions are inevitable)
  • Hybrid approach: Use CP for critical operations, AP for non-critical
  • Tunable consistency: Different consistency levels for different operations
  • Best practices: Understand requirements, choose appropriate CAP combination, handle partitions

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.