Topic Overview
CAP Theorem
Master the CAP theorem: understand why you can only guarantee two of Consistency, Availability, and Partition tolerance. Learn real-world trade-offs and examples.
The CAP theorem states that in a distributed system, you can only guarantee two out of three properties: Consistency, Availability, and Partition tolerance. Understanding CAP is crucial for designing distributed systems.
What is CAP Theorem?
CAP Theorem (also called Brewer's theorem) states:
In a distributed system, you can only guarantee two out of three:
- Consistency: All nodes see the same data simultaneously
- Availability: System remains operational
- Partition tolerance: System continues despite network failures
Key insight: Network partitions are inevitable in distributed systems, so you must choose between Consistency and Availability when a partition occurs.
The Three Properties
Consistency (C)
Definition: All nodes see the same data at the same time.
Characteristics:
- Linearizability: All reads see the latest write
- Strong consistency: No stale data
- Synchronous replication: All replicas updated before response
Example:
Write: Node A → x = 5
Read: Node B → x = 5 (immediately, not stale)
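A minimal sketch of what this implies for the write path, assuming a hypothetical Replica interface (not a real library): the client response is held back until every replica has applied the write, so no node can serve a stale value afterward.
// Sketch: synchronous replication (hypothetical Replica interface)
interface Replica {
  write(key: string, value: number): Promise<void>;
  read(key: string): Promise<number>;
}

async function consistentWrite(replicas: Replica[], key: string, value: number): Promise<void> {
  // Respond only after every replica has acknowledged the write,
  // so a subsequent read from any node sees the latest value.
  await Promise.all(replicas.map(r => r.write(key, value)));
}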
Availability (A)
Definition: System remains operational and responds to requests.
Characteristics:
- No downtime: System always responds
- Non-error responses: Every request receives a meaningful response rather than an error or timeout
- All nodes: Every non-failing node must respond
Example:
Request → System (always responds, even if data might be stale)
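As a rough sketch (reusing the hypothetical Replica interface from the consistency example), an availability-first read answers from whichever replica responds instead of returning an error, accepting that the value may be stale.
// Sketch: availability-first read (hypothetical Replica interface)
async function availableRead(replicas: Replica[], key: string): Promise<number> {
  for (const replica of replicas) {
    try {
      return await replica.read(key); // answer from the first reachable replica
    } catch {
      // node unreachable: try the next one instead of failing the request
    }
  }
  throw new Error('No replica reachable'); // fails only if every node is down
}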
Partition Tolerance (P)
Definition: System continues operating despite network failures.
Characteristics:
- Network partitions: Nodes can't communicate
- Continues operating: System doesn't fail completely
- Inevitable: Network partitions will happen
Example:
Network split: Node A ↔ Node B (can't communicate)
System continues: Both nodes still respond
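One simple way a node can notice a partition is by tracking heartbeats from its peers. The sketch below (hypothetical peer map and timeout, not tied to any particular system) flags peers it has not heard from recently while the local node keeps serving requests.
// Sketch: heartbeat-based partition detection (hypothetical peer map)
function unreachablePeers(lastHeartbeat: Map<string, number>, timeoutMs: number): string[] {
  const now = Date.now();
  // Peers silent for longer than the timeout are assumed to be on the other
  // side of a partition; the local node continues operating regardless.
  return [...lastHeartbeat.entries()]
    .filter(([, ts]) => now - ts > timeoutMs)
    .map(([peer]) => peer);
}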
CAP Combinations
CA (Consistency + Availability)
Trade-off: No partition tolerance
Characteristics:
- Strong consistency
- High availability
- Cannot handle network partitions
Example:
- Single-node database
- Traditional RDBMS (without replication)
- Not practical for distributed systems (partitions are inevitable)
Why not practical:
- Network partitions are inevitable
- Distributed systems must handle partitions
- CA systems fail completely during partitions
CP (Consistency + Partition Tolerance)
Trade-off: Sacrifice availability during partitions
Characteristics:
- Strong consistency
- Handles network partitions
- Unavailable during partitions (until partition resolves)
Example:
- MongoDB (with strong consistency)
- HBase
- ZooKeeper
- Traditional RDBMS (with synchronous replication)
Behavior during partition:
Partition occurs:
- System maintains consistency
- Some nodes become unavailable
- System waits for partition to resolve
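A minimal sketch of this behavior (hypothetical Replica interface from earlier): a quorum write succeeds only if a majority of replicas acknowledge it, so the minority side of a partition rejects writes rather than risk divergence.
// Sketch: quorum write; majority = floor(N / 2) + 1
async function quorumWrite(replicas: Replica[], key: string, value: number): Promise<void> {
  const majority = Math.floor(replicas.length / 2) + 1;
  const results = await Promise.allSettled(replicas.map(r => r.write(key, value)));
  const acks = results.filter(r => r.status === 'fulfilled').length;
  if (acks < majority) {
    // Minority side of a partition: refuse the write to preserve consistency (CP)
    throw new Error('No majority reachable; write rejected');
  }
}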
AP (Availability + Partition Tolerance)
Trade-off: Sacrifice consistency (eventual consistency)
Characteristics:
- High availability
- Handles network partitions
- Eventual consistency (may serve stale data)
Example:
- Cassandra
- DynamoDB
- CouchDB
- DNS
Behavior during partition:
Partition occurs:
- All nodes remain available
- May serve stale data
- Consistency achieved eventually (when partition resolves)
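A simplified sketch of an AP node (hypothetical types, using last-write-wins reconciliation, which is only one of several possible conflict-resolution strategies): writes are accepted locally and replicated later, and conflicting versions are merged by timestamp once the partition heals.
// Sketch: AP-style node with asynchronous replication and last-write-wins merge
type Versioned = { value: string; timestamp: number };

class APNode {
  private data = new Map<string, Versioned>();
  private pending: Array<{ key: string } & Versioned> = [];

  write(key: string, value: string): void {
    const entry = { value, timestamp: Date.now() };
    this.data.set(key, entry);            // accept locally and respond immediately (available)
    this.pending.push({ key, ...entry }); // replicate to peers later (eventual consistency)
  }

  // Called when the partition heals: keep the newer version of each key
  merge(key: string, remote: Versioned): void {
    const local = this.data.get(key);
    if (!local || remote.timestamp > local.timestamp) {
      this.data.set(key, remote);
    }
  }
}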
Real-World Examples
CP System: MongoDB
// MongoDB with strong consistency
// During partition: Some nodes unavailable
// Write with write concern
db.collection.insertOne(
  { data: "value" },
  { writeConcern: { w: "majority" } }
);
// Behavior:
// - Waits for majority of nodes to acknowledge
// - If partition: Some nodes unavailable
// - Trade-off: Consistency over availability
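Reads expose the same trade-off through read concerns; for example, in mongosh (same hypothetical collection as above):
// Read acknowledged by a majority of replica set members (stronger, may be slower)
db.collection.find({ data: "value" }).readConcern("majority");

// Read from the local node only (faster, may return data not yet majority-committed)
db.collection.find({ data: "value" }).readConcern("local");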
AP System: Cassandra
// Cassandra with eventual consistency
// During partition: All nodes available, may serve stale data
// Write with an explicit consistency level (Node.js cassandra-driver; `client` is a connected Client)
const query = "INSERT INTO items (id, data) VALUES (1, 'value')";
await client.execute(query, [], { consistency: cassandra.types.consistencies.one });
// Behavior:
// - Writes to one node (fast)
// - Replicates asynchronously
// - During partition: All nodes available
// - Trade-off: Availability over consistency
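Because Cassandra's consistency is tunable per request, the same data can instead be written and read at QUORUM so the read and write replica sets overlap (R + W > N), approximating strong consistency at the cost of availability during partitions. Assuming the same cassandra-driver client and items table as above:
// Read at QUORUM: a majority of replicas must respond before the result is returned
const result = await client.execute(
  "SELECT data FROM items WHERE id = ?",
  [1],
  { prepare: true, consistency: cassandra.types.consistencies.quorum }
);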
CA System: Single-Node Database
-- Single-node PostgreSQL
-- No partition tolerance (single node)
INSERT INTO items (id, data) VALUES (1, 'value');
-- Always consistent and available
-- But: Not distributed, can't handle network partitions
CAP Theorem Misconceptions
Misconception 1: "You must choose one"
Reality: You choose which two to prioritize. Partition tolerance is usually required, so you choose between CP and AP.
Misconception 2: "CAP applies to all operations"
Reality: CAP applies during network partitions. When there's no partition, you can have all three.
Misconception 3: "AP means no consistency"
Reality: AP systems use eventual consistency. They become consistent when partition resolves.
Misconception 4: "CP means always unavailable"
Reality: CP systems are unavailable only during partitions. When no partition, they're available.
CAP in Practice
Choosing CP vs AP
Choose CP when:
- Strong consistency required: Financial transactions, inventory
- Can tolerate downtime: During partitions, some nodes unavailable
- Examples: Banking systems, inventory management
Choose AP when:
- High availability critical: Social media, content delivery
- Can tolerate eventual consistency: User profiles, recommendations
- Examples: Social networks, CDNs, DNS
Hybrid Approaches
Many systems use both CP and AP for different operations:
class HybridSystem {
  private cpDatabase: CPDatabase;  // strongly consistent store (placeholder type)
  private apDatabase: APDatabase;  // eventually consistent store (placeholder type)

  // CP for critical operations
  async transferMoney(from: Account, to: Account, amount: number) {
    // Strong consistency required
    await this.cpDatabase.transaction(async (tx) => {
      await tx.debit(from, amount);
      await tx.credit(to, amount);
    });
  }

  // AP for non-critical operations
  async updateProfile(userId: string, profile: Profile) {
    // Eventual consistency acceptable
    await this.apDatabase.update(userId, profile);
  }
}
Common Pitfalls
- Ignoring partition tolerance: Assuming partitions won't happen. Fix: Always design for partitions
- Choosing CA: Not practical for distributed systems. Fix: Choose CP or AP
- Misunderstanding AP: Thinking AP means no consistency. Fix: AP uses eventual consistency
- Not considering use case: Choosing CP when AP is better (or vice versa). Fix: Understand requirements
- Assuming all operations same: Different operations may need different CAP. Fix: Use hybrid approach
- Not handling partitions: No plan for partition scenarios. Fix: Define explicit read/write behavior for when a partition occurs
Interview Questions
Beginner
Q: What is the CAP theorem? Explain the three properties.
A:
CAP Theorem states that in a distributed system, you can only guarantee two out of three properties:
- Consistency (C): All nodes see the same data at the same time
  - Strong consistency: all reads see the latest write
  - No stale data
- Availability (A): System remains operational and responds to requests
  - No downtime
  - All non-failing nodes respond
- Partition Tolerance (P): System continues operating despite network failures
  - Handles network partitions
  - Inevitable in distributed systems
Key insight: Network partitions are inevitable, so you must choose between Consistency and Availability during partitions.
Combinations:
- CA: Consistency + Availability (not practical for distributed systems)
- CP: Consistency + Partition tolerance (sacrifice availability)
- AP: Availability + Partition tolerance (sacrifice consistency)
Example:
- CP: MongoDB (strong consistency, unavailable during partitions)
- AP: Cassandra (high availability, eventual consistency)
Intermediate
Q: Explain the difference between CP and AP systems. When would you choose each?
A:
CP (Consistency + Partition Tolerance):
Characteristics:
- Strong consistency: All nodes see same data
- Handles network partitions
- Unavailable during partitions (until partition resolves)
Behavior during partition:
Partition: Node A ↔ Node B (can't communicate)
- System maintains consistency
- Some nodes become unavailable
- Waits for partition to resolve
Examples:
- MongoDB (with strong consistency)
- HBase
- ZooKeeper
- Traditional RDBMS (synchronous replication)
Choose CP when:
- Strong consistency required (financial transactions, inventory)
- Can tolerate downtime during partitions
- Data integrity critical
AP (Availability + Partition Tolerance):
Characteristics:
- High availability: System always responds
- Handles network partitions
- Eventual consistency (may serve stale data)
Behavior during partition:
Partition: Node A ↔ Node B (can't communicate)
- All nodes remain available
- May serve stale data
- Consistency achieved eventually (when partition resolves)
Examples:
- Cassandra
- DynamoDB
- CouchDB
- DNS
Choose AP when:
- High availability critical (social media, content delivery)
- Can tolerate eventual consistency (user profiles, recommendations)
- Downtime not acceptable
Decision Matrix:
- Financial transactions: CP (consistency critical)
- Social media: AP (availability critical)
- Inventory: CP (consistency critical)
- Content delivery: AP (availability critical)
Senior
Q: Design a distributed system that handles both CP and AP requirements. How do you implement different consistency levels for different operations?
A:
class HybridCAPSystem {
  private cpStore: CPStore;         // Strong consistency
  private apStore: APStore;         // Eventual consistency
  private coordinator: Coordinator;

  constructor() {
    // CP store for critical operations
    this.cpStore = new CPStore({
      consistency: 'strong',
      replication: 'synchronous'
    });

    // AP store for non-critical operations
    this.apStore = new APStore({
      consistency: 'eventual',
      replication: 'asynchronous'
    });

    this.coordinator = new Coordinator();
  }

  // 1. CP Operations (Strong Consistency)
  async transferMoney(from: Account, to: Account, amount: number): Promise<void> {
    // Critical operation: requires strong consistency
    await this.cpStore.transaction(async (tx) => {
      // Read with strong consistency
      const fromBalance = await tx.read(from.id, { consistency: 'strong' });
      const toBalance = await tx.read(to.id, { consistency: 'strong' });

      // Validate
      if (fromBalance < amount) {
        throw new Error('Insufficient funds');
      }

      // Write with strong consistency
      await tx.write(from.id, fromBalance - amount, { consistency: 'strong' });
      await tx.write(to.id, toBalance + amount, { consistency: 'strong' });

      // Wait for majority acknowledgment (CP)
      await tx.commit({ w: 'majority' });
    });
  }

  // 2. AP Operations (Eventual Consistency)
  async updateProfile(userId: string, profile: Profile): Promise<void> {
    // Non-critical operation: eventual consistency acceptable
    await this.apStore.write(userId, profile, {
      consistency: 'eventual',
      replication: 'asynchronous'
    });
    // Returns immediately (AP); replicates asynchronously
  }

  // 3. Tunable Consistency
  async readData(key: string, consistencyLevel: 'strong' | 'eventual'): Promise<Data> {
    if (consistencyLevel === 'strong') {
      // CP: strong consistency, may be unavailable during a partition
      return await this.cpStore.read(key, { consistency: 'strong' });
    } else {
      // AP: eventual consistency, always available
      return await this.apStore.read(key, { consistency: 'eventual' });
    }
  }

  // 4. Partition Handling
  async handlePartition(): Promise<void> {
    // Detect partition
    const partition = await this.detectPartition();

    if (partition) {
      // CP operations: block until the partition resolves
      this.cpStore.setUnavailable();

      // AP operations: continue with eventual consistency
      this.apStore.continueWithEventualConsistency();

      // Monitor partition resolution
      await this.monitorPartitionResolution();
    }
  }

  // 5. Consistency Levels per Operation
  getConsistencyLevel(operation: string): 'strong' | 'eventual' {
    const cpOperations = ['transfer', 'payment', 'inventory'];
    const apOperations = ['profile', 'recommendation', 'cache'];

    if (cpOperations.includes(operation)) {
      return 'strong';   // CP
    } else {
      return 'eventual'; // AP
    }
  }
}

// 6. CP Store Implementation
class CPStore {
  constructor(private config: { consistency: string; replication: string }) {}

  async read(key: string, options: { consistency: 'strong' }): Promise<Data> {
    // Read from a majority of nodes
    const nodes = this.getMajorityNodes();
    const results = await Promise.all(nodes.map(node => node.read(key)));
    // Return the latest value (by timestamp)
    return this.getLatest(results);
  }

  async write(key: string, value: Data, options: { consistency: 'strong' }): Promise<void> {
    // Write to a majority of nodes and wait for acknowledgment (CP)
    const nodes = this.getMajorityNodes();
    await Promise.all(nodes.map(node => node.write(key, value)));
    // If a partition leaves no majority reachable, the operation blocks
  }
}

// 7. AP Store Implementation
class APStore {
  constructor(private config: { consistency: string; replication: string }) {}

  async read(key: string, options: { consistency: 'eventual' }): Promise<Data> {
    // Read from any available node (AP)
    const node = this.getAvailableNode();
    // May return stale data (eventual consistency), but always available
    return await node.read(key);
  }

  async write(key: string, value: Data, options: { consistency: 'eventual' }): Promise<void> {
    // Write to any available node (AP)
    const node = this.getAvailableNode();
    await node.write(key, value);
    // Replicate asynchronously; the caller gets a response immediately (AP)
    this.replicateAsync(key, value);
    // Consistency is achieved eventually
  }
}
Features:
- Hybrid approach: CP for critical, AP for non-critical
- Tunable consistency: Different consistency levels per operation
- Partition handling: CP blocks, AP continues
- Operation-specific: Choose CAP per operation type
- Monitoring: Track consistency, availability, partitions
Key Takeaways
- CAP Theorem: In distributed systems, you can only guarantee two of Consistency, Availability, Partition tolerance
- Partition tolerance: Usually required (partitions are inevitable), so choose CP or AP
- CP systems: Strong consistency, unavailable during partitions (MongoDB, HBase)
- AP systems: High availability, eventual consistency (Cassandra, DynamoDB)
- CA systems: Not practical for distributed systems (partitions are inevitable)
- Hybrid approach: Use CP for critical operations, AP for non-critical
- Tunable consistency: Different consistency levels for different operations
- Best practices: Understand requirements, choose appropriate CAP combination, handle partitions