Topic Overview
CAP Theorem
Master the CAP theorem: understand why you can only guarantee two of Consistency, Availability, and Partition tolerance. Learn real-world trade-offs and examples.
The CAP theorem states that in a distributed system, you can only guarantee two out of three properties: Consistency, Availability, and Partition tolerance. Understanding CAP is crucial for designing distributed systems.
What is CAP Theorem?
CAP Theorem (also called Brewer's theorem) states:
In a distributed system, you can only guarantee two out of three:
- Consistency: All nodes see the same data simultaneously
- Availability: System remains operational
- Partition tolerance: System continues despite network failures
Key insight: Network partitions are inevitable in distributed systems, so you must choose between Consistency and Availability when a partition occurs.
The Three Properties
Consistency (C)
Definition: All nodes see the same data at the same time.
Characteristics:
- Linearizability: All reads see the latest write
- Strong consistency: No stale data
- Synchronous replication: All replicas updated before response
Example:
Write: Node A → x = 5
Read: Node B → x = 5 (immediately, not stale)
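A minimal sketch of what this implies for the write path, assuming a hypothetical Replica interface (not a real library): the client response is held back until every replica has applied the write, so no node can serve a stale value afterward.
// Sketch: synchronous replication (hypothetical Replica interface)
interface Replica {
  write(key: string, value: number): Promise<void>;
  read(key: string): Promise<number>;
}

async function consistentWrite(replicas: Replica[], key: string, value: number): Promise<void> {
  // Respond only after every replica has acknowledged the write,
  // so a subsequent read from any node sees the latest value.
  await Promise.all(replicas.map(r => r.write(key, value)));
}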
Availability (A)
Definition: System remains operational and responds to requests.
Characteristics:
- No downtime: System always responds
- Non-error responses: Every request receives a meaningful response rather than an error or timeout
- All nodes: Every non-failing node must respond
Example:
Request → System (always responds, even if data might be stale)
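As a rough sketch (reusing the hypothetical Replica interface from the consistency example), an availability-first read answers from whichever replica responds instead of returning an error, accepting that the value may be stale.
// Sketch: availability-first read (hypothetical Replica interface)
async function availableRead(replicas: Replica[], key: string): Promise<number> {
  for (const replica of replicas) {
    try {
      return await replica.read(key); // answer from the first reachable replica
    } catch {
      // node unreachable: try the next one instead of failing the request
    }
  }
  throw new Error('No replica reachable'); // fails only if every node is down
}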
Partition Tolerance (P)
Definition: System continues operating despite network failures.
Characteristics:
- Network partitions: Nodes can't communicate
- Continues operating: System doesn't fail completely
- Inevitable: Network partitions will happen
Example:
Network split: Node A ↔ Node B (can't communicate)
System continues: Both nodes still respond
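One simple way a node can notice a partition is by tracking heartbeats from its peers. The sketch below (hypothetical peer map and timeout, not tied to any particular system) flags peers it has not heard from recently while the local node keeps serving requests.
// Sketch: heartbeat-based partition detection (hypothetical peer map)
function unreachablePeers(lastHeartbeat: Map<string, number>, timeoutMs: number): string[] {
  const now = Date.now();
  // Peers silent for longer than the timeout are assumed to be on the other
  // side of a partition; the local node continues operating regardless.
  return [...lastHeartbeat.entries()]
    .filter(([, ts]) => now - ts > timeoutMs)
    .map(([peer]) => peer);
}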
CAP Combinations
CA (Consistency + Availability)
Trade-off: No partition tolerance
Characteristics:
- Strong consistency
- High availability
- Cannot handle network partitions
Example:
- Single-node database
- Traditional RDBMS (without replication)
- Not practical for distributed systems (partitions are inevitable)
Why not practical:
- Network partitions are inevitable
- Distributed systems must handle partitions
- CA systems fail completely during partitions
CP (Consistency + Partition Tolerance)
Trade-off: Sacrifice availability during partitions
Characteristics:
- Strong consistency
- Handles network partitions
- Unavailable during partitions (until partition resolves)
Example:
- MongoDB (with strong consistency)
- HBase
- ZooKeeper
- Traditional RDBMS (with synchronous replication)
Behavior during partition:
Partition occurs:
- System maintains consistency
- Some nodes become unavailable
- System waits for partition to resolve
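A minimal sketch of this behavior (hypothetical Replica interface from earlier): a quorum write succeeds only if a majority of replicas acknowledge it, so the minority side of a partition rejects writes rather than risk divergence.
// Sketch: quorum write; majority = floor(N / 2) + 1
async function quorumWrite(replicas: Replica[], key: string, value: number): Promise<void> {
  const majority = Math.floor(replicas.length / 2) + 1;
  const results = await Promise.allSettled(replicas.map(r => r.write(key, value)));
  const acks = results.filter(r => r.status === 'fulfilled').length;
  if (acks < majority) {
    // Minority side of a partition: refuse the write to preserve consistency (CP)
    throw new Error('No majority reachable; write rejected');
  }
}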
AP (Availability + Partition Tolerance)
Trade-off: Sacrifice consistency (eventual consistency)
Characteristics:
- High availability
- Handles network partitions
- Eventual consistency (may serve stale data)
Example:
- Cassandra
- DynamoDB
- CouchDB
- DNS
Behavior during partition:
Partition occurs:
- All nodes remain available
- May serve stale data
- Consistency achieved eventually (when partition resolves)
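A simplified sketch of an AP node (hypothetical types, using last-write-wins reconciliation, which is only one of several possible conflict-resolution strategies): writes are accepted locally and replicated later, and conflicting versions are merged by timestamp once the partition heals.
// Sketch: AP-style node with asynchronous replication and last-write-wins merge
type Versioned = { value: string; timestamp: number };

class APNode {
  private data = new Map<string, Versioned>();
  private pending: Array<{ key: string } & Versioned> = [];

  write(key: string, value: string): void {
    const entry = { value, timestamp: Date.now() };
    this.data.set(key, entry);            // accept locally and respond immediately (available)
    this.pending.push({ key, ...entry }); // replicate to peers later (eventual consistency)
  }

  // Called when the partition heals: keep the newer version of each key
  merge(key: string, remote: Versioned): void {
    const local = this.data.get(key);
    if (!local || remote.timestamp > local.timestamp) {
      this.data.set(key, remote);
    }
  }
}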
Real-World Examples
CP System: MongoDB
// MongoDB with strong consistency
// During partition: Some nodes unavailable
// Write with write concern
db.collection.insertOne(
  { data: "value" },
  { writeConcern: { w: "majority" } }
);
// Behavior:
// - Waits for majority of nodes to acknowledge
// - If partition: Some nodes unavailable
// - Trade-off: Consistency over availability
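Reads expose the same trade-off through read concerns; for example, in mongosh (same hypothetical collection as above):
// Read acknowledged by a majority of replica set members (stronger, may be slower)
db.collection.find({ data: "value" }).readConcern("majority");

// Read from the local node only (faster, may return data not yet majority-committed)
db.collection.find({ data: "value" }).readConcern("local");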
AP System: Cassandra
// Cassandra with eventual consistency
// During partition: All nodes available, may serve stale data
// Write with an explicit consistency level (Node.js cassandra-driver; `client` is a connected Client)
const query = "INSERT INTO items (id, data) VALUES (1, 'value')";
await client.execute(query, [], { consistency: cassandra.types.consistencies.one });
// Behavior:
// - Writes to one node (fast)
// - Replicates asynchronously
// - During partition: All nodes available
// - Trade-off: Availability over consistency
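Because Cassandra's consistency is tunable per request, the same data can instead be written and read at QUORUM so the read and write replica sets overlap (R + W > N), approximating strong consistency at the cost of availability during partitions. Assuming the same cassandra-driver client and items table as above:
// Read at QUORUM: a majority of replicas must respond before the result is returned
const result = await client.execute(
  "SELECT data FROM items WHERE id = ?",
  [1],
  { prepare: true, consistency: cassandra.types.consistencies.quorum }
);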
CA System: Single-Node Database
-- Single-node PostgreSQL
-- No partition tolerance (single node)
INSERT INTO items (id, data) VALUES (1, 'value');
-- Always consistent and available
-- But: Not distributed, can't handle network partitions
CAP Theorem Misconceptions
Misconception 1: "You must choose one"
Reality: You choose which two to prioritize. Partition tolerance is usually required, so you choose between CP and AP.
Misconception 2: "CAP applies to all operations"
Reality: CAP applies during network partitions. When there's no partition, you can have all three.
Misconception 3: "AP means no consistency"
Reality: AP systems use eventual consistency. They become consistent when partition resolves.
Misconception 4: "CP means always unavailable"
Reality: CP systems are unavailable only during partitions. When no partition, they're available.
CAP in Practice
Choosing CP vs AP
Choose CP when:
- Strong consistency required: Financial transactions, inventory
- Can tolerate downtime: During partitions, some nodes unavailable
- Examples: Banking systems, inventory management
Choose AP when:
- High availability critical: Social media, content delivery
- Can tolerate eventual consistency: User profiles, recommendations
- Examples: Social networks, CDNs, DNS
Hybrid Approaches
Many systems use both CP and AP for different operations:
class HybridSystem {
  private cpDatabase: CPDatabase;  // strongly consistent store (placeholder type)
  private apDatabase: APDatabase;  // eventually consistent store (placeholder type)

  // CP for critical operations
  async transferMoney(from: Account, to: Account, amount: number) {
    // Strong consistency required
    await this.cpDatabase.transaction(async (tx) => {
      await tx.debit(from, amount);
      await tx.credit(to, amount);
    });
  }

  // AP for non-critical operations
  async updateProfile(userId: string, profile: Profile) {
    // Eventual consistency acceptable
    await this.apDatabase.update(userId, profile);
  }
}
Common Pitfalls
- Ignoring partition tolerance: Assuming partitions won't happen. Fix: Always design for partitions
- Choosing CA: Not practical for distributed systems. Fix: Choose CP or AP
- Misunderstanding AP: Thinking AP means no consistency. Fix: AP uses eventual consistency
- Not considering use case: Choosing CP when AP is better (or vice versa). Fix: Understand requirements
- Assuming all operations same: Different operations may need different CAP. Fix: Use hybrid approach
- Not handling partitions: No plan for partition scenarios. Fix: Define explicit read/write behavior for when a partition occurs
Interview Questions
Beginner
Q: What is the CAP theorem? Explain the three properties.
A:
CAP Theorem states that in a distributed system, you can only guarantee two out of three properties:
- Consistency (C): All nodes see the same data at the same time
  - Strong consistency: all reads see the latest write
  - No stale data
- Availability (A): System remains operational and responds to requests
  - No downtime
  - All non-failing nodes respond
- Partition Tolerance (P): System continues operating despite network failures
  - Handles network partitions
  - Inevitable in distributed systems
Key insight: Network partitions are inevitable, so you must choose between Consistency and Availability during partitions.
Combinations:
- CA: Consistency + Availability (not practical for distributed systems)
- CP: Consistency + Partition tolerance (sacrifice availability)
- AP: Availability + Partition tolerance (sacrifice consistency)
Example:
- CP: MongoDB (strong consistency, unavailable during partitions)
- AP: Cassandra (high availability, eventual consistency)
Intermediate
Q: Explain the difference between CP and AP systems. When would you choose each?
A:
CP (Consistency + Partition Tolerance):
Characteristics:
- Strong consistency: All nodes see same data
- Handles network partitions
- Unavailable during partitions (until partition resolves)
Behavior during partition:
Partition: Node A ↔ Node B (can't communicate)
- System maintains consistency
- Some nodes become unavailable
- Waits for partition to resolve
Examples:
- MongoDB (with strong consistency)
- HBase
- ZooKeeper
- Traditional RDBMS (synchronous replication)
Choose CP when:
- Strong consistency required (financial transactions, inventory)
- Can tolerate downtime during partitions
- Data integrity critical
AP (Availability + Partition Tolerance):
Characteristics:
- High availability: System always responds
- Handles network partitions
- Eventual consistency (may serve stale data)
Behavior during partition:
Partition: Node A ↔ Node B (can't communicate)
- All nodes remain available
- May serve stale data
- Consistency achieved eventually (when partition resolves)
Examples:
- Cassandra
- DynamoDB
- CouchDB
- DNS
Choose AP when:
- High availability critical (social media, content delivery)
- Can tolerate eventual consistency (user profiles, recommendations)
- Downtime not acceptable
Decision Matrix:
- Financial transactions: CP (consistency critical)
- Social media: AP (availability critical)
- Inventory: CP (consistency critical)
- Content delivery: AP (availability critical)
Senior
Q: Design a distributed system that handles both CP and AP requirements. How do you implement different consistency levels for different operations?
A:
class HybridCAPSystem {
  private cpStore: CPStore;         // Strong consistency
  private apStore: APStore;         // Eventual consistency
  private coordinator: Coordinator;

  constructor() {
    // CP store for critical operations
    this.cpStore = new CPStore({
      consistency: 'strong',
      replication: 'synchronous'
    });

    // AP store for non-critical operations
    this.apStore = new APStore({
      consistency: 'eventual',
      replication: 'asynchronous'
    });

    this.coordinator = new Coordinator();
  }

  // 1. CP Operations (Strong Consistency)
  async transferMoney(from: Account, to: Account, amount: number): Promise<void> {
    // Critical operation: requires strong consistency
    await this.cpStore.transaction(async (tx) => {
      // Read with strong consistency
      const fromBalance = await tx.read(from.id, { consistency: 'strong' });
      const toBalance = await tx.read(to.id, { consistency: 'strong' });

      // Validate
      if (fromBalance < amount) {
        throw new Error('Insufficient funds');
      }

      // Write with strong consistency
      await tx.write(from.id, fromBalance - amount, { consistency: 'strong' });
      await tx.write(to.id, toBalance + amount, { consistency: 'strong' });

      // Wait for majority acknowledgment (CP)
      await tx.commit({ w: 'majority' });
    });
  }

  // 2. AP Operations (Eventual Consistency)
  async updateProfile(userId: string, profile: Profile): Promise<void> {
    // Non-critical operation: eventual consistency acceptable
    await this.apStore.write(userId, profile, {
      consistency: 'eventual',
      replication: 'asynchronous'
    });
    // Returns immediately (AP); replicates asynchronously
  }

  // 3. Tunable Consistency
  async readData(key: string, consistencyLevel: 'strong' | 'eventual'): Promise<Data> {
    if (consistencyLevel === 'strong') {
      // CP: strong consistency, may be unavailable during a partition
      return await this.cpStore.read(key, { consistency: 'strong' });
    } else {
      // AP: eventual consistency, always available
      return await this.apStore.read(key, { consistency: 'eventual' });
    }
  }

  // 4. Partition Handling
  async handlePartition(): Promise<void> {
    // Detect partition
    const partition = await this.detectPartition();

    if (partition) {
      // CP operations: block until the partition resolves
      this.cpStore.setUnavailable();

      // AP operations: continue with eventual consistency
      this.apStore.continueWithEventualConsistency();

      // Monitor partition resolution
      await this.monitorPartitionResolution();
    }
  }

  // 5. Consistency Levels per Operation
  getConsistencyLevel(operation: string): 'strong' | 'eventual' {
    const cpOperations = ['transfer', 'payment', 'inventory'];
    const apOperations = ['profile', 'recommendation', 'cache'];

    if (cpOperations.includes(operation)) {
      return 'strong';   // CP
    } else {
      return 'eventual'; // AP
    }
  }
}

// 6. CP Store Implementation
class CPStore {
  constructor(private config: { consistency: string; replication: string }) {}

  async read(key: string, options: { consistency: 'strong' }): Promise<Data> {
    // Read from a majority of nodes
    const nodes = this.getMajorityNodes();
    const results = await Promise.all(nodes.map(node => node.read(key)));
    // Return the latest value (by timestamp)
    return this.getLatest(results);
  }

  async write(key: string, value: Data, options: { consistency: 'strong' }): Promise<void> {
    // Write to a majority of nodes and wait for acknowledgment (CP)
    const nodes = this.getMajorityNodes();
    await Promise.all(nodes.map(node => node.write(key, value)));
    // If a partition leaves no majority reachable, the operation blocks
  }
}

// 7. AP Store Implementation
class APStore {
  constructor(private config: { consistency: string; replication: string }) {}

  async read(key: string, options: { consistency: 'eventual' }): Promise<Data> {
    // Read from any available node (AP)
    const node = this.getAvailableNode();
    // May return stale data (eventual consistency), but always available
    return await node.read(key);
  }

  async write(key: string, value: Data, options: { consistency: 'eventual' }): Promise<void> {
    // Write to any available node (AP)
    const node = this.getAvailableNode();
    await node.write(key, value);
    // Replicate asynchronously; the caller gets a response immediately (AP)
    this.replicateAsync(key, value);
    // Consistency is achieved eventually
  }
}
Features:
- Hybrid approach: CP for critical, AP for non-critical
- Tunable consistency: Different consistency levels per operation
- Partition handling: CP blocks, AP continues
- Operation-specific: Choose CAP per operation type
- Monitoring: Track consistency, availability, partitions
Key Takeaways
- CAP Theorem: In distributed systems, you can only guarantee two of Consistency, Availability, Partition tolerance
- Partition tolerance: Usually required (partitions are inevitable), so choose CP or AP
- CP systems: Strong consistency, unavailable during partitions (MongoDB, HBase)
- AP systems: High availability, eventual consistency (Cassandra, DynamoDB)
- CA systems: Not practical for distributed systems (partitions are inevitable)
- Hybrid approach: Use CP for critical operations, AP for non-critical
- Tunable consistency: Different consistency levels for different operations
- Best practices: Understand requirements, choose appropriate CAP combination, handle partitions