Topic Overview

Partition Tolerance: Concepts, Trade-offs & Failure Modes

Learn how distributed systems handle network partitions and maintain availability.

Senior10 min read

Partition tolerance is the ability of a distributed system to continue operating despite network partitions that split the system into isolated groups.


What is a Network Partition?

A network partition occurs when network failures cause nodes to be unable to communicate, splitting the system into disconnected groups.

Example: Data center A can't reach data center B, but both continue operating independently.


CAP Theorem

You can only guarantee 2 of 3:

  • Consistency: All nodes see same data
  • Availability: System responds to requests
  • Partition tolerance: System continues despite partitions

During partition:

  • CP: Choose consistency (block writes, maintain consistency)
  • AP: Choose availability (allow writes, sacrifice consistency)
  • CA: Not possible in distributed systems (partitions will happen)

Handling Partitions

CP Systems (Consistency + Partition Tolerance)

Behavior: Block operations during partition to maintain consistency.

Example: Traditional databases with strong consistency.

1class CPSystem {
2 async write(data: any): Promise<void> {
3 // Check if we have quorum
4 if (!await this.hasQuorum()) {
5 throw new Error('No quorum, cannot write');
6 }
7
8 // Write to majority
9 await this.writeToMajority(data);
10 }
11
12 async hasQuorum(): Promise<boolean> {
13 const reachable

AP Systems (Availability + Partition Tolerance)

Behavior: Continue operating, accept eventual consistency.

Example: Dynamo, Cassandra, CouchDB.

1class APSystem {
2 async write(data: any): Promise<void> {
3 // Always allow writes (availability)
4 await this.localWrite(data);
5
6 // Replicate when partition heals
7 this.queueForReplication(data);
8 }
9
10 async read(key: string): Promise<Data> {
11 // Read from local (may be stale)
12 return await this.localRead(key);
13 }

Examples

Dynamo-style Partition Handling

1class PartitionTolerantStore {
2 async write(key: string, value: any): Promise<void> {
3 // Write to local and available nodes
4 const available = await this.getAvailableNodes();
5
6 // Write to N nodes (quorum)
7 const writeCount = Math.ceil(available.length / 2) + 1;
8 await this.writeToNodes(key, value, available.slice(0, writeCount));

Common Pitfalls

  • Assuming no partitions: Partitions will happen, must design for them
  • Choosing wrong CAP trade-off: Not matching system requirements
  • Not handling conflicts: AP systems need conflict resolution
  • Ignoring split-brain: Multiple leaders during partition
  • Not testing partitions: Must test partition scenarios

Interview Questions

Beginner

Q: What is partition tolerance in the context of CAP theorem?

A: Partition tolerance means the system continues operating despite network partitions that split nodes into isolated groups.

CAP theorem: You can only guarantee 2 of 3:

  • CP: Consistency + Partition tolerance (block during partition)
  • AP: Availability + Partition tolerance (continue, accept inconsistency)
  • CA: Not possible (partitions will happen in distributed systems)

Example: During partition, CP system blocks writes to maintain consistency. AP system allows writes but may have conflicts to resolve later.


Intermediate

Q: How does a system handle network partitions? Compare CP and AP approaches.

A:

CP Approach (Consistency + Partition Tolerance):

  • Behavior: Block operations if no quorum
  • Trade-off: Sacrifice availability for consistency
  • Use when: Strong consistency required (financial systems)
  • Example: Traditional SQL databases, Zookeeper

AP Approach (Availability + Partition Tolerance):

  • Behavior: Continue operating, allow writes
  • Trade-off: Sacrifice consistency for availability
  • Use when: High availability required (social media, content delivery)
  • Example: Dynamo, Cassandra, CouchDB

During partition:

  • CP: Minority partition blocks, majority continues
  • AP: Both partitions continue, resolve conflicts when partition heals

Senior

Q: Design a partition-tolerant distributed database. How do you handle writes during partitions, resolve conflicts, and ensure data consistency when partitions heal?

A:

Design: AP System with Conflict Resolution

1class PartitionTolerantDatabase {
2 private nodes: Node[] = [];
3 private quorum: number;
4
5 async write(key: string, value: any, vectorClock: VectorClock): Promise<void> {
6 // Always allow writes (availability)
7 const available = await this.getAvailableNodes();
8
9 if (available.length === 0) {
10 throw new Error('No nodes available');

Conflict Resolution Strategies:

  1. Last-write-wins: Simple, but may lose data
  2. Vector clocks: Detect concurrent writes, require application resolution
  3. CRDTs: Automatic conflict resolution for certain data types
  4. Application-level: Let application decide how to merge

  • Partition tolerance is required in distributed systems (partitions will happen)

  • CAP theorem: Choose 2 of 3 (CP or AP, not CA)

  • CP systems: Block during partition to maintain consistency

  • AP systems: Continue operating, resolve conflicts later

  • Conflict resolution: AP systems need strategies (LWW, vector clocks, CRDTs)

  • Quorum: Majority-based operations handle partitions

  • Design for partitions: Don't assume perfect network connectivity

  • Consensus Algorithms (Raft, Paxos) - Handling partitions in consensus

  • Leader Election - Elections during network partitions

  • Distributed Transactions - Transaction protocols and partitions

  • Clock Synchronization (NTP, Lamport) - Ordering events across partitions

  • Gossip Protocol - Information dissemination during partitions

Key Takeaways

Partition tolerance is required in distributed systems (partitions will happen)

CAP theorem: Choose 2 of 3 (CP or AP, not CA)

CP systems: Block during partition to maintain consistency

AP systems: Continue operating, resolve conflicts later

Conflict resolution: AP systems need strategies (LWW, vector clocks, CRDTs)

Quorum: Majority-based operations handle partitions

Design for partitions: Don't assume perfect network connectivity


About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.