
System design interview guide

Design a Distributed Cache

TL;DR: You’re building an in-memory cluster that feels like Redis or Memcached at scale: sub-millisecond reads, horizontal add/remove of nodes, and explicit policies for eviction, replication, and what “consistent” means when failures and partitions happen. The interview is less about memorizing RDB/AOF flags and more about **partitioning**, **hot keys**, and **failure** behavior.

Problem statement

You’re designing a distributed in-memory cache (Redis/Memcached-class): get/set with optional TTL, eviction when memory is full, atomic operations for counters, and horizontal scale across many servers with high availability.

Product constraints. Functionally: key-value API, TTL, eviction policies, optional persistence snapshots for warm restart. Non-functionally: millions of ops/s, sub-millisecond p99 for hot keys, 99.9%+ availability. Scale: large working set (e.g. 100 GB+ aggregate), hundreds of millions of keys, skewed access.

Interview arc: partitioning → routing → replication → eviction → failure. Admit eventual consistency unless you scope something stronger.

Introduction

A distributed cache is fast mutable state spread across machines. The interview is not “name seven AWS services”—it is how clients find the right shard, what happens when a node disappears, how replicas stay good enough without claiming linearizability everywhere, and how eviction interacts with tail latency when memory is tight.

Weak answers treat the cache as a black box that “makes things faster.” Strong answers separate metadata (routing, membership) from data plane (GET/SET on primaries), and name hot key failure modes.

How to approach

Start with single-node semantics: eviction policy, TTL, single-threaded vs multi-threaded IO. Then shard with consistent hashing and virtual nodes. Add replication (read scaling + failover). Finish with cache-aside invalidation and thundering herd. Order beats listing features.
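The consistent-hashing step above can be made concrete with a small ring. This is a minimal sketch under stated assumptions (the `HashRing` class and its parameters are mine, not from any particular library; real systems use better hash functions and replication-aware placement):

```python
import bisect
import hashlib

class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def _token(self, key):
        # 128-bit token; any uniform hash works for the sketch
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node owns many small wedges of the ring,
        # which smooths load and limits how much moves on join.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._token(f"{node}#{i}"), node))

    def lookup(self, key):
        # Walk clockwise to the first token at or after the key's token.
        token = self._token(key)
        idx = bisect.bisect(self.ring, (token,)) % len(self.ring)
        return self.ring[idx][1]
```

Adding a node only reclaims the wedges adjacent to its new tokens; most keys keep their owner, which is the whole point versus mod-N hashing.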

Interview tips

  • Virtual nodes: Without them, one physical node can own an uneven slice of the hash ring—load imbalance shows up in production as one server melting while others nap.
  • Replication lag: If the app does read-after-write on the same key through different replicas, it may see stale values—mitigate with primary reads, sticky sessions, or version checks.
  • Invalidation: “We invalidate on update” sounds easy until many cache keys map to one DB row—pattern invalidation or TTL as safety net.
  • Thundering herd: Popular key expires → thousands of misses hit the DB at once—singleflight, probabilistic early refresh, or mutex per key in app layer.
  • Persistence: RDB/AOF fork latency on large heaps—name it if you mention durability; it is a classic p99 story.
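The thundering-herd bullet above is usually answered with a per-key singleflight guard: concurrent misses for the same key collapse into one origin load. A minimal threaded sketch (the `SingleFlight` class name is mine; error handling is elided):

```python
import threading

class SingleFlight:
    """Collapse concurrent loads of the same key into one call (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done_event, result_box)

    def do(self, key, loader):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # First caller for this key becomes the leader.
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        event, box = entry
        if leader:
            try:
                box["value"] = loader(key)  # single origin/DB call
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # wake all waiters
        else:
            event.wait()  # followers block instead of hitting the DB
        return box.get("value")
```

In an interview, naming this pattern (plus TTL jitter) is usually enough; production versions add timeouts and propagate loader exceptions to waiters.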

Capacity estimation

| Topic | Anchor | Implication |
| --- | --- | --- |
| Working set | 100 GB aggregate (prompt) | Eviction pressure; hot vs cold key skew |
| QPS | 10M ops/s | Per-node ceiling; pipelining; connection limits |
| Value size | ~1 KB average | Network bandwidth; large values need compression or chunking |
| Hot keys | Top 0.01% of keys | Replica reads, application-level sharding, or in-process L2 |

Implications: You shard to spread RAM and CPU; you do not solve hot keys only by “adding nodes” if one logical key dominates.
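The anchors above turn into node counts with a few lines of arithmetic. The per-node RAM and ops/s ceilings here are illustrative assumptions, not from the prompt—state your own in the room:

```python
# Anchors from the table
working_set_gb = 100
qps_total = 10_000_000
value_bytes = 1024

# Assumed per-node ceilings (say these out loud; they drive the answer)
node_ram_gb = 32            # usable cache RAM per node
node_qps_ceiling = 200_000  # conservative single-node ops/s

nodes_for_ram = -(-working_set_gb // node_ram_gb)   # ceil -> 4 nodes
nodes_for_qps = -(-qps_total // node_qps_ceiling)   # ceil -> 50 nodes
nodes = max(nodes_for_ram, nodes_for_qps)           # QPS dominates here

# Aggregate read egress if every op returns ~1 KB
egress_gbps = qps_total * value_bytes * 8 / 1e9     # ~82 Gbps
```

The takeaway to voice: with these assumptions the cluster is QPS-bound, not RAM-bound, so "add nodes for memory" is the wrong first lever.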

High-level architecture

Clients either embed a smart client (knows hash ring) or talk to a proxy (stateless routing). The cluster exposes logical keyspace partitioned into slots or token ranges, each owned by a primary with 1+ replicas. Gossip or a configuration service (etcd-style) publishes membership. Persistence (if any) is async snapshots or append logs—off the critical read path for most designs.

Who owns what:

  • Routing layer — Hash(key) → node; handles MOVED/ASK-style redirects when topology changes (Redis Cluster pattern).
  • Shard primary — Serializes writes per key (common pattern: single-threaded primary); serves reads or forwards to replica depending on consistency mode.
  • Replicas — Async copy from primary; promoted on failover after health checks + epoch bump.
  • Control plane — Add/remove nodes, rebalance throttles, placement of primaries across racks/AZs.
[ App servers ]
       |
       |  GET/SET (sync, sub-ms target)
       v
[ Smart client or Proxy ]  ----membership----► [ Config / gossip ]
       |
       |  consistent hash: key → shard
       v
+------------------+     replication     +-------------------+
| Primary A        | ------------------► | Replica A'        |
| (token range T1) |                     | (async, read opt) |
+------------------+                     +-------------------+
+------------------+                     +-------------------+
| Primary B        | ------------------► | Replica B'        |
| (token range T2) |                     |                   |
+------------------+                     +-------------------+

  Rebalance (async): new node C → migrate token ranges → throttle → HANDOFF

In the room: Walk one GET: hash → node → memory lookup → optional replica. Then one failover: primary dead → replica promoted → clients refresh topology.

Core design approaches

Cache-aside (lazy loading)

App checks cache → on miss, loads DB, sets cache. Simple; invalidation is app’s job when DB updates.

Read-through / write-through

Cache module loads DB on miss (read-through); writes go to cache and DB together (write-through). Stronger consistency; higher write latency.

Write-behind

Writes ack after the cache update; async flush to the DB—fast, but risks data loss if the cache dies before persisting. Rarely the right default in a generic interview unless explicitly scoped.

Pick: Most candidates describe cache-aside for read-heavy workloads; mention write-through when stale reads are unacceptable after writes.
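The cache-aside pick reduces to two small functions. A minimal sketch, assuming generic `get`/`set`/`delete` clients; `DictStore` is a stand-in for real cache and DB connections, not any library's API:

```python
class DictStore:
    """Minimal stand-in for both the cache and the DB in this sketch."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value, ttl=None):
        self.data[key] = value  # TTL ignored in this toy store
    def delete(self, key):
        self.data.pop(key, None)

def cache_aside_get(cache, db, key, ttl=300):
    """Cache-aside read: try cache, fall back to DB, repopulate."""
    value = cache.get(key)
    if value is not None:
        return value
    value = db.get(key)          # origin load on miss
    if value is not None:
        cache.set(key, value, ttl)
    return value

def cache_aside_update(cache, db, key, value):
    """Write path: commit to the DB first, then invalidate (delete, not set)."""
    db.set(key, value)
    cache.delete(key)            # next read repopulates from the fresh DB row
```

Deleting rather than setting on write narrows (but does not eliminate) the stale-overwrite race discussed in the write path below.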

Detailed design

Read path

  1. Client computes shard (or asks proxy).
  2. Primary or replica serves GET if stale read OK.
  3. On miss (cache-aside), app loads DB and SET with TTL.

Write path (cache-aside invalidation)

  1. App commits DB update.
  2. App DELETE cache key(s) or bump version embedded in value.
  3. Next read repopulates. Prefer delete over update-in-place: a concurrent reader can load the old DB value and write it over a newer cache entry—ordering matters.

Atomic ops

INCR, CAS—used for counters and locks. Single primary per key makes this tractable; cross-shard transactions are out of scope for “cache.”
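Why a single primary per key makes atomics tractable: INCR can be built from read-plus-CAS with a retry loop. An illustrative sketch, assuming a version-stamped cell; the lock stands in for the primary's single-threaded command loop:

```python
import threading

class CASCell:
    """One key's value on its primary, with a version for compare-and-swap."""
    def __init__(self, value=0):
        self._lock = threading.Lock()  # stands in for the single-threaded primary
        self.value = value
        self.version = 0

    def read(self):
        with self._lock:
            return self.value, self.version

    def cas(self, expected_version, new_value):
        with self._lock:
            if self.version != expected_version:
                return False           # someone else wrote first; caller retries
            self.value = new_value
            self.version += 1
            return True

def incr(cell, delta=1):
    """INCR built from read + CAS; retries on contention."""
    while True:
        value, version = cell.read()
        if cell.cas(version, value + delta):
            return value + delta
```

Real caches execute INCR as one command on the primary, so no client retry loop is needed; the sketch shows why that serialization is the thing that makes the operation atomic.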

Key challenges

  • Rebalancing: Moving keys without dual ownership bugs (serving wrong value during migration)—handoff protocols and throttled migrations matter.
  • Hot keys: One key maps to one primary thread (or one NIC)—YCSB-style skew burns you in benchmarks and production.
  • Replication lag: “I wrote then read” flakiness—apps must know which replica they hit.
  • Eviction thrash: Memory at max + churning working set → constant eviction + miss storm.
  • Split brain: Two primaries after partition—mitigate with quorum, fencing, epoch—at cost of complexity.
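The split-brain bullet above can be made concrete with an epoch check on the write path: replicas reject writes from any primary carrying an older configuration epoch. A toy sketch of the fencing idea (class and method names are mine), not a full consensus protocol:

```python
class FencedStore:
    """Write guard: reject writes from a demoted (stale-epoch) primary."""

    def __init__(self):
        self.epoch = 0   # highest configuration epoch seen so far
        self.data = {}

    def write(self, key, value, writer_epoch):
        if writer_epoch < self.epoch:
            return False  # fenced out: a newer primary has been elected
        self.epoch = max(self.epoch, writer_epoch)
        self.data[key] = value
        return True
```

The control plane bumps the epoch on every promotion, so a partitioned "zombie" primary that keeps sending writes with the old epoch is harmlessly rejected.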

Scaling the system

  • Horizontal: Add nodes; only adjacent key ranges migrate (consistent hashing).
  • Vertical: Bigger RAM for hot shards—sometimes cheaper than perfect algorithmic fairness.
  • Read scaling: Replica reads for read-heavy, eventually consistent keys; primary for linearizable per-key sequence if required.
  • Multi-region: Active-active cache is hard (conflicting writes)—often read replicas per region with async replication and stale SLAs.

Failure handling

| Failure | User-visible | Mitigation |
| --- | --- | --- |
| Primary down | Errors or failover latency | Health checks, automatic promotion, client retry with backoff |
| Replica lag spike | Stale reads | Route critical reads to primary; monotonic reads per session if needed |
| Network partition | Split brain risk | Quorum; minority partition does not accept writes |
| Node add during peak | Latency during migration | Throttle migration; off-peak rebalance |

Degraded UX: Slightly stale data under replication lag; a full cache outage typically degrades to the DB with timeouts (slower, not wrong).

API design

Caches expose binary protocols (RESP) or thin REST. Interview clarity matters more than opcode trivia.

Core operations:

| Op | Semantics |
| --- | --- |
| GET key | Return value or miss |
| SET key value [EX ttl] | Upsert; optional TTL |
| DEL key | Invalidate |
| INCR key | Atomic counter |
| MGET keys[] | Batch to reduce RTT |

Cluster routing:

| Client view | Behavior |
| --- | --- |
| Hash ring in SDK | Client sends directly to node; handles redirects |
| Proxy | Clients talk to one endpoint; proxy forwards |

Error semantics: A miss is not an HTTP-404-style failure—it is an expected outcome. Apps must distinguish a cache miss from a transport error (connection reset, timeout), because only the latter should trigger retries or fallback.

Request flow diagram (GET):

App --GET key--> Client lib --hash--> Primary shard
                              |
                              hit --> value
                              miss (cache-aside) --> DB --> SET --> value

Production angles

Caches are not “faster databases”—they are coordination surfaces where topology, TTLs, and failure fallbacks interact. Veteran operators worry less about hit ratio in steady state and more about what happens the minute a hot key expires, a cluster reshuffles, or a client still points at a dead primary.

p99 latency jumps right after adding capacity

What it looks like — Rolling add of nodes to “help” load; instead p99 GET latency jumps for hours. Dashboards show rebalance traffic saturating NICs or disk IOPS if persisted; some shards go hot because token ranges shifted under live traffic.

Why it happens — Consistent-hashing migrations move masses of keys at once; replicas catch up slowly; clients may still send to old owners until redirect storms finish. Adding nodes without throttling migration is how you create an outage during a scale-up.

What good teams do — Throttle migration Mbps and keys per second; schedule adds in low-traffic windows when possible; pre-warm replicas before cutting over traffic; verify client library behavior on MOVED/ASK. Alert on rebalance progress and per-shard imbalance, not only cluster CPU.

Database load spikes while cache “healthy”

What it looks like — Miss ratio ticks up briefly; DB connections and CPU spike; cascading timeouts in services that fall back to origin. Redis memory and ops/sec look fine—the pain moved downstream.

Why it happens — Thundering herd on one popular key at TTL expiry; aligned TTLs on the minute boundary; no request coalescing (singleflight). Cache-aside without stampede control is a load generator for the database.

What good teams do — TTL jitter; singleflight per key in the app tier; per-key locks or probabilistic early refresh; stale-while-revalidate where the business allows. Correlate eviction rate with miss ratio and DB QPS—the triangle of pain.
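TTL jitter and probabilistic early refresh fit in a few lines. A sketch in the spirit of the "XFetch" technique; the helper names and the beta default are my assumptions:

```python
import math
import random

def jittered_ttl(base_ttl, jitter_frac=0.1):
    """Spread expiries so keys set together don't all miss in the same second."""
    return base_ttl * (1 + random.uniform(-jitter_frac, jitter_frac))

def should_refresh_early(now, expiry, compute_cost, beta=1.0):
    """XFetch-style early refresh: the closer to expiry and the costlier the
    recompute, the more likely we refresh before the TTL actually fires."""
    # -log(1 - U) is a non-negative random stretch; large compute_cost or
    # beta pushes the effective "expiry horizon" earlier.
    return now - compute_cost * beta * math.log(1.0 - random.random()) >= expiry
```

Each client decides independently, so at most a few callers refresh early instead of thousands missing simultaneously at expiry.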

Inconsistent reads after a deploy or failover

What it looks like — Users see different values for the same key for seconds to minutes; rare writes appear lost then reappear. SRE correlates with client rollout or replica promotion.

Why it happens — Client routes to stale topology; replica not caught up after failover; read-your-writes not guaranteed unless designed. Split-brain windows exist in every distributed system—the question is duration and blast radius.

What good teams do — Versioned cluster maps with short TTL and push updates; health-gated reads on replication lag; monotonic read token where the store supports it; explicit strong-read API for money paths. Document what your cache does not promise.

Backpressure: cache saturates, timeouts trigger DB stampede

What it looks like — GET latency rises; apps time out and hit the DB directly (or worse, each retry spawns new backend work); the DB protects the cache that was supposed to protect the DB.

Why it happens — Hot keys or slow commands on single-threaded Redis; connection pool exhaustion to cache; cascading fallback without bulkheads.

What good teams do — Circuit-break DB fallback when cache is unhealthy—or serve stale with a flag; shed load at the gateway; alert on eviction rate plus miss ratio correlation. Never allow unbounded parallel DB refill for one key.

[ Apps spike GETs ] → [ Cache primaries at CPU max ]
       → latency ↑, timeouts
       → some clients fall back to DB → DB overload → worse cache refill
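The cascade above is what a circuit breaker on the DB fallback interrupts: after repeated failures, callers stop hammering the fallback until a cool-off elapses. A minimal sketch (thresholds and names are illustrative; production breakers add a true half-open probe state):

```python
import time

class CircuitBreaker:
    """Stop hammering the DB fallback when the cache path keeps failing."""

    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None = closed (traffic allowed)

    def allow(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-off elapsed: reset and allow traffic again. (A fuller
            # version would half-open and let a single probe through first.)
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # trip open

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```

While the breaker is open, callers serve stale data with a flag or shed load instead of refilling the DB stampede.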

How to use this in an interview — Lead with hot keys and thundering herd—not “we use Redis.” Close with one mitigation (jitter, singleflight, micro-cache) and what degrades first under load.

Bottlenecks and tradeoffs

  • Memory vs hit ratio: Bigger cache buys fewer DB hits—diminishing returns past a point.
  • Strong consistency vs speed: Cross-shard transactions are the wrong tool—use DB as truth + short TTL if needed.
  • Operational simplicity: Managed Redis vs self-operated cluster—trade money vs on-call surface.

What interviewers expect

  • API surface: get/set, TTL, atomic INCR, optional compare-and-swap—tie each to use case (session, feature flag, rate limit backing store).
  • Partitioning: consistent hashing, virtual nodes, what moves when a node joins (only adjacent token ranges).
  • Replication: primary/replica, async replication lag, failover without split brain (epochs, quorum—high level).
  • Eviction: LRU/LFU approximations; volatile vs allkeys; interaction with TTL.
  • Client routing: smart client vs proxy vs cluster gossip; discovery of topology.
  • Cache-aside vs write-through: who invalidates on DB update; thundering herd after expiry.
  • Failure: node loss, rebalance, hinted handoff; read during migration.
  • CAP in plain English: under partition, prefer available reads with stale data for most cache workloads—say when you would not.

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • How does consistent hashing work when a node is added?
  • What happens to hot keys?
  • How do you handle cache invalidation with a database behind the cache?
  • Strong vs eventual consistency—which do you guarantee for what operations?
  • How is this different from a distributed database?

Deep-dive questions and strong answer outlines

Walk through a GET when keys are sharded across 100 nodes.

Hash key to token; map token to node (and replica); route request—client-side ring or proxy. Mention single-flight or hot key replication if one key dominates a shard.

How do you add a node without moving every key?

Consistent hashing moves only keys in the wedge near the new token range; virtual nodes smooth load. Contrast with naive mod-N hash where all keys move when N changes.
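The contrast with mod-N can be measured directly. This sketch (helper names are mine) counts how many of 10,000 keys change owner when going from 4 to 5 nodes under each scheme:

```python
import bisect
import hashlib

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def mod_n_owner(key, n):
    return h(key) % n

def ring_owner(key, tokens):
    """tokens: sorted list of (token, node_id); wrap past the end of the ring."""
    i = bisect.bisect(tokens, (h(key),)) % len(tokens)
    return tokens[i][1]

keys = [f"user:{i}" for i in range(10_000)]

# mod-N: going from 4 -> 5 nodes remaps ~80% of keys
moved_mod = sum(mod_n_owner(k, 4) != mod_n_owner(k, 5) for k in keys)

# consistent hashing (100 vnodes/node): only the wedges near the
# new node's tokens move, ~1/5 of keys
tokens4 = sorted((h(f"n{n}#{v}"), n) for n in range(4) for v in range(100))
tokens5 = sorted((h(f"n{n}#{v}"), n) for n in range(5) for v in range(100))
moved_ring = sum(ring_owner(k, tokens4) != ring_owner(k, tokens5) for k in keys)
```

Quoting the two numbers ("~80% vs ~20% of keys move") is a crisp way to justify the ring in the room.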

Cache-aside vs write-through?

Cache-aside: app loads DB on miss, fills cache. Write-through: writes go to cache and backing store together—stronger consistency, more write latency. Invalidation: update DB then delete cache key (or version stamp) for common patterns.

What if a primary crashes mid-replication?

Failover to replica with epoch or quorum to avoid split brain; accept lost in-flight writes depending on sync policy. Durability vs latency tradeoff must be explicit.

Production angles

  • Rebalance storm after adding nodes—throttle migrations, use **handoff** to avoid dual-serving wrong values.
  • Memory pressure triggers eviction thrashing—monitor **evicted_keys** and **latency**.
  • Split brain after network glitch—use **fencing tokens** or quorum if you promise linearizable writes (rare for pure cache).


FAQs

Q: Should I design Redis protocol-compatible?

A: Only if it helps you reason about RESP and tooling. Interviewers care about behavior (single-threaded primary per shard, pipelining) not opcode trivia—unless they go deep.

Q: Is strong consistency possible for every key?

A: Expensive (cross-shard transactions, consensus). Most caches are eventually consistent with best-effort invalidation; say what your app tolerates.

Q: When should I pick Memcached over Redis?

A: Memcached is simpler pure cache; Redis adds data structures, persistence options, and Lua. If you only need get/set with TTL at huge scale, Memcached can win on simplicity—Redis wins when you need more than key-value.

Q: How do I talk about CAP?

A: Briefly: under partition, availability of reads vs stale data. Caches usually favor availability and accept staleness; not the place for two-phase commit unless they steer you.
