← Back to practice catalog

System design interview guide

Distributed Cache System Design

TL;DR:

Flash sale starts and one product id owns a Redis shard—CPU looks fine while the database melts. The interview is consistent hashing, eviction, replication lag, and cache-aside invalidation, not memorizing RDB flags.

Problem statement

Distributed in-memory cache: partition, replicate, evict, survive failures.

Introduction

A flash sale starts. One product id is in every cart. Your cache cluster shows healthy CPU—but the database catches fire and checkout fails for everyone.

A distributed cache is fast mutable state spread across machines. The interview is not “Redis vs Memcached trivia.” It is how clients find the right shard, what moves when a node joins, what replicas promise (and do not), and what happens when a hot key or TTL expiry turns the cache into a load generator for the database.

Weak answers say “we add Redis.” Strong answers separate routing (hash ring, virtual nodes) from data plane (GET/SET on primaries) and name cache-aside, invalidation, and thundering herd.

If you remember one thing: A cache protects the DB only if you design routing, eviction, and miss storms—not just hit ratio.

How to approach

Start with one machine: TTL, eviction (LRU/LFU), memory limit.

Add sharding: consistent hashing + virtual nodes so load stays even.

Add replication: async copy, failover, stale read rules.

End with cache-aside: who loads on miss, who deletes on DB update, how you stop herd on expiry.

In the room: “Single-node semantics, then hash ring, then replica lag, then cache-aside invalidation and stampede control.”

If you remember one thing: Order beats feature lists—routing before persistence flags.

Interview tips

Five questions that expose shallow cache answers.

Modulo hashing

You: “We hash the key mod number of servers.”

They ask: “You add one server—what happens to every key?”

Land here: Almost all keys move—cache storm + DB spike. Consistent hashing moves only keys in the wedge near the new token; virtual nodes balance load across physical machines.

Invalidate on update

You: “When the DB updates, we invalidate the cache.”

They ask: “One product row maps to fifty cache keys—how?”

Land here: Delete affected keys (or version stamp in value); pattern invalidation is hard at scale—TTL as safety net; document eventual staleness window.

Read-after-write

You: “We read from any replica for speed.”

They ask: “User saves profile, refreshes, sees old name—why?”

Land here: Replication lag. Critical keys: read primary, sticky routing, or version check in value; say what is OK to be stale (display name) vs not (balance).

Thundering herd

You: “TTL keeps data fresh.”

They ask: “Viral key expires—what hits the database?”

Land here: Thousands of simultaneous misses. Singleflight per key, TTL jitter, probabilistic early refresh, stale-while-revalidate where product allows.

Cache vs database

You: “The cache is our source of truth.”

They ask: “Primary crashes—what did we lose?”

Land here: Cache is ephemeral for most workloads; DB is truth. Persistence (RDB/AOF) trades durability for fork latency on big heaps—name p99 impact if you mention it.

If you remember one thing: Tie every tip to one failure mode—rebalance storm, stale replica, herd, split brain.

Capacity estimation

TopicAnchorImplication
Working set100 GB aggregate (illustrative)Eviction pressure; hot vs cold key skew
QPS10M ops/sPer-node ceiling; pipelining; connection limits
Value size~1 KB averageNetwork bandwidth; large values need compression or chunking
Hot keysTop 0.01% keysReplica reads, application-level sharding, or in-process L2

Implications: You shard to spread RAM and CPU; you do not solve hot keys only by “adding nodes” if one logical key dominates.

If you remember one thing: Hot keys are a logical problem—sharding spreads averages, not one viral id.

High-level architecture

Clients either embed a smart client (knows hash ring) or talk to a proxy (stateless routing). The cluster exposes logical keyspace partitioned into slots or token ranges, each owned by a primary with 1+ replicas. Gossip or a configuration service (etcd-style) publishes membership. Persistence (if any) is async snapshots or append logs—off the critical read path for most designs.

Who owns what:

  • Routing layer — Hash(key) → node; handles MOVED/ASK-style redirects when topology changes (Redis Cluster pattern).
  • Shard primary — Serializes writes per key (common pattern: single-threaded primary); serves reads or forwards to replica depending on consistency mode.
  • Replicas — Async copy from primary; promoted on failover after health checks + epoch bump.
  • Control plane — Add/remove nodes, rebalance throttles, placement of primaries across racks/AZs.
[ App servers ]
       |
       |  GET/SET (sync, sub-ms target)
       v
[ Smart client or Proxy ]  ----membership----► [ Config / gossip ]
       |
       |  consistent hash: key → shard
       v
+------------------+     replication      +------------------+
| Primary A        | ------------------► | Replica A'         |
| (token range T1) |                     | (async, read opt)  |
+------------------+                     +------------------+
+------------------+                     +------------------+
| Primary B        | ----------------►  | Replica B'       |
| (token range T2) |                     |                  |
+------------------+                     +------------------+

  Rebalance (async): new node C → migrate token ranges → throttle → HANDOFF

In the room: Walk one GET: hash → node → memory lookup → optional replica. Then one failover: primary dead → replica promoted → clients refresh topology.

If you remember one thing: GET is hash → primary (or replica with stale rules); rebalance is async and throttled.

Core design approaches

Cache-aside (lazy loading)

App checks cache → on miss, loads DB, sets cache. Simple; invalidation is app’s job when DB updates.

Read-through / write-through

Cache module loads DB on miss (read-through); writes go to cache and DB together (write-through). Stronger consistency; higher write latency.

Write-behind

Writes ack after cache update; async flush to DB—fast but loss risk if cache dies before persist—rare for generic interview unless scoped.

Pick: Most candidates describe cache-aside for read-heavy workloads; mention write-through when stale reads are unacceptable after writes.

If you remember one thing: Cache-aside = app owns invalidation; write-through = stronger consistency, slower writes.

Detailed design

Read path

  1. Client computes shard (or asks proxy).
  2. Primary or replica serves GET if stale read OK.
  3. On miss (cache-aside), app loads DB and SET with TTL.

Write path (cache-aside invalidation)

  1. App commits DB update.
  2. App DELETE cache key(s) or bump version embedded in value.
  3. Next read repopulates—avoid stale by delete vs update carefully (race: read old DB, overwrite new cache—ordering matters).

Atomic ops

INCR, CAS—used for counters and locks. Single primary per key makes this tractable; cross-shard transactions are out of scope for “cache.”

If you remember one thing: On DB update, delete cache keys (or bump version)—order matters to avoid writing stale data back.

Key challenges

  • Rebalancing: Moving keys without dual ownership bugs (serving wrong value during migration)—handoff protocols and throttled migrations matter.
  • Hot keys: One key → one primary thread or one NICYCSB-style skew burns you in benchmarks and production.
  • Replication lag: “I wrote then read” flakiness—apps must know which replica they hit.
  • Eviction thrash: Memory at max + churning working set → constant eviction + miss storm.
  • Split brain: Two primaries after partition—mitigate with quorum, fencing, epoch—at cost of complexity.

If you remember one thing: Rebalance and hot keys hurt more in production than picking LRU vs LFU on day one.

Scaling the system

  • Horizontal: Add nodes; only adjacent key ranges migrate (consistent hashing).
  • Vertical: Bigger RAM for hot shards—sometimes cheaper than perfect algorithmic fairness.
  • Read scaling: Replica reads for read-heavy, eventually consistent keys; primary for linearizable per-key sequence if required.
  • Multi-region: Active-active cache is hard (conflicting writes)—often read replicas per region with async replication and stale SLAs.

If you remember one thing: Replica reads scale throughput; primary wins when stale is unacceptable.

Failure handling

FailureUser-visibleMitigation
Primary downErrors or failover latencyHealth checks, automatic promotion, client retry with backoff
Replica lag spikeStale readsRoute critical reads to primary; monotonic reads per session if needed
Network partitionSplit brain riskQuorum; minority partition does not accept writes
Node add during peakLatency during migrationThrottle migration; off-peak rebalance

Degraded UX: Slightly stale data; outage is unavailable cache—often degrade to DB with timeouts (slower, not wrong).

If you remember one thing: Under partition, pick availability + stale for most cache workloads—unless they scope money paths.

API design

Caches expose binary protocols (RESP) or thin REST. Interview clarity matters more than opcode trivia.

Core operations:

OpSemantics
GET keyReturn value or miss
SET key value [EX ttl]Upsert; optional TTL
DEL keyInvalidate
INCR keyAtomic counter
MGET keys[]Batch to reduce RTT

Cluster routing:

Client viewBehavior
Hash ring in SDKClient sends directly to node; handles redirects
ProxyClients talk to one endpoint; proxy forwards

Error semantics: Miss is not an HTTP 404—it is a cache miss; apps distinguish miss vs error (connection reset).

Request flow diagram (GET):

App --GET key--> Client lib --hash--> Primary shard
                              |
                              hit --> value
                              miss (cache-aside) --> DB --> SET --> value

If you remember one thing: A miss is not HTTP 404—it triggers cache-aside load unless you design read-through.

Production angles

Caches are not “faster databases”—they are coordination surfaces where topology, TTLs, and failure fallbacks interact. Veteran operators worry less about hit ratio in steady state and more about what happens the minute a hot key expires, a cluster reshuffles, or a client still points at a dead primary.

p99 latency jumps right after adding capacity

What users saw — You scaled up to “fix” latency; scroll and API got slower for hours.

Why — Rebalance moved huge key ranges at once; NICs saturated; clients still talked to old owners; replicas lagged.

What good teams doThrottle migration; off-peak adds; pre-warm replicas; alert per-shard imbalance, not average CPU.

Database load spikes while cache “healthy”

What users saw — Brief errors; DB connections maxed; cache dashboards still green.

WhyThundering herd on one key at TTL expiry; aligned TTLs; no singleflight.

What good teams doTTL jitter; singleflight in app tier; stale-while-revalidate; correlate evicted_keys, miss ratio, DB QPS.

Inconsistent reads after failover

What users saw — Two tabs show different values; “lost” write reappears after refresh.

Why — Stale topology in clients; replica not caught up; split-brain window.

What good teams do — Push topology updates; read primary for money paths; epoch on failover; document no read-your-writes unless built.

Cache slow → DB stampede

What users saw — Timeouts everywhere; DB overload while team blames “cache cluster up.”

Why — Hot key on single-threaded primary; pool exhaustion; unbounded parallel DB refill on miss.

What good teams do — Circuit-break DB fallback; bulkhead; cap concurrent refill per key; never N identical DB queries for one expiry.

[ Apps spike GETs ] → [ Cache primaries at CPU max ]
       → latency ↑, timeouts
       → some clients fall back to DB → DB overload → worse cache refill

How to use this in an interview — Lead with hot key or herd on expiry. Name singleflight or jitter and what fails first: cache p99 then DB.

Bottlenecks and tradeoffs

Memory vs hit ratio

The tension — Bigger cluster means fewer DB hits; RAM is not free.

What breaks — Diminishing returns; eviction thrash when working set exceeds RAM.

What teams do — Track evicted_keys and tail latency; tier hot data; right-size per shard.

Say in the interview — Cache size is a cost decision with an SLO, not “as big as possible.”

Strong consistency vs cache speed

The tension — Apps want read-your-writes; caches favor speed and partition tolerance.

What breaks — Cross-shard transactions on a cache are the wrong tool.

What teams do — DB as truth; delete on write; short TTL; primary read for critical keys.

Say in the interview — Most caches are eventually stale—say what your app tolerates.

Rebalance vs availability

The tension — Adding nodes should help capacity; migrating keys risks wrong values and latency.

What breaks — Dual ownership during move; redirect storms.

What teams doHandoff protocol; throttled migration; hinted handoff for reads during move.

Say in the interview — Adding a node is an ops event, not a free horizontal scale button.

If you remember one thing: Caches fail open to the DB in the worst way unless you engineer miss behavior.

What should stick

You do not need every Redis opcode. You should leave knowing:

  1. Consistent hashing + virtual nodes — Adding a server moves a wedge of keys, not the whole keyspace.
  2. Cache-aside — App loads on miss; delete (or version) on DB update; TTL as safety net.
  3. Replication lag — Replicas scale reads; primary when stale is unacceptable.
  4. Thundering herd — One expiry → many DB hits; jitter + singleflight are mandatory patterns.
  5. Hot keys — One logical key can saturate one shard; replication or app-level sharding of the key.

Tell it in the room: “Client hashes key to primary via consistent hashing with vnodes. GET hits memory or miss → DB → SET with TTL. On DB write, DELETE cache key. Replica for read-heavy stale-OK data; primary for read-after-write. Viral key: singleflight on miss, TTL jitter. Node add: throttled rebalance with handoff, not big-bang migration.”

Reference diagram

High-level diagram for Distributed Cache System Design

What interviewers expect

Consistent hashing, cache-aside vs write-through, eviction policy, failover, hot key mitigation.

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • Consistent hashing add node?
  • Hot keys?
  • Invalidation?
  • Strong consistency?
  • vs Redis?

Deep-dive questions and strong answer outlines

GET path sharded?

Hash key → token → node; client ring or proxy; replica read optional; single-flight on miss.

Add node?

Consistent hashing moves wedge keys only; virtual nodes balance; rebalance in background.

Cache-aside?

App loads DB on miss; write DB then delete cache key; thundering herd use lock or early refresh.

Primary crash?

Failover replica with epoch/quorum; accept lost in-flight per sync policy.

Hot key?

Replicate hot key to multiple nodes; local in-process cache; split key sub-ranges.

AI feedback on your design

After a practice session, InterviewCrafted summarizes strengths, gaps, and interviewer-style expectations—similar to a written debrief. See a static example report, then practice this problem to get feedback on your own answer.

FAQs

Q: Redis protocol?

A: Behavior matters more than opcode trivia.

Q: Strong everywhere?

A: Usually eventual; per-op quorum if pushed.

Q: Memcached vs Redis?

A: Pure cache vs data structures + persistence options.

Q: CAP?

A: Caches favor availability + stale reads under partition.

Practice interactively

Open the practice session to use the canvas and stages, then review AI feedback.