Caching Strategies (Backend Fundamentals)

Learn caching basics: read-through vs write-through, TTLs, invalidation, and when caches help or hurt.

January 23, 2025 · 32 min read

Caching Strategies

Why Engineers Care About This

Caching is the art of remembering expensive computations so you don't have to do them again. When done right, caching makes systems fast. When done wrong, caching makes systems inconsistent, complex, and hard to debug. Many performance problems can be solved with caching, but most caching problems come from not understanding when and how to cache.

When your API is slow, or your database is overloaded, or users see stale data, you're hitting caching problems (or missing caching opportunities). These problems compound. A slow API gets more requests (users retry), which makes it slower, which gets more requests. Caching breaks this cycle by serving responses from memory instead of recomputing them. But caching adds complexity—you must decide what to cache, when to invalidate, and how to handle cache misses.

In interviews, when someone asks "How would you improve this system's performance?", they're really asking: "Do you understand caching? Do you know when to cache, what to cache, and how to invalidate caches?" Most engineers don't. They add caching without understanding trade-offs, or avoid caching because it's "too complex," missing performance gains.

For a full distributed-cache interview walkthrough—sharding, eviction, hot keys, and replication—see the Distributed Cache system design guide. For a production stampede story, see The Cache Stampede That Took Down Our API.

Core Intuitions You Must Build

Caching is about trade-offs, not free performance. Caching improves performance but adds complexity. You must decide what to cache (frequently accessed data), when to invalidate (when data changes), and how to handle cache misses (recompute or return stale data). Also, caches use memory, which costs money. Don't cache everything—cache the things that provide the most benefit (hot data, expensive computations).
Cache invalidation is the hard part. Deciding what to cache is easy. Deciding when to invalidate is hard. If you invalidate too early, you discard entries that were still valid and pay for needless recomputation. If you invalidate too late, users see stale data. Common strategies: time-based expiration (TTL), event-based invalidation (invalidate when data changes), and version-based invalidation (include version in cache key). Each has trade-offs. Time-based is simple but can serve stale data. Event-based is accurate but requires tracking dependencies. Version-based avoids explicit deletes but leaves old entries in memory until they expire or are evicted.
Different cache layers solve different problems. The main layers are the browser cache (static assets), CDN cache (global content), application cache (computed results), and database cache (query results). Each layer differs in size, latency, and how it is invalidated. Use them together: browser cache for static assets (long TTL), CDN for global content (medium TTL), application cache for computed results (short TTL or event-based invalidation).
Cache consistency is about guarantees, not perfection. Perfect cache consistency (cache always matches source) is expensive and often unnecessary. Eventual consistency (cache will match source eventually) is often sufficient. For read-heavy workloads, serving slightly stale data (seconds old) is acceptable if it improves performance. For write-heavy workloads, you might need stronger consistency (invalidate immediately on writes). Understand your consistency requirements and choose caching strategies accordingly.
Read-through, cache-aside, and write-through are different contracts. Cache-aside puts the application in charge of reads and writes to the cache. Read-through delegates cache population to the cache layer on a miss. Write-through updates cache and source together on writes. Write-behind writes to cache first and source asynchronously. The pattern you pick changes who owns consistency and what happens on failure.
Cache keys matter more than you think. Cache keys determine what gets cached and when it gets invalidated. Good cache keys are specific (include all parameters that affect the result) but not too specific (avoid cache fragmentation). Include user ID in keys for user-specific data, but don't include timestamps unless necessary (a timestamp makes nearly every key unique, so entries are never reused). Also, cache keys should be deterministic (same input = same key) so you can predict cache hits.
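
The key rules above (specific, deterministic, user-scoped only when needed) can be sketched as a small helper. This is a minimal illustration, not a prescribed scheme: the name `make_cache_key`, the `namespace:scope:digest` layout, and the 16-character digest are all assumptions made for the example.

```python
import hashlib
import json
from typing import Optional


def make_cache_key(namespace: str, user_id: Optional[str] = None, **params) -> str:
    """Build a deterministic cache key (hypothetical helper for illustration)."""
    # Serialize parameters in sorted order so argument order cannot change the key.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    # Scope by user only for user-specific data; shared data uses one common scope.
    scope = f"user:{user_id}" if user_id is not None else "shared"
    return f"{namespace}:{scope}:{digest}"


# Same inputs in a different order produce the same key.
key_a = make_cache_key("search", q="shoes", page=2)
key_b = make_cache_key("search", page=2, q="shoes")
```

Note that nothing time-dependent goes into the key. If a request timestamp were passed as a parameter, every call would hash to a different key and the cache would never be hit.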

Pattern comparison at a glance

Five read/write patterns differ mainly in who populates the cache, write latency, and consistency:

| Pattern | Who fills cache on miss? | Write path | Consistency | Pick when |
| --- | --- | --- | --- | --- |
| Cache-aside | Application | App writes DB; cache updated separately | Eventual unless you invalidate | Default for most APIs; you control logic |
| Read-through | Cache layer | Same as cache-aside unless paired with write-through | Cache layer can enforce single-flight | You want simpler app code on reads |
| Write-through | Application or cache | Cache + DB updated together | Stronger on writes | Sessions, inventory counts that must not drift |
| Write-behind | Application | Cache first; DB async | Fastest writes; risk of loss | Analytics buffers, non-critical counters |
| Refresh-ahead | Background job | Proactive refresh before TTL | Hides expiry spikes | Hot keys with predictable access |

[Figure: comparison of cache-aside, read-through, write-through, write-behind, and refresh-ahead caching patterns]

Multi-layer request path

A typical read hits several caches before the origin. Each layer has different TTL and invalidation rules:

[ Browser cache ] → [ CDN edge ] → [ API / app cache (Redis) ] → [ Database ]
       ↑                  ↑                    ↑
  Cache-Control      purge / TTL          invalidation / TTL
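
The browser and CDN layers in the path above are steered mostly through the `Cache-Control` response header. The sketch below shows one way an application might choose that header per content class. The path prefixes (`/static/`, `/api/`) and every TTL value are assumptions for illustration, not recommendations.

```python
def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header per content class (paths and TTLs are assumptions)."""
    if path.startswith("/static/"):
        # Fingerprinted assets: browsers and the CDN may keep them for a year.
        return "public, max-age=31536000, immutable"
    if path.startswith("/api/"):
        # Dynamic responses: keep them out of browser and CDN caches;
        # the application cache (for example Redis) handles these instead.
        return "no-store"
    # Everything else: short browser TTL, longer shared-cache (CDN) TTL via s-maxage.
    return "public, max-age=60, s-maxage=300"
```

The design choice is that each layer gets its own lifetime: `max-age` applies to browsers, `s-maxage` overrides it for shared caches such as a CDN, and dynamic data never leaves the layer where the application can invalidate it directly.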

Subtopics (Taught Through Real Scenarios)

Cache Invalidation Strategies

What people usually get wrong:

Engineers often think "just set a TTL and forget about it." But TTL-based invalidation can serve stale data (the data changes but the entry hasn't expired) or throw away good entries (the data hasn't changed but the entry expires and must be refetched). Event-based invalidation (invalidate when data changes) is more accurate but requires tracking dependencies. Version-based invalidation (include version in cache key) is simple and effective: when data changes, the version changes, and old cache entries are simply never read again.
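
Version-based invalidation is easiest to see in code. The following is a minimal sketch under stated assumptions: the in-memory `cache` and `versions` dictionaries stand in for a shared store, and `profile_key`, `read_profile`, and `write_profile` are hypothetical names.

```python
from typing import Callable, Dict

cache: Dict[str, dict] = {}      # stand-in for a shared cache such as Redis
versions: Dict[str, int] = {}    # current version number per profile


def profile_key(user_id: str) -> str:
    # The version is part of the key, so bumping it makes old entries unreachable.
    return f"profile:{user_id}:v{versions.get(user_id, 0)}"


def read_profile(user_id: str, load: Callable[[str], dict]) -> dict:
    key = profile_key(user_id)
    if key not in cache:
        cache[key] = load(user_id)  # miss: fetch from the source and store
    return cache[key]


def write_profile(user_id: str, profile: dict, store: Callable[[str, dict], None]) -> None:
    store(user_id, profile)                           # write the source of truth first
    versions[user_id] = versions.get(user_id, 0) + 1  # then bump the version
```

No delete is issued on write. The old `v0` entry stays in the cache until it is evicted or expires, which is the memory cost this strategy trades for simplicity.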

How this breaks systems in the real world:

An API cached user profiles with a 1-hour TTL. A user updated their profile. For up to 1 hour, API responses showed the old profile (cache hadn't expired). Users complained about stale data. The fix? Use event-based invalidation—when a user updates their profile, invalidate their cache entry immediately. For catalog data that rarely changes, TTL is fine; for user-edited fields, TTL alone is a bug. But the real lesson is: invalidation strategy must match how often data actually changes—not how simple the code is.
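
The profile fix described above (invalidate on write, keep the TTL as a backstop) might look roughly like this. It is a sketch only: `TTLCache` is a tiny in-memory stand-in for a real cache store, and `get_profile` and `update_profile` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory TTL cache (illustrative stand-in for a real cache store)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self._clock() + self._ttl, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def get_profile(db: dict, cache: TTLCache, user_id: str) -> dict:
    key = f"profile:{user_id}"
    profile = cache.get(key)
    if profile is None:
        profile = db[user_id]
        cache.set(key, profile)
    return profile


def update_profile(db: dict, cache: TTLCache, user_id: str, profile: dict) -> None:
    db[user_id] = profile
    # Event-based invalidation: drop the entry on write; the TTL remains a backstop.
    cache.delete(f"profile:{user_id}")
```

With only the TTL, a write that bypasses `update_profile` stays invisible for up to the full hour. Routing every write through the invalidating path is what removes the stale window.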

What interviewers are really listening for:

They want to hear you talk about different invalidation strategies (TTL, event-based, version-based) and their trade-offs. Junior engineers say "just set a TTL." Senior engineers say "TTL is simple but can serve stale data, event-based is accurate but requires dependency tracking, version-based is simple and effective when you can bump a version on write." They're testing whether you understand that cache invalidation is a trade-off between simplicity and accuracy.

Read-Through vs Cache-Aside vs Write-Through

What people usually get wrong:

Engineers treat all caching as cache-aside—app checks Redis, on miss queries the database, then sets Redis. Read-through moves miss handling into the cache client or library: the app asks the cache; the cache fetches from DB on miss. Write-through updates cache and database in one logical write. Teams pick cache-aside by default even when read-through would centralize stampede protection, or write-through when inventory must never show stale stock.
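
The difference in who owns the miss path can be shown side by side. This is a minimal sketch: plain dictionaries stand in for the cache and the database, and `get_cache_aside` and `ReadThroughCache` are hypothetical names.

```python
from typing import Any, Callable, Dict


def get_cache_aside(cache: Dict[str, Any], db: Dict[str, Any], key: str) -> Any:
    """Cache-aside: the application owns the miss path."""
    value = cache.get(key)
    if value is None:
        value = db[key]      # the app queries the source on a miss
        cache[key] = value   # the app populates the cache itself
    return value


class ReadThroughCache:
    """Read-through: the cache layer owns the miss path via a loader."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self._data:
            # The caller never touches the source; the cache fetches on its behalf.
            self._data[key] = self._loader(key)
        return self._data[key]
```

Because every miss in the read-through version goes through one `get` method, that method is the natural place to add stampede protection once, instead of repeating it at each call site.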

How this breaks systems in the real world:

A checkout service used cache-aside for product inventory. Two concurrent purchases both missed cache after a flash sale key expired. Both read "5 left" from the database and both sold the last units—oversell. The fix was write-through on inventory updates (cache and DB updated together) plus short TTL as a backstop. For read-heavy product descriptions, cache-aside with event invalidation stayed fine. But the real lesson is: pattern choice follows consistency requirements—inventory is not the same problem as a blog post.
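
The two write patterns contrasted in this section can be sketched as follows. Dictionaries stand in for the cache and the database, and the class names are hypothetical. The sketch shows only how cache and source stay aligned; it does not add the atomic stock decrement that preventing oversell also requires.

```python
from collections import deque
from typing import Any, Deque, Dict, Tuple


class WriteThroughStore:
    """Write-through: source and cache are updated in the same logical write."""

    def __init__(self, db: Dict[str, Any]):
        self.db = db
        self.cache: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.db[key] = value     # source of truth first; a failure here leaves the cache untouched
        self.cache[key] = value  # then the cache, before the write is acknowledged


class WriteBehindStore:
    """Write-behind: cache first, source later; pending writes can be lost."""

    def __init__(self, db: Dict[str, Any]):
        self.db = db
        self.cache: Dict[str, Any] = {}
        self.pending: Deque[Tuple[str, Any]] = deque()

    def write(self, key: str, value: Any) -> None:
        self.cache[key] = value
        self.pending.append((key, value))  # acknowledged before the source sees it

    def flush(self) -> None:
        # In a real system this would run asynchronously in the background.
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
```

Anything still in `pending` when the process dies is gone, which is why write-behind suits counters and analytics buffers rather than inventory.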

What interviewers are really listening for:

They want you to name who owns the miss path and the write path. Junior engineers say "just use cache-aside." Senior engineers say "cache-aside is flexible but the app must handle races; read-through simplifies reads; write-through costs write latency but keeps cache and DB aligned—pick per entity, not per service."

Multi-Layer Caching

What people usually get wrong:

Engineers often think "one cache is enough." But different cache layers solve different problems. Browser cache (for static assets) has high hit rates but limited control. CDN cache (for global content) reduces latency but requires cache invalidation across edge locations. Application cache (for computed results) is fast but uses server memory. Database cache (for query results) reduces database load but requires careful invalidation. Use multiple layers—each layer caches what it's good at caching.

How this breaks systems in the real world:

A service used only application-level caching (Redis). This worked for frequently accessed data, but static assets (images, CSS, JS) weren't cached, causing unnecessary requests. Also, the service had global users, but all cache lookups went to a single Redis instance, adding latency for distant users. The fix? Add CDN caching for static assets (long TTL, global distribution) and browser caching (Cache-Control headers). Purge CDN on deploy; keep Redis for dynamic API responses. But the real lesson is: different cache layers solve different problems—push static content to the edge, keep dynamic data near the app.

What interviewers are really listening for:

They want to hear you talk about different cache layers (browser, CDN, application, database) and when to use each. Junior engineers say "just use Redis." Senior engineers say "use browser cache for static assets, CDN for global content, application cache for computed results—each layer has different TTL and invalidation semantics." They're testing whether you understand that caching is about multiple layers, not just one cache.

Cache Stampede and Thundering Herd

What people usually get wrong:

Engineers assume cache misses are rare and cheap. When a hot key expires—or someone flushes Redis—all concurrent requests miss together and hammer the origin. That is a cache stampede (many keys expire) or thundering herd (many requests refresh the same key). Adding Redis without single-flight locking, probabilistic early expiration, or cache warming is borrowing performance until the first mass expiry.
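
Probabilistic early expiration, mentioned above, lets a request occasionally refresh a hot key shortly before it expires, so that not every request hits the expiry at the same instant. The sketch below follows the commonly described "XFetch" style rule. The function name, parameter names, and the default `beta` are assumptions for illustration.

```python
import math
import random
import time
from typing import Callable, Optional


def should_refresh_early(
    expiry: float,
    recompute_seconds: float,
    beta: float = 1.0,
    now: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True if this request should rebuild the entry before it expires."""
    current = time.time() if now is None else now
    # 1 - rng() lies in (0, 1], so the logarithm is always defined.
    # -log(u) is a non-negative random head start, scaled by the rebuild cost.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng())
    return current + head_start >= expiry
```

The closer the entry is to expiry and the more expensive the rebuild, the more likely a given request refreshes it. A larger `beta` makes early refreshes more frequent.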

How this breaks systems in the real world:

After an accidental production cache flush, every profile request missed cache at once. Ten thousand requests in a minute each opened a database connection; the pool hit its limit and the API returned timeouts. Retries amplified load. Recovery required warming hot keys and adding distributed locks so only one request per key repopulates the cache. See the full timeline in The Cache Stampede That Took Down Our API. But the real lesson is: design for mass miss events—locking, staggered TTL jitter, and warming are not optional on hot paths.
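
Staggered TTL jitter, one of the mitigations named above, is a small change. Here is a minimal sketch, assuming a hypothetical helper `jittered_ttl` and a spread of plus or minus 10 percent.

```python
import random


def jittered_ttl(base_seconds: float, spread: float = 0.1, rng=random) -> float:
    """Return a TTL randomized within +/- spread of the base value."""
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    low = base_seconds * (1.0 - spread)
    high = base_seconds * (1.0 + spread)
    # Keys written at the same moment now expire at different moments.
    return rng.uniform(low, high)
```

Keys warmed together after a flush would otherwise all expire together one TTL later and repeat the incident. Spreading the expiry times turns one large spike into a trickle of misses.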

What interviewers are really listening for:

They want stampede mitigations by name: per-key lock or singleflight, probabilistic early expiration, stale-while-revalidate, cache warming after deploy. Junior engineers say "we'll scale the database." Senior engineers say "only one worker should rebuild a hot key; everyone else waits or serves stale briefly."
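
The "only one worker should rebuild a hot key" idea can be sketched in-process with threads. This is an illustration under stated assumptions: `SingleFlight` is a hypothetical class, it coordinates threads within one process only, and a multi-instance service would need a distributed lock in the cache store instead.

```python
import threading
from typing import Any, Callable, Dict


class SingleFlight:
    """Collapse concurrent loads of the same key into one call (in-process only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, dict] = {}

    def do(self, key: str, fn: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = {"done": threading.Event(), "value": None, "error": None}
                self._in_flight[key] = call
        if leader:
            try:
                call["value"] = fn()  # only the leader hits the origin
            except Exception as exc:
                call["error"] = exc   # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call["done"].set()
        else:
            call["done"].wait()       # followers wait for the leader's result
        if call["error"] is not None:
            raise call["error"]
        return call["value"]
```

A caller would wrap its miss path, for example `sf.do("profile:42", lambda: load_profile(42))`, where `load_profile` is a placeholder for the real origin query. Requests that arrive after the leader has finished start a new flight, so the rebuilt value should be written to the cache inside `fn`.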

[Figure: cache stampede mitigation with single-flight lock versus unprotected thundering herd to the database]

When the Cache Store Is Unavailable

What people usually get wrong:

Teams treat Redis as always available and never decide fail-open (skip cache, hit origin) vs fail-closed (reject or serve stale only). They also block indefinitely on cache calls instead of using short timeouts. A blip in Redis can take down an API that was healthy at the database layer.

How this breaks systems in the real world:

Redis latency spiked during a failover. API threads blocked on GET calls; p99 jumped even though PostgreSQL was fine. The team set a 50ms cache timeout and fail-open to the database for read paths, with circuit breakers to cap origin load. Write paths that depended on Redis for session storage failed closed with a clear 503. But the real lesson is: the cache is a dependency with its own failure mode—state the degrade path per route tier.
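
A fail-open read path with a cap on origin load might look roughly like the sketch below. Several things here are assumptions: `cache_get` is a hypothetical callable that raises `TimeoutError` once a short client-side timeout elapses, and a simple concurrency cap (a semaphore) is swapped in for the circuit breaker the team used, because it is shorter to show.

```python
import threading
from typing import Any, Callable, Optional


class OriginOverloaded(Exception):
    """Raised when too many fallback reads are already hitting the origin."""


class FailOpenReader:
    def __init__(
        self,
        cache_get: Callable[[str], Optional[Any]],
        db_get: Callable[[str], Any],
        max_origin_concurrency: int = 2,
    ) -> None:
        self._cache_get = cache_get
        self._db_get = db_get
        self._origin_slots = threading.BoundedSemaphore(max_origin_concurrency)

    def get(self, key: str) -> Any:
        try:
            value = self._cache_get(key)  # assumed to time out quickly on its own
            if value is not None:
                return value
        except (TimeoutError, ConnectionError):
            pass  # fail open: treat a cache failure like a miss
        # Cap concurrent origin reads so the database is not the next casualty.
        if not self._origin_slots.acquire(blocking=False):
            raise OriginOverloaded(key)
        try:
            return self._db_get(key)
        finally:
            self._origin_slots.release()
```

A route that must fail closed would skip the fallback and surface the cache error directly, for example as a 503. The point of the sketch is that the degrade path is an explicit decision in code rather than whatever the client library happens to do.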

What interviewers are really listening for:

You name fail-open vs fail-closed per data class—catalog reads might degrade to DB with a breaker; payments might reject. Junior engineers say "Redis handles it." Senior engineers say what happens in the first thirty seconds of an outage and how you prevent the database from becoming the next casualty.

When Caching Hurts More Than It Helps

What people usually get wrong:

Engineers cache everything—including low-hit-rate data, large payloads, and rapidly changing values. They ignore memory cost, eviction churn, and debugging pain. Sometimes the right answer is no cache, a shorter query, or a materialized view instead of another Redis keyspace.

How this breaks systems in the real world:

A team cached every search result page with a unique key per user, timestamp, and sort order. Hit rate stayed under 20%; Redis memory grew until eviction thrashed and latency rose. Removing the cache and adding a database index on the filter columns cut p95 in half. Caching worked again only for the top ten static category pages. But the real lesson is: low hit rate means the cache is expensive noise—measure hit rate and origin load before adding keys.
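
Measuring hit rate, as the lesson above recommends, needs only two counters. This is a minimal sketch with a hypothetical `InstrumentedCache` wrapper around an in-memory dictionary; a real deployment would more likely read the equivalent counters from the cache server's own statistics.

```python
from typing import Any, Dict, Optional


class InstrumentedCache:
    """In-memory cache that counts hits and misses (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        # Report 0.0 before any lookups instead of dividing by zero.
        return self.hits / total if total else 0.0
```

Tracking the rate per key prefix (search pages versus category pages, for instance) shows which keyspaces earn their memory and which should be removed.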

What interviewers are really listening for:

They want you to say when not to cache: write-heavy data, unique keys per request, strong consistency requirements without invalidation discipline, or payloads cheaper to recompute than serialize. Senior engineers mention hit rate, memory ceiling, and operability—not just "caching makes it faster."

Interview questions to practice

Design a product page cache across CDN and Redis—what is cached where, and how do you invalidate on a price change?
You picked cache-aside—what happens when a hot key expires during a flash sale, and how do you prevent a stampede?
Redis is unreachable for thirty seconds—fail open to the database or fail closed, and how do you protect the origin?
Walk me through read-through vs write-through for inventory—where can oversell still happen?
Hit rate is 15% after adding a cache—what do you check before scaling Redis?
One product id owns 80% of traffic—how does your cache key design and warming plan change?

FAQs

Q: What's the difference between cache-aside and read-through?

A: In cache-aside, the application checks the cache and populates it on a miss. In read-through, the cache layer fetches from the source on a miss—the application only talks to the cache. Read-through can centralize single-flight locking; cache-aside gives the app more control.

Q: When should I use write-through vs write-behind?

A: Write-through updates cache and database together—stronger consistency, slower writes. Write-behind writes to cache first and database asynchronously—faster writes, risk of data loss if the cache fails before flush. Use write-through when stale writes are unacceptable (inventory, balances); write-behind only when loss or reordering is tolerable.

Q: TTL or event-based invalidation—which do I pick?

A: TTL when staleness is acceptable and data changes slowly. Event-based when users must see updates immediately (profiles, prices). Version keys are a middle ground—bump version on write, no need to delete old keys eagerly.

Q: What's the difference between cache stampede and thundering herd?

A: Stampede usually means many keys expire together (or a flush) and the origin is overwhelmed. Thundering herd often means many concurrent requests miss the same hot key and all try to rebuild it. Both are fixed with locking, staggered expiry, warming, or serving stale briefly.

Q: Should I cache if Redis might go down?

A: Yes, but define degrade behavior first—fail-open to DB with timeouts and circuit breakers for read-heavy routes, or fail-closed when correctness requires the cache layer. The cache is a dependency, not a free sidecar.

Q: Where do I go for a full distributed cache system design answer?

A: This topic covers fundamentals and trade-offs. For sharding, eviction, hot keys, and cluster topology, use the Distributed Cache system design guide and the practice problem.

Key Takeaways

Invalidate for how data changes — TTL for slow-moving catalog; events or version keys for user-visible freshness

Pattern per entity — cache-aside for flexible reads; write-through when writes and cache must stay aligned

Layer by latency budget — browser and CDN for static; Redis for dynamic; don't ship assets from origin every time

Mass miss is a design case — locking, TTL jitter, warming, and stale-while-revalidate belong in the first sketch

Cache outage is your problem — fail-open vs fail-closed is a per-route product call, not an infrastructure default

Hit rate tells the truth — low hit rate means remove keys or fix key design before buying more memory

Link to the incident story — stampedes and pool exhaustion are interview favorites; know mitigations by name
