System design pattern
Feed / Timeline
Design a low-latency, personalized timeline that remains stable under celebrity spikes, ranking pressure, and failure recovery.
How to Recognize This Pattern
- You are asked to serve a continuously updating home surface where users expect both freshness and relevance, not just chronological ordering.
- The traffic shape is asymmetric: reads are massive and steady, writes are bursty and can spike dramatically for a tiny subset of producers.
- The interviewer hints at social graph fan-out (followers/following), which signals that naive query-time joins will not survive tail latency SLOs.
- You hear pressure words like 'instant', 'personalized', 'at scale', 'real-time', or 'millions of users'—that is timeline-pattern language.
- The challenge is not storing posts; the challenge is deciding where to pay the compute bill: write path, read path, or a tiered hybrid.
- If product asks for ranking quality, ad insertion, and freshness together, you are in a multi-objective decision problem, not a CRUD problem.
Approach (Step-by-step)
This is where senior candidates show decision quality, not just component naming.
1. Start with explicit service goals: p95 read latency target, freshness budget (how stale is acceptable), and feed correctness guarantees during partial failures.
2. Model workload tiers: normal producers, high-follower producers, and celebrity producers. A single global strategy is usually wrong.
3. Choose a baseline fan-out model and state why. Then define a switch condition for hybridization (for example, a follower count threshold or a queue lag threshold).
4. Design the write path in detail: post creation API, durable write, event publication, fan-out workers, idempotency keys, and replay-safe retry semantics.
5. Design the read path in detail: timeline candidate fetch, lightweight online ranking, dedup/filter, cache policy, and stable cursor-based pagination.
6. Define data contracts between components: timeline entry schema, ranking feature freshness policy, and deletion/edit propagation guarantees.
7. Handle failure and surge scenarios: queue backlog, cache stampedes, ranking timeouts, and a regional degradation strategy that still preserves user trust.
8. Close with an operations posture: SLO dashboards, alarm thresholds, kill switches for expensive ranking stages, and emergency fallback feed modes.
Key Trade-offs
Think of this as decision math: where does the load move, what fails first, and which user experience are you willing to protect?
Fan-out on write
Simple meaning: you do extra work when someone posts so readers get fast timelines later. Logic: this is great when reads are huge, but one celebrity post can flood your write workers.
Fan-out on read
Simple meaning: posting is cheap, and you build the feed only when a user opens it. Logic: this protects write path during spikes, but user request latency and ranking CPU cost go up.
Decision lens: Use a hybrid because traffic is mixed: precompute for normal accounts, but switch celebrity accounts to read-time assembly so one giant account cannot overload the entire pipeline.
Strict ordering
Simple meaning: everyone sees exactly one global post order. Logic: this is clean for reasoning, but distributed coordination cost is high and slows the system at scale.
Eventual ordering
Simple meaning: users may briefly see small order differences, then feeds converge. Logic: you gain throughput and resilience because services coordinate less on every write.
Decision lens: Default to eventual ordering for timeline speed; keep strict ordering only where correctness risk is high (for example compliance, financial events, or moderation audit paths).
Heavy online ranking
Simple meaning: you calculate a lot of ranking logic during each user request. Logic: quality is high, but p95 latency and cost become unstable during peak traffic.
Precomputed ranking features
Simple meaning: you compute most ranking signals before request time. Logic: latency and cost become predictable, but ultra-fresh intent signals are weaker.
Decision lens: Use a two-layer ranking model: heavy features offline + lightweight online rerank. This keeps response fast while still reacting to the latest user context.
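The two-layer split can be sketched as a cheap linear rerank applied on top of a precomputed score. This is an illustrative sketch only: the feature names, weights, and half-life below are assumptions, not values from this guide.

```python
from dataclasses import dataclass
import time

@dataclass
class Candidate:
    post_id: str
    offline_score: float   # heavy features, computed offline (minutes old)
    posted_at: float       # unix seconds

def online_rerank(candidates, now=None, recency_half_life_s=3600.0):
    """Lightweight online rerank: blend the offline score with a fresh recency signal."""
    now = now or time.time()
    def score(c):
        age = max(now - c.posted_at, 0.0)
        recency = 0.5 ** (age / recency_half_life_s)  # halves every hour
        return 0.7 * c.offline_score + 0.3 * recency  # illustrative weights
    return sorted(candidates, key=score, reverse=True)
```

The point of the structure is that the per-request work is a sort over a few hundred candidates, while the expensive feature computation happened before the request arrived.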
Scale Realism (Numbers That Matter)
- Follower distribution: In practice, follower graphs are power-law. A realistic split is ~90-93% long-tail creators (<10k), ~6-8% mid-tier (10k-100k), and ~1% heavy hitters (>100k). This tiny heavy-hitter segment drives most write-path spikes.
- Traffic profile: A realistic peak profile can be ~250k feed reads/sec and ~15k-20k writes/sec globally. During live events, one 10M-follower post can create ~10M fan-out intents within seconds, which will overwhelm naive write fan-out.
- Latency target: For user trust, target p95 < 200ms and p99 < 350ms on warm read paths. Under safe degradation, allow brief p95 drift toward ~320-380ms, but keep behavior deterministic and observable.
- Failure envelope: If queue lag crosses ~15-20s, cache miss ratio jumps >3x baseline, or ranking timeout exceeds ~2-3%, switch to safe mode: lighter ranking, smaller candidate sets, and stricter cache preference.
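The failure envelope above can be made mechanical. A minimal sketch, assuming your metrics pipeline supplies snapshot values (real deployments would evaluate these over sustained windows, not single samples):

```python
def should_enter_safe_mode(queue_lag_s: float,
                           cache_miss_ratio: float,
                           baseline_miss_ratio: float,
                           ranking_timeout_ratio: float) -> bool:
    """Return True when any failure-envelope signal is breached."""
    return (
        queue_lag_s > 15.0                               # fan-out lag budget
        or cache_miss_ratio > 3.0 * baseline_miss_ratio  # miss ratio jumps >3x
        or ranking_timeout_ratio > 0.02                  # ranking timeouts >2%
    )
```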
Hybrid Switching Rules (Operational Logic)
These rules make hybrid strategy measurable and observable.
- Follower threshold rule: accounts with >100k followers default to read-based fan-out.
- Queue lag rule: if fan-out queue lag exceeds 15 seconds for 3 consecutive windows, temporarily move borderline accounts (50k-100k) to read-based fan-out.
- Write amplification rule: if projected fan-out writes per post exceed 1.5M entries, bypass write fan-out for that post and shift to read-time assembly.
- Error budget rule: if write-path error rate crosses 0.5% for fan-out workers, switch to safer read-path mode until error budget recovers.
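The four rules above collapse into a single routing decision per post. A hedged sketch, with the state inputs assumed to come from your metrics pipeline:

```python
def fanout_strategy(followers: int,
                    queue_lag_windows_over_15s: int,
                    projected_fanout_writes: int,
                    write_error_rate: float) -> str:
    """Return 'read' for read-time assembly, 'write' for precomputed fan-out."""
    if followers > 100_000:                      # follower threshold rule
        return "read"
    if projected_fanout_writes > 1_500_000:      # write amplification rule
        return "read"
    if write_error_rate > 0.005:                 # error budget rule (0.5%)
        return "read"
    if followers >= 50_000 and queue_lag_windows_over_15s >= 3:
        return "read"                            # queue lag rule (borderline tier)
    return "write"
```

Keeping this as one pure function makes the hybrid policy testable and easy to expose on a dashboard.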
Read Path Deep Dive
- From hard production lessons: never mix candidate generation and ranking in one expensive query. Candidate generation is your speed layer, ranking is your quality layer. Keep them separate so failures degrade gracefully.
- I usually over-fetch candidates quickly (for example, 300-500), then rank down to a final page. This gives ranking room to improve quality without blowing the request latency budget.
- Treat feature freshness in tiers. Tier 0 real-time counters are expensive and noisy. Many useful features can be 5-15 minutes old with negligible quality loss and major cost savings.
- Pagination must be treated as a consistency problem, not a UI problem. Use cursor snapshots or bounded session windows so users do not see duplicates or disappearing posts between pages.
- Always define fallback ranking paths before launch. If full ranking times out, drop to heuristic ranking (recency + social affinity + lightweight engagement) to protect p95.
- Do dedup, mute/block filters, and policy filters before expensive ranking stages. Ranking items you will drop anyway is silent latency waste.
- Track read-path health with three numbers weekly: cache hit ratio, ranking timeout ratio, and candidate over-fetch ratio.
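The separation of candidate generation from ranking, plus the fallback path, can be sketched as one request handler. `fetch_candidates` and `full_rank` are stand-ins for your real services, and the heuristic fallback here is recency-only for brevity (the text adds affinity and engagement terms):

```python
import time

def heuristic_rank(candidates):
    # Fallback ordering: recency only; affinity/engagement omitted for brevity.
    return sorted(candidates, key=lambda c: c["posted_at"], reverse=True)

def build_page(fetch_candidates, full_rank, page_size=30,
               overfetch=400, rank_timeout_s=0.055):
    candidates = fetch_candidates(limit=overfetch)   # speed layer: over-fetch fast
    start = time.monotonic()
    try:
        ranked = full_rank(candidates, deadline=start + rank_timeout_s)
    except TimeoutError:
        ranked = heuristic_rank(candidates)          # quality layer timed out: protect p95
    return ranked[:page_size]
```

Because failures in the quality layer degrade to a cheaper ordering rather than an error, the read path stays within budget even when ranking misbehaves.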
Latency Budget Breakdown
Map each component to a concrete budget so p95 targets are enforceable.
| Component | Target (ms) | Why this budget |
|---|---|---|
| API gateway + auth | 15 | Fast auth path and request validation only. |
| Timeline cache lookup | 25 | Primary path; should serve majority of requests. |
| Candidate generation fallback | 60 | Only when cache misses or stale windows trigger rebuild. |
| Online ranking | 55 | Apply lightweight rerank with strict timeout budget. |
| Serialization + network egress | 20 | Cursor packaging, payload shaping, and response send. |
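A quick arithmetic check of the table: the cache-hit path sums to 115ms and the worst-case cache-miss path to 175ms, so even the miss path fits under the 200ms warm-path p95 target.

```python
# Budget values copied from the table above (milliseconds).
BUDGET_MS = {
    "gateway_auth": 15,
    "cache_lookup": 25,
    "candidate_fallback": 60,   # only paid on a cache miss
    "online_ranking": 55,
    "serialization_egress": 20,
}

hit_path_ms = sum(v for k, v in BUDGET_MS.items() if k != "candidate_fallback")
miss_path_ms = sum(BUDGET_MS.values())
```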
Real-world Challenges
Celebrity fan-out meltdown
I have seen one live-event post consume enough capacity to starve normal users for minutes. Without account-tier routing, batching, and queue backpressure, freshness collapses platform-wide.
Hot partitions in timeline storage
A classic failure: partition by exactly what product traffic concentrates on (popular users + current time). That creates instant hot shards. Partitioning must spread heat, not mirror popularity.
Cache invalidation under edits/deletes
Denormalized cards are fast until edits and deletes arrive. If invalidation is not replay-safe and idempotent, users see stale or ghost content and trust erodes quickly.
Ranking timeout cascade
Ranking rarely fails as one call; it fails as a dependency chain. One slow dependency multiplies end-to-end latency unless strict per-stage timeouts and fallback tiers are in place.
Data correctness during retries
At-least-once delivery is operationally practical, but duplicates are guaranteed unless idempotency keys and dedupe windows are designed from day one.
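A minimal idempotent-consumer sketch under these assumptions: duplicates carry the same (postId, followerId) key, and an in-memory set stands in for a TTL-bounded dedupe store (for example, Redis SET with NX and an expiry).

```python
def make_idempotent(apply_fn):
    """Wrap a fan-out handler so each (post_id, follower_id) is applied at most once."""
    seen = set()  # production: TTL-bounded external store, not unbounded memory
    def handle(event):
        key = (event["post_id"], event["follower_id"])
        if key in seen:
            return False          # duplicate delivery: safely ignored
        apply_fn(event)           # must itself be replay-safe if this crashes mid-way
        seen.add(key)
        return True
    return handle
```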
What Interviewers Expect
- You quantify traffic and SLOs before architecture choices, so decisions are grounded in measurable goals.
- You choose a fan-out strategy with an explicit hybrid threshold and explain why that threshold is operationally safe.
- You show full write and read paths, including failure behavior, retries, dedupe, and fallback behavior under degradation.
- You explain consistency and ordering choices in product terms (what the user may briefly observe) and not only storage terms.
- You address hot keys/hot partitions and show practical partitioning strategy instead of vague 'use sharding' language.
- You include observability and incident posture: what you monitor, what alerts fire, and what emergency switches you would add.
- You demonstrate judgment: what to simplify in v1 and what to postpone without compromising trust or core correctness.
Practice Problems
These practice problems map directly to timeline/fan-out decisions. Start with one, then revisit this guide and evaluate where your design leaked latency, correctness, or cost.
- Design Facebook Feed
Best for learning hybrid fan-out and ranking tradeoffs under social-graph scale.
- Design Instagram Feed
Strong practice for media-heavy feed latency, caching, and freshness balancing.
- Design Twitter System
Excellent for reasoning about timeline consistency, hot keys, and write/read pressure.
Architecture Overview
Read this section as a request journey: API receives intent, cache protects latency, database protects correctness, and queue protects the system during spikes. If one box fails, define how the next box keeps user impact limited.
API Layer
This is the front door for publish and read requests. Keep it thin: validate request, enforce auth, attach cursor tokens, and pass work to downstream services quickly. Logic: a thin API layer reduces p95 variance.
Example: User opens home feed -> GET /timeline?cursor=abc -> API validates the token and returns a feed page plus the next cursor within the target p95.
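The cursor token should be opaque to clients but carry enough state for stable pagination. An illustrative encoding, where the field names (`ts` for the snapshot timestamp, `last` for the last-served post id) are assumptions rather than a documented wire format:

```python
import base64
import json

def encode_cursor(snapshot_ts: int, last_post_id: str) -> str:
    """Pack a snapshot position into an opaque, URL-safe token."""
    raw = json.dumps({"ts": snapshot_ts, "last": last_post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token: str) -> dict:
    """Recover the snapshot position from a cursor token."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))
```

Pinning the snapshot timestamp inside the cursor is what lets later pages exclude posts that arrived after page one, avoiding duplicates between pages.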
Cache
Cache timeline candidates and feed cards so most reads avoid expensive recomputation. Use TTL + invalidation events and stampede protection. Logic: cache is your main latency lever, but only if invalidation is disciplined.
Example: The first read misses cache and builds candidates; the next 100 reads hit the Redis timeline cache, dropping median latency from ~220ms to ~45ms.
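Stampede protection can be sketched as single-flight rebuilds: on a miss, one caller builds the entry while concurrent callers for the same key wait and reuse the result. A minimal in-process sketch (a distributed cache would use a lock key or request coalescing instead):

```python
import threading

class SingleFlightCache:
    def __init__(self):
        self._data = {}
        self._locks = {}
        self._mu = threading.Lock()

    def get_or_build(self, key, build):
        if key in self._data:
            return self._data[key]           # fast hit path, no locking
        with self._mu:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                           # only one builder per key
            if key not in self._data:        # re-check: another caller may have built it
                self._data[key] = build()
            return self._data[key]
```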
Database
Store posts durably and keep timeline/index tables partitioned to spread hot traffic. Use replication and replay-safe mutation logs. Logic: durability and partition strategy protect correctness and throughput together.
Example: A post write lands in the durable primary store first; timeline index writes are partitioned by user shard + time bucket to avoid a single hot partition.
Queue
Queue handles async fan-out and enrichment so writes do not block user requests. Consumers must be idempotent with retry + dead-letter controls. Logic: queues absorb spikes, but only if duplicate handling is designed upfront.
Example: A celebrity post emits fan-out events to the queue; workers process in batches, retry failed jobs, and dedupe by (postId, followerId) key.
Architecture Diagrams
Visual flows below show where latency is paid and where load is absorbed. Use them as memory anchors in interviews.
Write Path (Post Publish)
Logic: keep user write fast, push heavy fan-out work to async pipeline with idempotent workers.
Read Path (Home Timeline)
Logic: serve from cache first, then fallback to timeline index + lightweight ranking when needed.
Design Evolution (v1 → v3)
v1: Ship reliable basics quickly
Build now
- Read-based fan-out for all users so writes stay stable
- Heuristic ranking (recency + engagement + relationship strength)
- Simple cache for hot feed pages and first-screen blocks
Avoid for now
- ML-heavy ranking pipelines with expensive freshness requirements
- Complex hybrid automation before baseline metrics are trustworthy
- Multi-region active-active writes before single-region reliability is proven
v2: Stabilize latency and cost under real traffic
Build now
- Candidate generation cache tiers with stricter invalidation discipline
- Queue-backed fan-out for normal producers using idempotent workers
- Stronger cursor consistency to prevent cross-page duplicates
Avoid for now
- Real-time feature stores for every ranking signal
- Overfitting ranking knobs without observability and replay tooling
v3: Hybrid optimization at scale
Build now
- Explicit hybrid switching rules with measurable triggers
- Tiered ranking pipeline with graceful fallback paths
- Operational automation for lag, error, and degradation triggers
Avoid for now
- Unbounded online ranking complexity that destabilizes p95
- Hard coupling between ranking, storage, and queue internals
What Not to Build Initially
Strong system design is also about disciplined scope control.
- Do not add ML-heavy ranking in v1; start with transparent heuristic scoring so on-call teams can debug behavior quickly.
- Do not build cross-region write-active architecture before single-region SLOs and incident playbooks are mature.
- Do not add dozens of ranking signals early; every signal adds freshness cost, observability cost, and debugging complexity.
- Do not over-optimize celebrity edge cases before queue lag and write amplification are measured reliably.