System design pattern
Feed / Timeline
Design a low-latency, personalized timeline that remains stable under celebrity spikes, ranking pressure, and failure recovery.
How to Recognize This Pattern
- You are asked to serve a continuously updating home surface where users expect both freshness and relevance, not just chronological ordering.
- The traffic shape is asymmetric: reads are massive and steady, writes are bursty and can spike dramatically for a tiny subset of producers.
- The interviewer hints at social graph fan-out (followers/following), which signals that naive query-time joins will not survive tail latency SLOs.
- You hear pressure words like 'instant', 'personalized', 'at scale', 'real-time', or 'millions of users'—that is timeline-pattern language.
- The challenge is not storing posts; the challenge is deciding where to pay the compute bill: write path, read path, or a tiered hybrid.
- If product asks for ranking quality, ad insertion, and freshness together, you are in a multi-objective decision problem, not a CRUD problem.
Approach (Step-by-step)
This is where senior candidates show decision quality, not just component naming.
1. Start with explicit service goals: p95 read latency target, freshness budget (how stale is acceptable), and feed correctness guarantees during partial failures.
2. Model workload tiers: normal producers, high-follower producers, and celebrity producers. A single global strategy is usually wrong.
3. Choose a baseline fan-out model and state why. Then define a switch condition for hybridization (for example, a follower count threshold or a queue lag threshold).
4. Design the write path in detail: post creation API, durable write, event publication, fan-out workers, idempotency keys, and replay-safe retry semantics.
5. Design the read path in detail: timeline candidate fetch, lightweight online ranking, dedup/filter, cache policy, and stable cursor-based pagination.
6. Define data contracts between components: timeline entry schema, ranking feature freshness policy, and deletion/edit propagation guarantees.
7. Handle failure and surge scenarios: queue backlog, cache stampedes, ranking timeouts, and a regional degradation strategy that still preserves user trust.
8. Close with an operations posture: SLO dashboards, alarm thresholds, kill switches for expensive ranking stages, and emergency fallback feed modes.
Key Trade-offs
Think of this as decision math: where does the load move, what fails first, and which user experience are you willing to protect?
Fan-out on write
Simple meaning: you do extra work when someone posts so readers get fast timelines later. Logic: this is great when reads are huge, but one celebrity post can flood your write workers.
Fan-out on read
Simple meaning: posting is cheap, and you build the feed only when a user opens it. Logic: this protects write path during spikes, but user request latency and ranking CPU cost go up.
Decision lens: Use a hybrid because traffic is mixed: precompute for normal accounts, but switch celebrity accounts to read-time assembly so one giant account cannot overload the entire pipeline.
Strict ordering
Simple meaning: everyone sees exactly one global post order. Logic: this is clean for reasoning, but distributed coordination cost is high and slows the system at scale.
Eventual ordering
Simple meaning: users may briefly see small order differences, then feeds converge. Logic: you gain throughput and resilience because services coordinate less on every write.
Decision lens: Default to eventual ordering for timeline speed; keep strict ordering only where correctness risk is high (for example compliance, financial events, or moderation audit paths).
Heavy online ranking
Simple meaning: you calculate a lot of ranking logic during each user request. Logic: quality is high, but p95 latency and cost become unstable during peak traffic.
Precomputed ranking features
Simple meaning: you compute most ranking signals before request time. Logic: latency and cost become predictable, but ultra-fresh intent signals are weaker.
Decision lens: Use a two-layer ranking model: heavy features offline + lightweight online rerank. This keeps response fast while still reacting to the latest user context.
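The two-layer split can be sketched as a cheap linear rerank applied on top of a precomputed score. This is an illustrative sketch only: the feature names, weights, and half-life below are assumptions, not values from this guide.

```python
from dataclasses import dataclass
import time

@dataclass
class Candidate:
    post_id: str
    offline_score: float   # heavy features, computed offline (minutes old)
    posted_at: float       # unix seconds

def online_rerank(candidates, now=None, recency_half_life_s=3600.0):
    """Lightweight online rerank: blend the offline score with a fresh recency signal."""
    now = now or time.time()
    def score(c):
        age = max(now - c.posted_at, 0.0)
        recency = 0.5 ** (age / recency_half_life_s)  # halves every hour
        return 0.7 * c.offline_score + 0.3 * recency  # illustrative weights
    return sorted(candidates, key=score, reverse=True)
```

The point of the structure is that the per-request work is a sort over a few hundred candidates, while the expensive feature computation happened before the request arrived.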
Scale Realism (Numbers That Matter)
- Follower distribution: In practice, follower graphs are power-law. A realistic split is ~90-93% long-tail creators (<10k), ~6-8% mid-tier (10k-100k), and ~1% heavy hitters (>100k). This tiny heavy-hitter segment drives most write-path spikes.
- Traffic profile: A realistic peak profile can be ~250k feed reads/sec and ~15k-20k writes/sec globally. During live events, one 10M-follower post can create ~10M fan-out intents within seconds, which will overwhelm naive write fan-out.
- Latency target: For user trust, target p95 < 200ms and p99 < 350ms on warm read paths. Under safe degradation, allow brief p95 drift toward ~320-380ms, but keep behavior deterministic and observable.
- Failure envelope: If queue lag crosses ~15-20s, cache miss ratio jumps >3x baseline, or ranking timeout exceeds ~2-3%, switch to safe mode: lighter ranking, smaller candidate sets, and stricter cache preference.
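The failure envelope above can be made mechanical. A minimal sketch, assuming your metrics pipeline supplies snapshot values (real deployments would evaluate these over sustained windows, not single samples):

```python
def should_enter_safe_mode(queue_lag_s: float,
                           cache_miss_ratio: float,
                           baseline_miss_ratio: float,
                           ranking_timeout_ratio: float) -> bool:
    """Return True when any failure-envelope signal is breached."""
    return (
        queue_lag_s > 15.0                               # fan-out lag budget
        or cache_miss_ratio > 3.0 * baseline_miss_ratio  # miss ratio jumps >3x
        or ranking_timeout_ratio > 0.02                  # ranking timeouts >2%
    )
```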
Hybrid Switching Rules (Operational Logic)
These rules make hybrid strategy measurable and observable.
- Follower threshold rule: accounts with >100k followers default to read-based fan-out.
- Queue lag rule: if fan-out queue lag exceeds 15 seconds for 3 consecutive windows, temporarily move borderline accounts (50k-100k) to read-based fan-out.
- Write amplification rule: if projected fan-out writes per post exceed 1.5M entries, bypass write fan-out for that post and shift to read-time assembly.
- Error budget rule: if write-path error rate crosses 0.5% for fan-out workers, switch to safer read-path mode until error budget recovers.
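The four rules above collapse into a single routing decision per post. A hedged sketch, with the state inputs assumed to come from your metrics pipeline:

```python
def fanout_strategy(followers: int,
                    queue_lag_windows_over_15s: int,
                    projected_fanout_writes: int,
                    write_error_rate: float) -> str:
    """Return 'read' for read-time assembly, 'write' for precomputed fan-out."""
    if followers > 100_000:                      # follower threshold rule
        return "read"
    if projected_fanout_writes > 1_500_000:      # write amplification rule
        return "read"
    if write_error_rate > 0.005:                 # error budget rule (0.5%)
        return "read"
    if followers >= 50_000 and queue_lag_windows_over_15s >= 3:
        return "read"                            # queue lag rule (borderline tier)
    return "write"
```

Keeping this as one pure function makes the hybrid policy testable and easy to expose on a dashboard.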
Read Path Deep Dive
- From hard production lessons: never mix candidate generation and ranking in one expensive query. Candidate generation is your speed layer, ranking is your quality layer. Keep them separate so failures degrade gracefully.
- I usually over-fetch candidates quickly (for example, 300-500), then rank down to a final page. This gives ranking room to improve quality without blowing the request latency budget.
- Treat feature freshness in tiers. Tier 0 real-time counters are expensive and noisy. Many useful features can be 5-15 minutes old with negligible quality loss and major cost savings.
- Pagination must be treated as a consistency problem, not a UI problem. Use cursor snapshots or bounded session windows so users do not see duplicates or disappearing posts between pages.
- Always define fallback ranking paths before launch. If full ranking times out, drop to heuristic ranking (recency + social affinity + lightweight engagement) to protect p95.
- Do dedup, mute/block filters, and policy filters before expensive ranking stages. Ranking items you will drop anyway is silent latency waste.
- Track read-path health with three numbers weekly: cache hit ratio, ranking timeout ratio, and candidate over-fetch ratio.
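The separation of candidate generation from ranking, plus the fallback path, can be sketched as one request handler. `fetch_candidates` and `full_rank` are stand-ins for your real services, and the heuristic fallback here is recency-only for brevity (the text adds affinity and engagement terms):

```python
import time

def heuristic_rank(candidates):
    # Fallback ordering: recency only; affinity/engagement omitted for brevity.
    return sorted(candidates, key=lambda c: c["posted_at"], reverse=True)

def build_page(fetch_candidates, full_rank, page_size=30,
               overfetch=400, rank_timeout_s=0.055):
    candidates = fetch_candidates(limit=overfetch)   # speed layer: over-fetch fast
    start = time.monotonic()
    try:
        ranked = full_rank(candidates, deadline=start + rank_timeout_s)
    except TimeoutError:
        ranked = heuristic_rank(candidates)          # quality layer timed out: protect p95
    return ranked[:page_size]
```

Because failures in the quality layer degrade to a cheaper ordering rather than an error, the read path stays within budget even when ranking misbehaves.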
Latency Budget Breakdown
Map each component to a concrete budget so p95 targets are enforceable.
| Component | Target (ms) | Why this budget |
|---|---|---|
| API gateway + auth | 15 | Fast auth path and request validation only. |
| Timeline cache lookup | 25 | Primary path; should serve majority of requests. |
| Candidate generation fallback | 60 | Only when cache misses or stale windows trigger rebuild. |
| Online ranking | 55 | Apply lightweight rerank with strict timeout budget. |
| Serialization + network egress | 20 | Cursor packaging, payload shaping, and response send. |
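A quick arithmetic check of the table: the cache-hit path sums to 115ms and the worst-case cache-miss path to 175ms, so even the miss path fits under the 200ms warm-path p95 target.

```python
# Budget values copied from the table above (milliseconds).
BUDGET_MS = {
    "gateway_auth": 15,
    "cache_lookup": 25,
    "candidate_fallback": 60,   # only paid on a cache miss
    "online_ranking": 55,
    "serialization_egress": 20,
}

hit_path_ms = sum(v for k, v in BUDGET_MS.items() if k != "candidate_fallback")
miss_path_ms = sum(BUDGET_MS.values())
```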
Real-world Challenges
Celebrity fan-out meltdown
I have seen one live-event post consume enough capacity to starve normal users for minutes. Without account-tier routing, batching, and queue backpressure, freshness collapses platform-wide.
Hot partitions in timeline storage
A classic failure: partition by exactly what product traffic concentrates on (popular users + current time). That creates instant hot shards. Partitioning must spread heat, not mirror popularity.
Cache invalidation under edits/deletes
Denormalized cards are fast until edits and deletes arrive. If invalidation is not replay-safe and idempotent, users see stale or ghost content and trust erodes quickly.
Ranking timeout cascade
Ranking rarely fails as one call; it fails as a dependency chain. One slow dependency multiplies end-to-end latency unless strict per-stage timeouts and fallback tiers are in place.
Data correctness during retries
At-least-once delivery is operationally practical, but duplicates are guaranteed unless idempotency keys and dedupe windows are designed from day one.
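A minimal idempotent-consumer sketch under these assumptions: duplicates carry the same (postId, followerId) key, and an in-memory set stands in for a TTL-bounded dedupe store (for example, Redis SET with NX and an expiry).

```python
def make_idempotent(apply_fn):
    """Wrap a fan-out handler so each (post_id, follower_id) is applied at most once."""
    seen = set()  # production: TTL-bounded external store, not unbounded memory
    def handle(event):
        key = (event["post_id"], event["follower_id"])
        if key in seen:
            return False          # duplicate delivery: safely ignored
        apply_fn(event)           # must itself be replay-safe if this crashes mid-way
        seen.add(key)
        return True
    return handle
```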
What Interviewers Expect
- You quantify traffic and SLOs before architecture choices, so decisions are grounded in measurable goals.
- You choose a fan-out strategy with an explicit hybrid threshold and explain why that threshold is operationally safe.
- You show full write and read paths, including failure behavior, retries, dedupe, and fallback behavior under degradation.
- You explain consistency and ordering choices in product terms (what the user may briefly observe) and not only storage terms.
- You address hot keys/hot partitions and show practical partitioning strategy instead of vague 'use sharding' language.
- You include observability and incident posture: what you monitor, what alerts fire, and what emergency switches you would add.
- You demonstrate judgment: what to simplify in v1 and what to postpone without compromising trust or core correctness.
Practice Problems
These practice problems map directly to timeline/fan-out decisions. Start with one, then revisit this guide and evaluate where your design leaked latency, correctness, or cost.
- Design Facebook Feed
Best for learning hybrid fan-out and ranking tradeoffs under social-graph scale.
- Design Instagram Feed
Strong practice for media-heavy feed latency, caching, and freshness balancing.
- Design Twitter System
Excellent for reasoning about timeline consistency, hot keys, and write/read pressure.
Architecture Overview
Read this section as a request journey: API receives intent, cache protects latency, database protects correctness, and queue protects the system during spikes. If one box fails, define how the next box keeps user impact limited.
API Layer
This is the front door for publish and read requests. Keep it thin: validate request, enforce auth, attach cursor tokens, and pass work to downstream services quickly. Logic: a thin API layer reduces p95 variance.
Example: User opens home feed -> GET /timeline?cursor=abc -> API validates the token and returns a feed page plus the next cursor within the target p95.
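The cursor token should be opaque to clients but carry enough state for stable pagination. An illustrative encoding, where the field names (`ts` for the snapshot timestamp, `last` for the last-served post id) are assumptions rather than a documented wire format:

```python
import base64
import json

def encode_cursor(snapshot_ts: int, last_post_id: str) -> str:
    """Pack a snapshot position into an opaque, URL-safe token."""
    raw = json.dumps({"ts": snapshot_ts, "last": last_post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token: str) -> dict:
    """Recover the snapshot position from a cursor token."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))
```

Pinning the snapshot timestamp inside the cursor is what lets later pages exclude posts that arrived after page one, avoiding duplicates between pages.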
Cache
Cache timeline candidates and feed cards so most reads avoid expensive recomputation. Use TTL + invalidation events and stampede protection. Logic: cache is your main latency lever, but only if invalidation is disciplined.
Example: The first read misses cache and builds candidates; the next 100 reads hit the Redis timeline cache, dropping median latency from ~220ms to ~45ms.
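Stampede protection can be sketched as single-flight rebuilds: on a miss, one caller builds the entry while concurrent callers for the same key wait and reuse the result. A minimal in-process sketch (a distributed cache would use a lock key or request coalescing instead):

```python
import threading

class SingleFlightCache:
    def __init__(self):
        self._data = {}
        self._locks = {}
        self._mu = threading.Lock()

    def get_or_build(self, key, build):
        if key in self._data:
            return self._data[key]           # fast hit path, no locking
        with self._mu:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                           # only one builder per key
            if key not in self._data:        # re-check: another caller may have built it
                self._data[key] = build()
            return self._data[key]
```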
Database
Store posts durably and keep timeline/index tables partitioned to spread hot traffic. Use replication and replay-safe mutation logs. Logic: durability and partition strategy protect correctness and throughput together.
Example: A post write lands in the durable primary store first; timeline index writes are partitioned by user shard + time bucket to avoid a single hot partition.
Queue
Queue handles async fan-out and enrichment so writes do not block user requests. Consumers must be idempotent with retry + dead-letter controls. Logic: queues absorb spikes, but only if duplicate handling is designed upfront.
Example: A celebrity post emits fan-out events to the queue; workers process in batches, retry failed jobs, and dedupe by (postId, followerId) key.
Architecture Diagrams
Visual flows below show where latency is paid and where load is absorbed. Use them as memory anchors in interviews.
Write Path (Post Publish)
Logic: keep user write fast, push heavy fan-out work to async pipeline with idempotent workers.
Read Path (Home Timeline)
Logic: serve from cache first, then fallback to timeline index + lightweight ranking when needed.
Design Evolution (v1 → v3)
v1: Ship reliable basics quickly
Build now
- Read-based fan-out for all users so writes stay stable
- Heuristic ranking (recency + engagement + relationship strength)
- Simple cache for hot feed pages and first-screen blocks
Avoid for now
- ML-heavy ranking pipelines with expensive freshness requirements
- Complex hybrid automation before baseline metrics are trustworthy
- Multi-region active-active writes before single-region reliability is proven
v2: Stabilize latency and cost under real traffic
Build now
- Candidate generation cache tiers with stricter invalidation discipline
- Queue-backed fan-out for normal producers using idempotent workers
- Stronger cursor consistency to prevent cross-page duplicates
Avoid for now
- Real-time feature stores for every ranking signal
- Overfitting ranking knobs without observability and replay tooling
v3: Hybrid optimization at scale
Build now
- Explicit hybrid switching rules with measurable triggers
- Tiered ranking pipeline with graceful fallback paths
- Operational automation for lag, error, and degradation triggers
Avoid for now
- Unbounded online ranking complexity that destabilizes p95
- Hard coupling between ranking, storage, and queue internals
What Not to Build Initially
Strong system design is also about disciplined scope control.
- Do not add ML-heavy ranking in v1; start with transparent heuristic scoring so on-call teams can debug behavior quickly.
- Do not build cross-region write-active architecture before single-region SLOs and incident playbooks are mature.
- Do not add dozens of ranking signals early; every signal adds freshness cost, observability cost, and debugging complexity.
- Do not over-optimize celebrity edge cases before queue lag and write amplification are measured reliably.