System design interview guide
Design Facebook Feed
TL;DR: From the user’s side, the home feed is simple: open the app, scroll, see posts from people and pages you follow—ideally ranked or at least fresh enough that the product feels alive. From your side as the engineer, the hard part is the shape of the traffic: reads dwarf writes by orders of magnitude, the follow graph is huge and skewed, and a handful of accounts have follower counts that break any naive “fan this post out to everyone” or “at read time, join every account I follow and sort by time” story. In practice you separate durable post storage from feed assembly, you think about fan-out on write versus pull on read versus a hybrid, you shard mailboxes or candidate sets, you put ranking behind a latency budget, and you say what degrades when queues back up or the ranker slows down. The interview is basically: walk that path like you have shipped it, not like you memorized a diagram.
Problem statement
You’re designing the home feed for a large social network: updates from people and pages you follow, ordered by relevance or recency, paginated, and fast enough that scrolling feels instant. At interview scale, the interesting part is not the ER diagram—it’s how you avoid doing unbounded work per user per swipe.
How I’d frame the constraints in a room. Product-wise, people need to create posts—text, links, and media—and to follow or unfollow who they care about. The core read is the home timeline with pagination (not “load everything”), plus reactions and comments on posts so the surface feels social, and hide/block flows because safety is not optional. Most prompts also want you to name two product modes—“most recent” versus “top stories” (or similar)—even if both ride on the same backends.
On the non-functional side, assume a global audience: latency is measured where users actually wait, so anchor the read path to something like feed p95 under 300ms for typical sessions—not a lab number on an empty table. High availability is table stakes; the home screen cannot be the thing that goes dark. And be straight about ranking: the scoring pipeline may legitimately lag seconds behind the raw post—if you argue the ranker is instant and strongly consistent with every write, interviewers will push.
Scale is where polite hand-waving ends. Call the headline numbers: 1B+ MAU, 500M+ DAU, hundreds of millions of posts per day, feed reads orders of magnitude higher than writes, and skew—some accounts sit at 100M+ followers. That last line is not trivia; it decides whether fan-out, storage, and what “fair” means for the system are even definable the naive way.
What I want you to walk through—not as a checklist, but as a story.
Push, pull, or hybrid. If you only say “we fan out on write,” I’m going to ask what happens when a post would fan out to a hundred million mailboxes. If you only say “we pull on read,” I’m going to ask what happens when someone follows thousands of accounts. Production systems usually blend: materialize where it’s affordable, merge or pull where it isn’t, and have a named policy for the head of the distribution—not a hand-wave labeled “celebrity.”
Where data lives and what you cache. The social graph (who follows whom, blocks, mutes) is hot and heavily cached; post bodies and media live in their own durability story—often object storage and a CDN—while feed assembly deals in IDs and ordering, not giant joins on every swipe. Say that out loud: you’re not doing one SQL query that joins the world.
Ranking as a contract, not a magic box. Mix online features with precomputed or batch scores; wire in experimentation (launches, holdouts) and safety (integrity, policy) as first-class inputs—not bolt-ons. Strong candidates separate candidate generation (“what might appear?”) from ordering (“in what order?”) and say what happens when the ranker misses its budget or returns late.
Pagination when the world moves. If order can change between taps, offset/limit is a trap at scale. You want cursors and a grown-up conversation about duplicates, skips, and best-effort—not fake strong consistency.
Media and previews off the hot path. Uploads, transcoding, link unfurling—these are async pipelines. They should not own your feed p95.
Multi-region means choosing where the graph is authoritative, where feed rows or mailboxes are materialized, and what replication lag you accept when edges or fan-out cross regions—without pretending the whole world is one synchronous map.
Failure and degradation. When fan-out or ranking paths are slow, the product still needs a feed: bounded work on read, timeouts, fallback ordering (often recency), and honest degradation—not an edge fleet blocked forever behind a stuck ranker or a backed-up fan-out service.
Introduction
If you have only one mental image for this problem, make it skew: almost everyone reads, almost nobody posts, and a handful of accounts have more followers than some countries have people. The home feed is “posts from the graph, in some order, fast”—the boring part is storage; the interesting part is not doing unbounded work when someone scrolls or when a celebrity hits Post.
Interviewers are not grading your ability to draw eleven boxes. They want to hear you name the hot paths, split push vs pull without hand-waving, and admit where consistency gets fuzzy (pagination + ranking + graph churn). If you can do that in plain language, you are already ahead of the template answers.
How to approach
Treat the first minutes as a negotiation, not a quiz. Ask what “feed” means (home only vs groups), whether ranking is in scope, and what “fresh” means in seconds—not because you need permission to be smart, but because your tradeoffs change if the answer is “purely chronological” vs “Top stories with ML.”
Then sketch capacity in one pass: reads vs writes, rough fan-out per post for a normal user vs a page, and where the fire starts (mailbox shards, fan-out queue, ranker). After that, walk one write (post created → durable → async fan-out) and one read (load candidates → merge → rank → page) before you let yourself drown in storage brands.
If time is short, prioritize celebrity policy and pagination under changing rank—those are the questions that separate “I read a blog post” from “I’ve thought about production.”
Interview tips
Say where the graph lives and what you cache. Say where post bodies live vs feed IDs. Say fan-out is async and idempotent so retries do not duplicate in mailboxes.
When you mention Kafka or a queue, tie it to backpressure: what you measure when workers fall behind. When you mention ranking, separate candidate generation from scoring and pick a freshness story you can defend.
Expect to be pressed on blocks and unfollows—a crisp answer admits brief staleness but never hand-waves safety. Expect “what if the ranker is slow?”—timeouts and recency fallback beat heroic blocking.
Capacity estimation
Rough numbers to anchor the whiteboard (tune in the room):
| Input | Order of magnitude |
|---|---|
| DAU | 500M+ (from problem spec) |
| Home feed reads per DAU | Tens to hundreds per day (people open the app often, not every scroll is a full refetch) |
| Writes (new posts) per day | Hundreds of millions globally → write QPS is modest vs read QPS |
| Follow graph | Highly skewed: median user hundreds of follows; P99 and celebrities dominate cost |
Implications: You cannot do O(followers) work on every read. You shard mailboxes or cap the pull-merge. You fan out asynchronously. You bound candidate pools before ranking so p95 stays in range.
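If you want the arithmetic behind that table, a quick back-of-envelope is enough. The specific numbers here are illustrative assumptions pulled from the rough inputs above, not measurements:

```python
# Back-of-envelope for feed traffic. All inputs are assumptions for
# illustration, chosen from the rough ranges in the table above.

DAU = 500_000_000
feed_reads_per_user_per_day = 50   # "tens to hundreds" per day
posts_per_day = 300_000_000        # "hundreds of millions" globally
seconds_per_day = 86_400

read_qps = DAU * feed_reads_per_user_per_day / seconds_per_day
write_qps = posts_per_day / seconds_per_day

print(f"read QPS  ~ {read_qps:,.0f}")   # average; peak is several x higher
print(f"write QPS ~ {write_qps:,.0f}")
print(f"read:write ~ {read_qps / write_qps:.0f}:1")
```

Averages understate the problem: peaks, hot keys, and celebrity fan-out are where the real cost lives, but even the average ratio tells you the read path is the one to optimize.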
High-level architecture
At a whiteboard, you need a story, not a catalog. One reasonable story: traffic hits edge + API (TLS termination, auth, rate limits, request IDs); a feed service owns “give me the next page of the home timeline”; it pulls candidate post IDs from per-user storage (mailboxes and/or on-read merge), batch-loads post bodies and author metadata, applies safety filters (blocks, policy), then hands a bounded candidate set to a ranker with a hard time budget. Nothing in that sentence assumes a single database or a monolith—only clear ownership of each hop.
Who owns what (typical split):
- API / BFF — Session or token validation, coarse routing, timeouts to downstreams, sometimes response shaping so the mobile client is not fanning out to twenty microservices.
- Feed service — Orchestrates read path: assemble candidates, call graph for “who can this viewer see?”, call post store for bodies, call ranker, return cursor + hydrated posts. It is where you cap work (max candidates, max fan-out depth).
- Mailbox / timeline store — Durable ordered lists of post IDs (or score+id tuples) per viewer, often sharded by user_id. This is not the full post body store.
- Post service + object storage — Source of truth for post metadata (author, created_at, privacy, media keys). Blobs and video live in object storage; the CDN serves the bytes; URLs in API responses are usually signed or stable IDs resolved at read time.
- Graph service — Follow / unfollow / block / mute edges; heavily cached at the edge of the feed path because every home read touches it.
- Fan-out workers — Consumers off a queue or log: given (post_id, author_id), append to follower mailboxes where push is in policy. Work is at-least-once; writes are idempotent on (viewer_id, post_id) so retries do not duplicate.
- Ranker / scoring — May be embedded in the feed service or separate; often feature/log fetches first, then scoring. Treat it as best-effort within SLA, not a synchronous join to every offline job.
Where caching sits: Hot post bodies, author profiles, and graph “following set” snapshots are common cache layers—often between feed orchestration and the durable stores. Say what is safe to serve stale (profile display name) vs not (block list for safety).
Celebrity / hybrid on the diagram: For authors you do not fully fan out, the feed service still merges recent posts by author id (or pulls from an author-indexed store) into the candidate pool. Label that edge on the whiteboard—otherwise the diagram only shows push fan-out and you get the follow-up you deserve.
[ Client ]
→ [ LB / API gateway ] → [ Feed svc ] → [ Cache? ] → [ Mailbox store / merge ] → [ Ranker ] → response
↑ ↑
| +-- optional: recent-tail / author merge (hybrid)
|
[ Graph svc ]--+ (follows, blocks)
|
[ Post svc ] ←---------+ (metadata, not media bytes)
|
+→ [ Fan-out queue ] → workers → mailboxes (per-viewer timeline shards)
|
+→ [ Object store ] → [ CDN ] (media)
In the room: Narrate the read path first (latency budget, bounded candidates, ranker timeout). Then one write (durable post → async fan-out → idempotent mailbox writes). Interviewers often let you fill in storage brands once they believe you will not fan-out 100M rows on every post.
Core design approaches
Fan-out on write (push)
After a post is stored, push its ID into each follower’s mailbox (per shard). Reads feel cheap because the heavy work already happened.
The catch is the O(followers) wall. The moment you say “we fan out to everyone,” you owe the room a celebrity exception—or you have accidentally designed a distributed denial-of-service machine that charges you money.
Pull on read
At request time, pull recent posts from followed accounts and merge. That dodges the celebrity write blast, but you can spend a lot of CPU merging wide fan-in if someone follows thousands of accounts—so real systems cap, cache, or hybridize.
Hybrid (what production usually sounds like)
Push for the long tail where fan-out is cheap. For mega-followed authors, do not materialize every edge—merge their recent posts on read, or fan-out only to “active” mailboxes, or store by author and intersect. The exact policy matters less than saying there is a policy and naming the cost driver (queue depth, storage, blast radius).
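That policy decision can be sketched in a few lines. The threshold constant and the store hooks (`enqueue_fanout`, `mark_pull_author`) are hypothetical names for illustration; in practice the cutoff comes from measured queue and storage cost, not a hardcoded number:

```python
# Minimal sketch of a hybrid fan-out policy. The threshold and the two
# store hooks are illustrative assumptions, not a real API.

CELEBRITY_FOLLOWER_THRESHOLD = 100_000  # assumption; tune from queue/storage cost

def route_post(post_id: str, author_id: str, follower_count: int,
               enqueue_fanout, mark_pull_author) -> str:
    """Decide push vs pull for a newly stored post."""
    if follower_count >= CELEBRITY_FOLLOWER_THRESHOLD:
        # Head of the distribution: do not materialize every edge.
        # Readers merge this author's recent posts at read time instead.
        mark_pull_author(author_id, post_id)
        return "pull"
    # Long tail: cheap to fan out asynchronously to follower mailboxes.
    enqueue_fanout(post_id, author_id)
    return "push"
```

The point to narrate is not the constant—it is that there is a named policy, and that the read path knows how to merge the pull-side authors back into the candidate pool.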
Detailed design
Write path
Walk it like you are explaining an outage retro: media lands in object storage; the post row gets durable metadata; the client gets success without waiting for every follower’s mailbox row. Fan-out runs as async work with idempotency so retries do not duplicate (post_id → mailbox) inserts.
Everything else—search indexing, notifications, ugly link previews—can trail behind. If you put those on the critical path, you will miss latency and still get the wrong Open Graph image sometimes.
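The idempotency claim above is easy to make concrete. A sketch, with an in-memory dict standing in for the sharded mailbox store—the key property is that a retried fan-out job keyed on (viewer_id, post_id) is a no-op, not a duplicate story:

```python
# Idempotent mailbox append: at-least-once delivery plus a write keyed by
# (viewer_id, post_id) means a retried fan-out job cannot duplicate a story.
# The defaultdict stands in for a sharded KV mailbox store.

from collections import defaultdict

mailboxes: dict[str, dict[str, float]] = defaultdict(dict)  # viewer -> {post_id: ts}

def append_to_mailbox(viewer_id: str, post_id: str, created_at: float) -> bool:
    """Returns True if this write inserted a new row, False if it was a retry."""
    box = mailboxes[viewer_id]
    if post_id in box:          # retry of an already-applied fan-out job
        return False
    box[post_id] = created_at
    return True

assert append_to_mailbox("u1", "p9", 1700000000.0) is True
assert append_to_mailbox("u1", "p9", 1700000000.0) is False  # retry is a no-op
```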
Read path
This is the path that actually matters for “how does Facebook feel fast?” Load a chunk of candidate IDs for the viewer (precomputed + optional recent tail if you hybridize), multi-get bodies and authors, filter blocks and hard safety rules, then rank under a time budget and return a cursor that survives the next request.
If you only say “we call the ML service,” follow up with what you do when ML is slow—because that is what your on-call cares about.
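Here is a sketch of that bounded-work page build with the fallback wired in. `rank_fn` and `RankerTimeout` are stand-ins for whatever your ranking client calls and raises on a budget miss:

```python
# Read-path sketch: bounded candidates, rank under a budget, recency fallback.
# rank_fn and RankerTimeout are hypothetical stand-ins for the ranking client.

MAX_CANDIDATES = 500  # bound work before ranking so p95 holds

class RankerTimeout(Exception):
    pass

def build_page(candidates: list[dict], rank_fn, page_size: int = 20) -> list[dict]:
    pool = candidates[:MAX_CANDIDATES]
    try:
        ordered = rank_fn(pool)          # the client enforces its own hard timeout
    except RankerTimeout:
        # Degrade honestly: recency-only ordering keeps the feed scrollable.
        ordered = sorted(pool, key=lambda p: p["created_at"], reverse=True)
    return ordered[:page_size]
```

Said out loud in the room: the user still gets a page when the ranker is slow; what degrades is ranking quality, not availability.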
Key challenges
- Skew: One author, 100M followers—fan-out on write is a storage and queue blast radius.
- Pagination vs ranking: Order changes between requests; cursors and occasional dup/skip are expected—don’t fake strong consistency.
- Graph churn: Unfollow/block must become visible without infinite recomputation—TTL, lazy filter, compensating deletes.
- Ranking lag: Scores can be seconds stale; product and safety need explicit contracts.
Scaling the system
Caching is not a vibe; it is a list of keys that catch fire. Hot posts, hot profiles, and “everyone requests the same post ID at once” are where you add request coalescing and TTL jitter. Sharding is usually mailboxes by user_id, posts by id or author-time, and no joins on the hot read path.
Multi-region is almost always “read mostly local, accept some staleness, replay fan-out idempotently”—if you claim one global strongly consistent view of every mailbox, you will get a skeptical follow-up.
Bottlenecks and tradeoffs
Consistency vs latency: Cross-service strong consistency is rare. Most feeds are eventually materialized and best-effort ranked—say what you sacrifice first.
Cost vs freshness: More precompute and storage buys smoother reads; thinner mailboxes push work to read time and hurt tail latency.
Simplicity vs scale: A single relational database is a fine starting point in a classroom; at this prompt’s scale you are usually talking sharded KV or wide-column for mailboxes because the access pattern is “load a lot of IDs for one user, fast.”
Failure handling
- Ranker slow or down: Timeout; fall back to recency-only slice; feature flag off heavy models.
- Fan-out backlog: Still show read path from existing mailboxes + pull-merge; alert on queue age, shed load.
- Cache miss / stampede: TTL jitter, singleflight for hot post IDs.
- DB degraded: Circuit-break; serve stale but safe feed rather than empty 500s where product allows.
API design
Illustrative REST-style surface (GraphQL is fine if you name batching, N+1 avoidance, and field cost—feeds are exactly where naive graphs hurt).
Core resources: posts, users, me/home (collection = feed page). Version the path (/v1/) so you can deprecate fields without breaking every client.
POST /v1/posts
GET /v1/me/home
GET /v1/posts/{post_id}
POST /v1/users/{user_id}/follow
DELETE /v1/users/{user_id}/follow
POST /v1/posts/{post_id}/reactions # optional in scope
GET /v1/posts/{post_id}/comments # optional; often cursor-paged
POST /v1/posts/{post_id}/comments
Home feed — GET /v1/me/home
| Query param | Role |
|---|---|
| cursor | Opaque continuation from the last response (preferred), or compound (score, post_id) if you must spell it out |
| limit | Page size (capped server-side, e.g. ≤ 50) |
| ranking | top or recent (product mode; may change candidate source and ranker) |
| session_id | Optional; for session-scoped snapshots / experiments |
Response sketch: { "items": [ Post ], "next_cursor": "...", "has_more": true }. Each Post embeds or references the author, media URLs or placeholders, and counts (reactions/comments); include client tokens for ranking debug only if you want to argue observability.
Errors: 401 unauthenticated; 429 with Retry-After when throttled; avoid leaking whether a blocked user exists—use generic 404/403 per product policy.
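One way to keep the cursor opaque while still carrying (score, post_id) underneath: serialize and base64 it. The token layout here is an illustration, not a spec—real systems may encrypt or sign the token so clients cannot tamper with it:

```python
# Illustrative opaque cursor: (score, post_id) serialized as JSON and
# base64url-encoded. A production token would typically also be signed.

import base64
import json

def encode_cursor(score: float, post_id: str) -> str:
    raw = json.dumps({"s": score, "p": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token: str) -> tuple[float, str]:
    obj = json.loads(base64.urlsafe_b64decode(token.encode()))
    return obj["s"], obj["p"]

token = encode_cursor(0.87, "post_123")
assert decode_cursor(token) == (0.87, "post_123")
```

The opaque form buys you freedom: you can change what the continuation carries (rank snapshot id, shard hints) without breaking clients.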
Create post — POST /v1/posts
- Headers: Idempotency-Key for safe retries (same key → same post_id, no duplicate stories).
- Body: text, link URL, media_ids[] already uploaded via separate upload URLs (multipart to object storage is common—do not put 4K video bytes in JSON).
- Success: 201 + Location: /v1/posts/{id}; body includes created_at and processing state (media_ready: pending/ready).
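A sketch of the server side of that Idempotency-Key contract—the dict stands in for a keyed store with a TTL, and create_post is a hypothetical handler name:

```python
# Server-side Idempotency-Key handling for post creation: the same key
# returns the same post_id instead of creating a duplicate story.
# The dict stands in for a keyed store with a TTL.

import uuid

_idempotency_store: dict[str, str] = {}

def create_post(idempotency_key: str, body: dict) -> str:
    if idempotency_key in _idempotency_store:
        return _idempotency_store[idempotency_key]   # retry: same post_id back
    post_id = str(uuid.uuid4())
    # ... durably persist (post_id, body) here, before recording the key ...
    _idempotency_store[idempotency_key] = post_id
    return post_id

first = create_post("key-abc", {"text": "hi"})
assert create_post("key-abc", {"text": "hi"}) == first  # retry, no duplicate
```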
Follow graph
POST / DELETE on /v1/users/{id}/follow return 204 or 200 with updated counts. Graph updates are eventually visible in feed—say so if asked.
Reactions / comments (if in scope)
Prefer nested routes under the post for clarity and cache keys: POST /v1/posts/{id}/reactions with { "type": "like" }. Comments page the same way as the feed: cursor + limit, same duplicate/skip story when new comments arrive during scroll.
Cross-cutting behavior to name in the interview
- Rate limits: Per-user and per-IP on create and follow; return X-RateLimit-Remaining if you want extra credit.
- Pagination contract: Document whether next_cursor is opaque (ranker-issued) or structured; either is fine if you explain moving ranks and dupes.
- Versioning & mobile: Older app versions may send Accept-Feature or a client_version query—only if you have time; otherwise acknowledge backward-compatible field additions.
In the room: Spend one minute on idempotency for POST /v1/posts, cursor semantics for GET /v1/me/home, and why reactions are not on the critical path for feed p95—that is enough to sound like you shipped APIs, not that you memorized a table.
Production angles
These are the kinds of stories that show up in postmortems and on-call—not because the whiteboard diagram was wrong, but because queues, timeouts, and caches behave badly under real skew and real deploys. If you are junior, read them as “what can go wrong and what we measure.” If you are senior, read them as “how I’d steer the room when someone asks what breaks in production.”
Fan-out backlog while posts “succeed”
What it looks like — Traffic spikes (viral moment, live event, celebrity post). Post creation stays healthy: the API writes the row, returns 200/201, media lands in object storage. Feeds feel stale: users refresh and see yesterday’s ordering while the new post is “somewhere in the pipeline.”
Why it happens — Fan-out is asynchronous. Workers enqueue (post_id, author_id) and fan into follower mailboxes at a finite rate. When enqueue depth or per-shard backlog grows, latency to visibility grows—even though the write path succeeded.
What good teams do — Alert on queue age, consumer lag, and per-shard backlog (not just “CPU is fine”). Cap fan-out work per tick; shed load by leaning harder on read-side merge for the tail of the graph; consider priority lanes for very fresh windows vs full historical fan-out. Juniors sometimes stop at “we have a queue”; seniors name which metric goes red first and what product does while the queue drains.
Ranker deploys and thread exhaustion
What it looks like — A deploy touches the ranking service (new model, bad config, regression in feature extraction). p95/p99 for rank jumps. If the feed service blocks on ranking with a generous or missing timeout, worker threads or connection pools at the edge fill up. Unrelated routes can degrade because the fleet is wedged waiting on one dependency.
Why it hurts — Feed read is often one graph hop away from “everything feels broken”: the ranker is not a side quest; it sits on the hot path unless you explicitly isolate it.
What good teams do — Hard timeouts on rank; circuit breakers; fallback ordering (recency-only, last-known-good slice, cached scores). Kill switches and feature flags to bypass heavy models. Separate SLOs: “feed loads” vs “feed is perfectly ranked.” Seniors describe blast radius; juniors learn to never let one service hold the whole edge without an escape hatch.
Partial fan-out, multi-region, and “I see it, you don’t”
What it looks like — A celebrity post fans out to some shards or regions first; others catch up via replay, reconciliation, or pull-merge. Two coworkers in different regions argue whether the post appeared—“it’s there for me, not for you”—for minutes.
Why it happens — You almost never have one synchronous global fan-out to every mailbox. Idempotent consumers and at-least-once delivery mean duplicates are handled; ordering and visibility across regions are eventually aligned.
What good teams do — Idempotency keys on fan-out writes; dedupe in mailbox stores; observability on cross-region replication lag. Product-facing honesty: eventual visibility within N minutes under load beats pretending strong global consistency. Seniors talk SLAs for visibility and blast radius; juniors should at least not claim instant worldwide materialization.
Cache stampede on a hot post ID
What it looks like — One post goes viral. Every feed request multi-gets the same post_id. Cache entry for that key expires; thousands of requests miss together and hit the database or origin at once—thundering herd.
Why it happens — TTL expiry without jitter lines up misses. No request coalescing (singleflight) means N identical fetches become N backend hits.
What good teams do — TTL jitter; singleflight / request coalescing per key; stale-while-revalidate for read-heavy keys; sometimes local in-process caching for the hottest IDs. Seniors mention hot key observability; juniors learn that “we added Redis” does not automatically fix coordination at the moment of expiry.
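TTL jitter and singleflight, sketched together. Error handling is elided for brevity—a production version must propagate a failed fetch to the waiters instead of leaving them to find a missing result:

```python
# Two stampede mitigations sketched together: TTL jitter so hot keys do
# not expire in lockstep, and per-key singleflight so concurrent misses
# collapse into one backend fetch. Error propagation is elided.

import random
import threading

def jittered_ttl(base_seconds: int, jitter_frac: float = 0.1) -> float:
    return base_seconds * (1 + random.uniform(-jitter_frac, jitter_frac))

class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight: dict[str, threading.Event] = {}
        self._results: dict[str, object] = {}

    def do(self, key: str, fetch):
        with self._lock:
            ev = self._inflight.get(key)
            if ev is None:                      # we are the leader for this key
                ev = threading.Event()
                self._inflight[key] = ev
                leader = True
            else:                               # someone else is already fetching
                leader = False
        if leader:
            try:
                self._results[key] = fetch()
            finally:
                ev.set()
                with self._lock:
                    del self._inflight[key]
            return self._results[key]
        ev.wait()                               # follower: wait for the leader
        return self._results[key]
```

The observable effect: N simultaneous misses on one hot post ID become one origin fetch plus N-1 cheap waits, instead of N backend hits at the moment of expiry.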
How to use this in an interview — You do not need to recite four novellas. You do need one concrete story: pick queue lag or ranker timeout or stampede, name one metric and one mitigation, and show you understand why the user still sees a 200 on post while read quality degrades. That is the difference between a diagram and a system someone has operated.
What interviewers expect
- Scope first: post types you care about, follow model, ranking vs chronological, pagination, and what “fresh” means in milliseconds vs seconds.
- Write path: create post → durable storage → async fan-out or enqueue jobs; acknowledge success without waiting for every follower’s mailbox.
- Read path: merge precomputed candidate IDs with recent tail, optional on-read merge for low-follow users, then rank and trim.
- Data stores with a reason: hot social graph, post metadata, feed shards or key-value mailboxes, object storage for media, cache layers with eviction discipline.
- The celebrity problem as a concrete policy (e.g. don’t fan-out 100M rows; store by author, merge on read, cap precomputes).
- Failure and degradation: ranking down → fall back to recency; fan-out lag → still show something; partial graph cache miss → bounded extra lookups.
- Ops-shaped details: idempotent workers, backpressure, observability on fan-out depth and rank latency—not just boxes on a whiteboard.
Interview workflow (template)
- Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
- Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
- APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
- Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
- Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
- Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).
Frequently asked follow-ups
- Fan-out on write vs pull on read—when do you use each, and what’s the hybrid?
- How do you handle a user or page with 50M+ followers?
- Where does ranking run, and how stale can scores be relative to the raw feed?
- How do you paginate when the ranked order can change between requests?
- What breaks if your fan-out queue falls behind by five or thirty minutes?
Deep-dive questions and strong answer outlines
Walk through what happens when a normal user with 400 friends publishes a new post.
Persist the post (metadata + media pointers) durably, return success to the client. Enqueue fan-out or write to followers’ feed shards asynchronously. Typical users get push-based materialization; mention idempotency keyed on (post_id, follower_shard) so retries don’t duplicate. The latency target is for the read path, not for every follower to see it in the same second.
How is that different for a celebrity account?
Avoid O(followers) writes. Options: skip fan-out and merge their posts on read into everyone’s candidate set; partial fan-out to active users only; cap stored fan-out and rely on hybrid retrieval. Say explicitly that storage, queue depth, and blast radius drive the choice—not “we use Kafka.”
How do you build the home feed response on a read request?
Fetch a chunk of precomputed post IDs from the user’s feed store (sharded by user_id), merge with a recent tail from followed authors if you use hybrid pull, fetch post bodies and author metadata in batch (often multi-get by id). Pass candidates to ranker with features (recency, engagement, affinity); trim to page size. Mention bounding work: max candidates, timeouts, cache.
Where does “Top stories” vs “Most recent” change the design?
Recency mode can lean on time-ordered materialization with lighter ranking. Top stories needs a scoring pipeline (batch + online), possibly different candidate generation (e.g. more sources, diversity constraints). Call out that switching modes is product config on top of the same storage, not necessarily two separate backends.
How do you paginate without duplicates when ranking shifts?
Cursor tied to (score, post_id) or opaque continuation tokens from the ranker; acknowledge that strict stability is hard—prefer best-effort with rare dupes skipped client-side, or snapshot per session if they push you there. Weak answers pretend offset/limit “works fine” at scale.
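The "rare dupes skipped client-side" part is only a few lines; a sketch of a client (or edge) deduping across ranked pages when order shifts between requests:

```python
# Best-effort dedupe across ranked pages: because order can shift between
# requests, the same post_id may appear on two consecutive pages. Tracking
# seen ids turns a duplicate into a skip instead of a visible repeat.

def dedupe_stream(pages):
    seen = set()
    for page in pages:
        for post in page:
            if post["id"] in seen:
                continue
            seen.add(post["id"])
            yield post

pages = [[{"id": "a"}, {"id": "b"}], [{"id": "b"}, {"id": "c"}]]
assert [p["id"] for p in dedupe_stream(pages)] == ["a", "b", "c"]
```

The honest framing for the interviewer: this hides duplicates, but skips (a post that moved up past your cursor) are simply missed—which is why per-session snapshots exist when the product demands stability.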
What data stores would you actually name and what goes where?
Graph edges in a service-optimized store (often sharded KV or wide-column) with heavy caching. Post bodies and immutable media metadata in object store + CDN; feed mailboxes in KV or wide-column partitioned by user_id. Strong candidates mention hot keys, replication, and read patterns (batch get by id) instead of “we use SQL.”
What happens when ranking is slow or down?
Timeout with fallback ordering (pure recency or last good ranked slice), circuit-break, and feature flags. Don’t block the whole feed on ML; degrade visibly but keep scrolling possible. Bonus: SLOs on p95 read path separate from ranker freshness.
How do unfollows and blocks show up in the feed?
Remove or filter edges in graph service; stale precomputed entries age out via TTL, lazy filtering at read time, or compensating deletes in feed shards. Admit brief inconsistency vs cost; blocks often need hard filtering at read for safety.
AI feedback on your design
After a practice session, InterviewCrafted summarizes strengths, gaps, and interviewer-style expectations—similar to a written debrief. See a static example report, then practice this problem to get feedback on your own answer.
FAQs
Q: Do I have to design ranking and ML end to end?
A: No. Strong answers define interfaces: candidate IDs in, scored ordering out, freshness SLAs, and safety hooks. Interviewers care that you know ranking is compute-heavy and eventually consistent—not that you derive a loss function on the whiteboard.
Q: Is fan-out on write always wrong because of celebrities?
A: It’s wrong as a universal rule. In production you almost always mix push for the long tail and pull or hybrid for the head. Saying only “we fan out” without the exception path is what gets you follow-up pain.
Q: Should the feed be strongly consistent with the graph?
A: Not at the expense of availability and latency. People tolerate tiny lag (unfollow → post still visible once) if abuse paths are covered. Say what you optimize: safety blocks vs eventual graph cleanup.
Q: How deep should I go on media and link previews?
A: Acknowledge async pipelines: upload to blob storage, CDN, crawler for OG tags—off the critical read path. If you spend ten minutes on Open Graph, you’re usually avoiding the hard feed questions.
Q: What’s the difference between this and “design Twitter feed”?
A: Same skeleton (graph, fan-out, timeline), different emphasis—Twitter often stresses public firehose and real-time; Facebook-style prompts often push ranking, hybrid fan-out, and multi-type content. Don’t cargo-cult one diagram onto the other.
Q: How much math do I need?
A: Order-of-magnitude QPS, storage for mailboxes, and fan-out cost per post are enough if assumptions are stated. Interviewers use numbers to see if you notice hot keys and queue depth—not to check mental arithmetic.
Practice interactively
Open the practice session to use the canvas and stages, then review AI feedback.