System design interview guide
Design Facebook Post Search
Users search posts they can see—by keywords, author, time—with latency and freshness expectations that don’t forgive naive “grep the database.” The hard part is inverted index scale plus authorization: two posts might match text, but only one is visible to the searcher. Weak answers index everything and filter later without counting cost.
Problem statement
You’re designing search over social posts: keyword retrieval, filters (author, time, type), pagination, and strict respect for privacy and blocks—at billions of indexed posts and high QPS.
Constraints. Functional: query API, filters, visibility rules. Non-functional: p95 latency budget (hundreds of ms), high read QPS, near-real-time index updates (state your freshness). Scale: billions of posts; long-tail queries.
Center: inverted index + distributed query + ACL-aware filtering + ranking budget.
Introduction
Social search is information retrieval plus ACLs. The interview tests whether you know posting lists are not enough—you must prune candidates using who can see what without exploding latency.
Weak answers index full text and filter millions of hits in application code. Strong answers constrain retrieval with visibility metadata in the index, precomputed audience keys, or two-phase retrieval with a bounded candidate set.
How to approach
Clarify public vs network scope. Sketch query → shard fan-out → merge → rank → filter (or filter-before-rank when cheap). Then cover indexing from post events—asynchronous, near-real-time (NRT).
Interview tips
- Two-phase: cheap retrieval (posting list intersection), then visibility filter—or index visibility facets so filter is cheap.
- Skew: viral posts and hot terms—cache top queries; rate limit expensive patterns.
- Deletes and privacy changes: tombstone in index; eventual visibility—state SLA.
- Pagination: same cursor story as feeds—when scores change between pages, be honest about duplicates and skips.
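The two-phase tip above can be sketched in a few lines; the data structures and the candidate cap here are illustrative assumptions, not a production layout:

```python
# Sketch of two-phase retrieval: cheap posting-list intersection first,
# then a visibility filter over a bounded candidate set.
def intersect(postings):
    """Intersect posting lists (doc id lists, one per query term)."""
    result = set(postings[0])
    for plist in postings[1:]:
        result &= set(plist)
    return result

def two_phase_search(index, terms, can_view, max_candidates=1000):
    # Phase 1: retrieval. Phase 2: never ACL-check more than max_candidates.
    candidates = sorted(intersect([index[t] for t in terms]))
    return [d for d in candidates[:max_candidates] if can_view(d)]

index = {"cat": [1, 2, 3, 5], "meme": [2, 3, 4]}   # term -> sorted doc ids
visible = {2}                                       # docs this viewer may see
print(two_phase_search(index, ["cat", "meme"], lambda d: d in visible))  # [2]
```

The cap is the point: it bounds phase-2 ACL work no matter how large the posting-list intersection gets.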
Capacity estimation
| Axis | Anchor |
|---|---|
| Corpus | Billions of docs—shard by doc_id or term |
| QPS | High—aggregator bottleneck; cache |
| Freshness | NRT often seconds–minutes—product SLA |
| Long tail | Most queries rare—caches help less |
Implications: every query fans out across shards—bound per-shard work and put ranking on a timeout.
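A quick back-of-envelope makes these anchors concrete; every input below is an assumed placeholder to swap for the interviewer's numbers:

```python
# Hedged back-of-envelope; all inputs are assumptions, not real figures.
posts = 2e9                # indexed posts
terms_per_post = 50        # unique terms per post after tokenization
bytes_per_posting = 8      # compressed doc id + positions per entry

index_bytes = posts * terms_per_post * bytes_per_posting
print(f"posting data ≈ {index_bytes / 1e12:.1f} TB")

shard_capacity = 100e9     # 100 GB of postings per shard (assumed)
shards = index_bytes / shard_capacity
print(f"shards ≈ {shards:.0f}")

# Doc-sharded index: every query hits every shard, so per-shard read QPS
# equals aggregate QPS—this is why fan-out and the aggregator dominate.
query_qps = 100_000
per_shard_qps = query_qps
```

Note the asymmetry: storage comes out modest, while read fan-out scales with shard count—doc-sharding trades cheap writes for O(shards) reads.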
High-level architecture
Post service emits events on create/update/delete. Indexer tokenizes, updates inverted indices (often Lucene-class or managed OpenSearch). Query service parses query, routes to index shards, merges top-k per shard, global merge + rank + ACL filter + return.
Who owns what:
- Index clusters — Sharding, replication, segment merges.
- Aggregator — Scatter-gather; timeout; partial results policy.
- Graph / ACL — May serve friend lists or block sets for filter phase.
[ Post write ] --> event bus --> Indexer --> Index shards (inverted lists)
[ Search request ]
--> Query svc --> parse + plan
--> shard1, shard2, ... shardN (parallel top-k)
--> merge candidates
--> ACL filter (viewer context)
--> rank + truncate --> response
In the room: Narrate ACL before or after rank—cost tradeoff.
Core design approaches
Shard by doc vs term
Doc-based shards: write local; query fans out to all shards (or pruned by routing).
Term-based shards: query hits few shards; writes touch many terms—expensive updates.
Visibility
Index a visibility enum (public | friends | custom list id) and filter at query time with viewer context.
Precompute per-viewer bitmaps only for small groups—does not scale to all pairs.
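A toy version of query-time visibility filtering with viewer context; the visibility model (public / friends / custom audience) and field names are simplified assumptions:

```python
# Each indexed doc carries a visibility facet; the filter runs per viewer.
DOCS = {
    101: {"visibility": "public", "author": "a"},
    102: {"visibility": "friends", "author": "a"},
    103: {"visibility": "custom", "author": "a", "audience": {"v2"}},
}

def visible_to(doc, viewer, friends_of):
    v = doc["visibility"]
    if v == "public":
        return True
    if v == "friends":
        return viewer in friends_of(doc["author"])
    return viewer in doc.get("audience", set())   # custom audience list

friends = {"a": {"v1"}}                           # author -> friend set
friends_of = lambda author: friends.get(author, set())
print([d for d in DOCS if visible_to(DOCS[d], "v1", friends_of)])  # [101, 102]
```

The friends lookup is where the graph-service cost hides—batch it, cache it with short TTL, and state the staleness.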
Detailed design
Indexing path
- Event PostCreated with text, author, timestamp, visibility, language.
- Tokenizer → term ids; append doc to posting lists; store forward index for snippets.
- Delete events tombstone doc id; merge segments async.
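The indexing path above, as a minimal in-memory sketch. Real systems write immutable segments and merge them asynchronously; this collapses all of that into dicts for illustration:

```python
# Minimal indexer sketch: tokenize, append to posting lists, tombstone deletes.
from collections import defaultdict

postings = defaultdict(list)   # term -> doc ids
forward = {}                   # doc id -> metadata for snippets and filters
tombstones = set()             # deleted doc ids, filtered at query time

def on_post_created(doc_id, text, author, visibility):
    forward[doc_id] = {"author": author, "visibility": visibility, "text": text}
    for term in set(text.lower().split()):   # toy tokenizer
        postings[term].append(doc_id)

def on_post_deleted(doc_id):
    tombstones.add(doc_id)     # purged for real during segment merges

on_post_created(1, "sunset photo", "alice", "public")
on_post_created(2, "beach sunset", "bob", "friends")
on_post_deleted(2)
print([d for d in postings["sunset"] if d not in tombstones])  # [1]
```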
Query path
- Parse query; optional spellcheck.
- Fetch posting lists; intersect; score BM25 + recency + social boost.
- Take top N candidates; filter ACL with graph service (batch).
- Re-rank if needed; return cursor.
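The query path, end to end as a sketch. The scoring function is a stand-in (constant relevance term plus recency decay), not real BM25, and the batched ACL call is mocked:

```python
# Toy query path: score candidates, take top N, then one batched ACL call.
import math, time

def score(doc, now, idf_sum=1.0, half_life=86400.0):
    # Placeholder for BM25 + recency + social boost.
    return idf_sum + math.exp(-(now - doc["ts"]) / half_life)

def run_query(candidates, docs, acl_batch, now, top_n=2):
    ranked = sorted(candidates, key=lambda d: score(docs[d], now), reverse=True)
    top = ranked[:top_n]
    allowed = acl_batch(top)        # single batched call to the graph service
    return [d for d in top if d in allowed]

now = time.time()
docs = {1: {"ts": now - 10}, 2: {"ts": now - 100_000}, 3: {"ts": now - 5}}
# ACL mock: viewer cannot see doc 3.
print(run_query([1, 2, 3], docs, lambda ids: set(ids) - {3}, now))  # [1]
```

Filtering after truncation is cheap but can under-fill the page; over-fetch top N by some factor if the product needs full pages.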
Key challenges
- Fan-out latency: many shards in parallel—timeout per shard; degrade gracefully.
- ACL cost: batch graph checks; cache friend lists with short TTL.
- Freshness vs write load: smaller segments = faster NRT, slower merges.
- Spam: downrank or drop at index; rate limit queries by user + IP.
Scaling the system
- Horizontal index nodes; replicas for read QPS.
- Cache popular queries (careful with personalization).
- Regional indices if latency or compliance requires—federated query.
Failure handling
| Scenario | Mitigation |
|---|---|
| Shard slow | Timeout; omit shard (best-effort) or retry once |
| Graph slow | Timeout ACL; return only public (degraded) if product allows |
| Indexer lag | Alert on lag; tier hot authors |
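The "shard slow" row can be sketched as a per-shard deadline with best-effort merge. A thread pool stands in for RPC fan-out; delays and the timeout are illustrative, and deadline accounting is simplified (each wait gets its own timeout rather than one shared budget):

```python
# Best-effort scatter-gather: omit slow shards, flag the response as partial.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import heapq, time

def search_shard(shard_id, delay_s, hits):
    time.sleep(delay_s)          # simulated shard latency
    return hits                  # [(score, doc_id), ...]

def scatter_gather(shards, timeout_s=0.3, k=3):
    merged, partial = [], False
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(search_shard, *s) for s in shards]
        for fut in futures:
            try:
                merged.extend(fut.result(timeout=timeout_s))
            except TimeoutError:
                partial = True   # omit the slow shard (best-effort)
    return heapq.nlargest(k, merged), partial

shards = [("s1", 0.0, [(0.9, "a")]), ("s2", 1.0, [(0.8, "b")])]
print(scatter_gather(shards))    # s2 misses the deadline: ([(0.9, 'a')], True)
```

Surfacing the `partial` flag matters: callers (and dashboards) should know a result set was shed, not silently smaller.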
API design
| Endpoint | Role |
|---|---|
| GET /v1/search | Keyword search |
GET /v1/search parameters:
| Param | Role |
|---|---|
| q | Query string |
| cursor | Pagination cursor |
| limit | Page size (server-capped) |
| author_id | Filter by author |
| since, until | Time window |
| type | Post type filter |
Diagram:
Client --> API GW --> Search svc --> index shards --> merge
|
ACL svc (batch)
Errors: 429 query flood; 400 malformed query.
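A minimal sketch of the parameter handling behind those errors; the cap, length limit, and defaults are assumptions, not documented API behavior:

```python
# Server-side validation: enforce the limit cap, reject malformed queries.
MAX_LIMIT = 50    # server-side cap on page size (assumed)

def validate(params):
    q = params.get("q", "").strip()
    if not q or len(q) > 256:
        return 400, None                       # malformed or empty query
    try:
        limit = min(int(params.get("limit", 20)), MAX_LIMIT)
    except ValueError:
        return 400, None                       # non-numeric limit
    if limit < 1:
        return 400, None
    return 200, {"q": q, "limit": limit, "cursor": params.get("cursor")}

print(validate({"q": "sunset", "limit": "500"}))   # limit capped to 50
print(validate({"q": "   "}))                      # (400, None)
```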
Production angles
Search behind a social graph is not “Elasticsearch plus ACL.” It is posting lists the size of cities, ACL fan-out that can dwarf retrieval, and near-real-time indexing that is never quite real-time enough for safety or legal. The incidents that wake you up are tail latency on head terms, stale harmful content, and aggregator pools that go rigid under celebrity spikes.
p99 explodes on “cheap” queries and common terms
What it looks like — Latency SLOs breach on queries like a celebrity surname or a viral meme string—not because the index is down, but because candidate generation pulls enormous posting list unions before ACL trims them. Median stays pretty; p99/p999 tells the story. Cache hit rates for top queries mask the problem until a new trend breaks the cache.
Why it happens — Inverted indexes are efficient per term, but high-frequency terms are selective in theory and massive in practice. If you intersect late or pull too many doc IDs before graph filter, you do megabytes of work per request. Stopwords alone do not save you—hashtags, mentions, and numeric spikes behave like stopwords seasonally.
What good teams do — Two-phase retrieval (cheap conjunctive core, then expand); hard caps on candidate pool with safe degradation (“refine query”); blocklists and query rewriting for abusive patterns; WAND-style max-score pruning where the stack supports it. Per-shard latency histograms, not just global—one hot shard from uneven doc distribution ruins p99.
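The max-score idea reduces to: compute a per-doc score upper bound from precomputed per-term maximum weights, and skip any doc that cannot beat the current top-k floor. A toy version (weights and threshold are illustrative; real WAND works over compressed postings with pivoting):

```python
# Upper-bound pruning sketch (the core idea behind WAND / max-score).
def prune_candidates(doc_terms, term_max, threshold):
    """Keep docs whose best-possible score—the sum of per-term maximum
    weights over the terms the doc matches—can still beat the threshold."""
    return [doc for doc, terms in doc_terms.items()
            if sum(term_max[t] for t in terms) > threshold]

term_max = {"celebrity": 2.0, "the": 0.1}   # precomputed per-term max weights
doc_terms = {1: ["celebrity", "the"], 2: ["the"], 3: ["celebrity"]}
# With a current top-k floor of 1.0, doc 2 can never qualify—never score it.
print(prune_candidates(doc_terms, term_max, threshold=1.0))  # [1, 3]
```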
Indexed doc says “live” but the post was deleted or moderated
What it looks like — A harmful post disappears from feed but still appears in search snippets for minutes. Legal escalates; trust teams ask for synchronous removal. NRT pipeline dashboards are green—lag is “only” 60 seconds—which is an eternity for abuse.
Why it happens — NRT is eventually consistent by construction. Delete and visibility updates traverse a different path than initial index; ACL changes may not invalidate every denormalized facet. Compliance often needs stronger guarantees than product search.
What good teams do — Tiered SLAs: sub-minute for safety and legal verticals; sync tombstone path to block doc IDs at query time even if index lags; negative cache of deleted IDs in a fast store. Seniors argue who owns truth (moderation service vs index); juniors should never promise zero staleness at web scale without qualifiers.
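The sync tombstone path can be as small as a TTL'd negative cache of deleted doc ids consulted on every query; the store choice and TTL here are assumptions (production would use a shared replicated fast store, with the TTL sized to comfortably exceed worst-case index lag):

```python
# Negative cache of deleted doc ids, checked at query time so removals
# take effect before the NRT index catches up.
import time

class DeletedIdCache:
    def __init__(self, ttl_s=3600):
        self.ttl_s = ttl_s
        self._entries = {}          # doc_id -> deletion timestamp

    def mark_deleted(self, doc_id, now=None):
        self._entries[doc_id] = now if now is not None else time.time()

    def is_deleted(self, doc_id, now=None):
        now = now if now is not None else time.time()
        ts = self._entries.get(doc_id)
        return ts is not None and now - ts < self.ttl_s

cache = DeletedIdCache()
cache.mark_deleted(42)              # sync path, fired the moment moderation acts
print([d for d in [41, 42, 43] if not cache.is_deleted(d)])  # [41, 43]
```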
Merge-tier overload and “everything is slow” during news spikes
What it looks like — 503 or timeouts from the aggregator even though leaf shards are individually fine. Thread pools for merge or score are saturated; circuit breakers start partial results. Empty or truncated result sets spike—users think search is “censoring” when it is shedding.
Why it happens — Fan-out to many shards is O(shards) coordination; ACL batch calls add RTT and head-of-line blocking. A thundering herd of identical queries after a push notification amplifies coordinated overload.
What good teams do — Request budgets per query phase; early termination when score upper bounds cannot beat current top-k; dedupe identical queries in the aggregator (request coalescing); isolate expensive personalized search from simple keyword browse. Measure shard latency histogram, index lag, ACL batch size and p95, and empty result rate (which can mean over-filtering or broken ACL batch).
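Request coalescing can be sketched with a shared in-flight table: the first request for a key becomes the leader and makes the one backend call; identical concurrent requests wait on its result. This is a sketch under simplifying assumptions (error handling, per-entry TTLs, and counter thread-safety omitted):

```python
# Deduplicate identical in-flight queries in the aggregator.
import threading, time

class Coalescer:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}          # key -> (done event, result holder)
        self.backend_calls = 0       # demo-only observability

    def fetch(self, key, compute):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        done, holder = entry
        if leader:
            self.backend_calls += 1
            holder["value"] = compute()      # the one real backend call
            with self._lock:
                del self._inflight[key]
            done.set()
        else:
            done.wait()                      # piggyback on the leader's call
        return holder["value"]

c = Coalescer()
gate = threading.Event()
results = []
t1 = threading.Thread(
    target=lambda: results.append(c.fetch("q", lambda: (gate.wait(), "hits")[1])))
t1.start()
while "q" not in c._inflight:                # wait for the leader to register
    time.sleep(0.001)
t2 = threading.Thread(
    target=lambda: results.append(c.fetch("q", lambda: "hits")))
t2.start()
time.sleep(0.05)                             # let the follower attach
gate.set()
t1.join(); t2.join()
print(sorted(results), c.backend_calls)      # typically 1 backend call for 2 requests
```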
[ Query spike ] --> aggregator thread pool full --> 503 or shed
How to use this in an interview — Lead with posting list size + ACL, not inverted index 101. Say freshness SLA in seconds for moderation, and name one mitigation (tombstone layer, stricter lane, or sync delete path). That is how you sound like you have seen search incidents, not just Lucene docs.
Bottlenecks and tradeoffs
- Exact ACL vs latency: Strong filtering may require more round trips.
- Personalization vs cache: Generic top queries cache well; the tail does not.
What interviewers expect
- Retrieval: inverted index, posting lists, optional skip lists; BM25-class scoring at high level.
- Sharding: by doc id vs term—query fan-out vs write distribution tradeoffs.
- ACL: filter candidates using precomputed audience sets, bitsets, or query-time graph checks—state cost.
- Pipeline: async indexer from post events; tombstones for delete; NRT freshness budget.
- Ranking: text relevance + recency + social signals—latency budget.
- Abuse: rate limit queries; spam signals on index ingest.
Interview workflow (template)
- Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
- Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
- APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
- Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
- Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
- Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).
Frequently asked follow-ups
- How is social search different from Google web search?
- How do you enforce privacy in search results?
- How do you shard an inverted index?
- How fresh is the index?
- How do you handle spam posts in the index?
Deep-dive questions and strong answer outlines
Outline the indexing pipeline for a new public post.
Event from post service → tokenizer → updates to posting lists for terms; store doc metadata (author, time, privacy flags). NRT updates with segment merges; delete tombstones for removed posts.
How do you ensure a friends-only post doesn’t appear to strangers?
Index visibility facets (e.g. audience id) or filter at query time using viewer’s friend lists / ACL bitmaps. Admit approximate precomputes vs exact graph checks tradeoff.
What breaks if the index lags five minutes?
Users see stale results—product SLA picks freshness vs cost. Mitigate with tiered indexing for hot authors.
Production angles
- Hot term query fans out to many shards—merge bottleneck; cache popular queries with short TTL.
- Author blocks someone—delete from viewer-specific caches or filter dynamically; consistency vs cost.
AI feedback on your design
After a practice session, InterviewCrafted summarizes strengths, gaps, and interviewer-style expectations—similar to a written debrief. See a static example report, then practice this problem to get feedback on your own answer.
FAQs
Q: Do I need to know Lucene internals?
A: Helpful but not required. Know posting lists, segments, merges, and why writes are costly at scale.
Q: Is Elasticsearch enough?
A: Often a starting point; hyperscale usually custom stacks or heavily tuned ES with sharding discipline. Say when you’d outgrow defaults.
Q: What about autocomplete?
A: Separate prefix structures (tries, FST) and rate limits; optional scope—don’t eat the whole round.
Q: Do you index deleted posts?
A: Tombstone or delete events remove doc ids from posting lists; eventual visibility of deletion—state SLA. Hard deletes for legal may need async sweep.
Practice interactively
Open the practice session to use the canvas and stages, then review AI feedback.