
System design interview guide

Design Facebook Post Search

TL;DR:

Users search posts they can see—by keywords, author, time—with latency and freshness expectations that don’t forgive naive “grep the database.” The hard part is inverted index scale plus authorization: two posts might match text, but only one is visible to the searcher. Weak answers index everything and filter later without counting cost.

Problem statement

You’re designing search over social posts: keyword retrieval, filters (author, time, type), pagination, and strict respect for privacy and blocks—at billions of indexed posts and high QPS.

Constraints. Functional: query API, filters, visibility rules. Non-functional: p95 latency budget (hundreds of ms), high read QPS, near-real-time index updates (state your freshness). Scale: billions of posts; long-tail queries.

Center: inverted index + distributed query + ACL-aware filtering + ranking budget.

Introduction

Social search is information retrieval plus ACLs. The interview tests whether you know posting lists are not enough—you must prune candidates using who can see what without exploding latency.

Weak answers index full text and filter millions of hits in application code. Strong answers constrain retrieval with visibility metadata in the index, precomputed audience keys, or two-phase retrieval with a bounded candidate set.

How to approach

Clarify public vs network scope. Sketch query → shard fan-out → merge → rank → filter (or filter-before-rank when cheap). Then cover indexing from post events: async, NRT.

Interview tips

  • Two-phase: cheap retrieval (posting list intersection), then visibility filter—or index visibility facets so filter is cheap.
  • Skew: viral posts and hot terms—cache top queries; rate limit expensive patterns.
  • Deletes and privacy changes: tombstone in index; eventual visibility—state SLA.
  • Pagination: same cursor story as feeds when scores change—be honest about duplicates and skips.
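
The two-phase pattern in the first tip can be sketched as below. `index.postings` and `acl.visible_batch` are assumed interfaces, not a real API; this is a sketch of the shape, not a production retrieval stack:

```python
def search(query_terms, viewer, index, acl, k=20, candidate_cap=2000):
    """Two-phase retrieval: cheap posting-list intersection, then visibility filter.

    `index.postings(term)` and `acl.visible_batch(viewer, doc_ids)` are
    hypothetical interfaces assumed for illustration.
    """
    # Phase 1: intersect posting lists, starting from the rarest term
    # so intermediate sets stay small.
    lists = sorted((index.postings(t) for t in query_terms), key=len)
    candidates = set(lists[0])
    for plist in lists[1:]:
        candidates &= set(plist)
        if not candidates:
            return []
    # Bound work before the (expensive) ACL phase.
    candidates = list(candidates)[:candidate_cap]
    # Phase 2: one batched visibility check instead of N point lookups.
    visible = acl.visible_batch(viewer, candidates)
    return [d for d in candidates if d in visible][:k]
```

The `candidate_cap` is the key interview point: it bounds ACL work at the cost of possibly missing matches deep in the posting lists.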

Capacity estimation

Axis | Anchor
--- | ---
Corpus | Billions of docs—shard by doc_id or term
QPS | High—aggregator bottleneck; cache
Freshness | NRT, often seconds to minutes—product SLA
Long tail | Most queries are rare—caching helps less

Implications: fan out queries across shards with bounded per-shard work; time out ranking when the budget is spent.
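
A quick back-of-envelope for the table above; every input here is an illustrative assumption, not a product number:

```python
# Illustrative capacity math; all inputs are assumptions for the sketch.
docs = 2_000_000_000          # "billions of posts"
bytes_per_doc_index = 300     # inverted + forward index footprint per doc (assumed)
index_bytes = docs * bytes_per_doc_index

shard_capacity = 50 * 10**9   # ~50 GB of index per shard (assumed)
shards = -(-index_bytes // shard_capacity)  # ceiling division

qps = 100_000                 # assumed read load
fanout_rps_per_shard = qps    # doc-sharded: every query touches every shard

print(f"index ≈ {index_bytes / 1e12:.1f} TB across {shards} shards")
print(f"each shard sees ~{fanout_rps_per_shard:,} lookups/s before caching")
```

The point of the exercise: with doc-based sharding, query fan-out multiplies, so per-shard QPS equals global QPS, which is why the aggregator and caching dominate the design.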

High-level architecture

Post service emits events on create/update/delete. Indexer tokenizes, updates inverted indices (often Lucene-class or managed OpenSearch). Query service parses query, routes to index shards, merges top-k per shard, global merge + rank + ACL filter + return.

Who owns what:

  • Index clusters: sharding, replication, segment merges.
  • Aggregator: scatter-gather; timeouts; partial-results policy.
  • Graph / ACL: may serve friend lists or block sets for the filter phase.

[ Post write ] --> event bus --> Indexer --> Index shards (inverted lists)

[ Search request ]
  --> Query svc --> parse + plan
       --> shard1, shard2, ... shardN (parallel top-k)
       --> merge candidates
       --> ACL filter (viewer context)
       --> rank + truncate --> response

In the room: narrate whether ACL filtering runs before or after ranking, and the cost tradeoff either way.

Core design approaches

Shard by doc vs term

Doc-based shards: write local; query fans out to all shards (or pruned by routing).

Term-based shards: query hits few shards; writes touch many terms, so updates are expensive.

Visibility

Index a visibility enum (public | friends | custom list id); filter at query time with viewer context.

Precompute per-viewer bitmaps only for small groups; it does not scale to all viewer–post pairs.
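
One way to make the visibility enum cheap at query time is to index audience facets as terms, assuming the viewer's friend and list memberships are available at search time (the `aud:` key scheme below is hypothetical):

```python
# Sketch: encode visibility as index facets so the ACL filter becomes a
# cheap posting-list intersection instead of a per-doc graph call.
def audience_keys(post):
    """Facet terms stored alongside the post's text terms at index time."""
    if post["visibility"] == "public":
        return ["aud:public"]
    if post["visibility"] == "friends":
        return [f"aud:friends:{post['author_id']}"]
    return [f"aud:list:{post['list_id']}"]          # custom audience list

def viewer_keys(viewer_id, friend_of, member_of):
    """Keys this viewer may match; OR'd into the query at search time."""
    keys = ["aud:public"]
    keys += [f"aud:friends:{author}" for author in friend_of]
    keys += [f"aud:list:{lst}" for lst in member_of]
    return keys
```

The tradeoff to state: facet keys make the filter index-local and fast, but every visibility change (unfriend, block, list edit) must now invalidate index facets, which pushes cost onto the write path.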

Detailed design

Indexing path

  1. Event PostCreated with text, author, timestamp, visibility, language.
  2. Tokenizer maps text to term ids; append the doc to posting lists; store a forward index for snippets.
  3. Delete events tombstone the doc id; merge segments asynchronously.
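
The three steps above can be sketched as a toy in-memory indexer; real systems use segmented, compressed posting lists, but the shape is the same:

```python
from collections import defaultdict

# Minimal indexer sketch: tokenize, append to posting lists, keep a
# forward index for snippets, and tombstone deletes. Shapes are assumed.
postings = defaultdict(list)   # term -> [doc_id, ...]
forward = {}                   # doc_id -> stored fields (for snippets)
tombstones = set()

def index_post(doc_id, text, meta):
    for term in text.lower().split():       # toy tokenizer
        postings[term].append(doc_id)
    forward[doc_id] = {"text": text, **meta}

def delete_post(doc_id):
    tombstones.add(doc_id)     # physical removal happens at segment merge

def lookup(term):
    return [d for d in postings[term] if d not in tombstones]
```

Note how `delete_post` mirrors step 3: reads filter tombstones immediately, while the posting lists themselves are cleaned up lazily by merges.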

Query path

  1. Parse query; optional spellcheck.
  2. Fetch posting lists; intersect; score BM25 + recency + social boost.
  3. Take top N candidates; filter ACL with graph service (batch).
  4. Re-rank if needed; return cursor.
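
Step 2's blended score might look like the following sketch; the BM25 form is standard, but the recency decay and weight mix are illustrative assumptions:

```python
import math

# Illustrative scoring: BM25-style text relevance blended with recency
# and a social boost. The 0.7/0.2/0.1 weights and the one-day decay are
# assumptions, not tuned values.
def score(tf, df, n_docs, doc_len, avg_len, age_s, social, k1=1.2, b=0.75):
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    text = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    recency = 1.0 / (1.0 + age_s / 86_400)   # ~half weight after one day
    return text * (0.7 + 0.2 * recency + 0.1 * social)
```

The interview point: recency and social signals are multipliers on a text score, so the latency budget for step 2 stays dominated by posting-list work, not feature computation.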

Key challenges

  • Fan-out latency: query many shards in parallel; set a per-shard timeout; degrade gracefully.
  • ACL cost: batch graph checks; cache friend lists with short TTL.
  • Freshness vs write load: smaller segments = faster NRT, slower merges.
  • Spam: downrank or drop at index; rate limit queries by user + IP.
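
A minimal fan-out helper showing the per-shard timeout and graceful degradation from the first bullet; `query_shard` is a stand-in for the real shard RPC:

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Sketch: scatter to all shards in parallel, gather whatever finishes
# within the budget, and mark the response partial if any shard was slow.
def fan_out(query_shard, shards, query, budget_s=0.15):
    pool = ThreadPoolExecutor(max_workers=len(shards))
    try:
        futs = [pool.submit(query_shard, s, query) for s in shards]
        done, not_done = wait(futs, timeout=budget_s)
        hits = [h for f in done if f.exception() is None for h in f.result()]
        return {"hits": hits, "partial": bool(not_done)}
    finally:
        # Don't block on stragglers; abandon queued work past the budget.
        pool.shutdown(wait=False, cancel_futures=True)
```

The `partial` flag is the design decision worth narrating: returning best-effort results from N-1 shards usually beats a 503 when one shard is slow.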

Scaling the system

  • Horizontal index nodes; replicas for read QPS.
  • Cache popular queries (careful with personalization).
  • Regional indices if latency or compliance requires; federate queries across regions.

Failure handling

Scenario | Mitigation
--- | ---
Shard slow | Timeout; omit shard (best-effort) or retry once
Graph slow | Timeout ACL; return only public results (degraded) if product allows
Indexer lag | Alert on lag; tier hot authors

API design

Endpoint | Role
--- | ---
GET /v1/search | Keyword search

GET /v1/search

Param | Role
--- | ---
q | Query string
cursor | Pagination
limit | Cap (server-enforced)
author_id | Filter by author
since, until | Time window
type | Post type
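
Constructing a request against this endpoint (the base URL is a placeholder):

```python
from urllib.parse import urlencode

# Example GET /v1/search request built from the parameter table above.
params = {
    "q": "solar eclipse",
    "author_id": "42",
    "since": "2024-04-01T00:00:00Z",
    "limit": "20",
}
url = "https://api.example.com/v1/search?" + urlencode(params)
```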

Diagram:

Client --> API GW --> Search svc --> index shards --> merge
                                              |
                                         ACL svc (batch)

Errors: 429 query flood; 400 malformed query.

Production angles

Search behind a social graph is not “Elasticsearch plus ACL.” It is posting lists the size of cities, ACL fan-out that can dwarf retrieval, and near-real-time indexing that is never quite real-time enough for safety or legal. The incidents that wake you up are tail latency on head terms, stale harmful content, and aggregator pools that go rigid under celebrity spikes.

p99 explodes on “cheap” queries and common terms

What it looks like — Latency SLOs breach on queries like a celebrity surname or a viral meme string—not because the index is down, but because candidate generation pulls enormous posting list unions before ACL trims them. Median stays pretty; p99/p999 tells the story. Cache hit rates for top queries mask the problem until a new trend breaks the cache.

Why it happens — Inverted indexes are efficient per term, but high-frequency terms look selective in theory and have massive posting lists in practice. If you intersect late or pull too many doc IDs before the graph filter, you do megabytes of work per request. Stopwords alone do not save you—hashtags, mentions, and numeric spikes behave like stopwords seasonally.

What good teams do — Two-phase retrieval (cheap conjunctive core, then expand); hard caps on the candidate pool with safe degradation (“refine query”); blocklists and query rewriting for abusive patterns; WAND-style max-score pruning where the stack supports it. Per-shard latency histograms, not just global—one hot shard from uneven doc distribution ruins p99.
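
The max-score idea can be sketched as early termination over candidates pre-sorted by a cheap per-doc upper bound. This is a simplification of WAND, assuming the bound never underestimates the exact score:

```python
import heapq

# Simplified max-score pruning: stop scoring once no remaining candidate's
# upper bound can beat the current top-k floor.
def top_k(candidates, upper_bound, exact_score, k):
    """`candidates` must be pre-sorted by descending upper_bound(doc)."""
    heap = []  # min-heap of (score, doc); heap[0][0] is the current floor
    for doc in candidates:
        if len(heap) == k and upper_bound(doc) <= heap[0][0]:
            break                      # nothing later can beat the floor
        s = exact_score(doc)           # the expensive full-scoring call
        if len(heap) < k:
            heapq.heappush(heap, (s, doc))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, doc))
    return sorted(heap, reverse=True)
```

The win is skipping the expensive `exact_score` call entirely for the long tail of candidates, which is exactly what saves p99 on head terms.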

Indexed doc says “live” but the post was deleted or moderated

What it looks like — A harmful post disappears from feed but still appears in search snippets for minutes. Legal escalates; trust teams ask for synchronous removal. NRT pipeline dashboards are green—lag is “only” 60 seconds—which is an eternity for abuse.

Why it happens — NRT is eventually consistent by construction. Delete and visibility updates traverse a different path than the initial index write; ACL changes may not invalidate every denormalized facet. Compliance often needs stronger guarantees than product search.

What good teams do — Tiered SLAs: sub-minute for safety and legal verticals; a sync tombstone path to block doc IDs at query time even if the index lags; a negative cache of deleted IDs in a fast store. Seniors argue about who owns truth (moderation service vs index); juniors should never promise zero staleness at web scale without qualifiers.
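
The negative-cache idea might look like this sketch: a small TTL'd set of tombstoned doc IDs consulted at query time while the index catches up (the TTL and interface are assumptions):

```python
import time

# Query-time negative cache: deleted/moderated doc ids are blocked
# immediately, even while the NRT index still returns them.
class DeletedIds:
    def __init__(self, ttl_s=600):
        # TTL only needs to exceed worst-case index lag; assumed 10 min.
        self.ttl = ttl_s
        self._ids = {}                     # doc_id -> expiry time

    def tombstone(self, doc_id):
        self._ids[doc_id] = time.monotonic() + self.ttl

    def filter(self, hits):
        now = time.monotonic()
        # Keep a hit only if it has no live tombstone.
        return [h for h in hits if self._ids.get(h, 0) <= now]
```

In production this would live in a shared fast store rather than process memory, but the contract is the same: the delete path writes here synchronously, and every query filters through it.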

Merge-tier overload and “everything is slow” during news spikes

What it looks like — 503s or timeouts from the aggregator even though leaf shards are individually fine. Thread pools for merge or score are saturated; circuit breakers start returning partial results. Empty or truncated result sets spike—users think search is “censoring” when it is shedding load.

Why it happens — Fan-out to many shards is O(shards) coordination; ACL batch calls add RTT and head-of-line blocking. A thundering herd of identical queries after a push notification amplifies coordinated overload.

What good teams do — Request budgets per query phase; early termination when score upper bounds cannot beat the current top-k; dedupe identical queries in the aggregator (request coalescing); isolate expensive personalized search from simple keyword browse. Measure the shard latency histogram, index lag, ACL batch size and p95, and the empty-result rate (which can mean over-filtering or a broken ACL batch).
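
Request coalescing ("singleflight") can be sketched as: concurrent identical queries share one backend call, so a thundering herd becomes a single fetch. The `backend` callable is an assumed interface:

```python
import threading

# Singleflight sketch: the first caller for a key becomes the leader and
# executes the backend call; concurrent callers for the same key wait on
# the leader's result instead of issuing duplicate work.
class Coalescer:
    def __init__(self, backend):
        self.backend = backend
        self.lock = threading.Lock()
        self.inflight = {}                 # key -> (event, result slot)

    def get(self, key):
        with self.lock:
            entry = self.inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self.inflight[key] = entry
        event, slot = entry
        if leader:
            try:
                slot["result"] = self.backend(key)
            finally:
                with self.lock:
                    del self.inflight[key]
                event.set()                # wake all waiting followers
        else:
            event.wait()
        return slot["result"]
```

This only dedupes requests that overlap in time, which is exactly the celebrity-spike case; a short-TTL result cache handles the repeats that arrive afterward.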

[ Query spike ] --> aggregator thread pool full --> 503 or shed

How to use this in an interview — Lead with posting list size + ACL, not inverted index 101. State a freshness SLA in seconds for moderation, and name one mitigation (tombstone layer, stricter lane, or sync delete path). That is how you sound like you have seen search incidents, not just Lucene docs.

Bottlenecks and tradeoffs

  • Exact ACL vs latency: Strong filtering may require more round trips.
  • Personalization vs cache: Generic top queries cache well; the tail does not.

What interviewers expect

  • Retrieval: inverted index, posting lists, optional skip lists; BM25-class scoring at high level.
  • Sharding: by doc id vs term—query fan-out vs write distribution tradeoffs.
  • ACL: filter candidates using precomputed audience sets, bitsets, or query-time graph checks—state cost.
  • Pipeline: async indexer from post events; tombstones for delete; NRT freshness budget.
  • Ranking: text relevance + recency + social signals—latency budget.
  • Abuse: rate limit queries; spam signals on index ingest.

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • How is social search different from Google web search?
  • How do you enforce privacy in search results?
  • How do you shard an inverted index?
  • How fresh is the index?
  • How do you handle spam posts in the index?

Deep-dive questions and strong answer outlines

Outline the indexing pipeline for a new public post.

Event from post service → tokenizer → updates to posting lists for terms; store doc metadata (author, time, privacy flags). NRT updates with segment merges; delete tombstones for removed posts.

How do you ensure a friends-only post doesn’t appear to strangers?

Index visibility facets (e.g. audience id) or filter at query time using the viewer’s friend lists / ACL bitmaps. Admit the tradeoff between approximate precomputation and exact graph checks.

What breaks if the index lags five minutes?

Users see stale results—product SLA picks freshness vs cost. Mitigate with tiered indexing for hot authors.

More production angles

  • Hot-term queries fan out to many shards; the merge step becomes the bottleneck—cache popular queries with a short TTL.
  • Author blocks someone: delete from viewer-specific caches or filter dynamically—consistency vs cost.

AI feedback on your design

After a practice session, InterviewCrafted summarizes strengths, gaps, and interviewer-style expectations—similar to a written debrief. See a static example report, then practice this problem to get feedback on your own answer.

FAQs

Q: Do I need to know Lucene internals?

A: Helpful but not required. Know posting lists, segments, merges, and why writes are costly at scale.

Q: Is Elasticsearch enough?

A: Often a starting point; at hyperscale you usually see custom stacks or heavily tuned ES with sharding discipline. Say when you’d outgrow the defaults.

Q: What about autocomplete?

A: Separate prefix structures (tries, FST) and rate limits; optional scope—don’t eat the whole round.

Q: Do you index deleted posts?

A: Tombstone or delete events remove doc ids from posting lists; eventual visibility of deletion—state SLA. Hard deletes for legal may need async sweep.

Practice interactively

Open the practice session to use the canvas and stages, then review AI feedback.