
System design interview guide

Design Nearby Friends Search

Problem statement

You’re designing Nearby Friends: users who opt in can see which friends are within an approximate distance, with controls over who can see them and modes like frozen or hidden.

Constraints. Functional: opt in/out; visibility subsets; approximate distance; respect frozen/hidden. Non-functional: sub-second queries for typical friend list sizes; minimize sensitive PII; tolerate bursty location writes. Scale: hundreds of millions of users, global footprint, uneven density (events, cities).

The center of the design: spatial indexing + social graph filter + privacy-preserving quantization, not pairwise distance across the entire graph.

Introduction

This problem is half spatial indexing and half privacy. Weak answers compute distance to every friend on every request—O(n) Haversine against a global user table. Strong answers reduce the candidate set using grid cells (geohash, S2, quadtree), then intersect with the friend graph, then refine distance for a small set.

Interviewers dock you if you store precise coordinates forever, ignore opt-in, or forget hot cells (concerts, stadiums).

How to approach

Confirm precision (neighborhood vs exact), who is in the graph, and update policy (how often GPS fires). Sketch write path (quantize → index) and read path (cells → candidates → filter → rank). Privacy and TTL are not footnotes—they shape storage.

Interview tips

  • S2 / geohash: Map lat/lon to cell id; nearby = union of neighboring cells at an appropriate resolution; prefix queries in KV.
  • TTL: Raw location blobs expire (e.g. 24h) unless refreshed—reduces legal exposure and storage.
  • Frozen mode: Stop writing location; reads return last known or hidden—state machine, not a boolean flag only.
  • Friend count: If the median friend count is in the hundreds, in-memory intersection is fine; a P99 in the thousands may need precomputed friend-of sets or Bloom filters (mention if pressed).
  • VPN / mock GPS: Degrade gracefully—show “location unknown” rather than wrong city—optional signals from device.
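
The cell mapping in the first tip can be sketched with a plain integer grid (a simplified stand-in for geohash/S2; the resolution, cell naming, and 3×3 neighborhood are illustrative assumptions, not a real library API):

```python
# Simplified grid cells: quantize lat/lng to an integer grid at a given
# resolution, then enumerate the 3x3 neighborhood used to cover a small radius.

def cell_id(lat: float, lng: float, res_deg: float = 0.01) -> tuple[int, int]:
    """Map a coordinate to a grid cell (~1.1 km per 0.01 degree of latitude)."""
    return (int(lat // res_deg), int(lng // res_deg))

def neighbor_cells(cell: tuple[int, int]) -> list[tuple[int, int]]:
    """The cell plus its 8 neighbors: a coarse radius cover."""
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
```

Real systems would use geohash prefixes or S2/H3 cell ids here, which give hierarchical resolutions and proper neighbor enumeration near cell boundaries.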

Capacity estimation

  • Location writes: can exceed read QPS if unthrottled; sample and quantize.
  • Index: cell → set of user ids, or user → cell; choose based on query vs update pattern.
  • Friends per user: median is small; P99 still matters for algorithm choice.
  • Hot cells: a stadium event puts the write and read spike on the same shard.

Implications: Batch writes; regional partition of spatial index; rate limit per device.
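
A back-of-envelope comparison of raw vs quantized write load (all numbers are illustrative assumptions, not from the text):

```python
# Raw GPS ticks vs. quantized cell-change updates, averaged over a day.

dau = 100_000_000       # assumed daily active users sharing location
active_hours = 2        # assumed hours/day a client reports location

raw_hz = 1.0            # naive: one GPS tick per second while active
raw_qps = dau * active_hours * 3600 * raw_hz / 86_400

update_interval_s = 60  # quantized: emit only on cell change / min interval
quantized_qps = dau * active_hours * 3600 / update_interval_s / 86_400

print(f"raw ~{raw_qps/1e6:.1f}M writes/s, quantized ~{quantized_qps/1e3:.0f}K writes/s")
```

Under these assumptions, quantization alone cuts write QPS by roughly 60×, before any event-based suppression.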

High-level architecture

Location ingress receives quantized updates (only when the cell changes or a minimum interval elapses). Location service stores user_id → cell_id, coarse_lat, coarse_lng, updated_at with TTL. Social graph service returns friend ids (with permission). Query service runs nearby for viewer: compute cells covering radius → look up candidates in spatial index → intersect with allowed friend set → Haversine refine → rank.

Who owns what:

  • Graph service — Friend edges, blocks, visibility lists (who can see me).
  • Location index — Short TTL PII; sharded by region or cell prefix.
  • Query API — Orchestrates; enforces blocks on the result set.
[ Client GPS ] --throttle--> [ Location ingress ]
                              |
              quantize to cell_id, TTL
                              v
                        [ Location index KV ]
                        (cell_id -> user_ids)  OR  (user_id -> cell_id)

[ Query: nearby friends for viewer V ]
  1) Graph: friend_ids + visibility
  2) V's position -> cells covering R
  3) Index: candidate user_ids in those cells
  4) candidates ∩ friends ∩ not blocked
  5) refine distance, rank, return coarse distance + last_updated

In the room: Say why step 4 is small; spatial pruning comes before the graph filter, not after scanning all friends.

Core design approaches

Indexing pattern A: user → cell

Write: O(1) update user_id → cell. Read: Union all friend cells’ neighbors—works if friends are few; worse if many friends spread globally.

Indexing pattern B: cell → users (posting lists)

Write: Add user to cell set (or geohash prefix structure). Read: Scan cells in radius → candidates → intersect friends; better when the radius is small vs global user density.

Hybrid: Store user→cell for latest; maintain reverse index for query—choose tradeoff for hot cells.
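
The hybrid can be sketched as two in-memory structures kept in sync (dicts standing in for a KV store; cell ids and names are illustrative):

```python
# Pattern A (user -> cell) and pattern B (cell -> users) maintained together.
from collections import defaultdict

user_cell: dict[str, tuple[int, int]] = {}                       # pattern A
cell_users: dict[tuple[int, int], set[str]] = defaultdict(set)   # pattern B

def update_location(user_id: str, cell: tuple[int, int]) -> None:
    """Upsert: move the user between posting lists only on cell change."""
    old = user_cell.get(user_id)
    if old == cell:
        return                      # no-op: quantized position unchanged
    if old is not None:
        cell_users[old].discard(user_id)
    user_cell[user_id] = cell
    cell_users[cell].add(user_id)

def candidates_in(cells: list[tuple[int, int]]) -> set[str]:
    """Read path: union the posting lists for the cells covering the radius."""
    out: set[str] = set()
    for c in cells:
        out |= cell_users.get(c, set())
    return out
```

In production the two writes would need to be atomic (or the read path must tolerate a brief inconsistency, as the write-path section notes).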

Privacy

Display rounded distance buckets (“within 500m”), not exact meters, unless the product insists. Rural areas may need coarser cells, so dynamic resolution helps; S2 levels support this.
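
Bucketing is a one-liner worth showing; the bucket edges below are illustrative, not a product spec:

```python
# Coarse display buckets instead of exact meters.
BUCKETS_M = [100, 500, 1000, 5000]

def distance_bucket(meters: float) -> str:
    """Round a refined distance up to the nearest display bucket."""
    for edge in BUCKETS_M:
        if meters <= edge:
            return f"within {edge} m"
    return "far away"
```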

Detailed design

Write path (location update)

  1. Client obtains GPS; drops if accuracy poor or frozen.
  2. Client maps to cell at level L; if same as last sent and time not elapsed, skip network.
  3. Server upserts user_id location row; updates cell index atomically (or async with eventual consistency + honest stale UI).

Read path (nearby friends)

  1. Auth viewer; verify opt-in and not hidden.
  2. Load friend ids from graph (cached).
  3. Map viewer position to cells for radius R (may be ring of cells).
  4. Fetch candidate user ids from spatial structure per cell (merge dedupe).
  5. Filter candidates ∩ friends; exclude blocked; drop stale beyond TTL.
  6. Compute distance for survivors; sort by distance + recency + affinity weights.
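
The six read-path steps above can be sketched as one function; all data access is stubbed with plain callables, and the names are illustrative rather than a real API:

```python
import math
import time

def haversine_m(a, b):
    """Great-circle distance in meters between (lat, lng) pairs."""
    lat1, lng1, lat2, lng2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 6_371_000 * 2 * math.asin(math.sqrt(h))

def nearby_friends(viewer_pos, friend_ids, blocked, cells_for, candidates_in,
                   position_of, radius_m=500, ttl_s=3600, now=None):
    now = time.time() if now is None else now
    candidates = candidates_in(cells_for(viewer_pos, radius_m))  # steps 3-4
    survivors = (candidates & friend_ids) - blocked              # step 5
    results = []
    for uid in survivors:
        pos, updated_at = position_of(uid)
        if now - updated_at > ttl_s:
            continue                                  # drop stale beyond TTL
        d = haversine_m(viewer_pos, pos)              # step 6: refine
        if d <= radius_m:
            results.append((uid, d))
    return sorted(results, key=lambda r: r[1])        # rank by distance
```

Ranking here is distance-only; recency and affinity weights would extend the sort key.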

Key challenges

  • Write amplification: Raw GPS at 1 Hz × DAU = unsustainable; quantize and use event-based updates.
  • Hot cells: Millions of users in one cell—shard by subcell or cap result size; precompute nothing global.
  • Stale vs creepy: Showing friend “nearby” when they left 30 min ago—show last updated in UI; expire old dots.
  • Graph churn: Unfriend / block must reflect quickly on query path—cache friend lists with short TTL.
  • Cross-region: Friends travel; the location region may differ from the graph region; federated query, or a central index at a latency cost.

Scaling the system

  • Partition location data by geographic region (rough) or cell prefix for write locality.
  • Cache friend lists at edge with invalidation on social events.
  • Read replicas for graph; primary for location writes in hot regions.
  • Rate limit query API to prevent stalking enumeration (scanning grid).
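
One common shape for that rate limit is a per-viewer token bucket (capacity and refill rate are illustrative):

```python
# Token bucket: steady refill, bounded burst; deny once tokens run out.
class TokenBucket:
    def __init__(self, capacity: float, refill_per_s: float):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_s
        self.last = 0.0

    def allow(self, now: float) -> bool:
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For anti-stalking specifically, a per-(viewer, target) counter with anomaly alerts matters as much as the global limit.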

Failure handling

  • Stale index: wrong nearby set briefly. Mitigation: TTL + versioning; client refresh.
  • Graph service slow: slow queries. Mitigation: timeout with partial results; cached friend lists.
  • Bad GPS: jumps across cells. Mitigation: accuracy threshold; ignore outliers.
  • Regional outage: empty or stale results. Mitigation: fail partial; no fabricated precision.

API design

  • POST /v1/location:update — opt-in required; body: quantized cell or raw (server quantizes).
  • GET /v1/nearby/friends — returns a list with approx_distance_bucket and last_seen.
  • POST /v1/privacy:freeze — stops updates.

Query params (GET /v1/nearby/friends):

  • radius_m — max search radius (capped server-side).
  • limit — max results.
  • include_stale — whether to show expired locations.

Diagram:

GET /nearby/friends?radius_m=500
  --> Auth
  --> Graph: friend_ids (cache)
  --> Location index: cells from viewer pos
  --> intersect + filter + rank --> JSON

Errors: 403 if not opt-in; 429 if abuse; empty 200 if no friends in range—not 404.

Production angles

Location and social graphs combine spatial skew, privacy law, and human drama. Production is not “geohash plus Redis”—it is hot cells during events, stalking patterns that look like API traffic, and index writes that outrun mobile battery budgets for location sampling.

Concerts, stadiums, and festivals — one cell becomes a city

What it looks like — p99 latency for GET /nearby/friends spikes when everyone maps to the same H3/S2 cell; CPU on the spatial index shard redlines; results truncate or time out even though global QPS looks normal.

Why it happens — Posting lists per cell grow with crowd density; intersection with friends stays large after graph filter if the user has many friends onsite; no early termination or candidate cap in the retrieval plan.

What good teams do — Adaptive subcell splitting under load; hard caps with deterministic ranking explained to users; optional “event mode” pre-index (friends checked into the same venue)—product scope, but real at scale. Precompute a close-friends subgraph when the product allows to shrink intersection cost.

Stalking reports and abuse — your API is a surveillance tool

What it looks like — User reports “someone knew where I was”; legal asks who could see location; an internal actor queries a celebrity every minute from a support tool.

Why it happens — Fine-grained location plus frequent updates equals a tracking surface; predictable patterns (daily commute) re-identify people even with coarse display; no audit trail on queries—only on writes.

What good teams do — Audit who queried whom when policy requires; coarser display precision or fuzzing for non-mutual friends; rate limits and anomaly detection on repeated queries against one target; break historical trails with TTL and minimum sampling intervals. Privacy review before shipping “nearby friends on a map.”

Location write storms and index write QPS

What it looks like — Mobile clients over-sample GPS after a bad release; index writers fall behind; reads return stale positions or empty sets inconsistently.

Why it happens — Battery vs freshness tradeoff pushed to the server without admission control; fan-out writes to multiple indices (cell + user document).

What good teams do — Throttle POST /location per device with 429 and Retry-After; batch updates client-side; backpressure when index lag exceeds SLO. Measure cells with highest cardinality, query p99, index lag.

[ Location write storm ] → index write QPS ↑
       → throttle clients → 429 on POST /location
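
The throttle-on-lag response can be sketched as a small admission check (thresholds, status codes, and the proportional backoff are illustrative choices):

```python
# Shed location writes with 429 + Retry-After once index lag exceeds the SLO.
LAG_SLO_S = 30

def admit_location_write(index_lag_s: float) -> tuple[int, dict]:
    """Return (http_status, headers) for a POST /v1/location:update attempt."""
    if index_lag_s <= LAG_SLO_S:
        return 202, {}              # accepted for async indexing
    # Back off proportionally to how far behind the indexer is, capped.
    retry_after = min(300, int(index_lag_s - LAG_SLO_S) + 10)
    return 429, {"Retry-After": str(retry_after)}
```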

How to use this in an interview — Tie spatial hot spots to candidate limits and privacy to audit and coarsening—not just geohash length. Close with why O(friends) refinement works when degree is bounded, but spatial search must never degenerate to O(all users).

Bottlenecks and tradeoffs

  • Precision vs privacy: Finer cells improve accuracy but worsen re-identification risk.
  • Pull vs push: Push “friend nearby” notifications are engaging but spammy; add quiet hours and dedupe.

What interviewers expect

  • Privacy: explicit opt-in; coarse cells; TTL on stored locations; frozen/hidden modes; right to delete.
  • Spatial index: geohash, S2, or quadtree; prefix or cell queries; why not O(all users).
  • Graph filter: restrict candidates to friends (or subset)—intersection strategy.
  • Write path: quantize movement; throttle; update only on cell change or time threshold.
  • Read path: map viewer position → cells covering radius → candidates → intersect friend set → refine distance.
  • Abuse: stalking, mock GPS—rate limits, blocks, optional audit.
  • Scale: hot cells (stadiums); regional sharding; honest freshness SLAs.

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • How do you store location updates at scale?
  • Why not SQL `distance < R` for all friends?
  • How do you protect exact location?
  • How often should clients send GPS?
  • How do you combine social graph with geo index?

Deep-dive questions and strong answer outlines

How would you query “friends within 500m” efficiently?

Map user position to cell(s) covering radius; fetch candidate user ids in those cells from spatial index. Intersect with friend id set (hash set in memory if small, or indexed join if huge). Refine distance for final ordering.

How do you avoid updating the index on every GPS tick?

Quantize movement into cells; only emit updates when cell changes or time threshold. Adaptive: faster updates when moving fast; sleep when stationary—saves write QPS.
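
The adaptive cadence can be stated as a tiny policy function (the speed thresholds and intervals are illustrative):

```python
# Pick the next report interval from recent speed: sleep when stationary,
# refresh faster when moving fast.
def next_interval_s(speed_mps: float) -> int:
    if speed_mps < 0.5:     # stationary
        return 300
    if speed_mps < 3.0:     # walking
        return 60
    return 15               # driving / transit
```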

What goes wrong if you store exact lat/long forever?

Privacy and regulatory risk; retention limits and coarse display in UI. Strong answers mention deletion on opt-out and region differences.

Production angles

  • Concert spike: many users in one cell—**hot shard**; pre-aggregate or cap precision.
  • Stalking reports—**rate limits**, **blocks**, **audit** on who queried whom if policy requires.

AI feedback on your design

After a practice session, InterviewCrafted summarizes strengths, gaps, and interviewer-style expectations—similar to a written debrief. See a static example report, then practice this problem to get feedback on your own answer.

FAQs

Q: Do I need machine learning for ranking?

A: Optional. Heuristics: distance, recency, interaction frequency. ML might rerank—don’t block the round on it unless asked.

Q: Is PostGIS enough?

A: For moderate scale, maybe. At global DAU with bursty writes, you often want sharded spatial indexes (e.g. per region) and cell keys in KV stores—not one giant spatial DB fantasy.

Q: How is this different from Uber driver matching?

A: Nearby friends is graph-filtered proximity with privacy constraints; ride-sharing optimizes supply/demand and ETA. Different ranking and safety surfaces.

Q: Should location live in the same DB as the social graph?

A: Often no—different update rates, retention, and compliance. A location service with short TTL and coarse cells pairs with a graph service for edges. Integrate at query time.
