System design interview guide
Design Nearby Friends Search
TL;DR: Users who **opt in** want to see **which friends are roughly nearby** without broadcasting exact addresses. The system must combine **location streams** (noisy, bursty) with the **social graph**, respect **privacy** and **freeze** modes, and answer queries fast enough that the feature feels live—not a batch job. The interview is about **spatial indexing + graph membership**, not Haversine in a loop for every friend.
Problem statement
You’re designing Nearby Friends: users who opt in can see which friends are approximately within a distance, with controls on who can see them and modes like frozen or hidden.
Constraints. Functional: opt in/out; visibility subsets; approximate distance; respect frozen/hidden. Non-functional: sub-second queries for typical friend list sizes; minimize sensitive PII; tolerate bursty location writes. Scale: hundreds of millions of users, global footprint, uneven density (events, cities).
Center: spatial indexing + social graph filter + privacy-preserving quantization—not pairwise distance for the entire graph.
Introduction
This problem is half spatial indexing and half privacy. Weak answers compute distance to every friend on every request—O(n) Haversine against a global user table. Strong answers reduce the candidate set using grid cells (geohash, S2, quadtree), then intersect with the friend graph, then refine distance for a small set.
Interviewers dock you if you store precise coordinates forever, ignore opt-in, or forget hot cells (concerts, stadiums).
How to approach
Confirm precision (neighborhood vs exact), who is in the graph, and update policy (how often GPS fires). Sketch write path (quantize → index) and read path (cells → candidates → filter → rank). Privacy and TTL are not footnotes—they shape storage.
Interview tips
- S2 / geohash: Map lat/lon to cell id; nearby = union of neighboring cells at appropriate resolution—prefix queries in KV.
- TTL: Raw location blobs expire (e.g. 24h) unless refreshed—reduces legal exposure and storage.
- Frozen mode: Stop writing location; reads return last known or hidden—a state machine, not just a boolean flag.
- Friend count: If median friends are hundreds, intersect in memory is fine; P99 thousands may need precomputed friend-of sets or Bloom filters—mention if pressed.
- VPN / mock GPS: Degrade gracefully—show “location unknown” rather than wrong city—optional signals from device.
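The cell-id idea from the first tip can be sketched with a simplified integer grid. This is a stand-in for geohash/S2 (the resolution and function names are illustrative, not a real library API); the point is that "nearby" becomes a union of cell lookups rather than pairwise distance:

```python
import math

def cell_id(lat: float, lng: float, res_deg: float = 0.01) -> tuple[int, int]:
    """Quantize a position to a grid cell (~1.1 km per 0.01 deg of latitude)."""
    return (math.floor(lat / res_deg), math.floor(lng / res_deg))

def neighbor_cells(cell: tuple[int, int]) -> list[tuple[int, int]]:
    """The 3x3 ring around a cell—'nearby' is a union of cell lookups."""
    (r, c) = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

# Two points a few hundred meters apart usually land in the same cell:
a = cell_id(40.7128, -74.0060)
b = cell_id(40.7131, -74.0055)
```

Real geohash/S2 cells give you prefix queries and hierarchical resolution on top of this idea; the grid version is enough to reason about the candidate set in an interview.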
Capacity estimation
| Load | Note |
|---|---|
| Location writes | Can exceed read QPS if unthrottled—sample and quantize |
| Index | Cell → set of user ids or user → cell—choose based on query vs update pattern |
| Friends per user | Median small; P99 still matters for algorithm choice |
| Hot cells | Stadium → write and read spike on same shard |
Implications: Batch writes; regional partition of spatial index; rate limit per device.
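A quick back-of-envelope makes the "sample and quantize" row concrete. All inputs here are assumptions for illustration, not product numbers:

```python
# Back-of-envelope write load (all inputs are assumptions, not spec):
dau = 300_000_000          # opted-in daily actives
raw_hz = 1                 # naive GPS sampling rate while app is foregrounded
active_s = 2 * 3600        # seconds of foreground use per user per day
quantized_per_day = 100    # updates surviving cell-change + min-interval gating

raw_writes_qps = dau * active_s * raw_hz / 86_400       # ~25M w/s: unsustainable
quantized_qps = dau * quantized_per_day / 86_400        # ~350K w/s: shardable
print(f"raw: {raw_writes_qps:,.0f} w/s, quantized: {quantized_qps:,.0f} w/s")
```

Roughly two orders of magnitude come from quantization alone, before batching or regional partitioning.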
High-level architecture
Location ingress receives quantized updates (only when cell changes or min interval elapsed). Location service stores user_id → cell_id, coarse_lat, coarse_lng, updated_at with TTL. Social graph service returns friend ids (with permission). Query service runs nearby for viewer: compute cells covering radius → lookup candidates in spatial index → intersect with allowed friend set → Haversine refine → rank.
Who owns what:
- Graph service — Friend edges, blocks, visibility lists (who can see me).
- Location index — Short TTL PII; sharded by region or cell prefix.
- Query API — Orchestrates; enforces blocks on result set.
[ Client GPS ] --throttle--> [ Location ingress ]
|
quantize to cell_id, TTL
v
[ Location index KV ]
(cell_id -> user_ids) OR (user_id -> cell_id)
[ Query: nearby friends for viewer V ]
1) Graph: friend_ids + visibility
2) V's position -> cells covering R
3) Index: candidate user_ids in those cells
4) candidates ∩ friends ∩ not blocked
5) refine distance, rank, return coarse distance + last_updated
In the room: Say why step 4 is small—spatial pruning before graph, not after scanning all friends.
Core design approaches
Indexing pattern A: user → cell
Write: O(1) update user_id → cell. Read: Union all friend cells’ neighbors—works if friends are few; worse if many friends spread globally.
Indexing pattern B: cell → users (posting lists)
Write: Add user to cell set (or geohash prefix structure). Read: Scan cells in radius → candidates → intersect friends—better when radius is small vs global user density.
Hybrid: Store user→cell for latest; maintain reverse index for query—choose tradeoff for hot cells.
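The hybrid pattern can be sketched as two maps kept in sync: user→cell for O(1) writes, cell→users for radius reads. In production each side would be a sharded KV store; this in-memory version only shows the update and lookup shape:

```python
from collections import defaultdict

class LocationIndex:
    """Hybrid index sketch: user→cell for O(1) updates, cell→users for queries."""

    def __init__(self):
        self.user_cell: dict[str, tuple] = {}
        self.cell_users: dict[tuple, set] = defaultdict(set)

    def update(self, user_id: str, cell: tuple) -> None:
        old = self.user_cell.get(user_id)
        if old == cell:
            return                      # no-op: GPS jitter inside the same cell
        if old is not None:
            self.cell_users[old].discard(user_id)
        self.user_cell[user_id] = cell
        self.cell_users[cell].add(user_id)

    def candidates(self, cells) -> set:
        """Union of posting lists for the cells covering the query radius."""
        out = set()
        for c in cells:
            out |= self.cell_users.get(c, set())
        return out
```

The `update` no-op on same-cell writes is where most of the write savings comes from; the reverse index is what makes hot cells a sharding concern.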
Privacy
Display rounded distance buckets (“within 500m”), not exact meters, unless product insists. Sparse rural areas may need coarser cells to preserve anonymity, which calls for dynamic resolution—S2 levels help.
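The bucketed display can be a small pure function. The thresholds below are illustrative product choices, not requirements:

```python
def distance_bucket(meters: float) -> str:
    """Coarsen distance for display—never leak exact meters.
    Bucket boundaries are illustrative."""
    for limit in (100, 250, 500, 1000, 5000):
        if meters <= limit:
            return f"within {limit}m" if limit < 1000 else f"within {limit // 1000}km"
    return "more than 5km away"
```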
Detailed design
Write path (location update)
- Client obtains GPS; drops if accuracy poor or frozen.
- Client maps to cell at level L; if same as last sent and time not elapsed, skip network.
- Server upserts the user_id → location row and updates the cell index atomically (or async with eventual consistency plus an honest stale UI).
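The client-side gate in the steps above can be sketched as one predicate. Function and parameter names are illustrative:

```python
def should_send(last_cell, new_cell, last_sent_s: float, now_s: float,
                min_interval_s: float = 60.0, frozen: bool = False) -> bool:
    """Client-side gate: emit an update only on cell change or after the
    minimum interval, and never while frozen. Thresholds are illustrative."""
    if frozen:
        return False
    if new_cell != last_cell:
        return True
    return (now_s - last_sent_s) >= min_interval_s
```

This single check is what turns 1 Hz GPS ticks into the order-of-magnitude-smaller quantized write stream.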
Read path (nearby friends)
- Auth viewer; verify opt-in and not hidden.
- Load friend ids from graph (cached).
- Map viewer position to cells for radius R (may be ring of cells).
- Fetch candidate user ids from spatial structure per cell (merge dedupe).
- Filter: candidates ∩ friends; exclude blocked; drop stale beyond TTL.
- Compute distance for survivors; sort by distance + recency + affinity weights.
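The read path above can be sketched end to end. `candidates_in` and `loc_of` are hypothetical stand-ins for the spatial index and location store lookups, and the data shapes are assumptions; the structure—spatial prune first, graph filter second, exact refine last—is the point:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, used only on the small refined set."""
    dphi, dlmb = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def nearby_friends(viewer_pos, friend_ids, blocked, cells, candidates_in, loc_of,
                   radius_m=500, now_s=0, ttl_s=3600):
    """candidates_in(cells) -> set of user ids; loc_of(uid) -> (lat, lng, updated_s)."""
    candidates = (candidates_in(cells) & friend_ids) - blocked  # prune + graph filter
    results = []
    for uid in candidates:                                      # refine survivors only
        lat, lng, updated_s = loc_of(uid)
        if now_s - updated_s > ttl_s:
            continue                                            # stale: drop
        d = haversine_m(*viewer_pos, lat, lng)
        if d <= radius_m:
            results.append((uid, d))
    return sorted(results, key=lambda r: r[1])
```

Note that Haversine runs only on `candidates` after intersection—this is the "small set" the in-the-room note in the architecture section is about.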
Key challenges
- Write amplification: Raw GPS at 1 Hz × DAU = unsustainable—quantize, event-based updates.
- Hot cells: Millions of users in one cell—shard by subcell or cap result size; avoid global precomputation.
- Stale vs creepy: Showing friend “nearby” when they left 30 min ago—show last updated in UI; expire old dots.
- Graph churn: Unfriend / block must reflect quickly on query path—cache friend lists with short TTL.
- Cross-region: Friends traveling—location region may differ from graph region—federated query or central index with latency cost.
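The graph-churn bullet—cache friend lists with short TTL—can be sketched as a tiny read-through cache. The injectable clock and TTL value are illustrative; a production version would also take explicit invalidation on unfriend/block events:

```python
import time

class TTLCache:
    """Short-TTL read-through cache so unfriend/block propagates within ttl_s."""

    def __init__(self, ttl_s: float = 30.0, clock=time.monotonic):
        self.ttl_s, self.clock, self._store = ttl_s, clock, {}

    def get(self, key, loader):
        hit = self._store.get(key)
        now = self.clock()
        if hit and now - hit[1] < self.ttl_s:
            return hit[0]
        value = loader(key)            # e.g. a call to the graph service
        self._store[key] = (value, now)
        return value
```

The TTL bounds the staleness window: a removed friend can appear in results for at most `ttl_s` seconds without any invalidation machinery.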
Scaling the system
- Partition location data by geographic region (rough) or cell prefix—locality of writes.
- Cache friend lists at edge with invalidation on social events.
- Read replicas for graph; primary for location writes in hot regions.
- Rate limit query API to prevent stalking enumeration (scanning grid).
Failure handling
| Scenario | Degraded UX | Mitigation |
|---|---|---|
| Stale index | Wrong nearby set briefly | TTL + version; client refresh |
| Graph service slow | Slow query | Timeout partial; cache friends |
| Bad GPS | Jump across cells | Accuracy threshold; ignore outliers |
| Regional outage | Empty or stale | Fail partial; no fabricated precision |
API design
| Endpoint | Role |
|---|---|
| POST /v1/location:update | Opt-in required; body: quantized cell or raw (server quantizes) |
| GET /v1/nearby/friends | Returns list with approx_distance_bucket, last_seen |
| POST /v1/privacy:freeze | Stops updates |
Query params (GET /v1/nearby/friends):
| Param | Role |
|---|---|
| radius_m | Max search radius (capped server-side) |
| limit | Max results |
| include_stale | Whether to show expired locations |
Diagram:
GET /nearby/friends?radius_m=500
--> Auth
--> Graph: friend_ids (cache)
--> Location index: cells from viewer pos
--> intersect + filter + rank --> JSON
Errors: 403 if not opt-in; 429 if abuse; empty 200 if no friends in range—not 404.
Production angles
Location and social graphs combine spatial skew, privacy law, and human drama. Production is not “geohash plus Redis”—it is hot cells during events, stalking patterns that look like API traffic, and index writes that outrun mobile battery budgets for location sampling.
Concerts, stadiums, and festivals — one cell becomes a city
What it looks like — p99 latency for GET /nearby/friends spikes when everyone maps to the same H3/S2 cell; CPU on the spatial index shard redlines; results truncate or time out even though global QPS looks normal.
Why it happens — Posting lists per cell grow with crowd density; intersection with friends stays large after graph filter if the user has many friends onsite; no early termination or candidate cap in the retrieval plan.
What good teams do — Adaptive subcell splitting under load; hard caps with deterministic ranking explained to users; optional “event mode” pre-index (friends checked into the same venue)—product scope, but real at scale. Precompute a close-friends subgraph when the product allows to shrink intersection cost.
Stalking reports and abuse — your API is a surveillance tool
What it looks like — User reports “someone knew where I was”; legal asks who could see location; an internal actor queries a celebrity every minute from a support tool.
Why it happens — Fine-grained location plus frequent updates equals a tracking surface; predictable patterns (daily commute) re-identify people even with coarse display; no audit trail on queries—only on writes.
What good teams do — Audit who queried whom when policy requires; coarser display precision or fuzzing for non-mutual friends; rate limits and anomaly detection on repeated queries against one target; break historical trails with TTL and minimum sampling intervals. Privacy review before shipping “nearby friends on a map.”
Location write storms and index write QPS
What it looks like — Mobile clients over-sample GPS after a bad release; index writers fall behind; reads return stale positions or empty sets inconsistently.
Why it happens — Battery vs freshness tradeoff pushed to the server without admission control; fan-out writes to multiple indices (cell + user document).
What good teams do — Throttle POST /location per device with 429 and Retry-After; batch updates client-side; backpressure when index lag exceeds SLO. Measure cells with highest cardinality, query p99, index lag.
[ Location write storm ] → index write QPS ↑
→ throttle clients → 429 on POST /location
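The per-device throttle can be a standard token bucket; when the bucket is empty, the server answers 429 with a Retry-After hint. Rates and burst sizes here are illustrative:

```python
class TokenBucket:
    """Per-device throttle sketch for POST /location: deny (→ HTTP 429 +
    Retry-After) when the bucket is empty. Rate and burst are illustrative."""

    def __init__(self, rate_per_s: float = 0.2, burst: int = 5):
        self.rate, self.burst = rate_per_s, burst
        self.tokens, self.last_s = float(burst), 0.0

    def allow(self, now_s: float) -> tuple[bool, float]:
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now_s - self.last_s) * self.rate)
        self.last_s = now_s
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        return False, (1.0 - self.tokens) / self.rate   # Retry-After seconds
```

Client-side batching plus this server-side admission control is what keeps a bad mobile release from turning into index lag.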
How to use this in an interview — Tie spatial hot spots to candidate limits and privacy to audit and coarsening—not just geohash length. Close with why O(friends) refinement works when degree is bounded, but spatial search must never degenerate to O(all users).
Bottlenecks and tradeoffs
- Precision vs privacy: Finer cells improve accuracy but worsen re-identification risk.
- Pull vs push: Push “friend nearby” notifications are engaging but spammy—quiet hours, dedupe.
What interviewers expect
- Privacy: explicit opt-in; coarse cells; TTL on stored locations; frozen/hidden modes; right to delete.
- Spatial index: geohash, S2, or quadtree—prefix or cell queries; why not O(all users).
- Graph filter: restrict candidates to friends (or subset)—intersection strategy.
- Write path: quantize movement; throttle; update only on cell change or time threshold.
- Read path: map viewer position → cells covering radius → candidates → intersect friend set → refine distance.
- Abuse: stalking, mock GPS—rate limits, blocks, optional audit.
- Scale: hot cells (stadiums); regional sharding; honest freshness SLAs.
Interview workflow (template)
- Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
- Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
- APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
- Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
- Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
- Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).
Frequently asked follow-ups
- How do you store location updates at scale?
- Why not SQL `distance < R` for all friends?
- How do you protect exact location?
- How often should clients send GPS?
- How do you combine social graph with geo index?
Deep-dive questions and strong answer outlines
How would you query “friends within 500m” efficiently?
Map user position to cell(s) covering radius; fetch candidate user ids in those cells from spatial index. Intersect with friend id set (hash set in memory if small, or indexed join if huge). Refine distance for final ordering.
How do you avoid updating the index on every GPS tick?
Quantize movement into cells; only emit updates when cell changes or time threshold. Adaptive: faster updates when moving fast; sleep when stationary—saves write QPS.
What goes wrong if you store exact lat/long forever?
Privacy and regulatory risk; mitigate with retention limits and coarse display in the UI. Strong answers mention deletion on opt-out and regional legal differences.
FAQs
Q: Do I need machine learning for ranking?
A: Optional. Heuristics: distance, recency, interaction frequency. ML might rerank—don’t block the round on it unless asked.
Q: Is PostGIS enough?
A: For moderate scale, maybe. At global DAU with bursty writes, you often want sharded spatial indexes (e.g. per region) and cell keys in KV stores—not one giant spatial DB fantasy.
Q: How is this different from Uber driver matching?
A: Nearby friends is graph-filtered proximity with privacy constraints; ride-sharing optimizes supply/demand and ETA. Different ranking and safety surfaces.
Q: Should location live in the same DB as the social graph?
A: Often no—different update rates, retention, and compliance. A location service with short TTL and coarse cells pairs with a graph service for edges. Integrate at query time.