
System design interview guide

Design Instagram Feed

TL;DR: From the user’s side, Instagram is scroll, tap, watch: photos, carousels, Reels, and Stories that disappear tomorrow. From your side, the product is bytes: huge video objects on cellular networks, thumbnails that must beat human perception of “slow,” and the same skewed social graph every large app has—reads crush writes, a few creators behave like broadcasters, and any design that does unbounded work per swipe or ships full-resolution video on every load will fail in cost and p99 before it fails in a diagram review. You separate metadata from media, async anything that smells like ffmpeg, and you name how the feed stays fast when the ranker hiccups or the transcode queue backs up.

Problem statement

You’re designing the Instagram-style home experience: a scrollable stream of photos and video from accounts you follow, plus Stories (24h) in most interview variants, with likes, comments, and a follow graph. At this scale the interesting design is not an ER diagram—it’s how you keep p95 sane when every post drags megabytes behind it.

Product framing. Users post media, follow people, scroll a ranked or time-ordered feed, and open Stories that expire. Non-functionally the bar is mobile-first: first meaningful paint for thumbnails, smooth video start, and feed loads that don’t block on transcoding. Call out a concrete target (e.g. feed p95 under a few hundred ms for the metadata + URLs path—not “we downloaded every byte in 50ms” fairy tales).

Scale: hundreds of millions of DAU, enormous read:write ratio, tens of millions of uploads a day, and skew—a handful of accounts have follower counts that break naive fan-out. Egress and encoding cost are first-class constraints, not footnotes.

What to walk through as a story: upload → durable metadata → async media pipeline; read → candidate IDs → batch metadata → CDN variants; fan-out vs hybrid for celebrities; Stories TTL; pagination under moving ranks; degradation when ranker or transcode is slow. Admit eventual consistency where the real world does.

Introduction

The interview question.

You will usually get a prompt like “Design Instagram’s feed” or “Design a scalable photo/video social feed.” The job is to explain end to end: how a user scrolls a stream of posts from accounts they follow, how uploading a photo or video becomes something others can see, and how media files are stored and delivered on mobile networks without putting unlimited work on one server or shipping full-resolution video on every swipe.

Most sessions focus on Home (and often Stories); a full Explore recommender is optional unless the interviewer steers you there.

What skills and technologies get tested.

Interviewers are checking system design, not Figma. You should be ready to discuss:

  • APIs and request path — authentication, rate limits, BFF-style aggregation, timeouts.
  • Media pipeline — object storage (S3-like blobs), transcoding and thumbnails, async jobs or queues, CDN delivery and cache behavior.
  • Events and messaging — task queues vs event logs (Kafka-style) for fan-out, transcode, notifications; at-least-once delivery and idempotent workers; transactional outbox or CDC so “DB committed” implies “work will run.”
  • Feed construction — candidate post IDs (fan-out mailboxes, merge-on-read, hybrid), batch metadata fetches, pagination with cursors.
  • Social graph — follows, blocks, private accounts; often a dedicated graph service or cached edges.
  • Ranking (when in scope) — separating candidate generation from scoring, latency budgets, fallback when the ranker is slow.
  • Scale and cost — read-heavy traffic, hot creators, egress and encoding as money problems, backpressure on queues.
  • Failure modes — what the app shows when transcode or CDN lags; eventual consistency without pretending one global synchronous lock.

Concrete product names (Kafka, Redis, a specific SQL engine) matter less than clear data flow, bounded work per request, and honest degradation stories.

A simple mental model:

Think in two layers. First, a light metadata layer (who posted what, order, permissions).

Second, a heavy media layer (large files in object storage, CDN URLs, transcoding in the background). The feed is not “one big SQL join”—it is pointers to blobs, adaptive playback, and a traffic pattern common to large social apps: many people scroll, few people post, and a small number of accounts behave like broadcasters.

How to approach

Start by clarifying scope—not by drawing every box. Agree whether you mean Home vs Stories vs Explore, whether ranking is part of the story, and how fresh content must be: seconds for “can I see this post?” vs minutes for “every quality tier of the video exists.” That choice steers you toward time-ordered feeds vs heavier online ranking.

Next, do one rough capacity pass: feed reads vs new posts, fan-out (how many followers get a copy of a post ID) for a normal user vs a mega-creator, and bandwidth out (egress) as a real cost—not only “requests per second.” Then walk through one upload (upload → storage → post record → background jobs) and one feed read (candidate IDs → fetch text metadata → turn file hints into CDN URLs → optional rank → cursor). Debate Redis vs Cassandra only after that story is clear.

If time is tight, prioritize video and hot accounts: how you treat celebrities, what happens when encoding queues grow, and how pagination behaves when order can change. That beats adding another unnamed rectangle to the diagram.

Interview tips

  • Thumbnails vs full ladders: Say that small previews (poster frames, low-quality rungs) can appear before every bitrate step exists, and that the app may start playing while higher rungs finish.
  • Where feed IDs come from: Name fan-out, pull, or hybrid—not just “we cache the feed.” For huge creators, say merge on read, partial fan-out, or a cap; promising “we fan out to everyone, always” invites hard follow-ups.
  • CDN: Go one level deeper than a vendor name—what identifies a cached object (variant, resolution, region), how long it stays cached, how bad files get removed, and signed or stable URLs.
  • Candidates vs ranking: First produce a list of candidate post IDs; then optionally reorder with a ranker. If ranking is slow, use a time limit and a fallback order.
  • Stories: Often expire in ~24 hours, need cleanup jobs, and may reuse the same social graph as the main feed but not the same rules for storage or ranking.
  • When things lag: Be ready for encoding backlogs and CDN cache misses. Good answers include graceful UX (lower quality, placeholders, clear states) instead of a blank screen.
  • Blocks and unfollows: It is fine to say precomputed lists can be slightly stale if you re-check rules when serving so safety paths stay correct.

Capacity estimation

Use these only as ballpark anchors—adjust with your interviewer:

| Input | Order of magnitude |
| --- | --- |
| DAU | 100M+ (varies by region) |
| Feed fetches per DAU | Many sessions; clients reuse data, so not every scroll hits the server like a full reload |
| Uploads / day | Tens of millions; video drives most storage and processing cost |
| Read vs write QPS | Far more feed reads than new posts |
| Egress | Often the main cost alongside compute—say data out, not only request counts |

Implications:

Any design where work per post grows with follower count without a cap breaks for celebrities. Transcoding and fan-out should run asynchronously with backpressure when queues grow. The feed request should limit how many candidates enter ranking so tail latency (for example p95) stays under control.
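The implications above can be anchored with one pass of arithmetic. All inputs here are assumptions picked for illustration, tune them with your interviewer; the point is the shape of the numbers (read:write skew, egress as a dollar figure), not the exact values.

```python
# Back-of-envelope feed math. Every input below is an assumption for
# illustration, not a measured Instagram number.
DAU = 150_000_000            # daily active users
FEED_FETCHES_PER_DAU = 10    # server-hitting page loads per user per day
UPLOADS_PER_DAY = 30_000_000
SECONDS_PER_DAY = 86_400

read_qps = DAU * FEED_FETCHES_PER_DAU / SECONDS_PER_DAY
write_qps = UPLOADS_PER_DAY / SECONDS_PER_DAY
print(f"avg read QPS ~{read_qps:,.0f}, avg write QPS ~{write_qps:,.0f}, "
      f"read:write ~{read_qps / write_qps:.0f}:1")

# Egress: media bytes dominate. ~2 MB of thumbnails/preview video actually
# viewed per feed page is a guess -- the order of magnitude is what matters.
PEAK_FACTOR = 3
AVG_BYTES_PER_FEED_PAGE = 2 * 1024 * 1024
egress_per_day_tb = DAU * FEED_FETCHES_PER_DAU * AVG_BYTES_PER_FEED_PAGE / 1e12
print(f"peak read QPS ~{read_qps * PEAK_FACTOR:,.0f}, "
      f"egress ~{egress_per_day_tb:,.0f} TB/day")
```

Even with conservative guesses, egress lands in the petabyte-per-week range, which is why ABR ladders and CDN hit rate are cost conversations, not nice-to-haves.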

High-level architecture

How to use this section in an interview.

Your goal is one coherent system picture: which services exist, what each one stores, and how a read differs from a write. The architecture is split on purpose: thin orchestration and metadata (fast, per-scroll) stay in the feed path; large files live in object storage and are served from the CDN; slow work (transcode, fan-out to many followers) runs asynchronously so posting and scrolling stay responsive. The subsections below give you the explanation first, then a step-by-step script you can narrate while you draw.

What the architecture is (explain before you label boxes)

From the phone’s perspective, opening or scrolling home means: “give me the next batch of posts after where I left off.” The app receives JSON—captions, author fields, and URLs for thumbnails and video—not multi-megabyte files inlined in the response. The device then fetches media separately from CDN caches close to the user.

From the backend’s perspective, there are two main flows on the same diagram:

  • Read path (scroll): Edge / API authenticates and rate-limits → Feed service asks for the next page (using a cursor) → it loads candidate post IDs from mailbox / merge / hybrid storage → it batch-loads post rows and author info from a post-oriented store → it checks graph rules (blocks, privacy) → it turns media keys into CDN URLs → an optional ranker reorders within a time limit → the API returns items + cursor. Bytes do not stream through the feed tier on this path—only pointers and URLs do.

  • Write path (post + fan-out): Upload lands in object storage; a post record becomes durable with processing flags; transcode / thumbnail jobs run in the media pipeline; when the product allows visibility, fan-out workers enqueue or write post IDs into per-viewer candidate lists (mailboxes) where your push policy applies, or you rely on merge for hot creators. That split—durable metadata first, async encoding and distribution—is the core architectural idea.

Why so many boxes?

Separation of concerns: the feed service is an orchestrator, not the long-term source of truth for every post or every friendship. Candidate storage answers “in what order might posts appear for this viewer?” Post service answers “what is this post?” Graph service answers “who is this viewer allowed to see?” Object storage + CDN answer “where are the bits?” Keeping those questions in different layers is how you bound work per scroll and scale media independently from feed logic.

Read path (narrate in this order while you draw)

  1. Edge + API — TLS, session auth, rate limits; abuse and routing before feed logic.
  2. Feed service — “Next page” for this viewer + cursor; enforces a cap on candidates and downstream calls (work budget).
  3. Candidate IDs — Ordered list from mailbox (prepared on publish), merge (combine at read time), or hybrid; cheap before heavy joins.
  4. Batch metadata — Multi-get posts and authors (captions, timestamps, avatars, media keys)—still text-sized.
  5. Graph filters — Blocks, private accounts, mutes; read-time checks when safety must win over stale lists.
  6. Media → CDN URLs — Map storage keys / ladder state to HTTPS URLs for the right variant (thumb, manifest, etc.).
  7. Rank (optional) — Ranker with hard timeout and fallback ordering if it misses budget.
  8. Response — Feed items + opaque cursor for the next request.
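The eight steps above can be sketched as one orchestration function. Everything here is an in-memory stand-in (the `POSTS` dict, `next_candidate_ids`, `may_view`, `cdn_url` are all hypothetical names), and the cursor logic is deliberately simplified; what matters is the call order and the hard work budget.

```python
# Feed read-path sketch: candidates -> metadata -> policy -> URL strings.
# All stores and service calls are in-memory fakes for illustration.
MAX_CANDIDATES = 500   # work budget: never consider more than this per page
PAGE_SIZE = 2          # tiny for the demo

POSTS = {
    1: {"post_id": 1, "author_id": "a", "created_at": 100, "media_key": "m1"},
    2: {"post_id": 2, "author_id": "b", "created_at": 200, "media_key": "m2"},
    3: {"post_id": 3, "author_id": "blocked", "created_at": 300, "media_key": "m3"},
}

def next_candidate_ids(viewer_id, cursor, limit):
    # Mailbox slice: newest-first post IDs, resumed from the cursor position.
    ids = sorted(POSTS, key=lambda i: -POSTS[i]["created_at"])
    start = int(cursor or 0)
    return ids[start:start + limit]

def may_view(viewer_id, author_id):
    return author_id != "blocked"   # read-time safety check beats stale mailboxes

def cdn_url(media_key, variant):
    return f"https://cdn.example.com/{media_key}/{variant}.jpg"

def get_feed_page(viewer_id, cursor=None):
    ids = next_candidate_ids(viewer_id, cursor, MAX_CANDIDATES)          # step 3
    posts = [POSTS[i] for i in ids]                                      # step 4
    visible = [p for p in posts if may_view(viewer_id, p["author_id"])]  # step 5
    for p in visible:                                                    # step 6
        p["thumb_url"] = cdn_url(p["media_key"], "thumb")
    ordered = sorted(visible, key=lambda p: -p["created_at"])            # step 7 (fallback order)
    page = ordered[:PAGE_SIZE]                                           # step 8
    # Simplified cursor: real systems track consumed candidates, not page size.
    return {"items": page, "next_cursor": str(int(cursor or 0) + len(page))}
```

Note that the response carries `thumb_url` strings, never media bytes, and the blocked author is filtered at read time even though the mailbox still contains the ID.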

Interview line to remember:

The hot path is IDs → metadata → policy → link strings; pixels ride the CDN.

Who owns what (typical split)

Use this as a legend for your diagram—each bullet is one role, not one mandatory microservice name.

  • API / BFF — Auth, routing, timeouts; may batch downstream calls so the mobile client is not chatty.
  • Feed service — Read-path orchestration: candidates, graph checks, metadata hydration, URL assembly, rank budget, cursor. Caps work here.
  • Mailbox / timeline store — Per-viewer ordered post IDs (or score+id)—not full post JSON at rest.
  • Post service — Source of truth for captions, author id, timestamps, privacy, media keys, processing state.
  • Media pipeline — Transcoders, thumbnails, ABR ladders; triggered by queues or streams after upload.
  • Object storage + CDN — Blobs and segments; regional edge caching; signed or stable paths.
  • Graph service — Follows, blocks, mutes; usually heavily cached; feed asks “may this viewer see this author?”
  • Fan-out workers — Take (post_id, author_id) and append to follower mailboxes where push applies; idempotent writes.
  • Ranker — Scores or reorders candidates; treat as latency-SLA’d with timeout + fallback.

Caching (say it explicitly): Cache post metadata, profiles, following sets, and CDN edges; be honest that some fields can be briefly stale (display name) while safety paths (blocks) are re-checked when serving.

Celebrity / hybrid (point at the diagram): If you only draw push fan-out, expect a follow-up on mega-creators. Show merge from an author-indexed recent tail (or partial fan-out) into the candidate pool—policy, not magic.

Reference diagram (one picture)

Read path flows left to right across the top; upload + fan-out sit on the write path below. Post svc connects to media jobs and object store because metadata points at blobs; CDN is where clients actually download media.

[ Mobile client ]
    → [ Edge / API ] → [ Feed svc ] → [ Candidate store: mailboxes + optional merge ]
              │              │                    │
              │              │                    └──→ [ Ranker ] (timeout + fallback)
              │              │
              │              ├──→ [ Graph svc ] (follows, blocks)
              │              │
              │              └──→ [ Post svc ] → metadata only
              │                         │
              │                         ├──→ [ Media jobs ] → transcode / thumbs
              │                         │
              │                         └──→ [ Object store ] → [ CDN ] → (video/image bytes)
              │
              └── upload: [ Upload API ] → object store → enqueue transcode → post row READY
                    fan-out: [ Queue ] → workers → viewer mailboxes (idempotent)

In the room:

Explain the split (orchestration vs blobs vs async work) in about a minute. Walk the read path with concrete numbers: latency budget, max candidates, ranker timeout, CDN URL shape.

Then add the write path: durable post, async transcode, async fan-out, idempotent mailbox writes. Instagram-specific detail: the HTTP ack for posting should not wait for every encoding ladder—metadata and upload durability come first.

Event- and message-driven architecture (async half of the HLD)

The request/response diagram is only half the story. Transcoding, fan-out, notifications, search indexing, and analytics are too slow and too bursty to run inline in the post HTTP handler. Production-shaped designs decouple with queues or event streams so work can scale, retry, and back off independently of the API.

Why messages or events?

| Pattern | What you get | What you trade |
| --- | --- | --- |
| Task queues (SQS, RabbitMQ, Redis Streams as a queue, etc.) | Explicit jobs: “transcode post_id,” “fan-out batch for shard X”—easy retry, visibility timeout, dead-letter queues | Ordering and replay semantics vary by product; fan-out may need many messages or batch workers |
| Log-based events (Kafka, Pulsar, Kinesis) | Durable ordered stream of facts (PostCreated, MediaReady); many consumers (fan-out, search, metrics) without coupling the writer | Operational complexity; consumers must be idempotent; schema evolution needs discipline |
| Hybrid | Critical-path job queue + Kafka for fan-out and downstream systems | Two systems to run and monitor |

Typical “facts” you might emit (names illustrative): PostCreated / MediaUploadComplete, TranscodeComplete (per variant or ladder), PostVisibleToFollowers, FanOutRequested, StoryExpiringSoon. Consumers include transcoders, mailbox writers, push notification senders, and search indexers—each subscribes to what it needs.

Delivery guarantees (say this in the interview): Most managed queues give at-least-once. Workers must be idempotent (post_id + viewer_shard dedupe, unique keys in DB). Exactly-once end-to-end is rare; dedupe at the sink is the practical pattern.

Reliable handoff from the database to the bus: After a transaction commits a post row, you still need a message to enqueue work. Common patterns: transactional outbox (same DB transaction writes an outbox row; a publisher process reads and pushes to Kafka/SQS), or DB CDC (change data capture) into a stream. Avoid “commit then best-effort send to queue” without a plan for lost messages on process crash.

Small diagram (add to the whiteboard below the sync path):

[ Post API ] → (txn) → [ Posts DB ] + [ Outbox row ]
                              ↓
                    [ Outbox publisher / CDC ]
                              ↓
              [ Queue or Kafka: PostCreated, TranscodeJob, ... ]
                              ↓
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
  [ Transcode workers ] [ Fan-out workers ] [ Notifications ]
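The outbox half of that diagram is small enough to sketch end to end. This uses SQLite as a stand-in for the posts database and a plain list as the message bus; the schema and `publish_pending` poller are illustrative, not a production design. The key property to point at: the post row and the outbox row commit in one transaction, so a crash before publish loses no work.

```python
# Transactional outbox sketch: DB commit and "work will run" are one atomic step.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE posts  (post_id TEXT PRIMARY KEY, author_id TEXT, caption TEXT);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                     topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def create_post(post_id, author_id, caption):
    with db:  # ONE transaction: post row + outbox row commit together
        db.execute("INSERT INTO posts VALUES (?,?,?)", (post_id, author_id, caption))
        db.execute("INSERT INTO outbox (topic, payload) VALUES (?,?)",
                   ("PostCreated", json.dumps({"post_id": post_id})))

SENT = []  # stand-in for Kafka / SQS

def publish_pending(batch=100):
    rows = db.execute("SELECT id, topic, payload FROM outbox "
                      "WHERE published = 0 ORDER BY id LIMIT ?", (batch,)).fetchall()
    for row_id, topic, payload in rows:
        SENT.append((topic, payload))   # at-least-once: a crash here resends later
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()
    return len(rows)

create_post("p1", "alice", "first post")
sent_count = publish_pending()
```

Because the publisher marks rows only after sending, delivery is at-least-once, which is exactly why the consumers downstream must be idempotent.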

Other high-level design patterns that fit this problem

You do not need every buzzword—pick one or two that match the story you drew.

  • CQRS-style separation (lightweight): Writes go to the authoritative post store; reads for the feed use precomputed candidate lists and cached metadata—different shapes optimized for different access paths. Full CQRS with event sourcing everywhere is overkill for most interview answers; separate read models for feed IDs vs write model for posts is enough.
  • Choreography vs orchestration: Choreography — services react to events (transcode finishes → emits → fan-out starts). Orchestration — one workflow engine coordinates steps. Feeds often look choreographed in the small; orchestration appears when you need strict ordering and compensation (rare on the hot path).
  • API Gateway / BFF: Edge handles auth, rate limits, routing—already in your diagram; BFF aggregates feed + profile fragments for mobile.
  • Anti-corruption / bounded contexts: Post domain vs Feed domain vs Graph—different teams and release cadences; events are the contract between them.
  • Strangler fig (when migrating): If you replace a legacy feed store, dual-write or shadow read with feature flags—migrate traffic gradually.

Interview line: The sync path is small and bounded; the async path is message-driven, idempotent, and observable (queue depth, consumer lag, DLQ age).

Core design approaches

Most of the interview is not picking a brand of database—it is walking forks where each choice trades latency, cost, complexity, and failure behavior. The sections below are the usual decision surfaces: how uploads relate to encoding, how feed IDs are produced, how Stories differ from feed posts, and how ranking sits on top without blowing the p95 budget.

Upload and media processing

Baseline pattern. Clients upload chunks to object storage; the API creates a durable post row with media_processing: pending (or similar). Transcoders build ABR ladders, thumbnails, and poster frames; the row moves to ready or partial states as artifacts land. The phone may show optimistic UI; server truth stays metadata + pointers, not “every pixel is final.”

Tradeoffs to articulate:

| Decision | If you optimize for… | You usually accept… |
| --- | --- | --- |
| Resumable / chunked upload | Flaky mobile networks and large videos | More moving parts (session IDs, part completion, cleanup of abandoned uploads) |
| HTTP success when metadata + blob are durable (vs when every ladder exists) | Fast “posted” feeling; low post latency and stable APIs | Processing states in the API—clients must render pending / partial / failed honestly; users may play before all bitrates exist (ABR, lower rungs first) |
| Many ladder rungs + renditions | Playback quality and smooth ABR | Storage, transcode CPU, and cache cardinality (more URLs to purge and warm) |
| Fan-out / visibility in the upload request | Immediate visibility in followers’ lists | Spiky load and tighter coupling to posting—most designs enqueue fan-out after the row is durable |
| Synchronous transcode on critical path | Simpler mental model for “ready” | Terrible tail latency and fragile post ACK—strong interviewers treat this as a mistake at scale |

Interview line: Separate “can we show something?” (poster frame, low rung) from “is the full product-quality ladder done?”—and put only the first class of work anywhere near the user-visible ACK if you can avoid it.

Feed assembly (push / pull / hybrid)

The core tension is where you pay cost: on publish (writes to many mailboxes) or on read (merge many authors into one timeline).

Push (fan-out on write). When someone posts, workers append post IDs to each follower’s mailbox (possibly sharded). Pros: Feed reads are simple and fast—slice a precomputed list, then hydrate. Cons: Write amplification scales with followers; a celebrity post can mean millions of writes or queue depth unless you cap, sample, or refuse naive full fan-out. Staleness in mailboxes is real; teams often filter at read (blocks, deletes) even when IDs were pushed earlier.

Pull (merge on read). At scroll time, fetch recent post IDs from each followed account (or from author-indexed tails) and merge by time or score. Pros: No per-follower write for mega-creators—publish cost stays O(1) with respect to audience size at post time. Cons: Read path gets heavier as fan-in grows (many follows → big merge); hot authors and cache misses show up as p99 pain; you need indexes, limits, and caching of author tails.

Hybrid (what production-shaped answers describe). Push for the long tail; merge or partial fan-out for the head (celebrities). Pros: Balances read latency and write blast radius. Cons: Two code paths and a policy layer (who counts as “big,” how to dedupe when both paths contribute IDs). The tradeoff is operational complexity vs impossible pure-push or pure-pull at both extremes.
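The hybrid read can be sketched as a k-way merge: the precomputed mailbox covers long-tail authors, and the recent tails of followed mega-creators are merged in at read time, newest first, with dedupe because both paths may contribute the same ID. Data shapes here are assumptions (timestamped `(ts, post_id)` pairs); `heapq.merge` does the heavy lifting.

```python
# Merge-on-read sketch for the hybrid feed: mailbox + celebrity tails.
import heapq

def merged_candidates(mailbox_ids, celebrity_tails, limit):
    """mailbox_ids: [(ts, post_id)] newest-first, already fanned out.
    celebrity_tails: list of [(ts, post_id)] newest-first, one per big author."""
    streams = [mailbox_ids] + celebrity_tails
    seen, out = set(), []
    # k-way merge of descending-timestamp streams, still descending overall
    for ts, post_id in heapq.merge(*streams, key=lambda x: x[0], reverse=True):
        if post_id not in seen:      # dedupe: push and pull may both supply an ID
            seen.add(post_id)
            out.append(post_id)
            if len(out) >= limit:    # bounded work: stop at the candidate cap
                break
    return out
```

The `limit` argument is the same work budget the feed service enforces everywhere: the merge stops as soon as the page's candidate cap is met, regardless of how many accounts the viewer follows.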

Cross-cutting tradeoffs (say them out loud):

  • Mailbox size / retention: Unlimited per-user lists simplify reads until storage and compaction hurt; trimming and pagination semantics must stay consistent with cursors.
  • Ordering: Strict time order across all sources is easy to explain; algorithmic order usually needs candidate generation separate from final sort (see Ranking)—mixing the two without naming budgets fails follow-ups.

Stories

Stories share DNA with feed posts (same graph, often same media pipeline) but differ in product rules—most notably short lifetime.

Tradeoffs:

| Topic | Typical choice | Tradeoff |
| --- | --- | --- |
| Lifetime | Fixed TTL (e.g. 24h) | Clock skew and “what counts as viewed” are edge cases; sweepers vs lazy delete change ops load |
| Storage model | Separate Story rows / buckets vs same posts table with type=story | Separate keeps queries and TTL jobs simpler; unified reduces duplication but risks wrong retention or ranking if you forget flags |
| Cleanup | Batch sweeper + TTL indexes | Missed sweeper runs or backlog → orphaned blobs unless object lifecycle matches metadata TTL |
| Ranking / surfacing | Often different rules than main feed (close friends, reactions) | Reusing the same ranker without inputs for ephemeral context produces wrong ordering—say different features or caps, not only “another feed” |
| Seen / replay state | High write volume if you persist every view | Sampling, batching, or eventual sync vs perfect per-tile state—cost vs fidelity |

Interview line: Stories are not “feed with a flag”—they are time-bounded inventory with expiry semantics and usually different notification and ranking expectations.

Ranking

Treat ranking as a contract, not a magic box: candidates in (bounded list of IDs), ordered list out, under latency and safety constraints.

Structural tradeoff: candidate generation vs scoring. Generating candidates (who might appear) is often cheaper and more stable than scoring every post in the world. Scoring can use heavy models and features. If you conflate the two, every feed fetch becomes “run the full recommender on everything”—p95 dies. Strong answers cap candidates, then rank.

Quality vs latency. Deeper models and more online features improve engagement; they also add dependencies and tail latency. The usual mitigation is strict timeouts, fallback to recency or a lighter model, and caching of scores or partial results where product allows.

Product modes (e.g. Following vs algorithmic Home). You do not need two databases for credibility—you need a clear story: different candidate sources (only follows vs follows + exploration), different weights, or different freshness SLAs. Chronological surfaces may skip heavy ranking entirely; Home might blend sources—name caps and timeouts either way.

Pagination under changing rank. If order moves between requests, offset pagination lies; cursors tied to (score, id) or opaque tokens are standard, with honest dup/skip behavior. Trading perfect stability for simplicity is a valid product discussion—pretending strong consistency without cost is not.
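A (score, id) cursor can be made concrete in a few lines. The base64+JSON token here is illustrative (production tokens are usually signed or encrypted); the property worth narrating is that ties on score break on id, so colliding scores never skip or repeat an item, while rank moving between requests still produces the honest dup/skip behavior described above.

```python
# Opaque (score, id) cursor sketch: the client echoes the token back and the
# server resumes strictly after that pair. Token format is an assumption.
import base64
import json

def encode_cursor(score, post_id):
    raw = json.dumps({"s": score, "id": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(token):
    data = json.loads(base64.urlsafe_b64decode(token))
    return data["s"], data["id"]

def page_after(items, cursor, page_size):
    """items: [(score, post_id)] sorted descending by (score, post_id)."""
    if cursor:
        after = decode_cursor(cursor)
        # resume strictly after the cursor pair; id breaks score ties
        items = [it for it in items if it < after]
    page = items[:page_size]
    next_cur = encode_cursor(*page[-1]) if page else None
    return page, next_cur
```

Because the token is opaque, the server is free to change what it embeds (ranker state, snapshot hints) without a client release.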

Interview line: Say what happens at the ranker budget (e.g. 300ms)—timeout, fallback order, and whether the client refetches—before you argue TensorFlow vs PyTorch.
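That budget contract can be shown in code. This sketch runs a hypothetical heavy ranker in a worker thread with a hard deadline and falls back to recency on a miss; `heavy_rank` is whatever model call your design names, and the thread pool is a simplification of a real RPC deadline.

```python
# Ranker-with-budget sketch: hard timeout, recency fallback, and a mode tag
# so the caller (and your metrics) know which path served the page.
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)

def rank_with_budget(candidates, heavy_rank, budget_s=0.3):
    """candidates: post dicts with 'created_at'. heavy_rank: callable that
    returns the candidates in model order (a stand-in for the ranker RPC)."""
    future = _pool.submit(heavy_rank, candidates)
    try:
        return future.result(timeout=budget_s), "ranked"
    except FutureTimeout:
        future.cancel()  # best effort; a late result is simply ignored
        fallback = sorted(candidates, key=lambda p: p["created_at"], reverse=True)
        return fallback, "recency_fallback"
```

Emitting the mode alongside the page is the observable half of the story: a spike in `recency_fallback` is your signal that the ranker is missing its SLA before users complain.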

Detailed design

Write path (post + media)

  1. Client obtains upload URLs or streams chunks to storage.
  2. Post API creates row: author, caption, media keys, timestamps, visibility, processing flags.
  3. Enqueue transcode/thumbnail jobs; do not block HTTP until all ladders exist.
  4. Fan-out or enqueue fan-out when the post is visible per product rules (sometimes after basic processing).
  5. Idempotent workers write (viewer_shard, post_id) with dedupe keys.

Side effects—search index, notifications, Story fan-out—trail the critical path.
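Step 5's idempotent mailbox write deserves one concrete shape, because it is the reason at-least-once delivery is safe. The in-memory dict below stands in for a wide-column mailbox store keyed by viewer; the dedupe rule is simply "the post ID is the key."

```python
# Idempotent fan-out worker sketch: a redelivered (post_id, followers) message
# is a no-op because mailbox entries are keyed by post_id per viewer.
MAILBOXES = {}   # viewer_id -> {post_id: inserted_at}; stand-in for real storage

def fan_out(post_id, author_id, followers, now):
    appended = 0
    for viewer_id in followers:
        box = MAILBOXES.setdefault(viewer_id, {})
        if post_id in box:       # dedupe key present: queue redelivery, skip
            continue
        box[post_id] = now
        appended += 1
    return appended              # useful as a metric: real vs duplicate work
```

In a real store the same idea is a conditional insert or a primary key on (viewer_shard, viewer_id, post_id); the worker can then be retried freely.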

Read path (home feed)

  1. Resolve viewer and session context (experiments, mode).
  2. Load candidate IDs (mailbox slice + merge + optional exploration—if in scope).
  3. Batch fetch post metadata and authors; filter blocks and policy.
  4. Resolve media: pick variant (thumb, preview, HLS manifest URL) per client capability.
  5. Rank within budget; trim to page size; emit cursor.

If you only say “we call ranking,” say what happens at 300ms: timeout, cache last good order, or a recency slice.

Data schema, storage, and caching (Redis, DBs, blobs)

At scale there is no single database that wins every access pattern. You separate transactional metadata (posts, users), graph edges (follows, blocks), high-volume read lists (per-viewer candidate IDs), blobs (object storage), and ephemeral / hot data (caches, Redis). The interview goal is to name the pattern and why—not memorize one vendor.

What lives where (logical split)

| Kind of data | Typical home | Why |
| --- | --- | --- |
| Post & user records | Relational DB (PostgreSQL, MySQL, or sharded SQL) or document store | ACID-ish create/update for posts; indexes on author_id, created_at; constraints and migrations teams know how to operate |
| Per-viewer feed candidates (ordered post IDs) | Wide-column (Cassandra, Scylla), key-value with range reads, or Redis structures (see below)—often sharded by viewer_id | Huge cardinality (one row/stream per user); append on fan-out; range scan for “next page” |
| Social graph (follows, blocks) | Graph DB or edges table + heavy cache | Adjacency queries (“who does X follow?”); blocks must be correct—often read-time checks with cached sets |
| Media blobs | Object storage (S3, GCS, R2, MinIO) | Bytes don’t belong in OLTP rows; lifecycle rules for TTL Story objects |
| Hot read path | Redis (or Memcached), CDN for URLs | Sub-ms reads for repeated keys; no strong durability requirement for cache |
| Async work | Queues (Kafka, SQS, RabbitMQ, etc.) | Fan-out, transcode jobs, notifications—backpressure and replay |

Example schemas (illustrative—not one true DDL)

Posts (relational or document collection)—source of truth for metadata and pointers to media:

  • post_id (PK), author_id, caption, visibility, created_at, updated_at
  • media_processing (pending | partial | ready | failed)
  • media_manifest — JSON or columns: original object key, ladder entries (key per variant, width, codec, bitrate), poster key, thumbnail key

Users / profiles:

  • user_id (PK), username, display_name, avatar_url or avatar_object_key, is_private, …

Graph edges (many designs use follows + blocks tables):

  • follower_id, followee_id, created_at — composite PK or unique index; shard by follower_id or followee_id depending on query pattern
  • blocker_id, blocked_id, created_at

Mailbox / timeline (per viewer)—store IDs, not full posts:

  • Key: viewer_id + shard; ordered list: post_id, optional inserted_at or score for ranking
  • Compaction / trim old tail for storage; cursor must align with how you slice this structure

Stories (often separate table or type + TTL column):

  • Same media pointers as posts plus expires_at, story_kind, seen state (or separate seen store if write volume is high)

Sessions / devices (optional dedicated table or auth provider):

  • user_id, device_id, refresh_token hash, expires_at

Object storage layout

  • Prefix by post_id or author_id/post_id so lifecycle and permission deletes can bulk-prefix delete when needed.
  • Originals vs derived (thumbnails, ladder segments): separate keys or folders so transcode failures can retry per artifact.
  • CDN URLs either embed the key path or use signed URLs with short TTL; purge / invalidate on bad encode or takedown.
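The prefix discipline above is worth one tiny sketch, because it is what makes takedowns a bulk prefix delete and lets a failed transcode retry a single artifact. The path shapes here are hypothetical, not a known production convention.

```python
# Object-key layout sketch: originals and derived artifacts share a post
# prefix so lifecycle rules and takedowns operate on one prefix.
def original_key(author_id, post_id):
    return f"media/{author_id}/{post_id}/original.mp4"

def variant_key(author_id, post_id, variant):
    # variant examples: "720p.m3u8", "thumb.jpg", "poster.jpg"
    return f"media/{author_id}/{post_id}/derived/{variant}"

def takedown_prefix(author_id, post_id):
    # delete-by-prefix removes original + every derived artifact at once
    return f"media/{author_id}/{post_id}/"
```

A re-encode after a bad ladder rung overwrites one `variant_key`; a takedown lists and deletes everything under `takedown_prefix` while the CDN purge targets the same paths.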

Redis: what it is good for (and what to avoid)

Redis is often fast RAM with TTL—not a replacement for durable post storage unless you are very explicit about sync and durability (Redis persistence, AOF, or write-through to DB).

Strong fits:

  • Cache-aside for hot post metadata (GET post:{id}), author profiles, follow counts (with eventual consistency and invalidation on update)
  • Rate limiting and abuse tokens (sliding window counters per IP / user)
  • Short-lived session or feature flags for feed experiments (paired with a stable cursor contract)
  • Sorted sets or lists for small per-user tail caches (e.g. “last N post IDs for merge”)—bounded size, TTL, not the only copy of a 10M-follower mailbox
  • Distributed locks or coordination for fan-out idempotency keys (careful: TTL locks, fencing)
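The rate-limiting bullet maps to a very small algorithm. Against Redis this is INCR on a per-window key plus EXPIRE on first increment; the sketch below backs the same fixed-window shape with an in-memory dict so it is self-contained, which also makes the Redis translation obvious in the comments.

```python
# Fixed-window rate limiter sketch. In Redis the bucket key would be
# INCR-ed (e.g. "rl:{key}:{window}") with EXPIRE set on first increment;
# the dict here is a stand-in so the logic runs without a server.
import time

class WindowLimiter:
    def __init__(self, limit, window_s):
        self.limit = limit
        self.window_s = window_s
        self.counters = {}   # (key, window_index) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window_s))
        count = self.counters.get(bucket, 0) + 1
        self.counters[bucket] = count
        return count <= self.limit
```

Fixed windows allow a burst at the window boundary; if the interviewer pushes, name sliding-window counters or token buckets as the refinement rather than claiming this is airtight.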

Risky or “say it carefully” fits:

  • Entire mailbox for every user in Redis only — memory cost explodes; recovery after failover is hard; production systems often use Cassandra/wide-column or tiered storage for mailboxes, with Redis as cache layer on top
  • Primary store for fan-out without durability story — interviewers may push on what happens when Redis dies

Memcached vs Redis: Memcached is pure cache (simpler); Redis adds structures (sets, sorted sets, streams) and TTL—useful when you need rate limit + small structured state in one place.

Cassandra / wide-column (when interviewers mention it)

Good for: time-ordered wide rows—(viewer_id, bucket) → clustering columns (post_id, ts) or score; high write fan-out from fan-out workers; tunable consistency. Tradeoff: query model is rigid (design for access path first); operational learning curve.

Putting it together in one sentence

Postgres (or similar) holds authoritative posts and users; mailbox/feed storage holds IDs per viewer at scale; object storage holds media; Redis caches hot reads and powers rate limits and small ephemeral structures; queues move async work. Draw the arrows and say what is source of truth vs cache vs derived.

Key challenges

  • Egress and encoding cost: Video dominates bandwidth and CPU in the pipeline; ABR and CDN are not optional decorations.
  • Skew: One creator, huge follower count—naive fan-out is storage and queue blast radius; naive pull is read-time merge cost.
  • Pagination vs ranking: Order moves between requests; cursors and occasional dup/skip are normal—don’t promise offset consistency at scale.
  • Stories vs feed: Different retention, TTL jobs, and product rules; conflating them creates buggy cleanup and ranking.
  • Seen / replay: Updating every story view synchronously can write-storm; batching, sampling, or eventual sync is often the reality.
  • Transcode lag: Users still expect a feed; processing states and lower rungs must be first-class in the API contract.

Scaling the system

Shard mailboxes by viewer; shard post metadata by post id or author-time; CDN is how you scale bytes. For hot posts, use request coalescing (singleflight) and TTL jitter on metadata caches—thundering herd on a viral Reel ID is a classic failure mode.
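Request coalescing is easy to name and easy to get subtly wrong, so a sketch helps. The idea: concurrent cache misses for the same hot key elect one leader to hit the origin while followers wait on the same result. Class and method names are illustrative; real deployments often use a library for this (e.g. Go's singleflight).

```python
# Singleflight sketch: N concurrent misses for one hot post ID become one
# origin fetch. Followers block on the leader's Event and share its result.
import threading

class SingleFlight:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> (done_event, result_holder)

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._inflight[key] = entry
        event, holder = entry
        if leader:
            try:
                holder["value"] = fetch()      # exactly one origin call per key
            finally:
                with self._lock:
                    del self._inflight[key]    # next miss starts a fresh flight
                event.set()
        else:
            event.wait()                       # followers ride the leader's fetch
        return holder["value"]
```

Paired with TTL jitter on the cached value, this is the standard answer to "a viral Reel ID just expired from cache and 50k requests arrived in the same second."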

Multi-region: graph and media replication lag is real; eventual visibility across regions is more honest than strong global consistency for every mailbox row.

Bottlenecks and tradeoffs

Cost vs quality: More ladders and prefetch improve UX; every variant is storage + transcode dollars.

Freshness vs load: NRT indexing of new posts vs batch features for ranking—pick SLAs per surface.

Consistency vs latency: Strong cross-service consistency is rare; eventual mailboxes + read-time filters for safety are typical.

Failure handling

  • Ranker slow or down: Timeout; fall back to recency or last cached ranked slice; feature flags to disable heavy models.
  • Transcode backlog: Show a processing state or a lower bitrate rung; do not fail the whole feed; alert on queue age and the oldest pending job.
  • CDN / origin hot spot: Origin shield, request coalescing, regional failover; monitor origin load separately from API QPS.
  • Fan-out backlog: Reads still succeed from stale mailboxes + merge; visibility latency grows—product messaging beats silent failure.
  • Graph service degraded: Circuit break; for safety paths, fail closed where abuse risk is high; otherwise bounded retries.
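The first failure mode above—ranker slow or down—is worth one concrete sketch: a hard budget on the ranker call with a recency fallback. The budget value and helper names are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

RANK_BUDGET_S = 0.15  # illustrative hard budget for the ranker call

_pool = ThreadPoolExecutor(max_workers=8)

def ranked_or_recency(candidates, rank_fn):
    """Try the ranker within a hard budget; on timeout or error, fall back
    to recency order. `candidates` are (post_id, ts) pairs (assumed shape)."""
    future = _pool.submit(rank_fn, candidates)
    try:
        return future.result(timeout=RANK_BUDGET_S)
    except Exception:
        # Recency fallback: newest first, never a blank feed.
        return [pid for pid, ts in sorted(candidates, key=lambda c: -c[1])]
```

The point to say out loud: the fallback path must be cheap and always available, so the feed SLO survives a dead feature store.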

API design

Illustrative REST surface (GraphQL is fine if you discuss batching and N+1—feeds amplify bad graphs).

POST   /v1/media/uploads              # or /uploads:init + chunked parts
POST   /v1/posts
GET    /v1/feed
GET    /v1/posts/{post_id}
GET    /v1/stories/reel               # or nested under users
POST   /v1/users/{user_id}/follow
DELETE /v1/users/{user_id}/follow
POST   /v1/posts/{post_id}/like       # async side effects OK
GET    /v1/posts/{post_id}/comments

Home feed — GET /v1/feed

Query params and their roles:

  • cursor: Opaque continuation; may embed ranker state
  • limit: Page size (cap server-side, e.g. ≤ 20–50)
  • mode: e.g. following vs for_you if the product has both
  • session_id: Experiments / snapshot behavior (optional)

Response sketch: { "items": [ Post ], "next_cursor": "...", "has_more": true }. Each Post includes media with a status (ready/processing), urls or playback descriptors per variant, counts, and an author summary.

Errors: 401 / 429 with Retry-After; avoid leaking blocked user existence if policy requires.

Upload — POST /v1/media/uploads (pattern)

  • Return upload session id and part URLs or signed PUT targets.
  • Idempotency-Key on POST /v1/posts so flaky networks don’t duplicate posts.
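The Idempotency-Key pattern fits in a few lines. A toy in-memory version—in production the key map lives in a shared store with a TTL, and the ID scheme below is made up:

```python
posts = {}          # post_id -> post row (stand-in for the posts table)
idempotency = {}    # Idempotency-Key value -> post_id already created

def create_post(idem_key: str, author_id: int, media_id: str) -> str:
    """Idempotent POST /v1/posts: a retried request with the same
    Idempotency-Key returns the original post instead of a duplicate."""
    if idem_key in idempotency:
        return idempotency[idem_key]
    post_id = f"post-{len(posts) + 1}"
    posts[post_id] = {"author_id": author_id, "media_id": media_id,
                      "state": "processing"}
    idempotency[idem_key] = post_id
    return post_id
```

The client generates the key once per logical create and reuses it on every retry; the server's lookup-then-create must itself be atomic in the real store.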

Stories — read

Often a separate resource with TTL hints in the response; the client hides expired items locally, but the server is the source of truth for ordering and privacy.
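"Server is the source of truth" reduces to one filter at read time; client and CDN caches only hide what the server has already declared expired. A sketch with an assumed row shape:

```python
STORY_TTL_S = 24 * 3600

def visible_stories(rows, now_s):
    """Server-side expiry filter: authoritative regardless of client or CDN
    cache state. `rows` are dicts with a posted_at epoch (assumed shape)."""
    cutoff = now_s - STORY_TTL_S
    return [r for r in rows if r["posted_at"] > cutoff]
```

Sweep jobs then reclaim storage asynchronously; they never decide visibility.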

Cross-cutting

  • Rate limits on upload and create to fight abuse.
  • Version mobile clients; backward-compatible field adds for media states.

In the room:

One minute on idempotent post create, cursor semantics for GET /v1/feed, and why likes don’t own feed p95—that’s enough to sound like you shipped the APIs.

Production angles

These are the postmortem stories—queues, timeouts, and caches under real skew.

Transcode backlog while posts “succeed”

What it looks like — Upload finishes; post returns 201; users see processing spinners or soft placeholders while queue depth grows after a viral event or bad deploy to workers.

Why it happens — Transcode capacity is finite; fan-out and the feed still work off metadata, but playback quality lags behind.

What good teams do — Alert on oldest job age and SLA per priority lane; scale workers or shed non-critical encodes; serve lower rungs first; feature flag to relax quality temporarily.

Ranker regression or slow feature store

What it looks like — p95 for feed spikes; mobile spinners; thread pools wedged if the feed service blocks too long on ranking.

Why it hurts — Ranker sits on the hot path unless you isolate it with strict timeouts.

What good teams do — Hard timeouts, circuit breakers, fallback ordering; separate SLOs for feed loads vs perfect ranking; kill switches for heavy models.

CDN / origin meltdown on one object

What it looks like — One Reel goes viral; every feed includes it; edge is fine until cache miss aligns; origin or shield collapses under duplicate fetches.

Why it happens — TTL alignment without jitter; no request coalescing for hot keys.

What good teams do — Singleflight at API or cache layer; stale-while-revalidate; prewarm for known launches; watch origin CPU and egress, not just 200 rate.
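Stale-while-revalidate is simpler than it sounds. A toy cache that refreshes inline—production would refresh in the background and combine this with request coalescing; the freshness window is an assumption:

```python
import time

class SWRCache:
    """Serve a stale entry immediately instead of blocking on the origin."""

    def __init__(self, fetch, fresh_s=30.0):
        self.fetch = fetch
        self.fresh_s = fresh_s
        self.store = {}  # key -> (value, fetched_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        hit = self.store.get(key)
        if hit and now - hit[1] < self.fresh_s:
            return hit[0]                 # fresh: no origin call
        if hit:
            stale_value = hit[0]          # stale: serve it, then refresh
            self.store[key] = (self.fetch(key), now)
            return stale_value
        value = self.fetch(key)           # cold miss: must hit the origin
        self.store[key] = (value, now)
        return value
```

The user-visible effect: a viral object's metadata is at worst one refresh window old, and the origin sees one fetch per window per node instead of one per request.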

Stories expiry and “ghost” content

What it looks like — Users see Stories that should be gone—or miss new ones—because sweeper lagged or client cache is stale.

Why it happens — TTL is distributed across storage, CDN, and clients; clock skew and regional replication add fuzz.

What good teams do — Server authoritative expiry in API; short CDN TTL for story manifests; sweeps with metrics on orphan segments; honest client refresh rules.


How to use this in an interview — You don’t need to memorize every subsection. You do need one concrete story: e.g. transcode queue age + mitigation, or ranker timeout + fallback, or CDN stampede + singleflight. That’s how you sound like you operated the system, not only drew it.

What interviewers expect

  • Scope first: home vs Stories vs Explore; ranking vs chronological; what “in scope” means for a 40-minute loop.
  • Upload path: resumable/chunked upload to object storage; durable post row; async transcode, poster frames, ABR ladders; client success without waiting for every bitrate.
  • Read path: assemble candidate post IDs (mailbox / merge / hybrid); batch metadata; attach CDN URLs for the right variant (thumb, preview, HLS segment); bound ranker time and candidate count.
  • Storage split: graph service; post metadata; object store for blobs; CDN for delivery; caches with explicit TTL and invalidation story.
  • Skew / celebrity: push for long tail, merge-on-read or partial materialization for the head—say why (queue depth, storage, blast radius).
  • Stories: 24h expiry, separate ranking hooks or surfaces, sweep jobs; don’t pretend they’re “just another post” without TTL semantics.
  • Ranking as a contract: candidate generation vs scoring; timeouts; recency fallback; experimentation and safety as inputs—not a black-box “ML” square.
  • Pagination: opaque or structured cursors; duplicates/skips when order moves; no fake strong consistency.
  • Failure and degradation: ranker slow → fallback ordering; transcode backlog → lower ladder or placeholder; CDN miss → shield origin; graph cache miss → bounded extra lookups.
  • Ops-shaped details: idempotent fan-out workers, backpressure on queues, dashboards on transcode age, fan-out lag, and feed p95 vs rank p95.

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • How is this different from a text-heavy Twitter or Facebook feed?
  • Where does transcoding run, and what does the client wait for?
  • How do you handle a creator with tens of millions of followers?
  • How do Stories expiry and cleanup interact with the main feed?
  • What breaks if CDN or transcode falls behind—does the app go blank?

Deep-dive questions and strong answer outlines

Walk through what happens when a user posts a 60-second Reel.

Client uploads via resumable/chunked transfer to object storage; API persists post metadata with processing state. Transcoder workers build ladders, thumbnails, and previews asynchronously. Success returns when metadata is durable and upload is complete—not when every bitrate exists. Fan-out to mailboxes or enqueue jobs after the post row exists; idempotent (post_id, viewer_shard) writes.
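The "idempotent writes" claim at the end is the part worth making concrete: a fan-out job redelivered after a worker crash must not duplicate mailbox rows. A toy sketch where a set stands in for an upsert-style conditional write:

```python
from collections import defaultdict

mailbox = defaultdict(set)  # viewer_id -> {(ts, post_id)}; the set makes
                            # redelivered jobs naturally idempotent

def fan_out(post_id: str, ts: int, follower_ids) -> None:
    """Deliver one post to every follower's mailbox. Safe to re-run after a
    crash and queue redelivery. Names and shapes are illustrative."""
    for viewer_id in follower_ids:
        mailbox[viewer_id].add((ts, post_id))
```

In a real wide-column store the same property comes from writing the full primary key each time, so replays overwrite rather than append.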

How does the home feed read path differ from “SELECT * FROM posts JOIN follows”?

Resolve candidate IDs from precomputed storage and/or merge from followed authors; multi-get post rows and author profiles; attach CDN-signed URLs or stable keys per variant. Apply blocks and policy; rank within a time budget. No giant SQL join on every swipe.
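That read path can be sketched end to end. `get_posts_batch` and `variant_url` are assumed helpers standing in for the metadata multi-get and CDN URL signing:

```python
def assemble_feed(candidate_ids, get_posts_batch, variant_url):
    """Candidate IDs -> one batched metadata fetch -> per-variant CDN URLs."""
    rows = get_posts_batch(candidate_ids)        # single multi-get, no N+1
    items = []
    for pid in candidate_ids:
        row = rows.get(pid)
        if row is None:
            continue                             # deleted/filtered: skip, don't fail
        items.append({
            "post_id": pid,
            "author": row["author"],
            "media": {
                "status": row["status"],
                "thumb": variant_url(pid, "thumb"),
                "playback": variant_url(pid, "hls")
                            if row["status"] == "ready" else None,
            },
        })
    return items
```

Note the shape of the contract: a post still transcoding appears with a thumbnail and a processing status rather than being dropped, and a missing row degrades one item, not the page.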

How do you handle Stories vs feed posts in storage and API?

Often separate TTL’d records or buckets, 24h sweep, different notification and ranking hooks. May share graph and identity; retention and read APIs differ—state that explicitly.

What about flaky mobile networks?

ABR for video, smaller thumbs first, resumable upload, stale-while-revalidate for metadata. Feed should still return skeleton posts if media is temporarily unavailable—degrade gracefully.

How do you paginate when ranking shifts between requests?

Cursor from ranker or (score, post_id); accept rare dup/skip; avoid offset/limit at scale. Optional session snapshot if product demands stability—trade memory and complexity.
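A (score, post_id) cursor is a few lines once you commit to opaque encoding. Base64-JSON is one illustrative choice; real systems may also sign the cursor so clients can't tamper with it:

```python
import base64
import json

def encode_cursor(score: float, post_id: str) -> str:
    """Opaque, URL-safe (score, post_id) continuation token."""
    raw = json.dumps([score, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()

def page_after(items, cursor, limit):
    """`items` are (score, post_id) sorted by (score desc, post_id desc).
    Returns one page strictly after the cursor, plus the next cursor."""
    if cursor:
        score, pid = json.loads(base64.urlsafe_b64decode(cursor))
        items = [it for it in items if it < (score, pid)]
    page = items[:limit]
    next_cursor = encode_cursor(*page[-1]) if len(page) == limit else None
    return page, next_cursor
```

Because the cursor pins a position rather than an offset, a post whose score moved between requests can appear twice or not at all—say that out loud instead of promising offset consistency.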

Production angles

  • Viral Reel: one object key drives huge CDN and origin traffic—shield origin and watch hot-key metrics, not just API QPS.
  • Transcode backlog grows during incidents; new posts show lower quality or processing state until queue age recovers.
  • Regional edge partial outage; clients fall back to another POP or degraded bitrate with honest UX.

AI feedback on your design

After a practice session, InterviewCrafted summarizes strengths, gaps, and interviewer-style expectations—similar to a written debrief. See a static example report, then practice this problem to get feedback on your own answer.

FAQs

Q: Do I need to design Explore and recommendations?

A: Usually no unless the prompt expands. Stub Explore as a separate candidate source and ranker; spend your time on home + media + graph unless they steer you.

Q: Is E2E encryption part of the feed design?

A: Public posts are server-visible for ranking and safety. DMs are another system. Don’t derail on crypto unless asked.

Q: How deep should I go on codecs?

A: Name ABR, ladders, CDN, thumbnails, async pipeline—not GOP tables. Interviewers want ownership of the pipeline, not ffmpeg trivia.

Q: Should comments and likes block feed latency?

A: Typically no—eventual counts, write-behind, idempotent likes. Keep scroll path thin.

Practice interactively

Open the practice session to use the canvas and stages, then review AI feedback.