System design interview guide
Design Dropbox
TL;DR: Users expect folders to **stay in sync** across laptops and phones: edits should converge, **dedup** should save money, and **sharing** should not turn into a permissions nightmare. The interview stresses **metadata** scale, **conflict** handling when two devices edit offline, and **bandwidth**—not “we store files in S3” without a sync story.
Problem statement
You’re designing cloud file storage with sync across devices: upload, download, folders, sharing, version history, offline access with later reconciliation, and high-level search.
Constraints. Functionally: hierarchical files, multi-device sync, share links and permissions, versions. Non-functionally: sync latency in seconds for small changes, high durability for blobs, availability for metadata APIs. Scale: hundreds of millions of users, billions of files, petabytes—metadata QPS and egress dominate planning.
Core story: content-addressed chunks + metadata journal + efficient sync + explicit conflict policy.
Introduction
Dropbox-class systems are two-layer: a metadata plane (trees, versions, ACLs, journal) and a blob plane (content-addressed, cheap, replicated object storage). Interviewers watch whether you chunk and dedupe before you talk about databases, and whether you separate “what changed in the tree” from “where the bytes live.”
Weak answers stop at “S3 for files, Postgres for metadata.” Strong answers walk sync (cursor, conflicts), sharing (ACL on every read), and GC (chunk refcount after delete).
How to approach
Clarify real-time co-editing vs file sync—scope changes the answer. Sketch namespace per user or team, change journal with monotonic seq, chunk upload with content hash, then conflict policy. Search is async—mention after core sync.
Interview tips
- Content addressing: Chunks named by hash—dedup across users and versions; refcount for GC.
- Journal beats mtime: Relying only on timestamps loses when clocks skew—server sequence + client vector is the grown-up story.
- Sharing: Effective permission = ACL + inheritance; cache per-user permission views with TTL; revoke must invalidate quickly.
- Bandwidth: Delta sync for large files where product allows—rsync-style rolling hash; block-level diff—not always in scope for 45 minutes but name it.
- Listing huge folders: Pagination, cursor, lazy expansion—never return 1M children in one response.
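The content-addressing idea above can be sketched in a few lines. This is a toy sketch, not Dropbox’s actual implementation: fixed-size chunks and an in-memory dict stand in for real chunking and object storage, and all names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB fixed chunks (an assumed size)

def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a blob into fixed-size chunks and name each by its SHA-256."""
    return [
        "sha256:" + hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

class ChunkStore:
    """Toy content-addressed store: identical chunks are stored once."""
    def __init__(self):
        self.chunks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> list[str]:
        # Server-side dedup check: only these hashes need uploading.
        return [h for h in hashes if h not in self.chunks]

    def put(self, data: bytes) -> str:
        h = "sha256:" + hashlib.sha256(data).hexdigest()
        self.chunks[h] = data  # idempotent: same bytes -> same key
        return h
```

Because chunks are keyed by hash, a second user uploading the same file finds `missing()` empty and uploads nothing—this is the dedup win, and also why GC needs refcounts.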
Capacity estimation
| Topic | Angle |
|---|---|
| Metadata | Many small rows; shard by user_id / team_id / namespace_id |
| Blobs | EB-scale; egress cost dominates; CDN for shared links |
| Sync ops | Often exceed raw uploads—notification path must be cheap |
| Dedup ratio | Enterprise saves storage—refcount correctness must hold |
Implications: Metadata QPS drives sharding; hot folders need rate limits and efficient fan-out of change notifications.
High-level architecture
Desktop/mobile clients keep a local mirror + journal cursor. Metadata service (strongly consistent per shard) owns paths, file versions, chunk lists, sharing edges. Blob storage (S3-class) stores immutable chunks by hash. Notification service tells clients “something changed” so they pull delta from metadata API. Search indexer consumes async events—off hot path.
Who owns what:
- Metadata API — Authoritative tree; transactional commit of new version; lists changes since cursor.
- Chunk gateway — Pre-signed uploads; verifies hash on complete; dedup skip if chunk exists.
- Object store — Durability; lifecycle to cold tier; encryption at rest.
- ACL service — Resolves share → effective access; cached with careful invalidation.
[ Client A ] ──sync──► [ Metadata svc ] ──► [ Metadata DB / shard ]
| |
| 1) list changes +-- journal (seq, ops)
| 2) get chunk hashes
v
[ Pre-signed GET/PUT ] ──► [ Object store (chunks by hash) ]
▲
|
[ Client B ] ◄──notify── [ Notification svc ] (WebSocket / FCM)
In the room: Say async clearly: commit returns fast; search index and thumbnails trail; sync correctness does not wait on thumbnail.
Core design approaches
Metadata model
File: (namespace_id, path) → file_id → version → ordered chunk hashes + size + content hash of whole file (optional).
Folder: Tree node with children listing or materialized path table—trade join cost vs rename cost.
Sync strategies
Polling: Simple; wasteful at scale—use cursor or long poll.
Push notification: “Something changed in namespace” → client pulls delta—scales better than pushing full trees.
Conflict
LWW with server timestamp + conflict file foo (conflicted copy).txt—honest UX.
CRDT for text—only if real-time co-editing in scope—heavy.
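The LWW-plus-conflict-copy policy can be made concrete with a small sketch (the naming convention mirrors the Dropbox-style UX above; the helper functions are hypothetical):

```python
import os

def conflicted_copy_name(path: str, device: str) -> str:
    """Name for the losing side of an offline conflict:
    /docs/report.pdf -> /docs/report (laptop's conflicted copy).pdf"""
    root, ext = os.path.splitext(path)
    return f"{root} ({device}'s conflicted copy){ext}"

def resolve_lww(server_ts: float, client_ts: float) -> str:
    # Last-writer-wins on server timestamp; the loser is preserved as a
    # conflicted copy, never silently dropped.
    return "server" if server_ts >= client_ts else "client"
```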
Detailed design
Write path (upload new version)
- Client splits file into chunks, computes SHA-256 per chunk.
- Client requests upload for missing hashes (server returns skip for dedup).
- PUT chunks to object store with pre-signed URLs.
- Client calls commit: {path, parent_rev, chunk_list, client_rev}—server transactionally creates new version if parent matches; else conflict response.
Read path (another device sync)
- Client maintains last_seq for its namespace.
- GET /changes?since=seq returns ops: add/update/delete/move.
- For each updated file, fetch chunk list if not cached locally; download missing chunks.
Sharing
POST /share creates edge (resource, grantee, role); read path checks grantee + inheritance; public link is separate token with scoped permission.
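Effective-permission resolution with inheritance can be sketched as a walk up the path (a toy sketch; the role ordering and `acl` keyed by `(path, grantee)` are assumptions for illustration):

```python
import posixpath

ROLE_RANK = {"none": 0, "viewer": 1, "editor": 2, "owner": 3}  # assumed roles

def effective_role(acl: dict[tuple[str, str], str],
                   grantee: str, path: str) -> str:
    """Walk from the resource up toward the root; the nearest explicit grant
    wins, so a grant on /team flows down to /team/docs/x (inheritance)."""
    while True:
        role = acl.get((path, grantee))
        if role is not None:
            return role
        if path == "/":
            return "none"
        path = posixpath.dirname(path)
```

In production this lookup is cached per user with a TTL, and revoke must invalidate that cache quickly, as noted above.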
Key challenges
- Conflict detection: Optimistic concurrency with rev; merge is a product problem—technical answer is detect + surface.
- Dedup GC: Delete file decrements refcounts; sweep unreferenced chunks async—race with concurrent uploads—transaction or epoch GC.
- Rename/move at scale: Single metadata txn per namespace shard; hot dirs need care.
- Share link abuse: Rate limit; virus scan async; DLP enterprise—brief mention.
Scaling the system
- Shard metadata by namespace (user/team); avoid cross-shard moves or two-phase commit—design paths to stay in-shard when possible.
- Horizontal stateless metadata API with connection pooling to shards.
- Object store scales itself; hot chunks may need CDN for shared links.
- Rate limit sync list for noisy clients; bulkhead tenants.
Failure handling
| Failure | Effect | Mitigation |
|---|---|---|
| Partial chunk upload | Orphan multipart | Lifecycle rule deletes incomplete after TTL |
| Commit fails after upload | Orphan chunks | GC + idempotent commit retry |
| Client offline | Divergent edits | Conflict rules; server wins on authoritative fields |
| Metadata replica lag | Stale read of ACL | Read-your-writes via primary for security checks |
Degraded UX: Stale file list briefly; an outage means users cannot access files—metadata HA matters more than blob HA (blobs are already replicated).
API design
Chunk upload
| Method | Role |
|---|---|
| POST /v1/chunks:prepare | Returns URLs + upload session id |
| PUT to object store | Raw bytes |
| POST /v1/chunks:complete | {hash, size} finalize |
File commit
POST /v1/files:commit
{
"path": "/docs/report.pdf",
"parent_rev": "abc",
"chunks": ["sha256:...", "..."],
"client_rev": "uuid"
}
Change feed
| Param | Role |
|---|---|
| cursor | Last journal seq |
| limit | Max ops |
Diagram:
Client --GET /changes?cursor=...--> Metadata
|
+-- ops: file X v3, new chunk hashes [h1,h2]
|
Client --GET missing chunks--> Object store (parallel)
Errors: 409 Conflict on parent_rev mismatch—retry with merge UX; 507 Insufficient Storage (or an equivalent 4xx quota error) when the user is over quota.
Production angles
These are the failure modes that survive design reviews and still bite you in prod: sync is a distributed state machine stretched across flaky networks, clocks that disagree, and clients you do not fully control. The whiteboard shows “client polls, server returns journal”—operations lives in why the same file hash appears twice, why one laptop never catches up, and why storage bills grow while users think they deleted everything.
Sync loops, battery drain, and “my Mac won’t sleep”
What it looks like — Support sees a spike in “sync stuck” tickets. On-call notices metadata QPS from a small cohort of devices dwarfing healthy clients. Users report fans spinning; mobile telemetry shows wake locks held for hours. The service looks healthy—CPU on the sync tier is flat—because the damage is per-device churn and wasted work, not aggregate load.
Why it happens — Something keeps producing no-op or conflicting journal entries: a client bug re-applies the same patch, clock skew breaks ordering assumptions, or a partial write leaves the client convinced the server is “behind.” Without server-side dedupe of identical ops and per-device circuit breakers, one bad build can self-DDoS your metadata layer while returning 200s.
What good teams do — Treat hash verification on commit as non-negotiable for content-addressed chunks; rate-limit and backoff noisy device IDs; alert on ops per device per minute and journal tail lag per namespace, not only global QPS. Seniors push for kill switches per client version; juniors learn that “retry until success” without a max reconciliation budget is how you burn an entire fleet’s batteries.
Storage cost climbs while users “deleted” the files
What it looks like — Finance asks why object storage grew 30% quarter-over-quarter though MAU is flat. Legal asks why PII still exists for accounts marked deleted. Dashboards show refcount for deduped chunks not reaching zero, or GC jobs blocked behind compliance holds.
Why it happens — Content-addressed storage means chunks live until every reference graph edge disappears. Cross-user dedup, shared links, version history, and litigation holds each pin bytes independently. GC is eventually correct; billing is monthly and impatient.
What good teams do — Metrics on GC backlog, bytes pending delete, and oldest legal hold age; tiered retention by tenant SKU; explicit product switches for “disable cross-user dedup” where dedup is a privacy tradeoff. In interviews, naming refcount + hold + async GC beats pretending DELETE is instant.
Share links and large-file downloads feel slow far from “the” region
What it looks like — Latency to first byte for a shared asset is fine in us-east-1 and awful in APAC. Traceroute shows clients hitting an origin or a single-region bucket. CDN cache hit ratio is low because every share is a unique signed URL with a short TTL—correct for security, brutal for edge warmth.
Why it happens — You optimized the control plane (signing, ACL) and left data plane egress on one spine. Signed GET patterns defeat naive cache keys unless you architect URL canonicalization and key separation (object ID vs auth) deliberately.
What good teams do — Geo-replicated object storage or regional origins behind the same signer; range requests and transfer acceleration for huge files; separate SLOs for metadata vs bulk download. Seniors describe egress dollars per popular folder; juniors should at least separate “API fast” from “multi-GB blob fast.”
Backpressure: polling storms masquerading as success
What it looks like — /changes p99 creeps up during a client release; 429 rates tick up on older versions. Nothing is “down,” but commit latency for interactive saves rises because metadata is busy serving empty polls.
Why it happens — Mobile clients wake on push hints but still poll as backstop; a bug doubles poll frequency; long-poll misconfigured to short-poll. The server keeps answering 200 with “no updates,” which looks healthy on availability dashboards but wrecks capacity.
What good teams do — Adaptive polling with jitter; 429 + Retry-After for abusive fingerprints; conditional sync tokens so “nothing changed” is cheap. Measure journal lag, commit p99, dedup hit rate, and GC backlog together—when three move at once, you are in a sync incident, not a cache incident.
[ Many clients poll /changes ] → metadata QPS ↑
→ throttle noisy clients → 429 + Retry-After
How to use this in an interview — Pick one pain: sync loop, undeleted bytes, or regional egress. Name one metric that goes red first and one containment lever (circuit break, GC visibility, or CDN/signer split). That proves you have operated—or at least thought like someone who did.
Bottlenecks and tradeoffs
- Consistency vs availability: Strong per-namespace serializability is expensive cross-shard—scope moves carefully.
- Dedup vs privacy: Cross-user dedup reveals shared chunks—enterprise may disable—product tradeoff.
What interviewers expect
- Two planes: metadata (tree, versions, ACLs, change journal) vs blob (content-addressed chunks in object storage).
- Chunking + hashing: dedup via content hash; reference counting; GC of unreferenced chunks.
- Sync: journal of changes with cursor; long poll / WebSocket for notifications; delta or rsync-style where scoped.
- Conflicts: last-writer-wins + conflict copy, or branch files—state tradeoffs; vector clocks to detect concurrency.
- Sharing: ACL graph; every read path checks effective permission—cache with invalidation on share revoke.
- APIs: upload chunks, commit file version, list changes since, download by chunk id.
- Failure: partial upload resume; split-brain clients reconciled by server journal + policy.
Interview workflow (template)
- Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
- Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
- APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
- Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
- Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
- Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).
Frequently asked follow-ups
- How do you detect when a file changed without scanning everything?
- How does deduplication work?
- What happens when two people edit the same file offline?
- How do you sync only deltas for large files?
- Where does metadata live vs file bytes?
Deep-dive questions and strong answer outlines
Walk through uploading a new 2 GB file.
Client chunks file, hashes chunks, uploads missing chunks (content-addressed) with resumable API. Metadata record references chunk list; dedup skips existing chunks. Finalize file version atomically.
How does another device learn something changed?
Long poll, WebSocket, or periodic sync with cursor since last journal seq. Server pushes change feed per user/namespace; client applies ops locally.
How do you handle conflicting edits?
Last writer wins with conflict copy saved, or explicit branch files—state product choice. Vector clocks help detect concurrency; weak answers pretend timestamps always work.
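Vector-clock concurrency detection, mentioned in the answer above, fits in one function (a standard comparison, sketched in Python with per-device counters):

```python
def compare_vclocks(a: dict[str, int], b: dict[str, int]) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent'. Concurrent means
    neither device saw the other's edit—a real conflict that wall-clock
    timestamps cannot reliably detect."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```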
Production angles
- Garbage collection of unreferenced chunks after deletes—**async** sweeps; **delayed** delete for undo.
- Namespace migration or bad deploy—**version** metadata schema; backward-compatible clients.
- Hot folder with millions of children—**pagination** and **lazy** listing APIs.
AI feedback on your design
After a practice session, InterviewCrafted summarizes strengths, gaps, and interviewer-style expectations—similar to a written debrief. See a static example report, then practice this problem to get feedback on your own answer.
FAQs
Q: Is this the same as Google Drive?
A: Same family—office real-time editing (OT/CRDT) is heavier than classic Dropbox sync. If the prompt is file sync, don’t derail into collaborative Excel unless asked.
Q: Do I need to design full-text search?
A: Usually async indexing pipeline off metadata/events—off hot sync path. Mention privacy (customer holds keys) if relevant.
Q: How deep on encryption?
A: At-rest encryption in object store; in-transit TLS; optional client-side keys for enterprise—one layer deep unless they push.
Q: How do mobile clients save battery during sync?
A: Batch changes, push notifications for meaningful updates vs constant polling, delta sync, and defer heavy work on metered networks. Say you’d expose sync policy hooks—not one global poll interval.
Practice interactively
Open the practice session to use the canvas and stages, then review AI feedback.