System design pattern
Messaging / Chat
Design a reliable, low-latency chat system with ordering guarantees, delivery states, and real-time fan-out across devices.
How to Recognize This Pattern
- The prompt involves real-time conversation, online/offline users, and multi-device behavior.
- You hear delivery status, unread counts, and reconnect/replay requirements.
- Ordering, dedupe, and eventual consistency are all mentioned together.
- The main challenge is not storing text; it is reliable delivery under network churn.
Approach (Step-by-step)
This is where senior candidates show decision quality, not just component naming.
1. Clarify chat scope, conversation types, and message semantics first.
2. Design the durable write path with sequence assignment and idempotency keys.
3. Design the real-time delivery path for online users via gateway + pub-sub.
4. Design the offline path: queue notifications and cursor-based replay.
5. Define the delivery/read receipt state machine and source-of-truth updates.
6. Plan the sharding strategy by conversation/user and identify load hotspots.
7. Design failure handling for disconnects, retries, duplicates, and delayed acks.
8. Define observability for delivery latency, ack timeouts, queue lag, and reconnect success.
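Steps 2 and 7 hinge on the same mechanism: a per-conversation monotonic sequence plus a client-supplied idempotency key. A minimal in-memory sketch (the class and field names are illustrative, not a real storage API):

```python
import itertools
from dataclasses import dataclass


@dataclass
class Message:
    conversation_id: str
    seq: int
    sender_id: str
    body: str


class ConversationShard:
    """Toy stand-in for one durable conversation shard: assigns
    monotonic per-conversation sequence numbers and dedupes retried
    sends via client idempotency keys."""

    def __init__(self):
        self._seq = itertools.count(1)
        self._by_idem_key = {}  # idempotency_key -> Message
        self._log = []          # append-only, ordered by seq

    def append(self, idempotency_key, conversation_id, sender_id, body):
        # A retried send with the same key returns the original message
        # instead of creating a duplicate with a new sequence number.
        if idempotency_key in self._by_idem_key:
            return self._by_idem_key[idempotency_key]
        msg = Message(conversation_id, next(self._seq), sender_id, body)
        self._log.append(msg)
        self._by_idem_key[idempotency_key] = msg
        return msg
```

Because the sequence is assigned server-side at write time, replay and dedupe on reconnect reduce to simple comparisons against the last acknowledged `seq`.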
Key Trade-offs
Think of this as decision math: where does load move, what fails first, and which user experience are you willing to protect?
Strict ordering
Simplifies user expectation in a conversation, but coordination overhead rises under scale and multi-region writes.
Eventual local ordering
Improves availability and throughput, but users may briefly see reordering during reconnect.
Decision lens: Guarantee strict order per conversation shard; avoid unnecessary global ordering constraints.
Push-heavy fan-out
Great for low-latency online delivery, but can amplify load in large groups/channels.
Buffered/batched fan-out
Lower infrastructure pressure for very large audiences, but can slightly increase perceived freshness delay.
Decision lens: Use direct push for small chats and buffered fan-out for high-member conversations.
Strong receipt guarantees
Higher confidence in delivery/read state, but more write amplification and state sync complexity.
Best-effort receipts
Lower cost and simpler operations, but occasional receipt inconsistencies can confuse users.
Decision lens: Keep sent/delivered durable; treat read receipts with bounded eventual consistency.
Scale Realism (Numbers That Matter)
- Conversation distribution: Chat load is skewed by conversation type: most conversations are low-volume, but a few large groups/channels dominate fan-out traffic.
- Traffic profile: A plausible global profile is 40k-120k message writes/sec with read receipts/typing events often 2x-5x message volume.
- Latency target: Target p95 send-to-deliver under 250ms for online recipients, with deterministic replay for offline devices.
- Failure envelope: If the WebSocket disconnect rate spikes, queue lag exceeds 10s, or the ack-timeout ratio crosses 2%, trigger degraded mode and prioritize core message delivery.
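The failure envelope above can be expressed as a single predicate the control plane evaluates on each metrics tick. The 10s lag and 2% ack-timeout thresholds come from the bullet; the 5% disconnect-spike threshold is an assumed placeholder to tune against your own baseline:

```python
def degraded_mode(disconnect_rate, queue_lag_s, ack_timeout_ratio,
                  disconnect_spike=0.05):
    """Return True when the system should enter degraded mode and
    shed non-critical work to protect core message delivery.
    Thresholds are illustrative, not production-calibrated."""
    return (disconnect_rate > disconnect_spike
            or queue_lag_s > 10
            or ack_timeout_ratio > 0.02)
```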
Hybrid Switching Rules (Operational Logic)
These rules make hybrid strategy measurable and observable.
- Conversation-size rule: small chats use direct fan-out; very large channels use buffered/batched fan-out.
- Reconnect rule: clients request incremental replay from last acknowledged message cursor.
- Ack-timeout rule: if delivery ack timeout grows, reduce non-critical events (typing/presence) first.
- Backpressure rule: if queue lag crosses threshold, prioritize durable message events over read-receipt updates.
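The four rules combine naturally into one per-event delivery plan. A sketch, with all thresholds (the 1,000-member channel cutoff, 10s lag limit, 2% ack-timeout limit) as illustrative assumptions:

```python
def plan_delivery(member_count, queue_lag_s, ack_timeout_ratio,
                  large_channel=1000, lag_limit=10.0, ack_limit=0.02):
    """Hypothetical hybrid-switching policy: pick a fan-out mode by
    conversation size, and shed non-critical events first under
    ack-timeout pressure or queue backpressure."""
    return {
        # Conversation-size rule: buffered fan-out for very large channels.
        "fanout": "buffered" if member_count >= large_channel else "direct",
        # Ack-timeout rule: drop typing/presence before anything else.
        "send_typing_presence": ack_timeout_ratio <= ack_limit,
        # Backpressure rule: durable messages outrank read-receipt updates.
        "send_read_receipts": queue_lag_s <= lag_limit,
    }
```

Encoding the rules as a pure function of observed metrics is what makes the hybrid strategy measurable: every decision is reproducible from a metrics snapshot.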
Read Path Deep Dive
- Message ordering should be defined per conversation, not globally.
- Use monotonic sequence numbers per conversation shard to simplify replay and dedupe.
- Treat typing and presence as lossy, but message delivery as durable.
- Read receipts need explicit semantics to avoid cross-device confusion.
- Offline replay must be bounded and cursor-driven to prevent unbounded resync storms.
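Bounded, cursor-driven replay from the last point can be sketched in a few lines. Here `log` is assumed to be a list of messages sorted by `seq` (as the per-shard sequence numbers above guarantee), and `limit` caps each batch so a reconnect storm cannot request unbounded history:

```python
def replay(log, after_seq, limit=100):
    """Return up to `limit` messages with seq > after_seq, plus the
    cursor to pass on the next call. Clients loop until the batch
    comes back empty."""
    batch = [m for m in log if m["seq"] > after_seq][:limit]
    next_cursor = batch[-1]["seq"] if batch else after_seq
    return batch, next_cursor
```

Because the cursor is just the highest acknowledged sequence number, a client that crashes mid-replay simply resumes from its last persisted cursor with no duplicates and no gaps.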
Latency Budget Breakdown
Map each component to a concrete budget so p95 targets are enforceable.
| Component | Target (ms) | Why this budget |
|---|---|---|
| Gateway + auth | 15 | Session validation and routing lookup. |
| Durable write | 45 | Persist message and assign conversation sequence. |
| Fan-out dispatch | 70 | Push to online recipients and queue offline notifications. |
| Client ack processing | 35 | Track delivered/read states with idempotent updates. |
| Response + state emit | 20 | Return send ack and update sender timeline. |
Real-world Challenges
Reconnect replay storms
Network churn can trigger mass replay requests; without cursor windows and throttling, backend load spikes hard.
Duplicate delivery on retries
At-least-once transport creates duplicates unless idempotency and sequence-based dedupe are enforced.
Group fan-out hot shards
Large channels can overload single partitions if conversation sharding and batching are weak.
Receipt state drift
Multi-device updates can diverge if read-state merges are not deterministic.
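A deterministic merge for multi-device read state avoids the drift described above. One common shape (sketched here with hypothetical function names) keeps a monotonic per-device read cursor, so duplicated or out-of-order receipt updates converge to the same answer regardless of arrival order:

```python
def apply_receipt(state, device_id, read_seq):
    """Monotonic per-device update: a late or duplicated receipt can
    never move a device's read cursor backwards, which makes the
    merge order-independent (a simple max-register CRDT)."""
    state[device_id] = max(state.get(device_id, 0), read_seq)
    return state


def conversation_read_seq(state):
    # The user-level read position is the max across all devices.
    return max(state.values(), default=0)
```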
What Interviewers Expect
- You clearly define delivery semantics and what each state means.
- You show online delivery and offline replay flows end to end.
- You explain ordering guarantees and their scope.
- You design idempotency/dedupe for retries and reconnects.
- You discuss scaling strategy for large-group fan-out.
- You separate critical message path from non-critical presence/typing events.
Practice Problems
These practice sessions map directly to delivery and fan-out decisions. Start with one, then revisit this guide and evaluate where your design leaked latency, correctness, or cost.
- Design WhatsApp
Best direct practice for chat delivery, receipts, and reconnect design.
- Design Facebook Messenger
Good for fan-out and multi-device consistency decisions.
- Design Slack
Helps reason about channels, scale, and message durability trade-offs.
Architecture Overview
Read this section as a request journey: API receives intent, cache protects latency, database protects correctness, and queue protects the system during spikes. If one box fails, define how the next box keeps user impact limited.
API Layer
Handles auth, routing, and message submission while enforcing per-user and per-conversation limits.
Example: POST /messages writes message intent and returns a server-assigned message id.
Cache
Stores active session routing and recent conversation metadata to speed fan-out decisions.
Example: Online user-device map in Redis lets gateway route websocket delivery quickly.
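The routing map itself is a small data structure. A toy in-memory stand-in for the Redis user-to-device-to-gateway mapping (class and method names are illustrative):

```python
class RoutingMap:
    """Tracks which gateway node holds each of a user's live sockets,
    so the send path can target only the gateways that matter."""

    def __init__(self):
        self._devices = {}  # user_id -> {device_id: gateway_node}

    def connect(self, user_id, device_id, gateway_node):
        self._devices.setdefault(user_id, {})[device_id] = gateway_node

    def disconnect(self, user_id, device_id):
        self._devices.get(user_id, {}).pop(device_id, None)

    def gateways_for(self, user_id):
        # Deduplicate: deliver once per gateway node even when several
        # of the user's devices share a node; the node fans out locally.
        return set(self._devices.get(user_id, {}).values())
```

An empty result from `gateways_for` is the signal to take the offline path: queue a push notification and rely on cursor replay at reconnect.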
Database
Durably stores messages and receipt states with conversation-scoped ordering.
Example: Conversation shard table stores sequence-numbered messages for replay.
Queue
Buffers delivery tasks and offline notifications with retry and dead-letter handling.
Example: If recipient is offline, delivery event is queued for push notification and later replay.
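The retry-plus-dead-letter behavior can be sketched as a worker loop. This is a minimal illustration, not a real broker API: `attempt_delivery` stands in for whatever transport call the worker makes:

```python
class DeliveryQueue:
    """Bounded-retry delivery worker: after max_attempts failures the
    task moves to a dead-letter list for inspection instead of
    retrying forever and amplifying load."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.dead_letter = []

    def process(self, task, attempt_delivery):
        for _ in range(self.max_attempts):
            if attempt_delivery(task):
                return True
        self.dead_letter.append(task)
        return False
```

Paired with the idempotency keys on the write path, retries here are safe: a duplicate delivery attempt cannot create a duplicate message.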
Architecture Diagrams
Visual flows below show where latency is paid and where load is absorbed. Use them as memory anchors in interviews.
Write Path (Send Message)
Persist first, then fan out for reliable delivery semantics.
Read/Replay Path (Reconnect)
Resume from last acknowledged cursor to avoid duplicates and gaps.
Design Evolution (v1 → v3)
v1: Reliable core messaging
Build now
- 1:1 and small-group durable messaging
- WebSocket delivery with reconnect replay
- Basic sent/delivered/read states
Avoid for now
- Complex cross-region active-active ordering logic
- Heavy media pipeline in same critical path
v2: Scale fan-out and operations
Build now
- Conversation sharding and better load distribution
- Queue-prioritized delivery and retry controls
- Improved offline sync and multi-device consistency
Avoid for now
- Overloading delivery path with non-critical real-time signals
- Unbounded replay windows without cursor policies
v3: Advanced chat experiences
Build now
- Large-channel optimization and batching
- Regional routing optimization with resilience controls
- Richer moderation and compliance workflows
Avoid for now
- Feature sprawl that weakens core delivery SLOs
- Tight coupling between messaging core and peripheral services
What Not to Build Initially
Strong system design is also about disciplined scope control.
- Do not ship complex end-to-end search/history ranking in v1 before message delivery is stable.
- Do not make typing/presence strongly consistent; keep them lightweight and lossy.
- Do not overbuild global ordering guarantees when per-conversation ordering is sufficient.