System design pattern
Messaging / Chat
Design a reliable, low-latency chat system with ordering guarantees, delivery states, and real-time fan-out across devices.
How to Recognize This Pattern
- The prompt involves real-time conversation, online/offline users, and multi-device behavior.
- You hear delivery status, unread counts, and reconnect/replay requirements.
- Ordering, dedupe, and eventual consistency are all mentioned together.
- The main challenge is not storing text; it is reliable delivery under network churn.
Approach (Step-by-step)
This is where senior candidates show decision quality, not just component naming.
1. Clarify chat scope, conversation types, and message semantics first.
2. Design the durable write path with sequence assignment and idempotency keys.
3. Design the real-time delivery path for online users via gateway + pub-sub.
4. Design the offline path: queue notifications and cursor-based replay.
5. Define the delivery/read receipt state machine and source-of-truth updates.
6. Plan the sharding strategy by conversation/user and identify load hotspots.
7. Design failure handling for disconnects, retries, duplicates, and delayed acks.
8. Define observability for delivery latency, ack timeouts, queue lag, and reconnect success.
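Steps 2 and 7 hinge on the same mechanism: a per-conversation monotonic sequence plus a client-supplied idempotency key. A minimal in-memory sketch (the class and field names are illustrative, not a real storage API):

```python
import itertools
from dataclasses import dataclass


@dataclass
class Message:
    conversation_id: str
    seq: int
    sender_id: str
    body: str


class ConversationShard:
    """Toy stand-in for one durable conversation shard: assigns
    monotonic per-conversation sequence numbers and dedupes retried
    sends via client idempotency keys."""

    def __init__(self):
        self._seq = itertools.count(1)
        self._by_idem_key = {}  # idempotency_key -> Message
        self._log = []          # append-only, ordered by seq

    def append(self, idempotency_key, conversation_id, sender_id, body):
        # A retried send with the same key returns the original message
        # instead of creating a duplicate with a new sequence number.
        if idempotency_key in self._by_idem_key:
            return self._by_idem_key[idempotency_key]
        msg = Message(conversation_id, next(self._seq), sender_id, body)
        self._log.append(msg)
        self._by_idem_key[idempotency_key] = msg
        return msg
```

Because the sequence is assigned server-side at write time, replay and dedupe on reconnect reduce to simple comparisons against the last acknowledged `seq`.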
Key Trade-offs
Think of this as decision math: where does load move, what fails first, and which user experience are you willing to protect?
Strict ordering
Simplifies user expectation in a conversation, but coordination overhead rises under scale and multi-region writes.
Eventual local ordering
Improves availability and throughput, but users may briefly see reordering during reconnect.
Decision lens: Guarantee strict order per conversation shard; avoid unnecessary global ordering constraints.
Push-heavy fan-out
Great for low-latency online delivery, but can amplify load in large groups/channels.
Buffered/batched fan-out
Lower infrastructure pressure for very large audiences, but can slightly increase perceived freshness delay.
Decision lens: Use direct push for small chats and buffered fan-out for high-member conversations.
Strong receipt guarantees
Higher confidence in delivery/read state, but more write amplification and state sync complexity.
Best-effort receipts
Lower cost and simpler operations, but occasional receipt inconsistencies can confuse users.
Decision lens: Keep sent/delivered durable; treat read receipts with bounded eventual consistency.
Scale Realism (Numbers That Matter)
- Conversation distribution: Chat load is skewed by conversation type: most conversations are low-volume, but a few large groups/channels dominate fan-out traffic.
- Traffic profile: A plausible global profile is 40k-120k message writes/sec with read receipts/typing events often 2x-5x message volume.
- Latency target: Target p95 send-to-deliver under 250ms for online recipients, with deterministic replay for offline devices.
- Failure envelope: If the WebSocket disconnect rate spikes, queue lag exceeds 10s, or the ack-timeout ratio crosses 2%, trigger degraded mode and prioritize core message delivery.
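The failure envelope above can be expressed as a single predicate the control plane evaluates on each metrics tick. The 10s lag and 2% ack-timeout thresholds come from the bullet; the 5% disconnect-spike threshold is an assumed placeholder to tune against your own baseline:

```python
def degraded_mode(disconnect_rate, queue_lag_s, ack_timeout_ratio,
                  disconnect_spike=0.05):
    """Return True when the system should enter degraded mode and
    shed non-critical work to protect core message delivery.
    Thresholds are illustrative, not production-calibrated."""
    return (disconnect_rate > disconnect_spike
            or queue_lag_s > 10
            or ack_timeout_ratio > 0.02)
```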
Hybrid Switching Rules (Operational Logic)
These rules make hybrid strategy measurable and observable.
- Conversation-size rule: small chats use direct fan-out; very large channels use buffered/batched fan-out.
- Reconnect rule: clients request incremental replay from last acknowledged message cursor.
- Ack-timeout rule: if delivery ack timeout grows, reduce non-critical events (typing/presence) first.
- Backpressure rule: if queue lag crosses threshold, prioritize durable message events over read-receipt updates.
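The four rules combine naturally into one per-event delivery plan. A sketch, with all thresholds (the 1,000-member channel cutoff, 10s lag limit, 2% ack-timeout limit) as illustrative assumptions:

```python
def plan_delivery(member_count, queue_lag_s, ack_timeout_ratio,
                  large_channel=1000, lag_limit=10.0, ack_limit=0.02):
    """Hypothetical hybrid-switching policy: pick a fan-out mode by
    conversation size, and shed non-critical events first under
    ack-timeout pressure or queue backpressure."""
    return {
        # Conversation-size rule: buffered fan-out for very large channels.
        "fanout": "buffered" if member_count >= large_channel else "direct",
        # Ack-timeout rule: drop typing/presence before anything else.
        "send_typing_presence": ack_timeout_ratio <= ack_limit,
        # Backpressure rule: durable messages outrank read-receipt updates.
        "send_read_receipts": queue_lag_s <= lag_limit,
    }
```

Encoding the rules as a pure function of observed metrics is what makes the hybrid strategy measurable: every decision is reproducible from a metrics snapshot.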
Read Path Deep Dive
- Message ordering should be defined per conversation, not globally.
- Use monotonic sequence numbers per conversation shard to simplify replay and dedupe.
- Treat typing and presence as lossy, but message delivery as durable.
- Read receipts need explicit semantics to avoid cross-device confusion.
- Offline replay must be bounded and cursor-driven to prevent unbounded resync storms.
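Bounded, cursor-driven replay from the last point can be sketched in a few lines. Here `log` is assumed to be a list of messages sorted by `seq` (as the per-shard sequence numbers above guarantee), and `limit` caps each batch so a reconnect storm cannot request unbounded history:

```python
def replay(log, after_seq, limit=100):
    """Return up to `limit` messages with seq > after_seq, plus the
    cursor to pass on the next call. Clients loop until the batch
    comes back empty."""
    batch = [m for m in log if m["seq"] > after_seq][:limit]
    next_cursor = batch[-1]["seq"] if batch else after_seq
    return batch, next_cursor
```

Because the cursor is just the highest acknowledged sequence number, a client that crashes mid-replay simply resumes from its last persisted cursor with no duplicates and no gaps.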
Latency Budget Breakdown
Map each component to a concrete budget so p95 targets are enforceable.
| Component | Target (ms) | Why this budget |
|---|---|---|
| Gateway + auth | 15 | Session validation and routing lookup. |
| Durable write | 45 | Persist message and assign conversation sequence. |
| Fan-out dispatch | 70 | Push to online recipients and queue offline notifications. |
| Client ack processing | 35 | Track delivered/read states with idempotent updates. |
| Response + state emit | 20 | Return send ack and update sender timeline. |
Real-world Challenges
Reconnect replay storms
Network churn can trigger mass replay requests; without cursor windows and throttling, backend load spikes hard.
Duplicate delivery on retries
At-least-once transport creates duplicates unless idempotency and sequence-based dedupe are enforced.
Group fan-out hot shards
Large channels can overload single partitions if conversation sharding and batching are weak.
Receipt state drift
Multi-device updates can diverge if read-state merges are not deterministic.
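A deterministic merge for multi-device read state avoids the drift described above. One common shape (sketched here with hypothetical function names) keeps a monotonic per-device read cursor, so duplicated or out-of-order receipt updates converge to the same answer regardless of arrival order:

```python
def apply_receipt(state, device_id, read_seq):
    """Monotonic per-device update: a late or duplicated receipt can
    never move a device's read cursor backwards, which makes the
    merge order-independent (a simple max-register CRDT)."""
    state[device_id] = max(state.get(device_id, 0), read_seq)
    return state


def conversation_read_seq(state):
    # The user-level read position is the max across all devices.
    return max(state.values(), default=0)
```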
What Interviewers Expect
- You clearly define delivery semantics and what each state means.
- You show online delivery and offline replay flows end to end.
- You explain ordering guarantees and their scope.
- You design idempotency/dedupe for retries and reconnects.
- You discuss scaling strategy for large-group fan-out.
- You separate critical message path from non-critical presence/typing events.
Practice Problems
These practice sessions map directly to delivery and fan-out decisions. Start with one, then revisit this guide and evaluate where your design leaked latency, correctness, or cost.
- Design WhatsApp
Best direct practice for chat delivery, receipts, and reconnect design.
- Design Facebook Messenger
Good for fan-out and multi-device consistency decisions.
- Design Slack
Helps reason about channels, scale, and message durability trade-offs.
Architecture Overview
Read this section as a request journey: API receives intent, cache protects latency, database protects correctness, and queue protects the system during spikes. If one box fails, define how the next box keeps user impact limited.
API Layer
Handles auth, routing, and message submission while enforcing per-user and per-conversation limits.
Example: POST /messages writes message intent and returns a server-assigned message id.
Cache
Stores active session routing and recent conversation metadata to speed fan-out decisions.
Example: Online user-device map in Redis lets gateway route websocket delivery quickly.
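The routing map itself is a small data structure. A toy in-memory stand-in for the Redis user-to-device-to-gateway mapping (class and method names are illustrative):

```python
class RoutingMap:
    """Tracks which gateway node holds each of a user's live sockets,
    so the send path can target only the gateways that matter."""

    def __init__(self):
        self._devices = {}  # user_id -> {device_id: gateway_node}

    def connect(self, user_id, device_id, gateway_node):
        self._devices.setdefault(user_id, {})[device_id] = gateway_node

    def disconnect(self, user_id, device_id):
        self._devices.get(user_id, {}).pop(device_id, None)

    def gateways_for(self, user_id):
        # Deduplicate: deliver once per gateway node even when several
        # of the user's devices share a node; the node fans out locally.
        return set(self._devices.get(user_id, {}).values())
```

An empty result from `gateways_for` is the signal to take the offline path: queue a push notification and rely on cursor replay at reconnect.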
Database
Durably stores messages and receipt states with conversation-scoped ordering.
Example: Conversation shard table stores sequence-numbered messages for replay.
Queue
Buffers delivery tasks and offline notifications with retry and dead-letter handling.
Example: If recipient is offline, delivery event is queued for push notification and later replay.
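The retry-plus-dead-letter behavior can be sketched as a worker loop. This is a minimal illustration, not a real broker API: `attempt_delivery` stands in for whatever transport call the worker makes:

```python
class DeliveryQueue:
    """Bounded-retry delivery worker: after max_attempts failures the
    task moves to a dead-letter list for inspection instead of
    retrying forever and amplifying load."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.dead_letter = []

    def process(self, task, attempt_delivery):
        for _ in range(self.max_attempts):
            if attempt_delivery(task):
                return True
        self.dead_letter.append(task)
        return False
```

Paired with the idempotency keys on the write path, retries here are safe: a duplicate delivery attempt cannot create a duplicate message.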
Architecture Diagrams
Visual flows below show where latency is paid and where load is absorbed. Use them as memory anchors in interviews.
Write Path (Send Message)
Persist first, then fan out for reliable delivery semantics.
Read/Replay Path (Reconnect)
Resume from last acknowledged cursor to avoid duplicates and gaps.
Design Evolution (v1 → v3)
v1: Reliable core messaging
Build now
- 1:1 and small-group durable messaging
- WebSocket delivery with reconnect replay
- Basic sent/delivered/read states
Avoid for now
- Complex cross-region active-active ordering logic
- Heavy media pipeline in same critical path
v2: Scale fan-out and operations
Build now
- Conversation sharding and better load distribution
- Queue-prioritized delivery and retry controls
- Improved offline sync and multi-device consistency
Avoid for now
- Overloading delivery path with non-critical real-time signals
- Unbounded replay windows without cursor policies
v3: Advanced chat experiences
Build now
- Large-channel optimization and batching
- Regional routing optimization with resilience controls
- Richer moderation and compliance workflows
Avoid for now
- Feature sprawl that weakens core delivery SLOs
- Tight coupling between messaging core and peripheral services
What Not to Build Initially
Strong system design is also about disciplined scope control.
- Do not ship complex end-to-end search/history ranking in v1 before message delivery is stable.
- Do not make typing/presence strongly consistent; keep them lightweight and lossy.
- Do not overbuild global ordering guarantees when per-conversation ordering is sufficient.