
System design pattern

Messaging / Chat

Design a reliable, low-latency chat system with ordering guarantees, delivery states, and real-time fan-out across devices.

Hard · Real-time · Scalability · Consistency · Reliability

How to Recognize This Pattern

  • The prompt involves real-time conversation, online/offline users, and multi-device behavior.
  • You hear delivery status, unread counts, and reconnect/replay requirements.
  • Ordering, dedupe, and eventual consistency are all mentioned together.
  • The main challenge is not storing text; it is reliable delivery under network churn.

Approach (Step-by-step)

This is where senior candidates show decision quality, not just component naming.

  1. Clarify chat scope, conversation types, and message semantics first.

  2. Design the durable write path with sequence assignment and idempotency keys.

  3. Design the real-time delivery path for online users via gateway + pub-sub.

  4. Design the offline path: queue notifications and cursor-based replay.

  5. Define the delivery/read receipt state machine and source-of-truth updates.

  6. Plan the sharding strategy by conversation/user and for load hotspots.

  7. Design failure handling for disconnects, retries, duplicates, and delayed acks.

  8. Define observability for delivery latency, ack timeouts, queue lag, and reconnect success.
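The heart of steps 2 and 4 is a durable write path that assigns a per-conversation sequence and dedupes retries on an idempotency key. A minimal in-memory sketch (all names and the storage model are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass, field

@dataclass
class ConversationShard:
    """In-memory stand-in for one conversation shard's durable log."""
    seq: int = 0                                        # monotonic per-conversation sequence
    by_idem_key: dict = field(default_factory=dict)     # idempotency key -> stored message
    log: list = field(default_factory=list)             # ordered message log for replay

    def append(self, idem_key: str, sender: str, body: str) -> dict:
        # A retried send reuses its idempotency key, so return the original
        # record instead of assigning a new sequence number.
        if idem_key in self.by_idem_key:
            return self.by_idem_key[idem_key]
        self.seq += 1
        msg = {"seq": self.seq, "sender": sender, "body": body}
        self.log.append(msg)
        self.by_idem_key[idem_key] = msg
        return msg

shard = ConversationShard()
first = shard.append("k1", "alice", "hi")
retry = shard.append("k1", "alice", "hi")    # client retry: same record, no new seq
second = shard.append("k2", "bob", "hello")
```

The same sequence numbers later drive cursor-based replay and client-side dedupe, which is why assignment belongs on the write path rather than at delivery time.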

Key Trade-offs

Think of this as decision math: where does the load move, what fails first, and which user experience are you willing to protect?

Strict ordering

Simplifies user expectation in a conversation, but coordination overhead rises under scale and multi-region writes.

Eventual local ordering

Improves availability and throughput, but users may briefly see reordering during reconnect.

Decision lens: Guarantee strict order per conversation shard; avoid unnecessary global ordering constraints.

Push-heavy fan-out

Great for low-latency online delivery, but can amplify load in large groups/channels.

Buffered/batched fan-out

Lower infrastructure pressure for very large audiences, but can slightly increase perceived freshness delay.

Decision lens: Use direct push for small chats and buffered fan-out for high-member conversations.

Strong receipt guarantees

Higher confidence in delivery/read state, but more write amplification and state sync complexity.

Best-effort receipts

Lower cost and simpler operations, but occasional receipt inconsistencies can confuse users.

Decision lens: Keep sent/delivered durable; treat read receipts with bounded eventual consistency.

Scale Realism (Numbers That Matter)

  • Load distribution: Chat load is skewed by conversation type: most conversations are low-volume, but a few large groups/channels dominate fan-out traffic.
  • Traffic profile: A plausible global profile is 40k-120k message writes/sec with read receipts/typing events often 2x-5x message volume.
  • Latency target: Target p95 send-to-deliver under 250ms for online recipients, with deterministic replay for offline devices.
  • Failure envelope: If websocket disconnect rate spikes, queue lag exceeds 10s, or ack timeout ratio crosses 2%, trigger degraded mode and prioritize core message delivery.

Hybrid Switching Rules (Operational Logic)

These rules make hybrid strategy measurable and observable.

  • Conversation-size rule: small chats use direct fan-out; very large channels use buffered/batched fan-out.
  • Reconnect rule: clients request incremental replay from last acknowledged message cursor.
  • Ack-timeout rule: if delivery ack timeout grows, reduce non-critical events (typing/presence) first.
  • Backpressure rule: if queue lag crosses threshold, prioritize durable message events over read-receipt updates.
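The four rules above can be collapsed into one pure function from telemetry to a delivery plan, which makes them easy to test and observe. The thresholds here are illustrative, not recommendations:

```python
def delivery_plan(member_count, queue_lag_s, ack_timeout_ratio,
                  large_channel_threshold=500, lag_limit_s=10.0, ack_limit=0.02):
    """Map current telemetry to a delivery plan (all thresholds illustrative)."""
    plan = {
        # Conversation-size rule: buffered fan-out for very large channels.
        "fanout": "buffered" if member_count > large_channel_threshold else "direct",
        "send_typing_presence": True,
        "send_read_receipts": True,
    }
    # Ack-timeout rule: shed non-critical events (typing/presence) first.
    if ack_timeout_ratio > ack_limit:
        plan["send_typing_presence"] = False
    # Backpressure rule: prioritize durable messages over read-receipt updates.
    if queue_lag_s > lag_limit_s:
        plan["send_read_receipts"] = False
    return plan

small = delivery_plan(member_count=12, queue_lag_s=1.0, ack_timeout_ratio=0.001)
stressed = delivery_plan(member_count=5000, queue_lag_s=12.0, ack_timeout_ratio=0.03)
```

Because the function is deterministic, the same inputs that trigger degraded mode in production can be replayed in tests and dashboards.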

Read Path Deep Dive

  • Message ordering should be defined per conversation, not globally.
  • Use monotonic sequence numbers per conversation shard to simplify replay and dedupe.
  • Treat typing and presence as lossy, but message delivery as durable.
  • Read receipts need explicit semantics to avoid cross-device confusion.
  • Offline replay must be bounded and cursor-driven to prevent unbounded resync storms.
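Bounded, cursor-driven replay is a one-liner once messages carry per-conversation sequence numbers. A sketch under those assumptions (the window size is illustrative):

```python
def replay(log, last_acked_seq, window=100):
    """Return up to `window` messages after the client's last acknowledged cursor.

    Bounding the window prevents a reconnecting client from pulling an
    unbounded backlog in one request; the client pages forward by advancing
    its cursor and calling again.
    """
    pending = [m for m in log if m["seq"] > last_acked_seq]
    return pending[:window]

log = [{"seq": s, "body": f"msg {s}"} for s in range(1, 251)]
batch = replay(log, last_acked_seq=120, window=100)
```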

Latency Budget Breakdown

Map each component to a concrete budget so p95 targets are enforceable.

  • Gateway + auth (15 ms): Session validation and routing lookup.
  • Durable write (45 ms): Persist message and assign conversation sequence.
  • Fan-out dispatch (70 ms): Push to online recipients and queue offline notifications.
  • Client ack processing (35 ms): Track delivered/read states with idempotent updates.
  • Response + state emit (20 ms): Return send ack and update sender timeline.
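Summing the component budgets confirms the end-to-end target is achievable with headroom:

```python
# Per-component p95 budgets from the table above, in milliseconds.
budget_ms = {
    "gateway_auth": 15,
    "durable_write": 45,
    "fanout_dispatch": 70,
    "client_ack": 35,
    "response_emit": 20,
}
total = sum(budget_ms.values())   # 185 ms, leaving ~65 ms of slack under the 250 ms p95 target
```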

Real-world Challenges

Reconnect replay storms

Network churn can trigger mass replay requests; without cursor windows and throttling, backend load spikes hard.

Duplicate delivery on retries

At-least-once transport creates duplicates unless idempotency and sequence-based dedupe are enforced.
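On the recipient side, sequence-based dedupe is a small amount of state: remember which sequence numbers have been seen and drop repeats while still acking them. A minimal sketch (class and field names are illustrative):

```python
class RecipientSession:
    """Per-device dedupe for at-least-once delivery."""

    def __init__(self):
        self.seen = set()    # sequence numbers already delivered to this device
        self.inbox = []      # messages actually surfaced to the user

    def on_deliver(self, msg) -> bool:
        # A duplicate from a transport retry: ack it again, but do not re-show.
        if msg["seq"] in self.seen:
            return False
        self.seen.add(msg["seq"])
        self.inbox.append(msg)
        return True

session = RecipientSession()
fresh = session.on_deliver({"seq": 7, "body": "hi"})
dup = session.on_deliver({"seq": 7, "body": "hi"})   # retry of the same delivery
```

In a real client the `seen` set would be compacted to a contiguous-prefix cursor plus a small out-of-order window, but the invariant is the same.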

Group fan-out hot shards

Large channels can overload single partitions if conversation sharding and batching are weak.

Receipt state drift

Multi-device updates can diverge if read-state merges are not deterministic.
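A simple way to make read-state merges deterministic is to model the read cursor as the maximum sequence acknowledged by any of the user's devices: max is commutative and idempotent, so devices can report in any order without drift. A sketch under that assumption:

```python
def merge_read_state(device_cursors: dict) -> int:
    """Merge per-device read cursors into one conversation read cursor.

    Taking the max is order-independent and retry-safe: applying the same
    updates twice, or in a different order, yields the same result.
    """
    return max(device_cursors.values(), default=0)

cursor = merge_read_state({"phone": 42, "laptop": 40, "tablet": 42})
```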

What Interviewers Expect

  • You clearly define delivery semantics and what each state means.
  • You show online delivery and offline replay flows end to end.
  • You explain ordering guarantees and their scope.
  • You design idempotency/dedupe for retries and reconnects.
  • You discuss scaling strategy for large-group fan-out.
  • You separate critical message path from non-critical presence/typing events.

Practice Problems

These practice sessions map directly to delivery/fan-out decisions. Start with one, then revisit this guide and evaluate where your design leaked latency, correctness, or cost.

Architecture Overview

Read this section as a request journey: API receives intent, cache protects latency, database protects correctness, and queue protects the system during spikes. If one box fails, define how the next box keeps user impact limited.

API Layer

Handles auth, routing, and message submission while enforcing per-user and per-conversation limits.

Example: POST /messages writes message intent and returns a server-assigned message id.

Cache

Stores active session routing and recent conversation metadata to speed fan-out decisions.

Example: Online user-device map in Redis lets gateway route websocket delivery quickly.
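The routing map is just a hash per user: device → gateway. A plain-dict stand-in for the Redis structure (in production this would be Redis hashes with TTLs; all names here are illustrative):

```python
# device -> gateway map per user; stand-in for a Redis hash keyed by user id.
routing = {}

def register(user: str, device: str, gateway: str) -> None:
    """Record which gateway holds this device's websocket connection."""
    routing.setdefault(user, {})[device] = gateway

def routes_for(user: str) -> dict:
    """Gateways to push to. An empty result means the user is offline,
    so delivery falls through to the notification queue instead."""
    return routing.get(user, {})

register("alice", "phone", "gw-3")
register("alice", "laptop", "gw-7")
```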

Database

Durably stores messages and receipt states with conversation-scoped ordering.

Example: Conversation shard table stores sequence-numbered messages for replay.

Queue

Buffers delivery tasks and offline notifications with retry and dead-letter handling.

Example: If recipient is offline, delivery event is queued for push notification and later replay.
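Retry with a bounded attempt count and a dead-letter parking lot can be sketched in a few lines (the attempt limit and structure are illustrative; a real queue would also add backoff and delayed redelivery):

```python
from collections import deque

class DeliveryQueue:
    """Delivery tasks with bounded retries and a dead-letter list."""

    def __init__(self, max_attempts: int = 3):
        self.pending = deque()
        self.dead_letter = []       # tasks that exhausted retries, kept for inspection
        self.max_attempts = max_attempts

    def enqueue(self, task) -> None:
        self.pending.append({"task": task, "attempts": 0})

    def process(self, deliver) -> None:
        entry = self.pending.popleft()
        entry["attempts"] += 1
        try:
            deliver(entry["task"])
        except Exception:
            if entry["attempts"] >= self.max_attempts:
                self.dead_letter.append(entry)   # give up: park for later inspection
            else:
                self.pending.append(entry)       # requeue for another attempt

q = DeliveryQueue(max_attempts=3)
q.enqueue({"to": "bob", "msg_seq": 9})

def always_fail(task):
    raise ConnectionError("device unreachable")

while q.pending:
    q.process(always_fail)
```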

Architecture Diagrams

Visual flows below show where latency is paid and where load is absorbed. Use them as memory anchors in interviews.

Write Path (Send Message)

Persist first, then fan out for reliable delivery semantics.

Sender Client → Gateway/API → Message Store → Event Queue → Delivery Workers → Recipient Devices

Read/Replay Path (Reconnect)

Resume from last acknowledged cursor to avoid duplicates and gaps.

Recipient Client → Gateway/API → Last Ack Cursor → Message Store → Replay Stream → Client Ack Update

Design Evolution (v1 → v3)

v1: Reliable core messaging

Build now

  • 1:1 and small-group durable messaging
  • WebSocket delivery with reconnect replay
  • Basic sent/delivered/read states

Avoid for now

  • Complex cross-region active-active ordering logic
  • Heavy media pipeline in same critical path

v2: Scale fan-out and operations

Build now

  • Conversation sharding and better load distribution
  • Queue-prioritized delivery and retry controls
  • Improved offline sync and multi-device consistency

Avoid for now

  • Overloading delivery path with non-critical real-time signals
  • Unbounded replay windows without cursor policies

v3: Advanced chat experiences

Build now

  • Large-channel optimization and batching
  • Regional routing optimization with resilience controls
  • Richer moderation and compliance workflows

Avoid for now

  • Feature sprawl that weakens core delivery SLOs
  • Tight coupling between messaging core and peripheral services

What Not to Build Initially

Strong system design is also about disciplined scope control.

  • Do not ship complex end-to-end search/history ranking in v1 before message delivery is stable.
  • Do not make typing/presence strongly consistent; keep them lightweight and lossy.
  • Do not overbuild global ordering guarantees when per-conversation ordering is sufficient.