Design Thinking
Trade-offs at Scale
Beyond buzzwords: when microservices are wrong, when eventual consistency fails, and how to choose strong consistency, latency vs throughput, and read vs write amplification.
At scale, every choice has a cost. Microservices add operational overhead. Eventual consistency can lose money. Latency and throughput often conflict. Senior engineers don't just name technologies—they articulate the trade-offs at scale and justify when to break the "best practices."
When Microservices Are a Mistake
The Microservices Myth
"Microservices scale" is incomplete. Microservices enable:
- Independent scaling of services
- Independent deployment by teams
- Technology diversity per service
But they add:
- Operational complexity: Many services, many failure modes
- Network latency: Each cross-service call adds milliseconds per network hop
- Data consistency challenges: Distributed transactions, eventual consistency
- Debugging difficulty: Tracing across services
- Team overhead: Each service needs ownership
When to Stay Monolith
| Scenario | Why Monolith |
|---|---|
| Small team (under 10) | Coordination overhead of services > benefit |
| Low scale (under 1M requests/day) | Single deploy can handle it |
| Strong consistency needs | Transactions across domains easier in monolith |
| Rapid iteration | No API versioning, no cross-service deploys |
| Unclear boundaries | Don't split until domains are clear |
Real Example: Stripe
Stripe started as a monolith and scaled it to billions of dollars in payment volume. They split into services when:
- Teams grew and needed independent deploy
- Clear domain boundaries emerged (payments, billing, radar)
- Operational maturity could support distributed systems
Senior Insight
"If we can't draw clear service boundaries, we shouldn't split. If one team owns it all, a monolith is faster. Microservices are an organizational pattern as much as a technical one." — a restatement of Conway's Law: system structure mirrors team structure.
When Eventual Consistency Is Unacceptable
Domains Requiring Strong Consistency
- Financial transactions: Double-spend is unacceptable
- Inventory: Overselling is unacceptable
- Seats/tickets: Double-booking is unacceptable
- User identity/auth: Wrong user data is a security issue
- Compliance: Audits require consistent records
Domains Tolerant of Eventual Consistency
- Social feeds: Stale by seconds is fine
- Recommendations: Stale is acceptable
- Analytics: Eventually consistent is the norm
- Counters (likes, views): Approximate is often OK
- Caches: Stale by definition
The Cost of Strong Consistency
- Latency: Cross-node coordination (2PC, Paxos) adds round-trips
- Availability: Under partition, strong consistency may refuse writes (CAP)
- Throughput: Coordination limits write throughput
- Complexity: Distributed transactions are hard
Senior Decision Framework
- What breaks if inconsistent? Money, safety, compliance → strong
- What's the staleness tolerance? Seconds vs minutes vs hours
- What's the write volume? High write + strong consistency = hard
- Can we isolate consistency? Strong for critical path, eventual for rest
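The framework above can be sketched as a tiny decision helper. This is purely illustrative; `consistency_choice` and its 5-second cutoff are made-up names and numbers, not a real API:

```python
def consistency_choice(staleness_tolerance_s: float,
                       involves_money_or_safety: bool) -> str:
    """Illustrative mapping from the framework's questions to a consistency choice."""
    if involves_money_or_safety:
        return "strong"        # money, safety, compliance: always strong
    if staleness_tolerance_s >= 5.0:
        return "eventual"      # seconds of staleness is tolerable
    return "strong"            # tight staleness budget: pay for coordination

# checkout (money) -> strong; social feed (30 s tolerance) -> eventual
```

The point is not the code but the ordering: the money/safety question dominates, and only then does staleness tolerance matter.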
Example: E-commerce
- Cart: Eventual OK (user might see stale cart briefly)
- Checkout/payment: Strong (no double-charge)
- Inventory: Strong or compensating (oversell = bad)
- Recommendations: Eventual (stale is fine)
Choosing Strong Consistency Intentionally
When You Choose Strong Consistency
You're accepting:
- Higher latency (coordinating writes)
- Lower availability under partition (or complex failover)
- More complex implementation (distributed tx, 2PC, etc.)
You're gaining:
- Correctness guarantees
- Simpler application logic (no conflict resolution)
- Compliance and auditability
Implementation Options
| Approach | Use Case | Trade-off |
|---|---|---|
| Single primary DB | Classic SQL | Single point of failure |
| Synchronous replication | Multi-AZ | Every write pays a cross-AZ round trip |
| 2PC / Distributed tx | Cross-service consistency | High latency, blocking |
| Saga with compensation | Cross-service, async | Complex rollback logic |
| Single-writer partition | Sharded but consistent per partition | Limits write scale per key |
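Of the options above, the saga is the least familiar to many readers. A minimal in-memory sketch, with hypothetical step names, looks like this:

```python
# Minimal saga sketch: run steps in order; on failure, run the
# compensations of the steps that already completed, newest first.
def run_saga(steps):
    """steps: list of (action, compensation) pairs. Returns True on success."""
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
        return True
    except Exception:
        for compensate in reversed(done):  # undo completed steps only
            compensate()
        return False

# Usage: reserve inventory, then charge; if charging fails,
# release the reservation (hypothetical step names).
state = {"reserved": 0, "charged": 0}

def reserve():  state["reserved"] += 1
def release():  state["reserved"] -= 1
def charge():   raise RuntimeError("payment declined")  # simulate failure
def refund():   state["charged"] -= 1

ok = run_saga([(reserve, release), (charge, refund)])
# ok is False, and the reservation has been compensated back to 0
```

Note what the saga does not give you: between the failed charge and the compensating release, other readers can observe the reservation. That window is the "complex rollback logic" cost in the table.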
Real Example: Financial Systems
Stripe, banks, and exchanges use strong consistency for balances and transactions. They pay with:
- Lower write throughput (compared to eventually consistent stores)
- Higher latency (synchronous replication)
- Operational complexity (failover, split-brain prevention)
They gain: No double-spend, correct balances, audit trails.
Latency vs Throughput Trade-offs
Definitions
- Latency: Time for one request (e.g., p50, p99)
- Throughput: Requests per second the system can handle
The Trade-off
Often you optimize for one at the expense of the other:
| Optimize For | Sacrifice | Example |
|---|---|---|
| Low latency | Throughput | Synchronous, no batching, fewer connections |
| High throughput | Latency | Batching, async, more queuing |
| p99 latency | Cost / efficiency | Hedged requests, redundancy, overprovisioning |
Example: Write Path
- Low latency: Write each request directly to the primary (sync replication, no batching). Each write = 1–2 round-trips, but completes immediately.
- High throughput: Batch writes, async replicate, buffer in queue. Latency = batch interval + processing.
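The write-path trade-off can be made concrete with a toy model. The RTT and batch-interval numbers below are assumptions, not measurements:

```python
# Toy model of the write path: per-write round-trips vs. batching.
RTT_MS = 2.0              # assumed round-trip time to the primary
BATCH_INTERVAL_MS = 10.0  # assumed flush interval for the batched path

def sync_latency_ms() -> float:
    """Each write pays its own round-trip."""
    return RTT_MS

def sync_throughput_per_conn() -> float:
    """One in-flight write per connection: 1000 ms / RTT writes per second."""
    return 1000.0 / RTT_MS

def batched_latency_ms(batch_size: int) -> float:
    """A write waits ~half the interval, then shares one round-trip."""
    return BATCH_INTERVAL_MS / 2 + RTT_MS / batch_size

def batched_throughput(batch_size: int) -> float:
    """One batch per flush interval."""
    return batch_size * 1000.0 / BATCH_INTERVAL_MS

# Batching 100 writes: throughput goes 500 -> 10,000 writes/sec,
# but average latency goes 2 ms -> ~5 ms. You buy throughput with latency.
```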
Example: Read Path
- Low latency: Read from nearest replica, maybe stale. Or: cache with short TTL.
- High throughput: Read from many replicas, spread load. Latency may increase if replicas lag.
Senior Insight
"Are we latency-bound or throughput-bound?" If users wait on each request (e.g., search), optimize latency. If we process a backlog (e.g., batch jobs), optimize throughput.
Read Amplification vs Write Amplification
Read Amplification
One logical read triggers multiple physical reads.
Examples:
- Fan-out: One feed request = read from N users' posts
- Join: One query = read from multiple tables
- Replication: Read from replica that lagged = re-read from primary
Mitigation: Denormalization, caching, pre-computation, materialized views.
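As a sketch of the caching mitigation: a read-through cache absorbs repeated logical reads, so the underlying fan-out is paid once per key instead of once per request. All names here are illustrative:

```python
# Read-through cache sketch: one logical read that would fan out to
# N tables is served from a single cached entry after the first miss.
db_calls = {"count": 0}

def expensive_join(user_id: int) -> dict:
    db_calls["count"] += 1  # stands in for N physical reads
    return {"user": user_id, "profile": "loaded", "orders": []}

cache: dict = {}

def get_profile(user_id: int) -> dict:
    if user_id not in cache:               # miss: pay the amplification once
        cache[user_id] = expensive_join(user_id)
    return cache[user_id]

get_profile(1)
get_profile(1)
# db_calls["count"] == 1: the second read is absorbed by the cache
```

The trade, as always, is staleness: the cached entry no longer reflects writes until it is invalidated or expires.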
Write Amplification
One logical write triggers multiple physical writes.
Examples:
- Replication: 1 write → 3 replicas = 3 writes
- Indexes: 1 row insert → N index updates
- Event sourcing: 1 event → N downstream processors
- Fan-out on write: 1 post → write to N followers' feed caches
Mitigation: Async replication, fewer indexes, batch updates, write coalescing.
The Trade-off
- Read-heavy: Optimize reads. Denormalize, cache. Accept write amplification.
- Write-heavy: Optimize writes. Minimize indexes, async fan-out. Accept read amplification or eventual consistency.
Example: Social Feed
Fan-out on read: 1 feed request reads from N friends. Read amplification = N. Low write amplification.
Fan-out on write: 1 post writes to N followers' feeds. Write amplification = N. Low read amplification (feed is pre-built).
Choice: Twitter uses fan-out on write for regular users and fan-out on read for celebrities (many followers), merging celebrity posts into followers' timelines at read time. Hybrid based on follower count.
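A common hybrid (push to follower feeds for regular users; store once and merge at read time for high-follower accounts) can be sketched as a routing decision on follower count. The threshold and store names are hypothetical:

```python
FANOUT_THRESHOLD = 10_000  # assumed cutoff; real systems tune this empirically

feeds: dict = {}            # follower -> precomputed feed (fan-out on write)
celebrity_posts: dict = {}  # author -> posts, merged at read time

def publish(author: str, post: str, followers: list) -> None:
    if len(followers) < FANOUT_THRESHOLD:
        # Regular user: push to every follower's feed now.
        # Write amplification = N, but reads are a single lookup.
        for f in followers:
            feeds.setdefault(f, []).append(post)
    else:
        # High-follower account: store once; followers pull and merge on read.
        celebrity_posts.setdefault(author, []).append(post)

def read_feed(user: str, followed_celebrities: list) -> list:
    merged = list(feeds.get(user, []))
    for c in followed_celebrities:  # read amplification only for celebrity follows
        merged.extend(celebrity_posts.get(c, []))
    return merged
```

The threshold is the tuning knob: raise it and writes get more expensive; lower it and reads do.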
Thinking Aloud Like a Senior Engineer
Problem: "Design a system for 10M writes/sec, strong consistency required."
My first instinct: "Kafka? Cassandra? Distributed something?"
But wait — strong consistency at 10M writes/sec is extremely hard. Let me clarify: Is it 10M logical writes or 10M physical? What's the consistency boundary? Per key or global?
Per-key consistency: Possible. Shard by key, each shard handles a fraction. Strong consistency per shard (single primary or Raft). 10M / 1000 shards = 10K writes/sec per shard. Doable.
Global consistency: Much harder. Distributed transactions don't scale to 10M/sec. Maybe we don't need global—just per-user or per-account?
Latency requirement? If we need under 10ms, we're limited. If we can do async with confirmation, easier.
I'd ask: "What does strong consistency mean here? Per account? Per transaction? And what's the acceptable latency for a write to be visible?"
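The per-key sharding argument above can be sketched directly: a deterministic hash routes every write for a key to one shard, so each shard can run a single writer (or a Raft group) and totally order that key's writes with no cross-shard coordination. The shard count is arbitrary:

```python
import hashlib

N_SHARDS = 1000  # 10M writes/sec overall -> ~10K writes/sec per shard

def shard_for(key: str) -> int:
    """Deterministic key -> shard mapping; all writes for a key hit one shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N_SHARDS

# Every write for the same account lands on the same shard, so that shard's
# single primary can order them: strong consistency per key, and the global
# write rate scales with the number of shards.
```

This is exactly why the consistency-boundary question matters: the design works for per-account guarantees and falls apart the moment a transaction must span two shards.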
Summary
Trade-offs at scale require first-principles thinking:
- Microservices: Organizational benefit, not automatic scale. Monolith often wins for small teams and unclear boundaries.
- Eventual vs strong consistency: Match to domain. Money, inventory, identity → strong. Feeds, analytics → eventual.
- Latency vs throughput: Optimize for the bottleneck. Latency-bound (user waiting) vs throughput-bound (batch processing).
- Read vs write amplification: Read-heavy → denormalize, cache. Write-heavy → minimize indexes, async fan-out.
FAQs
Q: How do I know if we need strong consistency?
A: Ask: "What breaks if two users see different data for 5 seconds?" If the answer is "money lost" or "safety issue," you need strong. If it's "slightly stale feed," eventual is fine.
Q: When should we split a monolith?
A: When you have: (1) clear domain boundaries, (2) multiple teams needing independent deploy, (3) different scaling needs per domain, and (4) operational readiness for distributed systems.
Q: How do I explain amplification to stakeholders?
A: "One user action triggers X backend operations. At scale, that's Y total operations. We need to reduce X or optimize each operation." Use numbers.