
Trade-offs at Scale

Beyond buzzwords: when microservices are wrong, when eventual consistency fails, and how to choose strong consistency, latency vs throughput, and read vs write amplification.

Advanced · 26 min read

At scale, every choice has a cost. Microservices add operational overhead. Eventual consistency can lose money. Latency and throughput often conflict. Senior engineers don't just name technologies—they articulate the trade-offs at scale and justify when to break the "best practices."


When Microservices Are a Mistake

The Microservices Myth

"Microservices scale" is incomplete. Microservices enable:

  • Independent scaling of services
  • Independent deployment by teams
  • Technology diversity per service

But they add:

  • Operational complexity: Many services, many failure modes
  • Network latency: Every in-process call becomes a network hop, each adding milliseconds
  • Data consistency challenges: Distributed transactions, eventual consistency
  • Debugging difficulty: Tracing across services
  • Team overhead: Each service needs ownership

When to Stay Monolith

Scenario | Why Monolith
Small team (under 10) | Coordination overhead of services > benefit
Low scale (under 1M requests/day) | A single deployment can handle it
Strong consistency needs | Cross-domain transactions are easier in a monolith
Rapid iteration | No API versioning, no cross-service deploys
Unclear boundaries | Don't split until domains are clear

Real Example: Stripe

Stripe started as a monolith and scaled it to billions of dollars in payment volume. They split into services when:

  • Teams grew and needed independent deploy
  • Clear domain boundaries emerged (payments, billing, radar)
  • Operational maturity could support distributed systems

Senior Insight

"If we can't draw clear service boundaries, we shouldn't split. If one team owns it all, a monolith is faster. Microservices are an organizational pattern as much as a technical one." This is Conway's Law in action: system structure ends up mirroring team structure.


When Eventual Consistency Is Unacceptable

Domains Requiring Strong Consistency

  • Financial transactions: Double-spend is unacceptable
  • Inventory: Overselling is unacceptable
  • Seats/tickets: Double-booking is unacceptable
  • User identity/auth: Wrong user data is a security issue
  • Compliance: Audits require consistent records

Domains Tolerant of Eventual Consistency

  • Social feeds: Stale by seconds is fine
  • Recommendations: Stale is acceptable
  • Analytics: Eventually consistent is the norm
  • Counters (likes, views): Approximate is often OK
  • Caches: Stale by definition

The Cost of Strong Consistency

  • Latency: Cross-node coordination (2PC, Paxos) adds round-trips
  • Availability: Under partition, strong consistency may refuse writes (CAP)
  • Throughput: Coordination limits write throughput
  • Complexity: Distributed transactions are hard

Senior Decision Framework

  1. What breaks if inconsistent? Money, safety, compliance → strong
  2. What's the staleness tolerance? Seconds vs minutes vs hours
  3. What's the write volume? High write + strong consistency = hard
  4. Can we isolate consistency? Strong for critical path, eventual for rest

Example: E-commerce

  • Cart: Eventual OK (user might see stale cart briefly)
  • Checkout/payment: Strong (no double-charge)
  • Inventory: Strong or compensating (oversell = bad)
  • Recommendations: Eventual (stale is fine)
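The checkout/inventory rows are where strong consistency earns its cost. A minimal sketch, assuming a single SQL primary (SQLite here for illustration) and a connection opened in autocommit mode (`isolation_level=None`) so the explicit `BEGIN IMMEDIATE` takes effect:

```python
import sqlite3

def reserve_item(conn: sqlite3.Connection, item_id: int, qty: int) -> bool:
    """Atomically decrement stock, refusing to oversell.

    BEGIN IMMEDIATE acquires the write lock up front, so two
    concurrent checkouts serialize instead of both reading the
    same stale stock count and both "succeeding".
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "UPDATE inventory SET stock = stock - ? "
            "WHERE item_id = ? AND stock >= ?",
            (qty, item_id, qty),
        )
        if cur.rowcount == 0:  # insufficient stock: abort, no oversell
            conn.execute("ROLLBACK")
            return False
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The same shape works on Postgres or MySQL. The conditional UPDATE is the key point: the stock check and the decrement happen in one atomic statement, not a read followed by a write.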

Choosing Strong Consistency Intentionally

When You Choose Strong Consistency

You're accepting:

  • Higher latency (coordinating writes)
  • Lower availability under partition (or complex failover)
  • More complex implementation (distributed tx, 2PC, etc.)

You're gaining:

  • Correctness guarantees
  • Simpler application logic (no conflict resolution)
  • Compliance and auditability

Implementation Options

Approach | Use Case | Trade-off
Single primary DB | Classic SQL | Single point of failure
Synchronous replication | Multi-AZ | Write latency ≈ 2× network RTT
2PC / distributed transactions | Cross-service consistency | High latency, blocking
Saga with compensation | Cross-service, async | Complex rollback logic
Single-writer partition | Sharded but consistent per partition | Limits write scale per key
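Of these, the saga is the least familiar. A minimal sketch of the pattern: each step pairs a forward action with a compensating action that undoes it if a later step fails (the step names in the test are hypothetical):

```python
def run_saga(steps):
    """Run a list of (action, compensation) pairs in order.

    If any action raises, the compensations of the steps that
    already completed run in reverse order -- the saga equivalent
    of a rollback, with no distributed lock or coordinator.
    """
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        raise
```

Note what this does not give you: between a step completing and its compensation running, other readers can observe the intermediate state. That window is the "complex rollback logic" in the table.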

Real Example: Financial Systems

Stripe, banks, and exchanges use strong consistency for balances and transactions. They pay with:

  • Lower write throughput (compared to eventually consistent stores)
  • Higher latency (synchronous replication)
  • Operational complexity (failover, split-brain prevention)

They gain: No double-spend, correct balances, audit trails.


Latency vs Throughput Trade-offs

Definitions

  • Latency: Time for one request (e.g., p50, p99)
  • Throughput: Requests per second the system can handle

The Trade-off

Often you optimize for one at the expense of the other:

Optimize For | Sacrifice | Example
Low latency | Throughput | Synchronous, no batching, fewer connections
High throughput | Latency | Batching, async, more queuing
p99 latency | p50 | More redundancy, caching, oversizing

Example: Write Path

  • Low latency: Write directly to primary, sync replicate. Each write = 1–2 round-trips.
  • High throughput: Batch writes, async replicate, buffer in queue. Latency = batch interval + processing.
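The high-throughput side can be sketched as a tiny batcher, assuming a `flush` callback that performs one physical write per batch:

```python
class BatchWriter:
    """Buffer writes and flush them in groups of max_batch.

    Trades latency for throughput: a record may sit in the buffer
    until the batch fills (or close() is called), but the backend
    sees one bulk write per batch instead of one write per record.
    """
    def __init__(self, flush, max_batch=100):
        self.flush = flush          # callback taking a list of records
        self.max_batch = max_batch
        self.buf = []

    def write(self, record):
        self.buf.append(record)
        if len(self.buf) >= self.max_batch:
            self._drain()

    def close(self):
        self._drain()               # flush the final, partial batch

    def _drain(self):
        if self.buf:
            self.flush(self.buf)
            self.buf = []
```

A production version would also flush on a timer so a half-full batch doesn't wait forever; that timer interval is exactly the latency you're paying for throughput.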

Example: Read Path

  • Low latency: Read from nearest replica, maybe stale. Or: cache with short TTL.
  • High throughput: Read from many replicas, spread load. Latency may increase if replicas lag.

Senior Insight

"Are we latency-bound or throughput-bound?" If users wait on each request (e.g., search), optimize latency. If we process a backlog (e.g., batch jobs), optimize throughput.


Read Amplification vs Write Amplification

Read Amplification

One logical read triggers multiple physical reads.

Examples:

  • Fan-out: One feed request = read from N users' posts
  • Join: One query = read from multiple tables
  • Replication: Read from replica that lagged = re-read from primary

Mitigation: Denormalization, caching, pre-computation, materialized views.

Write Amplification

One logical write triggers multiple physical writes.

Examples:

  • Replication: 1 write → 3 replicas = 3 writes
  • Indexes: 1 row insert → N index updates
  • Event sourcing: 1 event → N downstream processors
  • Fan-out on write: 1 post → write to N followers' feed caches

Mitigation: Async replication, fewer indexes, batch updates, write coalescing.
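Write coalescing is the least obvious of these mitigations. A minimal sketch: repeated writes to the same key collapse in a buffer, so only the latest value reaches storage on each flush:

```python
class CoalescingBuffer:
    """Absorb logical writes; emit at most one physical write per key.

    k updates to a hot key between flushes become a single write,
    cutting write amplification at the cost of a visibility delay
    (readers of the backing store see the key update only on flush).
    """
    def __init__(self):
        self.pending = {}

    def write(self, key, value):
        self.pending[key] = value   # older pending value is dropped

    def flush(self, store):
        store.update(self.pending)
        n = len(self.pending)       # physical writes in this flush
        self.pending = {}
        return n
```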

The Trade-off

  • Read-heavy: Optimize reads. Denormalize, cache. Accept write amplification.
  • Write-heavy: Optimize writes. Minimize indexes, async fan-out. Accept read amplification or eventual consistency.

Example: Social Feed

Fan-out on read: 1 feed request reads from N friends. Read amplification = N. Low write amplification.

Fan-out on write: 1 post writes to N followers' feeds. Write amplification = N. Low read amplification (feed is pre-built).

Choice: Twitter's hybrid flips on follower count: fan-out on write for regular users (pre-build their followers' feeds) and fan-out on read for celebrities, because writing one tweet into millions of follower feeds is too expensive.
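A minimal sketch of a hybrid, with an assumed threshold (real systems tune this number): small accounts push into follower feed caches at write time; high-follower accounts store once and are merged at read time.

```python
FANOUT_WRITE_LIMIT = 10_000   # assumed cutoff; tuned in practice

feed_cache = {}   # follower_id -> pre-built list of posts
post_store = {}   # author_id -> list of posts, pulled at read time

def publish(author_id, followers, post):
    if len(followers) >= FANOUT_WRITE_LIMIT:
        # High-follower account: store once, merge at read time.
        # Write amplification 1; readers pay the fan-out instead.
        post_store.setdefault(author_id, []).append(post)
    else:
        # Regular account: push to every follower's feed now.
        # Write amplification = len(followers); reads are one lookup.
        for f in followers:
            feed_cache.setdefault(f, []).append(post)

def read_feed(user_id, celebrity_follows):
    feed = list(feed_cache.get(user_id, []))
    for author_id in celebrity_follows:   # fan-out on read, small N
        feed.extend(post_store.get(author_id, []))
    return feed
```

Real feeds would also merge by timestamp and paginate; the point here is only where the N-way fan-out lands, on the write path or the read path.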


Thinking Aloud Like a Senior Engineer

Problem: "Design a system for 10M writes/sec, strong consistency required."

My first instinct: "Kafka? Cassandra? Distributed something?"

But wait — strong consistency at 10M writes/sec is extremely hard. Let me clarify: Is it 10M logical writes or 10M physical? What's the consistency boundary? Per key or global?

Per-key consistency: Possible. Shard by key, each shard handles a fraction. Strong consistency per shard (single primary or Raft). 10M / 1000 shards = 10K writes/sec per shard. Doable.
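The shard math above reduces to a routing function: a stable hash so a given key always lands on the same shard, whose single primary (or Raft group) serializes its writes.

```python
import hashlib

NUM_SHARDS = 1000   # 10M writes/sec / 1000 shards = 10K writes/sec each

def shard_for(key: str) -> int:
    """Stable, roughly uniform key -> shard mapping.

    Per-key strong consistency falls out of the routing: every
    write for one key hits one shard, and that shard's single
    writer orders them. No cross-shard coordination needed.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

A production system would use consistent hashing instead of a plain modulo, so adding or removing shards doesn't remap every key.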

Global consistency: Much harder. Distributed transactions don't scale to 10M/sec. Maybe we don't need global—just per-user or per-account?

Latency requirement? If writes must commit in under 10 ms, synchronous cross-region coordination is off the table. If we can acknowledge asynchronously and confirm later, the problem gets easier.

I'd ask: "What does strong consistency mean here? Per account? Per transaction? And what's the acceptable latency for a write to be visible?"


Summary

Trade-offs at scale require first-principles thinking:

  • Microservices: Organizational benefit, not automatic scale. Monolith often wins for small teams and unclear boundaries.
  • Eventual vs strong consistency: Match to domain. Money, inventory, identity → strong. Feeds, analytics → eventual.
  • Latency vs throughput: Optimize for the bottleneck. Latency-bound (user waiting) vs throughput-bound (batch processing).
  • Read vs write amplification: Read-heavy → denormalize, cache. Write-heavy → minimize indexes, async fan-out.

FAQs

Q: How do I know if we need strong consistency?

A: Ask: "What breaks if two users see different data for 5 seconds?" If the answer is "money lost" or "safety issue," you need strong. If it's "slightly stale feed," eventual is fine.

Q: When should we split a monolith?

A: When you have: (1) clear domain boundaries, (2) multiple teams needing independent deploy, (3) different scaling needs per domain, and (4) operational readiness for distributed systems.

Q: How do I explain amplification to stakeholders?

A: "One user action triggers X backend operations. At scale, that's Y total operations. We need to reduce X or optimize each operation." Use numbers.
