Design Thinking

Evolutionary Design

Design for change, not perfection. Migration strategies, schema evolution, backward compatibility, incremental rollouts, and feature flags. Staff-level thinking.

Advanced · 24 min read

Staff engineers design for change, not only for the present. Systems evolve: requirements shift, scale grows, technology improves. Evolutionary design means building so that change is possible without catastrophic rewrites, which in practice means planning for migrations, schema evolution, and incremental rollout before you need them.


Designing for Change, Not Perfection

The Reality

  • Requirements change: Product pivots, new features, new constraints
  • Scale changes: 10x growth forces different architecture
  • Technology changes: New databases, frameworks, platforms emerge
  • Org changes: Teams split, ownership shifts, Conway's Law applies

Evolutionary Design Principles

  1. Assume change: Don't optimize for today's snapshot
  2. Minimize lock-in: Avoid decisions that are hard to reverse
  3. Clear boundaries: Modular design makes replacement possible
  4. Version everything: APIs, schemas, configs
  5. Feature flags and gradual rollout: Ship change incrementally

Migration Strategies: Monolith to Microservices

Strangler Fig Pattern

Gradually replace the monolith by routing functionality to new services one piece at a time, while the old code keeps running until each piece has been taken over.

Steps:

  1. Identify a bounded context to extract (e.g., "notifications")
  2. Create new service with the same interface as the monolith's module
  3. Route new traffic to new service via feature flag or routing rule
  4. Dual-write or sync data as needed
  5. Migrate reads to new service
  6. Migrate writes to new service
  7. Decommission old code in monolith
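The routing step in the list above can be sketched as a thin facade that sits in front of both implementations. This is a minimal illustration, not a production router: `StranglerFacade`, `monolith_notifications`, and `notifications_service` are hypothetical names, and the per-capability boolean stands in for whatever feature flag or routing rule you actually use.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def monolith_notifications(request: dict) -> dict:
    """Stand-in for the legacy module that still lives in the monolith."""
    return {"handled_by": "monolith", "user": request["user"]}


def notifications_service(request: dict) -> dict:
    """Stand-in for the extracted service; it keeps the legacy interface."""
    return {"handled_by": "service", "user": request["user"]}


class StranglerFacade:
    """Routes each capability to either the legacy or the new implementation."""

    def __init__(self) -> None:
        self._legacy: Dict[str, Handler] = {}
        self._modern: Dict[str, Handler] = {}
        self._use_modern: Dict[str, bool] = {}

    def register(self, name: str, legacy: Handler, modern: Handler) -> None:
        self._legacy[name] = legacy
        self._modern[name] = modern
        self._use_modern[name] = False  # everything starts on the monolith

    def set_route(self, name: str, use_modern: bool) -> None:
        """Flip one capability; flipping back is the rollback path."""
        self._use_modern[name] = use_modern

    def handle(self, name: str, request: dict) -> dict:
        handlers = self._modern if self._use_modern[name] else self._legacy
        return handlers[name](request)
```

Because callers only ever talk to the facade, step 7 (decommissioning) becomes a matter of deleting the legacy handler once the route has stayed on the new service long enough.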

Parallel Run

Run old and new systems in parallel, compare results, switch when confident.

  • Use case: Critical path (payments, orders) where errors are costly
  • Cost: 2x infra during migration
  • Benefit: Validation before cutover
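A parallel run can be sketched as a wrapper that always serves the old result and only records where the new path disagrees. The function names below are hypothetical, and the sketch assumes the new implementation is free of side effects (for payments, that would mean it must not actually charge anyone while running in the shadow).

```python
from typing import Any, Callable, List, Tuple

Mismatch = Tuple[Any, Any, Any]  # (request, old result, new result or error)


def parallel_run(
    old_impl: Callable[[Any], Any],
    new_impl: Callable[[Any], Any],
    request: Any,
    mismatches: List[Mismatch],
) -> Any:
    """Serve the old result; run the new path in the shadow and record differences."""
    expected = old_impl(request)
    try:
        candidate = new_impl(request)
    except Exception as exc:  # the shadow path must never break the caller
        mismatches.append((request, expected, repr(exc)))
        return expected
    if candidate != expected:
        mismatches.append((request, expected, candidate))
    return expected


def old_total(cart: List[int]) -> int:
    """Legacy order total, in cents."""
    return sum(cart)


def new_total(cart: List[int]) -> int:
    """Rewritten total with a deliberate bug: it drops the last item."""
    return sum(cart[:-1])
```

Calling `parallel_run(old_total, new_total, [100, 250], mismatches)` returns the old total and appends one entry to `mismatches`. Switching over "when confident" then becomes a measurable criterion, such as a mismatch rate of zero over an agreed period.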

Database Migration Strategies

| Strategy | Downtime | Risk | Use Case |
|---|---|---|---|
| Big bang | Yes | High | Rarely, small datasets |
| Dual-write, then cutover | Minimal | Medium | Most common |
| Change data capture (CDC) | None | Low | Large, high-traffic |
| Read replicas, flip | Brief | Low | Read-heavy |
| Logical replication | None | Low | Postgres, etc. |
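As a rough illustration of the "dual-write, then cutover" row, the sketch below uses two dictionaries in place of real databases. `DualWriteStore` and its methods are hypothetical; the assumptions are that the old store stays the source of truth until cutover and that a failed write to the new store is queued for repair rather than failing the request.

```python
from typing import Dict, List, Optional


class DualWriteStore:
    """Writes go to both stores; reads come from whichever side is live."""

    def __init__(self, old: Dict[str, str], new: Dict[str, str]) -> None:
        self.old = old
        self.new = new
        self.read_from_new = False          # flipped at cutover
        self.repair_queue: List[str] = []   # keys whose secondary write failed

    def _write_new(self, key: str, value: str) -> None:
        self.new[key] = value

    def put(self, key: str, value: str) -> None:
        self.old[key] = value               # source of truth until cutover
        try:
            self._write_new(key, value)
        except Exception:
            self.repair_queue.append(key)   # reconcile later, do not fail the caller

    def get(self, key: str) -> Optional[str]:
        store = self.new if self.read_from_new else self.old
        return store.get(key)

    def backfill(self) -> None:
        """Copy rows that existed before dual-writing started."""
        for key, value in self.old.items():
            self.new.setdefault(key, value)

    def diff(self) -> List[str]:
        """Keys whose values differ between the two stores."""
        keys = set(self.old) | set(self.new)
        return sorted(k for k in keys if self.old.get(k) != self.new.get(k))
```

A reasonable cutover gate in this model is an empty `diff()` and an empty `repair_queue`; only then is `read_from_new` flipped.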

Real Example: Netflix

Netflix migrated from a monolith to microservices over several years. They used:

  • Strangler fig for most services
  • Chaos engineering to validate resilience
  • Feature flags to route traffic gradually
  • Multiple phases: Not one big migration, many small ones

Schema Evolution & Backward Compatibility

The Challenge

  • Schema changes are inevitable: new fields, renames, type changes
  • Backward compatibility: Old clients must keep working against the new schema
  • Forward compatibility: New clients must keep working against the old schema, because both versions run side by side during rollout

Strategies

Additive changes (safe):

  • Add optional fields
  • Add new tables/collections
  • Add new endpoints

Breaking changes (risky):

  • Remove fields
  • Change types
  • Rename fields
  • Change semantics
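One way to see why optional fields are safe is a reader that defaults what is missing and ignores what it does not recognize. The `User` type and the `nickname` field below are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    name: str
    nickname: Optional[str] = None  # added later as an optional field


def parse_user(payload: str) -> User:
    """Tolerant reader: default missing optional fields, ignore unknown ones."""
    data = json.loads(payload)
    return User(
        id=data["id"],
        name=data["name"],
        nickname=data.get("nickname"),  # absent in payloads from old writers
    )


# Payload from a writer that predates the nickname field.
old_payload = '{"id": 1, "name": "Ada"}'
# Payload from a newer writer that already sends a field this reader ignores.
newer_payload = '{"id": 2, "name": "Lin", "nickname": "L", "locale": "en"}'
```

Both payloads parse without error. A rename or type change would break `data["name"]` for every reader at once, which is why those changes belong in the risky list.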

Handling Breaking Changes

  1. Versioned APIs: /v1/users, /v2/users. Old clients stay on v1.
  2. Deprecation period: Announce removal, give clients time to migrate, then remove
  3. Dual-write: Write to both old and new format during transition
  4. Expand-contract: Add new field (expand), migrate consumers, remove old (contract)
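Expand-contract applied to a field rename might look like the sketch below. The field names `fullname` and `display_name` and the phase labels are hypothetical; the point is that writers emit both fields during the expand phase, so old and new readers overlap safely.

```python
def write_user(name: str, phase: str) -> dict:
    """Produce a record in the shape used during each phase of the rename."""
    if phase == "before":
        return {"fullname": name}
    if phase == "expand":
        # Both fields are written so old and new readers keep working.
        return {"fullname": name, "display_name": name}
    if phase == "contract":
        return {"display_name": name}
    raise ValueError(f"unknown phase: {phase}")


def legacy_reader(record: dict) -> str:
    """A consumer that has not migrated yet."""
    return record["fullname"]


def migrated_reader(record: dict) -> str:
    """A migrated consumer: prefers the new field, falls back to the old one."""
    if "display_name" in record:
        return record["display_name"]
    return record["fullname"]
```

The migrated reader works in every phase, while the legacy reader fails once the contract step lands. That is the reason to contract only after confirming no legacy readers remain.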

Example: Adding a Required Field

Wrong: Add required field, deploy. Old clients fail.

Right:

  1. Add field as optional. Deploy.
  2. Backfill data. Ensure all records have value.
  3. Make the field required in the new API version. The old version still accepts requests without it and fills in a default.
  4. Migrate consumers to send it.
  5. Eventually remove old API version.
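The first three steps can be tried end to end with SQLite from the Python standard library. The `users` table, the `country` column, and the `"unknown"` default are illustrative assumptions, and the "required" rule is enforced in the hypothetical v2 handler rather than in the database.

```python
import sqlite3

DEFAULT_COUNTRY = "unknown"  # hypothetical default applied for old clients


def setup() -> sqlite3.Connection:
    """Create the pre-migration table with two existing rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("a@example.com",), ("b@example.com",)],
    )
    return conn


def expand(conn: sqlite3.Connection) -> None:
    """Step 1: add the field as optional (nullable), so old writers still work."""
    conn.execute("ALTER TABLE users ADD COLUMN country TEXT")


def backfill(conn: sqlite3.Connection) -> int:
    """Step 2: fill existing rows, then report how many are still missing a value."""
    conn.execute(
        "UPDATE users SET country = ? WHERE country IS NULL", (DEFAULT_COUNTRY,)
    )
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE country IS NULL"
    ).fetchone()[0]


def create_user_v1(conn: sqlite3.Connection, email: str) -> None:
    """Old API version: the field is not sent, so the default is applied."""
    conn.execute(
        "INSERT INTO users (email, country) VALUES (?, ?)", (email, DEFAULT_COUNTRY)
    )


def create_user_v2(conn: sqlite3.Connection, email: str, country: str) -> None:
    """Step 3: the new API version treats the field as required."""
    if not country:
        raise ValueError("country is required in v2")
    conn.execute("INSERT INTO users (email, country) VALUES (?, ?)", (email, country))
```

Running `backfill` until it returns 0 is the gate before step 3; only then is it safe for any reader to assume the value is always present.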

Real Example: Stripe API

Stripe versions its API with a /v1/ URL prefix combined with dated versions (2023-10-16, etc.). New fields are added additively. Renames or removals go through deprecation, and old versions stay supported for years.


Incremental Rollouts and Feature Flags

Why Incremental?

  • Reduce risk: One bad deploy doesn't affect everyone
  • Validate in production: 1% traffic can surface issues
  • Easy rollback: Turn off flag, no redeploy
  • A/B testing: Compare old vs new behavior

Rollout Strategies

| Strategy | Use Case | Rollback |
|---|---|---|
| Percentage rollout | 1% → 10% → 50% → 100% | Reduce % |
| Canary | New version for one server/group | Route back |
| User segment | Internal users, beta users first | Exclude segment |
| Geographic | One region first | Route away |
| Kill switch | Feature flag to disable | Flip flag |
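A percentage rollout is commonly implemented by hashing a stable identifier into a fixed bucket, so the same user gets the same answer on every request. The sketch below is one way to do it; the flag name and the choice of SHA-256 with 100 buckets are assumptions, not a reference to any particular flag product.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, percent: int) -> bool:
    """True if this user falls inside the first `percent` buckets."""
    return bucket(flag, user_id) < percent
```

Two properties follow from this design. Raising the percentage only ever adds users, so nobody flips back and forth as the rollout widens. Including the flag name in the hash gives each flag its own bucketing, so the same users are not always the first to receive every change.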

Feature Flags in Design

When designing, consider:

  • Where do we need flags? New code paths, experiments, migrations
  • How do we clean up? Flags have cost: complexity, tech debt
  • Who controls flags? Eng, product, ops
  • What's the blast radius? One flag or many?
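The cleanup and ownership questions above get easier when each flag carries an owner and a removal date from the day it is created. The sketch below is a hypothetical registry, not a real flag-management API; treating kill switches as exempt from expiry is an assumption made here for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Flag:
    name: str
    owner: str        # team that answers for the flag
    purpose: str      # "rollout", "experiment", "migration", or "kill_switch"
    remove_by: date   # date by which the flag should be deleted


def stale_flags(flags: List[Flag], today: date) -> List[str]:
    """Names of flags past their removal date; kill switches are long-lived."""
    return [
        flag.name
        for flag in flags
        if flag.purpose != "kill_switch" and flag.remove_by < today
    ]


FLAGS = [
    Flag("new-checkout", "payments", "rollout", date(2024, 1, 31)),
    Flag("pg-reads", "platform", "migration", date(2024, 6, 30)),
    Flag("disable-recommendations", "ops", "kill_switch", date(2023, 1, 1)),
]
```

A check like `stale_flags(FLAGS, date.today())` could run in CI or a weekly report, so that overdue flags surface as work items rather than as surprises.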

Senior Insight

"Design the rollout before you design the feature. If you can't roll it out incrementally, you'll either delay launch or risk a big-bang deploy. Both are costly." In short: plan the rollout as part of the design.


Case Studies: Netflix, Spotify, Stripe

Netflix

  • Migration: DVD to streaming, datacenter to cloud
  • Approach: Phased migration, chaos engineering, regional rollout
  • Lesson: Multi-year journey, not one project. Evolve continuously.

Spotify

  • Squad model: Small teams own services. Conway's Law in action.
  • Migration: Monolith to "microservices" (they call them something else)
  • Lesson: Org structure drove service boundaries. Migration followed team autonomy.

Stripe

  • API versioning: Multiple versions live. Deprecation with long runway.
  • Schema evolution: Additive changes, expand-contract for breaking
  • Lesson: Backward compatibility is a product commitment. Plan for it.

Thinking Aloud Like a Senior Engineer

Problem: "We need to migrate from MySQL to PostgreSQL. 100M rows, high traffic."

My first instinct: "Dual-write, sync, cutover."

But let me think about phases:

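  1. Phase 1: Stand up PostgreSQL as a shadow copy of MySQL (it cannot act as a native MySQL replica). Backfill existing rows, keep it in sync via CDC or dual-write, and validate the data.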
  2. Phase 2: Route read traffic to PostgreSQL (percentage-based). Compare results.
  3. Phase 3: Switch writes. Use feature flag: new writes go to both, or only PG with MySQL as fallback.
  4. Phase 4: Migrate remaining reads. Decommission MySQL.

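Rollback: At each phase, we can revert. Phase 2: route reads back to MySQL. Phase 3: write to MySQL only, which works only as long as MySQL keeps receiving every write until we are ready to decommission it. No big bang.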

Schema: PostgreSQL and MySQL differ. We need an abstraction or adapter. Or: same schema in both during migration. Extra work but simpler.

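Downtime: Zero if we do it right. Dual-write, shift reads gradually, switch writes, then move the remaining reads. There may be a brief inconsistency window while the two databases catch up with each other. Either use a distributed transaction, which adds latency and operational complexity, or accept eventual consistency for that window.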
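The phases can be captured as a small routing table, which makes each rollback a one-line change of phase. This sketch assumes the dual-write variant rather than CDC; the phase names, the `"mysql"` and `"postgres"` labels, and the convention that the first write target is the source of truth are all illustrative.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    SHADOW = 1        # PostgreSQL kept in sync; MySQL serves everything
    READ_SHIFT = 2    # a percentage of reads goes to PostgreSQL
    WRITE_SWITCH = 3  # PostgreSQL becomes primary; MySQL is still written
    PG_ONLY = 4       # MySQL decommissioned


def write_targets(phase: Phase) -> List[str]:
    """Databases to write to; the first entry is the source of truth."""
    if phase in (Phase.SHADOW, Phase.READ_SHIFT):
        return ["mysql", "postgres"]
    if phase is Phase.WRITE_SWITCH:
        return ["postgres", "mysql"]  # MySQL stays current so rollback is possible
    return ["postgres"]


def read_target(phase: Phase, user_bucket: int, pg_read_percent: int) -> str:
    """Database to read from, given the caller's stable bucket in 0..99."""
    if phase is Phase.SHADOW:
        return "mysql"
    if phase is Phase.PG_ONLY:
        return "postgres"
    # READ_SHIFT and WRITE_SWITCH: percentage-based split of read traffic.
    return "postgres" if user_bucket < pg_read_percent else "mysql"
```

Rolling back from `WRITE_SWITCH` to `READ_SHIFT` swaps the source of truth back to MySQL without losing data, because both databases received every write throughout.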


Best Practices

  1. Assume migration: Design so components can be replaced
  2. Version APIs and schemas: From day one
  3. Prefer additive changes: Avoid breaking changes when possible
  4. Plan rollout: Percentage, canary, region—before building
  5. Clean up flags: Technical debt if left forever

Summary

Evolutionary design means:

  • Design for change—modular, replaceable components
  • Migration strategies—strangler fig, parallel run, phased cutover
  • Schema evolution—additive changes, versioning, deprecation
  • Incremental rollout—feature flags, percentage rollout, canary
  • Avoid big-bang—many small steps, each reversible

FAQs

Q: When is a big-bang migration acceptable?

A: Rarely. Only when the dataset is small, traffic is low, a short downtime is acceptable, and no incremental path is feasible. Even then, consider whether there is a way to do it in phases.

Q: How do we handle schema changes in a distributed system?

A: Version the schema. Support multiple versions during transition. Use expand-contract: add new, migrate, remove old. CDC can help with async sync.

Q: How many feature flags are too many?

A: When they're hard to reason about, slow down releases, or never get cleaned up. Aim to remove flags after rollout. Use a flag management system to track lifecycle.
