Evolutionary Design
Design for change, not perfection. Migration strategies, schema evolution, backward compatibility, incremental rollouts, and feature flags. Staff-level thinking.
Staff engineers don't design for the present—they design for change. Systems evolve: requirements shift, scale grows, technology improves. Evolutionary design means building so that change is possible without catastrophic rewrites. This is staff-level thinking: planning for migrations, schema evolution, and incremental rollout before you need them.
Designing for Change, Not Perfection
The Reality
- Requirements change: Product pivots, new features, new constraints
- Scale changes: 10x growth forces different architecture
- Technology changes: New databases, frameworks, platforms emerge
- Org changes: Teams split, ownership shifts, Conway's Law applies
Evolutionary Design Principles
- Assume change: Don't optimize for today's snapshot
- Minimize lock-in: Avoid decisions that are hard to reverse
- Clear boundaries: Modular design makes replacement possible
- Version everything: APIs, schemas, configs
- Feature flags and gradual rollout: Ship change incrementally
Migration Strategies: Monolith to Microservices
Strangler Fig Pattern
Gradually replace monolith by routing new functionality to new services while keeping old code running.
Steps:
- Identify a bounded context to extract (e.g., "notifications")
- Create new service with the same interface as the monolith's module
- Route new traffic to new service via feature flag or routing rule
- Dual-write or sync data as needed
- Migrate reads to new service
- Migrate writes to new service
- Decommission old code in monolith
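The routing step above can be sketched as a small router with a feature flag. This is an illustrative sketch, not a real framework: `MonolithApp`, `NotificationsService`, and `StranglerRouter` are assumed names standing in for your actual monolith and extracted service.

```python
# Sketch: strangler-fig router that sends one bounded context ("notifications")
# to a new service while every other path stays in the monolith.
# All class names here are hypothetical stand-ins.

class MonolithApp:
    def handle(self, path: str) -> str:
        return f"monolith handled {path}"

class NotificationsService:
    def handle(self, path: str) -> str:
        return f"notifications-service handled {path}"

class StranglerRouter:
    def __init__(self, extract_enabled: bool = False):
        self.monolith = MonolithApp()
        self.notifications = NotificationsService()
        self.extract_enabled = extract_enabled  # feature flag / routing rule

    def route(self, path: str) -> str:
        # Only the extracted bounded context is redirected; everything else
        # continues to hit the old code path untouched.
        if self.extract_enabled and path.startswith("/notifications"):
            return self.notifications.handle(path)
        return self.monolith.handle(path)

router = StranglerRouter(extract_enabled=True)
print(router.route("/notifications/send"))  # handled by the new service
print(router.route("/orders/42"))           # still the monolith
```

Because the flag gates the redirect, rollback is just disabling it; the monolith code path never went away.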
Parallel Run
Run old and new systems in parallel, compare results, switch when confident.
- Use case: Critical path (payments, orders) where errors are costly
- Cost: 2x infra during migration
- Benefit: Validation before cutover
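A minimal sketch of a parallel run, assuming a hypothetical `compute_total` on the critical path: the old implementation stays the source of truth, the new one runs shadowed, and mismatches are recorded instead of served.

```python
# Sketch: shadow the new implementation behind the old one and log
# disagreements. Both compute_total_* functions are illustrative stand-ins.

mismatches = []

def compute_total_old(items):
    return sum(items)

def compute_total_new(items):
    return sum(items)  # candidate replacement under validation

def compute_total(items):
    old = compute_total_old(items)
    try:
        new = compute_total_new(items)
        if new != old:
            mismatches.append((items, old, new))  # record, don't fail the request
    except Exception as exc:
        mismatches.append((items, old, exc))     # new system's bugs stay invisible
    return old  # old system remains authoritative until cutover
```

Cutover is justified only once the mismatch log stays empty under real traffic.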
Database Migration Strategies
| Strategy | Downtime | Risk | Use Case |
|---|---|---|---|
| Big bang | Yes | High | Rarely, small datasets |
| Dual-write, then cutover | Minimal | Medium | Most common |
| Change data capture (CDC) | None | Low | Large, high-traffic |
| Read replicas, flip | Brief | Low | Read-heavy |
| Logical replication | None | Low | Postgres, etc. |
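The "dual-write, then cutover" row can be sketched in a few lines. The dicts below are stand-ins for the two real databases; the read flag is what gets flipped at cutover.

```python
# Sketch of dual-write with a read cutover flag. Plain dicts stand in for
# the old and new data stores.

old_db, new_db = {}, {}

READ_FROM_NEW = False  # flipped once the new store is backfilled and verified

def write_user(user_id, record):
    old_db[user_id] = record
    new_db[user_id] = record  # dual-write keeps both stores converging

def read_user(user_id):
    return (new_db if READ_FROM_NEW else old_db).get(user_id)
```

Because both stores receive every write, flipping `READ_FROM_NEW` back is a safe rollback at any point before the old store is decommissioned.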
Real Example: Netflix
Netflix migrated from a monolith to microservices over several years. They used:
- Strangler fig for most services
- Chaos engineering to validate resilience
- Feature flags to route traffic gradually
- Multiple phases: Not one big migration, many small ones
Schema Evolution & Backward Compatibility
The Challenge
- Schema changes are inevitable: new fields, renames, type changes
- Backward compatibility: Old clients must work with new schema
- Forward compatibility: New clients must work with old schema (during rollout)
Strategies
Additive changes (safe):
- Add optional fields
- Add new tables/collections
- Add new endpoints
Breaking changes (risky):
- Remove fields
- Change types
- Rename fields
- Change semantics
Handling Breaking Changes
- Versioned APIs: /v1/users, /v2/users. Old clients stay on v1.
- Deprecation period: Announce removal, give clients time to migrate, then remove
- Dual-write: Write to both old and new format during transition
- Expand-contract: Add new field (expand), migrate consumers, remove old (contract)
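Expand-contract can be sketched with a tolerant reader. Assume a hypothetical rename of `name` to `full_name`: during the expand phase writers populate both fields, and readers prefer the new one but fall back to the old.

```python
# Sketch of expand-contract for renaming "name" -> "full_name".
# Contract (dropping "name") happens only after all consumers read the
# new field.

def write_profile(full_name: str) -> dict:
    # Expand phase: write old and new field side by side.
    return {"name": full_name, "full_name": full_name}

def read_profile(record: dict) -> str:
    # Tolerant reader: prefer the new field, fall back to the old one,
    # so pre- and post-migration records coexist safely.
    return record.get("full_name") or record["name"]

assert read_profile({"name": "Ada"}) == "Ada"                 # old record
assert read_profile(write_profile("Ada Lovelace")) == "Ada Lovelace"
```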
Example: Adding a Required Field
Wrong: Add required field, deploy. Old clients fail.
Right:
- Add field as optional. Deploy.
- Backfill data. Ensure all records have value.
- Make required in new version. Old API still accepts without it (default).
- Migrate consumers to send it.
- Eventually remove old API version.
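The steps above can be sketched as two handler versions. The field name `locale` and its default are assumptions for illustration: v1 backfills a default so old clients keep working, and only the later version enforces the requirement.

```python
# Sketch: add a "required" field without breaking old clients.
# "locale" and DEFAULT_LOCALE are hypothetical.

DEFAULT_LOCALE = "en"  # server-side default covers clients that omit the field

def create_user_v1(payload: dict) -> dict:
    # v1 keeps accepting payloads without "locale" by applying the default.
    return {"email": payload["email"],
            "locale": payload.get("locale", DEFAULT_LOCALE)}

def create_user_v2(payload: dict) -> dict:
    # v2, shipped after backfill and consumer migration, requires the field.
    if "locale" not in payload:
        raise ValueError("locale is required in v2")
    return {"email": payload["email"], "locale": payload["locale"]}
```

Old clients stay on v1 until the deprecation window closes; nothing breaks on deploy day.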
Real Example: Stripe API
Stripe versions its API with a stable /v1/ path plus date-based versions (e.g., 2023-10-16). Fields are added additively; renames and removals go through a deprecation process, and old versions remain supported for years.
Incremental Rollouts and Feature Flags
Why Incremental?
- Reduce risk: One bad deploy doesn't affect everyone
- Validate in production: 1% traffic can surface issues
- Easy rollback: Turn off flag, no redeploy
- A/B testing: Compare old vs new behavior
Rollout Strategies
| Strategy | Use Case | Rollback |
|---|---|---|
| Percentage rollout | 1% → 10% → 50% → 100% | Reduce % |
| Canary | New version for one server/group | Route back |
| User segment | Internal users, beta users first | Exclude segment |
| Geographic | One region first | Route away |
| Kill switch | Feature flag to disable | Flip flag |
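The percentage-rollout row is usually implemented by hashing a stable identifier rather than random sampling, so each user gets a sticky decision. A minimal sketch, with an assumed flag name:

```python
# Sketch: deterministic percentage rollout. Hashing (flag, user_id) gives each
# user a stable bucket, so raising the percentage only ever adds users.

import hashlib

def in_rollout(user_id: str, percent: int, flag: str = "new-checkout") -> bool:
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < percent

# Monotonic: any user enabled at 10% is still enabled at 50%.
assert all(in_rollout(u, 50) for u in ("a", "b", "c") if in_rollout(u, 10))
```

Keying the hash on the flag name means different flags bucket users independently, so one experiment's cohort doesn't leak into another's.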
Feature Flags in Design
When designing, consider:
- Where do we need flags? New code paths, experiments, migrations
- How do we clean up? Flags have cost: complexity, tech debt
- Who controls flags? Eng, product, ops
- What's the blast radius? One flag or many?
Senior Insight
"Design the rollout before you design the feature. If you can't roll it out incrementally, you'll either delay launch or risk a big-bang deploy. Both are costly." — Plan for rollout as part of the design.
Case Studies: Netflix, Spotify, Stripe
Netflix
- Migration: DVD to streaming, datacenter to cloud
- Approach: Phased migration, chaos engineering, regional rollout
- Lesson: Multi-year journey, not one project. Evolve continuously.
Spotify
- Squad model: Small teams own services. Conway's Law in action.
- Migration: Monolith to small, squad-owned services
- Lesson: Org structure drove service boundaries. Migration followed team autonomy.
Stripe
- API versioning: Multiple versions live. Deprecation with long runway.
- Schema evolution: Additive changes, expand-contract for breaking
- Lesson: Backward compatibility is a product commitment. Plan for it.
Thinking Aloud Like a Senior Engineer
Problem: "We need to migrate from MySQL to PostgreSQL. 100M rows, high traffic."
My first instinct: "Dual-write, sync, cutover."
But let me think about phases:
- Phase 1: Add PostgreSQL as read replica. Sync via CDC or dual-write. Validate data.
- Phase 2: Route read traffic to PostgreSQL (percentage-based). Compare results.
- Phase 3: Switch writes. Use feature flag: new writes go to both, or only PG with MySQL as fallback.
- Phase 4: Migrate remaining reads. Decommission MySQL.
Rollback: At each phase, we can revert. Phase 2: route back to MySQL. Phase 3: write to MySQL only. No big bang.
Schema: PostgreSQL and MySQL differ in SQL dialect and data types. We need an abstraction or adapter. Or: keep the same logical schema in both during migration. Extra work but simpler.
Downtime: Zero if we do it right. Dual-write, then cutover writes, then cutover reads. Brief inconsistency window? Use distributed transaction or accept eventual consistency for that window.
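The four phases above can be sketched as an explicit read/write router. The phase number is passed in for clarity, and the dicts stand in for real MySQL and PostgreSQL connections; this is a sketch of the routing logic, not a driver-level implementation.

```python
# Sketch: phased MySQL -> PostgreSQL cutover.
# Phase 1: MySQL primary, dual-write syncs PG.  Phase 2: reads move to PG.
# Phase 3: writes still go to both.             Phase 4: PG only.
from typing import Optional

mysql_db, pg_db = {}, {}

def write(phase: int, key: str, value: str) -> None:
    if phase <= 3:
        mysql_db[key] = value  # MySQL keeps receiving writes until phase 4
    pg_db[key] = value         # dual-write keeps PostgreSQL in sync from phase 1

def read(phase: int, key: str) -> Optional[str]:
    # Phase 1 still trusts MySQL; phases 2+ read from PostgreSQL.
    return (pg_db if phase >= 2 else mysql_db).get(key)

# Rollback at any phase before 4 is just lowering the phase number:
# the other store never stopped receiving writes.
```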
Best Practices
- Assume migration: Design so components can be replaced
- Version APIs and schemas: From day one
- Prefer additive changes: Avoid breaking changes when possible
- Plan rollout: Percentage, canary, region—before building
- Clean up flags: Technical debt if left forever
Summary
Evolutionary design means:
- Design for change—modular, replaceable components
- Migration strategies—strangler fig, parallel run, phased cutover
- Schema evolution—additive changes, versioning, deprecation
- Incremental rollout—feature flags, percentage rollout, canary
- Avoid big-bang—many small steps, each reversible
FAQs
Q: When is a big-bang migration acceptable?
A: Rarely. Only when: small dataset, low traffic, short downtime acceptable, and no incremental path is feasible. Even then, consider if there's a way to do it in phases.
Q: How do we handle schema changes in a distributed system?
A: Version the schema. Support multiple versions during transition. Use expand-contract: add new, migrate, remove old. CDC can help with async sync.
Q: How many feature flags are too many?
A: When they're hard to reason about, slow down release, or never get cleaned up. Aim to remove flags after rollout. Use a flag management system to track lifecycle.