Non-Functional Requirements Deep Dive

Security vs UX trade-offs, compliance (PII, GDPR, SOC2), multi-region vs multi-AZ, and observability as part of system design. Senior interview gold.

Advanced · 24 min read

Non-functional requirements (NFRs) often drive more architectural decisions than functional ones. Security, compliance, and observability shape where data lives, how it flows, and what we build. Senior engineers treat NFRs as first-class design constraints, not afterthoughts.


Security Trade-offs: Auth vs UX

The Tension

Strong security often hurts UX:

  • MFA adds friction
  • Session timeouts force re-login
  • Rate limiting blocks legitimate users
  • Strict password rules frustrate users

Smooth UX can weaken security:

  • "Remember me" extends session risk
  • Social login reduces control
  • Fewer auth steps = easier for attackers
  • Passwordless login shifts the attack surface rather than removing it (e.g., to the email inbox or the device)

Senior Approach

  1. Segment by risk: High-value actions (payments, settings) need stronger auth. Low-risk (browsing) can be lighter.
  2. Progressive auth: Require step-up (MFA, re-auth) for sensitive operations.
  3. Context-aware: Require re-auth for new device, new location, or after long idle.
  4. Quantify the trade-off: "MFA reduces account takeover by X% but adds Y seconds. For payments, we accept Y. For feed view, we don't."
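
The first three points can be combined into one decision function. The sketch below is illustrative only: the `Risk` levels, the `RequestContext` fields, and the 15-minute idle threshold are assumptions made for the example, not values from any standard.

```python
from dataclasses import dataclass
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1    # e.g. browsing a feed
    HIGH = 2   # e.g. payments, settings changes


@dataclass
class RequestContext:
    known_device: bool
    known_location: bool
    idle_seconds: int


# Hypothetical threshold: how long a session may sit idle before re-auth.
MAX_IDLE_SECONDS = 15 * 60


def needs_step_up(action_risk: Risk, ctx: RequestContext) -> bool:
    """Return True when the user should be prompted for MFA or re-auth."""
    context_changed = (
        not ctx.known_device
        or not ctx.known_location
        or ctx.idle_seconds > MAX_IDLE_SECONDS
    )
    # High-risk actions always step up; low-risk ones only when context changes.
    return action_risk == Risk.HIGH or context_changed
```

A low-risk request from a known device and location passes without a prompt, while the same request from a new device triggers step-up.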

Example: Banking App

  • Login: MFA required. UX cost accepted for security.
  • View balance: Session token OK. Low risk.
  • Transfer money: Re-auth or step-up MFA. High risk.
  • Change password: Full re-auth. Critical.
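
One way to encode this example is a policy table keyed by action. The action names and the `AuthLevel` enum below are hypothetical, and the ordering of levels is an assumption of the sketch.

```python
from enum import IntEnum


class AuthLevel(IntEnum):
    SESSION = 1       # valid session token
    STEP_UP_MFA = 2   # fresh MFA challenge
    FULL_REAUTH = 3   # credentials entered again


# Minimum auth level per action, mirroring the list above.
# Login (MFA required) is what creates the session, so it is not listed.
POLICY = {
    "view_balance": AuthLevel.SESSION,
    "transfer_money": AuthLevel.STEP_UP_MFA,
    "change_password": AuthLevel.FULL_REAUTH,
}


def is_allowed(action: str, current_level: AuthLevel) -> bool:
    """Allow the action only if the session has reached the required level."""
    # Unknown actions default to the strictest level (fail closed).
    required = POLICY.get(action, AuthLevel.FULL_REAUTH)
    return current_level >= required
```

Keeping the policy in data rather than scattered `if` statements makes it easy to review with security and product teams.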

Interview Tip

"When we have auth vs UX trade-off, I segment by risk. For [sensitive operation], we'd require [stronger auth]. For [low-risk operation], session token is fine. I'd also consider step-up auth—only prompt when risk increases."


Compliance-Driven Design: PII, GDPR, SOC2

PII (Personally Identifiable Information)

What it is: Data that can identify a person, directly or in combination with other data (name, email, phone, IP address, etc.)

Design implications:

  • Encryption at rest and in transit: Default for PII
  • Access control: Who can read/write? Audit logs.
  • Retention: How long do we keep it? Deletion policy?
  • Minimization: Collect only what we need
  • Purpose limitation: Use only for stated purpose
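
Minimization and purpose limitation can be enforced in code with an allowlist of fields per purpose. This is a minimal sketch; the purposes and field names in `ALLOWED_FIELDS` are invented for illustration.

```python
# Hypothetical mapping from a declared purpose to the fields it may use.
ALLOWED_FIELDS = {
    "shipping": {"name", "address", "phone"},
    "newsletter": {"email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return only the fields permitted for the given purpose."""
    if purpose not in ALLOWED_FIELDS:
        # Refuse processing for any purpose that was never declared.
        raise ValueError(f"no declared purpose: {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}
```

Calling `minimize(record, "newsletter")` on a full profile yields only the email, so downstream code never sees fields it has no stated reason to use.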

GDPR

Key requirements:

  • Right to access: User can request their data
  • Right to deletion: "Right to be forgotten"
  • Data portability: Export in machine-readable format
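  • Consent: One of several lawful bases for processing. Where it is the basis, it must be specific, informed, and as easy to withdraw as to give
  • Data residency: GDPR restricts transfers of personal data outside the EU/EEA; many teams meet this by keeping EU data in EU regions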

Design implications:

  • User ID as key: Need to find and delete all user data
  • Soft delete first: Hide the data immediately, then hard delete once the retention window expires
  • Export pipeline: Generate and deliver user data export
  • Consent tracking: Store what user consented to, when
  • Data map: Know where PII lives (DBs, logs, backups)
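
A data map keyed by user ID makes both export and deletion mechanical. The sketch below uses an in-process SQLite database; the table names in `DATA_MAP` and the `user_id` column are assumptions, and the soft-delete stage is omitted for brevity.

```python
import json
import sqlite3

# Hypothetical data map: every table that holds PII, each with a user_id column.
DATA_MAP = ["profiles", "orders"]


def export_user(conn: sqlite3.Connection, user_id: int) -> str:
    """Collect all rows for one user into a machine-readable JSON export."""
    export = {}
    for table in DATA_MAP:  # table names come from code, never from user input
        cur = conn.execute(f"SELECT * FROM {table} WHERE user_id = ?", (user_id,))
        columns = [col[0] for col in cur.description]
        export[table] = [dict(zip(columns, row)) for row in cur]
    return json.dumps(export)


def purge_user(conn: sqlite3.Connection, user_id: int) -> int:
    """Hard-delete the user's rows from every mapped table; return rows removed."""
    removed = 0
    with conn:  # one transaction: either every table is purged or none is
        for table in DATA_MAP:
            cur = conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))
            removed += cur.rowcount
    return removed
```

In a real system the map would also have to cover logs, caches, and backups, which cannot be purged with a single SQL statement.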

SOC2

Focus: The five Trust Services Criteria: security (always in scope), availability, confidentiality, processing integrity, and privacy.

Design implications:

  • Access logging: Who accessed what, when
  • Change management: Audit trail for config and code changes
  • Incident response: Documented process, runbooks
  • Encryption: Standard for sensitive data
  • Redundancy: Availability requirements
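
Access logging is easiest to keep consistent when it wraps the data-access functions themselves. The decorator below is a sketch under stated assumptions: `AUDIT_LOG` is an in-memory stand-in for an append-only audit store, and `read_customer` is a hypothetical accessor.

```python
import functools
from datetime import datetime, timezone

# In-memory stand-in for an append-only audit store (an assumption of this sketch).
AUDIT_LOG: list[dict] = []


def audited(resource: str):
    """Decorator that records who called the function, on what, and when."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(actor: str, *args, **kwargs):
            # Record before the call so failed attempts are captured too.
            AUDIT_LOG.append({
                "actor": actor,
                "resource": resource,
                "action": func.__name__,
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return func(actor, *args, **kwargs)
        return wrapper
    return decorator


@audited("customer_record")
def read_customer(actor: str, customer_id: int) -> dict:
    # Hypothetical data access; a real system would query a database here.
    return {"id": customer_id}
```

Each call to `read_customer("alice", 42)` appends one entry answering "who accessed what, when".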

Senior Insight

"Compliance isn't a feature you add later. It shapes data model (can we delete by user?), storage (where does data live?), and access (who can see what?). Design for it from the start or pay for retrofits."


Multi-Region vs Multi-AZ Decisions

Multi-AZ (Same Region)

What it is: Deploy across multiple availability zones in one region (e.g., us-east-1a, us-east-1b).

Pros:

  • Protects against AZ failure
  • Low latency between zones (same region)
  • Simpler than multi-region
  • Sufficient for most apps

Cons:

  • Region outage = full outage
  • Data residency: all data sits in one region
  • Latency: users far from the region still see slow responses

When to use: Default for production. Protects against AZ failure, not region failure.

Multi-Region

What it is: Deploy in multiple regions (e.g., us-east, eu-west, ap-south).

Pros:

  • Region failure = failover to another
  • Lower latency for global users (serve from nearest)
  • Data residency (EU data in EU)
  • Compliance (some regulations require it)

Cons:

  • Complex: replication, conflict resolution, failover
  • Cost: 2–3x infra
  • Data consistency across regions is hard
  • Operational burden

When to use:

  • Global user base with latency requirements
  • Compliance (data residency)
  • Region-level redundancy requirement
  • Business continuity for region outage

Decision Framework

Requirement                 Multi-AZ          Multi-Region
AZ failure protection       Yes               Yes
Region failure protection   No                Yes
Global low latency          No                Yes
EU data in EU               One region only   Yes
Simplicity                  Yes               No
Cost                        Lower             2–3x
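
The framework above can be read as a rule: stay on multi-AZ unless a specific requirement justifies multi-region. A minimal sketch of that rule follows; the `Requirements` fields and the `choose_topology` name are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    global_low_latency: bool = False
    data_residency: bool = False         # e.g. EU data must stay in the EU
    survive_region_outage: bool = False


def choose_topology(req: Requirements) -> tuple[str, list[str]]:
    """Return the suggested topology and the requirements that justify it."""
    reasons = [
        name
        for name, needed in (
            ("global low latency", req.global_low_latency),
            ("data residency", req.data_residency),
            ("region failure protection", req.survive_region_outage),
        )
        if needed
    ]
    # Multi-AZ is the default; multi-region needs at least one justification.
    return ("multi-region" if reasons else "multi-AZ", reasons)
```

Returning the reasons alongside the choice keeps the justification explicit, which is what an interviewer or a cost review will ask for.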

Senior Interview Move

"I'd start with multi-AZ for redundancy. If we have global users and need low latency, or if we have EU users and need data residency, I'd add multi-region. Multi-region adds complexity—replication, failover, consistency—so I'd only do it when the requirement justifies it."


Observability as Part of Design

The Three Pillars

  • Logs: Discrete events. "User X did Y at Z." Debugging, audit.
  • Metrics: Aggregated numbers. Request rate, error rate, latency p99. Dashboards, alerting.
  • Traces: Request flow across services. "This request hit A, then B, then C." Distributed debugging.

Design Implications

Logs:

  • Structured format (JSON). Include request ID, user ID, timestamp.
  • Log levels: debug, info, warn, error.
  • PII: Redact or hash in logs. GDPR consideration.
  • Retention: How long? Cost and compliance.
  • Centralized: Ship to CloudWatch, Datadog, etc.
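
With Python's standard `logging` module, a structured format can be sketched as a custom formatter. The field names (`request_id`, `user_hash`) and the placeholder `LOG_HASH_KEY` are assumptions; in practice the key would come from a secret store.

```python
import hashlib
import hmac
import json
import logging
from datetime import datetime, timezone

# Placeholder key for the sketch; load a real one from a secret manager.
LOG_HASH_KEY = b"replace-me"


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "request_id": getattr(record, "request_id", None),
            "user_hash": getattr(record, "user_hash", None),
        }
        return json.dumps(payload)


def hash_user_id(user_id: str) -> str:
    """Keyed hash of the user ID so raw identifiers stay out of the logs."""
    digest = hmac.new(LOG_HASH_KEY, user_id.encode(), hashlib.sha256).hexdigest()
    return digest[:16]
```

A caller would attach the formatter to a handler and log with `extra={"request_id": ..., "user_hash": hash_user_id(...)}`, so every line carries the same context fields.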

Metrics:

  • Key metrics: latency (p50, p99), throughput (RPS), error rate.
  • Business metrics: signups, conversions, revenue.
  • Export: Prometheus, StatsD, or cloud native.
  • Dashboards: Pre-define what we need to see.
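
To make the key metrics concrete, here is a small sketch that computes them for one reporting window. It uses the nearest-rank percentile definition, which is one of several in common use; the function names are illustrative.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(latencies_ms: list[float], errors: int, window_seconds: float) -> dict:
    """Compute latency percentiles, throughput, and error rate for one window."""
    total = len(latencies_ms)
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "rps": total / window_seconds,
        "error_rate": errors / total,
    }
```

Production systems usually compute percentiles from histograms in the metrics backend rather than from raw samples, but the definitions are the same.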

Traces:

  • Trace ID propagated across services.
  • Span per service/DB call. Parent-child relationship.
  • Sampling at scale (can't trace every request).
  • Tools: Jaeger, Zipkin, X-Ray, Datadog APM.
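
The propagation and sampling ideas above can be sketched without any tracing library. The `Span` class and `is_sampled` function are hypothetical and do not mirror the API of Jaeger, Zipkin, X-Ray, or Datadog.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    trace_id: str
    name: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])

    def child(self, name: str) -> "Span":
        """Create a child span that shares the trace ID and points at its parent."""
        return Span(trace_id=self.trace_id, name=name, parent_id=self.span_id)


def is_sampled(trace_id: str, rate: float) -> bool:
    """Deterministic sampling: hashing the trace ID gives every service the same answer."""
    prefix = hashlib.sha256(trace_id.encode()).hexdigest()[:8]
    bucket = int(prefix, 16) / 2**32  # uniform value in [0, 1)
    return bucket < rate
```

Because the decision depends only on the trace ID, a trace is either recorded by every service it touches or by none, which avoids partial traces.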

Senior Approach

"Observability isn't bolted on. We need: (1) Request IDs for tracing, (2) Structured logs with context, (3) Key metrics (latency, errors, throughput) from day one, (4) Dashboards for operational visibility. I'd design the logging and tracing format into the service interfaces."


Compliance Failure Stories

Example: Unplanned Data Deletion

A company had user data spread across 50+ services and tables. When a GDPR deletion request arrived, there was no single "delete user" path, and the manual process took weeks. Lesson: Design for deletion. Key every user-owned record by user ID, and purge with cascading deletes or a background job.

Example: Logs Containing PII

A company logged full request bodies for debugging. Those bodies contained passwords, tokens, and SSNs. A SOC2 audit found it. Remediation was costly: purge logs, fix logging, retrain engineers. Lesson: Never log secrets or PII. Redact by default.
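
"Redact by default" can be implemented as an allowlist: every value is masked unless its key is explicitly marked safe. This is a sketch of that approach; the keys in `SAFE_KEYS` are hypothetical.

```python
# Hypothetical allowlist: only these keys may appear in logs unmasked.
SAFE_KEYS = {"path", "method", "item_count"}
REDACTED = "[REDACTED]"


def redact(value, key_is_safe: bool = False):
    """Return a copy of a JSON-like structure with every non-allowlisted value masked."""
    if isinstance(value, dict):
        # Each nested value is judged by its own key.
        return {k: redact(v, str(k).lower() in SAFE_KEYS) for k, v in value.items()}
    if isinstance(value, list):
        # List items inherit the decision made for the key that holds the list.
        return [redact(item, key_is_safe) for item in value]
    return value if key_is_safe else REDACTED
```

An allowlist fails safe: a newly added field such as a token is masked until someone deliberately marks it loggable, whereas a denylist would leak it.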

Example: Single Region, Region Outage

A company ran multi-AZ in a single region (us-east). When AWS had a us-east outage, the service was fully down for hours. Lesson: For critical systems, multi-region may be worth the complexity. Evaluate based on business impact.


Thinking Aloud Like a Senior Engineer

Problem: "Design a system that stores user health data. HIPAA implied."

My first thought: Compliance drives the design. HIPAA = encryption, access control, audit logs, and a Business Associate Agreement (BAA) with every vendor that handles PHI.

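Data model: User health data. PII + PHI. Encrypt at rest (KMS). Encrypt in transit (TLS). Access control: only authorised roles. Audit: who accessed what, when. Retention: medical-record retention rules apply, so confirm the required period. Deletion: build a purge path, but check each request against retention obligations, since HIPAA itself grants rights to access and amend records rather than a general right to erasure.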

Infra: HIPAA-eligible services (AWS has a list). No logging PII/PHI. Logs to isolated, access-controlled store.

Multi-region? Depends. If US-only, single region + multi-AZ may suffice. If global, data residency matters. EU health data = different rules. I'd clarify: US only or global?

Observability: Metrics (latency, errors) without PII. Traces with trace ID, no PHI in span names. Logs redacted.


Summary

NFRs are design drivers:

  • Security vs UX: Segment by risk, use step-up auth
  • Compliance: PII, GDPR, SOC2—design for deletion, encryption, audit, residency
  • Multi-region vs multi-AZ: Multi-AZ default; multi-region for global latency or compliance
  • Observability: Logs, metrics, traces—design in from the start, redact PII

FAQs

Q: How do I bring up compliance in an interview?

A: "Are there compliance requirements—GDPR, SOC2, HIPAA? That would affect data residency, encryption, and retention." Or assume: "I'll assume we need to handle PII responsibly—encryption, access control, deletion path."

Q: When is multi-region worth the cost?

A: When you have: global users with latency needs, data residency requirements, or business continuity needs that justify 2–3x infra cost and operational complexity.

Q: What's the most common NFR mistake?

A: Treating observability as an afterthought. You can't debug or operate what you can't see. Design logs, metrics, and traces into the architecture from the start.
