
System design interview guide

Design Vaccine Eligibility & Booking

TL;DR: A public program must verify **eligibility** under **changing regional rules**, expose **appointment inventory** honestly, and survive **thundering herds** when new cohorts qualify—without double-booking slots or leaking sensitive records. The interview blends **workflow**, **inventory**, **compliance**, and **ops** more than a typical CRUD app.

Problem statement

You’re designing eligibility verification and appointment booking for a national-scale immunization program: rules vary by region and cohort; users need search, book, cancel/reschedule, reminders; admins update supply and rules.

Constraints. Functional: identity to required level; rule-based eligibility; slot inventory; audit of decisions; admin tooling. Non-functional: correctness of counts; fairness under load; availability during spikes. Scale: tens of millions of users; millions of bookings; regional variance.

Core: versioned rules + atomic inventory + immutable audit + spike handling.

Introduction

This prompt mixes marketplace inventory with policy engines and public trust. Weak answers treat slots as integers you decrement in the app layer without transactions. Strong answers add versioned rules, append-only audit, race-safe booking, and a spike story that is not “we autoscale to infinity.”

Governments and users remember wrong eligibility and double-booked vials more than your microservice diagram.

How to approach

Separate eligibility (read-heavy, cacheable with care) from booking (write-heavy, must be exact). Sketch the spike path: waiting room → signed token → transactional book. Always log rule_version on every eligibility decision.

Interview tips

  • Client-trusted eligibility is a non-starter—evaluate on the server with the same rule version as the audit record.
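  • Last-slot races: guard the decrement (UPDATE … WHERE remaining > 0) or take SELECT … FOR UPDATE on the slot row. A unique constraint on (slot_id, user_id) stops the same user from booking twice, but on its own it does not stop two different users from taking the last slot. Mention both patterns.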
  • PII: minimize fields; encrypt at rest; redact logs; aggregate reporting without row-level export to unauthorized roles.
  • Thundering herd: a waiting room is real backpressure—token issue rate should match booking capacity, not HTTP thread count.
  • Bad rule deploy: freeze new bookings, rollback config, re-evaluate open sessions—playbook language scores points.

Capacity estimation

| Dimension | Note |
| --- | --- |
| Concurrent users on cohort open | Orders of magnitude above steady state; admission control required |
| Slots per site | Hot sites contend on the same rows |
| Audit volume | Often retained longer than OLTP; separate store or partition |
| Reminder sends | Bursts before appointments; queue workers |

Implications: protect the booking API with tokens; shard inventory by region or site; never fan out email synchronously on the booking critical path.

High-level architecture

Identity (or government IdP) establishes who is calling. The rules service loads versioned policy from a config store, evaluates eligibility from attributes (age, region, job codes), and returns decision_id, rule_version, and reason codes for support (no raw PHI in logs). Scheduling exposes slot search (read replicas or cache). Booking runs short transactions on slot inventory. An audit pipeline appends immutable records to WORM or append-only storage. Notification workers consume outbox events.

[ User ] --> [ CDN / static ] --> [ Waiting room edge ]
                    |
                    v signed admission token (rate limited)
[ Eligibility API ] --> Rules engine (versioned) --> decision + audit
        |
        v
[ Slot search ] --> read-optimized (may be slightly stale)
        |
        v
[ Book slot ] --> [ TX: decrement remaining OR lock row ] --> confirm
        |
        +--> outbox --> reminders / calendar / analytics (async)

In the room, say clearly: return HTTP 200/201 only after the DB commit; email goes out later; the source of truth for inventory is the committed row, not “SMTP accepted.”

Core design approaches

Rules as data

Version every deploy; shadow-evaluate before flipping traffic; rollback = moving a pointer to a prior config version.
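To make "rollback = moving a pointer" concrete, here is a minimal in-memory sketch. `RuleStore`, its method names, and the `min_age` config are hypothetical illustrations, not part of any real config store; a production version would sit on a durable, access-controlled store.

```python
from dataclasses import dataclass, field


@dataclass
class RuleStore:
    """Hypothetical config store: write-once versions plus an 'active' pointer."""
    versions: dict = field(default_factory=dict)  # version id -> rule config
    history: list = field(default_factory=list)   # activation order, newest last

    def publish(self, version: str, config: dict) -> None:
        # Versions are write-once; a changed rule always gets a new id.
        if version in self.versions:
            raise ValueError(f"{version} already published")
        self.versions[version] = dict(config)

    def activate(self, version: str) -> None:
        if version not in self.versions:
            raise KeyError(version)
        self.history.append(version)

    def rollback(self) -> str:
        # Rollback only moves the pointer back; no config is edited or deleted.
        if len(self.history) < 2:
            raise RuntimeError("no prior version to roll back to")
        self.history.pop()
        return self.history[-1]

    @property
    def active(self) -> str:
        return self.history[-1]

    def shadow_diff(self, candidate: str, evaluate, sample: list) -> list:
        # Run the candidate next to the live version and return the inputs
        # on which the two versions disagree.
        live, cand = self.versions[self.active], self.versions[candidate]
        return [s for s in sample if evaluate(live, s) != evaluate(cand, s)]
```

A typical sequence would be `publish`, then `shadow_diff` against a sample of recent inputs, then `activate`; if the new version misbehaves, `rollback` restores the previous pointer without touching stored configs.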

Inventory

Either keep a remaining count per slot, guarded by CHECK (remaining >= 0), or keep one row per seat with a unique booking. The second pattern is stronger against lost updates under concurrency because each seat can be claimed only once.
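The two inventory patterns can be sketched side by side. This uses SQLite from the Python standard library purely as a stand-in for the production database; the table and column names (`slot_counter`, `seat_booking`, `seat_no`) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Pattern 1: counter per slot. The CHECK is a backstop; the guarded
    -- UPDATE below is the main protection.
    CREATE TABLE slot_counter (
        slot_id   TEXT PRIMARY KEY,
        remaining INTEGER NOT NULL CHECK (remaining >= 0)
    );
    -- Pattern 2: one row per booked seat. The UNIQUE constraint means a
    -- given seat in a given slot can be booked at most once.
    CREATE TABLE seat_booking (
        slot_id TEXT NOT NULL,
        seat_no INTEGER NOT NULL,
        user_id TEXT NOT NULL,
        UNIQUE (slot_id, seat_no)
    );
""")


def take_from_counter(slot_id: str) -> bool:
    # Check and decrement happen in one statement, so there is no
    # read-modify-write window in application code.
    with conn:
        cur = conn.execute(
            "UPDATE slot_counter SET remaining = remaining - 1 "
            "WHERE slot_id = ? AND remaining > 0",
            (slot_id,),
        )
    return cur.rowcount == 1


def take_seat(slot_id: str, seat_no: int, user_id: str) -> bool:
    # The database rejects a second booking of the same seat.
    try:
        with conn:
            conn.execute(
                "INSERT INTO seat_booking (slot_id, seat_no, user_id) VALUES (?, ?, ?)",
                (slot_id, seat_no, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In both functions a `False` return corresponds to the "slot gone" outcome; the caller decides how to present it.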

Spike handling

The edge issues tokens at a fixed RPS so the origin sees smooth load; queue discipline (FIFO vs lottery) is a product choice.
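One common way to issue tokens at a fixed rate is a token bucket. The sketch below is an assumption about how the edge might pace admissions, not a description of any particular waiting-room product; `AdmissionBucket` and its parameters are hypothetical, and the clock is injectable so the behaviour can be tested deterministically.

```python
import time


class AdmissionBucket:
    """Token bucket whose refill rate is set from booking capacity (illustrative)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()

    def try_admit(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller stays in the waiting room
```

The rate would be derived from what the booking transaction path can sustain, which matches the point above that token issue rate should track booking capacity rather than web-tier capacity.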

Detailed design

Eligibility

  1. User submits attributes (some verified externally).
  2. Server loads rules_vN, evaluates, persists an eligibility_decision row with rule_version.
  3. Cache decision reference in session with TTL; invalidate on rule bump if needed.
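The evaluation step above might look like the following sketch. The rule format (per-region minimum age plus qualifying job codes), the version id `v7`, the reason-code names, and the in-memory `DECISIONS` list are all hypothetical stand-ins for a real rules DSL and a real decision table.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule set keyed by version, then region.
RULES = {
    "v7": {
        "north": {"min_age": 65, "job_codes": {"HCW"}},
        "south": {"min_age": 50, "job_codes": {"HCW", "TEACHER"}},
    },
}

DECISIONS = []  # stand-in for the persisted eligibility_decision table


@dataclass(frozen=True)
class EligibilityDecision:
    decision_id: str
    rule_version: str
    eligible: bool
    reason_codes: tuple


def evaluate(rule_version: str, region: str, age: int,
             job_code: Optional[str]) -> EligibilityDecision:
    rule = RULES[rule_version].get(region)
    reasons = []
    if rule is None:
        reasons.append("REGION_NOT_OPEN")
    else:
        if age >= rule["min_age"]:
            reasons.append("AGE_QUALIFIES")
        if job_code in rule["job_codes"]:
            reasons.append("JOB_QUALIFIES")
        if not reasons:
            reasons.append("NO_MATCHING_COHORT")
    eligible = any(r.endswith("_QUALIFIES") for r in reasons)
    decision = EligibilityDecision(str(uuid.uuid4()), rule_version, eligible, tuple(reasons))
    DECISIONS.append(decision)  # every evaluation is recorded with its rule_version
    return decision
```

Note that the decision carries reason codes rather than the raw attributes, so support can explain a denial without the log containing the inputs themselves.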

Booking

  1. User selects slot id from search (may be stale).
  2. POST /appointments with Idempotency-Key.
  3. BEGIN: verify eligibility still valid for rule_version; lock slot row; ensure remaining > 0; decrement; insert appointment; COMMIT.
  4. Return confirmation; notify asynchronously.
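The booking steps above can be sketched as a single transaction. SQLite stands in for the production database, and the schema, the exception names, and the "rule version must match" eligibility check are simplifying assumptions; a real system would look up the stored decision and might allow older versions when the outcome is unchanged.

```python
import sqlite3

# isolation_level=None lets the code issue BEGIN / COMMIT / ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE slot (
        slot_id   TEXT PRIMARY KEY,
        remaining INTEGER NOT NULL CHECK (remaining >= 0)
    );
    CREATE TABLE appointment (
        appointment_id  INTEGER PRIMARY KEY,
        slot_id         TEXT NOT NULL,
        user_id         TEXT NOT NULL,
        rule_version    TEXT NOT NULL,
        idempotency_key TEXT NOT NULL UNIQUE,
        UNIQUE (slot_id, user_id)
    );
""")


class SlotGone(Exception):
    """Would map to 409."""


class Ineligible(Exception):
    """Would map to 403."""


def book(slot_id, user_id, idempotency_key, decision_rule_version, active_rule_version):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        row = conn.execute(
            "SELECT appointment_id FROM appointment WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        if row:  # retry of a request that already committed
            conn.execute("COMMIT")
            return row[0]
        if decision_rule_version != active_rule_version:
            raise Ineligible("decision predates the active rules; re-evaluate")
        cur = conn.execute(
            "UPDATE slot SET remaining = remaining - 1 "
            "WHERE slot_id = ? AND remaining > 0",
            (slot_id,),
        )
        if cur.rowcount != 1:
            raise SlotGone(slot_id)
        cur = conn.execute(
            "INSERT INTO appointment (slot_id, user_id, rule_version, idempotency_key) "
            "VALUES (?, ?, ?, ?)",
            (slot_id, user_id, active_rule_version, idempotency_key),
        )
        conn.execute("COMMIT")
        return cur.lastrowid
    except Exception:
        conn.execute("ROLLBACK")  # also undoes the decrement if the insert failed
        raise
```

Because the decrement and the insert share one transaction, a failed insert (for example the same user booking the same slot under a new key) rolls the count back instead of leaking a seat.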

Key challenges

  • Rule vs inventory consistency: user passed eligibility yesterday; rules changed today—re-check at book time.
  • Double submit: idempotency keys and unique constraints.
  • Bot traffic: CAPTCHA, per-IP limits, device signals, staggered cohort opens.
  • Reporting: aggregate counts by region without PII exports—separate pipeline with RBAC.
  • Provider integration: calendar sync is often async reconcile—the appointment row in your DB is source of truth.

Scaling the system

  • Regional deployments for data residency; route global read traffic to nearest edge.
  • Read replicas for search; primary for booking transactions.
  • Partition the slots table by region_id or site_id.
  • Autoscale workers on queue depth, not only CPU.

Failure handling

| Scenario | Response |
| --- | --- |
| Rules service down | Often fail closed for new eligibility, or serve last known read-only decisions with a banner (product call) |
| Slot DB partition | Fail closed for booking in affected region |
| Notification failure | Appointment still valid; retry reminders; DLQ alert |
| Wrong rule published | Freeze; rollback; re-evaluate edge cases manually or in batch |

API design

| Endpoint | Role |
| --- | --- |
| POST /v1/eligibility:evaluate | Returns decision_ref, rule_version |
| GET /v1/slots | Search by geo and time window |
| POST /v1/appointments | Book; Idempotency-Key required |
| DELETE /v1/appointments/{id} | Cancel per policy |

GET /v1/slots

| Param | Role |
| --- | --- |
| lat, lng, radius_km | Geo filter |
| from, to | Time window |
| rule_version | Optional client hint; server still validates |

Booking flow diagram:

POST /eligibility:evaluate --> store decision + rule_version
GET /slots (cached / replica)
POST /appointments + Idempotency-Key
       --> TX: eligibility still OK + slot lock + decrement
       --> 201 Created

Errors: 409 slot gone; 403 ineligible under current rules; 429 rate limit.

Production angles

High-stakes booking flows combine traffic spikes, policy engines, and inventory that must never go negative—while auditors expect a paper trail for every “no.” Production pain is rarely “the button failed”; it is waiting rooms that never issue tokens, double books under concurrency, and support staring at an opaque ineligible flag during a crisis.

National news spike: site “up,” nobody can complete a booking

What it looks like — Status page is green; edge serves HTML; users queue forever or bounce at token exchange. Origin RPS looks healthy because few requests make it past admission control. Social feeds fill with “system is broken” while your p95 API latency is fine for authenticated paths that never fire.

Why it happens: Waiting-room or token bucket misconfiguration: tokens/sec too low, clock skew between edge and origin, HMAC verification failing under key rotation, or a feature flag that accidentally requires a header only new clients send. Load tests often skip the full browser path.

What good teams do — Dashboards on token issue rate, validation failure reasons (signature vs expiry vs scope), and end-to-end funnel from queue exit to appointment row. Runbooks with feature-flag bypass for verified cohorts when policy allows—ethics and legal pre-approved. Seniors treat admission control as a first-class service with SLOs, not a CDN add-on.
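To show what "validation failure reasons (signature vs expiry vs scope)" could look like in code, here is a sketch of an HMAC-signed admission token whose verifier returns a distinct reason per failure. The token layout, the claim names (`kid`, `sub`, `scope`, `exp`), and the 30-second skew allowance are assumptions for illustration, not a standard format.

```python
import base64
import hashlib
import hmac
import json


def issue_token(key: bytes, key_id: str, user_id: str, scope: str, expires_at: float) -> str:
    payload = json.dumps(
        {"kid": key_id, "sub": user_id, "scope": scope, "exp": expires_at},
        sort_keys=True,
    ).encode()
    sig = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str, keys: dict, scope: str, now: float, skew_s: float = 30.0) -> str:
    """Return 'ok' or a failure reason suitable for a dashboard label."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
        claims = json.loads(payload)
    except ValueError:  # bad split, bad base64, or bad JSON
        return "malformed"
    if not isinstance(claims, dict):
        return "malformed"
    key = keys.get(claims.get("kid"))
    if key is None:
        return "unknown_key"  # typical symptom of a key rotation gap
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return "bad_signature"
    if now > claims["exp"] + skew_s:  # allowance for edge/origin clock skew
        return "expired"
    if claims["scope"] != scope:
        return "wrong_scope"
    return "ok"
```

Keeping both the old and new key in `keys` during rotation, and counting each returned reason separately, is what makes the "HMAC failing under key rotation" case visible rather than a generic 401.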

Inventory goes negative or double-booked under load

What it looks like: Support finds two confirmations for one slot; DB constraints fire in logs; or worse, there is no constraint and remaining counts go negative. Postmortems mention Black Friday patterns even for public-health scale.

Why it happens: Read–modify–write in application code without serializable boundaries; optimistic locking with retry storms; cache of “slots left” that lies. Idempotency keys prevent duplicate charges but not duplicate locks if scope is wrong.

What good teams do: A single transaction (or one authoritative row per slot) that decrements only if count > 0; a unique constraint on (slot_id, user_id) where the business allows; a postmortem on every negative row, never normalized as an “edge case.” Reconciliation jobs that compare orphan holds against confirmed appointments.

“Why was I denied?” — Support and regulators need an answer

What it looks like — User is angry; journalist calls; ombudsman request. Your audit log says eligible: false with no branch detail. Engineers grep rule engine versions manually.

Why it happens — Product shipped fast with boolean eligibility; rules composed in code without persisted decision records; PII concerns led to over-redaction.

What good teams do: Structured reason codes and rule_version on every evaluation; an explain object per branch (hashed inputs plus code references, never raw PHI in logs); immutable append-only audit with tamper evidence where required. Seniors separate what we show the user from what we retain for compliance.
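One simple form of tamper evidence is a hash chain, where each audit record includes the hash of the previous one. The sketch below is an illustration under assumptions: `AuditLog`, the record fields, and the use of a keyed HMAC for input hashing are hypothetical choices, and a real deployment would write to WORM or append-only storage rather than a Python list.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64


def hash_inputs(attributes: dict, key: bytes) -> str:
    # Keyed digest of the inputs, so low-entropy attributes such as age and
    # region are not trivially guessable from the stored value.
    canonical = json.dumps(attributes, sort_keys=True).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    def __init__(self):
        self.records = []

    def append(self, decision_id, rule_version, eligible, reason_codes, inputs_hash):
        prev = self.records[-1]["record_hash"] if self.records else GENESIS
        body = {
            "decision_id": decision_id,
            "rule_version": rule_version,
            "eligible": eligible,
            "reason_codes": list(reason_codes),
            "inputs_hash": inputs_hash,
            "prev_hash": prev,
        }
        record = dict(body, record_hash=_digest(body))
        self.records.append(record)
        return record

    def verify(self) -> bool:
        # Recompute every hash and check each record links to its predecessor.
        prev = GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "record_hash"}
            if rec["prev_hash"] != prev or rec["record_hash"] != _digest(body):
                return False
            prev = rec["record_hash"]
        return True
```

Editing any earlier record changes its hash and breaks every link after it, which is what lets an auditor detect after-the-fact changes without seeing the raw inputs.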

Spike diagram: where the funnel actually breaks

What it looks like: The queue-wait SLO is red while origin CPU sits idle. That pattern means the bottleneck is at admission, not at the origin.

[ News spike ] --> edge tokens/sec capped --> origin RPS flat
                        |
              queue wait time SLO monitored

What to measure: Token issue vs validate success; booking transaction latency and conflict rate (409); audit write lag; rule evaluation errors by version. Correlate queue depth with geography, because CDNs can mask regional token failures.

How to use this in an interview — Lead with concurrency on slots and explainability of policy—not just REST. Close with audit: what you would show an ombudsman after a bad deploy—rule_version, hashed decision inputs, not a clipboard of raw PHI.

Bottlenecks and tradeoffs

  • Fairness vs velocity: lottery for slots vs FIFO—political product choice.
  • Stale search vs fresh booking: honest UX copy reduces support tickets.


What interviewers expect

  • Entities: identity (to required level), eligibility decision, site, slot, appointment, audit record.
  • Rules: versioned config; server-side evaluation only; explain decision for support (no raw PHI in logs).
  • Booking: atomic slot decrement or row-level lock; idempotency on submit; optional short hold.
  • Spikes: waiting room, token admission, rate limits, CDN for static—protect origin.
  • Compliance: immutable audit trail; admin override with reason; rollback story for bad rule deploy.
  • Notifications: reminders; opt-out; at-least-once delivery with dedupe.
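For the notifications point, "at-least-once delivery with dedupe" can be sketched as a worker that records an event id only after a successful send. `ReminderWorker`, the event fields, and the in-memory `delivered` set are hypothetical; in production the dedupe record would live in a durable store.

```python
class ReminderWorker:
    def __init__(self, send):
        self.send = send        # hypothetical transport callable; may raise
        self.delivered = set()  # stand-in for a durable dedupe table

    def handle(self, event: dict) -> str:
        key = event["event_id"]
        if key in self.delivered:
            return "duplicate"  # redelivered event, already sent
        # If send raises, nothing is recorded and the queue redelivers later.
        self.send(event["user_id"], event["message"])
        self.delivered.add(key)
        return "sent"
```

A crash between the send and the record can still produce one duplicate reminder, which is the accepted cost of at-least-once delivery; the appointment itself is unaffected either way.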

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • How do you prevent double-booking the last slot?
  • How do changing eligibility rules get deployed safely?
  • How do you handle a traffic spike when a new cohort opens?
  • What gets audited vs what lives only in operational DB?
  • How do you integrate with provider calendars?

Deep-dive questions and strong answer outlines

Walk through booking the last appointment at a site.

Use a transactional decrement of the remaining count, or lock the slot row, and add a unique constraint on (site, time, user) to reject repeat attempts by the same user. Require an idempotency key on submit; return 409 on conflict with user-friendly retry guidance.

How are eligibility rules stored and evaluated?

Versioned JSON or DSL evaluated server-side; log rule_version_id on each decision for audit. Never trust client-only checks for eligibility.

What if we publish wrong rules?

Rollback to prior version; re-evaluate affected users; communicate transparently. Immutable audit of who saw which version—compliance angle.

Production angles

  • Provider cancels day-of—**cascade** reschedule workflows; **notify** with backoff.
  • Regional rule bug—**freeze** bookings; **replay** eligibility with fixed evaluator version.


FAQs

Q: Is a waiting room “fake”?

A: No. It is admission control, typically backed by a real queue (for example SQS or Kafka) for fairness and origin protection. Explain how the token is handed off to the booking service.

Q: Do I need blockchain for audit?

A: Almost certainly no. Append-only logs + WORM storage + access controls usually suffice. Don’t reach for novelty.

Q: How is this different from Ticketmaster?

A: Eligibility and compliance dominate; inventory is per site with medical constraints; fairness narrative differs from entertainment ticketing.

Q: Can users hold multiple appointment slots at once?

A: Product decision—often no or short holds to prevent hoarding. Implement with one active hold per user or atomic swap when booking confirms—state the fairness rule you enforce.
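If the rule is "one active hold per user", the database can enforce it directly. This sketch uses a partial unique index in SQLite as a stand-in for the production database; the `hold` table, its status values, and the function names are assumptions, and hold expiry is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hold (
        hold_id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        slot_id TEXT NOT NULL,
        status  TEXT NOT NULL CHECK (status IN ('active', 'released', 'confirmed'))
    );
    -- At most one row per user may be in the 'active' state.
    CREATE UNIQUE INDEX one_active_hold_per_user
        ON hold (user_id) WHERE status = 'active';
""")


def place_hold(user_id: str, slot_id: str) -> bool:
    try:
        with conn:
            conn.execute(
                "INSERT INTO hold (user_id, slot_id, status) VALUES (?, ?, 'active')",
                (user_id, slot_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # user already has an active hold


def release_hold(user_id: str) -> None:
    with conn:
        conn.execute(
            "UPDATE hold SET status = 'released' "
            "WHERE user_id = ? AND status = 'active'",
            (user_id,),
        )
```

Released and confirmed rows stay in the table as history, since the index only constrains rows whose status is 'active'.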
