System design interview guide
Airbnb Reservation System Design
A guest pays for the July 4 weekend while the host's calendar still shows those nights open on another channel. Channel-manager sync and atomic date-range booking prevent that double booking. This guide focuses on the reservation core, not full marketplace search.
Problem statement
Reservation subsystem: calendars, date-range booking, payments, cancellations.
Introduction
Two people open the same viral beach house for the same weekend. Both see "available." Both enter card details. One gets a confirmation email. The other gets an error—or worse, both get charged and the host has two families at the door.
That is the nightmare. The job is simple to say and hard to build: never double-book overlapping nights on one listing.
Search, pricing, and host dashboards all sit on top of that rule. If two confirmed reservations claim the same night, every pretty UI becomes debt.
Interviewers do not want CRUD on rows. They want inventory, transactions, and honest talk about stale search vs strong checkout.
If you remember one thing: One authoritative calendar + concurrency control + payment saga—search is a hint, not a lock.
How to approach
Walk through the room like you are booking one trip—not listing every microservice.
- Ask scope: Host approval flow? Multi-listing carts? Cancellation policy depth?
- Draw the state machine: draft → held → payment_pending → confirmed (plus cancelled, expired_hold).
- One race: Two POST /reservations for the same nights. Only one transaction commits.
- Hold and pay: Soft lock with TTL; authorize then capture; sweeper for expired holds.
- Search last: Discovery can lag; checkout re-validates inside the transaction.
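The state machine from the list above can be written down as a transition table. This is a minimal sketch: the state names come from the list, but the exact set of allowed edges (for example, that a confirmed reservation can only move to cancelled) is an assumption to settle with the interviewer.

```python
# Allowed transitions. The edge set is an assumption for illustration.
TRANSITIONS = {
    "draft": {"held", "cancelled"},
    "held": {"payment_pending", "expired_hold", "cancelled"},
    "payment_pending": {"confirmed", "expired_hold", "cancelled"},
    "confirmed": {"cancelled"},
    "cancelled": set(),
    "expired_hold": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

Keeping the table explicit makes it easy to show that terminal states such as expired_hold cannot be revived by a late payment callback.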
In the room: "I'll draw the reservation state machine, walk one concurrent booking race, then hold expiry and payment capture. Search indexing comes after the invariant is clear."
If you remember one thing: Start with one race on one listing's nights—everything else hangs off that.
Interview tips
Five exchanges that come up often. Each has what you might say, what they push on, and where to land.
Preventing double booking
You: "We check if the dates are free, then insert a reservation row."
They ask: "Two requests hit at the same millisecond—what stops both from passing the check?"
Land here: Put read and write in one transaction. Lock night rows (SELECT … FOR UPDATE) or rely on a unique constraint on (listing_id, night_date). First commit wins; second gets 409 conflict.
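The unique-constraint variant can be sketched with SQLite from the Python standard library. The table name booked_nights and the function book are hypothetical, and SQLite stands in for whatever relational store the real system would use; the point is only that the constraint, not application code, rejects the second writer.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE booked_nights ("
    " listing_id INTEGER NOT NULL,"
    " night_date TEXT NOT NULL,"
    " reservation_id TEXT NOT NULL,"
    " PRIMARY KEY (listing_id, night_date))"  # one owner per night
)


def book(listing_id, check_in, check_out, reservation_id):
    """Claim every night in [check_in, check_out); return an HTTP-style status."""
    stay = (check_out - check_in).days
    nights = [check_in + timedelta(days=i) for i in range(stay)]
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO booked_nights VALUES (?, ?, ?)",
                [(listing_id, n.isoformat(), reservation_id) for n in nights],
            )
        return 201
    except sqlite3.IntegrityError:
        return 409  # another reservation already owns one of these nights


first = book(1, date(2025, 7, 4), date(2025, 7, 7), "r1")   # nights of Jul 4, 5, 6
second = book(1, date(2025, 7, 6), date(2025, 7, 8), "r2")  # overlaps on Jul 6
```

Because the insert of all nights is one transaction, a conflict on any single night rolls back the whole stay, so a losing request never leaves a partial booking behind.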
Time zones and night boundaries
You: "We store dates as UTC timestamps."
They ask: "Check-in is 3 PM local—how do you know which 'night' a guest booked?"
Land here: Anchor nights in listing-local time (midnight boundaries). Minimum-stay rules and daylight saving shifts depend on this—state it explicitly.
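A small sketch of what "listing-local nights" means in code. The helper names are hypothetical, and the fixed UTC-10 offset is an assumption chosen to keep the example dependency-free; a real system would more likely use an IANA zone (for example via zoneinfo.ZoneInfo) so that daylight saving shifts are handled.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def local_night(instant_utc: datetime, listing_tz: tzinfo) -> date:
    """Calendar date of a UTC instant as seen at the listing."""
    return instant_utc.astimezone(listing_tz).date()


def nights(check_in: date, check_out: date) -> list[date]:
    """Nights in the half-open range [check_in, check_out), as local dates."""
    return [check_in + timedelta(days=i) for i in range((check_out - check_in).days)]


HONOLULU = timezone(timedelta(hours=-10))  # fixed offset; Hawaii has no DST
# 3 PM on July 4 in Honolulu is already 01:00 on July 5 in UTC.
check_in_instant = datetime(2025, 7, 5, 1, 0, tzinfo=timezone.utc)
first_night = local_night(check_in_instant, HONOLULU)
```

Reading the night from the raw UTC timestamp would give July 5 here, which is the off-by-one the interviewer is probing for.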
Idempotency on checkout
You: "If the user double-clicks Pay, we create two reservations."
They ask: "Mobile networks retry POST—how do you dedupe?"
Land here: Require Idempotency-Key on POST /v1/reservations. Same key returns the same reservation id—not a second hold.
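The dedup behavior can be sketched with an in-memory map. Everything here is hypothetical: a real service would store the key in the same database transaction as the hold, guarded by a unique constraint, and returning 200 on a replay (rather than repeating the original 201) is an assumption.

```python
import uuid

# idempotency key -> reservation id. In-memory and not concurrency-safe;
# a stand-in for a uniquely indexed column written with the hold.
_reservations_by_key: dict[str, str] = {}


def create_reservation(idempotency_key: str) -> tuple[int, str]:
    """Return (status, reservation_id); replays of a key get the original id."""
    existing = _reservations_by_key.get(idempotency_key)
    if existing is not None:
        return 200, existing  # replay: no second hold is created
    reservation_id = str(uuid.uuid4())
    _reservations_by_key[idempotency_key] = reservation_id
    return 201, reservation_id


status_a, id_a = create_reservation("key-from-client")
status_b, id_b = create_reservation("key-from-client")  # network retry
```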
Search vs authoritative calendar
You: "Search shows real-time availability from our index."
They ask: "User sees available, pays, then gets an error—was the system wrong?"
Land here: Search can be stale. Checkout re-reads authoritative night rows inside the booking transaction. 409 with clear copy ("someone else booked—refresh dates") is healthy contention—not always a bug.
Side effects after confirm
You: "We send confirmation email before returning 201."
They ask: "SMTP is slow—does the user wait?"
Land here: Confirm in the database first. Emit email, calendar sync, and search updates via outbox workers after commit—never block the booking response on SMTP.
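A minimal outbox sketch, again using SQLite as a stand-in. The table layout, topic names, and the send callback are assumptions for illustration. The status change and the outbox rows commit in one transaction, and a separate worker delivers them afterwards.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE reservations (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " topic TEXT NOT NULL, payload TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0)"
)
db.execute("INSERT INTO reservations VALUES ('r1', 'payment_pending')")
db.commit()


def confirm(reservation_id: str) -> None:
    """Confirm and enqueue side effects atomically; nothing is sent here."""
    with db:  # both writes commit together or not at all
        db.execute(
            "UPDATE reservations SET status = 'confirmed' WHERE id = ?",
            (reservation_id,),
        )
        for topic in ("email", "calendar_sync", "search_index"):
            db.execute(
                "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                (topic, json.dumps({"reservation_id": reservation_id})),
            )


def drain_outbox(send) -> int:
    """Worker body: deliver unsent events in order, then mark them sent."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        send(topic, json.loads(payload))  # may be slow; the guest is not waiting
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the worker crashes between send and the sent = 1 update, the event is delivered again on the next run, so consumers in this sketch must tolerate at-least-once delivery.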
If you remember one thing: After each push, name one mechanism—transaction boundary, idempotency key, outbox—not "we'll scale it."
Capacity estimation
Rough numbers keep you from promising impossible checkout on a viral listing.
| Concern | Angle |
|---|---|
| Hot listing | Contention on one listing's nights—queue checkout or optimistic UX ("someone else booked") |
| Read vs write | Search is read-heavy; checkout is write-light but must be exact |
| Index | Availability index for range queries, updated on confirm/cancel—lag OK for browsing, not for paying |
So we cannot: scan the whole database for "locked rows" at search time. Serialize conflicts at the listing calendar shard. We also cannot treat search results as a reservation—only the transactional calendar is truth at pay time.
If you remember one thing: A viral listing is one hot shard—short transactions and honest 409 UX beat wedging the database.
High-level architecture
What breaks if you treat booking like CRUD
"Check availability, then insert" outside one transaction is how double bookings happen. Caching "free nights" in Redis without tying it to the commit path is the same bug in a fancier costume.
What works: calendar owns truth, saga orchestrates pay
The listing / calendar service owns authoritative night rows per listing, sharded by listing_id. The booking orchestrator runs a saga: create hold → authorize payment → capture → confirm; on failure, compensate (release hold, void auth). Search / discovery reads a replica or index—eventually consistent. Notification workers consume outbox events.
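The compensate-on-failure shape of the saga can be sketched in a few lines. The run_saga helper and the step names in the usage are hypothetical; a production orchestrator would also persist progress so that it can resume after a crash, which this sketch does not attempt.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Undo in reverse order: void the auth before releasing the hold.
            for undo in reversed(completed):
                undo()
            return "failed"
        completed.append(compensate)
    return "confirmed"


log = []


def capture_fails():
    raise RuntimeError("capture declined")


outcome = run_saga([
    (lambda: log.append("create_hold"), lambda: log.append("release_hold")),
    (lambda: log.append("authorize"), lambda: log.append("void_auth")),
    (capture_fails, lambda: log.append("refund")),
])
```

Only steps that completed are compensated, so the failed capture in the usage triggers void_auth and release_hold but never a refund.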
Who owns what:
- Calendar service — Invariant enforcement; ACID transactions on night rows.
- Payments adapter — PCI boundary; idempotent charges.
- Booking API — Orchestration, timeouts, human-readable errors.
- Search indexer — Denormalized availability for filters—not source of truth.
[ Guest ] --> [ Booking API ]
                    |
          +---------+---------+
          |                   |
  [ Calendar svc ]       [ Payments ]
 (ACID per listing)   (authorize/capture)
          |
          | confirm: mark nights RESERVED
          v
  [ Reservation DB ] ----outbox----> [ Email / calendar / search indexer ]
In the room: Narrate the synchronous confirm path (few round trips) vs async side effects—never block confirm on email.
If you remember one thing: Calendar service is source of truth for nights; search is a downstream hint.
Core design approaches
Night inventory table
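Each row is keyed by (listing_id, night_date) and carries a status: available | held | booked. A uniqueness constraint on that key guarantees one row per night, so two transactions cannot both claim it. Holds carry a held_until expiry timestamp.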
Good for interviews: easy to explain, easy to lock in a transaction.
Interval / exclusion
Postgres exclusion constraints on tstzrange are powerful for overlapping ranges. Many interviews stay at simpler night granularity—say which you pick.
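Whichever storage you pick, the rule being enforced is the half-open overlap test. The function below is a hypothetical application-side sketch of that predicate, useful for explaining what a range exclusion constraint rejects; it is not a substitute for enforcing the rule in the database.

```python
from datetime import date


def overlaps(a_in: date, a_out: date, b_in: date, b_out: date) -> bool:
    """True if half-open stays [a_in, a_out) and [b_in, b_out) share a night."""
    return a_in < b_out and b_in < a_out


# Back-to-back stays: one guest checks out on Jul 6, the next checks in on Jul 6.
back_to_back = overlaps(date(2025, 7, 4), date(2025, 7, 6),
                        date(2025, 7, 6), date(2025, 7, 8))
```

With half-open ranges, a checkout day and the next guest's check-in day can coincide without conflict, which is why the range convention needs to be stated up front.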
Viral listing funnel
Optimistic: Many short-TTL holds; first capture wins; others get 409.
Queue: Serialize checkout attempts—fairer but slower UX.
If you remember one thing: Pick optimistic vs queue for hot listings and defend the fairness tradeoff.
Detailed design
Walk one reservation like production—not a diagram-only answer.
Write path (reserve)
- BEGIN; lock listing calendar rows for the date range (or SELECT … FOR UPDATE on nights).
- Verify all nights available; insert hold rows with held_until = now + 15m.
- COMMIT; return reservation_id and client secret for payment.
- Authorize payment asynchronously; on success BEGIN capture transaction: mark booked, clear hold, emit outbox event.
Same idempotency key on retry returns the existing reservation—no duplicate hold.
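The hold step of the write path can be sketched as follows. SQLite has no SELECT … FOR UPDATE, so BEGIN IMMEDIATE (which takes the write lock up front) is used as a stand-in. The table layout, the place_hold name, and the convention that a missing row means "available" are all assumptions for this sketch.

```python
import sqlite3
from datetime import date, datetime, timedelta, timezone

HOLD_TTL = timedelta(minutes=15)
# isolation_level=None: we issue BEGIN/COMMIT ourselves.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute(
    "CREATE TABLE nights (listing_id INTEGER, night_date TEXT, status TEXT,"
    " reservation_id TEXT, held_until REAL,"
    " PRIMARY KEY (listing_id, night_date))"
)


def place_hold(listing_id, check_in, check_out, reservation_id, now):
    """Hold every night in [check_in, check_out), or none of them."""
    stay = (check_out - check_in).days
    if stay <= 0:
        raise ValueError("check_out must be after check_in")
    wanted = [(check_in + timedelta(days=i)).isoformat() for i in range(stay)]
    db.execute("BEGIN IMMEDIATE")  # read and write under one lock
    try:
        marks = ",".join("?" * len(wanted))
        taken = db.execute(
            "SELECT COUNT(*) FROM nights WHERE listing_id = ?"
            f" AND night_date IN ({marks})"
            " AND (status = 'booked' OR held_until > ?)",  # live holds block too
            [listing_id, *wanted, now.timestamp()],
        ).fetchone()[0]
        if taken:
            db.execute("ROLLBACK")
            return False  # caller maps this to 409
        db.executemany(  # REPLACE overwrites rows left by expired holds
            "INSERT OR REPLACE INTO nights VALUES (?, ?, 'held', ?, ?)",
            [(listing_id, n, reservation_id, (now + HOLD_TTL).timestamp())
             for n in wanted],
        )
        db.execute("COMMIT")
        return True
    except Exception:
        db.execute("ROLLBACK")
        raise


t0 = datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc)
first = place_hold(1, date(2025, 7, 4), date(2025, 7, 6), "r1", t0)
second = place_hold(1, date(2025, 7, 5), date(2025, 7, 7), "r2", t0)
```

In this sketch an expired hold stops blocking as soon as held_until passes, even before any sweeper runs, because the availability check compares against the current time.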
Read path (availability)
GET availability reads nights or a short-TTL cache for speed. Checkout re-runs the same check inside the booking transaction—never trust the GET alone at pay time.
In the room: Draw once: GET may be stale → POST re-validates inside txn.
If you remember one thing: Availability for browsing and availability for booking are two different reads.
Key challenges
For each, say what the guest or host sees if you get it wrong.
- Phantom reads: Availability is checked outside the txn, then booked inside it. This is a check-then-act race: two requests can both pass the check. Fix: the transaction boundary includes the read.
- Payment vs hold mismatch: Auth expires before capture. Reconciliation voids or re-auths; the state machine must cover this edge.
- Modification: Changing dates means releasing the old nights and acquiring the new ones, which risks a half-applied change. Use a single txn if both ranges are on the same listing, otherwise a saga.
- Search index lag: User sees available, then fails checkout. UX copy must explain the race, not blame the user.
If you remember one thing: Every edge case needs a state transition—not an ops runbook you hand-wave away.
Scaling the system
- Shard calendar by listing_id. A hot listing is one shard hotspot; acceptable if transactions stay short.
- Mitigate with checkout queue or optimistic "refresh and retry" UX.
- Read replicas for browsing; primary for booking transactions.
- Cache availability per listing with short TTL; invalidate on write.
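The short-TTL cache with invalidate-on-write from the last bullet might look like the sketch below. The class name, the load callback, and the injectable clock are assumptions; the cache serves browsing only and is never consulted at checkout.

```python
import time


class AvailabilityCache:
    """Per-listing availability with a short TTL; browsing only, never checkout."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = {}   # listing_id -> (expires_at, value)

    def get(self, listing_id, load):
        entry = self._entries.get(listing_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]
        value = load(listing_id)  # fall through to a replica or the primary
        self._entries[listing_id] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, listing_id):
        """Call after a hold, confirm, or cancel commits."""
        self._entries.pop(listing_id, None)
```

The TTL bounds how stale a browsing result can be if an invalidation is missed, which is the reason to keep both mechanisms instead of relying on invalidation alone.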
If you remember one thing: One hot listing hurts more than average traffic—measure conflict rate and hold duration per listing.
Failure handling
| Failure | What user sees | Mitigation |
|---|---|---|
| Payment down | Cannot complete checkout | Hold expires; user retries; no charge |
| DB failover during txn | Stuck "processing" | Idempotent retry; reconcile orphan holds |
| Worker crash after auth | Bank shows pending; app unclear | Reconciliation completes capture or voids |
| Double submit | Two charges or two holds | Idempotency key maps to same reservation |
Degraded UX: Occasional 409 on a hot listing—expected contention.
Outage: An actual double booking. That is unacceptable.
If you remember one thing: Reconciliation jobs exist because workers crash—plan for orphan holds and stale auths.
API design
GET /v1/listings/{id}/availability?from=YYYY-MM-DD&to=YYYY-MM-DD
POST /v1/reservations
POST /v1/reservations/{id}/confirm
POST /v1/reservations/{id}/cancel
POST /v1/reservations
| Field | Role |
|---|---|
| listing_id | Target |
| check_in, check_out | Half-open range [check_in, check_out); state this clearly |
| guest_count | Policy |
| idempotency_key | Dedup retries |
GET /v1/listings/{id}/availability
| Param | Role |
|---|---|
| from, to | Range in listing-local dates |
| expand | Optional pricing components |
In simple terms: browse with GET (may be slightly stale), book with POST inside a transaction that locks nights.
Diagram:
GET availability (may be slightly stale index)
        |
        v
POST /reservations --> [ txn: lock nights + hold ] --> 201 + payment_client_secret
        |
        v
POST /payments/authorize (PCI boundary)
        |
        v
POST /reservations/{id}/confirm --> [ txn: capture + booked ]
Errors: 409 conflict; 410 hold expired; 402 payment failed.
In the room: Walk GET → POST hold → authorize → confirm. Say 409 is often healthy race, not always a bug.
If you remember one thing: idempotency_key on POST makes mobile retries safe.
Production angles
Marketplace reservations sit on the knife-edge between search (eventually consistent) and checkout (serializable truth). The expensive incidents are ghost holds, double books, and money in limbo—not 409 rates during a viral listing, which can be healthy if the product story is clear.
Spike in 409 on a viral property — race vs bug
What users saw — A trending stay blows up on social. Many guests get "dates no longer available" at checkout. Product asks if the site is broken. Engineering asks if inventory math is wrong.
Why — Organic contention produces 409 by design when only one transaction wins. Bad 409s come from split brains between cache, search index, and authoritative rows—different shape on dashboards (retry storms vs flat failure with weird state transitions).
What good teams do — Optimistic UI that refreshes availability after conflict. Optional fair queue or waitlist for drops. Metrics for conflict_after_index_said_available vs simple race. Seniors separate expected business contention from correctness bugs.
Ghost holds block the calendar — nobody paid, nobody can book
What users saw — Calendar blocked for hours with no completed booking. Hosts complain. Finance sees authorized cards with no matching reservation row, or orphan holds past held_until.
Why — Worker crashed after auth but before commit or release. Saga lost the compensating action. Client closed mid-flow. Payment idempotency does not automatically release inventory rows.
What good teams do — Sweeper jobs for held_until < now() with metrics on hold age distribution. Alert when p95 hold duration exceeds the product promise. Reconciliation between payment intent, reservation state, and availability locks—hourly if not continuous.
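A sweeper pass over holds can be sketched like this. The Hold dataclass and both helpers are hypothetical, and a real job would run the equivalent filter as a database query; the sketch only shows the expiry rule (held_until < now) and a simple hold-age figure that could feed an alert.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Hold:
    reservation_id: str
    created_at: datetime
    held_until: datetime


def sweep(holds, now):
    """Return (live, expired); the caller releases the nights of expired holds."""
    live = [h for h in holds if h.held_until >= now]
    expired = [h for h in holds if h.held_until < now]
    return live, expired


def oldest_hold_age(holds, now):
    """Age of the oldest hold, a simple stand-in for a hold-age metric."""
    return max((now - h.created_at for h in holds), default=timedelta(0))


now = datetime(2025, 7, 1, 12, 0, tzinfo=timezone.utc)
holds = [
    Hold("r1", now - timedelta(minutes=40), now - timedelta(minutes=25)),  # orphan
    Hold("r2", now - timedelta(minutes=5), now + timedelta(minutes=10)),
]
live, expired = sweep(holds, now)
```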
Payment authorized but capture delayed — two SLOs, one user journey
What users saw — Bank shows pending charge. App shows "processing." User books elsewhere thinking this one failed. Support refunds pile up.
Why — Auth succeeds at issuer. Capture delayed by partition, retries, or fraud holds. Inventory lock TTL does not match issuer behavior.
What good teams do — Reconciliation ledger linking payment_intent, reservation_id, and capture events. Void stale auths with documented SLA to finance. Separate SLOs for inventory correctness vs payment settlement. Customer-visible timeout after which holds release deterministically.
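A reconciliation pass could be sketched as a pure function over the two record sets. The field names, the dict shapes, and above all the rule for when to capture versus void are assumptions for illustration; the real policy depends on the payment provider and the finance SLA.

```python
# Reservation states for which an outstanding auth should still be captured.
# This rule is an assumption, not something the guide specifies.
CAPTURABLE = {"payment_pending", "confirmed"}


def reconcile(payment_intents, reservations):
    """Map each authorized-but-uncaptured intent id to 'capture' or 'void'."""
    status_by_id = {r["id"]: r["status"] for r in reservations}
    actions = {}
    for intent in payment_intents:
        if intent["state"] != "authorized":
            continue  # captured or voided intents need no action
        status = status_by_id.get(intent["reservation_id"])
        actions[intent["id"]] = "capture" if status in CAPTURABLE else "void"
    return actions


intents = [
    {"id": "pi_1", "reservation_id": "r1", "state": "authorized"},
    {"id": "pi_2", "reservation_id": "r2", "state": "authorized"},
    {"id": "pi_3", "reservation_id": "r3", "state": "captured"},
    {"id": "pi_4", "reservation_id": "missing", "state": "authorized"},
]
reservations = [
    {"id": "r1", "status": "payment_pending"},
    {"id": "r2", "status": "expired_hold"},
    {"id": "r3", "status": "confirmed"},
]
actions = reconcile(intents, reservations)
```

An auth with no matching reservation row (pi_4 in the usage) is voided, which covers the "authorized cards with no matching reservation" symptom described above.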
How to use this in an interview — Lead with search index staleness vs checkout truth: always re-check live inventory inside the booking transaction. Name one sweeper or reconciliation story for ghost holds.
Bottlenecks and tradeoffs
Search freshness vs booking truth
The tension — Fast discovery needs denormalized indexes; correct booking needs live night rows.
What breaks — Users trust search, fail checkout, blame the product.
What teams do — Re-validate in txn; metric stale conflict rate; honest UX copy.
Say in the interview — "Search is a hint; checkout is the lock."
Hold duration vs conversion
The tension — Long holds reduce double-book risk but anger hosts waiting for payment. Short holds increase 409s.
What breaks — Ghost holds or angry hosts—or both if sweeper lags.
What teams do — Tune TTL with experiments; align with payment auth window.
Say in the interview — Name your hold minutes and why.
Global vs local inventory
The tension — Sharding by listing is natural; cross-listing packages complicate atomicity.
What breaks — Partial confirms across listings.
What teams do — Scope honestly; saga across listings if product requires it.
Say in the interview — Start with one listing per transaction unless they expand scope.
If you remember one thing: Search can lag; checkout cannot lie. Holds and payment are two clocks—reconcile them.
What should stick
You do not need to memorize every box. After this guide, you should be able to:
- One invariant — No overlapping confirmed nights on one listing—everything else is downstream.
- Transaction boundary — Read availability and write hold/book in one ACID transaction per listing shard.
- Hold + saga — Soft lock with TTL; authorize then capture; sweeper and reconciliation for orphans.
- Search vs truth — Index may be stale; checkout re-reads authoritative rows; 409 can be healthy race.
- Outbox side effects — Email, calendar, search indexer after confirm—never block on SMTP.
Tell it in the room: "I shard the calendar by listing_id. Two concurrent POSTs race on night rows inside one transaction—first wins, second gets 409. Hold with TTL, authorize payment, capture on confirm, outbox for async work. Search is eventually consistent; checkout always re-validates live inventory."
What interviewers expect
Calendar availability bitmap; transactional book; hold+capture; sync conflicts.
Interview workflow (template)
- Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
- Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
- APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
- Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
- Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
- Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).
Frequently asked follow-ups
- Double booking?
- iCal sync?
- Pricing?
- Cancel refund?
- Instant book?
Deep-dive questions and strong answer outlines
Book dates?
Transaction locks listing_id+date range; status held→confirmed; reject overlap.
External calendar?
Async import iCal; conflict detection flags host; eventual sync.
Pricing?
Nightly rates + fees computed at quote; lock quote id at hold.
Cancel?
Policy engine computes refund; release dates; idempotent refund API.
Instant book?
Skip host approve; still inventory lock required.
FAQs
Q: Full Airbnb?
A: This scope is reservation subsystem only.
Q: Multi-room listings?
A: Inventory units count >1 per night.
Q: Taxes?
A: Quote service adds tax lines—brief.
Q: Disputes?
A: Workflow out of band.