System design interview guide

Design Airbnb Reservation System

TL;DR: Guests search listings and book **date ranges** on **one calendar per listing**. The job is to never **double-book** overlapping nights while still feeling fast enough to complete checkout—holds expire, payments fail, and two people will absolutely click the same viral listing at once. Strong answers sound like **inventory** and **transactions**, not CRUD on rows.

Problem statement

You’re designing the reservation subsystem for short-term rentals: search availability, create reservation with hold, payment, confirm/cancel/modify under policy, optional host approval.

Constraints. Functional: no double-booking for overlapping nights; predictable state machine; auditable transitions. Non-functional: strong correctness on inventory; latency acceptable for human checkout flows. Scale: many listings; spiky demand in popular cities.

Story: authoritative calendar + concurrency control + payment saga—search is secondary to truth at booking time.

Introduction

Marketplace booking is a distributed systems problem wearing a CRUD costume. The invariant is sharp: no two confirmed reservations may claim the same listing for overlapping nights. Search, pricing, and host dashboards are downstream of that truth—if you get the invariant wrong, every pretty UI is debt.

Interviewers dock “we check availability then insert” without a transaction. They reward night-level inventory (or equivalent), hold TTLs, a payment saga, and an honest split between stale search and strongly consistent checkout.

How to approach

Draw the state machine: draft → held → payment_pending → confirmed (plus cancelled, expired_hold). Walk one race: two POSTs for the same nights—only one commits. Then hold expiry and authorize/capture. Search indexing last or simplified.
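A minimal sketch of that state machine in Python, assuming an in-memory audit log. The exact set of allowed edges (for example, whether payment_pending may expire) is an illustrative choice, not a prescribed one; the useful property is that any transition not listed is rejected and every accepted one is recorded.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    HELD = "held"
    PAYMENT_PENDING = "payment_pending"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"
    EXPIRED_HOLD = "expired_hold"


# Allowed edges (illustrative assumption). Anything not listed is rejected.
TRANSITIONS: dict[State, frozenset[State]] = {
    State.DRAFT: frozenset({State.HELD, State.CANCELLED}),
    State.HELD: frozenset({State.PAYMENT_PENDING, State.EXPIRED_HOLD, State.CANCELLED}),
    State.PAYMENT_PENDING: frozenset({State.CONFIRMED, State.EXPIRED_HOLD, State.CANCELLED}),
    State.CONFIRMED: frozenset({State.CANCELLED}),
    State.CANCELLED: frozenset(),     # terminal
    State.EXPIRED_HOLD: frozenset(),  # terminal
}


class IllegalTransition(Exception):
    pass


def transition(current: State, target: State, audit_log: list[tuple[str, str]]) -> State:
    """Move to `target` if the edge exists, recording it for auditability."""
    if target not in TRANSITIONS[current]:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    audit_log.append((current.value, target.value))
    return target
```

A late payment callback that tries expired_hold → confirmed raises IllegalTransition, which is the cue to route it to reconciliation instead of silently confirming.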

Interview tips

  • Time zones: anchor nights in listing-local time—not UTC hand-waving—minimum stay and DST boundaries bite.
  • Idempotency: double-click and mobile retries need Idempotency-Key on reserve—same key returns the same reservation id.
  • Search vs truth: discovery can show stale “available”; checkout must re-read authoritative rows; 409 if the user lost the race.
  • Host approval: extra pending_host state—inventory may still need a soft hold; state the machine clearly.
  • Outbox: after confirm, emit events (email, calendar, analytics) reliably—not only inline.
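The idempotency tip can be sketched as a key-to-reservation map. This is an in-memory illustration with hypothetical names (ReservationStore, KeyReuseError); a production service would persist the key alongside the hold in one transaction. Rejecting a reused key that arrives with a different body is a common convention, shown here as an assumption.

```python
import threading
import uuid


class KeyReuseError(Exception):
    """Same Idempotency-Key sent with a different request body."""


class ReservationStore:
    """In-memory sketch of Idempotency-Key handling for POST /v1/reservations.

    Assumption: a real service stores the key in the same database transaction
    that creates the hold, so a crash cannot leave a hold without its key.
    """

    def __init__(self) -> None:
        self._by_key: dict[str, tuple[dict, str]] = {}
        self._lock = threading.Lock()  # check-and-insert must be atomic

    def reserve(self, idempotency_key: str, request: dict) -> str:
        with self._lock:
            seen = self._by_key.get(idempotency_key)
            if seen is not None:
                original_request, reservation_id = seen
                if original_request != request:
                    raise KeyReuseError(idempotency_key)
                return reservation_id  # retry or double-click: same id, no new hold
            reservation_id = str(uuid.uuid4())
            # The hold itself would be created here (omitted in this sketch).
            self._by_key[idempotency_key] = (dict(request), reservation_id)
            return reservation_id
```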

Capacity estimation

| Concern | Angle |
|---|---|
| Hot listing | Contention on one listing’s nights; queue or optimistic UX (“someone else booked, refresh”) |
| Read vs write | Search is read-heavy; checkout is write-light but must be exact |
| Index | Availability index for range queries, updated on confirm/cancel; lag is OK for browsing, not for paying |

Implications: serialize conflicts at the listing calendar shard; do not rely on scanning globally “locked” rows for search.

High-level architecture

The listing / calendar service owns authoritative night rows (or an interval tree) per listing, sharded by listing_id. The booking orchestrator implements a saga: create hold → payments authorize → capture → confirm reservation; on failure compensate (release hold, void auth). Search / discovery reads a replica or index—eventually consistent. Notification workers consume outbox events.

Who owns what:

  • Calendar service — invariant enforcement; transactions on night rows.
  • Payments adapter — PCI boundary; idempotent charges.
  • Booking API — orchestration, timeouts, human-readable errors.
  • Search indexer — denormalized availability for filters—not source of truth.
[ Guest ] --> [ Booking API ]
                  |
        +---------+---------+
        |                   |
   [ Calendar svc ]    [ Payments ]
   (ACID per listing)   (authorize/capture)
        |
        | confirm: mark nights RESERVED
        v
   [ Reservation DB ] ----outbox----> [ Email / calendar / search indexer ]

In the room: Narrate the synchronous path for confirm (few RTTs) versus async side effects—never block confirm on SMTP.
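The orchestrator’s saga (hold → authorize → capture → confirm, compensating with void and release) can be sketched with in-memory fakes. FakeCalendar, FakePayments and book are hypothetical stand-ins for the services in the diagram. A real orchestrator would persist each step so a crashed worker can resume, and would refund rather than void if confirm failed after capture; this sketch models neither.

```python
from dataclasses import dataclass, field
from typing import Optional


class PaymentDeclined(Exception):
    pass


@dataclass
class FakeCalendar:
    """Stand-in for the calendar service; tracks reservation ids only."""
    held: set = field(default_factory=set)
    booked: set = field(default_factory=set)

    def create_hold(self, rid: str) -> None:
        self.held.add(rid)

    def release_hold(self, rid: str) -> None:
        self.held.discard(rid)

    def confirm(self, rid: str) -> None:
        self.held.discard(rid)
        self.booked.add(rid)


@dataclass
class FakePayments:
    """Stand-in for the payments adapter; flags simulate declines."""
    decline_authorize: bool = False
    decline_capture: bool = False
    voided: list = field(default_factory=list)

    def authorize(self, rid: str) -> str:
        if self.decline_authorize:
            raise PaymentDeclined(rid)
        return f"auth-{rid}"

    def capture(self, auth_id: str) -> None:
        if self.decline_capture:
            raise PaymentDeclined(auth_id)

    def void(self, auth_id: str) -> None:
        self.voided.append(auth_id)


def book(rid: str, calendar: FakeCalendar, payments: FakePayments) -> str:
    """Run forward steps in order; on failure, compensate completed steps in reverse."""
    calendar.create_hold(rid)
    auth_id: Optional[str] = None
    try:
        auth_id = payments.authorize(rid)
        payments.capture(auth_id)
        calendar.confirm(rid)
        return "confirmed"
    except PaymentDeclined:
        if auth_id is not None:
            payments.void(auth_id)   # undo authorize
        calendar.release_hold(rid)   # undo hold
        return "cancelled"
```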

Core design approaches

Night inventory table

Each row (listing_id, night_date) is available | held | booked. A uniqueness constraint prevents double booking. Holds use held_until.
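A minimal SQLite sketch of the uniqueness idea. One assumption differs from the text: only claimed nights get a row, so an absent row means available, and the primary key on (listing_id, night_date) makes a second claim on the same night fail. The claim helper is hypothetical; wrapping all inserts in one transaction means a partially overlapping request claims nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE night_inventory (
        listing_id     INTEGER NOT NULL,
        night_date     TEXT    NOT NULL,  -- listing-local date, e.g. '2025-07-01'
        status         TEXT    NOT NULL CHECK (status IN ('held', 'booked')),
        reservation_id TEXT    NOT NULL,
        held_until     TEXT,              -- set for holds, NULL once booked
        PRIMARY KEY (listing_id, night_date)  -- at most one claim per night
    )
    """
)


def claim(listing_id: int, nights: list[str], reservation_id: str, held_until: str) -> bool:
    """Insert hold rows for every night, or for none of them."""
    try:
        with conn:  # commits on success, rolls back if any insert fails
            conn.executemany(
                "INSERT INTO night_inventory VALUES (?, ?, 'held', ?, ?)",
                [(listing_id, night, reservation_id, held_until) for night in nights],
            )
        return True
    except sqlite3.IntegrityError:
        return False  # at least one night is already held or booked
```

Because absence means available, expired holds must be deleted (by a sweeper or by the next reserve transaction) before their nights can be claimed again.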

Interval / exclusion

Postgres exclusion constraints on tstzrange are powerful; many interviews stay at simpler night granularity.
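In Postgres this is typically written as an exclusion constraint along the lines of EXCLUDE USING gist (listing_id WITH =, stay WITH &&), where stay is a daterange or tstzrange column; the btree_gist extension is needed for the = on the scalar column. The overlap test that && performs on half-open ranges is small enough to show directly. This Python sketch illustrates the predicate only and is no substitute for enforcing it in the database.

```python
from datetime import date
from typing import NamedTuple


class Stay(NamedTuple):
    check_in: date   # first night, inclusive
    check_out: date  # departure day, exclusive


def overlaps(a: Stay, b: Stay) -> bool:
    """Half-open interval overlap, the same question && answers for ranges."""
    return a.check_in < b.check_out and b.check_in < a.check_out


def conflicts(existing: list[Stay], candidate: Stay) -> list[Stay]:
    """Stays that would collide with `candidate` (a linear scan, for illustration)."""
    return [stay for stay in existing if overlaps(stay, candidate)]
```

Back-to-back stays do not conflict because the check-out day is excluded from the range.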

Viral listing funnel

Optimistic: many short-TTL holds; first capture wins; others get 409. Queue: serialize checkout attempts—fairness tradeoff.
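The queue option boils down to running attempts for one listing one at a time. A minimal in-process sketch, assuming a single process sees every attempt for a listing (a production system would use a shard-local queue or a database row lock instead); ListingSerializer is a hypothetical name. threading.Lock is not FIFO, so this gives serialization, not the fairness a real queue would add.

```python
import threading


class ListingSerializer:
    """Run checkout attempts for one listing one at a time (in-process sketch).

    Attempts on different listings do not block each other.
    """

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the per-listing map itself
        self._per_listing: dict[int, tuple[threading.Lock, set[str]]] = {}

    def _state_for(self, listing_id: int) -> tuple[threading.Lock, set[str]]:
        with self._guard:
            if listing_id not in self._per_listing:
                self._per_listing[listing_id] = (threading.Lock(), set())
            return self._per_listing[listing_id]

    def try_book(self, listing_id: int, nights: list[str]) -> int:
        lock, taken = self._state_for(listing_id)
        with lock:  # check and claim happen inside one critical section
            if taken.intersection(nights):
                return 409
            taken.update(nights)
            return 201
```

With twenty concurrent attempts on the same nights, exactly one gets 201 and the rest get 409, while attempts on other listings never wait on this listing’s lock.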

Detailed design

Write path (reserve)

  1. BEGIN; lock listing calendar rows for the date range (or SELECT … FOR UPDATE on nights).
  2. Verify all nights available; insert hold rows with held_until = now + 15m.
  3. COMMIT; return reservation_id and client secret for payment.
  4. Authorize payment asynchronously; on success BEGIN capture transaction: mark booked, clear hold, emit event.

Read path (availability)

GET availability reads nights or cache for speed; checkout re-runs the same check inside the booking transaction.

Key challenges

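  • Check-then-act race (lost update): if availability is checked outside the transaction and booked inside it, two requests can both see the nights as free and both write. Fix: the transaction boundary includes the read, backed by row locks or a uniqueness constraint.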
  • Payment vs hold mismatch: auth expires before capture—reconciliation voids or re-auth—the state machine must cover the edge.
  • Modification: changing dates means release old + acquire new—two-phase risk—saga or single txn if same listing.
  • Search index lag: user sees available then fails checkout—UX copy must explain the race.

Scaling the system

  • Shard calendar by listing_id—hot listing is one shard hotspot; acceptable if transactions stay short; mitigate with queue or optimistic UX.
  • Read replicas for browsing; primary for booking transactions.
  • Cache availability per listing with short TTL; invalidate on write.
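A per-listing availability cache with a short TTL and explicit invalidation might look like this sketch. AvailabilityCache, the loader callback, and the 5-second default TTL are hypothetical; the loader is assumed to return a listing’s unavailable nights. The clock is injectable so expiry can be tested.

```python
import time
from typing import Callable


class AvailabilityCache:
    """Per-listing cache of unavailable nights with a short TTL (browsing only)."""

    def __init__(self, loader: Callable[[int], frozenset], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # reads a replica or the availability index
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[int, tuple[float, frozenset]] = {}

    def get(self, listing_id: int) -> frozenset:
        now = self._clock()
        entry = self._entries.get(listing_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh enough for search results
        value = self._loader(listing_id)
        self._entries[listing_id] = (now, value)
        return value

    def invalidate(self, listing_id: int) -> None:
        """Call after a confirm or cancel commits for this listing."""
        self._entries.pop(listing_id, None)
```

Invalidation narrows the staleness window but cannot close it (a read racing a write can repopulate old data), which is why checkout never reads this cache.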

Failure handling

| Failure | Mitigation |
|---|---|
| Payment down | Hold expires; user retries; no charge |
| DB failover during txn | Idempotent retry; reconcile orphan holds |
| Worker crash after auth, before capture | Reconciliation job completes or voids |
| Double submit | Idempotency key maps to the same reservation |

Degraded UX: an occasional 409 on a hot listing is acceptable. A double booking is a correctness outage, and that is unacceptable.

API design

GET  /v1/listings/{id}/availability?from=YYYY-MM-DD&to=YYYY-MM-DD
POST /v1/reservations
POST /v1/reservations/{id}/confirm
POST /v1/reservations/{id}/cancel

POST /v1/reservations

| Field | Role |
|---|---|
| listing_id | Target listing |
| check_in, check_out | Half-open range [check_in, check_out); state this explicitly |
| guest_count | Policy |
| idempotency_key | Dedup retries |
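Expanding the half-open range into nights is where off-by-one bugs hide. A minimal sketch, assuming check_in and check_out arrive as listing-local calendar dates; min_nights is a hypothetical policy parameter. Working on dates rather than timestamps means a DST change inside the stay cannot add or drop a night.

```python
from datetime import date, timedelta


def nights_for(check_in: date, check_out: date, min_nights: int = 1) -> list[date]:
    """Nights covered by the half-open stay [check_in, check_out)."""
    count = (check_out - check_in).days
    if count <= 0:
        raise ValueError("check_out must be after check_in")
    if count < min_nights:
        raise ValueError(f"minimum stay is {min_nights} nights")
    # The check_out date itself is not a night: the guest leaves that morning.
    return [check_in + timedelta(days=i) for i in range(count)]
```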

GET /v1/listings/{id}/availability

| Param | Role |
|---|---|
| from, to | Range in listing-local dates |
| expand | Optional pricing components |

Diagram:

GET availability (may be slightly stale index)
       |
       v
POST /reservations --> [ txn: lock nights + hold ] --> 201 + payment_client_secret
       |
       v
POST /payments/authorize (PCI boundary)
       |
       v
POST /reservations/{id}/confirm --> [ txn: capture + booked ]

Errors: 409 conflict; 410 hold expired; 402 payment failed.
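One way to keep those three errors distinct in code is an explicit mapping from domain failures to HTTP status. The BookingError names and the next_step hints are hypothetical; http.HTTPStatus is Python’s standard-library enum of status codes.

```python
from enum import Enum
from http import HTTPStatus


class BookingError(Enum):
    """Domain failures from the reserve/confirm flow and their HTTP status."""
    NIGHTS_TAKEN = HTTPStatus.CONFLICT            # 409: lost the race
    HOLD_EXPIRED = HTTPStatus.GONE                # 410: hold TTL passed before confirm
    PAYMENT_FAILED = HTTPStatus.PAYMENT_REQUIRED  # 402: authorize or capture declined


# Hypothetical hints telling the client what to do next.
NEXT_STEP = {
    BookingError.NIGHTS_TAKEN: "refresh_availability",
    BookingError.HOLD_EXPIRED: "start_new_reservation",
    BookingError.PAYMENT_FAILED: "retry_payment_method",
}


def error_response(err: BookingError) -> tuple[int, dict[str, str]]:
    return int(err.value), {"error": err.name.lower(), "next_step": NEXT_STEP[err]}
```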

Production angles

Marketplace reservations sit on the knife-edge between search (eventually consistent indices) and checkout (serializable truth about nights). The expensive incidents are ghost holds, double books, and money in limbo—not 409 rates during a viral listing, which can be healthy competition if the product story is clear.

Spike in 409 on a viral property — race vs bug

What it looks like — Support and social channels light up on a trending stay; metrics show 409 rate up on POST /reservations. Product asks if the site is broken; engineering asks if inventory math is wrong.

Why it happens — Organic contention—many users racing for finite nights—produces 409 by design when only one transaction wins. Bad 409s come from split brains between cache, search index, and authoritative rows—different shape on dashboards (retry storms vs flat failure rate with weird state transitions).

What good teams do — Optimistic UI that refreshes availability after conflict; optional fair queue or waitlist for drops—product policy; metrics for conflict_after_index_said_available vs simple race. Seniors separate expected business contention from correctness bugs.

Ghost holds block the calendar — nobody paid, nobody can book

What it looks like — Calendar blocked for hours with no completed booking; hosts complain; finance sees authorized cards with no matching reservation row, or orphan holds past held_until.

Why it happens — Worker crashed after auth but before commit or release; saga lost the compensating action; client closed mid-flow. Payment idempotency does not automatically release inventory rows.

What good teams do — Sweeper jobs for held_until < now() with metrics on hold age distribution; alert when p95 hold duration exceeds the product promise; reconciliation between payment intent, reservation state, and availability locks—hourly if not continuous.
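A sweeper pass plus the hold-age signal can be sketched over in-memory holds. Hold, sweep and the nearest-rank p95 helper are hypothetical. A real sweeper would delete rows where held_until < now() and, before releasing a hold, check its payment intent, since a hold with a successful authorization belongs to reconciliation rather than silent release.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Hold:
    reservation_id: str
    created_at: datetime
    held_until: datetime


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (0.0 for an empty list)."""
    if not values:
        return 0.0
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def sweep(holds: list[Hold], now: datetime, promise: timedelta) -> tuple[list[Hold], list[Hold], bool]:
    """Return (kept, released, alert).

    released: holds whose held_until < now, i.e. the rows a sweeper would free.
    alert: p95 hold age exceeds the product promise, a sign that the sweeper
    or a saga step is falling behind.
    """
    ages = [(now - h.created_at).total_seconds() for h in holds]
    alert = p95(ages) > promise.total_seconds()
    kept = [h for h in holds if h.held_until >= now]
    released = [h for h in holds if h.held_until < now]
    return kept, released, alert
```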

Payment authorized but capture delayed — two SLOs, one user journey

What it looks like — Bank shows pending charge; app shows “processing”; user books elsewhere thinking this one failed. Support refunds pile up.

Why it happens — Auth succeeds at issuer; capture delayed by partition, retries, or fraud holds; inventory lock TTL does not match issuer behavior.

What good teams do — Reconciliation ledger linking payment_intent, reservation_id, and capture events; void stale auths with documented SLA to finance; separate SLOs for inventory correctness vs payment settlement, and a customer-visible timeout after which holds release deterministically.
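The reconciliation ledger reduces to a decision per (payment_intent, reservation_id) pair. This sketch encodes one illustrative policy; the LedgerRow fields, the Action names and, in particular, the choice to refund rather than confirm a captured payment whose hold has expired are assumptions a real team would settle with finance.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    CAPTURE_AND_CONFIRM = "capture_and_confirm"
    CONFIRM = "confirm"
    VOID_AUTH = "void_auth"
    REFUND = "refund"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class LedgerRow:
    payment_intent: str
    reservation_id: str
    authorized: bool
    captured: bool
    reservation_state: str  # "held" (hold still live), "confirmed", "expired_hold", "cancelled"


def reconcile(row: LedgerRow) -> Action:
    """Decide how to repair one payment/reservation pair (illustrative policy)."""
    if not row.authorized:
        return Action.NONE                 # no money moved
    live = row.reservation_state == "held"
    done = row.reservation_state == "confirmed"
    if row.captured:
        if done:
            return Action.NONE             # already consistent
        return Action.CONFIRM if live else Action.REFUND
    if live:
        return Action.CAPTURE_AND_CONFIRM  # worker died between auth and capture
    if done:
        return Action.ESCALATE             # confirmed without capture: should not happen
    return Action.VOID_AUTH                # hold gone: release the customer's money
```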

How to use this in an interview — Lead with search index staleness vs checkout truth: always re-check live inventory inside the booking transaction—search results are hints, not locks. Name one sweeper or reconciliation story for ghost holds.

Bottlenecks and tradeoffs

  • Search freshness vs booking truth: index can lag; checkout must hit authoritative inventory—honest UX and metrics on stale conflict rate.
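  • Hold duration vs conversion: long holds give slow payers time to finish checkout but block nights (and annoy hosts) when guests abandon; short holds free inventory sooner but cause more 410 hold-expired errors and more 409s as guests re-race. Tune with experiments.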
  • Global vs local inventory: sharding by listing is natural; cross-listing packages and multi-night atomicity complicate transactions—scope honestly.

What interviewers expect

  • Entities: listing, night inventory or blocked intervals, reservation state machine, pricing line items.
  • Concurrency: a serializable transaction, a unique constraint on (listing_id, night), or a per-listing mutex; alternatively, optimistic concurrency with a version column.
  • Hold: soft lock TTL before payment; release on expiry job.
  • Payment: authorize then capture; saga with void on failure; idempotency on POST /reservations.
  • Search vs checkout: search index may be stale; checkout re-validates authoritative calendar.
  • Cancellation: policy-driven refunds; inventory re-release; outbox for events.
  • Failure: partial authorize; worker crash mid-saga—reconciliation jobs.

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • How do you prevent two users booking the same dates?
  • What is the schema for availability?
  • How does payment interact with the hold?
  • How do you query “available between these dates” at scale?
  • What happens when payment authorization expires?

Deep-dive questions and strong answer outlines

Two checkout requests hit the same listing for overlapping nights—what happens?

Serializable transaction, row lock on listing calendar shard, or optimistic concurrency with version. First commit wins; second gets conflict and must refresh availability. Idempotency keys for retries of the same client attempt.
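The optimistic-concurrency variant can be sketched with a version counter. In SQL the compare-and-set is usually a single UPDATE … SET version = version + 1 WHERE listing_id = ? AND version = ?, followed by a check that exactly one row changed; here an in-memory lock stands in for that atomic statement. ListingCalendar, try_book and the retry count of 3 are hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class CalendarSnapshot:
    version: int
    booked: frozenset


class VersionConflict(Exception):
    pass


class ListingCalendar:
    """One listing's calendar with a version counter (in-memory sketch)."""

    def __init__(self) -> None:
        self._snap = CalendarSnapshot(0, frozenset())
        self._lock = threading.Lock()  # makes compare-and-set atomic, like one UPDATE

    def read(self) -> CalendarSnapshot:
        return self._snap

    def compare_and_set(self, expected_version: int, booked: frozenset) -> None:
        with self._lock:
            if self._snap.version != expected_version:
                raise VersionConflict(expected_version)
            self._snap = CalendarSnapshot(expected_version + 1, booked)


def try_book(cal: ListingCalendar, nights: frozenset, attempts: int = 3) -> int:
    for _ in range(attempts):
        snap = cal.read()
        if snap.booked & nights:
            return 409  # the nights really are taken
        try:
            cal.compare_and_set(snap.version, snap.booked | nights)
            return 201  # first commit wins
        except VersionConflict:
            continue  # someone else wrote; re-read and re-check
    return 409  # still contended: ask the client to refresh
```

Retrying after a version conflict matters: the competing write may have touched different nights, so re-reading avoids a spurious 409.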

How do you represent availability?

Reserved intervals per listing, or night-level inventory rows. Avoid naive “boolean per day” without handling partial overlaps if product needs it. Strong answers mention timezone anchor (listing local midnight).

Walk through hold → pay → confirm.

Create hold row with TTL; authorize card; on success capture and confirm reservation; release hold on timeout path. Compensating void if capture fails after hold—saga orchestration.

Production angles

  • Payment authorized but worker crashes before capture—reconciliation job completes or voids.
  • Viral listing: queue checkout attempts or optimistic UX with someone-else-booked messaging—better than wedging the DB.

FAQs

Q: Do I need distributed transactions (2PC)?

A: Often not across payments and the database, if you orchestrate a saga in which every step has a compensating action. Within one inventory shard, a single ACID transaction may suffice. Know when you would reach for the outbox pattern for reliable events.

Q: Is Elasticsearch OK for availability search?

A: Maybe for discovery, but the source of truth for “can book” stays in the transactional store. The search index lags, so checkout must re-validate against the authoritative calendar.

Q: How do cancellations free inventory?

A: State transition to cancelled with policy-based refund; release nights for resale; event to search indexer—ordering matters to avoid double-sell during transition.

Q: Do hosts and guests see the same availability instantly?

A: Eventually for search indexes; authoritative calendar on checkout must be strong. Say read-your-writes for the booking session vs stale search—honesty beats pretending one global view.
