System design interview guide

Auth System Design

TL;DR:

Credential stuffing hits 50K logins/min with reused passwords while legitimate users on mobile need refresh tokens that survive app restarts without keeping JWTs forever. Sessions, password hashing, and MFA hooks are the interview—not OAuth provider logos.

Problem statement

You're designing an authentication system for a product with email/password sign-up: register, login, logout, password reset, optional MFA, and the ability to revoke sessions when a device is lost or a password changes. Every other service trusts this layer to turn a cookie or bearer token into a user_id on each request—without re-checking the password every time.

Constraints in plain language. Functionally: create accounts, verify credentials, issue and validate sessions (opaque ids in Redis or short JWT plus refresh for mobile), secure reset links, optional TOTP after password, and sign-out everywhere. Non-functionally: under 200 ms login p95, resist brute force and credential stuffing, never store plaintext passwords, and keep auth highly available (99.99%). Scale: 10M users, ~1K logins/s peak, global sessions—so session validation on every API call dominates login volume.

What interviewers reward: how you hash and verify passwords (Argon2id/bcrypt), JWT vs server sessions with a real revoke story, rate limits and generic errors against enumeration, one-time reset tokens hashed at rest, and what happens when the session store fails—not OAuth logos or a box labeled "auth service."

Introduction

Monday morning. A user opens your app on a new phone. They expect to stay signed in after a restart—but not forever if their old laptop was stolen.

At the same time, a botnet fires fifty thousand login attempts per minute with passwords from last week's breach. Every response is HTTP 200. Your availability chart stays green while Argon2 queues grow and real users time out.

Auth interviews are threat-led. Interviewers listen for whether you know what you are protecting before you name microservices.

Strong candidates name credential stuffing, session hijack, and account enumeration early. Weak candidates say "we'll use JWT" without saying what the server stores or how you kill a stolen session in five minutes.

You are guarding three kinds of secrets:

  • Password hashes — prove identity once, slowly.
  • Session ids — prove identity on every API call, quickly.
  • Reset tokens — prove identity for one password change, then disappear.

Each has different rotation, TTL, and revocation rules. Mix them up and you get the classic failures: plaintext passwords, tokens in localStorage, reset links that never expire.

If you remember one thing: Three bearer secrets—password, session id, reset token—each needs its own lifecycle rules.

How to approach

Talk like you are walking through one login and one authenticated API call, not reciting OWASP chapter titles.

  1. Ask scope — Web cookies or mobile bearer tokens? MFA in scope? Password reset required? OAuth only or email/password too?
  2. What you store — Salt plus slow hash; never plaintext. Sessions in Redis (opaque) or signed JWT claims—pick one and defend revoke.
  3. Write path (login) — Rate limits → lookup user → verify hash → optional MFA → issue session → audit (no secrets in logs).
  4. Read path (every request) — Session lookup or JWT verify → attach user_id → check session version if user changed password.
  5. Reset and logout — One-time reset tokens; delete session row or bump global version for "sign out everywhere."

In the room: "I'll list what we store, walk login and session check as two separate paths, then reset and abuse controls."

If you remember one thing: Login pays the hash cost once; session check must stay fast on every request.

Interview tips

Six exchanges that separate security theater from a design you could ship. Each block: a trap answer, one pushback, and where to land.

Jumping straight to JWT

You: "We'll use JWT for everything—it's stateless and scales."

They ask: "User reports stolen laptop—how do you kill their session in five minutes?"

Land here: Opaque sessions in Redis: delete the row, done. JWT needs short access TTL, rotating refresh tokens, or a blocklist on compromise. Pick JWT when edge verify without Redis wins; pick opaque when instant revoke matters.

Password storage

You: "We encrypt passwords in the database."

They ask: "Encryption with what key—and can you still verify login?"

Land here: Store salt plus slow hash (Argon2id/bcrypt)—one-way, tuned for ~tens of ms per login. Never plaintext. Never "encrypt" passwords you must verify on every login.

Login error messages

You: "We tell them 'email not found' vs 'wrong password' so UX is clear."

They ask: "What can an attacker learn by probing emails?"

Land here: Return generic "Invalid credentials" on login. Log detail server-side for support. Password reset uses the same shape—do not confirm whether an email is registered.

Password reset links

You: "We email a link with the user id and a timestamp."

They ask: "Someone forwards that email—who owns the account?"

Land here: Random unguessable token, hashed at rest, one-time use, short TTL, HTTPS only. After password change, bump session version or delete all sessions so the forwarded link cannot keep old devices alive.

Session cookies in the browser

You: "We store the token in localStorage so JS can read it."

They ask: "What happens when XSS runs on your page?"

Land here: HttpOnly Secure SameSite cookies for browser sessions—JavaScript cannot read the token. XSS still hurts, but token theft from storage is harder. Mobile apps often use Authorization: Bearer with secure keychain storage instead.

MFA as an afterthought

You: "We'll add MFA later—login returns a token immediately."

They ask: "Attacker has the password from a breach—what stops account takeover?"

Land here: After password verifies, require TOTP or WebAuthn when enabled or risk score is high. Issue the session only after the second factor. Store TOTP secrets encrypted; recovery codes hashed like passwords.

If you remember one thing: After each push, name one artifact—hash params, session_id row, reset token hash, rate limit key—not "we'll add security later."

Capacity estimation

Rough numbers from the problem statement: 10M users, ~1K logins/s peak, under 200 ms login p95. Those numbers sound modest until you remember session checks run on every API call—orders of magnitude more than logins.

Topic | Rough scale | What it means for design
Login QPS | ~1K/s peak; spikes during attacks | Rate limits protect hash CPU, not only UX
Session validation | Every API call (millions/s at scale) | Must be O(1) lookup; shard Redis by session_id
Password hash | ~tens of ms per login | Argon2/bcrypt budget; never per request
Session storage | One row per active device | TTL + eviction; version stamp beats N deletes for global logout

So we cannot: re-hash the password on every API call. We cannot keep all sessions on one Redis node without sharding. We cannot ignore login spikes—they return HTTP 200 and look healthy in availability metrics while hash queues melt.
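A quick back-of-envelope check makes the "hash queues melt" claim concrete. The sketch below uses the login and stuffing rates quoted in this guide; the 50 ms hash time is an assumption picked from the "tens of ms" range, not a measured figure.

```python
# Back-of-envelope hash CPU budget.
# HASH_SECONDS is an assumed value from the "tens of ms" range in the text.
PEAK_LOGINS_PER_S = 1_000      # from the problem statement
STUFFING_PER_MIN = 50_000      # botnet rate quoted in the TL;DR
HASH_SECONDS = 0.05            # assumption: 50 ms per password verify


def cores_needed(attempts_per_s: float, hash_seconds: float = HASH_SECONDS) -> float:
    """CPU cores kept fully busy by password hashing at a given attempt rate."""
    return attempts_per_s * hash_seconds


stuffing_per_s = STUFFING_PER_MIN / 60
legit_cores = cores_needed(PEAK_LOGINS_PER_S)
attack_cores = cores_needed(stuffing_per_s)

print(f"legitimate peak: {legit_cores:.0f} cores")
print(f"stuffing adds:   {attack_cores:.1f} cores")
```

Under these assumptions the attack nearly doubles the hashing load while every response still looks healthy, which is why rate limits sit in front of the hash.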

If you remember one thing: Session store read QPS dominates—design for millions of lookups, not thousands of logins.

High-level architecture

What breaks if auth is an afterthought

Early teams store passwords in plaintext "for debugging." Session ids show up in URL query params. Reset links last forever and sit in email inboxes like spare keys.

One Redis instance holds every session. Every page load does a lookup. A celebrity user's session key becomes a hot shard. Logout everywhere means deleting millions of rows while users still get 401 loops from a buggy retry client.

What works: dedicated auth path

Split the problem into two speeds.

Slow path (login, register, reset): The auth service owns the credentials table (hash column only), issues sessions, and sends reset emails. It pays the Argon2 cost here—and nowhere else.

Fast path (every API call): The gateway or app middleware validates the session via Redis lookup or JWT signature verify. Attach user_id and pass upstream. No password hash on this path.

Reset tokens live in their own table—hashed like passwords, short TTL. Signing keys and pepper live in KMS. Audit logs record success/failure—never the password or raw reset token.

In plain terms: prove the password once on login; prove the session id on every request after that.

[ Client ] --> Login --> Auth svc --> verify hash --> issue session id
                              |
                              v
                    Session store (Redis): session_id -> user_id, exp, version

Subsequent requests:
  Client --> GW --> session lookup / JWT verify --> upstream

In the room: Draw login (rare, expensive hash) and session check (frequent, cheap lookup) as two separate arrows before you add MFA or OAuth boxes.

If you remember one thing: Auth is two paths—prove password once, then prove session every request.

Core design approaches

Each approach answers a different tension. Name which one you pick for web revoke vs mobile offline vs reset safety.

Opaque server-side sessions

Random session_id in an HttpOnly cookie or Authorization header. The server stores user_id, expiry, device hint, and optional session version.

Revoke by deleting the row—logout, compromise response, admin kill.

Scale with Redis cluster sharded by session_id hash.

Wins when: Instant revoke matters (banking, admin panels, "sign out everywhere").

Hurts when: Every request needs a Redis round trip—mitigate with local cache only if brief staleness on revoke is acceptable.

If you remember one thing: Opaque sessions make logout and compromise response trivial.

JWT access tokens

Signed claims (sub, exp, optional session_version). Gateway verifies signature without Redis—fast at the edge.

Revocation is the hard part: short access TTL (minutes), rotating refresh tokens in HttpOnly cookie, or blocklist on reported theft.

Wins when: Stateless verify at CDN/gateway scale; mobile APIs with bearer headers.

Hurts when: User needs immediate kill—long-lived JWTs without refresh rotation are a liability.

If you remember one thing: JWT trades lookup cost for revocation complexity—say how you handle stolen tokens before you draw the box.
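To show what "verify signature without Redis" means in practice, here is a minimal HS256 sketch using only the standard library. It is an illustration, not a replacement for a vetted JWT library; the helper names (`sign_jwt`, `verify_jwt`) and the choice of a shared HMAC key are assumptions for the example.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, key: bytes) -> str:
    """Build header.payload.signature with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_jwt(token: str, key: bytes, now: float):
    """Return the claims if alg, signature, and exp all check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":  # never let the token pick the algorithm
            return None
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(key, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        if now >= claims["exp"]:
            return None
        return claims
    except (ValueError, TypeError, AttributeError, KeyError):
        return None
```

Note what is missing: nothing here can reject a stolen but unexpired token. That gap is the revocation story you have to supply with short TTLs, refresh rotation, or a blocklist.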

Password reset

Random token → HTTPS email link → POST with token + new password → store hash of token at rest, one-time use, ~1h TTL → bump session version or delete all sessions.

Wins when: Users forget passwords without calling support.

Hurts when: Tokens are guessable, logged in URLs, or reused—each mistake is account takeover.

If you remember one thing: Reset links are bearer secrets—hash them at rest like passwords.

Hybrid (common in production)

Short-lived access JWT plus opaque refresh in HttpOnly cookie. Gateway verifies JWT locally; refresh path hits Redis for rotation and revoke.

If you remember one thing: Hybrid is not indecision—it is matching fast verify to strong revoke on different artifacts.
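The refresh side of the hybrid model can be sketched as rotation with reuse detection: each refresh token is single-use, and replaying an already-rotated token revokes the whole family. The in-memory dicts below stand in for the Redis store, and all names are hypothetical.

```python
import hashlib
import secrets

# Hypothetical stand-ins for the refresh-token store (Redis in the text).
REFRESH = {}              # token_hash -> {"family", "user_id", "used"}
REVOKED_FAMILIES = set()


def _h(token: str) -> str:
    # Store only a hash so a store leak does not expose usable tokens.
    return hashlib.sha256(token.encode("ascii")).hexdigest()


def issue_refresh(user_id: str, family: str = None) -> str:
    """Create a refresh token; a new family starts at login."""
    family = family or secrets.token_hex(8)
    token = secrets.token_urlsafe(32)
    REFRESH[_h(token)] = {"family": family, "user_id": user_id, "used": False}
    return token


def rotate(token: str):
    """Exchange a refresh token for a new one; None means re-login required."""
    row = REFRESH.get(_h(token))
    if row is None or row["family"] in REVOKED_FAMILIES:
        return None
    if row["used"]:
        # Reuse of an already-rotated token suggests theft: kill the family.
        REVOKED_FAMILIES.add(row["family"])
        return None
    row["used"] = True
    return issue_refresh(row["user_id"], family=row["family"])
```

Logout in this model is one write: add the family to the revoked set. The access JWT still lives until its short TTL runs out.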

Detailed design

Follow one user from sign-up through login to one authenticated read. Split write path (mutations) from read path (session check on every request).

Write path

Sign-up

  1. Validate password policy (length, breach list via Have I Been Pwned optional).
  2. Hash with Argon2id; store per-user salt, hash, and algorithm version for future upgrades.
  3. Create session—or require email verification first (product call). Send verification link with same token hygiene as reset.
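Step 2 can be sketched with the standard library. Argon2id is not in Python's stdlib, so this example swaps in `hashlib.scrypt` as a stand-in memory-hard KDF; the cost parameters and the `scrypt-v1` version label are assumptions for illustration and should be tuned on real hardware.

```python
import hashlib
import hmac
import os

# Assumed parameters for illustration; tune on your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024}
ALGO_VERSION = "scrypt-v1"  # hypothetical label stored next to the hash


def hash_password(password: str) -> dict:
    """Return the record to store: per-user salt, hash, and algorithm version."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, dklen=32, **SCRYPT_PARAMS
    )
    return {"algo": ALGO_VERSION, "salt": salt, "hash": digest}


def verify_password(password: str, record: dict) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=record["salt"], dklen=32, **SCRYPT_PARAMS
    )
    return hmac.compare_digest(candidate, record["hash"])


record = hash_password("correct horse battery staple")
```

Storing the algorithm version alongside the hash is what lets you re-hash with stronger parameters on a user's next successful login.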

Login

  1. Rate limit by (IP, normalized email) with exponential backoff.
  2. Lookup user; constant-time hash compare. On unknown email, run hash against a dummy salt so timing does not leak existence.
  3. If MFA enabled or risk score high → return mfa_required with challenge id; verify TOTP before issuing session.
  4. Issue new session_id; rotate on login (mitigates session fixation). Set HttpOnly Secure SameSite cookie or return bearer + refresh pair for mobile.
  5. Audit success/failure—IP, device, outcome—no secrets in audit rows.
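Step 2 of the login flow is easy to say and easy to get wrong. The sketch below runs the same slow hash whether or not the email exists and returns one generic message for every failure. The user table, the dummy record, and the use of scrypt in place of Argon2id are all assumptions made for the example.

```python
import hashlib
import hmac
import os

GENERIC_ERROR = "Invalid credentials"


def slow_hash(password: str, salt: bytes) -> bytes:
    # scrypt stands in for Argon2id; parameters are illustrative assumptions.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1,
        maxmem=64 * 1024 * 1024, dklen=32,
    )


# Hypothetical in-memory user table: normalized email -> (salt, hash).
_salt = os.urandom(16)
USERS = {"ada@example.com": (_salt, slow_hash("s3cret-pass", _salt))}

# Fixed dummy record so unknown emails cost the same hash work as known ones.
DUMMY_SALT = os.urandom(16)
DUMMY_HASH = slow_hash("dummy-password", DUMMY_SALT)


def login(email: str, password: str) -> tuple:
    """Return (ok, message); the message is identical for every failure."""
    normalized = email.strip().lower()
    known = normalized in USERS
    salt, stored = USERS.get(normalized, (DUMMY_SALT, DUMMY_HASH))
    candidate = slow_hash(password, salt)          # always pay the hash cost
    matches = hmac.compare_digest(candidate, stored)
    if known and matches:
        return True, "ok"
    return False, GENERIC_ERROR
```

The `known and matches` check matters: without it, guessing the dummy password against an unregistered email would succeed.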

Password reset request

  1. Same rate limits as login. Same generic response whether email exists.
  2. Store hash of random token + expiry. Email link with token only—no user id in URL.

Password reset confirm

  1. Verify token hash, burn token on success.
  2. Re-hash password with new salt. Bump session version on user row so all old sessions fail version check.
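The request and confirm steps can be sketched together. Only the token's hash is stored, the token is burned on first use, and a successful confirm bumps the session version. The dicts stand in for real tables, and using plain SHA-256 at rest is an assumption that relies on the token being long and random; the caller passes `now` so the example stays deterministic.

```python
import hashlib
import secrets

RESET_TTL_SECONDS = 3600  # ~1h TTL, as in the text

# Hypothetical stand-ins for the reset-token table and user rows.
RESET_TOKENS = {}                          # token_hash -> {"user_id", "expires_at"}
USERS = {"u1": {"session_version": 1}}


def _token_hash(token: str) -> str:
    # Assumption: a fast hash is enough because the token is high-entropy random.
    return hashlib.sha256(token.encode("ascii")).hexdigest()


def request_reset(user_id: str, now: float) -> str:
    """Create a token, store only its hash, return the raw token for the email."""
    token = secrets.token_urlsafe(32)
    RESET_TOKENS[_token_hash(token)] = {
        "user_id": user_id,
        "expires_at": now + RESET_TTL_SECONDS,
    }
    return token


def confirm_reset(token: str, now: float) -> bool:
    """Burn the token and bump the session version; False if invalid or expired."""
    row = RESET_TOKENS.pop(_token_hash(token), None)  # pop makes it one-time
    if row is None or now > row["expires_at"]:
        return False
    # The password re-hash with a new salt would happen here (omitted).
    USERS[row["user_id"]]["session_version"] += 1
    return True
```

Because the raw token never touches storage, a database leak does not hand the attacker working reset links.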

Logout

Delete session row (opaque) or invalidate refresh token family (JWT). Clear cookie client-side; server-side delete is source of truth.

Read path

Every authenticated API call:

  1. Extract session_id from cookie or Authorization: Bearer.
  2. Opaque: Redis GET session_id → user_id, exp, version. Compare version to user's current version (password change / global logout).
  3. JWT: Verify signature and exp at gateway. Optional introspection or version claim check on sensitive routes.
  4. Attach user_id to request context; upstream services trust gateway—not re-validating unless zero-trust demands it.
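The opaque branch of the read path fits in a few lines. In this sketch plain dicts stand in for Redis and the user table, and the function names are hypothetical. The point is that a version mismatch fails the session without touching any other session row.

```python
import secrets
import time

# Hypothetical in-memory stand-ins for Redis and the user table.
SESSIONS = {}                  # session_id -> {"user_id", "exp", "version"}
USER_VERSION = {"u1": 1}       # user_id -> current session version


def issue_session(user_id: str, ttl_seconds: int = 3600, now: float = None) -> str:
    """Create a random session id and stamp it with the user's current version."""
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {
        "user_id": user_id,
        "exp": now + ttl_seconds,
        "version": USER_VERSION[user_id],
    }
    return session_id


def validate_session(session_id: str, now: float = None):
    """Return user_id for a live session, else None (caller responds 401)."""
    now = time.time() if now is None else now
    row = SESSIONS.get(session_id)
    if row is None or now >= row["exp"]:
        return None
    if row["version"] != USER_VERSION.get(row["user_id"]):
        return None  # password change or global sign-out bumped the version
    return row["user_id"]


def sign_out_everywhere(user_id: str) -> None:
    USER_VERSION[user_id] += 1  # one write invalidates every existing session
```

The version check does need the user's current version on the hot path, so in practice that value has to be as cheap to read as the session row itself.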

In the room: Draw login once on the left, then GET /resource on the right with Redis lookup in the middle. Mention idempotency only where it matters (reset confirm, refresh rotation).

If you remember one thing: Constant-time login and generic errors reduce enumeration and timing leaks—session check stays the hot path.

Key challenges

Account enumeration

Attackers probe login and reset forms to learn which emails are registered.

Same response shape for unknown email vs wrong password. Reset flow has the same tension—generic "if an account exists, we sent email" messaging.

Consequence: Confirmed emails become targets for stuffing on your site and others.

Session theft and fixation

Stolen cookies or fixation before login grant full account access.

HTTPS only. Rotate session id on login. Optional device binding and "new login from unknown device" email.

Consequence: One XSS or shared Wi‑Fi leak equals account takeover until revoke works.

Scale on session validation

Every API call hits the session store—or you trade revoke freshness for edge JWT verify.

Redis cluster sharded by session_id. Local cache only if brief staleness on revoke is acceptable; banks usually read primary for kill.

Consequence: Hot keys and retry storms look like random 401 loops, not "auth is down."

Reset phishing and token reuse

Users click fake reset pages. Forwarded real reset emails let a third party set a new password.

Short TTL. "Password changed" notification email. No user id in URL. One-time token burn.

Consequence: Reset is account takeover without ever guessing the password.

If you remember one thing: Attackers automate stuffing and reset abuse—rate limits are part of the design, not an add-on.

Scaling the system

10M users does not mean 10M Redis keys at once—only active sessions. Still, validation QPS dwarfs login QPS.

Shard sessions by session_id hash across a Redis cluster. Avoid one hot user melting a shard if you ever key by user_id instead of session.
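If you shard on the client side, the routing rule is a stable hash of the session id modulo the shard count. The sketch below is one assumed way to do it; a managed Redis Cluster does its own slot assignment, so treat `shard_for` as a hypothetical helper for illustration only.

```python
import hashlib


def shard_for(session_id: str, shard_count: int) -> int:
    """Map a session id to a shard index with a stable, well-mixed hash."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Random session ids spread evenly, so no single user concentrates load.
counts = [0] * 8
for i in range(8000):
    counts[shard_for(f"sid-{i}", 8)] += 1
```

Keying by user_id instead would send every session for one user to the same shard, which is the hot-user case the paragraph above warns about.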

Read replicas for session lookup only if brief staleness on revoke is acceptable. Security-sensitive apps often read primary after password change.

JWT at edge drops Redis QPS on verify—you pay with revocation complexity and key rotation discipline.

Separate login endpoints behind stricter WAF and rate-limit rules from general API traffic. Stuffing targets /login, not /catalog.

Global sign out everywhere: Bumping one session version on the user row invalidates all sessions on next check—faster than deleting millions of Redis keys.

If you remember one thing: Scale auth by optimizing the read path and containing login abuse—not by re-architecting password hash on every request.

Failure handling

Auth failures split into outage (nobody can prove identity) vs degraded (existing sessions work, new login broken) vs silent security fail (sessions work but should not).

What happens | What user sees | What to build
Redis session store down | Logged-out or 401 on every call | Fail closed for banks; short-lived JWT fallback only if product accepts revoke risk
Auth DB down | Cannot login or register | Queue or 503 on write path; existing sessions may still work if Redis is up
DB leak (hashes only) | Forced reset campaign | Slow KDF bought time; force reset; upgrade Argon2 params on next login
Reset token reuse | Attacker owns account | One-time token; hash at rest; burn on use; alert user
Signing key leak (JWT) | Forged tokens possible | Rotate keys; shorten TTL; invalidate refresh families

Real outage = login and session check both fail. Degraded = existing JWTs verify at edge but refresh/login returns 503—state which your product chooses.

Monitor login success rate, session validate latency p99, hash queue depth, and 429 rate on auth endpoints—not only generic API 5xx.

If you remember one thing: Fail closed vs degraded read-only is a product decision—name it aloud for security-sensitive apps.

API design

Picture the hottest path: user logs in, then loads their dashboard. The API should make session check boring and login well-guarded.

Endpoint | Role
POST /v1/auth/register | Create user + password hash; optional verification email
POST /v1/auth/login | Verify credentials; optional MFA step; return session cookie or token pair
POST /v1/auth/mfa:verify | Complete login after TOTP when mfa_required
POST /v1/auth/logout | Invalidate session / refresh family
POST /v1/auth/password:reset | Request reset email (generic response)
POST /v1/auth/password:confirm | Token + new password; bump session version
POST /v1/auth/token:refresh | Rotate refresh token; issue new access JWT (hybrid model)

Session on authenticated requests:

Mechanism | Role
Cookie: sid=... | Browser; HttpOnly, Secure, SameSite=Lax (or Strict where UX allows)
Authorization: Bearer <access_jwt> | Mobile / API clients
Idempotency-Key | On reset confirm and refresh rotation to survive retries
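For the browser row, the cookie attributes can be set with Python's standard `http.cookies` module. This is a small sketch; the one-hour Max-Age and the `session_cookie_header` helper name are assumptions, not values from this guide.

```python
from http.cookies import SimpleCookie
import secrets


def session_cookie_header(session_id: str, max_age_seconds: int = 3600) -> str:
    """Build the Set-Cookie header value for an opaque browser session."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["httponly"] = True      # JavaScript cannot read the cookie
    cookie["sid"]["secure"] = True        # sent over HTTPS only
    cookie["sid"]["samesite"] = "Lax"
    cookie["sid"]["path"] = "/"
    cookie["sid"]["max-age"] = max_age_seconds
    return cookie["sid"].OutputString()


header = session_cookie_header(secrets.token_urlsafe(32))
print("Set-Cookie:", header)
```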

Login request (write path):

Param / header | Role
email | Normalized (lowercase, trim)
password | Never logged; TLS only
device_id | Optional; risk scoring, session binding

Authenticated read (read path):

Param / header | Role
Cookie: sid or Authorization | Session proof
(none on body) | Gateway attaches user_id; apps do not re-parse passwords

Session hot path in one breath—login writes the cookie; every read validates it.

POST /login --> verify hash --> Set-Cookie: sid=opaque
GET /resource --> Cookie: sid --> Redis GET sid --> user_id --> 200
POST /logout --> DEL sid --> 204

Errors: 401 unauthorized (generic on login failure). 429 too many attempts with Retry-After. 403 when MFA required but not completed. Never return "email not found."

In the room: Walk POST /login → Set-Cookie → GET /resource with Redis lookup without opening a second diagram.

If you remember one thing: Every authenticated endpoint assumes session check is cheap—design APIs so apps never re-verify passwords per request.

Production angles

Identity systems look boring until they are the bottleneck or the crime scene. Green dashboards lie easily here—credential stuffing returns HTTP 200 and burns CPU instead of tripping error rates.

Credential stuffing: traffic looks like success

What users saw

Support tickets for account takeovers. Fraud dashboards showed logins from impossible geographies.

Legitimate users got locked out when naive per-account lockout kicked in after the attacker triggered it.

Why

Attackers reuse leaked passwords from other sites. Residential proxies defeat simple IP limits.

Every attempt returns valid HTTP. Availability alerts stay green while bcrypt/Argon2 queues grow and real logins slow down.

What good teams do

Layered friction: WAF, device reputation, CAPTCHA step-up after risk score.

Rate limits at edge and per identity with slow backoff—not only per IP.

Notify on new device; "was this you?" revert paths.

Separate SLOs and dashboards for happy-path login vs attack traffic.

How to use this in an interview: Tie stuffing to edge rate limits and risk step-up, not only slow hash. Name one metric: login p99 or hash queue depth during attacks.

Session store hot keys and retry storms

What users saw

Random 401 loops on mobile. Redis latency spiked for tenants sharing the cluster.

One buggy client hammered session validate after every 401 without backoff.

Why

Sessions are centralized state. Logout everywhere or password change can invalidate many keys at once.

Retries without jitter turn a blip into a storm. One hot session_id or shard can melt a Redis node.

What good teams do

Per-IP and per-device limits at gateway. Jittered backoff in client SDKs.

Session version stamp instead of N deletes for global sign-out.

Measure commands/sec per key and retry ratio after 401.

How to use this in an interview: Say session kill at scale is a data plane problem—version stamp vs millions of DELs—and that 401 retry storms are a client + gateway concern.
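Jittered backoff in a client SDK can be as small as the sketch below. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing cap; the base and cap values are assumptions for illustration.

```python
import random


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0, rng=random):
    """Yield 'full jitter' delays: uniform in [0, min(cap, base * 2**n)]."""
    for n in range(attempts):
        yield rng.uniform(0.0, min(cap, base * 2 ** n))


# A seeded generator keeps the example reproducible.
delays = list(backoff_delays(8, rng=random.Random(0)))
```

The randomness is the point: without it, every client that saw the same blip retries at the same instant and recreates the spike.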

Breach, recovery, and "kill every session in five minutes"

What users saw

Password table exfil or predictable reset tokens. Incident commander demanded global revoke and audit proof.

Some devices stayed logged in because edge caches or long refresh tokens survived partial invalidation.

Why

Sessions and reset tokens are bearer secrets—whoever holds them is the user.

Long-lived refresh tokens amplify blast radius. Partial invalidation leaves zombie sessions.

What good teams do

Invalidate all refresh tokens for affected users. Rotate JWT signing keys if they may have been exposed.

Structured audit events (who, when, IP, device)—no secrets in logs.

Forced logout UX pain vs security—communicate tradeoff clearly to product.

How to use this in an interview: One failure (hash leak), one mitigation (force reset + version bump), one metric (time-to-revoke-all-sessions).

Password reset abuse and enumeration

What users saw

Spike in reset emails. Users reported phishing pages that looked identical to the real flow.

Attackers learned which emails were registered by comparing response times or message wording.

Why

Reset endpoints are unauthenticated and cheap to call. Tokens in URLs end up in proxy logs and browser history.

What good teams do

Same generic response for known and unknown emails. Rate limit reset requests per IP and per email hash.

Short TTL, one-time burn, HTTPS-only links with random tokens only—no user id in path.

Monitor reset request rate vs confirm rate; alert on geographic anomalies.

How to use this in an interview: Reset tokens are bearer secrets—hash at rest, one-time use—and enumeration defense matches login (same error shape).

Bottlenecks and tradeoffs

Opaque sessions vs JWT

The tension — Server-side sessions make revocation trivial but add Redis hot paths and fan-out on global logout.

What breaks — Redis outage locks users out (fail closed) or stale JWTs stay valid (fail open).

What teams do — Opaque for web with strict revoke; short JWT + refresh for mobile; hybrid is common.

Say in the interview — Name your revoke story before you name JWT.

Security vs UX on login errors

The tension — Generic errors reduce enumeration but frustrate users who typo email.

What breaks — Attackers probe sign-up and reset endpoints instead.

What teams do — Same shape on login; risk-based step-up instead of revealing which field failed.

Say in the interview — "Invalid credentials" plus server-side logging.

Cost factor vs login latency

The tension — Stronger Argon2 parameters raise attacker cost and legitimate login CPU.

What breaks — Login p99 during stuffing spikes; user churn on slow mobile login.

What teams do — Tune per threat tier; edge rate limits so hash CPU serves real users first.

Say in the interview — Quote ~tens of ms target per hash, not "make it slow."

If you remember one thing: Auth tradeoffs are revoke vs lookup, enumeration vs UX, and hash cost vs stuffing—not cookie vs JWT religion.

What should stick

You do not need to memorize every box. After this guide, you should be able to:

  1. Two speeds — Slow hash on login only; fast session lookup (or JWT verify) on every request.
  2. Three bearer secrets — Password hash, session id, reset token—each with its own TTL, storage, and revoke rule.
  3. Opaque vs JWT — Revoke easy vs verify-at-edge; hybrid (short access + rotating refresh) is common for mobile.
  4. Stuffing is economics — Looks like HTTP 200; edge rate limits and step-up protect hash CPU, not only user lockout.
  5. Fail closed — Session store down means no silent anonymous access for security-sensitive apps—state the product choice.

Tell it in the room: "Passwords stored as salt plus Argon2id—never plaintext. Login is rate-limited with generic errors and optional MFA. Issue opaque session in HttpOnly cookie; every API call does Redis lookup and version check. Reset uses one-time hashed token; password change bumps session version to sign out all devices. Stuffing handled with edge limits and CAPTCHA step-up—not only slow hash."

Reference diagram

[Figure: high-level diagram for Auth System Design]

What interviewers expect

  • Password storage: Salt plus slow hash (Argon2id/bcrypt). Never plaintext. Never "encrypt" what you must verify.
  • Login path: Rate limit by IP and account. Generic errors. Constant-time compare.
  • Session model: Opaque session in Redis or short JWT plus rotating refresh. Say how you revoke on stolen laptop.
  • Reset flow: Random token, hashed at rest, one-time, short TTL. Invalidate sessions on password change.
  • Abuse: Credential stuffing looks like HTTP 200. Edge limits and step-up, not only slow hash.

Interview workflow (template)

  1. Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
  2. Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
  3. APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
  4. Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
  5. Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
  6. Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).

Frequently asked follow-ups

  • Store passwords?
  • JWT vs session?
  • Refresh tokens?
  • Reset flow?
  • OAuth?

Deep-dive questions and strong answer outlines

How do you store passwords?

Step 1: Per-user random salt plus Argon2id or bcrypt—one-way, tuned to ~tens of ms per login. Step 2: Never log passwords. Compare in constant time. On unknown email, still run a dummy hash so timing does not leak which accounts exist.

JWT vs server session?

Opaque session: Random session_id in HttpOnly cookie; Redis maps id → user_id. Revoke = delete row. JWT: Verify signature at the gateway without Redis—fast, but revocation needs short TTL, refresh rotation, or blocklist. Hybrid (short access JWT + refresh in HttpOnly cookie) is common for mobile.

How do you stop brute force?

Step 1: Rate limit at edge and per (IP, username) with exponential backoff. Step 2: CAPTCHA or risk step-up after a threshold. Separate login SLO from general API traffic so stuffing does not hide in green dashboards.
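Step 1 can be sketched as a failure counter keyed by (IP, normalized username) with a delay that doubles after a free allowance. The thresholds are assumptions, and the in-memory dict is a stand-in for a shared store such as Redis.

```python
# Hypothetical in-memory limiter; production state would live in a shared store.
FREE_ATTEMPTS = 5      # assumed failures allowed before backoff starts
BASE_DELAY = 1.0       # assumed seconds; doubles with each further failure
MAX_DELAY = 900.0      # assumed cap (15 minutes)

_failures = {}         # (ip, username) -> (failure_count, last_failure_time)


def _key(ip: str, username: str) -> tuple:
    return (ip, username.strip().lower())


def retry_after(ip: str, username: str, now: float) -> float:
    """Seconds the caller must wait; 0.0 means the attempt may proceed."""
    count, last = _failures.get(_key(ip, username), (0, 0.0))
    if count < FREE_ATTEMPTS:
        return 0.0
    delay = min(BASE_DELAY * 2 ** (count - FREE_ATTEMPTS), MAX_DELAY)
    return max(0.0, last + delay - now)


def record_failure(ip: str, username: str, now: float) -> None:
    count, _ = _failures.get(_key(ip, username), (0, 0.0))
    _failures[_key(ip, username)] = (count + 1, now)


def record_success(ip: str, username: str) -> None:
    _failures.pop(_key(ip, username), None)
```

A nonzero `retry_after` maps to a 429 response with a Retry-After header, returned before any hash work is done.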

Password reset flow?

Step 1: User requests reset → store hash of random token with ~1h TTL → email HTTPS link only. Step 2: POST with token + new password → verify once, burn token, re-hash password, bump session version or delete all sessions. Same response shape whether email exists.

Where does MFA fit?

After password verifies, check TOTP (or WebAuthn) if enabled or risk score is high. Issue session only after second factor passes. Recovery codes stored hashed; support cannot read them.
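The TOTP check itself is short. Below is a sketch of RFC 6238 with HMAC-SHA1, 30-second steps, and a tolerance of one step either side for clock drift; the drift window size is an assumption, and the secret shown is the RFC's published test key, not something to use in production.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA1 and RFC 4226 dynamic truncation."""
    counter = int(unix_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, unix_time: float,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current step plus or minus `window` steps for clock drift."""
    ok = False
    for drift in range(-window, window + 1):
        expected = totp(secret, unix_time + drift * step, step=step)
        ok |= hmac.compare_digest(expected, submitted)
    return ok


RFC_TEST_SECRET = b"12345678901234567890"  # test key published in RFC 6238
print(totp(RFC_TEST_SECRET, 59, digits=8))  # RFC 6238 test vector: 94287082
```

The session is issued only after `verify_totp` returns True. A production version would also reject a code that was already accepted within the same window.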

OAuth with Google?

Google proves identity once. Your auth service still issues your session or tokens and owns logout/revoke. Do not confuse "signed in with Google" with skipping session design.

FAQs

Q: OAuth Google?

A: Delegate identity; still issue your session.

Q: JWT in localStorage?

A: XSS risk—prefer httpOnly cookie.

Q: Biometrics?

A: Device keychain—client concern.

Q: GDPR delete?

A: Cascade user_id references.
