System design interview guide
Design a Basic Auth System
TL;DR: You’re building **username/password auth** with **sessions** for web and mobile: sign-up, login, logout, optional **remember device**, and **password reset**—while attackers run **credential stuffing**, **brute force**, and **session theft** against you every day. The interview rewards **threat modeling** and **boring** crypto hygiene, not inventing a new token format.
Problem statement
You’re designing authentication for a web and mobile product: registration, login, logout, session establishment and validation, optional remember device, and password reset—with industry-standard password handling and abuse resistance.
Constraints. Functional: credential verification; session lifecycle; reset flow at a high level. Non-functional: low latency on the auth check path; resilience if the session store degrades (with clear tradeoffs). Scale: millions of users; high read volume on session validation.
Center: hashing, session design, rate limits, reset safety—not custom crypto.
Introduction
Auth interviews are threat-led. Strong candidates name credential stuffing, session hijack, and enumeration before drawing five microservices. Weak candidates jump to JWT without saying why.
The core assets are password hashes, session identifiers, and reset tokens—each has different rotation and revocation rules.
How to approach
List what you store (hash parameters), session shape (opaque vs JWT), abuse controls on login and reset, and logout semantics (server-side revoke). Order beats breadth.
Interview tips
- HttpOnly, Secure, SameSite cookies for browser sessions: HttpOnly keeps the token out of reach of JavaScript, which reduces (but does not eliminate) token theft via XSS.
- Rate limit login by IP + account with exponential backoff; CAPTCHA after N failures.
- Generic errors on login (“Invalid credentials”)—log detail server-side for support.
- Password reset links are bearer secrets—one-time, short TTL, hash token at rest.
- JWT: short access + refresh rotation if you need revocation story—or opaque sessions in Redis.
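As a rough illustration of the cookie attributes in the first tip, the sketch below builds a `Set-Cookie` value with Python's standard `http.cookies` module. The cookie name `sid` follows the diagrams later in this guide; the function name, the `SameSite=Lax` choice, and the one-hour `Max-Age` are assumptions made for the example, not recommendations from the source.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str, max_age_seconds: int = 3600) -> str:
    """Return a Set-Cookie header value for a browser session (hypothetical helper)."""
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    morsel = cookie["sid"]
    morsel["httponly"] = True      # not readable from document.cookie
    morsel["secure"] = True        # only sent over HTTPS
    morsel["samesite"] = "Lax"     # assumption: Lax; Strict is tighter but costs UX
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()


header_value = build_session_cookie("abc123")
```

The server would send the returned string as the value of a `Set-Cookie` response header after a successful login.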
Capacity estimation
| Topic | Note |
|---|---|
| Login QPS | Spiky during abuse—rate limits protect you |
| Session validation | Every API call—cache hit ratio dominates |
| Password hash | Argon2 budget ~tens of ms per login—not per request |
Implications: session lookup must be O(1) and partitioned; do not re-hash password on every API request.
High-level architecture
The auth service owns the credentials table (hash column), session store (Redis), and reset tokens. The API gateway or app services validate sessions via a shared library or central introspection. Secrets live in KMS; no plaintext passwords in logs.
[ Client ] --> Login --> Auth svc --> verify hash --> issue session id
|
v
Session store (Redis): session_id -> user_id, exp, version
Subsequent requests:
Client --> GW --> session lookup / JWT verify --> upstream
In the room: Separate login (rare, expensive hash) from session check (frequent, cheap lookup).
Core design approaches
Opaque sessions
Random session_id in cookie; server stores metadata; revoke by delete.
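A minimal sketch of the opaque-session idea, assuming an in-memory dictionary as a stand-in for Redis. The class name `SessionStore`, its method names, and the one-hour TTL are hypothetical; the point is only that the identifier is random, the metadata lives server-side, and revocation is a delete.

```python
import secrets
import time


class SessionStore:
    """In-memory stand-in for Redis: session_id -> (user_id, expires_at)."""

    def __init__(self, ttl_seconds: float = 3600):
        self._ttl = ttl_seconds
        self._sessions = {}

    def create(self, user_id: str, now: float = None) -> str:
        now = time.time() if now is None else now
        session_id = secrets.token_urlsafe(32)  # 256 bits from the OS CSPRNG
        self._sessions[session_id] = (user_id, now + self._ttl)
        return session_id

    def lookup(self, session_id: str, now: float = None):
        now = time.time() if now is None else now
        record = self._sessions.get(session_id)
        if record is None:
            return None
        user_id, expires_at = record
        if now >= expires_at:
            self._sessions.pop(session_id, None)  # lazily drop expired sessions
            return None
        return user_id

    def revoke(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)  # revoke by delete
```

In a real deployment the TTL would typically be enforced by the store itself (for example a key expiry) rather than checked in application code.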
JWT
Signed claims; stateless verify; revocation is hard—pair with short TTL + refresh or blocklist for logout and compromise.
Password reset
Random token → email link → POST to set new password → optionally invalidate all sessions.
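The reset flow above can be sketched as follows, assuming an in-memory dictionary in place of a database table. The class name `ResetTokenStore` and the 15-minute TTL are hypothetical. Only the SHA-256 digest of the token is stored; a fast hash is a reasonable assumption here because the token is a long random value rather than a human-chosen password.

```python
import hashlib
import secrets
import time


class ResetTokenStore:
    """Stores sha256(token) -> (user_id, expires_at); the raw token is never kept."""

    def __init__(self, ttl_seconds: float = 900):
        self._ttl = ttl_seconds
        self._pending = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id: str, now: float = None) -> str:
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # goes into the emailed link
        self._pending[self._digest(token)] = (user_id, now + self._ttl)
        return token

    def redeem(self, token: str, now: float = None):
        now = time.time() if now is None else now
        # pop() makes the token one-time: a second redeem finds nothing.
        record = self._pending.pop(self._digest(token), None)
        if record is None:
            return None
        user_id, expires_at = record
        return user_id if now < expires_at else None
```

A caller would set the new password only when `redeem` returns a user id, and could then invalidate that user's sessions.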
Detailed design
Sign-up
- Validate password policy; optionally reject breached passwords (Have I Been Pwned API).
- Hash with Argon2id; store salt + hash + params version.
- Create session or require email verification first—product call.
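A sketch of the "salt + hash + params version" record from the steps above. Argon2id is not in the Python standard library, so this example swaps in `hashlib.scrypt` as a stand-in memory-hard KDF; the parameter values, the `SCRYPT_PARAMS` table, and the function names are assumptions for illustration only.

```python
import hashlib
import hmac
import secrets

# Assumption: parameters are versioned so they can be raised later.
SCRYPT_PARAMS = {1: {"n": 2**14, "r": 8, "p": 1}}
CURRENT_VERSION = 1
MAXMEM = 64 * 1024 * 1024


def hash_password(password: str, version: int = CURRENT_VERSION) -> dict:
    salt = secrets.token_bytes(16)  # unique per user
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, maxmem=MAXMEM, dklen=32,
        **SCRYPT_PARAMS[version],
    )
    return {"version": version, "salt": salt, "hash": digest}


def verify_password(password: str, record: dict) -> bool:
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=record["salt"], maxmem=MAXMEM, dklen=32,
        **SCRYPT_PARAMS[record["version"]],
    )
    return hmac.compare_digest(candidate, record["hash"])
```

The stored record never contains the password itself, and two users with the same password get different hashes because the salts differ.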
Login
- Rate limit by (IP, username).
- Look up the user by normalized email; compare hashes in constant time, and still run the hash against a dummy record when the user does not exist, to reduce timing leaks.
- Rotate session; audit success/failure.
Session validation
Cookie or Authorization: Bearer; lookup session → user_id + version; check global logout version if used.
Logout
Delete session row (opaque) or increment version (JWT family).
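One way to picture the version check used for validation and logout, assuming in-memory dictionaries in place of the session store. `VersionedSessions` and its method names are hypothetical. A single logout deletes one row; "log out everywhere" bumps the user's version so every older session fails validation without deleting each one.

```python
import secrets


class VersionedSessions:
    def __init__(self):
        self._sessions = {}      # session_id -> (user_id, version at issue time)
        self._user_version = {}  # user_id -> current version

    def login(self, user_id: str) -> str:
        version = self._user_version.setdefault(user_id, 0)
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = (user_id, version)
        return session_id

    def validate(self, session_id: str):
        record = self._sessions.get(session_id)
        if record is None:
            return None
        user_id, version = record
        # Sessions issued before the last global logout no longer match.
        if version != self._user_version.get(user_id):
            return None
        return user_id

    def logout(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)

    def logout_everywhere(self, user_id: str) -> None:
        self._user_version[user_id] = self._user_version.get(user_id, 0) + 1
```

The same version value can be embedded as a claim in a signed token, at the cost of one version lookup per request.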
Key challenges
- Enumeration: same response time and shape for unknown email is hard; balance with UX.
- Session theft: HTTPS only; rotate on privilege change; optional device binding.
- Scale validation: Redis cluster with local cache—be careful with staleness for revocation.
- Reset phishing: short TTL; notify user with “password changed” email.
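The staleness concern in the "Scale validation" point above can be made concrete with a small sketch of a local cache in front of the session store. The class name, the five-second bound, and the `backend_lookup` callable are assumptions; the bound is the longest a revoked session can keep working on one node.

```python
import time


class CachedSessionLookup:
    """Local cache with bounded staleness in front of a session store lookup."""

    def __init__(self, backend_lookup, max_staleness_seconds: float = 5.0):
        self._lookup = backend_lookup          # e.g. a function that queries Redis
        self._max_staleness = max_staleness_seconds
        self._cache = {}                       # session_id -> (user_id, cached_at)

    def get(self, session_id: str, now: float = None):
        now = time.monotonic() if now is None else now
        hit = self._cache.get(session_id)
        if hit is not None and now - hit[1] < self._max_staleness:
            return hit[0]                      # may be stale by up to the bound
        user_id = self._lookup(session_id)
        if user_id is None:
            self._cache.pop(session_id, None)  # never cache negative results here
        else:
            self._cache[session_id] = (user_id, now)
        return user_id
```

Shrinking the bound tightens revocation and raises load on the store; that is the tradeoff to state out loud.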
Scaling the system
- Shard sessions by a hash of session_id across a Redis cluster.
- Use read replicas for sessions only if the staleness is acceptable; strict revocation usually means reading from the primary.
- Stateless JWT verification at edge reduces Redis load—trades revocation complexity.
Failure handling
| Scenario | Mitigation |
|---|---|
| Redis down | Fail closed for sensitive apps; or JWT verify-only degraded mode with short TTL |
| DB leak | Slow KDF; force reset; rotate sessions |
API design
| Endpoint | Role |
|---|---|
| POST /v1/auth/register | Create user + password hash |
| POST /v1/auth/login | Returns session cookie or tokens |
| POST /v1/auth/logout | Invalidate session |
| POST /v1/auth/password:reset | Request reset email |
| POST /v1/auth/password:confirm | Token + new password |
Session on API requests
| Mechanism | Role |
|---|---|
| Cookie: sid=... | Browser; HttpOnly |
| Authorization: Bearer | Mobile / APIs |
Diagram:
POST /login --> verify --> Set-Cookie: sid=opaque
GET /resource --> Cookie: sid --> Redis GET sid --> user_id --> 200
POST /logout --> DEL sid --> 204
Errors: 401 unauthorized; 429 too many login attempts; avoid revealing whether an email exists—use generic 401 on login failure.
Production angles
Identity systems look boring until they are the bottleneck or the crime scene. Credential stuffing does not trip your application error rate—it saturates login endpoints, locks accounts, and burns SMS budgets. Session stores fail as hot keys. Breaches are process and audit problems as much as crypto. These are the lessons from years of abuse teams and on-call rotations that owned auth.
Credential stuffing: traffic looks like success
What it looks like — Login 200 rate is flat or up; fraud dashboards show impossible geography velocity; customer support sees account takeovers. Redis or DB behind credential verify runs hot; downstream MFA providers throttle. Traditional availability alerts stay quiet—every request is “valid HTTP.”
Why it happens — Attackers reuse leaked passwords across sites; per-IP limits are defeated by residential proxies; per-account lockout becomes a DoS against real users if tuned naïvely. p99 latency on password verify rises because bcrypt/argon is intentionally slow and attackers parallelize.
What good teams do — Layered friction: WAF, device reputation, CAPTCHA step-up after risk score, rate limits at edge and per-identity with slow backoff; notify on new device and push revert paths. Separate SLOs for happy-path login vs attack traffic—you may shed abusive IPs while keeping known good ASNs healthy. Seniors talk credential stuffing as economics (cost per attempt); juniors learn not to rely on HTTP 429 alone.
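As one possible shape for "per-identity with slow backoff", the sketch below tracks failures per key and doubles the wait after a few free attempts. The class name `LoginThrottle`, the key format, and all numeric defaults are assumptions for illustration; a production version would live in a shared store and sit alongside edge controls.

```python
class LoginThrottle:
    """Exponential backoff per key, e.g. key = (ip, account)."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 900.0, free_attempts: int = 3):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.free_attempts = free_attempts
        self._failures = {}  # key -> (failure_count, time_of_last_failure)

    def retry_after(self, key, now: float) -> float:
        """Seconds the caller must still wait; 0.0 means the attempt may proceed."""
        count, last = self._failures.get(key, (0, 0.0))
        if count < self.free_attempts:
            return 0.0
        delay = min(self.max_delay, self.base_delay * 2 ** (count - self.free_attempts))
        return max(0.0, last + delay - now)

    def record_failure(self, key, now: float) -> None:
        count, _ = self._failures.get(key, (0, 0.0))
        self._failures[key] = (count + 1, now)

    def record_success(self, key) -> None:
        self._failures.pop(key, None)
```

A nonzero `retry_after` would map to a 429 response. Because the delay is capped and never becomes a permanent lock, an attacker cannot use it to lock a real user out indefinitely.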
Session store hot keys and retry storms
What it looks like — One session_id or user_id shard sees absurd GET/DEL QPS. A buggy mobile client loops on 401, hammering refresh or session validate. Redis single-threaded hot key drives latency for unrelated tenants on the same proxy.
Why it happens — Sessions are centralized state; thundering herd on logout everywhere or password change invalidates millions of keys in a pattern that collides on infrastructure. Retries without jitter turn a blip into a storm.
What good teams do — Per-IP and per-device limits at gateway; jittered backoff in client SDKs; session sharding so one user cannot melt a node; bulk invalidation via version stamps instead of N deletes when possible. Measure commands/sec per key, connection churn, and retry ratio after 401.
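A minimal sketch of jittered backoff for a client SDK, using the "full jitter" variant where each wait is drawn uniformly between zero and an exponentially growing cap. The function name and default values are assumptions; the retry count is finite so a client that keeps getting 401 eventually stops instead of looping.

```python
import random


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 30.0, rng: random.Random = None):
    """Return one randomized wait (in seconds) per retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)  # exponential growth, capped
        delays.append(rng.uniform(0, ceiling))   # jitter spreads clients apart
    return delays
```

A client would sleep for each delay in turn between retries and surface an error to the user once the list is exhausted.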
Breach, account recovery, and “we need every session dead in five minutes”
What it looks like — Password table exfil, OAuth client secret leak, or reset token predictability. Incident commander asks for global session revoke, forced rotation, and audit proof for regulators.
Why it happens — Sessions and reset tokens are bearer secrets—whoever holds them is the user. Long-lived refresh tokens amplify blast radius. Partial invalidation leaves zombie sessions on edge caches or mobile biometric vaults.
What good teams do — Invalidate all refresh tokens for affected users; rotate signing keys for JWT if asymmetric; structured audit events (who, when, IP, device); communicate forced logout tradeoffs (UX pain vs security). Password reset tokens: one-time, short TTL, hashed at rest—same secrecy bar as passwords in transit.
How to use this in an interview — Tie stuffing to edge controls and risk-based step-up, not only “bcrypt.” Say session invalidation is a data plane problem at scale. One sentence: password reset tokens are bearer secrets—treat them like passwords in transit and in storage.
Bottlenecks and tradeoffs
- Opaque sessions vs JWT: Server-side sessions make revocation trivial but add Redis hot paths and fan-out on global logout; JWTs cut lookup cost but push revocation complexity to short TTL, refresh rotation, or blocklists.
- Security vs UX: Generic login errors reduce enumeration but frustrate users who typo email; risk-based step-up balances friction with fraud loss.
- Cost factor vs login latency: Stronger Argon2 parameters raise attacker cost and legitimate login CPU—tune per threat tier, not one global number.
In the room: One line: password reset tokens are bearer secrets—same care as passwords in transit.
What interviewers expect
- Sign-up/login: Argon2id/bcrypt/scrypt with per-user salt; rate limits; generic client errors.
- Sessions: opaque server-side sessions vs signed JWT—revocation and TTL tradeoffs.
- Transport: HTTPS only; Secure, HttpOnly, SameSite cookies for browsers.
- Reset: random one-time token (hashed at rest), short TTL, optional session invalidation after reset.
- Abuse: IP + account rate limits, CAPTCHA escalation, breached-password checks (optional).
- Boundary: auth service vs app DB; no secrets in logs.
Interview workflow (template)
- Clarify requirements. Confirm functional scope, users, consistency needs, and which non-functional goals matter most (latency, availability, cost).
- Rough capacity. Estimate QPS, storage, and bandwidth so your data model and partitioning story are grounded.
- APIs and core flows. Define a minimal API and walk 1–2 critical read/write paths end to end.
- Data model and storage. Choose stores for each access pattern; call out hot keys, indexes, and retention.
- Scale and failure. Add caching, sharding, replication, queues, or fan-out as needed; say what breaks in failure modes.
- Tradeoffs. Name alternatives you rejected and why (e.g. strong vs eventual consistency, sync vs async).
Frequently asked follow-ups
- Where are passwords stored and in what form?
- Session cookie vs JWT—when would you pick each?
- How do you handle password reset safely?
- How do you stop brute force attacks?
- What happens on logout?
Deep-dive questions and strong answer outlines
What exactly is stored in the database for a password?
Salt + slow hash output (Argon2id/bcrypt/scrypt parameters tuned for your hardware budget). Never plaintext; upgrade path when parameters change—rehash on successful login.
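The "rehash on successful login" upgrade path might look like the sketch below. It uses `hashlib.scrypt` with two hypothetical parameter versions; the names `PARAMS`, `CURRENT`, and `verify_and_upgrade` are assumptions, and the same pattern would apply to Argon2id or bcrypt cost changes.

```python
import hashlib
import hmac
import secrets

# Assumption: version 1 is the old, weaker setting; version 2 is current.
PARAMS = {1: dict(n=2**12, r=8, p=1), 2: dict(n=2**14, r=8, p=1)}
CURRENT = 2


def _derive(password: str, salt: bytes, version: int) -> bytes:
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, maxmem=64 * 1024 * 1024, dklen=32,
        **PARAMS[version],
    )


def make_record(password: str, version: int = CURRENT) -> dict:
    salt = secrets.token_bytes(16)
    return {"version": version, "salt": salt, "hash": _derive(password, salt, version)}


def verify_and_upgrade(password: str, record: dict):
    """Return (ok, record). On success with old parameters, record is a fresh rehash."""
    candidate = _derive(password, record["salt"], record["version"])
    ok = hmac.compare_digest(candidate, record["hash"])
    if ok and record["version"] != CURRENT:
        # Only possible at login: this is the one moment the plaintext is available.
        record = make_record(password)
    return ok, record
```

The caller persists the returned record whenever it differs from the stored one. Accounts that never log in keep the old parameters, which is one reason a forced reset may still be needed after a leak.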
How does session validation work on each API request?
Opaque session id in cookie or Authorization header → lookup in Redis/DB with TTL and user id. Or JWT verified with signature + exp; consider revocation difficulty—often short-lived access + refresh pattern.
Describe a safe password reset email flow.
Random unguessable token (hashed at rest), one-time use, short expiry, HTTPS link, invalidate after use. Same response time for unknown emails to reduce enumeration—tradeoff with UX.
Production angles
- DB leak of password hashes—**slow KDF** buys time; **force** reset if parameters weak historically.
- Session store outage—**fail closed** for security-sensitive apps vs **degraded** read-only—state product choice.
FAQs
Q: Is JWT always better?
A: No. JWTs simplify stateless verification but revocation is hard—often pair with short TTL + refresh or blocklist for compromise response. Opaque sessions make server-side kill trivial.
Q: Do I need MFA in scope?
A: Mention as defense in depth if time; core loop often stops at password + session. If prompted, TOTP/WebAuthn at high level.
Q: How is this different from OAuth “Login with Google”?
A: This prompt owns credential storage; OAuth delegates identity—different threat model and account linking concerns.
Q: Should sessions be invalidated on password change?
A: Usually yes for all devices except maybe current session—reduces stolen password window. Say session version or global sign-out flag checked on each request for high-security apps.
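For the "all devices except maybe current session" case, a hedged sketch using a per-user index of session ids. `UserSessions` and its methods are hypothetical names, and the dictionaries stand in for the session store.

```python
class UserSessions:
    def __init__(self):
        self._by_user = {}  # user_id -> set of session ids
        self._owner = {}    # session_id -> user_id

    def add(self, user_id: str, session_id: str) -> None:
        self._by_user.setdefault(user_id, set()).add(session_id)
        self._owner[session_id] = user_id

    def is_active(self, session_id: str) -> bool:
        return session_id in self._owner

    def revoke_others(self, user_id: str, keep_session_id: str) -> int:
        """Called after a password change; returns how many sessions were revoked."""
        revoked = 0
        for session_id in list(self._by_user.get(user_id, ())):
            if session_id != keep_session_id:
                self._by_user[user_id].discard(session_id)
                self._owner.pop(session_id, None)
                revoked += 1
        return revoked
```

This does one delete per session, which is fine for a single user; for mass invalidation a version stamp checked on each request avoids the fan-out.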