Database Topic

Isolation Levels

Master transaction isolation levels to balance consistency and concurrency. Essential for database and system design interviews.

Isolation Levels

Why This Matters

Think of isolation levels like different levels of privacy in an office. READ UNCOMMITTED is like working in an open office where you can see everyone's work, even if they haven't finished. READ COMMITTED is like having cubicles—you can only see work that's been finalized. SERIALIZABLE is like having private offices with locks—only one person can work on something at a time.

This matters because databases need to balance consistency and performance. Higher isolation (SERIALIZABLE) prevents all anomalies but is slow—transactions must run one at a time for conflicting data. Lower isolation (READ COMMITTED) allows more concurrency but can cause anomalies like dirty reads or lost updates. Understanding isolation levels helps you choose the right balance.

In interviews, when someone asks "How would you design a high-throughput system?", they're testing whether you understand isolation levels. Do you know when to use each level? Do you understand the trade-offs? Most engineers don't. They use the default isolation level and wonder why their system is slow or has data inconsistencies.

What Engineers Usually Get Wrong

Most engineers think "higher isolation is always better." But higher isolation has overhead. SERIALIZABLE isolation requires transactions to run one at a time for conflicting rows, which can cause contention and slow down the system. For high-throughput systems, you often need to use lower isolation levels and handle race conditions in application code.

Engineers also don't understand the anomalies that each level prevents. READ COMMITTED prevents dirty reads but allows non-repeatable reads. REPEATABLE READ prevents non-repeatable reads but allows phantom reads. SERIALIZABLE prevents all anomalies but is slow. Understanding these helps you choose the right level.

How This Breaks Systems in the Real World

An e-commerce system was processing orders. It used SERIALIZABLE isolation to prevent race conditions. Under normal load, this worked. But during a sale (high traffic), many transactions tried to update the same inventory row simultaneously. SERIALIZABLE isolation requires transactions to run one at a time for conflicting rows. Transactions started queuing. Response times went from 100ms to 10 seconds.

The fix? Use READ COMMITTED isolation and handle race conditions in application code (optimistic locking). The lesson? Isolation levels are trade-offs. Higher isolation is safer but slower. Choose based on your use case.

Another story: A service was using READ UNCOMMITTED isolation (the lowest level). It was reading data that other transactions were updating but hadn't committed yet. When those transactions rolled back, the service had seen data that never actually existed. This caused incorrect calculations and data corruption. The fix? Use at least READ COMMITTED isolation to prevent dirty reads.

The Four Standard Levels

READ UNCOMMITTED

Lowest isolation, highest concurrency

Allows dirty reads (reading uncommitted data)
No locks on reads
Fastest but least safe

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;  -- May read uncommitted data
COMMIT;

Problem: Transaction A reads data that Transaction B later rolls back.

Use case: Rarely used in practice due to data integrity risks.

READ COMMITTED

Default in PostgreSQL and most databases

Prevents dirty reads
Allows non-repeatable reads (same query returns different results)
Allows phantom reads (new rows appear)

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;  -- Reads committed value: $100
  -- Another transaction commits: UPDATE accounts SET balance = 50 WHERE id = 1;
  SELECT balance FROM accounts WHERE id = 1;  -- Reads new committed value: $50
COMMIT;

Implementation: Row-level locks, released after statement completion.

Use case: Most applications. Good balance of safety and performance.

REPEATABLE READ

Prevents non-repeatable reads

Prevents dirty reads
Prevents non-repeatable reads
Still allows phantom reads

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;  -- $100
  -- Another transaction commits: UPDATE accounts SET balance = 50 WHERE id = 1;
  SELECT balance FROM accounts WHERE id = 1;  -- Still $100 (snapshot isolation)
COMMIT;

Implementation: Snapshot isolation. Each transaction sees a consistent snapshot of the database.

Use case: When you need consistent reads within a transaction (e.g., calculating totals).

SERIALIZABLE

Highest isolation, strictest consistency

Prevents all anomalies: dirty reads, non-repeatable reads, phantom reads
Transactions appear to execute serially
Slowest but safest

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
  SELECT SUM(balance) FROM accounts;  -- Locks prevent concurrent modifications
  -- All concurrent transactions wait
COMMIT;

Implementation: Predicate locking, serialization conflict detection.

Use case: Critical financial operations, when absolute consistency is required.

Isolation Anomalies

Dirty Read

Reading uncommitted data that may be rolled back.

T1: UPDATE accounts SET balance = 200 WHERE id = 1;
T2: SELECT balance FROM accounts WHERE id = 1;  -- Reads 200 (uncommitted)
T1: ROLLBACK;  -- Balance is back to 100
T2: Now has incorrect data (200)

Prevented by: READ COMMITTED and above

Non-Repeatable Read

Same query returns different results within a transaction.

T1: SELECT balance FROM accounts WHERE id = 1;  -- Returns 100
T2: UPDATE accounts SET balance = 50 WHERE id = 1; COMMIT;
T1: SELECT balance FROM accounts WHERE id = 1;  -- Returns 50 (different!)

Prevented by: REPEATABLE READ and above

Phantom Read

New rows appear in a result set within a transaction.

T1: SELECT COUNT(*) FROM orders WHERE user_id = 1;  -- Returns 5
T2: INSERT INTO orders (user_id, ...) VALUES (1, ...); COMMIT;
T1: SELECT COUNT(*) FROM orders WHERE user_id = 1;  -- Returns 6 (phantom row!)

Prevented by: SERIALIZABLE

Choosing the Right Level

READ COMMITTED (Default)

Use when:

Most web applications
Good balance of safety and performance
Occasional inconsistencies are acceptable

REPEATABLE READ

Use when:

Calculating aggregates within a transaction
Need consistent snapshot for multiple reads
Reporting queries

SERIALIZABLE

Use when:

Financial transactions
Critical data integrity requirements
Can tolerate serialization conflicts/retries

Database-Specific Behavior

PostgreSQL

Default: READ COMMITTED
REPEATABLE READ uses snapshot isolation
SERIALIZABLE uses serialization conflict detection

MySQL (InnoDB)

Default: REPEATABLE READ
Uses next-key locking to prevent phantoms
READ COMMITTED requires different locking strategy

SQL Server

Default: READ COMMITTED
Supports all four levels
Can use snapshot isolation (READ COMMITTED SNAPSHOT)

Performance Implications

Lower isolation = Higher concurrency = Better performance
Higher isolation = More locking = Potential deadlocks

Best practice: Start with READ COMMITTED, increase only when necessary.

Deadlocks

Higher isolation levels increase deadlock risk:

-- Transaction 1
BEGIN;
  UPDATE accounts SET balance = balance - 50 WHERE id = 1;
  UPDATE accounts SET balance = balance + 50 WHERE id = 2;  -- Waits for T2

-- Transaction 2
BEGIN;
  UPDATE accounts SET balance = balance - 30 WHERE id = 2;
  UPDATE accounts SET balance = balance + 30 WHERE id = 1;  -- Waits for T1

-- Deadlock! Database aborts one transaction

Mitigation: Consistent lock ordering, shorter transactions, deadlock detection.

Interview Questions

1. Beginner Question

Q: What are the four isolation levels, and what problems does each prevent?

READ UNCOMMITTED:
- Lowest isolation, allows dirty reads
- Prevents: Nothing (least safe)
- Use case: Rarely used
READ COMMITTED:
- Default in most databases
- Prevents: Dirty reads
- Allows: Non-repeatable reads, phantom reads
- Use case: Most applications
REPEATABLE READ:
- Prevents: Dirty reads, non-repeatable reads
- Allows: Phantom reads
- Use case: When you need consistent reads within a transaction
SERIALIZABLE:
- Highest isolation, prevents all anomalies
- Prevents: Dirty reads, non-repeatable reads, phantom reads
- Use case: Critical financial operations

Example:

-- READ COMMITTED (default)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;  -- $100
  -- Another transaction commits: balance = $50
  SELECT balance FROM accounts WHERE id = 1;  -- $50 (non-repeatable read)
COMMIT;

2. Intermediate Question

Q: Explain the difference between non-repeatable reads and phantom reads with examples.

Non-repeatable read: The same query returns different results within a transaction.

-- Transaction 1
BEGIN;
  SELECT balance FROM accounts WHERE id = 1;  -- Returns $100
  -- Transaction 2 commits: UPDATE accounts SET balance = 50 WHERE id = 1;
  SELECT balance FROM accounts WHERE id = 1;  -- Returns $50 (different!)
COMMIT;

Phantom read: New rows appear in a result set within a transaction.

-- Transaction 1
BEGIN;
  SELECT COUNT(*) FROM orders WHERE user_id = 1;  -- Returns 5
  -- Transaction 2 commits: INSERT INTO orders (user_id, ...) VALUES (1, ...);
  SELECT COUNT(*) FROM orders WHERE user_id = 1;  -- Returns 6 (phantom row!)
COMMIT;

Key difference: Non-repeatable reads affect existing rows, phantom reads involve new rows.

Prevention:

REPEATABLE READ prevents non-repeatable reads (uses snapshot isolation)
SERIALIZABLE prevents phantom reads (uses predicate locking)

3. Senior-Level System Question

Q: Design a ticket booking system for a concert with 10,000 seats. Multiple users can book simultaneously. How would you use isolation levels to prevent double-booking while maintaining performance?

Challenge: Prevent two users from booking the same seat while handling high concurrency.

Solution 1: SERIALIZABLE (Strong consistency, lower performance)

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
  -- Check availability
  SELECT COUNT(*) FROM bookings WHERE seat_id = ? AND event_id = ?;
  
  -- If available, book
  INSERT INTO bookings (user_id, seat_id, event_id) VALUES (?, ?, ?);
COMMIT;

Pros: Guarantees no double-booking Cons: Lower throughput, potential for serialization conflicts

Solution 2: READ COMMITTED + Optimistic Locking (Better performance)

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
  -- Use SELECT FOR UPDATE to lock the seat
  SELECT * FROM seats WHERE id = ? AND event_id = ? FOR UPDATE;
  
  -- Check if still available
  IF seat.status = 'available' THEN
    UPDATE seats SET status = 'booked', user_id = ? WHERE id = ?;
    INSERT INTO bookings (user_id, seat_id, event_id) VALUES (?, ?, ?);
  ELSE
    ROLLBACK;
    RETURN 'Seat already booked';
  END IF;
COMMIT;

Pros: Better performance, prevents double-booking Cons: Requires explicit locking

Solution 3: Application-level with database constraints (Best for high concurrency)

-- Database constraint prevents double-booking
ALTER TABLE bookings ADD CONSTRAINT unique_seat_event 
UNIQUE (seat_id, event_id);

-- Application code (READ COMMITTED)
BEGIN;
  INSERT INTO bookings (user_id, seat_id, event_id) 
  VALUES (?, ?, ?)
  ON CONFLICT (seat_id, event_id) DO NOTHING;
  
  IF ROW_COUNT() = 0 THEN
    ROLLBACK;
    RETURN 'Seat already booked';
  END IF;
COMMIT;

Pros:

Highest throughput (no explicit locks)
Database constraint ensures integrity
Simple application code

Cons:

Requires retry logic on conflict
Users may see "seat available" but booking fails

Solution 4: Distributed system approach (For very high scale)

# Use distributed lock (Redis) + database
def book_seat(user_id, seat_id, event_id):
    lock_key = f"seat:{event_id}:{seat_id}"
    
    # Acquire distributed lock
    if not redis.set(lock_key, user_id, nx=True, ex=30):
        return "Seat being booked by another user"
    
    try:
        # Check and book in database
        with transaction():
            seat = db.query("SELECT * FROM seats WHERE id = ? FOR UPDATE", seat_id)
            if seat.status != 'available':
                return "Seat already booked"
            
            db.execute("INSERT INTO bookings ...")
            db.execute("UPDATE seats SET status = 'booked' WHERE id = ?", seat_id)
    finally:
        redis.delete(lock_key)

Trade-offs:

SERIALIZABLE: Strongest guarantee, lowest performance
READ COMMITTED + FOR UPDATE: Good balance, explicit locking
Application-level + constraints: Highest performance, requires retry logic
Distributed locks: For very high scale, adds complexity

Recommendation: Use Solution 3 (application-level + constraints) for most cases. It provides good performance while maintaining data integrity.

Failure Stories You'll Recognize

The Serializable Bottleneck: An e-commerce system was processing orders. It used SERIALIZABLE isolation to prevent race conditions. Under normal load, this worked. But during a sale (high traffic), many transactions tried to update the same inventory row simultaneously. SERIALIZABLE isolation requires transactions to run one at a time for conflicting rows. Transactions started queuing. Response times went from 100ms to 10 seconds. The fix? Use READ COMMITTED isolation and handle race conditions in application code (optimistic locking). The lesson? Isolation levels are trade-offs. Higher isolation is safer but slower.

The Dirty Read: A service was using READ UNCOMMITTED isolation (the lowest level). It was reading data that other transactions were updating but hadn't committed yet. When those transactions rolled back, the service had seen data that never actually existed. This caused incorrect calculations and data corruption. The fix? Use at least READ COMMITTED isolation to prevent dirty reads.

The Lost Update: Two transactions were trying to update the same account balance. Both read the balance as $100. Both calculated the new balance. One wrote $50, the other wrote $70. The final balance was $70, but it should have been $20. This happened because READ COMMITTED isolation doesn't prevent lost updates. The fix? Use REPEATABLE READ or SERIALIZABLE isolation, or use optimistic locking in application code.

What Interviewers Are Really Testing

They want to hear you talk about isolation levels as trade-offs, not absolutes. Junior engineers say "use SERIALIZABLE for consistency." Senior engineers say "isolation levels balance consistency and performance. READ COMMITTED works for most cases. Use higher isolation only when you need it. For high-throughput systems, use lower isolation and handle race conditions in application code."

When they ask "How would you design a high-throughput system?", they're testing:

Do you understand the trade-offs between isolation levels?
Do you know when to use each level?
Can you design systems that balance consistency and performance?
Isolation levels balance consistency and concurrency—higher isolation = more consistency but lower performance
READ COMMITTED is the default in most databases and works for most applications
REPEATABLE READ prevents non-repeatable reads using snapshot isolation
SERIALIZABLE prevents all anomalies but can cause serialization conflicts and lower throughput
Dirty reads occur when reading uncommitted data that may be rolled back
Non-repeatable reads happen when the same query returns different results within a transaction
Phantom reads occur when new rows appear in a result set within a transaction
Higher isolation increases deadlock risk—use consistent lock ordering to mitigate
Choose isolation level based on requirements—not all applications need SERIALIZABLE
Understand database-specific behavior—PostgreSQL and MySQL implement isolation differently
Monitor for conflicts—track serialization failures and deadlocks
Consider optimistic locking for high-concurrency scenarios instead of high isolation levels

How InterviewCrafted Will Teach This

We'll teach this through production failures, not definitions. Instead of memorizing "SERIALIZABLE prevents all anomalies," you'll learn through scenarios like "why did our system become slow when we used SERIALIZABLE isolation?"

You'll see how isolation levels affect performance, consistency, and system design. When an interviewer asks "how would you design a high-throughput system?", you'll think about isolation trade-offs, race conditions, and optimization strategies—not just "use SERIALIZABLE."

Transactions - Isolation levels control how transactions interact. Understanding transactions is essential for understanding isolation levels.
ACID Properties - Isolation is the "I" in ACID. Understanding ACID helps understand isolation level trade-offs.
Query Optimization - Isolation levels affect query performance and locking. Understanding query optimization helps choose appropriate isolation levels.
Data Replication - Isolation levels affect replication lag and consistency. Understanding replication helps understand isolation in distributed systems.
Indexing - Indexes can reduce lock contention at higher isolation levels. Understanding indexing helps optimize isolation level performance.

Key Takeaways

Isolation levels balance consistency and concurrency—higher isolation = more consistency but lower performance

READ COMMITTED is the default in most databases and works for most applications

REPEATABLE READ prevents non-repeatable reads using snapshot isolation

SERIALIZABLE prevents all anomalies but can cause serialization conflicts and lower throughput

Dirty reads occur when reading uncommitted data that may be rolled back

Non-repeatable reads happen when the same query returns different results within a transaction

Phantom reads occur when new rows appear in a result set within a transaction

Higher isolation increases deadlock risk—use consistent lock ordering to mitigate

Choose isolation level based on requirements—not all applications need SERIALIZABLE

Understand database-specific behavior—PostgreSQL and MySQL implement isolation differently

Monitor for conflicts—track serialization failures and deadlocks

Consider optimistic locking for high-concurrency scenarios instead of high isolation levels

Keep exploring

Database concepts build on each other. Explore related topics to deepen your understanding of how data systems work.

Isolation Levels