← Back to databases

Database Topic

Indexing

Master database indexing to dramatically improve query performance. Essential for database optimization interviews.

Indexing

Why This Matters

Think of database indexes like an index in a book. Without an index, to find a topic, you'd have to scan every page from the beginning. With an index, you look up the topic, find the page number, and jump directly to it. Database indexes work the same way—they let you find data without scanning every row.

This matters because as your database grows, queries without indexes become painfully slow. A query that takes 10ms on 1,000 rows might take 10 seconds on 1 million rows. Indexes can reduce that back to 10ms. But indexes aren't free—they slow down writes (INSERT, UPDATE, DELETE) because the database must update the index too.

In interviews, when someone asks "How would you optimize a slow query?", they're testing whether you understand indexing. Do you know when to create indexes? Do you understand composite indexes and covering indexes? Most engineers don't. They either create too many indexes (slow writes) or too few (slow reads).

What Engineers Usually Get Wrong

Most engineers think "more indexes = faster queries." But indexes have overhead. Each index consumes disk space and slows down writes. If you create indexes on every column, your INSERT/UPDATE/DELETE operations become slow. The key is balance—index columns that are frequently queried, not every column.

Engineers also don't understand composite indexes. They create separate indexes on each column, but a composite index on (city, state) is more efficient for queries that filter by both. Also, the order matters—(city, state) works for queries filtering by city alone, but not for queries filtering by state alone.

How This Breaks Systems in the Real World

An e-commerce system had a users table with 10 million rows. Queries to find users by email were slow (5 seconds). The team added an index on email. Queries became fast (10ms). But then they noticed that user registration (INSERT) became slow. Investigation showed that the index was being updated on every INSERT, and with high registration volume, this became a bottleneck.

The fix? Use a covering index that includes frequently accessed columns, reducing the need for table lookups. But the real lesson is: indexes speed up reads but slow down writes. Balance based on your workload.

Another story: A service was querying users by (city, state, country). They created separate indexes on city, state, and country. Queries were still slow. Investigation showed the database was using only one index (city), then scanning the results. The fix? Create a composite index on (city, state, country). This allowed the database to use the index for the entire query, making it much faster.


How Indexes Work

Think of a database index like an index in a book: instead of scanning every page, you can quickly find the page number.

Without index: Full table scan (O(n)) With index: Index lookup + row fetch (O(log n))


Types of Indexes

B-Tree Index (Most Common)

  • Default index type in PostgreSQL, MySQL, and most databases
  • Efficient for equality and range queries
  • Maintains sorted order
CREATE INDEX idx_user_email ON users(email);

Hash Index

  • Very fast for equality lookups
  • Not useful for range queries or sorting
  • Used in PostgreSQL for equality-only queries
CREATE INDEX idx_user_id ON users USING HASH(id);

Bitmap Index

  • Efficient for columns with low cardinality (few distinct values)
  • Common in data warehouses
  • Good for boolean or enum columns

Full-Text Index

  • Optimized for text search
  • Supports complex search queries
CREATE INDEX idx_content_search ON articles USING GIN(to_tsvector('english', content));

Composite Index

  • Index on multiple columns
  • Order matters: most selective column first
CREATE INDEX idx_user_location ON users(city, state, country);

Query can use this index:

WHERE city = 'NYC' AND state = 'NY'
WHERE city = 'NYC'  -- Can use leftmost prefix

Query cannot use this index efficiently:

WHERE state = 'NY'  -- Cannot skip city

Index Design Principles

When to Create Indexes

  • Frequently queried columns: WHERE, JOIN, ORDER BY clauses
  • Foreign keys: Usually auto-indexed, but verify
  • Columns with high selectivity: Many distinct values

When NOT to Create Indexes

  • Small tables: Index overhead may exceed benefits
  • Frequently updated columns: Indexes slow down INSERT/UPDATE/DELETE
  • Low selectivity columns: Few distinct values (e.g., boolean, enum with 2-3 values)

Index Maintenance

  • Rebuild periodically: Fragmented indexes degrade performance
  • Monitor usage: Remove unused indexes
  • Analyze query plans: Use EXPLAIN to verify index usage

Common Patterns

Covering Index

An index that contains all columns needed for a query, avoiding table lookups.

-- Query
SELECT id, name, email FROM users WHERE email = 'user@example.com';

-- Covering index
CREATE INDEX idx_user_email_covering ON users(email) INCLUDE (id, name);

Partial Index

Index only a subset of rows, reducing index size.

-- Only index active users
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

Expression Index

Index on a computed value.

CREATE INDEX idx_user_lower_email ON users(LOWER(email));

Performance Considerations

  • Write overhead: Each index slows down INSERT/UPDATE/DELETE
  • Storage cost: Indexes consume disk space
  • Query planner: Modern databases choose indexes automatically, but you can hint
  • Cardinality: Higher cardinality = better index performance

Monitoring Index Usage

-- PostgreSQL: Check index usage
SELECT schemaname, tablename, indexname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan;

-- Find unused indexes
SELECT schemaname, tablename, indexname
FROM pg_stat_user_indexes
WHERE idx_scan = 0;

Interview Questions

1. Beginner Question

Q: What is a database index, and why is it important?

A: A database index is a data structure that improves the speed of data retrieval operations. Think of it like an index in a book—instead of scanning every page, you can quickly find the page number.

Why important:

  • Performance: Reduces query time from O(n) to O(log n)
  • Efficiency: Avoids full table scans
  • Scalability: Essential for large tables

Example:

-- Without index: Scans all 1 million rows
SELECT * FROM users WHERE email = 'user@example.com';

-- With index: Finds row in log(n) time
CREATE INDEX idx_user_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';  -- Much faster

2. Intermediate Question

Q: When should you create an index, and when should you avoid creating one?

A:

Create indexes when:

  • Columns used in WHERE clauses frequently
  • Columns used in JOIN conditions
  • Columns used in ORDER BY
  • Foreign keys (usually auto-indexed)
  • High-cardinality columns (many distinct values)

Avoid indexes when:

  • Small tables (overhead exceeds benefit)
  • Frequently updated columns (slows down writes)
  • Low-cardinality columns (boolean, enum with few values)
  • Columns rarely used in queries

Example:

-- Good: High selectivity, frequently queried
CREATE INDEX idx_user_email ON users(email);

-- Bad: Low selectivity, rarely queried
CREATE INDEX idx_user_gender ON users(gender);  -- Only 2-3 values

Trade-off: Indexes speed up reads but slow down writes. Balance based on workload (read-heavy vs. write-heavy).

3. Senior-Level System Question

Q: Design an indexing strategy for a social media platform with 1B users, 10B posts, and 100B likes. The system needs to support: user profile lookups, post feeds, search, and analytics queries.

A:

Indexing strategy:

  1. User table indexes:

    -- Primary key (auto-indexed)
    PRIMARY KEY (user_id)
    
    -- Profile lookups
    CREATE INDEX idx_user_username ON users(username);  -- Unique, high cardinality
    CREATE INDEX idx_user_email ON users(email);  -- Unique, for authentication
    
    -- Feed generation (users followed by a user)
    CREATE INDEX idx_follows_follower ON follows(follower_id, created_at);
    
  2. Posts table indexes:

    -- Primary key
    PRIMARY KEY (post_id)
    
    -- User's posts (chronological)
    CREATE INDEX idx_posts_user_date ON posts(user_id, created_at DESC);
    
    -- Feed queries (posts from followed users)
    CREATE INDEX idx_posts_feed ON posts(user_id, created_at DESC) 
    WHERE status = 'published';
    
    -- Search (full-text)
    CREATE INDEX idx_posts_search ON posts USING GIN(to_tsvector('english', content));
    
  3. Likes table indexes:

    -- Composite index for like lookups
    CREATE INDEX idx_likes_post_user ON likes(post_id, user_id);
    CREATE INDEX idx_likes_user ON likes(user_id, created_at);
    
    -- Covering index for count queries
    CREATE INDEX idx_likes_post_covering ON likes(post_id) INCLUDE (user_id);
    
  4. Partial indexes for hot data:

    -- Only index recent posts (last 30 days)
    CREATE INDEX idx_posts_recent ON posts(created_at DESC) 
    WHERE created_at > NOW() - INTERVAL '30 days';
    
  5. Materialized views for analytics:

    -- Pre-aggregate like counts
    CREATE MATERIALIZED VIEW post_stats AS
    SELECT post_id, COUNT(*) as like_count, MAX(created_at) as last_liked
    FROM likes
    GROUP BY post_id;
    
    CREATE INDEX idx_post_stats_likes ON post_stats(like_count DESC);
    
  6. Partitioning for scale:

    -- Partition posts by date
    CREATE TABLE posts_2024_01 PARTITION OF posts
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
    

Optimization techniques:

  • Read replicas: Route analytics to replicas
  • Caching: Cache hot posts and user feeds in Redis
  • Batch updates: Update materialized views asynchronously

Monitoring:

  • Track index usage and remove unused indexes
  • Monitor index bloat and rebuild when needed
  • Alert on slow queries that might need new indexes

Failure Stories You'll Recognize

The Slow Registration: An e-commerce system had a users table with 10 million rows. Queries to find users by email were slow (5 seconds). The team added an index on email. Queries became fast (10ms). But then they noticed that user registration (INSERT) became slow. Investigation showed that the index was being updated on every INSERT, and with high registration volume, this became a bottleneck. The fix? Use a covering index that includes frequently accessed columns, reducing the need for table lookups. But the real lesson is: indexes speed up reads but slow down writes. Balance based on your workload.

The Composite Index That Wasn't: A service was querying users by (city, state, country). They created separate indexes on city, state, and country. Queries were still slow. Investigation showed the database was using only one index (city), then scanning the results. The fix? Create a composite index on (city, state, country). This allowed the database to use the index for the entire query, making it much faster.

The Unused Indexes: A team created indexes on every column "just in case." Over time, they had 50 indexes on a table. Writes became extremely slow. Investigation showed that most indexes were never used. The fix? Monitor index usage and remove unused indexes. This reduced write overhead significantly.

What Interviewers Are Really Testing

They want to hear you talk about when to create indexes, composite indexes, and the trade-offs between read and write performance. Junior engineers say "create indexes on frequently queried columns." Senior engineers say "indexes speed up reads but slow down writes. Use composite indexes for multi-column queries. Monitor usage and remove unused indexes. Balance based on workload."

When they ask "How would you optimize a slow query?", they're testing:

  • Do you understand how indexes work?

  • Do you know when to create indexes vs when not to?

  • Can you design an indexing strategy for a large system?

  • Indexes dramatically improve query performance by reducing scan time from O(n) to O(log n)

  • B-tree indexes are the most common, efficient for equality and range queries

  • Composite indexes order matters—most selective column first, supports leftmost prefix

  • Covering indexes can eliminate table lookups by including all needed columns

  • Indexes have trade-offs—they speed up reads but slow down writes (INSERT/UPDATE/DELETE)

  • Monitor index usage and remove unused indexes to reduce write overhead

  • Partial indexes can reduce index size by indexing only relevant rows

  • Expression indexes allow indexing computed values (e.g., LOWER(email))

  • High cardinality columns make better indexes than low cardinality

  • Balance is key—too few indexes = slow queries, too many = slow writes

How InterviewCrafted Will Teach This

We'll teach this through production failures, not definitions. Instead of memorizing "an index is a data structure," you'll learn through scenarios like "why did our user registration become slow after adding an index?"

You'll see how indexes affect query performance, write performance, and system design. When an interviewer asks "how would you optimize a slow query?", you'll think about indexing strategies, composite indexes, and trade-offs—not just "add an index."

  • Query Optimization - Indexes are a key technique for optimizing queries. Understanding indexing helps with query optimization.
  • B-Trees - Most indexes use B-tree data structures. Understanding B-trees helps understand how indexes work internally.
  • Partitioning - Partitioning and indexing work together to optimize large tables. Understanding both helps design efficient database schemas.
  • Transactions - Indexes affect transaction performance (slower writes, faster reads). Understanding transactions helps understand index trade-offs.
  • ACID Properties - Indexes must maintain consistency during transactions. Understanding ACID helps understand index behavior.

Key Takeaways

Indexes dramatically improve query performance by reducing scan time from O(n) to O(log n)

B-tree indexes are the most common, efficient for equality and range queries

Composite indexes order matters—most selective column first, supports leftmost prefix

Covering indexes can eliminate table lookups by including all needed columns

Indexes have trade-offs—they speed up reads but slow down writes (INSERT/UPDATE/DELETE)

Monitor index usage and remove unused indexes to reduce write overhead

Partial indexes can reduce index size by indexing only relevant rows

Expression indexes allow indexing computed values (e.g., LOWER(email))

High cardinality columns make better indexes than low cardinality

Balance is key—too few indexes = slow queries, too many = slow writes

Keep exploring

Database concepts build on each other. Explore related topics to deepen your understanding of how data systems work.