
Key-Value Stores

Master key-value stores like Redis for caching and high-performance data access. Essential for system design interviews.

Why This Matters

Think of key-value stores like a filing cabinet with labeled folders. You have a key (the label) and a value (the folder contents). To find something, you look up the key. Key-value stores do the same for data—they store data as key-value pairs, allowing fast lookups by key.

This matters because key-value stores are fast. They're optimized for simple operations: get by key, set by key, delete by key. For use cases like caching, session storage, or simple lookups, key-value stores are much faster than relational databases. They're also simple—no complex queries, no joins, just key-value operations.

In interviews, when someone asks "How would you implement a cache?", they're testing whether you understand key-value stores. Do you know when to use Redis? Do you understand caching patterns? Most engineers don't. They use relational databases for everything and wonder why it's slow.

What Engineers Usually Get Wrong

Most engineers think "key-value stores are just simple databases." But key-value stores are optimized for different use cases. They're fast for simple lookups but don't support complex queries. If you need to query by value (not just by key), key-value stores aren't ideal. Use them for caching, session storage, or simple lookups.

Engineers also don't understand that key-value stores are often in-memory (like Redis), which means data can be lost on restart. If you need durability, enable persistence (RDB snapshots, AOF logs) or use a different storage system. Understanding this trade-off helps you choose the right tool.

How This Breaks Systems in the Real World

A service was using Redis for caching. The cache stored frequently accessed data, reducing database load. But Redis was configured without persistence. When Redis restarted, the cache was empty. All requests hit the database, overwhelming it. The service became slow and unreliable. The fix? Configure Redis persistence (RDB or AOF). Or use a hybrid approach—cache in Redis, but don't rely on it for critical data.

Another story: A service was using a relational database for session storage. Each request queried the database to get session data. Under normal load, this worked. But during high traffic, the database became a bottleneck. The fix? Use a key-value store (Redis) for session storage. It's faster and designed for this use case. This reduced database load significantly.


Basic Concept

A key-value store is like a hash table or dictionary:

Key: "user:123"
Value: {"name": "John", "email": "john@example.com"}

Key: "session:abc123"
Value: {"user_id": 123, "expires_at": "2024-01-20T10:00:00Z"}

Key: "cache:product:456"
Value: {"name": "Widget", "price": 29.99, "stock": 100}

Operations:

  • GET key - Retrieve value
  • SET key value - Store value
  • DELETE key - Remove key
  • EXISTS key - Check if key exists
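These four operations can be sketched in a few lines of Python. The `KVStore` class below is a hypothetical, single-process illustration (a thin wrapper around a dict), not any real product:

```python
class KVStore:
    """Minimal in-memory key-value store built on a Python dict."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like Redis GET: return the value, or None if the key is absent
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        # Deleting a missing key is a no-op
        self._data.pop(key, None)

    def exists(self, key):
        return key in self._data


store = KVStore()
store.set("user:123", {"name": "John", "email": "john@example.com"})
print(store.get("user:123")["name"])    # John
print(store.exists("session:abc123"))   # False
```

Real stores add networking, concurrency control, and persistence on top, but the interface is exactly this small.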

Characteristics

Simplicity

  • Minimal data model
  • Fast operations (O(1) lookup)
  • Easy to understand and use

Performance

  • Extremely fast reads and writes
  • Low latency
  • High throughput

Limitations

  • No complex queries (no WHERE clauses, JOINs)
  • No relationships between keys
  • Value is opaque (database doesn't understand structure)
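The "value is opaque" limitation is worth internalizing: because the store indexes only keys, answering "which user has this email?" means scanning and inspecting every value yourself. A dict-based sketch makes the O(n) cost obvious:

```python
# The store indexes keys only, so querying by value is a full scan: O(n)
store = {
    "user:1": {"name": "John", "email": "john@example.com"},
    "user:2": {"name": "Jane", "email": "jane@example.com"},
}

def find_by_email(store, email):
    # No secondary index: inspect every value to find matches
    return [k for k, v in store.items() if v["email"] == email]

print(find_by_email(store, "jane@example.com"))  # ['user:2']
```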

Redis

An in-memory data structure store that supports strings, hashes, lists, sets, sorted sets, and more.

# Strings
SET user:123:name "John"
GET user:123:name

# Hashes
HSET user:123 name "John" email "john@example.com"
HGETALL user:123

# Lists
LPUSH notifications:123 "New message"
LRANGE notifications:123 0 -1

# Sets
SADD tags:post:1 "tech" "programming"
SMEMBERS tags:post:1

# Sorted Sets
ZADD leaderboard 100 "player1"
ZRANGE leaderboard 0 -1 WITHSCORES

Features:

  • Persistence options (RDB, AOF)
  • Pub/Sub messaging
  • Lua scripting
  • Expiration (TTL)

Memcached

Simple in-memory caching system.

# Python example
import memcache
mc = memcache.Client(['127.0.0.1:11211'])
mc.set('user:123', {'name': 'John'})
value = mc.get('user:123')

Characteristics:

  • Pure caching (no persistence)
  • Distributed
  • Simple protocol

DynamoDB (AWS)

Managed key-value and document database.

// Put item (AWS SDK for JavaScript v2, DocumentClient)
await dynamodb.put({
  TableName: 'Users',
  Item: {
    userId: '123',
    name: 'John',
    email: 'john@example.com'
  }
}).promise();

// Get item
const result = await dynamodb.get({
  TableName: 'Users',
  Key: { userId: '123' }
}).promise();

Features:

  • Fully managed
  • Auto-scaling
  • Global tables (multi-region)
  • Streams for change data capture

etcd

Distributed key-value store for configuration and service discovery.

Use cases:

  • Kubernetes (stores cluster state)
  • Service discovery
  • Distributed locking
  • Configuration management

Use Cases

Caching

Store frequently accessed data to reduce database load.

Key: "cache:user:123"
Value: User object (JSON)
TTL: 3600 seconds

Benefits:

  • Reduce database queries
  • Faster response times
  • Lower database load

Session Storage

Store user session data.

Key: "session:abc123def456"
Value: {"user_id": 123, "login_time": "2024-01-15T10:00:00Z"}
TTL: 1800 seconds (30 minutes)
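A hedged sketch of how a session store might issue and look up tokens. The helper names are illustrative (not from any framework), and a dict stands in for Redis:

```python
import secrets
import time

sessions = {}  # token -> (session data, expiry timestamp)

def create_session(user_id, ttl=1800):
    # Cryptographically random token, e.g. "session:ab12cd34..."
    token = f"session:{secrets.token_hex(16)}"
    sessions[token] = ({"user_id": user_id}, time.time() + ttl)
    return token

def get_session(token):
    entry = sessions.get(token)
    if entry is None:
        return None
    data, expires_at = entry
    if time.time() >= expires_at:
        del sessions[token]  # Lazy expiration, like a Redis TTL
        return None
    return data

token = create_session(123)
print(get_session(token))  # {'user_id': 123}
```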

Rate Limiting

Track API request counts.

Key: "ratelimit:api:user:123"
Value: 42 (request count)
TTL: 60 seconds (resets every minute)
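This counter-plus-TTL scheme is a fixed-window rate limiter. The sketch below simulates it with a dict (the dict plays the role of Redis INCR plus EXPIRE; the `now` parameter is injected so the example is deterministic):

```python
import time

counters = {}  # key -> (request count, window start time)

def allow_request(user_id, limit=100, window=60, now=None):
    now = time.time() if now is None else now
    key = f"ratelimit:api:user:{user_id}"
    count, started = counters.get(key, (0, now))
    if now - started >= window:
        count, started = 0, now  # Window elapsed: reset, like TTL expiry
    if count >= limit:
        return False  # Over the limit for this window
    counters[key] = (count + 1, started)
    return True

for _ in range(100):
    assert allow_request(123, now=0.0)
print(allow_request(123, now=0.0))   # False: limit reached
print(allow_request(123, now=61.0))  # True: new window
```

Fixed windows allow bursts at window boundaries; the sliding-window approach shown later in this page avoids that.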

Leaderboards

Real-time rankings.

ZADD leaderboard 1000 "player1"
ZADD leaderboard 950 "player2"
ZADD leaderboard 1100 "player3"
ZREVRANGE leaderboard 0 9 WITHSCORES  # Top 10
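The same leaderboard, sketched with a plain dict standing in for the sorted set. Redis keeps members sorted incrementally (skip list under the hood); this toy version simply sorts on read:

```python
leaderboard = {}  # member -> score, mimicking a Redis sorted set

def zadd(board, score, member):
    board[member] = score

def top_n(board, n):
    # Like ZREVRANGE 0 n-1 WITHSCORES: highest scores first
    return sorted(board.items(), key=lambda kv: kv[1], reverse=True)[:n]

zadd(leaderboard, 1000, "player1")
zadd(leaderboard, 950, "player2")
zadd(leaderboard, 1100, "player3")
print(top_n(leaderboard, 2))  # [('player3', 1100), ('player1', 1000)]
```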

Real-Time Features

  • Counters: Like counts, view counts
  • Presence: Who's online
  • Queues: Task queues, message queues
  • Pub/Sub: Real-time notifications

Data Modeling Patterns

Namespacing

Use prefixes to organize keys:

user:123:profile
user:123:settings
user:123:preferences

session:abc123
session:def456

cache:product:789
cache:product:790
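Namespacing also makes prefix scans possible. Redis exposes this as `SCAN` with a `MATCH "user:123:*"` pattern; with a dict it's a simple filter:

```python
store = {
    "user:123:profile": "profile-data",
    "user:123:settings": "settings-data",
    "session:abc123": "session-data",
    "cache:product:789": "product-data",
}

def keys_with_prefix(store, prefix):
    # Equivalent in spirit to Redis SCAN MATCH "user:123:*"
    return sorted(k for k in store if k.startswith(prefix))

print(keys_with_prefix(store, "user:123:"))
# ['user:123:profile', 'user:123:settings']
```

Note that in Redis, SCAN walks the whole keyspace, so prefix scans are for maintenance and debugging, not hot-path lookups.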

Composite Keys

Combine multiple values into a key:

order:123:item:456  # Order 123, item 456
user:123:friend:789  # User 123's friend 789
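A tiny helper (hypothetical, for illustration) keeps composite keys consistent and avoids typo-prone string concatenation scattered across the codebase:

```python
def make_key(*parts):
    # Join parts with ":" so ("order", 123, "item", 456) -> "order:123:item:456"
    return ":".join(str(p) for p in parts)

print(make_key("order", 123, "item", 456))   # order:123:item:456
print(make_key("user", 123, "friend", 789))  # user:123:friend:789
```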

Serialization

Store complex data by serializing:

# JSON
import json
value = json.dumps({"name": "John", "age": 30})
redis.set("user:123", value)

# MessagePack (more efficient)
import msgpack
value = msgpack.packb({"name": "John", "age": 30})
redis.set("user:123", value)

Advanced Features

Expiration (TTL)

Automatically delete keys after a time period.

SET session:abc123 "data" EX 3600  # Expires in 3600 seconds
SET session:def456 "data" PX 3600000  # Expires in 3600000 milliseconds
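TTL behavior can be sketched in plain Python by storing an expiry timestamp next to each value and checking it lazily on read. This is a simplified model of what Redis does (Redis also expires keys proactively in the background); the injectable clock keeps the example deterministic:

```python
import time

class TTLStore:
    """Dict with per-key expiry, checked lazily on read (like a Redis TTL)."""

    def __init__(self, clock=time.time):
        self._data = {}   # key -> (value, expires_at or None)
        self._clock = clock

    def set(self, key, value, ex=None):
        # ex is seconds-to-live, mirroring SET ... EX
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # Expired: remove on access
            return None
        return value

now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.set("session:abc123", "data", ex=3600)
print(store.get("session:abc123"))  # data
now[0] = 3601.0
print(store.get("session:abc123"))  # None (expired)
```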

Atomic Operations

# Increment
INCR page_views:article:123
INCRBY counter:123 5

# Get and set in one step
SET key "old_value"
GETSET key "new_value"  # Atomically returns the old value and stores the new one
# (True compare-and-swap needs WATCH/MULTI/EXEC or a Lua script)

# Conditional set
SETNX key "value"  # Only set if key doesn't exist
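The sketch below shows the semantics of INCR and SETNX with plain dict operations. It is a single-threaded illustration of what the commands mean, not of Redis's cross-client atomicity (the helper names are hypothetical):

```python
store = {}

def incr(key, by=1):
    # Like INCR / INCRBY: a missing key starts at 0
    store[key] = store.get(key, 0) + by
    return store[key]

def setnx(key, value):
    # Like SETNX: set only if the key does not already exist
    if key in store:
        return False
    store[key] = value
    return True

print(incr("page_views:article:123"))     # 1
print(incr("page_views:article:123", 5))  # 6
print(setnx("lock:job:42", "owner-a"))    # True
print(setnx("lock:job:42", "owner-b"))    # False: already held
```

The reason these matter in Redis is that the server applies each command as one indivisible step, so concurrent clients never see a half-applied increment or both win the same SETNX.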

Transactions

MULTI
SET key1 "value1"
SET key2 "value2"
INCR counter
EXEC  # Execute all commands atomically

When to Use Key-Value Stores

Good Fit

  • Caching: Reduce database load
  • Session storage: Fast session lookups
  • Real-time data: Counters, leaderboards
  • Simple lookups: By ID or known key
  • Temporary data: Data with expiration

Not a Good Fit

  • Complex queries: Need WHERE, JOIN, GROUP BY
  • Relationships: Data with foreign keys
  • Analytics: Complex aggregations
  • Structured queries: Ad-hoc reporting

Best Practices

  1. Use appropriate TTL: Set expiration for temporary data
  2. Namespace keys: Organize with prefixes
  3. Monitor memory: In-memory stores have size limits
  4. Handle failures: Key-value stores can be ephemeral
  5. Choose serialization: JSON is readable, MessagePack is efficient

Common Patterns

Cache-Aside Pattern

def get_user(user_id):
    # Try cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # Cache miss: query database
    user = db.query_user(user_id)
    
    # Store in cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

Write-Through Pattern

def update_user(user_id, data):
    # Update database
    db.update_user(user_id, data)
    
    # Update cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

Distributed Locking

import uuid

def acquire_lock(lock_key, timeout=10):
    # Unique token so only the owner can release the lock
    lock_value = str(uuid.uuid4())
    # nx: only set if the key doesn't exist; ex: auto-expire so a crashed
    # holder can't hold the lock forever
    if redis.set(lock_key, lock_value, nx=True, ex=timeout):
        return lock_value
    return None

def release_lock(lock_key, lock_value):
    # Only release if we still own the lock (assumes a client created with
    # decode_responses=True). Note: GET-then-DELETE is not atomic; in
    # production, do this check-and-delete in a single Lua script.
    if redis.get(lock_key) == lock_value:
        redis.delete(lock_key)

Interview Questions

1. Beginner Question

Q: What is a key-value store, and what are its main use cases?

A: A key-value store is the simplest NoSQL database model, storing data as key-value pairs with fast lookup capabilities.

Main use cases:

  • Caching: Store frequently accessed data to reduce database load
  • Session storage: Fast session lookups for web applications
  • Real-time data: Counters, leaderboards, presence indicators
  • Rate limiting: Track API request counts
  • Simple lookups: By ID or known key

Example:

# Caching user data
redis.set("user:123", json.dumps({"name": "John", "email": "john@example.com"}))
user = json.loads(redis.get("user:123"))

Why use it: Extremely fast (O(1) lookup), low latency, high throughput.

2. Intermediate Question

Q: Explain the cache-aside pattern and when to use it vs. write-through.

A:

Cache-Aside (Lazy Loading):

def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # 2. Cache miss: query database
    user = db.query_user(user_id)
    
    # 3. Store in cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

Pros: Simple; only caches data that is actually accessed.
Cons: Cache-miss penalty on first read; possible stale data.

Write-Through:

def update_user(user_id, data):
    # 1. Update database
    db.update_user(user_id, data)
    
    # 2. Update cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

Pros: Cache stays consistent with the database.
Cons: Write penalty (every update touches both cache and DB).

When to use:

  • Cache-aside: Read-heavy workloads that can tolerate occasionally stale data
  • Write-through: Workloads where reads must always see fresh data and the extra write cost is acceptable

3. Senior-Level System Question

Q: Design a distributed rate limiting system using Redis that can handle 10M requests/second across 100 servers. How would you prevent a single user from exceeding their rate limit?

A:

Solution: Sliding window with Redis sorted sets (a pipeline makes the steps atomic):

import time

def check_rate_limit(user_id, limit=100, window=60, client=None):
    client = client or redis  # Default to the module-level client
    key = f"ratelimit:{user_id}"
    now = time.time()
    
    # Pipeline runs as MULTI/EXEC, so all steps execute atomically
    pipe = client.pipeline()
    
    # Remove entries older than the window
    pipe.zremrangebyscore(key, 0, now - window)
    
    # Count requests still inside the window
    pipe.zcard(key)
    
    # Record the current request
    pipe.zadd(key, {str(now): now})
    
    # Expire the key so idle users don't leak memory
    pipe.expire(key, window)
    
    results = pipe.execute()
    current_count = results[1]
    
    if current_count >= limit:
        return False, 0  # Rate limited, no quota remaining
    
    return True, limit - current_count - 1  # Allowed, remaining quota

Alternative: the same sliding window without a pipeline (easier to read, but the separate round trips create a race window under concurrent requests):

def check_rate_limit_sliding(user_id, limit=100, window=60):
    key = f"ratelimit:{user_id}"
    now = time.time()
    
    # Remove requests outside window
    redis.zremrangebyscore(key, 0, now - window)
    
    # Count requests in window
    count = redis.zcard(key)
    
    if count >= limit:
        return False
    
    # Add current request
    redis.zadd(key, {str(now): now})
    redis.expire(key, window)
    return True

Distributed approach (multiple servers):

# Shard users across Redis instances so no single node takes all the load
import hashlib

def check_rate_limit_distributed(user_id, limit=100, window=60):
    # Use a stable hash: Python's built-in hash() differs between processes
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    shard = int(digest, 16) % num_redis_shards
    redis_client = redis_shards[shard]
    
    # Run the same sliding-window check against the chosen shard
    # (passing the shard's client through as a parameter)
    return check_rate_limit(user_id, limit, window, redis_client)

Optimizations:

  • Lua scripts: Execute atomically on Redis server
  • Sharding: Distribute load across multiple Redis instances
  • Local cache: Cache rate limit status locally to reduce Redis calls
  • Batch updates: Batch multiple checks in pipeline

Monitoring:

  • Track rate limit hits/misses
  • Monitor Redis memory usage
  • Alert on high rate limit violations

Failure Stories You'll Recognize

The Memory Exhaustion: A service was using Redis for caching but didn't set memory limits or eviction policies. The cache grew unbounded, consuming all available memory. Redis started evicting keys randomly, causing cache misses. Performance degraded. The fix? Set memory limits and eviction policies (LRU, LFU). Monitor memory usage. Use TTLs to expire old data automatically.

What Interviewers Are Really Testing

They want to hear you talk about key-value stores as a tool for specific use cases, not a replacement for databases. Junior engineers say "key-value stores are fast." Senior engineers say "key-value stores are fast for simple lookups by key. Use them for caching, session storage, or simple lookups. They don't support complex queries. Configure persistence and memory limits. Use TTLs for temporary data."

When they ask "How would you implement a cache?", they're testing:

  • Do you know when to use key-value stores?

  • Do you understand caching patterns?

  • Can you configure and manage key-value stores?

How InterviewCrafted Will Teach This

We'll teach this through production failures, not definitions. Instead of memorizing "key-value stores are fast," you'll learn through scenarios like "why did our database get overwhelmed when Redis restarted?"

You'll see how key-value stores affect performance, reliability, and system design. When an interviewer asks "how would you implement a cache?", you'll think about key-value stores, caching patterns, and configuration—not just "use Redis."

Related Topics

  • NoSQL Basics - Key-value stores are a type of NoSQL database. Understanding NoSQL basics helps understand key-value store characteristics.
  • Document Stores - More complex NoSQL alternative for structured data. Understanding document stores helps choose between NoSQL types.
  • Hash Tables - Key-value stores are essentially distributed hash tables. Understanding hash tables helps understand key-value store internals.
  • Data Replication - Key-value stores use replication for availability. Understanding replication helps design distributed key-value stores.
  • Query Optimization - Key-value stores optimize for simple lookups. Understanding query optimization helps understand when to use key-value stores.

Key Takeaways

  • Key-value stores are the simplest NoSQL model—extremely fast O(1) lookups
  • Primary use cases: Caching, session storage, real-time data, rate limiting
  • Cache-aside pattern loads data on cache miss, good for read-heavy workloads
  • Write-through pattern updates cache and DB together, ensures consistency
  • Redis is the most popular key-value store, supports many data structures
  • TTL (time-to-live) is essential for temporary data (sessions, cache)
  • Namespace keys with prefixes (e.g., "user:123") for organization
  • Distributed locking using Redis SETNX for coordination across servers
  • Not for complex queries—use for simple lookups, not JOINs or aggregations
  • Memory management is critical—monitor memory usage and set eviction policies
  • Persistence options in Redis (RDB snapshots, AOF) for durability
  • Atomic operations (INCR, SETNX) are powerful for counters and locks

Keep exploring

Database concepts build on each other. Explore related topics to deepen your understanding of how data systems work.