Key-Value Stores
Master key-value stores like Redis for caching and high-performance data access. Essential for system design interviews.
Why This Matters
Think of key-value stores like a filing cabinet with labeled folders. You have a key (the label) and a value (the folder contents). To find something, you look up the key. Key-value stores do the same for data—they store data as key-value pairs, allowing fast lookups by key.
This matters because key-value stores are fast. They're optimized for simple operations: get by key, set by key, delete by key. For use cases like caching, session storage, or simple lookups, key-value stores are much faster than relational databases. They're also simple—no complex queries, no joins, just key-value operations.
In interviews, when someone asks "How would you implement a cache?", they're testing whether you understand key-value stores. Do you know when to use Redis? Do you understand caching patterns? Most engineers don't. They use relational databases for everything and wonder why it's slow.
What Engineers Usually Get Wrong
Most engineers think "key-value stores are just simple databases." But key-value stores are optimized for different use cases. They're fast for simple lookups but don't support complex queries. If you need to query by value (not just by key), key-value stores aren't ideal. Use them for caching, session storage, or simple lookups.
Engineers also don't understand that key-value stores are often in-memory (like Redis), which means data is lost on restart. If you need persistence, you need to configure persistence (RDB snapshots, AOF logs) or use a different storage system. Understanding this helps you choose the right tool.
How This Breaks Systems in the Real World
A service was using Redis for caching. The cache stored frequently accessed data, reducing database load. But Redis was configured without persistence. When Redis restarted, the cache was empty. All requests hit the database, overwhelming it. The service became slow and unreliable. The fix? Configure Redis persistence (RDB or AOF). Or use a hybrid approach—cache in Redis, but don't rely on it for critical data.
Another story: A service was using a relational database for session storage. Each request queried the database to get session data. Under normal load, this worked. But during high traffic, the database became a bottleneck. The fix? Use a key-value store (Redis) for session storage. It's faster and designed for this use case. This reduced database load significantly.
Basic Concept
A key-value store is like a hash table or dictionary:
Key: "user:123"
Value: {"name": "John", "email": "john@example.com"}
Key: "session:abc123"
Value: {"user_id": 123, "expires_at": "2024-01-20T10:00:00Z"}
Key: "cache:product:456"
Value: {"name": "Widget", "price": 29.99, "stock": 100}
Operations:
GET key - Retrieve value
SET key value - Store value
DELETE key - Remove key
EXISTS key - Check if key exists
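To make these four operations concrete, here is a minimal in-memory sketch in Python. A plain dict stands in for a real store; the class and names are illustrative, not any particular library's API:

```python
class KVStore:
    """A toy key-value store: a dict exposing the four core operations."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None on a miss, like Redis GET returning nil
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def exists(self, key):
        return key in self._data


store = KVStore()
store.set("user:123", {"name": "John"})
print(store.get("user:123"))     # {'name': 'John'}
print(store.exists("user:123"))  # True
store.delete("user:123")
print(store.exists("user:123"))  # False
```

Real stores add networking, concurrency, and persistence on top, but the data model really is this small.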
Characteristics
Simplicity
- Minimal data model
- Fast operations (O(1) lookup)
- Easy to understand and use
Performance
- Extremely fast reads and writes
- Low latency
- High throughput
Limitations
- No complex queries (no WHERE clauses, JOINs)
- No relationships between keys
- Value is opaque (database doesn't understand structure)
Popular Key-Value Stores
Redis
In-memory data structure store, supports various data types.
# Strings
SET user:123:name "John"
GET user:123:name
# Hashes
HSET user:123 name "John" email "john@example.com"
HGETALL user:123
# Lists
LPUSH notifications:123 "New message"
LRANGE notifications:123 0 -1
# Sets
SADD tags:post:1 "tech" "programming"
SMEMBERS tags:post:1
# Sorted Sets
ZADD leaderboard 100 "player1"
ZRANGE leaderboard 0 -1 WITHSCORES
Features:
- Persistence options (RDB, AOF)
- Pub/Sub messaging
- Lua scripting
- Expiration (TTL)
Memcached
Simple in-memory caching system.
# Python example
import memcache
mc = memcache.Client(['127.0.0.1:11211'])
mc.set('user:123', {'name': 'John'})
value = mc.get('user:123')
Characteristics:
- Pure caching (no persistence)
- Distributed
- Simple protocol
DynamoDB (AWS)
Managed key-value and document database.
// Put item (AWS SDK v2 DocumentClient; SDK v3 uses PutCommand/GetCommand instead)
await dynamodb.put({
  TableName: 'Users',
  Item: {
    userId: '123',
    name: 'John',
    email: 'john@example.com'
  }
}).promise();

// Get item
const result = await dynamodb.get({
  TableName: 'Users',
  Key: { userId: '123' }
}).promise();
Features:
- Fully managed
- Auto-scaling
- Global tables (multi-region)
- Streams for change data capture
etcd
Distributed key-value store for configuration and service discovery.
Use cases:
- Kubernetes (stores cluster state)
- Service discovery
- Distributed locking
- Configuration management
Use Cases
Caching
Store frequently accessed data to reduce database load.
Key: "cache:user:123"
Value: User object (JSON)
TTL: 3600 seconds
Benefits:
- Reduce database queries
- Faster response times
- Lower database load
Session Storage
Store user session data.
Key: "session:abc123def456"
Value: {"user_id": 123, "login_time": "2024-01-15T10:00:00Z"}
TTL: 1800 seconds (30 minutes)
Rate Limiting
Track API request counts.
Key: "ratelimit:api:user:123"
Value: 42 (request count)
TTL: 60 seconds (resets every minute)
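The counter-plus-TTL idea can be sketched without Redis using a dict of window start times. This is a single-process illustration only; in production all servers must share the counter, which is exactly why Redis INCR with EXPIRE is used:

```python
import time

_counters = {}  # key -> (window_start, count)


def allow_request(user_id, limit=100, window=60, now=None):
    """Fixed-window rate limiter: at most `limit` requests per `window` seconds."""
    now = time.time() if now is None else now
    key = f"ratelimit:{user_id}"
    start, count = _counters.get(key, (now, 0))
    if now - start >= window:
        # Window elapsed: reset the counter
        start, count = now, 0
    if count >= limit:
        _counters[key] = (start, count)
        return False  # Rate limited
    _counters[key] = (start, count + 1)
    return True


print(allow_request("demo", limit=2, window=60, now=0.0))  # True
```

Fixed windows are simple but allow a burst of up to 2x the limit at a window boundary; the sliding-window variants later in this section address that.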
Leaderboards
Real-time rankings.
ZADD leaderboard 1000 "player1"
ZADD leaderboard 950 "player2"
ZADD leaderboard 1100 "player3"
ZREVRANGE leaderboard 0 9 WITHSCORES # Top 10
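A sorted set is conceptually a score-ordered map. The top-N query above can be sketched in plain Python as a stand-in for ZREVRANGE (Redis keeps the set ordered incrementally, so it avoids the full sort done here):

```python
def top_n(scores, n):
    """Return the n highest-scoring (member, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


leaderboard = {"player1": 1000, "player2": 950, "player3": 1100}
print(top_n(leaderboard, 2))  # [('player3', 1100), ('player1', 1000)]
```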
Real-Time Features
- Counters: Like counts, view counts
- Presence: Who's online
- Queues: Task queues, message queues
- Pub/Sub: Real-time notifications
Data Modeling Patterns
Namespacing
Use prefixes to organize keys:
user:123:profile
user:123:settings
user:123:preferences
session:abc123
session:def456
cache:product:789
cache:product:790
Composite Keys
Combine multiple values into a key:
order:123:item:456 # Order 123, item 456
user:123:friend:789 # User 123's friend 789
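A tiny helper keeps namespaced and composite keys consistent across a codebase; the function name here is illustrative:

```python
def make_key(*parts):
    """Join key parts with ':' to build a namespaced key."""
    return ":".join(str(p) for p in parts)


print(make_key("user", 123, "profile"))     # user:123:profile
print(make_key("order", 123, "item", 456))  # order:123:item:456
```

Centralizing key construction like this prevents the subtle bugs that come from hand-typed prefixes drifting apart ("user:123" in one module, "users:123" in another).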
Serialization
Store complex data by serializing:
# JSON
import json
value = json.dumps({"name": "John", "age": 30})
redis.set("user:123", value)
# MessagePack (more efficient)
import msgpack
value = msgpack.packb({"name": "John", "age": 30})
redis.set("user:123", value)
Advanced Features
Expiration (TTL)
Automatically delete keys after a time period.
SET session:abc123 "data" EX 3600 # Expires in 3600 seconds
SET session:def456 "data" PX 3600000 # Expires in 3600000 milliseconds
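Under the hood, expiration can be implemented lazily: record a deadline with each key and treat the key as missing once the deadline passes. A minimal sketch of that idea (illustrative; Redis combines lazy checks like this with a background sweeper that evicts expired keys proactively):

```python
import time


class TTLStore:
    """Toy store with lazy key expiration, in the spirit of SET ... EX."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ex=None, now=None):
        now = time.time() if now is None else now
        expires_at = now + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            # Lazy expiration: delete on first access after the deadline
            del self._data[key]
            return None
        return value
```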
Atomic Operations
# Increment
INCR page_views:article:123
INCRBY counter:123 5
# Get-and-set (returns the old value while storing a new one)
SET key "old_value"
GETSET key "new_value" # Returns "old_value"; deprecated since Redis 6.2 in favor of SET key "new_value" GET
# Conditional set
SETNX key "value" # Only set if key doesn't exist
Transactions
MULTI
SET key1 "value1"
SET key2 "value2"
INCR counter
EXEC # Execute all commands atomically
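MULTI/EXEC works by queueing commands and applying them in one atomic step, so other clients never observe a partial update. A toy sketch of that queueing model (illustrative; real Redis also handles WATCH-based optimistic locking, which is omitted here):

```python
class Txn:
    """Queue mutations against a dict and apply them all at once on exec()."""

    def __init__(self, store):
        self.store = store
        self.queue = []

    def set(self, key, value):
        self.queue.append(("set", key, value))

    def incr(self, key):
        self.queue.append(("incr", key))

    def exec(self):
        # Apply every queued command back-to-back; nothing is visible earlier
        for op, *args in self.queue:
            if op == "set":
                key, value = args
                self.store[key] = value
            elif op == "incr":
                (key,) = args
                self.store[key] = self.store.get(key, 0) + 1
        self.queue.clear()


store = {}
t = Txn(store)
t.set("key1", "value1")
t.incr("counter")
print(store)  # {} — nothing applied before exec()
t.exec()
print(store)  # {'key1': 'value1', 'counter': 1}
```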
When to Use Key-Value Stores
Good Fit
- Caching: Reduce database load
- Session storage: Fast session lookups
- Real-time data: Counters, leaderboards
- Simple lookups: By ID or known key
- Temporary data: Data with expiration
Not a Good Fit
- Complex queries: Need WHERE, JOIN, GROUP BY
- Relationships: Data with foreign keys
- Analytics: Complex aggregations
- Structured queries: Ad-hoc reporting
Best Practices
- Use appropriate TTL: Set expiration for temporary data
- Namespace keys: Organize with prefixes
- Monitor memory: In-memory stores have size limits
- Handle failures: Key-value stores can be ephemeral
- Choose serialization: JSON is readable, MessagePack is efficient
Common Patterns
Cache-Aside Pattern
def get_user(user_id):
    # Try the cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # Cache miss: query the database
    user = db.query_user(user_id)
    # Store in the cache with a 1-hour TTL
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
Write-Through Pattern
def update_user(user_id, data):
    # Update the database
    db.update_user(user_id, data)
    # Update the cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
Distributed Locking
def acquire_lock(lock_key, timeout=10):
    lock_value = str(uuid.uuid4())
    # NX: only set if the key doesn't exist; EX: auto-expire so a crashed
    # holder can't hold the lock forever
    if redis.set(lock_key, lock_value, nx=True, ex=timeout):
        return lock_value
    return None

def release_lock(lock_key, lock_value):
    # Only release if we own the lock. Note: this get-then-delete is not
    # atomic; production code should use a Lua script that checks and
    # deletes in a single server-side step.
    if redis.get(lock_key) == lock_value:
        redis.delete(lock_key)
Interview Questions
1. Beginner Question
Q: What is a key-value store, and what are its main use cases?
A: A key-value store is the simplest NoSQL database model, storing data as key-value pairs with fast lookup capabilities.
Main use cases:
- Caching: Store frequently accessed data to reduce database load
- Session storage: Fast session lookups for web applications
- Real-time data: Counters, leaderboards, presence indicators
- Rate limiting: Track API request counts
- Simple lookups: By ID or known key
Example:
# Caching user data
redis.set("user:123", json.dumps({"name": "John", "email": "john@example.com"}))
user = json.loads(redis.get("user:123"))
Why use it: Extremely fast (O(1) lookup), low latency, high throughput.
2. Intermediate Question
Q: Explain the cache-aside pattern and when to use it vs. write-through.
A:
Cache-Aside (Lazy Loading):
def get_user(user_id):
    # 1. Check the cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    # 2. Cache miss: query the database
    user = db.query_user(user_id)
    # 3. Store in the cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
Pros: Simple, only caches accessed data
Cons: Cache miss penalty, possible stale data
Write-Through:
def update_user(user_id, data):
    # 1. Update the database
    db.update_user(user_id, data)
    # 2. Update the cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))
Pros: Cache always consistent with database
Cons: Write penalty (updates both cache and DB)
When to use:
- Cache-aside: Read-heavy workloads, can tolerate stale data
- Write-through: Write-heavy, need strong consistency
3. Senior-Level System Question
Q: Design a distributed rate limiting system using Redis that can handle 10M requests/second across 100 servers. How would you prevent a single user from exceeding their rate limit?
A:
Solution: sliding window log using Redis sorted sets (each request's timestamp is stored as a member, and only timestamps inside the window count):
def check_rate_limit(user_id, limit=100, window=60):
    key = f"ratelimit:{user_id}"
    now = time.time()
    # Pipeline the commands so they execute in one round trip
    pipe = redis.pipeline()
    # Remove entries older than the window
    pipe.zremrangebyscore(key, 0, now - window)
    # Count requests currently in the window
    pipe.zcard(key)
    # Record the current request
    pipe.zadd(key, {str(now): now})
    # Expire the whole key once the window has passed
    pipe.expire(key, window)
    results = pipe.execute()
    current_count = results[1]
    if current_count >= limit:
        return False, limit - current_count      # Rate limited
    return True, limit - current_count - 1       # Allowed, remaining quota
Alternative: the same algorithm without pipelining (fewer moving parts, but more round trips and no atomicity between the count and the add):
def check_rate_limit_sliding(user_id, limit=100, window=60):
    key = f"ratelimit:{user_id}"
    now = time.time()
    # Remove requests outside the window
    redis.zremrangebyscore(key, 0, now - window)
    # Count requests in the window
    count = redis.zcard(key)
    if count >= limit:
        return False
    # Add the current request
    redis.zadd(key, {str(now): now})
    redis.expire(key, window)
    return True
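The sliding-window-log logic can be exercised locally without Redis by keeping timestamps in a deque, a single-process stand-in for the sorted set (illustrative only; it has none of the cross-server sharing that makes the Redis version useful):

```python
import time
from collections import deque

_windows = {}  # user_id -> deque of request timestamps


def allow_sliding(user_id, limit=100, window=60, now=None):
    """Sliding window log: allow at most `limit` requests in any `window`-second span."""
    now = time.time() if now is None else now
    q = _windows.setdefault(user_id, deque())
    # Drop timestamps outside the window (the zremrangebyscore step)
    while q and q[0] <= now - window:
        q.popleft()
    if len(q) >= limit:
        return False
    q.append(now)  # the zadd step
    return True
```

Unlike the fixed window, this never admits more than `limit` requests in any rolling 60-second span, at the cost of storing one timestamp per request.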
Distributed approach (multiple servers):
# Shard users across Redis instances (simple modulo sharding shown here;
# consistent hashing would reduce data movement when shards are added)
import hashlib

def check_rate_limit_distributed(user_id, limit=100, window=60):
    # Use a stable hash: Python's built-in hash() is randomized per process
    digest = hashlib.sha1(str(user_id).encode()).hexdigest()
    shard = int(digest, 16) % num_redis_shards
    redis_client = redis_shards[shard]
    # Run the same algorithm against the selected shard
    # (assumes check_rate_limit is extended to accept a client)
    return check_rate_limit(user_id, limit, window, redis_client)
Optimizations:
- Lua scripts: Execute atomically on Redis server
- Sharding: Distribute load across multiple Redis instances
- Local cache: Cache rate limit status locally to reduce Redis calls
- Batch updates: Batch multiple checks in pipeline
Monitoring:
- Track rate limit hits/misses
- Monitor Redis memory usage
- Alert on high rate limit violations
Failure Stories You'll Recognize
The Lost Cache: Redis served as the cache but ran without persistence. When it restarted, the cache came back empty, every request fell through to the database, and the service slowed to a crawl. The fix: enable persistence (RDB or AOF), and design the system to survive a cold cache rather than depend on it for critical data.
The Wrong Tool: Session data lived in a relational database, so every request issued a session query. Under high traffic the database became the bottleneck. The fix: move session storage to Redis, which is built for exactly this kind of fast keyed lookup. Database load dropped significantly.
The Memory Exhaustion: A service was using Redis for caching but didn't set memory limits or eviction policies. The cache grew unbounded, consuming all available memory. Redis started evicting keys randomly, causing cache misses. Performance degraded. The fix? Set memory limits and eviction policies (LRU, LFU). Monitor memory usage. Use TTLs to expire old data automatically.
What Interviewers Are Really Testing
They want to hear you talk about key-value stores as a tool for specific use cases, not a replacement for databases. Junior engineers say "key-value stores are fast." Senior engineers say "key-value stores are fast for simple lookups by key. Use them for caching, session storage, or simple lookups. They don't support complex queries. Configure persistence and memory limits. Use TTLs for temporary data."
When they ask "How would you implement a cache?", they're testing:
- Do you know when to use key-value stores?
- Do you understand caching patterns?
- Can you configure and manage key-value stores?
How InterviewCrafted Will Teach This
We'll teach this through production failures, not definitions. Instead of memorizing "key-value stores are fast," you'll learn through scenarios like "why did our database get overwhelmed when Redis restarted?"
You'll see how key-value stores affect performance, reliability, and system design. When an interviewer asks "how would you implement a cache?", you'll think about key-value stores, caching patterns, and configuration—not just "use Redis."
Key Takeaways
Key-value stores are the simplest NoSQL model—extremely fast O(1) lookups
Primary use cases: Caching, session storage, real-time data, rate limiting
Cache-aside pattern loads data on cache miss, good for read-heavy workloads
Write-through pattern updates cache and DB together, ensures consistency
Redis is the most popular key-value store, supports many data structures
TTL (time-to-live) is essential for temporary data (sessions, cache)
Namespace keys with prefixes (e.g., "user:123") for organization
Distributed locking using Redis SETNX for coordination across servers
Not for complex queries—use for simple lookups, not JOINs or aggregations
Memory management is critical—monitor memory usage and set eviction policies
Persistence options in Redis (RDB snapshots, AOF) for durability
Atomic operations (INCR, SETNX) are powerful for counters and locks
Related Topics
NoSQL Basics
Key-value stores are a type of NoSQL database. Understanding NoSQL basics helps understand key-value store characteristics.
Document Stores
More complex NoSQL alternative for structured data. Understanding document stores helps choose between NoSQL types.
Hash Tables
Key-value stores are essentially distributed hash tables. Understanding hash tables helps understand key-value store internals.
Data Replication
Key-value stores use replication for availability. Understanding replication helps design distributed key-value stores.
Query Optimization
Key-value stores optimize for simple lookups. Understanding query optimization helps understand when to use key-value stores.
Keep exploring
Database concepts build on each other. Explore related topics to deepen your understanding of how data systems work.