Key-Value Stores

Master key-value stores like Redis for caching and high-performance data access. Essential for system design interviews.

Key-value stores are the simplest form of NoSQL databases, storing data as key-value pairs with fast lookup capabilities.


Basic Concept

A key-value store is like a hash table or dictionary:

Key: "user:123"
Value: {"name": "John", "email": "john@example.com"}

Key: "session:abc123"
Value: {"user_id": 123, "expires_at": "2024-01-20T10:00:00Z"}

Key: "cache:product:456"
Value: {"name": "Widget", "price": 29.99, "stock": 100}

Operations:

  • GET key - Retrieve value
  • SET key value - Store value
  • DELETE key - Remove key
  • EXISTS key - Check if key exists
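
These four operations map one-to-one onto client libraries. A minimal sketch with the Python redis-py client (host, port, and key names are illustrative):

# Basic operations with redis-py (illustrative values)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("user:123", '{"name": "John", "email": "john@example.com"}')  # SET
value = r.get("user:123")     # GET (None if the key is missing)
found = r.exists("user:123")  # EXISTS (returns the number of keys found)
r.delete("user:123")          # DELETE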

Characteristics

Simplicity

  • Minimal data model
  • Fast operations (O(1) lookup)
  • Easy to understand and use

Performance

  • Extremely fast reads and writes
  • Low latency
  • High throughput

Limitations

  • No complex queries (no WHERE clauses, JOINs)
  • No relationships between keys
  • Value is opaque (database doesn't understand structure)

Redis

In-memory data structure store, supports various data types.

# Strings
SET user:123:name "John"
GET user:123:name

# Hashes
HSET user:123 name "John" email "john@example.com"
HGETALL user:123

# Lists
LPUSH notifications:123 "New message"
LRANGE notifications:123 0 -1

# Sets
SADD tags:post:1 "tech" "programming"
SMEMBERS tags:post:1

# Sorted Sets
ZADD leaderboard 100 "player1"
ZRANGE leaderboard 0 -1 WITHSCORES

Features:

  • Persistence options (RDB, AOF)
  • Pub/Sub messaging
  • Lua scripting
  • Expiration (TTL)

Memcached

Simple in-memory caching system.

# Python example
import memcache
mc = memcache.Client(['127.0.0.1:11211'])
mc.set('user:123', {'name': 'John'})
value = mc.get('user:123')

Characteristics:

  • Pure caching (no persistence)
  • Distributed
  • Simple protocol

DynamoDB (AWS)

Managed key-value and document database.

// Assumes an AWS SDK v2 DocumentClient:
// const dynamodb = new AWS.DynamoDB.DocumentClient();

// Put item
await dynamodb.put({
  TableName: 'Users',
  Item: {
    userId: '123',
    name: 'John',
    email: 'john@example.com'
  }
}).promise();

// Get item
const result = await dynamodb.get({
  TableName: 'Users',
  Key: { userId: '123' }
}).promise();

Features:

  • Fully managed
  • Auto-scaling
  • Global tables (multi-region)
  • Streams for change data capture

etcd

Distributed key-value store for configuration and service discovery.

Use cases:

  • Kubernetes (stores cluster state)
  • Service discovery
  • Distributed locking
  • Configuration management

Use Cases

Caching

Store frequently accessed data to reduce database load.

Key: "cache:user:123"
Value: User object (JSON)
TTL: 3600 seconds

Benefits:

  • Reduce database queries
  • Faster response times
  • Lower database load

Session Storage

Store user session data.

Key: "session:abc123def456"
Value: {"user_id": 123, "login_time": "2024-01-15T10:00:00Z"}
TTL: 1800 seconds (30 minutes)
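
A possible implementation with redis-py: SETEX stores the value and applies the TTL in a single call (client setup and key names are illustrative):

# Store a session that expires after 30 minutes
import json
import redis

r = redis.Redis(decode_responses=True)

session = {"user_id": 123, "login_time": "2024-01-15T10:00:00Z"}
r.setex("session:abc123def456", 1800, json.dumps(session))

# Later lookups return None once the TTL has elapsed
raw = r.get("session:abc123def456")
session = json.loads(raw) if raw else None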

Rate Limiting

Track API request counts.

Key: "ratelimit:api:user:123"
Value: 42 (request count)
TTL: 60 seconds (resets every minute)
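
One simple way to implement this is a fixed-window counter: INCR the key and start its TTL on the first request of each window. A sketch (the limit and window values are illustrative):

# Fixed-window rate limit: allow `limit` requests per `window` seconds
import redis

r = redis.Redis(decode_responses=True)

def is_allowed(user_id, limit=100, window=60):
    key = f"ratelimit:api:user:{user_id}"
    count = r.incr(key)        # atomic increment; creates the key at 1
    if count == 1:
        # start the window on the first request
        # (a Lua script, shown later, makes INCR + EXPIRE a single atomic step)
        r.expire(key, window)
    return count <= limit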

Leaderboards

Real-time rankings.

ZADD leaderboard 1000 "player1"
ZADD leaderboard 950 "player2"
ZADD leaderboard 1100 "player3"
ZREVRANGE leaderboard 0 9 WITHSCORES  # Top 10

Real-Time Features

  • Counters: Like counts, view counts
  • Presence: Who's online
  • Queues: Task queues, message queues
  • Pub/Sub: Real-time notifications
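
For the Pub/Sub item, Redis channels are fire-and-forget: subscribers only receive messages published while they are listening. A small redis-py sketch (the channel name is illustrative):

# Publish/subscribe with redis-py
import redis

r = redis.Redis(decode_responses=True)

p = r.pubsub(ignore_subscribe_messages=True)
p.subscribe("notifications:123")

# Publisher side (typically a different process)
r.publish("notifications:123", "New message")

# Subscriber side: poll for messages (early reads may return None while
# the subscribe confirmation is consumed)
for _ in range(5):
    msg = p.get_message(timeout=1.0)
    if msg:
        print(msg["data"])  # -> "New message"
        break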

Data Modeling Patterns

Namespacing

Use prefixes to organize keys:

user:123:profile
user:123:settings
user:123:preferences

session:abc123
session:def456

cache:product:789
cache:product:790
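
Prefixes also make it easy to enumerate a namespace. Prefer SCAN over KEYS so the server is not blocked while iterating; a redis-py sketch (key names are illustrative):

# Iterate over every key in the user:123 namespace without blocking Redis
import redis

r = redis.Redis(decode_responses=True)

for key in r.scan_iter(match="user:123:*", count=100):
    print(key)  # user:123:profile, user:123:settings, ...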

Composite Keys

Combine multiple values into a key:

order:123:item:456  # Order 123, item 456
user:123:friend:789  # User 123's friend 789

Serialization

Store complex data by serializing:

# JSON
import json
value = json.dumps({"name": "John", "age": 30})
redis.set("user:123", value)

# MessagePack (more efficient)
import msgpack
value = msgpack.packb({"name": "John", "age": 30})
redis.set("user:123", value)

Advanced Features

Expiration (TTL)

Automatically delete keys after a time period.

SET session:abc123 "data" EX 3600  # Expires in 3600 seconds
SET session:def456 "data" PX 3600000  # Expires in 3600000 milliseconds
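
Client libraries expose the same options; in redis-py the EX and PX flags map to keyword arguments on set (values are illustrative):

# TTL from Python: EX/PX become the ex/px keyword arguments
import redis

r = redis.Redis(decode_responses=True)

r.set("session:abc123", "data", ex=3600)     # expires in 3600 seconds
r.set("session:def456", "data", px=3600000)  # expires in 3600000 milliseconds
print(r.ttl("session:abc123"))               # remaining lifetime in seconds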

Atomic Operations

# Increment
INCR page_views:article:123
INCRBY counter:123 5

# Get-and-set (not a true compare-and-swap)
SET key "old_value"
GETSET key "new_value"  # Returns old value, sets new

# Conditional set
SETNX key "value"  # Only set if key doesn't exist
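
From Python these remain single, atomic commands; SETNX appears as the nx flag on set (key names are illustrative):

# Atomic counters and conditional sets with redis-py
import redis

r = redis.Redis(decode_responses=True)

views = r.incr("page_views:article:123")         # INCR, returns the new value
r.incrby("counter:123", 5)                       # INCRBY

old = r.getset("key", "new_value")               # GETSET: set and return the old value
created = r.set("lock:123", "owner-1", nx=True)  # SETNX: truthy only if the key was absent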

Transactions

MULTI
SET key1 "value1"
SET key2 "value2"
INCR counter
EXEC  # Execute all commands atomically
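
In redis-py the same transaction is expressed as a pipeline, which by default queues commands and wraps them in MULTI/EXEC (key names are illustrative):

# MULTI/EXEC via a redis-py pipeline (transaction=True is the default)
import redis

r = redis.Redis(decode_responses=True)

pipe = r.pipeline()
pipe.set("key1", "value1")
pipe.set("key2", "value2")
pipe.incr("counter")
results = pipe.execute()  # all queued commands run without interleaving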

When to Use Key-Value Stores

Good Fit

  • Caching: Reduce database load
  • Session storage: Fast session lookups
  • Real-time data: Counters, leaderboards
  • Simple lookups: By ID or known key
  • Temporary data: Data with expiration

Not a Good Fit

  • Complex queries: Need WHERE, JOIN, GROUP BY
  • Relationships: Data with foreign keys
  • Analytics: Complex aggregations
  • Structured queries: Ad-hoc reporting

Best Practices

  1. Use appropriate TTL: Set expiration for temporary data
  2. Namespace keys: Organize with prefixes
  3. Monitor memory: In-memory stores have size limits
  4. Handle failures: Key-value stores can be ephemeral
  5. Choose serialization: JSON is readable, MessagePack is efficient

Common Patterns

Cache-Aside Pattern

def get_user(user_id):
    # Try cache first
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # Cache miss: query database
    user = db.query_user(user_id)
    
    # Store in cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

Write-Through Pattern

def update_user(user_id, data):
    # Update database
    db.update_user(user_id, data)
    
    # Update cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

Distributed Locking

def acquire_lock(lock_key, timeout=10):
    lock_value = str(uuid.uuid4())
    if redis.set(lock_key, lock_value, nx=True, ex=timeout):
        return lock_value
    return None

def release_lock(lock_key, lock_value):
    # Only release if we own the lock.
    # Note: GET followed by DELETE is not atomic; another client could
    # acquire the lock between the two calls. See the Lua sketch below.
    if redis.get(lock_key) == lock_value:
        redis.delete(lock_key)
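
A sketch of an atomic release, along the lines of the pattern in the Redis distributed-locking docs: the comparison and DEL run inside one Lua script on the server, so nothing can interleave between them (RELEASE_SCRIPT and release_lock_atomic are illustrative names; redis is the same client object used above):

RELEASE_SCRIPT = """
if redis.call('GET', KEYS[1]) == ARGV[1] then
    return redis.call('DEL', KEYS[1])
else
    return 0
end
"""

def release_lock_atomic(lock_key, lock_value):
    # eval(script, number_of_keys, key..., arg...)
    return redis.eval(RELEASE_SCRIPT, 1, lock_key, lock_value)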

Interview Questions

1. Beginner Question

Q: What is a key-value store, and what are its main use cases?

A: A key-value store is the simplest NoSQL database model, storing data as key-value pairs with fast lookup capabilities.

Main use cases:

  • Caching: Store frequently accessed data to reduce database load
  • Session storage: Fast session lookups for web applications
  • Real-time data: Counters, leaderboards, presence indicators
  • Rate limiting: Track API request counts
  • Simple lookups: By ID or known key

Example:

# Caching user data
redis.set("user:123", json.dumps({"name": "John", "email": "john@example.com"}))
user = json.loads(redis.get("user:123"))

Why use it: Extremely fast (O(1) lookup), low latency, high throughput.

2. Intermediate Question

Q: Explain the cache-aside pattern and when to use it vs. write-through.

A:

Cache-Aside (Lazy Loading):

def get_user(user_id):
    # 1. Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    
    # 2. Cache miss: query database
    user = db.query_user(user_id)
    
    # 3. Store in cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user

Pros: Simple, only caches accessed data
Cons: Cache miss penalty, possible stale data

Write-Through:

def update_user(user_id, data):
    # 1. Update database
    db.update_user(user_id, data)
    
    # 2. Update cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(data))

Pros: Cache always consistent with database
Cons: Write penalty (updates both cache and DB)

When to use:

  • Cache-aside: Read-heavy workloads, can tolerate stale data
  • Write-through: Reads must see fresh data immediately after writes, and the extra write latency is acceptable

3. Senior-Level System Question

Q: Design a distributed rate limiting system using Redis that can handle 10M requests/second across 100 servers. How would you prevent a single user from exceeding their rate limit?

A:

Solution: Sliding window log using a Redis sorted set (one member per request, scored by its timestamp), executed in a pipeline:

def check_rate_limit(user_id, limit=100, window=60):
    key = f"ratelimit:{user_id}"
    now = time.time()
    
    # Use Redis pipeline for atomic operations
    pipe = redis.pipeline()
    
    # Remove old entries
    pipe.zremrangebyscore(key, 0, now - window)
    
    # Count current requests
    pipe.zcard(key)
    
    # Add current request
    pipe.zadd(key, {str(now): now})
    
    # Set expiration
    pipe.expire(key, window)
    
    results = pipe.execute()
    current_count = results[1]
    
    if current_count >= limit:
        # Note: the rejected request was still added above; a Lua script can
        # make the check-and-add atomic and skip recording rejected requests
        return False, 0  # Rate limited, no quota remaining

    return True, limit - current_count - 1  # Allowed, remaining quota

Alternative: the same sliding-window check written out without a pipeline (easier to read, but the steps are not atomic):

def check_rate_limit_sliding(user_id, limit=100, window=60):
    key = f"ratelimit:{user_id}"
    now = time.time()
    
    # Remove requests outside window
    redis.zremrangebyscore(key, 0, now - window)
    
    # Count requests in window
    count = redis.zcard(key)
    
    if count >= limit:
        return False
    
    # Add current request
    redis.zadd(key, {str(now): now})
    redis.expire(key, window)
    return True

Distributed approach (multiple servers):

# Shard by user so each user's counters live on a single Redis instance
def check_rate_limit_distributed(user_id, limit=100, window=60):
    # Map user_id to a shard; in production use a stable hash (e.g. CRC or
    # hashlib), since Python's built-in hash() is randomized per process
    shard = hash(user_id) % num_redis_shards
    redis_client = redis_shards[shard]

    # Run the same sliding-window check against that shard
    # (assumes check_rate_limit is extended to accept a client argument)
    return check_rate_limit(user_id, limit, window, redis_client)

Optimizations:

  • Lua scripts: Run the whole check atomically on the Redis server (see the sketch after this list)
  • Sharding: Distribute load across multiple Redis instances
  • Local cache: Cache rate limit status locally to reduce Redis calls
  • Batch updates: Batch multiple checks in pipeline
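
For the Lua-script optimization, the entire check-and-update can run server-side in one round trip. A sketch of the simpler fixed-window variant using redis-py's register_script (key format, limit, and window are illustrative; the sliding-window version follows the same pattern):

# Fixed-window rate limit as a single atomic server-side script
import redis

r = redis.Redis(decode_responses=True)

RATE_LIMIT_SCRIPT = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""
rate_limit = r.register_script(RATE_LIMIT_SCRIPT)

def is_allowed_atomic(user_id, limit=100, window=60):
    count = rate_limit(keys=[f"ratelimit:{user_id}"], args=[window])
    return count <= limit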

Monitoring:

  • Track rate limit hits/misses
  • Monitor Redis memory usage
  • Alert on high rate limit violations

Key Takeaways

  • Key-value stores are the simplest NoSQL model—extremely fast O(1) lookups
  • Primary use cases: Caching, session storage, real-time data, rate limiting
  • Cache-aside pattern loads data on cache miss, good for read-heavy workloads
  • Write-through pattern updates cache and DB together, ensures consistency
  • Redis is the most popular key-value store, supports many data structures
  • TTL (time-to-live) is essential for temporary data (sessions, cache)
  • Namespace keys with prefixes (e.g., "user:123") for organization
  • Distributed locking using Redis SETNX for coordination across servers
  • Not for complex queries—use for simple lookups, not JOINs or aggregations
  • Memory management is critical—monitor memory usage and set eviction policies
  • Persistence options in Redis (RDB snapshots, AOF) for durability
  • Atomic operations (INCR, SETNX) are powerful for counters and locks
