Topic Overview
Database Sharding
Master database sharding strategies: horizontal partitioning, shard key selection, cross-shard queries, rebalancing, and handling failures in sharded systems.
Database sharding is a horizontal partitioning strategy that splits a database into smaller, independent databases called shards. Each shard contains a subset of the data and can be stored on different servers, enabling horizontal scaling.
What is Sharding?
Sharding divides a large database into smaller, manageable pieces (shards) distributed across multiple servers.
Benefits:
- Horizontal scaling: Add more servers to scale
- Performance: Smaller databases = faster queries
- Availability: Failure of one shard doesn't affect others
- Geographic distribution: Shards can be in different regions
Challenges:
- Cross-shard queries: Complex queries spanning multiple shards
- Rebalancing: Moving data when adding/removing shards
- Transaction management: Distributed transactions are complex
- Shard key selection: Critical for even distribution
Sharding Strategies
1. Range-Based Sharding
Concept: Partition data based on value ranges.
Shard 1: User IDs 1-1000
Shard 2: User IDs 1001-2000
Shard 3: User IDs 2001-3000
Example:
-- Shard 1
SELECT * FROM users WHERE id BETWEEN 1 AND 1000;
-- Shard 2
SELECT * FROM users WHERE id BETWEEN 1001 AND 2000;
Advantages:
- Simple to implement
- Easy range queries
Disadvantages:
- Hot spots: Uneven distribution if data not uniform
- Rebalancing: Difficult when ranges change
2. Hash-Based Sharding
Concept: Hash shard key to determine shard.
hash(user_id) % num_shards = shard_number
User ID 123 → hash(123) % 3 = 0 → Shard 0
User ID 456 → hash(456) % 3 = 1 → Shard 1
User ID 789 → hash(789) % 3 = 2 → Shard 2
Example:
def get_shard(user_id, num_shards):
shard_number = hash(user_id) % num_shards
return f"shard_{shard_number}"
# Usage
shard = get_shard(123, 3) # Returns "shard_0"
Advantages:
- Even distribution: Hash function distributes evenly
- No hot spots: Uniform data distribution
Disadvantages:
- Range queries: Difficult (must query all shards)
- Rebalancing: Requires moving many keys
3. Directory-Based Sharding
Concept: Use lookup table (directory) to map keys to shards.
Directory:
user_123 → shard_1
user_456 → shard_2
user_789 → shard_1
Example:
class ShardDirectory:
def __init__(self):
self.directory = {} # key → shard
def get_shard(self, key):
return self.directory.get(key)
def assign_shard(self, key, shard):
self.directory[key] = shard
Advantages:
- Flexible: Easy to reassign shards
- No rebalancing: Just update directory
Disadvantages:
- Single point of failure: Directory must be highly available
- Lookup overhead: Extra lookup for each query
Shard Key Selection
Shard key determines which shard stores the data.
Good Shard Keys
Characteristics:
- High cardinality: Many unique values
- Even distribution: Uniform across shards
- Frequently queried: Used in WHERE clauses
- Immutable: Doesn't change
Examples:
- User ID (good: high cardinality, even distribution)
- Order ID (good: unique, frequently queried)
- Timestamp (bad: can cause hot spots)
Bad Shard Keys
Problems:
- Low cardinality: Few unique values (e.g., country, status)
- Skewed distribution: Uneven data (e.g., 90% in one shard)
- Mutable: Changes require moving data
- Not in queries: Requires scanning all shards
Cross-Shard Queries
Problem: Queries spanning multiple shards are complex.
Solutions
1. Avoid Cross-Shard Queries
- Design schema to minimize cross-shard queries
- Denormalize data if needed
2. Query All Shards (Scatter-Gather)
async def query_all_shards(query):
results = await Promise.all([
shard1.query(query),
shard2.query(query),
shard3.query(query)
])
return merge_results(results)
3. Use Composite Shard Key
- Include related data in same shard
- Example: User + User's Orders in same shard
4. Materialized Views
- Pre-compute cross-shard aggregations
- Update periodically
Examples
Hash-Based Sharding Implementation
class ShardedDatabase:
def __init__(self, num_shards=4):
self.num_shards = num_shards
self.shards = [Database(f"shard_{i}") for i in range(num_shards)]
def get_shard(self, shard_key):
"""Get shard for given key"""
shard_index = hash(shard_key) % self.num_shards
return self.shards[shard_index]
def insert(self, record):
"""Insert record into appropriate shard"""
shard = self.get_shard(record['user_id'])
shard.insert(record)
def get(self, user_id):
"""Get record from appropriate shard"""
shard = self.get_shard(user_id)
return shard.get(user_id)
def query_all(self, query):
"""Query all shards and merge results"""
results = []
for shard in self.shards:
results.extend(shard.query(query))
return results
Rebalancing
class ShardRebalancer:
def __init__(self, shards):
self.shards = shards
async def rebalance(self, new_num_shards):
"""Rebalance data when adding/removing shards"""
# 1. Create new shard distribution
new_shards = self.create_new_shards(new_num_shards)
# 2. Migrate data
for old_shard in self.shards:
for record in old_shard.scan():
new_shard = self.get_new_shard(record.key, new_num_shards)
if new_shard != old_shard:
await self.migrate_record(record, old_shard, new_shard)
# 3. Update routing
self.update_routing(new_shards)
# 4. Verify and cleanup
await self.verify_migration()
self.cleanup_old_shards()
async def migrate_record(self, record, from_shard, to_shard):
"""Migrate single record"""
# Copy to new shard
await to_shard.insert(record)
# Verify
if await to_shard.get(record.key):
# Delete from old shard
await from_shard.delete(record.key)
Cross-Shard Transaction (Saga Pattern)
class CrossShardTransaction:
async def transfer_money(self, from_user, to_user, amount):
"""Transfer money across shards"""
from_shard = self.get_shard(from_user)
to_shard = self.get_shard(to_user)
if from_shard == to_shard:
# Same shard: Simple transaction
return await from_shard.transaction(
debit(from_user, amount),
credit(to_user, amount)
)
else:
# Cross-shard: Use saga pattern
return await self.saga_transfer(from_shard, to_shard, from_user, to_user, amount)
async def saga_transfer(self, from_shard, to_shard, from_user, to_user, amount):
"""Saga pattern for cross-shard transaction"""
try:
# Step 1: Debit from source
await from_shard.debit(from_user, amount)
# Step 2: Credit to destination
await to_shard.credit(to_user, amount)
return {'status': 'success'}
except Exception as e:
# Compensate: Rollback
await from_shard.credit(from_user, amount)
raise e
Common Pitfalls
- Poor shard key selection: Causes hot spots, uneven distribution. Fix: Choose high cardinality, evenly distributed keys
- Cross-shard queries: Slow, complex. Fix: Design schema to minimize, use composite keys
- Rebalancing issues: Data movement is expensive. Fix: Plan for growth, use consistent hashing
- No replication: Single shard failure loses data. Fix: Replicate each shard
- Transaction complexity: Distributed transactions are hard. Fix: Use saga pattern, eventual consistency
- Shard discovery: Clients don't know which shard. Fix: Use shard router, directory service
Interview Questions
Beginner
Q: What is database sharding and why is it used?
A:
Database sharding splits a large database into smaller databases (shards) distributed across multiple servers.
Why used:
- Horizontal scaling: Add more servers to scale (vs vertical scaling)
- Performance: Smaller databases = faster queries
- Availability: Failure of one shard doesn't affect others
- Geographic distribution: Shards can be in different regions
How it works:
Original Database:
[All data in one database]
Sharded Database:
Shard 1: [Data subset 1]
Shard 2: [Data subset 2]
Shard 3: [Data subset 3]
Sharding strategies:
- Range-based: Partition by value ranges
- Hash-based: Hash key to determine shard
- Directory-based: Lookup table maps keys to shards
Example:
Users table sharded by user_id:
Shard 1: user_id 1-1000
Shard 2: user_id 1001-2000
Shard 3: user_id 2001-3000
Intermediate
Q: Explain different sharding strategies. What are the trade-offs?
A:
1. Range-Based Sharding:
Shard 1: IDs 1-1000
Shard 2: IDs 1001-2000
Shard 3: IDs 2001-3000
Advantages:
- Simple to implement
- Easy range queries
- Good for sequential data
Disadvantages:
- Hot spots: Uneven distribution
- Rebalancing: Difficult when ranges change
2. Hash-Based Sharding:
hash(key) % num_shards = shard_number
Advantages:
- Even distribution: Hash function distributes evenly
- No hot spots: Uniform data distribution
Disadvantages:
- Range queries: Must query all shards
- Rebalancing: Requires moving many keys
3. Directory-Based Sharding:
Directory: key → shard mapping
Advantages:
- Flexible: Easy to reassign shards
- No rebalancing: Just update directory
Disadvantages:
- Single point of failure: Directory must be highly available
- Lookup overhead: Extra lookup for each query
When to use:
- Range-based: Sequential data, range queries common
- Hash-based: Even distribution needed, no range queries
- Directory-based: Flexible shard assignment needed
Senior
Q: Design a sharded database system that handles 1 billion records across 100 shards. How do you handle shard key selection, rebalancing, cross-shard queries, and failures?
A:
class ShardedDatabaseSystem {
private shards: Shard[];
private shardRouter: ShardRouter;
private rebalancer: Rebalancer;
private replication: Replication;
constructor() {
this.shards = this.initializeShards(100);
this.shardRouter = new ShardRouter();
this.rebalancer = new Rebalancer();
this.replication = new Replication({ replicas: 3 });
}
// 1. Shard Key Selection
selectShardKey(record: Record): string {
// Composite shard key: user_id + region
// Ensures related data in same shard
return `${record.user_id}:${record.region}`;
}
// 2. Shard Router
class ShardRouter {
private consistentHash: ConsistentHash;
getShard(shardKey: string): Shard {
// Use consistent hashing for even distribution
return this.consistentHash.getNode(shardKey);
}
async routeQuery(query: Query): Promise<Result> {
if (query.isSingleShard()) {
// Single shard query
const shard = this.getShard(query.shardKey);
return await shard.query(query);
} else {
// Cross-shard query: Query all shards
return await this.scatterGather(query);
}
}
async scatterGather(query: Query): Promise<Result> {
// Query all shards in parallel
const results = await Promise.all(
this.shards.map(shard => shard.query(query))
);
// Merge results
return this.mergeResults(results);
}
}
// 3. Rebalancing
class Rebalancer {
async rebalance(newNumShards: number): Promise<void> {
// Use consistent hashing: Only keys near changed nodes move
const newShards = this.createNewShards(newNumShards);
// Identify keys that need to move
const keysToMove = this.identifyKeysToMove(newShards);
// Migrate in batches
for (const batch of this.batchKeys(keysToMove, 1000)) {
await this.migrateBatch(batch);
}
// Update routing
this.updateRouting(newShards);
}
async migrateBatch(keys: string[]): Promise<void> {
// Migrate batch of keys
for (const key of keys) {
const oldShard = this.getShard(key);
const newShard = this.getNewShard(key);
if (oldShard !== newShard) {
// Copy data
const data = await oldShard.get(key);
await newShard.insert(key, data);
// Verify
if (await newShard.get(key)) {
await oldShard.delete(key);
}
}
}
}
}
// 4. Cross-Shard Queries
class CrossShardQueryHandler {
async handleQuery(query: Query): Promise<Result> {
if (query.isSingleShard()) {
return await this.singleShardQuery(query);
}
// Cross-shard query strategies
switch (query.type) {
case 'aggregation':
return await this.aggregateAcrossShards(query);
case 'join':
return await this.joinAcrossShards(query);
case 'range':
return await this.rangeQuery(query);
}
}
async aggregateAcrossShards(query: Query): Promise<Result> {
// Scatter: Query all shards
const partialResults = await Promise.all(
this.shards.map(shard => shard.aggregate(query))
);
// Gather: Merge aggregations
return this.mergeAggregations(partialResults);
}
async joinAcrossShards(query: Query): Promise<Result> {
// Strategy 1: Broadcast small table
if (query.rightTable.size < 10000) {
return await this.broadcastJoin(query);
}
// Strategy 2: Repartition both tables
return await this.repartitionJoin(query);
}
}
// 5. Failure Handling
class FailureHandler {
async handleShardFailure(failedShard: Shard): Promise<void> {
// 1. Detect failure
if (!await this.healthCheck(failedShard)) {
// 2. Route traffic to replicas
const replicas = this.replication.getReplicas(failedShard);
this.shardRouter.updateRouting(failedShard, replicas[0]);
// 3. Promote replica to primary
await replicas[0].promoteToPrimary();
// 4. Replicate to new replica
await this.replication.createNewReplica(replicas[0]);
}
}
}
// 6. Replication
class Replication {
async replicate(shard: Shard, data: Record): Promise<void> {
const replicas = this.getReplicas(shard);
// Write to all replicas
await Promise.all(
replicas.map(replica => replica.write(data))
);
}
async read(shard: Shard, key: string): Promise<Record> {
// Read from primary or replica
const node = this.selectReadNode(shard);
return await node.read(key);
}
}
// 7. Monitoring
async getMetrics(): Promise<Metrics> {
return {
shardLoads: await this.getShardLoads(),
queryLatency: await this.getQueryLatency(),
rebalancingStatus: await this.rebalancer.getStatus(),
replicationLag: await this.replication.getLag()
};
}
}
Features:
- Consistent hashing: Even distribution, minimal rebalancing
- Composite shard keys: Related data in same shard
- Rebalancing: Batch migration, minimal downtime
- Cross-shard queries: Scatter-gather, aggregation, joins
- Replication: 3 replicas per shard for high availability
- Failure handling: Automatic failover, replica promotion
- Monitoring: Track load, latency, replication lag
Key Takeaways
- Sharding: Horizontal partitioning of database across multiple servers
- Strategies: Range-based, hash-based, directory-based
- Shard key: Critical for even distribution, choose high cardinality keys
- Cross-shard queries: Complex, use scatter-gather, avoid when possible
- Rebalancing: Expensive, use consistent hashing to minimize movement
- Replication: Replicate each shard for high availability
- Failure handling: Automatic failover, replica promotion
- Best practices: Choose good shard keys, minimize cross-shard queries, plan for rebalancing, replicate for availability