Load Balancing for System Design Interviews
Interview-focused load balancing: L4 vs L7, health checks, algorithms, and scaling/reliability trade-offs.
Load balancing is the practice of distributing incoming network traffic across multiple servers to ensure no single server becomes overwhelmed. It's fundamental to building scalable, high-availability systems.
What is Load Balancing?
A load balancer sits between clients and servers, usually acting as a reverse proxy that distributes incoming requests across the servers. It improves:
- Performance: Distributes load to prevent bottlenecks
- Availability: Routes traffic away from failed servers
- Scalability: Allows horizontal scaling by adding more servers
Load Balancing Algorithms
Round Robin
Distributes requests sequentially across servers in rotation.
Use case: When all servers have similar capacity and requests are stateless.
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
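As a sketch, the rotation above is a counter taken modulo the number of servers. The class name `RoundRobinBalancer` and the use of plain string server IDs are illustrative. They do not come from any particular load balancer.

```typescript
// Round robin: hand out servers in a fixed rotation.
class RoundRobinBalancer {
  private readonly servers: string[];
  private nextIndex = 0;

  constructor(servers: string[]) {
    if (servers.length === 0) throw new Error('At least one server is required');
    this.servers = servers;
  }

  next(): string {
    const server = this.servers[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.servers.length; // wrap back to the first server
    return server;
  }
}

const rr = new RoundRobinBalancer(['Server A', 'Server B', 'Server C']);
const picks = [1, 2, 3, 4].map(() => rr.next());
console.log(picks.join(', ')); // Server A, Server B, Server C, Server A
```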
Least Connections
Routes to the server with the fewest active connections.
Use case: When requests have varying processing times (e.g., file uploads, long-running queries).
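A minimal sketch of least connections, assuming the balancer is told when each request starts (`acquire`) and finishes (`release`). The class and method names are hypothetical.

```typescript
interface ServerLoad {
  id: string;
  active: number; // in-flight requests currently assigned to this server
}

class LeastConnectionsBalancer {
  private readonly loads: ServerLoad[];

  constructor(serverIds: string[]) {
    if (serverIds.length === 0) throw new Error('At least one server is required');
    this.loads = serverIds.map((id) => ({ id, active: 0 }));
  }

  // Pick the server with the fewest active connections (ties go to the first listed).
  acquire(): string {
    let best = this.loads[0];
    for (const load of this.loads) {
      if (load.active < best.active) best = load;
    }
    best.active += 1;
    return best.id;
  }

  // Call when a request completes so the count reflects only in-flight work.
  release(serverId: string): void {
    const load = this.loads.find((l) => l.id === serverId);
    if (load && load.active > 0) load.active -= 1;
  }
}

const lc = new LeastConnectionsBalancer(['A', 'B']);
const firstPick = lc.acquire(); // A: both idle, tie goes to the first server
lc.acquire();                   // B: A already has one connection
lc.release(firstPick);          // the request on A completes
console.log(lc.acquire());      // A: 0 active vs. 1 active on B
```

In a real proxy, `release()` has to run whether the request succeeds or fails (for example, in a `finally` block). Otherwise the counts drift upward and the algorithm degrades.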
Weighted Round Robin
Round robin with weights assigned to servers based on capacity.
Use case: When servers have different hardware specs (e.g., 2x weight for servers with 2x CPU).
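One way to implement weights is the "smooth" weighted round-robin variant used by NGINX. It interleaves picks so a heavy server doesn't receive its whole share in one burst. A naive alternative repeats each server `weight` times in the rotation. The sketch below assumes the smooth variant; the names are illustrative.

```typescript
interface WeightedServer {
  id: string;
  weight: number;        // relative capacity, e.g. 2 for a server with 2x CPU
  currentWeight: number; // running score used to interleave picks
}

class SmoothWeightedRoundRobin {
  private readonly servers: WeightedServer[];
  private readonly totalWeight: number;

  constructor(weights: Array<[string, number]>) {
    if (weights.length === 0) throw new Error('At least one server is required');
    this.servers = weights.map(([id, weight]) => ({ id, weight, currentWeight: 0 }));
    this.totalWeight = this.servers.reduce((sum, s) => sum + s.weight, 0);
  }

  next(): string {
    // Every server gains its weight; the highest score wins and pays back the total.
    let best = this.servers[0];
    for (const s of this.servers) {
      s.currentWeight += s.weight;
      if (s.currentWeight > best.currentWeight) best = s;
    }
    best.currentWeight -= this.totalWeight;
    return best.id;
  }
}

const wrr = new SmoothWeightedRoundRobin([['A', 2], ['B', 1], ['C', 1]]);
console.log([1, 2, 3, 4].map(() => wrr.next()).join(', ')); // A, B, C, A
```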
IP Hash
Hashes the client IP address so that requests from the same client consistently go to the same server, as long as the set of servers does not change.
Use case: When you need session affinity (sticky sessions).
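A sketch of IP hashing, assuming a 32-bit FNV-1a hash. Any stable hash works; FNV-1a is used here only because it is short to implement. The function names are illustrative.

```typescript
// 32-bit FNV-1a over UTF-16 code units (equal to bytes for ASCII input such as IPs).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// IP hash: a given client IP always maps to the same server
// as long as the server list (and its order) stays the same.
function pickByIpHash(clientIp: string, servers: string[]): string {
  if (servers.length === 0) throw new Error('At least one server is required');
  return servers[fnv1a(clientIp) % servers.length];
}

const pool = ['A', 'B', 'C'];
console.log(pickByIpHash('203.0.113.7', pool) === pickByIpHash('203.0.113.7', pool)); // true
```

The index is `hash % servers.length`, so adding or removing a server changes the mapping for most clients. That is why the interview answers later in this article pair IP hashing with a consistent hashing ring.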
Least Response Time
Routes to the server with the lowest average response time.
Use case: When server performance varies dynamically.
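Implementations differ in how they define "average response time". This sketch assumes an exponentially weighted moving average (EWMA) with smoothing factor `ALPHA = 0.3`; both are illustrative choices.

```typescript
// Least response time using an EWMA of observed latencies.
const ALPHA = 0.3; // weight given to the newest latency sample

interface LatencyStats {
  id: string;
  avgMs: number;   // smoothed response time in milliseconds
  samples: number; // number of recorded requests
}

class LeastResponseTimeBalancer {
  private readonly stats: LatencyStats[];

  constructor(serverIds: string[]) {
    if (serverIds.length === 0) throw new Error('At least one server is required');
    this.stats = serverIds.map((id) => ({ id, avgMs: 0, samples: 0 }));
  }

  // Servers with no samples yet have avgMs 0, so each gets tried early on.
  select(): string {
    let best = this.stats[0];
    for (const s of this.stats) {
      if (s.avgMs < best.avgMs) best = s;
    }
    return best.id;
  }

  // Fold a completed request's latency into that server's moving average.
  record(serverId: string, latencyMs: number): void {
    const s = this.stats.find((x) => x.id === serverId);
    if (!s) return;
    s.avgMs = s.samples === 0 ? latencyMs : ALPHA * latencyMs + (1 - ALPHA) * s.avgMs;
    s.samples += 1;
  }
}

const lrt = new LeastResponseTimeBalancer(['A', 'B']);
lrt.record('A', 120);
lrt.record('B', 40);
console.log(lrt.select()); // B
lrt.record('B', 400);      // B's average rises to about 0.3*400 + 0.7*40 = 148 ms
console.log(lrt.select()); // A
```

The average only updates when a server receives traffic. A slow server that later recovers could therefore sit unused; production balancers typically add decay or periodic probing to handle this.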
Types of Load Balancers
Layer 4 (Transport Layer)
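Operates at the TCP/UDP level, routing on IP addresses and ports without inspecting the payload. Faster, but it cannot make decisions based on request content.
Pros: Low latency, high throughput, simple
Cons: Can't inspect HTTP content, limited routing options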
Layer 7 (Application Layer)
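Operates at the HTTP/HTTPS level, so it can route on the URL path, host, headers, and cookies.
Pros: Content-aware routing (e.g., path-based routing), SSL/TLS termination
Cons: Higher latency, more CPU intensive (it must parse every request and usually decrypt TLS)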
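Path-based routing makes "content-aware routing" concrete. The decision needs the HTTP request line, so only a Layer 7 balancer can make it. The pool names and path prefixes in this sketch are hypothetical.

```typescript
interface Route {
  pathPrefix: string;
  pool: string; // name of the backend pool that serves this prefix
}

// Ordered from most to least specific; the first matching prefix wins.
const routes: Route[] = [
  { pathPrefix: '/api/video/', pool: 'video-service' },
  { pathPrefix: '/api/', pool: 'api-service' },
  { pathPrefix: '/', pool: 'web-frontend' },
];

function routeByPath(path: string): string {
  for (const route of routes) {
    if (path.startsWith(route.pathPrefix)) return route.pool;
  }
  throw new Error(`No route for ${path}`);
}

console.log(routeByPath('/api/video/123')); // video-service
console.log(routeByPath('/api/users'));     // api-service
console.log(routeByPath('/index.html'));    // web-frontend
```

After choosing a pool, the balancer still applies one of the algorithms above to pick a server inside that pool.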
Health Checks
Load balancers periodically probe each server and stop routing traffic to servers that fail their checks:
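```typescript
// Health check configuration: the shape, then example values
interface HealthCheck {
  protocol: 'HTTP' | 'TCP' | 'HTTPS';
  path: string;               // probe URL path, e.g. '/health' (unused for TCP checks)
  interval: number;           // seconds between probes
  timeout: number;            // seconds before a probe counts as failed
  healthyThreshold: number;   // consecutive successes to mark a server healthy
  unhealthyThreshold: number; // consecutive failures to mark a server unhealthy
}
const healthCheck: HealthCheck = {
  protocol: 'HTTP',
  path: '/health',
  interval: 30,
  timeout: 5,
  healthyThreshold: 2,
  unhealthyThreshold: 3,
};
```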
Common health check endpoints:
- /health: Basic liveness check
- /health/ready: Readiness check (dependencies available)
- /health/live: Liveness check (process running)
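The consecutive-threshold logic from the configuration above can be sketched as a small state machine. Three things here are assumptions: the class name `ServerHealth`, the `probe` helper, and starting servers as healthy. Some load balancers instead keep new targets out of rotation until they pass their first checks.

```typescript
// Hypothetical HTTP probe: any 2xx response within the timeout counts as success.
async function probe(url: string, timeoutMs: number): Promise<boolean> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(url, { signal: controller.signal });
    return res.ok;
  } catch {
    return false; // connection error or timeout
  } finally {
    clearTimeout(timer);
  }
}

// Flips a server's status only after N consecutive results in the same direction,
// so a single slow or failed probe does not cause flapping.
class ServerHealth {
  healthy = true; // assumption: start in rotation
  private successes = 0;
  private failures = 0;
  private readonly healthyThreshold: number;
  private readonly unhealthyThreshold: number;

  constructor(healthyThreshold = 2, unhealthyThreshold = 3) {
    this.healthyThreshold = healthyThreshold;
    this.unhealthyThreshold = unhealthyThreshold;
  }

  // Call once per probe result (e.g. every `interval` seconds).
  recordProbe(ok: boolean): void {
    if (ok) {
      this.successes += 1;
      this.failures = 0;
      if (!this.healthy && this.successes >= this.healthyThreshold) this.healthy = true;
    } else {
      this.failures += 1;
      this.successes = 0;
      if (this.healthy && this.failures >= this.unhealthyThreshold) this.healthy = false;
    }
  }
}

const serverHealth = new ServerHealth(2, 3);
[false, false, false].forEach((ok) => serverHealth.recordProbe(ok));
console.log(serverHealth.healthy); // false: 3 consecutive failures
serverHealth.recordProbe(true);
serverHealth.recordProbe(true);
console.log(serverHealth.healthy); // true: 2 consecutive successes
```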
Session Persistence (Sticky Sessions)
Some applications require a client to always hit the same server.
Methods:
- Cookie-based: Load balancer sets a cookie with server identifier
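- IP-based: Hash the client IP (problematic behind NAT or proxies, where many clients share one IP and all land on the same server)
- Application-level alternative: Store session state in a shared cache (e.g., Redis) so any server can handle any request, removing the need for stickiness
Trade-off: Sticky sessions reduce flexibility (uneven load, lost sessions when the pinned server fails) but may be necessary for stateful applications.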
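Cookie-based stickiness can be sketched as follows. The sketch assumes a hypothetical cookie named `lb_server` that stores the server ID in plain text; real load balancers usually encode or sign this value. It also shows the trade-off: when the pinned server is no longer healthy, the client moves, and any session state held only on that server is lost.

```typescript
const STICKY_COOKIE = 'lb_server'; // hypothetical cookie name

interface StickyDecision {
  server: string;
  setCookie?: string; // Set-Cookie value, present only when a new server was chosen
}

function parseCookies(header: string | undefined): Record<string, string> {
  const cookies: Record<string, string> = {};
  if (!header) return cookies;
  for (const part of header.split(';')) {
    const eq = part.indexOf('=');
    if (eq > 0) cookies[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return cookies;
}

// Reuse the pinned server if it is still healthy; otherwise fall back to the
// normal algorithm (passed in as `fallback`) and re-pin the client.
function chooseSticky(
  cookieHeader: string | undefined,
  healthyServers: string[],
  fallback: (servers: string[]) => string,
): StickyDecision {
  const pinned = parseCookies(cookieHeader)[STICKY_COOKIE];
  if (pinned !== undefined && healthyServers.indexOf(pinned) !== -1) {
    return { server: pinned };
  }
  const server = fallback(healthyServers);
  return { server, setCookie: `${STICKY_COOKIE}=${server}; Path=/; HttpOnly` };
}

// Stand-in for a real algorithm such as round robin.
const firstHealthy = (servers: string[]): string => servers[0];
console.log(chooseSticky('lb_server=B; theme=dark', ['A', 'B'], firstHealthy).server); // B
console.log(chooseSticky('lb_server=B', ['A', 'C'], firstHealthy).setCookie); // lb_server=A; Path=/; HttpOnly
```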
Examples
Simple Load Balancer Design
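```typescript
// Minimal supporting types (hypothetical; a real proxy would use its framework's types).
interface Server {
  host: string;
  port: number;
}

// The global Fetch `Request` has no `path` field, so the sketch uses its own shape.
interface ProxiedRequest {
  method: string;
  path: string; // path plus query string, e.g. '/api/users?id=1'
  headers: Record<string, string>;
  body?: string;
}

interface LoadBalancingAlgorithm {
  selectServer(servers: Server[], request: ProxiedRequest): Server;
}

interface HealthChecker {
  getHealthyServers(): Server[];
  markUnhealthy(server: Server): void;
}

class LoadBalancer {
  private readonly algorithm: LoadBalancingAlgorithm;
  private readonly healthChecker: HealthChecker;

  constructor(algorithm: LoadBalancingAlgorithm, healthChecker: HealthChecker) {
    this.algorithm = algorithm;
    this.healthChecker = healthChecker;
  }

  async routeRequest(request: ProxiedRequest): Promise<Response> {
    // Only servers currently passing health checks are candidates.
    const healthyServers = this.healthChecker.getHealthyServers();
    if (healthyServers.length === 0) {
      throw new Error('No healthy servers available');
    }
    const selectedServer = this.algorithm.selectServer(healthyServers, request);
    return this.forwardRequest(selectedServer, request);
  }

  private async forwardRequest(server: Server, request: ProxiedRequest): Promise<Response> {
    try {
      return await fetch(`http://${server.host}:${server.port}${request.path}`, {
        method: request.method,
        body: request.body,
        headers: request.headers,
        signal: AbortSignal.timeout(5000), // give up on a hung backend after 5 s
      });
    } catch (error) {
      // fetch() rejects on network errors and timeouts, not on HTTP 5xx responses,
      // so this passively takes unreachable servers out of rotation.
      this.healthChecker.markUnhealthy(server);
      throw error;
    }
  }
}
```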
AWS Application Load Balancer (ALB) Configuration
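```yaml
# ALB with an HTTPS listener forwarding to one target group with health checks.
# Subnet, security group, VPC, and certificate identifiers are placeholders.
Resources:
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Type: application
      Scheme: internet-facing
      Subnets: [subnet-1, subnet-2] # an ALB needs subnets in at least two AZs
      SecurityGroups: [sg-12345]

  # Listeners are a separate resource, not a property of the load balancer.
  HttpsListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Protocol: HTTPS
      Port: 443
      Certificates:
        - CertificateArn: arn:aws:acm:REGION:ACCOUNT:certificate/EXAMPLE # placeholder; HTTPS listeners need a certificate
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Port: 8080
      Protocol: HTTP
      VpcId: vpc-12345
      HealthCheckPath: /health
      HealthCheckIntervalSeconds: 30
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 3
```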
Common Pitfalls
- Not implementing health checks: Traffic may route to dead servers, causing failures
- Using sticky sessions unnecessarily: Reduces flexibility and can cause uneven load distribution
- Ignoring server capacity differences: Use weighted algorithms when servers have different specs
- Single point of failure: Load balancer itself must be highly available (use multiple instances)
- Not monitoring metrics: Track request latency, error rates, and server utilization
- Layer 4 for complex routing: Use Layer 7 when you need content-aware routing or SSL termination
- Ignoring geographic distribution: Use DNS-based load balancing for global traffic
Interview Questions
Beginner
Q: What is load balancing and why is it important?
A: Load balancing distributes incoming network traffic across multiple servers to prevent any single server from becoming overwhelmed. It's important because it:
- Improves performance by distributing load
- Increases availability by routing traffic away from failed servers
- Enables horizontal scaling by adding more servers as needed
- Provides redundancy and fault tolerance
Intermediate
Q: Design a load balancer that handles 1 million requests per second. What algorithms would you use and how would you ensure high availability?
A:
Architecture:
- Multiple load balancer instances (active-active) behind DNS round-robin
- Layer 4 load balancers for initial distribution (low latency)
- Layer 7 load balancers behind Layer 4 for intelligent routing
- Consistent hashing for server selection, so adding or removing a server remaps only a small fraction of keys and per-server caches stay warm
Algorithms:
- Primary: Least connections (handles varying request times)
- Fallback: Weighted round-robin (accounts for server capacity)
- Session affinity: IP hash with consistent hashing ring
High Availability:
- Multiple LB instances in different availability zones
- Health checks every 5 seconds with 2 healthy/3 unhealthy thresholds
- Automatic failover using DNS or BGP anycast
- Circuit breakers to quickly remove unhealthy servers
- Rate limiting to prevent overload
- Monitoring with real-time alerts on latency/error spikes
Scaling:
- Use distributed load balancers (e.g., AWS ALB, Google Cloud Load Balancer)
- Implement connection pooling
- Cache health check results (update every 5s, not per-request)
- Use hardware load balancers or specialized software (Envoy, HAProxy)
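The consistent hashing in this answer (for server selection and for the IP-hash ring) can be sketched as a sorted ring of virtual nodes searched with binary search. The hash function (32-bit FNV-1a) and 100 virtual nodes per server are illustrative choices, not requirements.

```typescript
function hash32(input: string): number {
  let h = 0x811c9dc5; // 32-bit FNV-1a
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

class ConsistentHashRing {
  private ring: Array<{ point: number; server: string }> = [];
  private readonly virtualNodes: number;

  constructor(servers: string[], virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    for (const s of servers) this.addServer(s);
  }

  // Place the server at several pseudo-random points to even out its share.
  addServer(server: string): void {
    for (let v = 0; v < this.virtualNodes; v++) {
      this.ring.push({ point: hash32(`${server}#${v}`), server });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  removeServer(server: string): void {
    this.ring = this.ring.filter((n) => n.server !== server);
  }

  // The key belongs to the first ring point at or after its hash, wrapping to the start.
  getServer(key: string): string {
    if (this.ring.length === 0) throw new Error('Ring is empty');
    const h = hash32(key);
    let lo = 0;
    let hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.ring[lo === this.ring.length ? 0 : lo].server;
  }
}

const ring = new ConsistentHashRing(['A', 'B', 'C']);
const keys: string[] = [];
for (let i = 0; i < 1000; i++) keys.push(`user-${i}`);
const before = keys.map((k) => ring.getServer(k));
ring.removeServer('C');
const moved = keys.filter((k, i) => ring.getServer(k) !== before[i]).length;
const onC = before.filter((s) => s === 'C').length;
// Only keys that lived on C move; with `hash % n`, most keys would have moved.
console.log(moved === onC); // true
```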
Senior
Q: Design a global load balancing system for a video streaming service with users in North America, Europe, and Asia. How do you handle latency, failover, and data consistency?
A:
Multi-tier Architecture:
Tier 1: DNS-based Global Load Balancing (GeoDNS)
- Route users to nearest region based on DNS resolver location
- Use anycast IPs for lowest latency
- TTL: 60 seconds for fast failover
Tier 2: Regional Load Balancers
- Each region (NA, EU, Asia) has its own load balancer cluster
- Active-active across multiple data centers in region
- Health checks between regions for cross-region failover
Tier 3: Application Load Balancers
- Layer 7 load balancers within each region
- Route based on:
  - Geographic proximity (latency-based routing)
  - Server health and capacity
  - Content availability (some content is region-specific)
Latency Optimization:
- Edge locations (CDN): Cache popular videos at edge
- Regional data centers: Route to nearest DC within region
- Anycast routing: One IP is announced from many locations, and BGP routes each client to the topologically nearest one
- Pre-warming: Pre-connect to likely servers
- Adaptive routing: Monitor latency, dynamically adjust
Failover Strategy:
- Health checks: Multi-level (DNS → Regional LB → App LB → Server)
- Automatic failover:
  - Regional: DNS fails over to next nearest region (within 30s)
  - Data center: Regional LB fails over to backup DC (within 10s)
  - Server: App LB removes unhealthy servers (within 5s)
- Graceful degradation: If region fails, route to next region with higher latency
- Circuit breakers: Prevent cascading failures
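Circuit breakers appear in both answers, and a minimal per-server breaker might look like the sketch below. The three states (closed, open, half-open) follow the common pattern. The failure threshold, cooldown, and class name are illustrative assumptions.

```typescript
// closed:    requests flow; consecutive failures are counted
// open:      requests are rejected immediately until the cooldown elapses
// half-open: one trial request is allowed; success closes, failure re-opens
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 5, cooldownMs = 10000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // `now` is a parameter so the timing logic is easy to test.
  allowRequest(now: number = Date.now()): boolean {
    if (this.state === 'open' && now - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open'; // let exactly one trial request through
      return true;
    }
    return this.state === 'closed';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = now;
    }
  }
}
```

In a load balancer, each server would get its own breaker. Servers whose `allowRequest()` returns false are skipped during selection, which stops a failing backend from tying up connections while it recovers.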
Data Consistency:
- Metadata (user preferences, watch history): Strongly consistent, via a global database with regional replicas
- Video content: Eventually consistent (CDN propagation)
- Session data: Stored in distributed cache (Redis) with regional replication
- User authentication: Centralized with regional caches
Monitoring:
- Real-time latency dashboards per region
- Error rates and availability metrics
- Traffic patterns and capacity planning
- Automated alerts for anomalies
Example Flow:
User in Tokyo → GeoDNS → Routes to Asia region
→ Regional LB (Tokyo DC) → App LB → Video Server
→ If Tokyo DC fails → Regional LB → Singapore DC
→ If entire Asia region fails → GeoDNS → Routes to US West (with latency warning)
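The failover order in the example flow can be expressed as data plus a lookup. For clarity, this sketch merges two tiers into one function: the GeoDNS tier (choosing a region) and the regional-LB tier (choosing a data center). The region and data center names are hypothetical stand-ins for "Asia" and "US West".

```typescript
interface Region {
  name: string;
  dataCenters: string[]; // preference order within the region
}

// Preference list for a user in Tokyo: nearest region first.
const tokyoPreference: Region[] = [
  { name: 'asia', dataCenters: ['tokyo', 'singapore'] },
  { name: 'us-west', dataCenters: ['us-west-a', 'us-west-b'] },
];

// Walk regions in order (GeoDNS tier), then data centers in order (regional LB tier),
// and return the first healthy data center, or undefined during a total outage.
function resolveDataCenter(
  preference: Region[],
  isHealthy: (dc: string) => boolean,
): string | undefined {
  for (const region of preference) {
    for (const dc of region.dataCenters) {
      if (isHealthy(dc)) return dc;
    }
  }
  return undefined;
}

const down = new Set<string>();
down.add('tokyo');
console.log(resolveDataCenter(tokyoPreference, (dc) => !down.has(dc))); // singapore
down.add('singapore');
console.log(resolveDataCenter(tokyoPreference, (dc) => !down.has(dc))); // us-west-a
```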
Key Takeaways
- Load balancing distributes traffic across multiple servers to improve performance, availability, and scalability
- Choose an algorithm based on the use case: Round-robin for equal servers, least connections for varying workloads, weighted for different capacities
- Layer 4 vs Layer 7: Layer 4 is faster but less intelligent; Layer 7 enables content-aware routing
- Health checks are critical: Continuously monitor server health and automatically remove unhealthy servers
- Session persistence: Use sticky sessions only when necessary (stateful apps); prefer stateless with shared session storage
- High availability: Load balancer itself must be highly available (multiple instances, automatic failover)
- Monitor metrics: Track latency, error rates, server utilization, and connection counts
- Global load balancing: Use DNS-based routing for geographic distribution with regional failover
- Scalability: Design for horizontal scaling; load balancer should handle adding/removing servers dynamically