Load Balancing
Distribute incoming requests across multiple servers to improve performance, reliability, and scalability.
Load balancing is the practice of distributing incoming network traffic across multiple servers to ensure no single server becomes overwhelmed. It's fundamental to building scalable, high-availability systems.
What is Load Balancing?
A load balancer sits between clients and servers, acting as a reverse proxy that distributes requests. It improves:
- Performance: Distributes load to prevent bottlenecks
- Availability: Routes traffic away from failed servers
- Scalability: Allows horizontal scaling by adding more servers
Load Balancing Algorithms
Round Robin
Distributes requests sequentially across servers in rotation.
Use case: When all servers have similar capacity and requests are stateless.
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
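A minimal sketch of this rotation in TypeScript (the class and server names are illustrative):

// Round-robin selection: cycle an index through the server list.
class RoundRobin {
  private next = 0;

  constructor(private servers: string[]) {}

  select(): string {
    const server = this.servers[this.next];
    this.next = (this.next + 1) % this.servers.length;
    return server;
  }
}

const rr = new RoundRobin(['serverA', 'serverB', 'serverC']);
console.log(rr.select(), rr.select(), rr.select(), rr.select());
// serverA serverB serverC serverA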
Least Connections
Routes to the server with the fewest active connections.
Use case: When requests have varying processing times (e.g., file uploads, long-running queries).
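A sketch of the selection step, assuming the balancer already maintains a count of in-flight requests per server:

// Pick the server with the fewest active connections. The counts
// would be incremented when a request starts and decremented when
// it completes.
function leastConnections(connections: Map<string, number>): string {
  let best = '';
  let fewest = Infinity;
  for (const [server, count] of connections) {
    if (count < fewest) {
      fewest = count;
      best = server;
    }
  }
  return best;
}

const active = new Map([['serverA', 12], ['serverB', 3], ['serverC', 7]]);
console.log(leastConnections(active)); // serverB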
Weighted Round Robin
Round robin with weights assigned to servers based on capacity.
Use case: When servers have different hardware specs (e.g., 2x weight for servers with 2x CPU).
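One naive but correct way to realize the weights is to expand each server into the rotation proportionally; production balancers interleave the picks more smoothly, but the idea is the same:

// Build a rotation where each server appears once per unit of weight.
function buildWeightedRotation(weights: Record<string, number>): string[] {
  const rotation: string[] = [];
  for (const [server, weight] of Object.entries(weights)) {
    for (let i = 0; i < weight; i++) rotation.push(server);
  }
  return rotation;
}

// serverA has 2x the capacity, so it appears twice per cycle.
console.log(buildWeightedRotation({ serverA: 2, serverB: 1 }));
// ['serverA', 'serverA', 'serverB']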
IP Hash
Hashes client IP to consistently route to the same server.
Use case: When you need session affinity (sticky sessions).
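A sketch using a toy string hash. Note that with plain modulo hashing, adding or removing a server remaps most clients, which is why real deployments often pair this with consistent hashing:

// Hash the client IP to a stable index so the same client always
// lands on the same server (as long as the server list is unchanged).
function ipHash(clientIp: string, servers: string[]): string {
  let hash = 0;
  for (const char of clientIp) {
    hash = (hash * 31 + char.charCodeAt(0)) >>> 0; // simple string hash
  }
  return servers[hash % servers.length];
}

const servers = ['serverA', 'serverB', 'serverC'];
console.log(ipHash('203.0.113.42', servers)); // always the same server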
Least Response Time
Routes to the server with the lowest average response time.
Use case: When server performance varies dynamically.
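A sketch that smooths measured latencies with an exponentially weighted moving average; the smoothing factor is an illustrative choice:

// Track an EWMA of response times per server and route to the current
// fastest. New servers start at 0, so they receive traffic until real
// measurements arrive.
class LeastResponseTime {
  private avgMs = new Map<string, number>();

  constructor(servers: string[], private alpha = 0.2) {
    for (const server of servers) this.avgMs.set(server, 0);
  }

  // Call after each completed request with the measured latency.
  record(server: string, ms: number): void {
    const prev = this.avgMs.get(server) ?? 0;
    this.avgMs.set(server, this.alpha * ms + (1 - this.alpha) * prev);
  }

  select(): string {
    return [...this.avgMs.entries()].sort((a, b) => a[1] - b[1])[0][0];
  }
}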
Types of Load Balancers
Layer 4 (Transport Layer)
Operates at TCP/UDP level. Faster, but less intelligent.
Pros: Low latency, high throughput, simple
Cons: Can't inspect HTTP content, limited routing options
Layer 7 (Application Layer)
Operates at HTTP/HTTPS level. More intelligent routing.
Pros: Content-aware routing, SSL termination, path-based routing
Cons: Higher latency, more CPU intensive
Health Checks
Load balancers continuously check server health:
// Health check configuration
interface HealthCheck {
  protocol: 'HTTP' | 'TCP' | 'HTTPS';
  path: string;
  interval: number; // seconds between checks
  timeout: number; // seconds before a check counts as failed
  healthyThreshold: number; // consecutive successes to mark healthy
  unhealthyThreshold: number; // consecutive failures to mark unhealthy
}

// Example values, matching the ALB configuration later in this section
const healthCheck: HealthCheck = {
  protocol: 'HTTP',
  path: '/health',
  interval: 30,
  timeout: 5,
  healthyThreshold: 2,
  unhealthyThreshold: 3,
};
Common health check endpoints:
- /health - Basic liveness check
- /health/ready - Readiness check (dependencies available)
- /health/live - Liveness check (process running)
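A minimal sketch of these endpoints on Node's built-in http module; checkDependencies() is a hypothetical stand-in for whatever the service actually verifies:

import { createServer } from 'node:http';

async function checkDependencies(): Promise<boolean> {
  return true; // replace with real checks: database ping, cache, downstreams
}

createServer(async (req, res) => {
  if (req.url === '/health' || req.url === '/health/live') {
    // Liveness: the process is up and able to answer.
    res.writeHead(200).end('OK');
  } else if (req.url === '/health/ready') {
    // Readiness: dependencies are reachable, safe to receive traffic.
    const ready = await checkDependencies();
    res.writeHead(ready ? 200 : 503).end(ready ? 'READY' : 'NOT READY');
  } else {
    res.writeHead(404).end();
  }
}).listen(8080);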
Session Persistence (Sticky Sessions)
Some applications require a client to always hit the same server.
Methods:
- Cookie-based: Load balancer sets a cookie with a server identifier (sketched below)
- IP-based: Hash client IP (problematic with NAT/proxies)
- Application-level: Store session in shared cache (Redis)
Trade-off: Sticky sessions reduce flexibility but may be necessary for stateful applications.
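A sketch of the cookie-based method as the balancer would apply it; the cookie name lb_server is an arbitrary illustrative choice:

// On first contact, pick a server and set an affinity cookie;
// on later requests, honor the cookie if that server still exists.
function chooseServer(
  cookieHeader: string | undefined,
  servers: string[]
): { server: string; setCookie?: string } {
  const match = cookieHeader?.match(/lb_server=([^;]+)/);
  if (match && servers.includes(match[1])) {
    return { server: match[1] }; // returning client: keep affinity
  }
  // New client: pick any server and pin future requests to it.
  const server = servers[Math.floor(Math.random() * servers.length)];
  return { server, setCookie: `lb_server=${server}; Path=/; HttpOnly` };
}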
Examples
Simple Load Balancer Design
// Minimal supporting types so the example is self-contained; a real
// deployment would have richer versions of these.
interface Server {
  host: string;
  port: number;
}

interface Request {
  method: string;
  path: string;
  headers?: Record<string, string>;
  body?: BodyInit;
}

interface LoadBalancingAlgorithm {
  selectServer(servers: Server[], request: Request): Server;
}

// Tracks which servers are in rotation. A real checker would also
// probe servers periodically; this one reacts to failed forwards.
class HealthChecker {
  private unhealthy = new Set<Server>();

  constructor(private servers: Server[]) {}

  getHealthyServers(): Server[] {
    return this.servers.filter((server) => !this.unhealthy.has(server));
  }

  markUnhealthy(server: Server): void {
    this.unhealthy.add(server);
  }
}

class LoadBalancer {
  private servers: Server[];
  private algorithm: LoadBalancingAlgorithm;
  private healthChecker: HealthChecker;

  constructor(servers: Server[], algorithm: LoadBalancingAlgorithm) {
    this.servers = servers;
    this.algorithm = algorithm;
    this.healthChecker = new HealthChecker(servers);
  }

  // Pick a healthy server via the configured algorithm and forward.
  async routeRequest(request: Request): Promise<Response> {
    const healthyServers = this.healthChecker.getHealthyServers();
    if (healthyServers.length === 0) {
      throw new Error('No healthy servers available');
    }
    const selectedServer = this.algorithm.selectServer(healthyServers, request);
    return await this.forwardRequest(selectedServer, request);
  }

  private async forwardRequest(server: Server, request: Request): Promise<Response> {
    try {
      return await fetch(`http://${server.host}:${server.port}${request.path}`, {
        method: request.method,
        headers: request.headers,
        body: request.body, // undefined for GET/HEAD requests
      });
    } catch (error) {
      // A failed forward usually means the server is down: take it
      // out of rotation and surface the error to the caller.
      this.healthChecker.markUnhealthy(server);
      throw error;
    }
  }
}
AWS Application Load Balancer (ALB) Configuration
# ALB with health checks and multiple target groups
Resources:
ApplicationLoadBalancer:
Type: AWS::ElasticLoadBalancingV2::LoadBalancer
Properties:
Type: application
Scheme: internet-facing
Subnets: [subnet-1, subnet-2]
SecurityGroups: [sg-12345]
Listeners:
- Protocol: HTTPS
Port: 443
DefaultActions:
- Type: forward
TargetGroupArn: !Ref TargetGroup
TargetGroup:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
Port: 8080
Protocol: HTTP
VpcId: vpc-12345
HealthCheckPath: /health
HealthCheckIntervalSeconds: 30
HealthyThresholdCount: 2
UnhealthyThresholdCount: 3
Common Pitfalls
- Not implementing health checks: Traffic may route to dead servers, causing failures
- Using sticky sessions unnecessarily: Reduces flexibility and can cause uneven load distribution
- Ignoring server capacity differences: Use weighted algorithms when servers have different specs
- Single point of failure: Load balancer itself must be highly available (use multiple instances)
- Not monitoring metrics: Track request latency, error rates, and server utilization
- Layer 4 for complex routing: Use Layer 7 when you need content-aware routing or SSL termination
- Ignoring geographic distribution: Use DNS-based load balancing for global traffic
Interview Questions
Beginner
Q: What is load balancing and why is it important?
A: Load balancing distributes incoming network traffic across multiple servers to prevent any single server from becoming overwhelmed. It's important because it:
- Improves performance by distributing load
- Increases availability by routing traffic away from failed servers
- Enables horizontal scaling by adding more servers as needed
- Provides redundancy and fault tolerance
Intermediate
Q: Design a load balancer that handles 1 million requests per second. What algorithms would you use and how would you ensure high availability?
A:
Architecture:
- Multiple load balancer instances (active-active) behind DNS round-robin
- Layer 4 load balancers for initial distribution (low latency)
- Layer 7 load balancers behind Layer 4 for intelligent routing
- Consistent hashing for server selection to minimize cache misses (sketched below)
Algorithms:
- Primary: Least connections (handles varying request times)
- Fallback: Weighted round-robin (accounts for server capacity)
- Session affinity: IP hash with consistent hashing ring
High Availability:
- Multiple LB instances in different availability zones
- Health checks every 5 seconds with 2 healthy/3 unhealthy thresholds
- Automatic failover using DNS or BGP anycast
- Circuit breakers to quickly remove unhealthy servers
- Rate limiting to prevent overload
- Monitoring with real-time alerts on latency/error spikes
Scaling:
- Use distributed load balancers (e.g., AWS ALB, Google Cloud Load Balancer)
- Implement connection pooling
- Cache health check results (update every 5s, not per-request)
- Use hardware load balancers or specialized software (Envoy, HAProxy)
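Since this answer leans on consistent hashing, here is a compact sketch of a hash ring with virtual nodes; the FNV-1a hash and the virtual-node count are illustrative choices:

// Each server is hashed onto the ring many times (virtual nodes);
// a request key routes to the first server point clockwise from the
// key's hash, so adding or removing a server only remaps ~1/N of keys.
class HashRing {
  private ring: { point: number; server: string }[] = [];

  constructor(servers: string[], vnodes = 100) {
    for (const server of servers) {
      for (let i = 0; i < vnodes; i++) {
        this.ring.push({ point: this.hash(`${server}#${i}`), server });
      }
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  // FNV-1a: a simple, fast string hash (adequate for a sketch).
  private hash(key: string): number {
    let h = 2166136261;
    for (const c of key) {
      h = Math.imul(h ^ c.charCodeAt(0), 16777619) >>> 0;
    }
    return h;
  }

  // Linear scan for clarity; a real implementation would binary-search.
  lookup(key: string): string {
    const target = this.hash(key);
    const entry = this.ring.find((e) => e.point >= target);
    return (entry ?? this.ring[0]).server; // wrap around past the top
  }
}

// The same key maps to the same server across calls.
const ring = new HashRing(['serverA', 'serverB', 'serverC']);
console.log(ring.lookup('user-42'));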
Senior
Q: Design a global load balancing system for a video streaming service with users in North America, Europe, and Asia. How do you handle latency, failover, and data consistency?
A:
Multi-tier Architecture:
Tier 1: DNS-based Global Load Balancing (GeoDNS)
- Route users to nearest region based on DNS resolver location
- Use anycast IPs for lowest latency
- TTL: 60 seconds for fast failover
Tier 2: Regional Load Balancers
- Each region (NA, EU, Asia) has its own load balancer cluster
- Active-active across multiple data centers in region
- Health checks between regions for cross-region failover
Tier 3: Application Load Balancers
- Layer 7 load balancers within each region
- Route based on:
- Geographic proximity (latency-based routing)
- Server health and capacity
- Content availability (some content region-specific)
Latency Optimization:
- Edge locations (CDN): Cache popular videos at edge
- Regional data centers: Route to nearest DC within region
- Anycast routing: Multiple IPs, BGP routes to nearest
- Pre-warming: Pre-connect to likely servers
- Adaptive routing: Monitor latency, dynamically adjust
Failover Strategy:
- Health checks: Multi-level (DNS → Regional LB → App LB → Server)
- Automatic failover:
- Regional: DNS fails over to next nearest region (within 30s)
- Data center: Regional LB fails over to backup DC (within 10s)
- Server: App LB removes unhealthy servers (within 5s)
- Graceful degradation: If region fails, route to next region with higher latency
- Circuit breakers: Prevent cascading failures
Data Consistency:
- Metadata: Strongly consistent (user preferences, watch history) via global database with regional replicas
- Video content: Eventually consistent (CDN propagation)
- Session data: Stored in distributed cache (Redis) with regional replication
- User authentication: Centralized with regional caches
Monitoring:
- Real-time latency dashboards per region
- Error rates and availability metrics
- Traffic patterns and capacity planning
- Automated alerts for anomalies
Example Flow:
User in Tokyo → GeoDNS → Routes to Asia region
→ Regional LB (Tokyo DC) → App LB → Video Server
→ If Tokyo DC fails → Regional LB → Singapore DC
→ If entire Asia region fails → GeoDNS → Routes to US West (with latency warning)
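The failover order in this flow amounts to walking a latency-ordered preference list; a small sketch with hypothetical region names matching the flow above:

// Take the first healthy region from a latency-ordered preference list.
function pickRegion(preferences: string[], healthy: Set<string>): string {
  for (const region of preferences) {
    if (healthy.has(region)) return region;
  }
  throw new Error('No healthy regions available');
}

// Tokyo user: home DC first, then Singapore, then US West.
const tokyoPreferences = ['asia-tokyo', 'asia-singapore', 'us-west'];
console.log(pickRegion(tokyoPreferences, new Set(['us-west'])));
// 'us-west' (served with higher latency)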
Key Takeaways
- Load balancing distributes traffic across multiple servers to improve performance, availability, and scalability
- Choose algorithm based on use case: Round-robin for equal servers, least connections for varying workloads, weighted for different capacities
- Layer 4 vs Layer 7: Layer 4 is faster but less intelligent; Layer 7 enables content-aware routing
- Health checks are critical: Continuously monitor server health and automatically remove unhealthy servers
- Session persistence: Use sticky sessions only when necessary (stateful apps); prefer stateless with shared session storage
- High availability: Load balancer itself must be highly available (multiple instances, automatic failover)
- Monitor metrics: Track latency, error rates, server utilization, and connection counts
- Global load balancing: Use DNS-based routing for geographic distribution with regional failover
- Scalability: Design for horizontal scaling; load balancer should handle adding/removing servers dynamically