Load Balancing for System Design Interviews

Interview-focused load balancing: L4 vs L7, health checks, algorithms, and scaling/reliability trade-offs.

Load balancing is the practice of distributing incoming network traffic across multiple servers to ensure no single server becomes overwhelmed. It's fundamental to building scalable, high-availability systems.


What is Load Balancing?

A load balancer sits between clients and servers, acting as a reverse proxy that distributes requests. It improves:

  • Performance: Distributes load to prevent bottlenecks
  • Availability: Routes traffic away from failed servers
  • Scalability: Allows horizontal scaling by adding more servers

Load Balancing Algorithms

Round Robin

Distributes requests sequentially across servers in rotation.

Use case: When all servers have similar capacity and requests are stateless.

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
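Here is a minimal round-robin selector in TypeScript. It assumes a fixed server list and requests handled on a single thread, as in Node.js. `RoundRobin` is a hypothetical helper, not a library API:

```typescript
// Round robin: hand out servers in a fixed rotation (hypothetical helper class).
class RoundRobin<T> {
  private readonly servers: T[];
  private index = 0;

  constructor(servers: T[]) {
    if (servers.length === 0) throw new Error('RoundRobin needs at least one server');
    this.servers = servers;
  }

  next(): T {
    const server = this.servers[this.index];
    // Advance and wrap so the cycle repeats after the last server
    this.index = (this.index + 1) % this.servers.length;
    return server;
  }
}

const rr = new RoundRobin(['Server A', 'Server B', 'Server C']);
const picks = [rr.next(), rr.next(), rr.next(), rr.next()];
console.log(picks.join(', ')); // Server A, Server B, Server C, Server A
```

If several worker threads shared the selector, the index would need an atomic counter.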

Least Connections

Routes to the server with the fewest active connections.

Use case: When requests have varying processing times (e.g., file uploads, long-running queries).
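The sketch below shows least-connections selection. `Backend`, `leastConnections`, and `withConnection` are hypothetical names. The important detail is that the balancer has to track in-flight requests itself: it increments a counter when it dispatches a request and decrements it when the request finishes.

```typescript
interface Backend {
  name: string;
  activeConnections: number; // in-flight requests tracked by the balancer
}

// Least connections: choose the backend with the fewest in-flight requests.
function leastConnections(backends: Backend[]): Backend {
  if (backends.length === 0) throw new Error('No backends available');
  let best = backends[0];
  for (const b of backends) {
    if (b.activeConnections < best.activeConnections) best = b;
  }
  return best;
}

// Keep the counter accurate: increment on dispatch, decrement on completion or error.
async function withConnection<R>(backend: Backend, work: () => Promise<R>): Promise<R> {
  backend.activeConnections++;
  try {
    return await work();
  } finally {
    backend.activeConnections--;
  }
}

const pool: Backend[] = [
  { name: 'A', activeConnections: 12 },
  { name: 'B', activeConnections: 3 },
  { name: 'C', activeConnections: 7 },
];
console.log(leastConnections(pool).name); // B
```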

Weighted Round Robin

Round robin with weights assigned to servers based on capacity.

Use case: When servers have different hardware specs (e.g., 2x weight for servers with 2x CPU).
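One way to implement weights is the "smooth" weighted round robin that nginx uses for upstream weights. On every pick, each server's weight is added to a running score. The server with the highest score wins, and the total weight is subtracted from its score. This sketch uses hypothetical `WeightedServer` and `selectWeighted` names. The algorithm interleaves a weight-2 server's turns rather than sending it two requests back to back:

```typescript
interface WeightedServer {
  name: string;
  weight: number;        // relative capacity, e.g. 2 for a server with 2x CPU
  currentWeight: number; // running score used by the smooth algorithm
}

function selectWeighted(servers: WeightedServer[]): WeightedServer {
  if (servers.length === 0) throw new Error('No servers available');
  // Every server earns its weight on each pick
  let total = 0;
  for (const s of servers) {
    s.currentWeight += s.weight;
    total += s.weight;
  }
  // The highest running score wins...
  let best = servers[0];
  for (const s of servers) {
    if (s.currentWeight > best.currentWeight) best = s;
  }
  // ...and pays back the total, so it drops behind the others
  best.currentWeight -= total;
  return best;
}

const weighted: WeightedServer[] = [
  { name: 'A', weight: 2, currentWeight: 0 },
  { name: 'B', weight: 1, currentWeight: 0 },
  { name: 'C', weight: 1, currentWeight: 0 },
];
const order = Array.from({ length: 4 }, () => selectWeighted(weighted).name);
console.log(order.join(', ')); // A, B, C, A
```

In every cycle of four requests, server A receives two and B and C receive one each.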

IP Hash

Hashes client IP to consistently route to the same server.

Use case: When you need session affinity (sticky sessions).
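Below is a sketch of IP hashing with plain modulo arithmetic. The FNV-1a hash and the `ipHash` name were chosen for illustration; any stable hash would work:

```typescript
// 32-bit FNV-1a hash of a string
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Same client IP + same server list => same server
function ipHash(clientIp: string, servers: string[]): string {
  if (servers.length === 0) throw new Error('No servers available');
  return servers[fnv1a(clientIp) % servers.length];
}

const backends = ['Server A', 'Server B', 'Server C'];
console.log(ipHash('203.0.113.7', backends)); // identical on every call
```

The server is chosen as `hash % servers.length`, so adding or removing a server changes the mapping for most clients and breaks their sessions. Consistent hashing limits that reshuffling to the keys owned by the server that changed.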

Least Response Time

Routes to the server with the lowest average response time.

Use case: When server performance varies dynamically.
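This sketch tracks response time as an exponentially weighted moving average (EWMA), which gives recent measurements more weight than old ones. The smoothing factor `ALPHA = 0.2` and the type and function names are hypothetical:

```typescript
interface TimedServer {
  name: string;
  avgResponseMs: number; // exponentially weighted moving average of latency
}

const ALPHA = 0.2; // weight given to the newest sample (hypothetical tuning value)

// Fold a new latency sample into the server's moving average
function recordResponse(server: TimedServer, elapsedMs: number): void {
  server.avgResponseMs = ALPHA * elapsedMs + (1 - ALPHA) * server.avgResponseMs;
}

function leastResponseTime(servers: TimedServer[]): TimedServer {
  if (servers.length === 0) throw new Error('No servers available');
  return servers.reduce((best, s) => (s.avgResponseMs < best.avgResponseMs ? s : best));
}

const fleet: TimedServer[] = [
  { name: 'A', avgResponseMs: 50 },
  { name: 'B', avgResponseMs: 80 },
];
console.log(leastResponseTime(fleet).name); // A
for (let i = 0; i < 4; i++) recordResponse(fleet[1], 20); // B speeds up
console.log(leastResponseTime(fleet).name); // B
```

Four 20 ms responses from B bring its average from 80 ms down to about 44.6 ms, which is below A's 50 ms. Sending every request to the single fastest server can overload it, so some implementations also factor in active connection counts.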


Types of Load Balancers

Layer 4 (Transport Layer)

Routes using IP addresses and ports (TCP/UDP) without parsing the payload. Faster, but less intelligent.

Pros: Low latency, high throughput, simple
Cons: Can't inspect HTTP content, limited routing options

Layer 7 (Application Layer)

Parses HTTP/HTTPS requests, so it can route based on their content.

Pros: Content-aware routing (path-based, host-based, header-based), SSL/TLS termination
Cons: Higher latency, more CPU-intensive (must parse HTTP and usually decrypt TLS)
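Path-based routing is a decision only a Layer 7 balancer can make, because it requires parsing the HTTP request. A Layer 4 balancer sees only addresses and ports. The following is a hypothetical first-match routing table:

```typescript
interface Route {
  pathPrefix: string;
  targetGroup: string; // pool of servers that handles this prefix
}

// Hypothetical routing table: the first matching prefix wins, so order matters.
const routes: Route[] = [
  { pathPrefix: '/api/', targetGroup: 'api-servers' },
  { pathPrefix: '/static/', targetGroup: 'static-servers' },
  { pathPrefix: '/', targetGroup: 'web-servers' }, // default route
];

function routeByPath(path: string): string {
  const match = routes.find((r) => path.startsWith(r.pathPrefix));
  if (!match) throw new Error(`No route for ${path}`);
  return match.targetGroup;
}

console.log(routeByPath('/api/users/42')); // api-servers
console.log(routeByPath('/index.html'));   // web-servers
```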


Health Checks

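Load balancers probe each server at a fixed interval and stop routing to it after several consecutive failed probes:

// Health check configuration
interface HealthCheck {
  protocol: 'HTTP' | 'TCP' | 'HTTPS';
  path: string;               // e.g. '/health' (not used by TCP checks)
  interval: number;           // seconds between probes
  timeout: number;            // seconds before a probe counts as failed
  healthyThreshold: number;   // consecutive successes to mark healthy
  unhealthyThreshold: number; // consecutive failures to mark unhealthy
}

const healthCheck: HealthCheck = {
  protocol: 'HTTP',
  path: '/health',
  interval: 30,
  timeout: 5,
  healthyThreshold: 2,
  unhealthyThreshold: 3,
};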

Common health check endpoints:

  • /health - Basic liveness check
  • /health/ready - Readiness check (dependencies available)
  • /health/live - Liveness check (process running)
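The thresholds in the health check configuration turn individual probe results into a stable healthy/unhealthy state, so one slow response does not eject a server. The sketch below implements that state machine with a hypothetical `ServerHealth` class. It assumes a new server starts as unhealthy and must pass `healthyThreshold` probes before it receives traffic.

```typescript
// Flips state only after a run of consecutive probe results reaches a threshold.
class ServerHealth {
  healthy = false; // assumption: new servers must prove themselves first
  private successes = 0; // consecutive successful probes
  private failures = 0;  // consecutive failed probes
  private readonly healthyThreshold: number;
  private readonly unhealthyThreshold: number;

  constructor(healthyThreshold: number, unhealthyThreshold: number) {
    this.healthyThreshold = healthyThreshold;
    this.unhealthyThreshold = unhealthyThreshold;
  }

  // Record one probe outcome (e.g. GET /health answered 200 within the timeout)
  record(success: boolean): void {
    if (success) {
      this.successes++;
      this.failures = 0;
      if (!this.healthy && this.successes >= this.healthyThreshold) this.healthy = true;
    } else {
      this.failures++;
      this.successes = 0;
      if (this.healthy && this.failures >= this.unhealthyThreshold) this.healthy = false;
    }
  }
}

const health = new ServerHealth(2, 3);
health.record(true);
health.record(true);  // 2 consecutive successes: healthy
health.record(false);
health.record(false); // 2 failures: still healthy (threshold is 3)
health.record(false); // 3rd consecutive failure: unhealthy
console.log(health.healthy); // false
```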

Session Persistence (Sticky Sessions)

Some applications require a client to always hit the same server.

Methods:

  1. Cookie-based: Load balancer sets a cookie with server identifier
  2. IP-based: Hash client IP (problematic with NAT/proxies)
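  3. Application-level: Store session state in a shared cache (e.g., Redis) so any server can handle any request, which removes the need for stickiness

Trade-off: Sticky sessions can make load uneven and make failover lossy, because sessions pinned to a failed server are lost. They may still be necessary for stateful applications that cannot externalize their state.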
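The next sketch shows cookie-based stickiness. The cookie name `lb_server` and the helper names are hypothetical. A real balancer would store an opaque or signed server identifier instead of the raw name. When the pinned server is no longer healthy, the client is bound to a new one.

```typescript
const STICKY_COOKIE = 'lb_server'; // hypothetical cookie name

interface StickyResult {
  server: string;
  setCookie?: string; // Set-Cookie value to return when a new binding is made
}

// Parse a Cookie header such as "theme=dark; lb_server=s1"
function parseCookies(header: string | undefined): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of (header ?? '').split(';')) {
    const eq = part.indexOf('=');
    if (eq > 0) cookies.set(part.slice(0, eq).trim(), part.slice(eq + 1).trim());
  }
  return cookies;
}

function pickSticky(
  cookieHeader: string | undefined,
  healthyServers: string[],
  fallback: (servers: string[]) => string // any normal algorithm, e.g. round robin
): StickyResult {
  const pinned = parseCookies(cookieHeader).get(STICKY_COOKIE);
  // Honour the existing binding only while that server is healthy
  if (pinned !== undefined && healthyServers.includes(pinned)) {
    return { server: pinned };
  }
  // Otherwise pick normally and (re)bind the client to the chosen server
  const server = fallback(healthyServers);
  return { server, setCookie: `${STICKY_COOKIE}=${server}; Path=/; HttpOnly` };
}

const newClient = pickSticky(undefined, ['s1', 's2'], (s) => s[0]);
const returning = pickSticky('lb_server=s1', ['s1', 's2'], (s) => s[1]);
console.log(newClient.server, returning.server); // s1 s1
```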


Examples

Simple Load Balancer Design

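// Minimal interfaces assumed by this sketch
interface Server {
  host: string;
  port: number;
}

// Named ProxyRequest to avoid clashing with the Fetch API's global Request type
interface ProxyRequest {
  method: string;
  path: string;
  headers: Record<string, string>;
  body?: string;
}

interface LoadBalancingAlgorithm {
  selectServer(servers: Server[], request: ProxyRequest): Server;
}

interface HealthChecker {
  getHealthyServers(): Server[]; // servers currently passing health checks
  markUnhealthy(server: Server): void;
}

class LoadBalancer {
  private readonly algorithm: LoadBalancingAlgorithm;
  private readonly healthChecker: HealthChecker; // owns the server list

  constructor(algorithm: LoadBalancingAlgorithm, healthChecker: HealthChecker) {
    this.algorithm = algorithm;
    this.healthChecker = healthChecker;
  }

  async routeRequest(request: ProxyRequest): Promise<Response> {
    const healthyServers = this.healthChecker.getHealthyServers();
    if (healthyServers.length === 0) {
      throw new Error('No healthy servers available');
    }

    const selectedServer = this.algorithm.selectServer(healthyServers, request);
    return this.forwardRequest(selectedServer, request);
  }

  private async forwardRequest(
    server: Server,
    request: ProxyRequest
  ): Promise<Response> {
    try {
      return await fetch(`http://${server.host}:${server.port}${request.path}`, {
        method: request.method,
        headers: request.headers,
        // fetch rejects GET/HEAD requests that carry a body
        body: request.method === 'GET' || request.method === 'HEAD' ? undefined : request.body,
      });
    } catch (error) {
      // fetch only rejects on network errors (not on 5xx responses). Treating one
      // error as fatal is simplistic; real balancers usually require several failures.
      this.healthChecker.markUnhealthy(server);
      throw error;
    }
  }
}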

AWS Application Load Balancer (ALB) Configuration

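# CloudFormation: ALB with an HTTPS listener and a health-checked target group
# (subnet, security group, VPC and certificate identifiers are placeholders)
Resources:
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Type: application
      Scheme: internet-facing
      Subnets: [subnet-1, subnet-2]   # subnets in at least two AZs
      SecurityGroups: [sg-12345]

  # Listeners are separate resources, not a property of the load balancer
  HttpsListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Protocol: HTTPS
      Port: 443
      Certificates:                   # required for HTTPS listeners
        - CertificateArn: arn:aws:acm:REGION:ACCOUNT_ID:certificate/CERT_ID
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Port: 8080
      Protocol: HTTP
      VpcId: vpc-12345
      HealthCheckPath: /health
      HealthCheckIntervalSeconds: 30
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 3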

Common Pitfalls

  • Not implementing health checks: Traffic may route to dead servers, causing failures
  • Using sticky sessions unnecessarily: Reduces flexibility and can cause uneven load distribution
  • Ignoring server capacity differences: Use weighted algorithms when servers have different specs
  • Single point of failure: Load balancer itself must be highly available (use multiple instances)
  • Not monitoring metrics: Track request latency, error rates, and server utilization
  • Layer 4 for complex routing: Use Layer 7 when you need content-aware routing or SSL termination
  • Ignoring geographic distribution: Use DNS-based load balancing for global traffic

Interview Questions

Beginner

Q: What is load balancing and why is it important?

A: Load balancing distributes incoming network traffic across multiple servers to prevent any single server from becoming overwhelmed. It's important because it:

  • Improves performance by distributing load
  • Increases availability by routing traffic away from failed servers
  • Enables horizontal scaling by adding more servers as needed
  • Provides redundancy and fault tolerance

Intermediate

Q: Design a load balancer that handles 1 million requests per second. What algorithms would you use and how would you ensure high availability?

A:

Architecture:

  • Multiple load balancer instances (active-active) behind DNS round-robin
  • Layer 4 load balancers for initial distribution (low latency)
  • Layer 7 load balancers behind Layer 4 for intelligent routing
  • Consistent hashing for server selection to minimize cache misses
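Consistent hashing places each server at many points (virtual nodes) on a hash ring and sends each key to the next server clockwise from the key's hash. Removing a server only moves the keys that server owned. The sketch below uses hypothetical names. The FNV-1a hash and the 100 virtual nodes per server are illustrative choices; production systems often use a stronger hash.

```typescript
// 32-bit FNV-1a hash (illustrative; any well-mixed hash works)
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

interface RingNode {
  point: number; // position on the ring
  server: string;
}

class ConsistentHashRing {
  private nodes: RingNode[] = [];
  private readonly virtualNodes: number;

  constructor(servers: string[], virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    for (const server of servers) this.add(server);
  }

  // Several virtual nodes per server even out the key distribution
  add(server: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.nodes.push({ point: fnv1a(`${server}#${i}`), server });
    }
    this.nodes.sort((a, b) => a.point - b.point);
  }

  remove(server: string): void {
    this.nodes = this.nodes.filter((node) => node.server !== server);
  }

  // Binary-search for the first node at or after the key's hash, wrapping to the start
  lookup(key: string): string {
    if (this.nodes.length === 0) throw new Error('Ring is empty');
    const h = fnv1a(key);
    let lo = 0;
    let hi = this.nodes.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.nodes[mid].point < h) lo = mid + 1;
      else hi = mid;
    }
    return this.nodes[lo % this.nodes.length].server;
  }
}

const ring = new ConsistentHashRing(['cache-a', 'cache-b', 'cache-c']);
console.log(ring.lookup('user:42')); // same server on every call for this key
```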

Algorithms:

  • Primary: Least connections (handles varying request times)
  • Fallback: Weighted round-robin (accounts for server capacity)
  • Session affinity: IP hash with consistent hashing ring

High Availability:

  1. Multiple LB instances in different availability zones
  2. Health checks every 5 seconds with 2 healthy/3 unhealthy thresholds
  3. Automatic failover using DNS or BGP anycast
  4. Circuit breakers to quickly remove unhealthy servers
  5. Rate limiting to prevent overload
  6. Monitoring with real-time alerts on latency/error spikes
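Here is a sketch of a per-server circuit breaker. After `failureThreshold` consecutive failures, the balancer stops sending traffic to that server. It waits `resetTimeoutMs`, then allows one trial request through (the half-open state) before closing the circuit again. The class and parameter names are hypothetical, and the injectable clock is there only to make the example testable.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Should the load balancer send this request to the server?
  allowRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open'; // cool-down over: let exactly one trial through
      return true;
    }
    return this.state === 'closed';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures++;
    // A failed trial, or too many failures in a row, (re)opens the circuit
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

let clock = 0;
const breaker = new CircuitBreaker(3, 10_000, () => clock);
breaker.recordFailure();
breaker.recordFailure();
breaker.recordFailure();             // third failure opens the circuit
console.log(breaker.allowRequest()); // false: server skipped
clock = 10_000;
console.log(breaker.allowRequest()); // true: one trial request (half-open)
breaker.recordSuccess();             // trial succeeded, circuit closes
console.log(breaker.getState());     // closed
```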

Scaling:

  • Use distributed load balancers (e.g., AWS ALB, Google Cloud Load Balancer)
  • Implement connection pooling
  • Cache health check results (update every 5s, not per-request)
  • Use hardware load balancers or specialized software (Envoy, HAProxy)

Senior

Q: Design a global load balancing system for a video streaming service with users in North America, Europe, and Asia. How do you handle latency, failover, and data consistency?

A:

Multi-tier Architecture:

Tier 1: DNS-based Global Load Balancing (GeoDNS)

  • Route users to nearest region based on DNS resolver location
  • Use anycast IPs for lowest latency
  • TTL: 60 seconds for fast failover (some resolvers and clients cache records longer than the TTL, so failover can lag behind it)

Tier 2: Regional Load Balancers

  • Each region (NA, EU, Asia) has its own load balancer cluster
  • Active-active across multiple data centers in region
  • Health checks between regions for cross-region failover

Tier 3: Application Load Balancers

  • Layer 7 load balancers within each region
  • Route based on:
    • Geographic proximity (latency-based routing)
    • Server health and capacity
    • Content availability (some content region-specific)

Latency Optimization:

  1. Edge locations (CDN): Cache popular videos at edge
  2. Regional data centers: Route to nearest DC within region
  3. Anycast routing: One IP address announced from many locations; BGP delivers each user's traffic to the nearest one
  4. Pre-warming: Pre-connect to likely servers
  5. Adaptive routing: Monitor latency, dynamically adjust

Failover Strategy:

  1. Health checks: Multi-level (DNS → Regional LB → App LB → Server)
  2. Automatic failover:
    • Regional: DNS fails over to next nearest region (within 30s)
    • Data center: Regional LB fails over to backup DC (within 10s)
    • Server: App LB removes unhealthy servers (within 5s)
  3. Graceful degradation: If region fails, route to next region with higher latency
  4. Circuit breakers: Prevent cascading failures

Data Consistency:

  • Metadata: Strongly consistent (user preferences, watch history) via global database with regional replicas
  • Video content: Eventually consistent (CDN propagation)
  • Session data: Stored in distributed cache (Redis) with regional replication
  • User authentication: Centralized with regional caches

Monitoring:

  • Real-time latency dashboards per region
  • Error rates and availability metrics
  • Traffic patterns and capacity planning
  • Automated alerts for anomalies

Example Flow:

User in Tokyo → GeoDNS → Routes to Asia region
→ Regional LB (Tokyo DC) → App LB → Video Server
→ If Tokyo DC fails → Regional LB → Singapore DC
→ If entire Asia region fails → GeoDNS → Routes to US West (with latency warning)
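The failover chain in this flow comes down to walking an ordered preference list and taking the first healthy data center. This sketch uses hypothetical names and health flags. In a real system, the flags would come from the multi-level health checks described above.

```typescript
interface DataCenter {
  name: string;
  region: string;
  healthy: boolean; // fed by health checks in a real system
}

interface Placement {
  dc: DataCenter;
  crossRegion: boolean; // true when the user is served outside their home region
}

// Walk the user's preference list (nearest first) and take the first healthy DC.
function pickDataCenter(
  homeRegion: string,
  preference: string[],
  dcs: Map<string, DataCenter>
): Placement {
  for (const name of preference) {
    const dc = dcs.get(name);
    if (dc !== undefined && dc.healthy) {
      return { dc, crossRegion: dc.region !== homeRegion };
    }
  }
  throw new Error('No healthy data center available');
}

const tokyo: DataCenter = { name: 'tokyo', region: 'asia', healthy: true };
const singapore: DataCenter = { name: 'singapore', region: 'asia', healthy: true };
const usWest: DataCenter = { name: 'us-west', region: 'na', healthy: true };
const dcs = new Map<string, DataCenter>([
  [tokyo.name, tokyo],
  [singapore.name, singapore],
  [usWest.name, usWest],
]);
const tokyoUser = ['tokyo', 'singapore', 'us-west'];

console.log(pickDataCenter('asia', tokyoUser, dcs).dc.name); // tokyo
tokyo.healthy = false; // Tokyo DC fails
console.log(pickDataCenter('asia', tokyoUser, dcs).dc.name); // singapore
singapore.healthy = false; // whole Asia region down
const fallback = pickDataCenter('asia', tokyoUser, dcs);
console.log(fallback.dc.name, fallback.crossRegion); // us-west true
```

The `crossRegion` flag is where a service could attach the "latency warning" from the example flow.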

Key Takeaways

  • Load balancing distributes traffic across multiple servers to improve performance, availability, and scalability
  • Choose algorithm based on use case: Round-robin for equal servers, least connections for varying workloads, weighted for different capacities
  • Layer 4 vs Layer 7: Layer 4 is faster but less intelligent; Layer 7 enables content-aware routing
  • Health checks are critical: Continuously monitor server health and automatically remove unhealthy servers
  • Session persistence: Use sticky sessions only when necessary (stateful apps); prefer stateless servers with shared session storage
  • High availability: Load balancer itself must be highly available (multiple instances, automatic failover)
  • Monitor metrics: Track latency, error rates, server utilization, and connection counts
  • Global load balancing: Use DNS-based routing for geographic distribution with regional failover
  • Scalability: Design for horizontal scaling; load balancer should handle adding/removing servers dynamically

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.