
Load Balancing (Concepts & Use Cases)

Learn load balancing concepts: algorithms, health checks, failure handling, and common deployment patterns.

January 23, 2025 · 23 min read

Load Balancing

Why Engineers Care About This

Load balancing distributes traffic across multiple servers, which enables high availability and scalability. When one server fails, the load balancer routes traffic to the remaining healthy servers. When traffic increases, it spreads the extra load across the pool. But load balancing requires careful configuration of algorithms, health checks, and session handling. Understanding these pieces helps you build reliable, scalable systems.

When single servers become bottlenecks, or server failures cause outages, or traffic isn't distributed evenly, you're hitting load balancing problems. These problems compound. Without load balancing, single points of failure cause outages, and servers can't handle traffic spikes. With poor load balancing (wrong algorithm, no health checks), traffic isn't distributed evenly or failures aren't handled. Good load balancing solves these problems by distributing traffic and handling failures.

In interviews, when someone asks "How would you handle high traffic?", they're really asking: "Do you understand load balancing? Do you know how to choose load balancing algorithms? Do you understand health checks and failover?" Many engineers can't answer these questions. They use the default load balancing settings without understanding the algorithms, or they don't implement load balancing at all.

Core Intuitions You Must Build

Load balancing algorithms determine traffic distribution. Different algorithms (round-robin, least connections, weighted, IP hash) distribute traffic differently. Round-robin spreads requests evenly but doesn't consider how loaded each server is. Least connections routes each new request to the server with the fewest active connections, which works better for long-lived connections. Weighted algorithms account for differences in server capacity. Choose the algorithm based on the use case; don't use one algorithm for everything.
Health checks enable automatic failover. A load balancer must know which servers are healthy to route traffic correctly. Health checks (HTTP, TCP) probe each server periodically. If a server fails its health check, the load balancer stops routing traffic to it (automatic failover). Health checks should be fast and accurate: slow checks delay failover, and inaccurate checks cause false positives, such as a healthy server being taken out of rotation.
Session persistence enables stateful applications. Some applications require session persistence (sticky sessions): the same client must reach the same server so that its session state stays available. Load balancers can use cookies, IP hashing, or other methods to provide it. But session persistence reduces load distribution flexibility, and if the pinned server fails, the session state held on it is lost. Use session persistence only when needed; stateless applications don't need it.
Layer 4 (TCP) and Layer 7 (HTTP) load balancing have different capabilities. Layer 4 load balancing routes based on IP address and port. It is faster because it does less processing, but it can't inspect HTTP. Layer 7 load balancing routes based on HTTP headers and paths. It is slower because it does more processing, but it supports content-based routing and SSL/TLS termination. Choose based on needs: Layer 4 for performance, Layer 7 for flexibility.
Load balancing enables horizontal scaling. With a load balancer in front, you can add servers to handle increased traffic (horizontal scaling), and the load balancer starts sending traffic to the new servers automatically. This lets you scale without downtime. Prefer scaling horizontally (more servers) over scaling vertically (bigger servers) when load balancing makes that possible.
Load balancers can become single points of failure. A load balancer can itself fail and cause an outage. Design for load balancer high availability: run multiple load balancers (active-passive or active-active), or use a managed load balancer where the cloud provider handles availability. Don't rely on a single load balancer instance.
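The Layer 4 versus Layer 7 distinction above can be made concrete with a small Python sketch. It is an illustration only, not how any particular load balancer is implemented: the pool names, the port table, and the path rules are all hypothetical. The point is what information each layer has available when it makes a routing decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connection:
    """What a Layer 4 balancer can see: addresses and ports only."""
    client_ip: str
    dest_port: int


@dataclass
class HttpRequest:
    """What a Layer 7 balancer can additionally see after parsing HTTP."""
    conn: Connection
    path: str
    host: str


# Hypothetical pool names and rules, for illustration only.
L4_POOLS = {443: "web-pool", 5432: "db-pool"}
L7_RULES = [("/api/", "api-pool"), ("/static/", "static-pool")]


def route_l4(conn: Connection) -> Optional[str]:
    """Pick a pool from the destination port alone."""
    return L4_POOLS.get(conn.dest_port)


def route_l7(request: HttpRequest) -> Optional[str]:
    """Pick a pool from the URL path, falling back to the port-based choice."""
    for prefix, pool in L7_RULES:
        if request.path.startswith(prefix):
            return pool
    return route_l4(request.conn)


conn = Connection(client_ip="198.51.100.4", dest_port=443)
api_pool = route_l7(HttpRequest(conn, path="/api/users", host="example.com"))
default_pool = route_l7(HttpRequest(conn, path="/about", host="example.com"))
```

Both requests arrive on the same port, so a Layer 4 decision would send them to the same pool. Only the Layer 7 path rules can separate them, at the cost of parsing each request.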

Subtopics (Taught Through Real Scenarios)

Load Balancing Algorithms

What people usually get wrong:

Engineers often use the default load balancing algorithm (usually round-robin) without considering the alternatives. But different algorithms suit different use cases. Round-robin spreads requests evenly but ignores server load. Least connections routes to the server with the fewest active connections, which suits long-lived connections. Weighted algorithms account for server capacity. Choose the algorithm based on the use case; don't use one algorithm for everything.
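As a rough illustration of how round-robin and a weighted variant differ, here is a minimal Python sketch. The server names and weights are hypothetical, and the weighted picker uses the simplest possible approach (repeating each server in proportion to its weight), which is an assumption made for brevity rather than a description of any specific product.

```python
import itertools

SERVERS = ["app-1", "app-2", "app-3"]  # hypothetical backend names


def round_robin(servers):
    """Cycle through servers in order, ignoring their current load."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Repeat each server in proportion to its weight (relative capacity)."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


rr = round_robin(SERVERS)
rr_picks = [next(rr) for _ in range(6)]

# Hypothetical weights: "big" has three times the capacity of "small".
wrr = weighted_round_robin({"big": 3, "small": 1})
wrr_picks = [next(wrr) for _ in range(8)]
```

Over eight picks the weighted picker sends six requests to `big` and two to `small`, while plain round-robin gives each server an equal share regardless of capacity.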

How this breaks systems in the real world:

A service used round-robin load balancing for long-lived connections (WebSocket, database connections). Round-robin handed out new connections evenly, but connections stayed open for different lengths of time, so some servers accumulated more active connections than others. The result was uneven load: some servers were overloaded while others were underutilized. The fix? Use the least connections algorithm, which routes each new connection to the server with the fewest active connections and balances long-lived connections better. After the change, load evened out. But the real lesson is: load balancing algorithms matter. Choose based on use case.
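The effect in this scenario can be reproduced with a small simulation. The Python sketch below is a toy model under stated assumptions: one connection arrives per tick, and the durations follow a deliberately contrived pattern (one long connection followed by two short ones) chosen so that round-robin always places the long connections on the same server. Real traffic is less regular, but the mechanism is the same.

```python
def simulate(pick, durations, n_servers):
    """Return the peak number of open connections seen on each server."""
    open_until = [[] for _ in range(n_servers)]  # close times per server
    peak = [0] * n_servers
    for tick, duration in enumerate(durations):
        # Drop connections that have closed by this tick.
        open_until = [[t for t in conns if t > tick] for conns in open_until]
        active = [len(conns) for conns in open_until]
        target = pick(tick, active)
        open_until[target].append(tick + duration)
        peak[target] = max(peak[target], len(open_until[target]))
    return peak


def pick_round_robin(tick, active):
    """Rotate through servers regardless of how many connections are open."""
    return tick % len(active)


def pick_least_connections(tick, active):
    """Choose the server with the fewest open connections right now."""
    return active.index(min(active))


# Contrived workload: one long-lived connection, then two short ones, repeated.
durations = [30, 1, 1] * 10

rr_peak = simulate(pick_round_robin, durations, n_servers=3)
lc_peak = simulate(pick_least_connections, durations, n_servers=3)
```

With this workload, round-robin piles all ten long-lived connections onto the first server, while least connections keeps the per-server peaks within one connection of each other.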

What interviewers are really listening for:

They want to hear you talk about load balancing algorithms, their trade-offs, and when to use each. Junior engineers say "just use round-robin." Senior engineers say "choose the load balancing algorithm based on the use case: round-robin for even distribution, least connections for long-lived connections, weighted for servers with different capacities." They're testing whether you understand that load balancing algorithms have trade-offs.

Health Checks and Failover

What people usually get wrong:

Engineers often skip health checks or configure slow ones. But health checks are what enable automatic failover: if a server fails, the load balancer stops routing traffic to it. Slow health checks delay failover, and inaccurate health checks cause false positives. Configure them deliberately: fast checks, appropriate intervals, and accurate health indicators.

How this breaks systems in the real world:

A service had health checks that took 30 seconds because each check queried the database and external APIs. When a server failed, the health check took 30 seconds to detect the failure, and only then did the load balancer stop routing traffic to it. During those 30 seconds, requests were still sent to the failed server and returned errors. The fix? Use a fast health check (a simple endpoint that responds in 1-2 seconds) so that failures are detected quickly and traffic fails over promptly. Now failures are detected and handled quickly. But the real lesson is: health checks must be fast. Slow health checks delay failover.
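To see why check duration matters, here is a minimal Python sketch of the bookkeeping a load balancer might do with probe results. It is an assumption-laden illustration: the threshold names and defaults are hypothetical, no real network probe is performed, and the detection-time estimate assumes each probe starts `interval` seconds after the previous one finishes.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Tracks one backend's health from a stream of probe results."""
    fail_threshold: int = 2   # consecutive failures before marking unhealthy
    rise_threshold: int = 2   # consecutive successes before marking healthy
    healthy: bool = True
    _streak: int = 0          # consecutive results that disagree with `healthy`

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result and return the current health status."""
        if probe_ok == self.healthy:
            self._streak = 0  # result agrees with current status
        else:
            self._streak += 1
            limit = self.rise_threshold if probe_ok else self.fail_threshold
            if self._streak >= limit:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy


def worst_case_detection_seconds(interval, timeout, fail_threshold):
    """Rough upper bound on the time from failure to being marked unhealthy."""
    return fail_threshold * (interval + timeout)


# Illustrative numbers only: a 30-second check versus a 2-second check.
slow = worst_case_detection_seconds(interval=10, timeout=30, fail_threshold=2)
fast = worst_case_detection_seconds(interval=5, timeout=2, fail_threshold=2)
```

Requiring more than one consecutive failure guards against a single dropped probe removing a healthy server, but every extra probe adds to the detection time, so a slow check is paid for several times over.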

What interviewers are really listening for:

They want to hear you talk about health checks, failover, and health check configuration. Junior engineers say "just enable health checks." Senior engineers say "configure fast, accurate health checks, because slow checks delay failover and inaccurate checks cause false positives. Health checks are what enable automatic failover when servers fail." They're testing whether you understand that health checks are about failover speed, not just "checking health."

Session Persistence

What people usually get wrong:

Engineers often enable session persistence for all applications, thinking "it's safer." But session persistence reduces load distribution flexibility, and if a server fails, the sessions pinned to it are lost. Stateless applications don't need session persistence because any server can handle any request. Use session persistence only for stateful applications that require session state; enabling it for stateless applications only reduces flexibility.

How this breaks systems in the real world:

A stateless API service enabled session persistence (sticky sessions). When a server failed, every client pinned to that server lost its session and couldn't reconnect, because the load balancer kept trying to route those clients to the failed server. The fix? Disable session persistence for stateless APIs, since any server can handle any request. Now server failures don't affect clients: the load balancer routes them to healthy servers. But the real lesson is: session persistence is only needed for stateful applications. Don't enable it unnecessarily.
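The difference between pinned and unpinned routing can be sketched in a few lines of Python. This is a toy model, not a real load balancer: the `Router` class, its `pins` table, and the backend names are hypothetical. The sticky branch re-pins a client when its backend is no longer healthy, which is the fallback the failing setup in this scenario lacked.

```python
import itertools


class Router:
    """Toy router with optional sticky sessions (client -> pinned backend)."""

    def __init__(self, backends, sticky):
        self.backends = list(backends)
        self.healthy = set(backends)
        self.sticky = sticky
        self.pins = {}  # client id -> backend it is pinned to
        self._counter = itertools.count()

    def _next_healthy(self):
        """Round-robin over the backends that are currently healthy."""
        candidates = [b for b in self.backends if b in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        return candidates[next(self._counter) % len(candidates)]

    def route(self, client_id):
        if not self.sticky:
            return self._next_healthy()  # stateless: any healthy backend will do
        pinned = self.pins.get(client_id)
        if pinned not in self.healthy:
            # Pin is missing or points at a failed backend, so re-pin.
            # Any session state kept only on the old backend is gone.
            pinned = self._next_healthy()
            self.pins[client_id] = pinned
        return pinned


sticky = Router(["a", "b"], sticky=True)
first = sticky.route("alice")
sticky.healthy.discard(first)      # simulate the pinned backend failing
second = sticky.route("alice")

stateless = Router(["a", "b"], sticky=False)
stateless_picks = [stateless.route("alice") for _ in range(4)]
```

Even with the re-pin fallback, the sticky client ends up on a backend that has never seen its session. For a stateless API that does not matter, which is why pinning adds risk there without adding any benefit.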

What interviewers are really listening for:

They want to hear you talk about session persistence, when it's needed, and its trade-offs. Junior engineers say "just enable sticky sessions." Senior engineers say "use session persistence only for stateful applications that require session state. Stateless applications don't need it, and it reduces load distribution flexibility." They're testing whether you understand that session persistence is about state, not just "routing."

Key Takeaways

Load balancing algorithms determine traffic distribution: choose based on use case (round-robin, least connections, weighted)

Health checks enable automatic failover: configure fast, accurate health checks for quick failover

Session persistence enables stateful applications: use it only when needed, because it reduces flexibility

Layer 4 (TCP) and Layer 7 (HTTP) load balancing have different capabilities: Layer 4 for performance, Layer 7 for flexibility

Load balancing enables horizontal scaling: add servers and traffic is distributed to them automatically

Load balancers can become single points of failure: design high availability for the load balancers themselves

Good load balancing distributes traffic evenly and handles failures automatically

