Circuit Breakers: Prevent Cascades & Handle Dependency Failure
Use circuit breakers to prevent cascading failures: states, thresholds, fallbacks, and how they pair with retries/timeouts.
Circuit Breakers
Why Engineers Care About This
Circuit breakers prevent cascading failures. When a dependency fails repeatedly, the breaker "opens" and stops calling it, which keeps callers from overwhelming the dependency and keeps the failure from spreading. This enables graceful degradation: services can fail fast or use fallbacks instead of waiting for timeouts. The cost is added complexity in state management, failure detection, and recovery logic.
If failures cascade through your system, slow dependencies cause timeouts, or you can't fail fast when a dependency is down, you're hitting the problems circuit breakers solve. These problems compound. Without circuit breakers, one failing dependency causes many downstream failures, and one slow dependency ties up callers across the system until they time out. Circuit breakers contain both by failing fast and enabling graceful degradation.
In interviews, when someone asks "How would you handle failing dependencies?", they're really asking: "Do you understand circuit breakers? Do you know when to use circuit breakers vs retries? Do you understand state management and recovery?" Many engineers don't. They retry forever (overwhelming failing services) or don't handle failures at all.
Core Intuitions You Must Build
- Circuit breakers have three states: closed, open, half-open. Closed: normal operation, requests pass through. Open: the dependency is failing, so requests fail fast without calling it. Half-open: the breaker is testing whether the dependency has recovered, so it lets a limited number of trial requests (often one) through. The circuit opens when a failure threshold is reached (e.g., 5 failures in 10 seconds). After a timeout (e.g., 30 seconds) it moves to half-open to test recovery. It closes if the test succeeds and reopens if the test fails.
- Circuit breakers fail fast, preventing cascading failures. When the circuit is open, requests fail immediately: the dependency is not called and nobody waits for a timeout. This avoids overwhelming a failing dependency and stops the failure from spreading. Failing fast beats failing slowly because users get an error quickly instead of waiting for a timeout, and because the caller can switch to a fallback (cached data, default values) right away.
- Circuit breakers are for repeated failures, not single failures. A circuit opens when failures repeat (e.g., 5 failures in 10 seconds). A single failure doesn't open it because that failure might be transient. Use retries for single failures and circuit breakers for patterns of failure.
- Fallback strategies enable graceful degradation. When the circuit is open, a service can use a fallback (cached data, default values, an alternative service) instead of failing. The service keeps operating with reduced functionality instead of failing completely. Design fallbacks for critical dependencies so the service can operate during outages.
- Circuit breaker configuration affects behavior. The failure threshold (how many failures open the circuit), the timeout (how long the circuit stays open), and the half-open test (how many trial requests to allow) all matter. A threshold that is too low causes false positives: the circuit opens on transient failures. A threshold that is too high delays failure detection. Configure these from the failure patterns and recovery times you expect.
- Circuit breakers require monitoring and alerting. An open circuit signals a problem: a dependency is failing. Monitor breaker states by tracking open circuits, failure rates, and recovery times. Alert when a circuit opens or stays open too long. This helps you catch problems early and debug dependency issues. Without monitoring, you won't know when dependencies are failing.
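The monitoring point can be made concrete with a small sketch. Everything named here is hypothetical: `BreakerMonitor`, `on_state_change`, `overdue`, the state strings, and the 60-second limit are illustrative choices, not the API of any particular library. The sketch assumes the breaker reports every state transition to the monitor.

```python
import time


class BreakerMonitor:
    """Records state transitions and flags circuits that stay open too long.

    Hypothetical helper: state names are plain strings
    ("closed", "open", "half_open") chosen for this sketch.
    """

    def __init__(self, max_open_seconds=60.0, clock=time.monotonic):
        self.max_open_seconds = max_open_seconds
        self.clock = clock            # injectable so tests can control time
        self.transitions = []         # (timestamp, name, old_state, new_state)
        self.open_since = {}          # breaker name -> time it first opened

    def on_state_change(self, name, old_state, new_state):
        now = self.clock()
        self.transitions.append((now, name, old_state, new_state))
        if new_state == "open":
            # Keep the first open time so a failed half-open probe
            # does not reset the "how long has this been down" clock.
            self.open_since.setdefault(name, now)
        elif new_state == "closed":
            self.open_since.pop(name, None)

    def overdue(self):
        """Names of breakers that have not closed within max_open_seconds."""
        now = self.clock()
        return sorted(
            name for name, opened in self.open_since.items()
            if now - opened >= self.max_open_seconds
        )
```

A real deployment would likely export these transitions to whatever metrics and alerting system is already in place; the list and dictionary here only stand in for that.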
Subtopics (Taught Through Real Scenarios)
Circuit Breaker States
What people usually get wrong:
Engineers often think "circuit breaker is just on/off." But circuit breakers have three states: closed (normal), open (failing), half-open (testing recovery). Understanding the states is critical: the circuit opens on repeated failures, moves to half-open to test recovery, and closes if the test succeeds. Don't implement a circuit breaker as a simple on/off switch; it needs state management.
How this breaks systems in the real world:
A service implemented a circuit breaker as a simple on/off switch. When the dependency failed, the circuit opened and stayed open. There was no half-open state to test recovery, so when the dependency recovered, the circuit stayed open and kept causing unnecessary failures. The fix? Implement a three-state circuit breaker: closed (normal), open (failing), half-open (testing recovery). Now the circuit tests recovery and closes when the dependency recovers. But the real lesson is: circuit breakers need state management. Three states enable recovery testing.
What interviewers are really listening for:
They want to hear you talk about circuit breaker states, state transitions, and recovery testing. Junior engineers say "circuit breaker is just on/off." Senior engineers say "circuit breakers have three states—closed (normal), open (failing), half-open (testing recovery)—with state transitions based on failure patterns and recovery." They're testing whether you understand that circuit breakers are state machines, not simple switches.
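To see the state machine in code, here is a minimal sketch. It is not a production implementation: the class and parameter names (`CircuitBreaker`, `failure_threshold`, `window_seconds`, `reset_timeout`) are assumptions made for illustration, the defaults mirror the example numbers used earlier (5 failures in 10 seconds, 30-second timeout), and it is not thread-safe, so the "one trial request" guarantee in half-open only holds for single-threaded callers.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the dependency."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, window_seconds=10.0,
                 reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self.reset_timeout = reset_timeout
        self.clock = clock           # injectable so tests can control time
        self.state = State.CLOSED
        self.failures = deque()      # timestamps of recent failures
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        now = self.clock()
        if self.state is State.OPEN:
            if now - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = State.HALF_OPEN   # timeout elapsed: allow a trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure(now)
            raise
        self._on_success()
        return result

    def _on_failure(self, now):
        if self.state is State.HALF_OPEN:
            self._open(now)                # trial failed: reopen
            return
        self.failures.append(now)
        # Drop failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if len(self.failures) >= self.failure_threshold:
            self._open(now)

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.state = State.CLOSED      # trial succeeded: resume normal operation
            self.failures.clear()

    def _open(self, now):
        self.state = State.OPEN
        self.opened_at = now
        self.failures.clear()
```

Usage would look like `breaker.call(fetch_profile, user_id)`, where `fetch_profile` is a hypothetical function that calls the dependency. The caller sees either the result, the dependency's own exception, or `CircuitOpenError` when the breaker fails fast.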
Fail Fast vs Retries
What people usually get wrong:
Engineers often use retries for everything, or circuit breakers for everything. But they solve different problems. Retries handle transient failures (network hiccups, temporary errors). Circuit breakers handle repeated failures (the dependency is down and needs time to recover). Use retries for single failures and circuit breakers for repeated failures. Don't retry when the circuit is open: the breaker will reject each attempt anyway, so retrying only wastes the caller's resources, and retries that bypass the breaker keep hammering a failing dependency.
How this breaks systems in the real world:
A service retried failed requests indefinitely. When a dependency went down, the retries piled up, overwhelming the dependency and spreading the failure. The fix? Use a circuit breaker: when the dependency fails repeatedly, open the circuit and fail fast, and don't retry while the circuit is open. Now failures are contained and the dependency can recover. But the real lesson is: retries and circuit breakers solve different problems. Use retries for transient failures, circuit breakers for repeated failures.
What interviewers are really listening for:
They want to hear you talk about fail fast vs retries, when to use each, and their trade-offs. Junior engineers say "just retry everything" or "just use circuit breakers." Senior engineers say "use retries for transient failures, circuit breakers for repeated failures—circuit breakers fail fast to prevent cascading failures, retries handle individual failures." They're testing whether you understand that fail fast and retries are complementary, not alternatives.
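One way to picture how the two fit together is a bounded retry helper that gives up immediately when the breaker has rejected the call. This is a sketch under assumptions: `CircuitOpenError`, `TransientError`, and `call_with_retries` are hypothetical names, and the exponential backoff with jitter is a common convention rather than something prescribed above.

```python
import random
import time


class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


def call_with_retries(func, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry transient failures a bounded number of times; never retry an open circuit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except CircuitOpenError:
            raise  # the breaker already decided to fail fast
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            # Exponential backoff plus random jitter between attempts.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))
```

In this arrangement `func` would typically be the breaker-wrapped call, so every attempt is still counted by the breaker and a string of failed retries can itself trip the circuit.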
Fallback Strategies
What people usually get wrong:
Engineers often fail completely when the circuit is open. But services can use fallbacks (cached data, default values, alternative services) to continue operating with reduced functionality. Design fallbacks for critical dependencies so the service degrades gracefully and keeps partial functionality.
How this breaks systems in the real world:
A service failed completely whenever a dependency's circuit was open. Users couldn't use the service at all, even though some functionality could have worked with fallbacks (cached data, default values). The fix? Implement fallbacks: when the circuit is open, use cached data or default values instead of failing completely. Now the service continues operating with reduced functionality. But the real lesson is: fallbacks enable graceful degradation. Don't fail completely when the circuit is open.
What interviewers are really listening for:
They want to hear you talk about fallback strategies, graceful degradation, and maintaining partial functionality. Junior engineers say "just fail when circuit is open." Senior engineers say "implement fallbacks when circuit is open—use cached data, default values, or alternative services to maintain partial functionality and enable graceful degradation." They're testing whether you understand that circuit breakers enable graceful degradation, not just "failing fast."
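A fallback path can be sketched in a few lines. The names here (`fetch_with_fallback`, `CircuitOpenError`, the `cache` dictionary) are hypothetical, and the sketch assumes that stale cached data is acceptable for the feature in question, which is a product decision rather than a given.

```python
class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


def fetch_with_fallback(key, primary, cache, default=None):
    """Return (value, source), where source is 'live', 'cache', or 'default'."""
    try:
        value = primary(key)
    except CircuitOpenError:
        # Circuit is open: degrade gracefully instead of failing the request.
        if key in cache:
            return cache[key], "cache"
        return default, "default"
    cache[key] = value  # refresh the cache on every successful call
    return value, "live"
```

Returning the source alongside the value lets the caller tell users, or its own metrics, that it is serving degraded data.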
Related Topics
- Error Handling & Logging: Handling circuit breaker failures
- Message Queues: Using circuit breakers with message queues
- System Design: Designing resilient systems with circuit breakers
Key Takeaways
- Circuit breakers have three states: closed (normal), open (failing), half-open (testing recovery)
- Circuit breakers fail fast: they prevent cascading failures by failing immediately when the circuit is open
- Circuit breakers are for repeated failures: use retries for single failures, circuit breakers for patterns
- Fallback strategies enable graceful degradation: use cached data or default values when the circuit is open
- Circuit breaker configuration affects behavior: configure thresholds and timeouts based on needs
- Circuit breakers require monitoring: track states, failure rates, and recovery times
- Good circuit breakers prevent cascading failures and enable graceful degradation