Circuit Breaker Pattern
Learn the circuit breaker pattern for fault tolerance. Understand states (closed, open, half-open), implementation, and preventing cascading failures.
The circuit breaker pattern prevents cascading failures in distributed systems by stopping requests to failing services, allowing them to recover, and providing fallback mechanisms.
What is a Circuit Breaker?
A circuit breaker is a design pattern that:
- Monitors the health of a service
- Stops sending requests while the service is failing
- Gives the service time to recover
- Resumes requests once the service has recovered
Analogy: Like an electrical circuit breaker, it cuts the current when the circuit is overloaded and so prevents damage.
Circuit Breaker States
1. Closed (Normal Operation)
State: Circuit is closed, requests flow through.
Requests → Circuit Breaker → Service
(All requests pass)
Behavior:
- Requests forwarded to service
- Failures counted
- If failure threshold reached → Open
2. Open (Failing)
State: Circuit is open, requests blocked.
Requests → Circuit Breaker → [BLOCKED]
(Requests fail fast)
Behavior:
- Requests immediately fail (no service call)
- Returns error or fallback
- After timeout → Half-Open
3. Half-Open (Testing)
State: Circuit is half-open, testing if service recovered.
Request → Circuit Breaker → Service (test)
(One request allowed)
Behavior:
- Allows one test request
- If successful → Closed
- If failed → Open
State Transitions
Closed → Open: Failure threshold reached
Open → Half-Open: Timeout elapsed
Half-Open → Closed: Test request succeeds
Half-Open → Open: Test request fails
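The four transitions above can be written down as a lookup table. The following is a minimal sketch, not part of any library: the `State` enum, the event names, and `next_state` are illustrative names chosen here.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.CLOSED, "threshold_reached"): State.OPEN,
    (State.OPEN, "timeout_elapsed"): State.HALF_OPEN,
    (State.HALF_OPEN, "test_succeeded"): State.CLOSED,
    (State.HALF_OPEN, "test_failed"): State.OPEN,
}


def next_state(state: State, event: str) -> State:
    """Return the state after `event`; unlisted pairs leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


if __name__ == "__main__":
    state = State.CLOSED
    for event in ["threshold_reached", "timeout_elapsed", "test_failed",
                  "timeout_elapsed", "test_succeeded"]:
        state = next_state(state, event)
        print(f"{event:>18} -> {state.value}")
```

Keeping the transitions in one table makes it easy to check that no state can be left by an event the pattern does not define.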
Implementation
Basic Circuit Breaker
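import time

class CircuitBreakerOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout  # seconds to stay open before testing recovery
        self.failures = 0
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def call(self, func, *args, **kwargs):
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time > self.timeout:
                # Timeout elapsed: let this call through as the test request
                self.state = 'HALF_OPEN'
            else:
                raise CircuitBreakerOpenError('Circuit breaker is OPEN')
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            # A failed test request reopens the circuit immediately
            if self.state == 'HALF_OPEN' or self.failures >= self.failure_threshold:
                self.state = 'OPEN'
            raise
        # Success: a passing test request closes the circuit.
        # Note: while CLOSED, failures accumulate across successes; reset the
        # counter here as well to count consecutive failures instead.
        if self.state == 'HALF_OPEN':
            self.state = 'CLOSED'
            self.failures = 0
        return result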
Circuit Breaker with Fallback
class CircuitBreakerWithFallback:
    def __init__(self, failure_threshold=5, timeout=60):
        self.circuit_breaker = CircuitBreaker(failure_threshold, timeout)
        self.fallback = None

    def set_fallback(self, fallback_func):
        self.fallback = fallback_func

    def call(self, func, *args, **kwargs):
        try:
            return self.circuit_breaker.call(func, *args, **kwargs)
        except CircuitBreakerOpenError:
            if self.fallback:
                return self.fallback(*args, **kwargs)
            raise
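One common fallback is to serve the last value that was fetched successfully. The sketch below assumes the function passed in is already protected by a breaker that raises `CircuitBreakerOpenError` while open; `LastGoodValueFallback` and its `key` parameter are hypothetical names used only for illustration.

```python
class CircuitBreakerOpenError(Exception):
    """Stand-in for the error a breaker raises while the circuit is open."""


class LastGoodValueFallback:
    """Remembers the latest successful result per key and serves it
    when the circuit is open."""

    def __init__(self):
        self._last_good = {}

    def call(self, key, func, default=None):
        try:
            result = func()
        except CircuitBreakerOpenError:
            # Circuit open: fall back to the remembered value, else the default
            return self._last_good.get(key, default)
        self._last_good[key] = result
        return result


if __name__ == "__main__":
    def rejected():
        raise CircuitBreakerOpenError()

    fallback = LastGoodValueFallback()
    fallback.call("prices", lambda: {"usd": 1.0})  # success is remembered
    print(fallback.call("prices", rejected))       # served from memory
```

Serving stale data is only acceptable when the caller can tolerate it, so a real implementation would probably attach an age limit to each remembered value.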
Circuit Breaker with Metrics
class CircuitBreakerWithMetrics:
    def __init__(self, failure_threshold=5, timeout=60):
        self.circuit_breaker = CircuitBreaker(failure_threshold, timeout)
        self.metrics = {
            'total_requests': 0,
            'successful_requests': 0,
            'failed_requests': 0,
            'circuit_open_count': 0
        }

    def call(self, func, *args, **kwargs):
        self.metrics['total_requests'] += 1
        try:
            result = self.circuit_breaker.call(func, *args, **kwargs)
            self.metrics['successful_requests'] += 1
            return result
        except CircuitBreakerOpenError:
            # Counts requests rejected while the circuit is open,
            # not the number of times the circuit has opened
            self.metrics['circuit_open_count'] += 1
            raise
        except Exception:
            self.metrics['failed_requests'] += 1
            raise
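Raw counters are easier to alert on once they are turned into rates. The helper below is a small sketch that assumes the same dictionary keys as the `metrics` attribute above; `summarize` is a name introduced here, not part of the class.

```python
def summarize(metrics: dict) -> dict:
    """Derive rates from the raw counters kept by the metrics wrapper."""
    total = metrics["total_requests"]
    if total == 0:
        return {"failure_rate": 0.0, "rejection_rate": 0.0}
    return {
        # Share of all requests where the service call itself raised
        "failure_rate": metrics["failed_requests"] / total,
        # Share of all requests rejected because the circuit was open
        "rejection_rate": metrics["circuit_open_count"] / total,
    }


if __name__ == "__main__":
    sample = {
        "total_requests": 10,
        "successful_requests": 6,
        "failed_requests": 3,
        "circuit_open_count": 1,
    }
    print(summarize(sample))
```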
Examples
HTTP Service with Circuit Breaker
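import requests
# Assumes the classes defined above are saved in circuit_breaker.py
from circuit_breaker import CircuitBreaker, CircuitBreakerOpenError

class HTTPClient:
    def __init__(self):
        self.circuit_breaker = CircuitBreaker(failure_threshold=5, timeout=60)
        self.cache = {}  # last successful response per URL

    def get(self, url):
        def _get():
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            return response.json()
        try:
            data = self.circuit_breaker.call(_get)
            self.cache[url] = data
            return data
        except CircuitBreakerOpenError:
            # Return cached data or default
            return self.get_cached_data(url)

    def get_cached_data(self, url, default=None):
        # Simple in-memory cache used as the fallback source
        return self.cache.get(url, default)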
Microservice with Circuit Breaker
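class ServiceUnavailableError(Exception):
    """Raised when a required downstream service cannot be used."""

class OrderService:
    def __init__(self, payment_service, inventory_service):
        # Service clients are injected and assumed to expose async methods
        self.payment_service = payment_service
        self.inventory_service = inventory_service
        # Assumes an async-aware breaker whose call() awaits the wrapped
        # coroutine. The synchronous CircuitBreaker above would return the
        # coroutine un-awaited and never observe its failure.
        self.payment_circuit = CircuitBreaker(failure_threshold=5, timeout=60)
        self.inventory_circuit = CircuitBreaker(failure_threshold=5, timeout=60)

    async def create_order(self, order_data):
        # Call payment service with circuit breaker
        try:
            payment = await self.payment_circuit.call(
                self.payment_service.charge,
                order_data.total
            )
        except CircuitBreakerOpenError:
            # Payment service down: Queue for later
            # (queue_payment is a hypothetical helper not shown here)
            await self.queue_payment(order_data)
            return {'status': 'queued'}
        # Call inventory service with circuit breaker
        try:
            inventory = await self.inventory_circuit.call(
                self.inventory_service.reserve,
                order_data.items
            )
        except CircuitBreakerOpenError:
            # Inventory service down: Refund payment.
            # Other inventory errors propagate; production code should
            # refund on those as well.
            await self.payment_service.refund(payment.id)
            raise ServiceUnavailableError('Inventory service unavailable')
        # save_order is a hypothetical persistence step returning the stored order
        order = await self.save_order(order_data, payment, inventory)
        return {'status': 'success', 'order_id': order.id}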
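The order example awaits the result of `call`, which only works if the breaker itself awaits the wrapped coroutine. A minimal sketch of such a breaker follows. `AsyncCircuitBreaker` is a name assumed here, it resets the failure count on every success (so the threshold means consecutive failures), and it does not limit how many test calls run concurrently while half-open.

```python
import asyncio
import time


class CircuitBreakerOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class AsyncCircuitBreaker:
    """Async variant: awaits the wrapped coroutine so failures are observed."""

    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.opened_at = None
        self.state = "CLOSED"

    async def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at > self.timeout:
                self.state = "HALF_OPEN"
            else:
                raise CircuitBreakerOpenError("Circuit breaker is OPEN")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = time.monotonic()
            raise
        # Any success closes the circuit and clears the failure count
        self.state = "CLOSED"
        self.failures = 0
        return result


async def demo():
    breaker = AsyncCircuitBreaker(failure_threshold=2, timeout=60)

    async def charge(amount):  # stand-in for a failing payment call
        raise ConnectionError("payment service unreachable")

    outcomes = []
    for _ in range(3):
        try:
            await breaker.call(charge, 10)
        except CircuitBreakerOpenError:
            outcomes.append("rejected")
        except ConnectionError:
            outcomes.append("failed")
    return outcomes


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['failed', 'failed', 'rejected']
```

The third call is rejected without touching the payment stand-in, which is the fail-fast behavior the order service relies on.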
Common Pitfalls
- Too sensitive: The circuit opens too quickly. Fix: Tune the failure threshold and timeout
- Too slow to recover: The circuit stays open too long. Fix: Reduce the timeout and test recovery
- No fallback: Users see raw errors. Fix: Implement a fallback (cached data, default values)
- Not monitoring: Nobody knows when the circuit opens. Fix: Add metrics and alerts
- Ignoring half-open: Recovery is never tested. Fix: Implement the half-open state properly
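One possible way to make a breaker less sensitive is to count only recent failures, so that old errors age out instead of accumulating forever. This is a sketch of that idea rather than a technique the sections above prescribe; `SlidingWindowFailures` and its injectable `clock` are assumptions made for the example.

```python
import time
from collections import deque


class SlidingWindowFailures:
    """Counts only the failures recorded within the last `window` seconds."""

    def __init__(self, window=30.0, clock=time.monotonic):
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._timestamps = deque()

    def record_failure(self):
        self._timestamps.append(self.clock())

    def count(self):
        cutoff = self.clock() - self.window
        # Drop failures that have aged out of the window
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)

    def should_open(self, threshold):
        return self.count() >= threshold


if __name__ == "__main__":
    now = [0.0]  # fake clock, in seconds
    failures = SlidingWindowFailures(window=30.0, clock=lambda: now[0])
    for t in (0.0, 10.0, 20.0):
        now[0] = t
        failures.record_failure()
    print(failures.should_open(3))  # True: three failures within 30 s
    now[0] = 31.0
    print(failures.should_open(3))  # False: the failure at t=0 has expired
```

A breaker could call `record_failure()` in its exception handler and use `should_open(threshold)` in place of a plain counter comparison.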
Interview Questions
Beginner
Q: What is the circuit breaker pattern and why is it used?
A:
The circuit breaker pattern prevents cascading failures by stopping requests to failing services.
Why used:
- Prevent cascading failures: Failing service doesn't bring down entire system
- Allow recovery: Gives failing service time to recover
- Fail fast: Returns error immediately instead of waiting for timeout
- Resource protection: Prevents overwhelming failing service
States:
- Closed: Normal operation, requests pass through
- Open: Service failing, requests blocked (fail fast)
- Half-Open: Testing if service recovered
Example:
Service A calls Service B
Service B starts failing
Circuit breaker opens after 5 failures
Service A gets immediate error (no wait)
Service B has time to recover
After timeout, circuit breaker tests (half-open)
If successful, circuit closes (normal operation)
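The timeline above can be replayed in a few lines. This is a sketch for illustration only: `Breaker`, `CircuitOpen`, and the injected `clock` are names assumed for the demo, and the fake clock lets the 60-second timeout pass without sleeping.

```python
class CircuitOpen(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class Breaker:
    def __init__(self, threshold, timeout, clock):
        self.threshold, self.timeout, self.clock = threshold, timeout, clock
        self.failures, self.opened_at, self.state = 0, None, "CLOSED"

    def call(self, func):
        if self.state == "OPEN":
            if self.clock() - self.opened_at <= self.timeout:
                raise CircuitOpen()       # fail fast, no service call
            self.state = "HALF_OPEN"      # timeout elapsed: allow a test call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.threshold:
                self.state, self.opened_at = "OPEN", self.clock()
            raise
        self.state, self.failures = "CLOSED", 0
        return result


def run_scenario():
    now = [0]          # fake clock, in seconds
    healthy = [False]  # whether Service B currently works

    def call_service_b():
        if not healthy[0]:
            raise ConnectionError("Service B is failing")
        return "ok"

    breaker = Breaker(threshold=5, timeout=60, clock=lambda: now[0])
    timeline = []

    def attempt():
        try:
            outcome = breaker.call(call_service_b)
        except CircuitOpen:
            outcome = "rejected"
        except ConnectionError:
            outcome = "failed"
        timeline.append((now[0], outcome, breaker.state))

    for _ in range(5):  # five failures open the circuit
        attempt()
    attempt()           # rejected immediately while open
    now[0], healthy[0] = 61, True
    attempt()           # test call succeeds, circuit closes
    return timeline


if __name__ == "__main__":
    for entry in run_scenario():
        print(entry)
```

Each timeline entry is `(time, outcome, state after the call)`, so the fifth entry shows the circuit opening and the last one shows it closing again.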
Intermediate
Q: Explain the circuit breaker states and transitions. How do you implement it?
A:
States:
1. Closed (Normal)
Requests → Circuit Breaker → Service
- Failures counted
- If failures >= threshold → Open
2. Open (Failing)
Requests → Circuit Breaker → [BLOCKED]
- Returns error immediately (fail fast)
- After timeout → Half-Open
3. Half-Open (Testing)
Request → Circuit Breaker → Service (test)
- If success → Closed
- If failure → Open
Implementation:
import time

class CircuitBreakerOpenError(Exception):
    pass

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.last_failure_time = None
        self.state = 'CLOSED'

    def call(self, func):
        if self.state == 'OPEN':
            if time.time() - self.last_failure_time > self.timeout:
                self.state = 'HALF_OPEN'
            else:
                raise CircuitBreakerOpenError()
        try:
            result = func()
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            # A failed test request reopens the circuit immediately
            if self.state == 'HALF_OPEN' or self.failures >= self.failure_threshold:
                self.state = 'OPEN'
            raise
        if self.state == 'HALF_OPEN':
            self.state = 'CLOSED'
            self.failures = 0
        return result
Transitions:
- Closed → Open: Failure threshold reached
- Open → Half-Open: Timeout elapsed
- Half-Open → Closed: Test succeeds
- Half-Open → Open: Test fails
Senior
Q: Design a circuit breaker system for a microservices architecture. How do you handle different failure types, implement fallbacks, and monitor circuit breaker health?
A:
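// Error types. TimeoutError and a numeric `status` field on errors are
// assumptions about how the service clients report failures.
class CircuitBreakerOpenError extends Error {}
class ServiceUnavailableError extends Error {}
class TimeoutError extends Error {}

type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';
type BreakerHealth = { state: BreakerState; failures: number; lastFailure: number };
type HealthStatus = Record<string, BreakerHealth>;

interface CircuitBreakerOptions {
  serviceName: string;
  failureThreshold: number;
  timeout: number;          // ms to stay open before testing recovery
  halfOpenMaxCalls: number; // test calls admitted while half-open
  metrics: Metrics;
}

// Extract an HTTP-style status code from an error, if it carries one
function statusOf(error: unknown): number | undefined {
  const status = (error as { status?: unknown } | null)?.status;
  return typeof status === 'number' ? status : undefined;
}

class MicroservicesCircuitBreaker {
  private breakers = new Map<string, CircuitBreaker>();
  private fallbackManager = new FallbackManager();
  private metrics = new Metrics();

  // Service clients are looked up by name; how they are registered is left open
  constructor(private services: Map<string, any>) {}

  // 1. Circuit Breaker per Service
  getCircuitBreaker(serviceName: string): CircuitBreaker {
    let breaker = this.breakers.get(serviceName);
    if (!breaker) {
      breaker = new CircuitBreaker({
        serviceName,
        failureThreshold: 5,
        timeout: 60000, // 1 minute
        halfOpenMaxCalls: 3,
        metrics: this.metrics
      });
      this.breakers.set(serviceName, breaker);
    }
    return breaker;
  }

  // 2. Call with Circuit Breaker
  async call(serviceName: string, method: string, ...args: any[]): Promise<any> {
    const breaker = this.getCircuitBreaker(serviceName);
    const service = this.getService(serviceName);
    try {
      return await breaker.execute(() => service[method](...args));
    } catch (error) {
      if (error instanceof CircuitBreakerOpenError) {
        // Circuit open: Use fallback
        return await this.fallbackManager.execute(serviceName, method, args);
      }
      throw error;
    }
  }

  private getService(serviceName: string): any {
    const service = this.services.get(serviceName);
    if (!service) {
      throw new ServiceUnavailableError(`Unknown service: ${serviceName}`);
    }
    return service;
  }

  // Health snapshot of every breaker, for dashboards and health endpoints
  getHealth(): HealthStatus {
    const status: HealthStatus = {};
    for (const [service, breaker] of this.breakers.entries()) {
      status[service] = breaker.health();
    }
    return status;
  }
}

// 3. Advanced Circuit Breaker
class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private failures = 0;
  private lastFailureTime = 0;
  private halfOpenCalls = 0;

  constructor(private options: CircuitBreakerOptions) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    // Check state
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.options.timeout) {
        this.state = 'HALF_OPEN';
        this.halfOpenCalls = 0;
      } else {
        throw new CircuitBreakerOpenError();
      }
    }
    // Half-open: Limit test calls
    if (this.state === 'HALF_OPEN') {
      if (this.halfOpenCalls >= this.options.halfOpenMaxCalls) {
        throw new CircuitBreakerOpenError();
      }
      this.halfOpenCalls++;
    }
    try {
      const result = await this.withTimeout(fn, 5000);
      this.recordSuccess();
      return result;
    } catch (error) {
      if (this.isClientError(error)) {
        // 4xx: the service responded, so treat the call as healthy.
        // This also keeps a half-open circuit from getting stuck.
        this.recordSuccess();
      } else {
        this.recordFailure(error);
      }
      throw error;
    }
  }

  health(): BreakerHealth {
    return { state: this.state, failures: this.failures, lastFailure: this.lastFailureTime };
  }

  private recordSuccess(): void {
    if (this.state === 'HALF_OPEN') {
      this.state = 'CLOSED';
      this.failures = 0;
    } else {
      this.failures = Math.max(0, this.failures - 1); // Decay
    }
  }

  private recordFailure(error: unknown): void {
    this.failures++;
    this.lastFailureTime = Date.now();
    // Different thresholds for different error types. A failed test call
    // while half-open reopens the circuit regardless of the count.
    const threshold = this.getThreshold(error);
    if (this.state === 'HALF_OPEN' || this.failures >= threshold) {
      this.state = 'OPEN';
      this.options.metrics.recordCircuitOpen(this.options.serviceName);
    }
  }

  private getThreshold(error: unknown): number {
    // Timeout: Lower threshold (service slow)
    if (error instanceof TimeoutError) {
      return 3;
    }
    // 5xx: Service error (higher threshold)
    const status = statusOf(error);
    if (status !== undefined && status >= 500) {
      return 5;
    }
    return this.options.failureThreshold;
  }

  // 4xx: Client error (does not count against the circuit)
  private isClientError(error: unknown): boolean {
    const status = statusOf(error);
    return status !== undefined && status >= 400 && status < 500;
  }

  // Reject with TimeoutError if fn does not settle within ms milliseconds
  private withTimeout<T>(fn: () => Promise<T>, ms: number): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new TimeoutError(`No response within ${ms} ms`)), ms);
    });
    return Promise.race([fn(), timeout]).finally(() => clearTimeout(timer));
  }
}

// 4. Fallback Strategies
class FallbackManager {
  async execute(serviceName: string, method: string, args: any[]): Promise<any> {
    // Strategy 1: Cached data
    const cached = await this.getCached(serviceName, method, args);
    if (cached) {
      return cached;
    }
    // Strategy 2: Default values
    const defaults = this.getDefaults(serviceName, method);
    if (defaults) {
      return defaults;
    }
    // Strategy 3: Alternative service
    const alternative = this.getAlternative(serviceName);
    if (alternative) {
      return await alternative[method](...args);
    }
    // Strategy 4: Queue for later
    await this.queueRequest(serviceName, method, args);
    throw new ServiceUnavailableError();
  }

  // Placeholder hooks: wire these to a real cache, configuration, and queue
  protected async getCached(serviceName: string, method: string, args: any[]): Promise<any> { return undefined; }
  protected getDefaults(serviceName: string, method: string): any { return undefined; }
  protected getAlternative(serviceName: string): any { return undefined; }
  protected async queueRequest(serviceName: string, method: string, args: any[]): Promise<void> {}
}

// 5. Monitoring
class Metrics {
  recordCircuitOpen(serviceName: string): void {
    // Alert when circuit opens
    this.alert({
      service: serviceName,
      event: 'circuit_open',
      timestamp: Date.now()
    });
  }

  // Placeholder: forward to a real alerting system
  private alert(event: { service: string; event: string; timestamp: number }): void {
    console.warn('ALERT', event);
  }
}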
Features:
- Per-service circuit breakers: Separate breaker for each service
- Error type handling: Different thresholds for different errors
- Fallback strategies: Cached data, defaults, alternative services
- Monitoring: Track circuit state, failures, alerts
- Timeout handling: Prevent hanging requests
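The error-type handling listed above can also be expressed in Python, the language used for the earlier examples. This is a sketch under stated assumptions: `ServiceError` is a hypothetical exception carrying an HTTP status, and the thresholds mirror the values used in the answer above rather than any recommended defaults.

```python
from typing import Optional


class ServiceError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def failure_threshold_for(error: Exception, default: int = 5) -> Optional[int]:
    """Return the threshold for this error, or None if it should not count."""
    if isinstance(error, TimeoutError):
        return 3  # slow service: trip sooner
    status = getattr(error, "status", None)
    if isinstance(status, int):
        if 400 <= status < 500:
            return None  # client error: the service itself is healthy
        if status >= 500:
            return 5  # server error
    return default


if __name__ == "__main__":
    for err in (TimeoutError(), ServiceError(503), ServiceError(404), ValueError("bad")):
        print(type(err).__name__, failure_threshold_for(err))
```

Returning `None` for client errors lets the caller skip the failure counter entirely instead of comparing against an unreachable threshold.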
Key Takeaways
- Circuit breaker: Prevents cascading failures by stopping requests to failing services
- States: Closed (normal), Open (failing), Half-Open (testing)
- Transitions: Closed → Open (threshold), Open → Half-Open (timeout), Half-Open → Closed/Open (test result)
- Implementation: Monitor failures, open circuit at threshold, test recovery
- Fallback: Return cached data, defaults, or queue requests when circuit open
- Benefits: Fail fast, allow recovery, prevent cascading failures
- Best practices: Adjust thresholds, implement fallbacks, monitor health, handle different error types