Topic Overview

Saga Pattern: Distributed Transactions, Orchestration & Recovery

Learn sagas for distributed transactions: choreography vs orchestration, compensations, idempotency, and failure handling.

12 min read

The Saga pattern manages distributed transactions across multiple microservices by breaking them into a series of local transactions with compensating actions. Each step has a corresponding compensation that undoes its effects.


What is the Saga Pattern?

Saga is a pattern for managing distributed transactions:

  • Local transactions: Each service performs its own transaction
  • Compensating transactions: Each step has a compensation
  • Eventual consistency: System eventually consistent (not immediately)
  • No 2PC: Avoids two-phase commit complexity

Why needed:

  • Microservices: Each service has its own database
  • No distributed transactions: 2PC too slow, doesn't scale
  • Need consistency: Must handle failures gracefully

Saga Types

1. Choreography (Decentralized)

Services coordinate via events:

Order Service: Create order → Publish OrderCreated
Inventory Service: Reserve → Publish InventoryReserved
Payment Service: Charge → Publish PaymentProcessed

If Payment fails:
  Payment Service: Publish PaymentFailed
  Inventory Service: Release inventory (compensate)
  Order Service: Cancel order (compensate)

No central coordinator

2. Orchestration (Centralized)

Orchestrator coordinates:

Orchestrator:
  1. Create order (Order Service)
  2. Reserve inventory (Inventory Service)
  3. Process payment (Payment Service)
  
If payment fails:
  Orchestrator:
    1. Refund payment (compensate)
    2. Release inventory (compensate)
    3. Cancel order (compensate)

Central coordinator


Compensating Transactions

Compensation undoes the effect of a transaction step.

Example:

Step 1: Create order → Compensation: Cancel order
Step 2: Reserve inventory → Compensation: Release inventory
Step 3: Charge payment → Compensation: Refund payment

Important: Compensation may not be perfect (e.g., refund may take time)


Examples

Saga Orchestration

class OrderSagaOrchestrator {
  async createOrder(orderData: OrderData): Promise<Order> {
    const saga = new Saga();
    
    // Step 1: Create order
    saga.addStep(
      async () => {
        const order = await this.orderService.createOrder(orderData);
        return { orderId: order.id };
      },
      async (context) => {
        await this.orderService.cancelOrder(context.orderId);
      }
    );
    
    // Step 2: Reserve inventory
    saga.addStep(
      async (context) => {
        await this.inventoryService.reserve(context.orderId, orderData.items);
        return context;
      },
      async (context) => {
        await this.inventoryService.release(context.orderId);
      }
    );
    
    // Step 3: Process payment
    saga.addStep(
      async (context) => {
        const payment = await this.paymentService.charge(context.orderId, orderData.total);
        return { ...context, paymentId: payment.id };
      },
      async (context) => {
        await this.paymentService.refund(context.paymentId);
      }
    );
    
    try {
      return await saga.execute();
    } catch (error) {
      // Saga failed, compensations executed
      throw error;
    }
  }
}

class Saga {
  private steps: SagaStep[] = [];
  
  addStep(execute: Function, compensate: Function): void {
    this.steps.push({ execute, compensate });
  }
  
  async execute(): Promise<any> {
    const context = {};
    const executed: SagaStep[] = [];
    
    try {
      for (const step of this.steps) {
        const result = await step.execute(context);
        Object.assign(context, result);
        executed.push(step);
      }
      return context;
    } catch (error) {
      // Compensate in reverse order
      for (const step of executed.reverse()) {
        try {
          await step.compensate(context);
        } catch (compError) {
          // Log compensation failure
          console.error('Compensation failed:', compError);
        }
      }
      throw error;
    }
  }
}

Saga Choreography

// Order Service
class OrderService {
  async createOrder(orderData: OrderData): Promise<Order> {
    const order = await this.database.createOrder(orderData);
    
    // Publish event
    await this.eventBus.publish('order.created', {
      orderId: order.id,
      items: orderData.items,
      total: orderData.total
    });
    
    return order;
  }
  
  // Subscribe to events
  constructor() {
    this.eventBus.subscribe('payment.failed', async (event) => {
      // Compensate: Cancel order
      await this.cancelOrder(event.orderId);
    });
  }
}

// Inventory Service
class InventoryService {
  constructor() {
    this.eventBus.subscribe('order.created', async (event) => {
      await this.reserveInventory(event.orderId, event.items);
    });
    
    this.eventBus.subscribe('payment.failed', async (event) => {
      // Compensate: Release inventory
      await this.releaseInventory(event.orderId);
    });
  }
  
  async reserveInventory(orderId: string, items: Item[]): Promise<void> {
    await this.database.reserve(items);
    
    await this.eventBus.publish('inventory.reserved', { orderId });
  }
}

// Payment Service
class PaymentService {
  constructor() {
    this.eventBus.subscribe('inventory.reserved', async (event) => {
      try {
        await this.processPayment(event.orderId);
        await this.eventBus.publish('payment.processed', { orderId: event.orderId });
      } catch (error) {
        // Publish failure event
        await this.eventBus.publish('payment.failed', { orderId: event.orderId });
      }
    });
  }
}

Common Pitfalls

  • Non-idempotent compensations: Compensation called multiple times. Fix: Make compensations idempotent
  • Lost compensation events: Events lost, compensation not executed. Fix: Persistent event bus, retries
  • Circular dependencies: Services waiting for each other. Fix: Design carefully, use timeouts
  • Not handling partial failures: Some steps succeed, some fail. Fix: Track executed steps, compensate all

Interview Questions

Beginner

Q: What is the Saga pattern and why is it used in microservices?

A:

Saga pattern manages distributed transactions across microservices using compensating transactions.

Why used:

  • Microservices: Each service has its own database
  • No distributed transactions: 2PC too slow, doesn't scale
  • Need consistency: Must handle failures gracefully

How it works:

1. Execute step 1 (local transaction)
2. Execute step 2 (local transaction)
3. Execute step 3 (local transaction)

If step 3 fails:
  1. Compensate step 3
  2. Compensate step 2
  3. Compensate step 1

Example:

Create order → Reserve inventory → Charge payment

If payment fails:
  Refund payment → Release inventory → Cancel order

Types:

  • Choreography: Services coordinate via events
  • Orchestration: Central orchestrator coordinates

Intermediate

Q: Explain the difference between Saga orchestration and choreography. When would you use each?

A:

Orchestration (Centralized):

Orchestrator coordinates workflow:

Orchestrator:
  1. Create order (Order Service)
  2. Reserve inventory (Inventory Service)
  3. Process payment (Payment Service)
  
If payment fails:
  Orchestrator compensates in reverse order

Benefits:

  • Clear workflow
  • Easy to handle failures
  • Better visibility
  • Centralized logic

Drawbacks:

  • Tight coupling to orchestrator
  • Single point of failure

Choreography (Decentralized):

Services coordinate via events:

OrderCreated Event:
  → Inventory Service: Reserve
  → Payment Service: Charge

PaymentFailed Event:
  → Inventory Service: Release (compensate)
  → Order Service: Cancel (compensate)

Benefits:

  • Loose coupling
  • No single point of failure
  • Easy to add services

Drawbacks:

  • Hard to track overall flow
  • Difficult to handle failures

When to use:

  • Orchestration: Complex workflows, need coordination
  • Choreography: Simple workflows, loose coupling needed

Senior

Q: Design a Saga system for an e-commerce platform that handles order processing across multiple services. How do you ensure compensations are executed, handle partial failures, and maintain consistency?

A:

class ECommerceSagaSystem {
  private sagaStore: SagaStore;
  private eventBus: EventBus;
  private compensationManager: CompensationManager;
  
  constructor() {
    this.sagaStore = new SagaStore();
    this.eventBus = new EventBus();
    this.compensationManager = new CompensationManager();
  }
  
  // 1. Saga Orchestration
  class OrderSaga {
    async execute(orderData: OrderData): Promise<Order> {
      const sagaId = this.generateSagaId();
      const saga = {
        id: sagaId,
        steps: [],
        state: 'RUNNING',
        context: {}
      };
      
      await this.sagaStore.save(saga);
      
      try {
        // Step 1: Create order
        const order = await this.executeStep(saga, async () => {
          return await this.orderService.createOrder(orderData);
        }, async (order) => {
          await this.orderService.cancelOrder(order.id);
        });
        
        // Step 2: Reserve inventory
        await this.executeStep(saga, async () => {
          return await this.inventoryService.reserve(order.id, orderData.items);
        }, async () => {
          await this.inventoryService.release(order.id);
        });
        
        // Step 3: Process payment
        await this.executeStep(saga, async () => {
          return await this.paymentService.charge(order.id, orderData.total);
        }, async (payment) => {
          await this.paymentService.refund(payment.id);
        });
        
        saga.state = 'COMPLETED';
        await this.sagaStore.save(saga);
        
        return order;
      } catch (error) {
        // Compensate
        await this.compensate(saga);
        throw error;
      }
    }
    
    async executeStep(saga: Saga, execute: Function, compensate: Function): Promise<any> {
      const step = {
        id: this.generateStepId(),
        execute,
        compensate,
        state: 'PENDING'
      };
      
      saga.steps.push(step);
      await this.sagaStore.save(saga);
      
      try {
        step.state = 'EXECUTING';
        await this.sagaStore.save(saga);
        
        const result = await execute();
        
        step.state = 'COMPLETED';
        step.result = result;
        await this.sagaStore.save(saga);
        
        return result;
      } catch (error) {
        step.state = 'FAILED';
        step.error = error;
        await this.sagaStore.save(saga);
        throw error;
      }
    }
    
    async compensate(saga: Saga): Promise<void> {
      saga.state = 'COMPENSATING';
      await this.sagaStore.save(saga);
      
      // Compensate in reverse order
      for (const step of saga.steps.reverse()) {
        if (step.state === 'COMPLETED') {
          try {
            await step.compensate(step.result);
            step.state = 'COMPENSATED';
          } catch (error) {
            // Log compensation failure
            step.state = 'COMPENSATION_FAILED';
            // Continue compensating other steps
          }
          await this.sagaStore.save(saga);
        }
      }
      
      saga.state = 'COMPENSATED';
      await this.sagaStore.save(saga);
    }
  }
  
  // 2. Saga Store (Persistence)
  class SagaStore {
    async save(saga: Saga): Promise<void> {
      // Persist saga state
      await this.database.upsert('sagas', saga);
    }
    
    async get(sagaId: string): Promise<Saga> {
      return await this.database.get('sagas', sagaId);
    }
    
    async getIncompleteSagas(): Promise<Saga[]> {
      // Get sagas that need recovery
      return await this.database.query(
        'SELECT * FROM sagas WHERE state IN (?, ?)',
        ['RUNNING', 'COMPENSATING']
      );
    }
  }
  
  // 3. Saga Recovery
  class SagaRecovery {
    async recover(): Promise<void> {
      const incompleteSagas = await this.sagaStore.getIncompleteSagas();
      
      for (const saga of incompleteSagas) {
        if (this.isStale(saga)) {
          // Saga stuck: Compensate
          await this.compensate(saga);
        } else {
          // Retry failed step
          await this.retryStep(saga);
        }
      }
    }
    
    isStale(saga: Saga): boolean {
      const lastUpdate = saga.updatedAt;
      const now = Date.now();
      return (now - lastUpdate) > 300000; // 5 minutes
    }
  }
  
  // 4. Idempotent Compensations
  class CompensationManager {
    async compensate(step: SagaStep, context: any): Promise<void> {
      // Check if already compensated
      const compensated = await this.isCompensated(step.id);
      if (compensated) {
        return; // Idempotent: Already compensated
      }
      
      // Execute compensation
      await step.compensate(context);
      
      // Mark as compensated
      await this.markCompensated(step.id);
    }
  }
}

Features:

  1. Saga persistence: Store saga state for recovery
  2. Step tracking: Track each step's state
  3. Compensation: Compensate in reverse order
  4. Recovery: Recover incomplete sagas
  5. Idempotent compensations: Safe to retry

  • Saga pattern: Manages distributed transactions using compensating transactions
  • Local transactions: Each service performs its own transaction
  • Compensating transactions: Each step has compensation to undo effects
  • Orchestration: Central orchestrator coordinates (clear workflow)
  • Choreography: Services coordinate via events (loose coupling)
  • Eventual consistency: System eventually consistent (not immediately)
  • Recovery: Persist saga state, recover incomplete sagas
  • Best practices: Idempotent compensations, track steps, handle partial failures

About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.