Interrupts & Traps

Understand interrupts (hardware events) and traps (software exceptions) in operating systems.

Medium · 8 min read

Why This Matters

Think of interrupts like a doorbell. When someone rings the doorbell (hardware event), you stop what you're doing (the CPU suspends the current task), answer the door (handle the interrupt), then go back to what you were doing (resume the previous task). Interrupts allow the CPU to respond to events (keyboard input, network packets, timer ticks) without constantly checking for them (polling). Traps are like software doorbells: they are raised by the instruction the CPU is currently executing, either deliberately (a system call) or because of an error (an exception such as division by zero).

This matters because interrupts are how the OS responds to events efficiently. Without interrupts, the CPU would have to constantly poll devices ("is there a keypress? is there a network packet?"), wasting cycles whenever the answer is "no". With interrupts, a device notifies the CPU only when something has actually happened, so the CPU can do other work in between. Understanding interrupts helps you understand how I/O works and how the OS handles events.

In interviews, when someone asks "How does the OS handle I/O?", they're testing whether you understand interrupts. Do you know how interrupts work? Do you understand interrupt handlers? Most engineers don't. They just use I/O and assume it works.

What Engineers Usually Get Wrong

Most engineers think "interrupts are just events." But handling an interrupt involves saving CPU state, switching to the interrupt handler, running it, and restoring state, and each of those steps has a cost. Interrupts can also be nested (an interrupt handler can itself be interrupted), which requires careful handling. Knowing this helps you reason about interrupt overhead and system behavior.

Engineers also overlook that interrupts can be disabled. The OS disables interrupts during short critical sections (like updating kernel data structures that an interrupt handler also touches) to prevent race conditions. If interrupts stay disabled too long, the system becomes unresponsive because pending events cannot be handled. This is why some kernel operations must be fast.

How This Breaks Systems in the Real World

A service was experiencing high interrupt rates. A network card was generating thousands of interrupts per second (one per packet). The CPU spent most of its time handling interrupts, leaving little time for actual work, and the service became slow. The fix? Use interrupt coalescing (the card raises one interrupt for a batch of packets) or NAPI (the Linux mechanism that switches the driver from per-packet interrupts to polling at high packet rates). Both reduce interrupt overhead and improve performance.

Another story: A piece of kernel code (user-mode services cannot disable interrupts themselves) kept interrupts disabled for too long during a critical section. During this time, the system couldn't handle any events (keyboard, network, timer), so it appeared frozen. The fix? Minimize interrupt disable time. Only disable interrupts for the shortest time necessary, and use other synchronization mechanisms (locks) when possible. Understanding interrupt handling helps you write efficient kernel code.


Examples

Example 1: Hardware Interrupt (Network Packet)

Network card receives packet
Network card generates interrupt signal
CPU stops current task, saves state
CPU jumps to interrupt handler (kernel mode)
Interrupt handler processes packet
CPU restores state, resumes previous task

Asynchronous: Interrupt can arrive at any time
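
The flow above can be imitated in user space with POSIX signals, which are only an analogy for hardware interrupts: the kernel suspends the program, runs a handler, then resumes where it left off. This is a minimal sketch assuming a POSIX system; `SIGUSR1` stands in for the network card's interrupt, and the names `on_packet_signal`, `packets_seen`, and `simulate_packet_interrupts` are hypothetical. It uses `raise()` so that delivery is deterministic, whereas a real interrupt can arrive at any point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t is safe to access from a signal handler. */
static volatile sig_atomic_t packets_seen = 0;

/* Plays the role of the interrupt handler: do the minimum and return. */
static void on_packet_signal(int signo)
{
    (void)signo;
    packets_seen++;
}

/* Installs the handler, "receives" count packets, and returns how many
 * times the handler ran (or -1 if the handler could not be installed). */
int simulate_packet_interrupts(int count)
{
    struct sigaction sa;

    assert(count >= 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_packet_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    packets_seen = 0;
    for (int i = 0; i < count; i++) {
        /* The "device" signals; the handler runs before raise() returns,
         * then this loop resumes exactly where it was suspended. */
        raise(SIGUSR1);
    }
    return (int)packets_seen;
}
```

Calling `simulate_packet_interrupts(3)` runs the handler three times and returns 3; the loop itself never checks for packets, which is the point of interrupt-driven I/O.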

Example 2: Software Trap (System Call)

User program calls read()
Program executes trap instruction (software exception)
CPU switches to kernel mode
Kernel handles system call
CPU switches back to user mode
Return to user program

Synchronous: Trap is explicitly triggered by program
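
To make the trap visible, the sketch below bypasses the usual C library wrapper and goes through the generic `syscall()` entry point. It assumes Linux with glibc; `raw_getpid` is a hypothetical helper name, and `getpid` is used instead of `read()` only because it needs no arguments. On each architecture `syscall()` ends up executing that architecture's trap instruction to enter the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our process ID via an explicit system call.
 * syscall() loads the call number, traps into kernel mode, and
 * returns the kernel's result once the CPU is back in user mode. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0); /* getpid has no failure mode on Linux */
    return pid;
}
```

The value returned by `raw_getpid()` matches what the ordinary `getpid()` wrapper reports, because both paths end in the same kernel handler.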

Example 3: Interrupt Nesting

Low-priority interrupt handler running
High-priority interrupt arrives
Low-priority handler interrupted
High-priority handler runs
High-priority handler completes
Low-priority handler resumes

Nesting: Interrupts can interrupt other interrupts
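
The nesting sequence above can be modeled with a tiny simulated interrupt controller. This is a sketch under stated assumptions, not how any particular controller works: priorities are small integers where a larger number wins, and `raise_irq`, `dispatch`, and `nesting_demo` are hypothetical names. A request at or below the running priority stays pending until the current handler exits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PRIO 4                 /* valid IRQ priorities: 1..3; 0 = normal code */

typedef void (*handler_fn)(void);

static handler_fn handlers[MAX_PRIO];
static int  pending[MAX_PRIO];     /* pending[p] != 0: IRQ p is waiting */
static int  current_prio = 0;      /* priority of the code now running  */
static char trace[256];            /* event log, e.g. "enter1 exit1 "   */

static void log_event(const char *what, int prio)
{
    char item[32];
    snprintf(item, sizeof item, "%s%d ", what, prio);
    strncat(trace, item, sizeof trace - strlen(trace) - 1);
}

static void dispatch(int prio)
{
    int saved = current_prio;      /* "save state" */
    current_prio = prio;
    log_event("enter", prio);
    if (handlers[prio])
        handlers[prio]();
    log_event("exit", prio);
    current_prio = saved;          /* "restore state" */

    /* Deliver anything that became pending and now outranks us. */
    for (int p = MAX_PRIO - 1; p > current_prio; p--) {
        if (pending[p]) {
            pending[p] = 0;
            dispatch(p);
        }
    }
}

void raise_irq(int prio)
{
    assert(prio > 0 && prio < MAX_PRIO);
    if (prio > current_prio)
        dispatch(prio);            /* preempts whatever is running */
    else
        pending[prio] = 1;         /* held until the current handler exits */
}

static void high_handler(void) { /* short, critical work */ }

static void low_handler(void)
{
    raise_irq(3);                  /* high-priority IRQ arrives mid-handler */
}

/* Runs the scenario from Example 3 and returns the event log. */
const char *nesting_demo(void)
{
    trace[0] = '\0';
    current_prio = 0;
    memset(pending, 0, sizeof pending);
    handlers[1] = low_handler;
    handlers[3] = high_handler;
    raise_irq(1);
    return trace;
}
```

`nesting_demo()` returns `"enter1 enter3 exit3 exit1 "`: the low-priority handler starts, is preempted by the high-priority one, and resumes only after it completes.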


Common Pitfalls

Pitfall 1: Interrupt handlers doing too much work

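  • Problem: A handler typically runs with at least its own interrupt line masked (and on some systems with all interrupts on that CPU disabled), so a slow handler delays other interrupts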
  • Solution: Keep interrupt handlers minimal (set flags, queue work), do heavy work in bottom half or deferred work
  • Example: Network interrupt handler should just queue packet, not process it fully
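
A minimal sketch of that split, assuming a fixed-size ring buffer shared between the two halves. The names `packet_queue`, `top_half_enqueue`, and `bottom_half_drain` are hypothetical, and the "processing" is a stand-in sum. The sketch is single-threaded; real kernel code would also need to protect the queue, for example by briefly masking the interrupt or using a lock-free ring.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8

struct packet_queue {
    int    items[QUEUE_CAP];
    size_t head;    /* next slot to read  */
    size_t tail;    /* next slot to write */
    size_t count;   /* packets currently queued */
};

/* Top half: constant-time work only. Records the packet and returns.
 * Returns 1 on success, 0 if the queue is full and the packet is dropped. */
int top_half_enqueue(struct packet_queue *q, int packet_id)
{
    assert(q != NULL);
    if (q->count == QUEUE_CAP)
        return 0;
    q->items[q->tail] = packet_id;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    return 1;
}

/* Bottom half: runs later, outside the interrupt handler, and does the
 * expensive part. Returns the number of packets processed. */
int bottom_half_drain(struct packet_queue *q, long *work_done)
{
    int processed = 0;

    assert(q != NULL && work_done != NULL);
    while (q->count > 0) {
        int id = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        *work_done += id;   /* placeholder for real protocol processing */
        processed++;
    }
    return processed;
}
```

The design choice is that the time spent in `top_half_enqueue` does not depend on how expensive packet processing is, so the window in which other interrupts are delayed stays small and predictable.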

Pitfall 2: Not handling interrupt nesting

  • Problem: Interrupt handlers can be interrupted by higher-priority interrupts
  • Solution: Design handlers to be re-entrant, use proper locking, avoid shared state
  • Example: Interrupt handler accessing shared data without proper synchronization can cause race conditions

Pitfall 3: Disabling interrupts for too long

  • Problem: Disabling interrupts too long makes system unresponsive (can't handle events)
  • Solution: Minimize interrupt disable time, use other synchronization mechanisms when possible
  • Example: Protecting a long critical section by disabling interrupts, instead of taking a lock, can make the system appear hung until the section finishes
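
User-space code cannot disable hardware interrupts, but blocking a signal with `sigprocmask` behaves analogously and shows why the masked window must be short: an event that arrives inside it is not lost, but it is not handled until the mask is lifted. This is a sketch assuming a single-threaded POSIX program; `masked_critical_section_demo` and `on_signal` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t handled = 0;

static void on_signal(int signo)
{
    (void)signo;
    handled = 1;
}

/* Returns 1 if the event stayed pending during the masked section and was
 * delivered as soon as the mask was lifted, 0 otherwise, -1 on setup error. */
int masked_critical_section_demo(void)
{
    struct sigaction sa;
    sigset_t mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    handled = 0;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);

    sigprocmask(SIG_BLOCK, &mask, NULL);    /* "disable interrupts" */
    raise(SIGUSR1);                         /* event arrives mid-section */
    int seen_inside = handled;              /* handler has not run yet */
    /* ... critical section work would go here; keep it short ... */
    sigprocmask(SIG_UNBLOCK, &mask, NULL);  /* "enable interrupts" */

    int seen_after = handled;               /* pending signal was delivered */
    assert(seen_inside == 0 || seen_inside == 1);
    return seen_inside == 0 && seen_after == 1;
}
```

Everything between the two `sigprocmask` calls adds directly to the delay before the event is handled, which is the user-space equivalent of interrupt latency caused by a long interrupts-off section.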

Pitfall 4: Ignoring interrupt overhead

  • Problem: High interrupt rates can overwhelm CPU (interrupt storm)
  • Solution: Use interrupt coalescing, NAPI for network, batch interrupt processing
  • Example: Network card generating one interrupt per packet can overwhelm CPU at high packet rates

Pitfall 5: Not distinguishing interrupts from traps

  • Problem: Confusing hardware interrupts with software traps (system calls, exceptions)
  • Solution: Remember that interrupts are asynchronous events raised by hardware, while traps are synchronous events raised by the instruction being executed
  • Example: System calls use traps (software), not interrupts (hardware)

Interview Questions

Beginner

Q: What is the difference between an interrupt and a trap?

A: An interrupt is a hardware event that occurs asynchronously (e.g., keyboard press, network packet arrival, timer tick). The hardware generates an interrupt signal, and the CPU stops what it's doing to handle it. A trap is a software-caused event that occurs synchronously, as a direct result of the instruction being executed. Some traps are deliberate (a system call executes a dedicated trap instruction), while others are raised by the CPU when an instruction cannot complete normally (division by zero, page fault). Both cause the CPU to switch to kernel mode and run a handler, but interrupts are hardware-driven and asynchronous, while traps are software-driven and synchronous.


Intermediate

Q: Why can high interrupt rates degrade system performance, and how can you mitigate this?

A: High interrupt rates degrade performance because:

  1. Overhead: Each interrupt requires saving and restoring CPU state on entry and exit (similar to, though lighter than, a full context switch)
  2. Cache pollution: Interrupt handlers can evict useful data from CPU cache
  3. CPU time: CPU spends time handling interrupts instead of doing useful work
  4. Interrupt storms: Extremely high rates can overwhelm the CPU

Mitigation strategies:

  • Interrupt coalescing: Batch multiple interrupts together (e.g., process 10 packets per interrupt instead of 1)
  • NAPI (Linux): For network devices, use polling mode at high packet rates instead of interrupt-per-packet
  • Priority interrupts: Use interrupt priorities to defer non-critical interrupts
  • Bottom halves: Move heavy processing out of interrupt handler to deferred work
  • Hardware offloading: Use hardware features to reduce interrupt rate (e.g., TCP offloading)

For example, a network card generating 100,000 interrupts/second (one per packet) can consume significant CPU. Using NAPI or interrupt coalescing to batch 100 packets per interrupt reduces interrupt rate to 1,000/second, dramatically improving performance.
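
The arithmetic in that example can be written down as two small helpers. This is a sketch: `interrupts_per_second` and `interrupt_cpu_fraction` are hypothetical names, and the per-interrupt cost is a parameter you would have to measure on your own system, not a figure taken from this article.

```c
#include <assert.h>

/* Interrupts per second when the device raises one interrupt per
 * batch_size packets (rounded up, since a partial batch still interrupts). */
long interrupts_per_second(long packets_per_second, long batch_size)
{
    assert(packets_per_second >= 0 && batch_size > 0);
    return (packets_per_second + batch_size - 1) / batch_size;
}

/* Fraction of one CPU spent on interrupt entry/exit and handler time,
 * given an assumed cost per interrupt in microseconds. */
double interrupt_cpu_fraction(long irqs_per_second, double cost_us_per_irq)
{
    assert(irqs_per_second >= 0 && cost_us_per_irq >= 0.0);
    return (double)irqs_per_second * cost_us_per_irq / 1e6;
}
```

With 100,000 packets per second, a batch size of 1 gives 100,000 interrupts per second and a batch size of 100 gives 1,000. If each interrupt cost a purely illustrative 5 µs, that would be the difference between about 50% and 0.5% of one CPU.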


Senior

Q: How would you design an interrupt handling system for a real-time embedded system that needs to handle multiple high-priority interrupts with strict timing requirements?

A: I would design a priority-based interrupt handling system:

  1. Interrupt priorities:

    • Assign priorities based on timing requirements (highest for critical real-time events)
    • Use hardware interrupt controllers (like ARM GIC) that support priority-based preemption
    • Ensure critical interrupts can preempt lower-priority ones
  2. Two-level interrupt handling:

    • Top half (ISR): Minimal work in interrupt handler (acknowledge interrupt, save minimal state)
    • Bottom half: Defer non-critical work to bottom half or task context
    • Use work queues or tasklets for deferred work
  3. Interrupt nesting control:

    • Allow nested interrupts for higher-priority interrupts
    • Disable lower-priority interrupts during critical sections
    • Use interrupt masking carefully to prevent priority inversion
  4. Deterministic timing:

    • Measure and bound interrupt handler execution time
    • Use worst-case execution time (WCET) analysis
    • Ensure interrupt handlers complete within timing constraints
  5. Interrupt coalescing (selective):

    • Use coalescing only for non-critical interrupts
    • Keep critical interrupts immediate for low latency
    • Balance latency vs. CPU overhead
  6. Hardware support:

    • Use hardware features (DMA, interrupt controllers) to reduce software overhead
    • Offload work to hardware when possible
    • Use dedicated interrupt lines for critical events
  7. Monitoring and debugging:

    • Track interrupt latencies and handler execution times
    • Monitor interrupt rates and CPU time spent in interrupts
    • Use hardware timers to measure interrupt response time
  8. Testing:

    • Test worst-case interrupt scenarios
    • Verify timing constraints under load
    • Test interrupt nesting and priority handling

This design ensures deterministic, low-latency interrupt handling while maintaining system responsiveness.


  • System Calls - How system calls use traps to switch from user mode to kernel mode

  • Kernel Mode vs User Mode - How interrupts and traps trigger mode switches

  • I/O Management - How interrupts are used to handle I/O events from devices

  • Context Switching - How interrupts trigger context switches for process scheduling

  • Scheduling Algorithms - How timer interrupts trigger scheduling decisions

Key Takeaways

Interrupts: Hardware events that interrupt CPU execution, require saving/restoring state

Traps: Synchronous, software-caused events (system calls, faults such as division by zero) that switch the CPU to kernel mode

Interrupt handling: Save state → handle interrupt → restore state (has overhead)

Interrupt nesting: Interrupt handlers can be interrupted, requires careful handling

Interrupt disable: OS disables interrupts during critical sections (must be brief)

Best practices: Keep interrupt handlers short, defer heavy work to bottom halves, and use coalescing or polling at high interrupt rates


About the author

InterviewCrafted helps you master system design with patience. We believe in curiosity-led engineering, reflective writing, and designing systems that make future changes feel calm.