Interrupts & Traps

Understand interrupts (hardware events) and traps (software exceptions) in operating systems.

Why This Matters

Think of interrupts like a doorbell. When someone rings the doorbell (hardware event), you stop what you're doing (CPU stops current task), answer the door (handle the interrupt), then go back to what you were doing (resume previous task). Interrupts allow the CPU to respond to events (keyboard input, network packets, timer ticks) without constantly checking (polling). Traps are like software doorbells: they're triggered by the running program itself, either deliberately (system calls) or as a side effect of an instruction that faults (exceptions).

This matters because interrupts are how the OS responds to events efficiently. Without interrupts, the CPU would have to constantly poll devices ("is there a keypress? is there a network packet?"), wasting CPU cycles. Interrupts allow devices to notify the CPU when events occur, making the system efficient. Understanding interrupts helps you understand how I/O works and how the OS handles events.
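The cost difference can be illustrated with a toy simulation. This is a minimal sketch, not real device code: `simulate_polling`, `simulate_interrupts`, and the tick-based timeline are hypothetical names made up for illustration, and one loop iteration stands in for one unit of CPU time.

```python
def simulate_polling(event_ticks, total_ticks):
    """CPU checks the device on every tick, whether or not an event is ready."""
    events = set(event_ticks)
    checks = 0
    handled = 0
    for tick in range(total_ticks):
        checks += 1                  # one device check per tick
        if tick in events:
            handled += 1
    return checks, handled


def simulate_interrupts(event_ticks, total_ticks):
    """Device notifies the CPU; the handler runs only when an event occurs."""
    handler_runs = 0
    for tick in event_ticks:
        if 0 <= tick < total_ticks:
            handler_runs += 1        # work happens only on real events
    return handler_runs, handler_runs


events = [100, 4000, 9500]           # hypothetical ticks at which a key is pressed
poll_checks, poll_handled = simulate_polling(events, 10_000)
irq_runs, irq_handled = simulate_interrupts(events, 10_000)
print(f"polling: {poll_checks} checks for {poll_handled} events")
print(f"interrupts: {irq_runs} handler runs for {irq_handled} events")
```

In this model, polling spends 10,000 checks to catch 3 events, while the interrupt-driven version does work only 3 times.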

In interviews, when someone asks "How does the OS handle I/O?", they're testing whether you understand interrupts. Do you know how interrupts work? Do you understand interrupt handlers? Most engineers don't. They just use I/O and assume it works.

What Engineers Usually Get Wrong

Most engineers think "interrupts are just events." But handling an interrupt involves saving CPU state, switching to the interrupt handler, running it, and restoring the saved state, and each of those steps has a cost. Interrupts can also be nested (an interrupt handler can itself be interrupted), which requires careful handling. Understanding this helps you reason about interrupt overhead and system behavior.

Engineers also overlook that interrupts can be disabled. The OS disables interrupts during critical sections (like updating kernel data structures that an interrupt handler also touches) to prevent race conditions. If interrupts stay disabled too long, the system becomes unresponsive because pending events cannot be handled. This is why some operations must be fast.

How This Breaks Systems in the Real World

A service was experiencing high interrupt rates. A network card was generating thousands of interrupts per second (one per packet). The CPU spent most of its time handling interrupts, leaving little time for actual work, and the service became slow. The fix? Use interrupt coalescing (the device batches several events into one interrupt) or NAPI (the Linux mechanism that lets a network driver switch from interrupts to polling while traffic is heavy). Both reduce interrupt overhead and improve performance.

Another story: Kernel-level code was disabling interrupts for too long during a critical section. During this time, the system couldn't handle any events (keyboard, network, timer), so it appeared frozen. The fix? Minimize interrupt disable time. Only disable interrupts for the shortest time necessary, and use other synchronization mechanisms (locks) when possible. Understanding interrupt handling helps you write efficient kernel code.

Examples

Example 1: Hardware Interrupt (Network Packet)

Network card receives packet
  ↓
Network card generates interrupt signal
  ↓
CPU stops current task, saves state
  ↓
CPU jumps to interrupt handler (kernel mode)
  ↓
Interrupt handler processes packet
  ↓
CPU restores state, resumes previous task

Asynchronous: Interrupt can arrive at any time
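
The steps above can be modeled in a few lines. This is a minimal sketch under stated assumptions: `ToyCPU`, its two registers, the vector table, and IRQ number 11 are all hypothetical and chosen only for illustration; a real CPU saves state in hardware and on the kernel stack.

```python
class ToyCPU:
    """Toy model of interrupt entry and exit (not a real architecture)."""

    def __init__(self):
        self.registers = {"pc": 0, "acc": 0}
        self.mode = "user"
        self.vector_table = {}       # interrupt number -> handler function
        self.log = []

    def register_handler(self, irq, handler):
        self.vector_table[irq] = handler

    def interrupt(self, irq, payload):
        saved = (dict(self.registers), self.mode)   # save state
        self.mode = "kernel"                        # handlers run in kernel mode
        self.log.append(f"enter irq {irq}")
        self.vector_table[irq](self, payload)       # jump to the handler
        self.registers, self.mode = saved           # restore state
        self.log.append(f"exit irq {irq}")


received = []

def network_handler(cpu, packet):
    cpu.registers["acc"] = len(packet)   # the handler may clobber registers
    received.append(packet)


cpu = ToyCPU()
cpu.register_handler(11, network_handler)   # 11 is an arbitrary IRQ number
cpu.registers.update(pc=42, acc=7)          # state of the interrupted task
cpu.interrupt(11, b"hello")
print(cpu.registers, cpu.mode, received)
```

The interrupted task sees the same register values and mode after the handler returns, which is why it can resume as if nothing happened.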

Example 2: Software Trap (System Call)

User program calls read()
  ↓
Program executes trap instruction (software exception)
  ↓
CPU switches to kernel mode
  ↓
Kernel handles system call
  ↓
CPU switches back to user mode
  ↓
Return to user program

Synchronous: Trap is explicitly triggered by program
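
The same flow can be sketched as a table lookup. This is an illustrative model, not a real kernel interface: `ToyKernel`, its `trap` method, the file descriptor 3, and the use of number 0 for `read` are assumptions made for the example.

```python
class ToyKernel:
    """Toy model of system call dispatch through a trap."""

    def __init__(self):
        self.mode = "user"
        self.files = {3: b"interrupts and traps"}   # fd -> file contents
        self.syscall_table = {0: self.sys_read}     # syscall number -> handler
        self.mode_trace = []

    def sys_read(self, fd, count):
        self.mode_trace.append(self.mode)           # record the mode we ran in
        return self.files[fd][:count]

    def trap(self, number, *args):
        """Stand-in for the trap instruction: enter kernel, dispatch, return."""
        self.mode = "kernel"
        try:
            return self.syscall_table[number](*args)
        finally:
            self.mode = "user"                      # always drop back to user mode


kernel = ToyKernel()

def read(fd, count):
    """User-space wrapper, playing the role of a library stub that issues the trap."""
    return kernel.trap(0, fd, count)


data = read(3, 10)
print(data, kernel.mode_trace, kernel.mode)
```

The trap happens exactly at the point where the program asks for it, which is what makes it synchronous.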

Example 3: Interrupt Nesting

Low-priority interrupt handler running
  ↓
High-priority interrupt arrives
  ↓
Low-priority handler interrupted
  ↓
High-priority handler runs
  ↓
High-priority handler completes
  ↓
Low-priority handler resumes

Nesting: Interrupts can interrupt other interrupts
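
Nesting can be sketched with a small priority model. This is a simplified illustration under assumptions: `InterruptController`, the numeric priorities, and the rule that same-or-lower priority interrupts wait until the current handler ends are choices made for the example, loosely modeled on priority-based controllers rather than on any specific hardware.

```python
class InterruptController:
    """Toy priority controller: higher numbers preempt lower ones."""

    def __init__(self):
        self.current_priority = 0   # 0 means no handler is running
        self.pending = []           # (priority, name) deferred interrupts
        self.trace = []

    def raise_irq(self, priority, name, body=None):
        if priority <= self.current_priority:
            # Same or lower priority: hold it until the current handler ends.
            self.pending.append((priority, name))
            self.trace.append(f"defer {name}")
            return
        previous = self.current_priority
        self.current_priority = priority
        self.trace.append(f"start {name}")
        if body is not None:
            body(self)              # new interrupts may arrive while this runs
        self.trace.append(f"end {name}")
        self.current_priority = previous
        self._run_pending()

    def _run_pending(self):
        while True:
            runnable = [p for p in self.pending if p[0] > self.current_priority]
            if not runnable:
                return
            item = max(runnable)    # highest priority first
            self.pending.remove(item)
            self.raise_irq(*item)


def disk_handler(ctrl):
    ctrl.raise_irq(3, "timer")      # higher priority: nests immediately
    ctrl.raise_irq(1, "keyboard")   # same priority: deferred


ctrl = InterruptController()
ctrl.raise_irq(1, "disk", body=disk_handler)
print(ctrl.trace)
```

The trace shows the timer handler starting and finishing inside the disk handler, while the keyboard interrupt waits until the disk handler is done.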

Common Pitfalls

Pitfall 1: Interrupt handlers doing too much work

Problem: Interrupt handlers often run with some or all interrupts masked on the current CPU, so a slow handler delays every interrupt that is waiting behind it
Solution: Keep interrupt handlers minimal (set flags, queue work), do heavy work in bottom half or deferred work
Example: Network interrupt handler should just queue packet, not process it fully
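
A minimal sketch of this split, assuming a plain in-memory queue: `top_half`, `bottom_half`, and `work_queue` are hypothetical names, and summing packet lengths stands in for real protocol processing.

```python
from collections import deque

work_queue = deque()        # filled by the top half, drained later
stats = {"bytes": 0}


def top_half(packet):
    """Runs in interrupt context: do the minimum and return."""
    work_queue.append(packet)


def bottom_half():
    """Runs later, outside the interrupt handler: do the expensive processing."""
    processed = 0
    while work_queue:
        packet = work_queue.popleft()
        stats["bytes"] += len(packet)   # stand-in for protocol processing
        processed += 1
    return processed


for pkt in (b"syn", b"ack", b"data-data"):
    top_half(pkt)                       # three quick "interrupts"

queued = len(work_queue)
done = bottom_half()
print(queued, done, stats["bytes"])
```

The handler's cost stays constant per packet, and the variable-cost work runs where it cannot block other interrupts.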

Pitfall 2: Not handling interrupt nesting

Problem: Interrupt handlers can be interrupted by higher-priority interrupts
Solution: Design handlers to be re-entrant, use proper locking, avoid shared state
Example: Interrupt handler accessing shared data without proper synchronization can cause race conditions
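
The race can be reproduced deterministically in a toy model. This is a sketch, not real interrupt code: `increment`, `handler`, and the `masked` flag are hypothetical, and the "interrupt" is simply a function call placed between the read and the write.

```python
def handler(counter):
    """Interrupt handler that also updates the shared counter."""
    counter["value"] += 1


def increment(counter, fire_interrupt, masked):
    """Read-modify-write on shared state, optionally interrupted in the middle."""
    pending = False
    tmp = counter["value"]              # read
    if fire_interrupt:
        if masked:
            pending = True              # interrupt is held until we finish
        else:
            handler(counter)            # handler runs between read and write
    counter["value"] = tmp + 1          # write (stale if the handler ran above)
    if pending:
        handler(counter)                # deferred interrupt is delivered now


unsafe = {"value": 0}
increment(unsafe, fire_interrupt=True, masked=False)

safe = {"value": 0}
increment(safe, fire_interrupt=True, masked=True)

print(unsafe["value"], safe["value"])   # 1 2
```

Two updates happened in both runs, but the unprotected version lost one of them because the main code wrote back a stale value.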

Pitfall 3: Disabling interrupts for too long

Problem: Disabling interrupts too long makes system unresponsive (can't handle events)
Solution: Minimize interrupt disable time, use other synchronization mechanisms when possible
Example: Using interrupt disable for critical sections instead of locks can cause system hangs
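
The effect on latency can be sketched with a small model. This is an illustration under assumptions: `InterruptGate`, the tick units, and the single timer interrupt arriving at tick 1 are all hypothetical.

```python
class InterruptGate:
    """Toy model: interrupts that arrive while disabled wait until re-enabled."""

    def __init__(self):
        self.enabled = True
        self.pending = []        # (name, arrival_tick)
        self.latencies = {}      # name -> ticks waited before being handled

    def disable(self):
        self.enabled = False

    def enable(self, now):
        self.enabled = True
        for name, arrived in self.pending:
            self.latencies[name] = now - arrived
        self.pending.clear()

    def deliver(self, name, now):
        if self.enabled:
            self.latencies[name] = 0             # handled immediately
        else:
            self.pending.append((name, now))     # held until enable()


def timer_latency(critical_section_ticks):
    gate = InterruptGate()
    gate.disable()                               # enter critical section at tick 0
    gate.deliver("timer", now=1)                 # timer fires while disabled
    gate.enable(now=critical_section_ticks)      # leave critical section
    return gate.latencies["timer"]


short = timer_latency(5)
long = timer_latency(5000)
print(short, long)   # 4 4999
```

The interrupt's latency grows directly with the length of the critical section, which is the whole argument for keeping it short.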

Pitfall 4: Ignoring interrupt overhead

Problem: High interrupt rates can overwhelm CPU (interrupt storm)
Solution: Use interrupt coalescing, NAPI for network, batch interrupt processing
Example: Network card generating one interrupt per packet can overwhelm CPU at high packet rates

Pitfall 5: Not distinguishing interrupts from traps

Problem: Confusing hardware interrupts with software traps (system calls, exceptions)
Solution: Understand that interrupts are hardware events, traps are software exceptions
Example: System calls use traps (software), not interrupts (hardware)

Interview Questions

Beginner

Q: What is the difference between an interrupt and a trap?

A: An interrupt is a hardware event that occurs asynchronously (e.g., keyboard press, network packet arrival, timer tick). The hardware generates an interrupt signal, and the CPU stops what it's doing to handle it. A trap is a software exception that occurs synchronously, as a direct result of the instruction being executed. Some traps are deliberate (a system call instruction), while others are a side effect of an instruction that cannot complete normally (division by zero, page fault). Both cause the CPU to switch to kernel mode, but interrupts are hardware-driven and asynchronous, while traps are software-driven and synchronous.

Intermediate

Q: Why can high interrupt rates degrade system performance, and how can you mitigate this?

A: High interrupt rates degrade performance because:

Overhead: Each interrupt requires saving and restoring CPU state (similar in kind to context switch overhead, though usually lighter than a full context switch)
Cache pollution: Interrupt handlers can evict useful data from CPU cache
CPU time: CPU spends time handling interrupts instead of doing useful work
Interrupt storms: Extremely high rates can overwhelm the CPU

Mitigation strategies:

Interrupt coalescing: Batch multiple interrupts together (e.g., process 10 packets per interrupt instead of 1)
NAPI (Linux): For network devices, use polling mode at high packet rates instead of interrupt-per-packet
Priority interrupts: Use interrupt priorities to defer non-critical interrupts
Bottom halves: Move heavy processing out of interrupt handler to deferred work
Hardware offloading: Use hardware features to reduce interrupt rate (e.g., TCP offloading)

For example, a network card generating 100,000 interrupts/second (one per packet) can consume significant CPU. Using NAPI or interrupt coalescing to batch 100 packets per interrupt reduces interrupt rate to 1,000/second, dramatically improving performance.
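
The arithmetic in this example can be checked with a short calculation. This is a back-of-the-envelope sketch: `interrupt_rate` and `cpu_fraction` are hypothetical helpers, and the 5 microsecond per-interrupt overhead is an assumed figure chosen only to make the comparison concrete.

```python
import math


def interrupt_rate(packets_per_second, packets_per_interrupt):
    """Interrupts per second when each interrupt covers a batch of packets."""
    return math.ceil(packets_per_second / packets_per_interrupt)


def cpu_fraction(interrupts_per_second, overhead_us):
    """Fraction of one CPU spent on fixed per-interrupt overhead."""
    return interrupts_per_second * overhead_us / 1_000_000


ASSUMED_OVERHEAD_US = 5     # hypothetical entry/exit cost per interrupt

per_packet = interrupt_rate(100_000, 1)
coalesced = interrupt_rate(100_000, 100)
print(per_packet, coalesced)   # 100000 1000
print(cpu_fraction(per_packet, ASSUMED_OVERHEAD_US))
print(cpu_fraction(coalesced, ASSUMED_OVERHEAD_US))
```

Under that assumed overhead, per-packet interrupts would cost about half a CPU in entry and exit alone, while batching 100 packets per interrupt cuts the fixed cost by the same factor of 100. The per-packet processing work itself does not go away.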

Senior

Q: How would you design an interrupt handling system for a real-time embedded system that needs to handle multiple high-priority interrupts with strict timing requirements?

A: I would design a priority-based interrupt handling system:

Interrupt priorities:
- Assign priorities based on timing requirements (highest for critical real-time events)
- Use hardware interrupt controllers (like ARM GIC) that support priority-based preemption
- Ensure critical interrupts can preempt lower-priority ones
Two-level interrupt handling:
- Top half (ISR): Minimal work in interrupt handler (acknowledge interrupt, save minimal state)
- Bottom half: Defer non-critical work to bottom half or task context
- Use work queues or tasklets for deferred work
Interrupt nesting control:
- Allow nested interrupts for higher-priority interrupts
- Disable lower-priority interrupts during critical sections
- Use interrupt masking carefully to prevent priority inversion
Deterministic timing:
- Measure and bound interrupt handler execution time
- Use worst-case execution time (WCET) analysis
- Ensure interrupt handlers complete within timing constraints
Interrupt coalescing (selective):
- Use coalescing only for non-critical interrupts
- Keep critical interrupts immediate for low latency
- Balance latency vs. CPU overhead
Hardware support:
- Use hardware features (DMA, interrupt controllers) to reduce software overhead
- Offload work to hardware when possible
- Use dedicated interrupt lines for critical events
Monitoring and debugging:
- Track interrupt latencies and handler execution times
- Monitor interrupt rates and CPU time spent in interrupts
- Use hardware timers to measure interrupt response time
Testing:
- Test worst-case interrupt scenarios
- Verify timing constraints under load
- Test interrupt nesting and priority handling

This design ensures deterministic, low-latency interrupt handling while maintaining system responsiveness.
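
The "measure and bound" step can be sketched as a simple budget check. This is a rough first-pass model, not a full schedulability analysis: `IrqSource`, the source names, and every number below are hypothetical, and the check ignores delays caused by preemption from higher-priority handlers.

```python
from dataclasses import dataclass


@dataclass
class IrqSource:
    name: str
    wcet_us: float        # worst-case handler execution time
    max_rate_hz: float    # worst-case arrival rate
    deadline_us: float    # handler must finish within this time


def utilization(sources):
    """Fraction of one CPU consumed by handlers at worst-case rates."""
    return sum(s.wcet_us * s.max_rate_hz for s in sources) / 1_000_000


def over_budget(sources):
    """Handlers whose own worst-case time already exceeds their deadline."""
    return [s.name for s in sources if s.wcet_us > s.deadline_us]


sources = [
    IrqSource("motor_fault", wcet_us=5, max_rate_hz=100, deadline_us=20),
    IrqSource("encoder", wcet_us=10, max_rate_hz=10_000, deadline_us=50),
    IrqSource("uart", wcet_us=40, max_rate_hz=1_000, deadline_us=30),
]

print(f"interrupt CPU utilization: {utilization(sources):.4f}")
print("over budget:", over_budget(sources))
```

A handler that fails this check, like the made-up `uart` entry here, is a candidate for moving work into a bottom half or offloading it to hardware such as DMA.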

Key Takeaways

Interrupts: Hardware events that interrupt CPU execution, require saving/restoring state

Traps: Synchronous, software-triggered events (system calls, faults such as division by zero) that switch the CPU into kernel mode

Interrupt handling: Save state → handle interrupt → restore state (has overhead)

Interrupt nesting: Interrupt handlers can be interrupted, requires careful handling

Interrupt disable: OS disables interrupts during critical sections (must be brief)

Best practices: Keep interrupt handlers short, defer heavy work to bottom halves, use coalescing or polling at high interrupt rates
