Topic Overview
Interrupts & Traps
Understand interrupts (hardware events) and traps (software exceptions) in operating systems.
Why This Matters
Think of interrupts like a doorbell. When someone rings the doorbell (hardware event), you stop what you're doing (CPU stops current task), answer the door (handle the interrupt), then go back to what you were doing (resume previous task). Interrupts allow the CPU to respond to events (keyboard input, network packets, timer ticks) without constantly checking (polling). Traps are like software doorbells: they're triggered by the instruction the program is currently executing (exceptions, system calls), not by an external device.
This matters because interrupts are how the OS responds to events efficiently. Without interrupts, the CPU would have to constantly poll devices ("is there a keypress? is there a network packet?"), wasting CPU cycles. Interrupts allow devices to notify the CPU when events occur, making the system efficient. Understanding interrupts helps you understand how I/O works and how the OS handles events.
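To make the polling cost concrete, here is a minimal user-space sketch, not real device code. The tick counts, the event times, and the function names `run_polling` and `run_interrupt_driven` are all made up for illustration. It counts how many device checks each approach performs for the same three events.

```python
def run_polling(event_ticks, total_ticks):
    """The CPU checks the device on every tick, whether or not anything happened."""
    checks = 0
    handled = 0
    for tick in range(total_ticks):
        checks += 1                  # one device status check per tick
        if tick in event_ticks:
            handled += 1             # an event was actually waiting
    return checks, handled


def run_interrupt_driven(event_ticks, total_ticks):
    """The device notifies the CPU, so the handler runs only when an event occurs."""
    # No status checks happen between events; each event costs one handler run.
    return sum(1 for tick in event_ticks if 0 <= tick < total_ticks)


events = {100, 2_500, 9_000}         # hypothetical ticks at which a key is pressed
polling_checks, polling_handled = run_polling(events, 10_000)
handler_runs = run_interrupt_driven(events, 10_000)

print(f"polling: {polling_checks} checks for {polling_handled} events")
print(f"interrupts: {handler_runs} handler runs")
```

In this toy model, polling spends 10,000 checks to find 3 events, while the interrupt-driven version runs the handler 3 times. Real interrupts are not free (each one has entry and exit overhead), which is why the trade-off can reverse at very high event rates.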
In interviews, when someone asks "How does the OS handle I/O?", they're testing whether you understand interrupts. Do you know how interrupts work? Do you understand interrupt handlers? Most engineers don't. They just use I/O and assume it works.
What Engineers Usually Get Wrong
Most engineers think "interrupts are just events." But handling an interrupt involves saving CPU state, switching to the interrupt handler, running the handler, and restoring state. Each of these steps has overhead. Interrupts can also be nested (an interrupt handler can itself be interrupted), which requires careful handling. Understanding this helps you reason about interrupt overhead and system behavior.
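The save, handle, restore sequence can be sketched with a toy CPU model. This is an illustration under simplifying assumptions: the `ToyCPU` class, its three registers, and `keyboard_handler` are hypothetical, and real hardware saves state on a kernel stack rather than in a Python list.

```python
class ToyCPU:
    def __init__(self):
        self.registers = {"pc": 0, "sp": 0x1000, "acc": 0}
        self.saved_frames = []       # stack of saved register snapshots
        self.log = []

    def interrupt(self, handler):
        self.saved_frames.append(dict(self.registers))   # save the interrupted state
        self.log.append("save")
        handler(self)                                     # run the interrupt handler
        self.log.append("handle")
        self.registers = self.saved_frames.pop()          # restore the interrupted state
        self.log.append("restore")


def keyboard_handler(cpu):
    # The handler is free to overwrite registers; the saved frame protects the task.
    cpu.registers["pc"] = 0xFF00
    cpu.registers["acc"] = 99


cpu = ToyCPU()
cpu.registers.update(pc=42, acc=7)   # the interrupted task's state
before = dict(cpu.registers)
cpu.interrupt(keyboard_handler)

print(cpu.log)
print(cpu.registers == before)
```

Because snapshots are pushed onto a stack, the same model would also cope with a handler that is itself interrupted: each level pops only its own frame.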
Engineers also don't understand that interrupts can be disabled. The OS disables interrupts during critical sections (like updating kernel data structures) to prevent race conditions. If interrupts are disabled too long, the system becomes unresponsive (can't handle events). Understanding this helps you understand why some operations must be fast.
How This Breaks Systems in the Real World
A service was experiencing high interrupt rates. A network card was generating thousands of interrupts per second (one per packet). The CPU spent most of its time handling interrupts, leaving little time for actual work. The service became slow. The fix? Use interrupt coalescing (batch interrupts) or NAPI (polling mode for high packet rates). This reduces interrupt overhead and improves performance.
Another story: A service was disabling interrupts for too long during a critical section. During this time, the system couldn't handle any events (keyboard, network, timer). The system appeared frozen. The fix? Minimize interrupt disable time. Only disable interrupts for the shortest time necessary. Use other synchronization mechanisms (locks) when possible. Understanding interrupt handling helps you write efficient kernel code.
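The effect of a long interrupt-disabled section can be sketched as follows. This is a simulation with made-up tick values. `InterruptController` and `worst_latency` are hypothetical names, not a real kernel API. Interrupts raised while masked are held pending, and their latency grows with the length of the critical section.

```python
class InterruptController:
    def __init__(self):
        self.enabled = True
        self.pending = []            # (name, arrival_tick) for IRQs raised while masked
        self.latencies = {}          # name -> ticks between arrival and handling

    def raise_irq(self, name, now):
        if self.enabled:
            self.latencies[name] = 0             # handled immediately
        else:
            self.pending.append((name, now))     # held until interrupts are re-enabled

    def disable(self):
        self.enabled = False

    def enable(self, now):
        self.enabled = True
        for name, arrived in self.pending:
            self.latencies[name] = now - arrived
        self.pending.clear()


def worst_latency(critical_section_ticks):
    ctrl = InterruptController()
    ctrl.disable()                               # critical section starts at tick 0
    ctrl.raise_irq("timer", now=1)
    ctrl.raise_irq("network", now=2)
    ctrl.enable(now=critical_section_ticks)      # critical section ends
    return max(ctrl.latencies.values())


short_section = worst_latency(5)
long_section = worst_latency(5_000)
print(short_section, long_section)
```

The same two interrupts wait 4 ticks behind the short section and 4,999 ticks behind the long one, which is the "system appeared frozen" symptom in miniature.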
Examples
Example 1: Hardware Interrupt (Network Packet)
Network card receives packet
↓
Network card generates interrupt signal
↓
CPU stops current task, saves state
↓
CPU jumps to interrupt handler (kernel mode)
↓
Interrupt handler processes packet
↓
CPU restores state, resumes previous task
Asynchronous: Interrupt can arrive at any time
Example 2: Software Trap (System Call)
User program calls read()
↓
Program executes trap instruction (software exception)
↓
CPU switches to kernel mode
↓
Kernel handles system call
↓
CPU switches back to user mode
↓
Return to user program
Synchronous: Trap is explicitly triggered by program
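From a user program's point of view, the flow above looks like an ordinary function call. A small sketch using Python's `os` module: on POSIX systems `os.write` and `os.read` are thin wrappers around the `write` and `read` system calls. The trap instruction itself is executed inside the C library and is not visible at this level. The pipe and the message are just illustrative choices.

```python
import os

# A pipe gives us a pair of file descriptors without touching the filesystem.
read_fd, write_fd = os.pipe()

message = b"hello from user space\n"
written = os.write(write_fd, message)    # enters the kernel, returns bytes written
os.close(write_fd)

received = os.read(read_fd, 1024)        # enters the kernel, returns the buffered bytes
os.close(read_fd)

print(written, received)
```

Each call returns only after the kernel has finished handling it, which is what "synchronous" means here: the program asked for the mode switch at a specific instruction.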
Example 3: Interrupt Nesting
Low-priority interrupt handler running
↓
High-priority interrupt arrives
↓
Low-priority handler interrupted
↓
High-priority handler runs
↓
High-priority handler completes
↓
Low-priority handler resumes
Nesting: Interrupts can interrupt other interrupts
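The nesting sequence above can be traced with a small simulation. This is a sketch, assuming a simple rule that a new interrupt preempts only if its priority is strictly higher than that of the running handler. `NestedInterruptSim` and the priority numbers are hypothetical.

```python
class NestedInterruptSim:
    def __init__(self):
        self.trace = []
        self.active_priorities = []      # priorities of handlers currently on the stack

    def deliver(self, name, priority, body=None):
        """Run the handler now if it outranks the running one; otherwise hold it."""
        if self.active_priorities and priority <= self.active_priorities[-1]:
            self.trace.append(f"{name} held pending")
            return False
        self.active_priorities.append(priority)
        self.trace.append(f"{name} start")
        if body is not None:
            body(self)                   # the body may receive further interrupts
        self.trace.append(f"{name} end")
        self.active_priorities.pop()
        return True


def low_priority_body(sim):
    sim.trace.append("low: first half of work")
    sim.deliver("high", priority=5)      # high-priority interrupt arrives mid-handler
    sim.trace.append("low: second half of work")


sim = NestedInterruptSim()
sim.deliver("low", priority=1, body=low_priority_body)
for line in sim.trace:
    print(line)
```

The trace shows the high-priority handler starting and finishing between the two halves of the low-priority handler, matching the diagram.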
Common Pitfalls
Pitfall 1: Interrupt handlers doing too much work
- Problem: Interrupt handlers typically run with at least some interrupts masked (often the line being serviced, sometimes all of them), so a slow handler delays other interrupts
- Solution: Keep interrupt handlers minimal (set flags, queue work), do heavy work in bottom half or deferred work
- Example: Network interrupt handler should just queue packet, not process it fully
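The split described in Pitfall 1 can be sketched like this. It is a user-space illustration, not driver code: `network_isr`, `bottom_half`, and the uppercase "processing" step are stand-ins chosen for the example.

```python
from collections import deque

work_queue = deque()     # packets waiting for deferred processing
processed = []


def network_isr(packet):
    """Top half: do the minimum (queue the packet) and return quickly."""
    work_queue.append(packet)


def bottom_half():
    """Deferred work: runs later, outside the interrupt handler."""
    while work_queue:
        packet = work_queue.popleft()
        processed.append(packet.decode().upper())    # stand-in for protocol processing


for raw in (b"syn", b"ack", b"data"):
    network_isr(raw)                                 # three quick "interrupts"

queued_before = len(work_queue)
bottom_half()
print(queued_before, processed)
```

The handler's cost stays constant no matter how expensive the per-packet processing becomes, because that cost has moved into `bottom_half`.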
Pitfall 2: Not handling interrupt nesting
- Problem: Interrupt handlers can be interrupted by higher-priority interrupts
- Solution: Design handlers to be re-entrant, use proper locking, avoid shared state
- Example: Interrupt handler accessing shared data without proper synchronization can cause race conditions
Pitfall 3: Disabling interrupts for too long
- Problem: Disabling interrupts too long makes system unresponsive (can't handle events)
- Solution: Minimize interrupt disable time, use other synchronization mechanisms when possible
- Example: Using interrupt disable for critical sections instead of locks can cause system hangs
Pitfall 4: Ignoring interrupt overhead
- Problem: High interrupt rates can overwhelm CPU (interrupt storm)
- Solution: Use interrupt coalescing, NAPI for network, batch interrupt processing
- Example: Network card generating one interrupt per packet can overwhelm CPU at high packet rates
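The idea behind NAPI-style processing can be sketched as: take one interrupt, mask further receive interrupts, then poll the ring in batches until it is empty. The code below is a toy model loosely inspired by that idea, not the Linux NAPI API. `ToyNic`, the budget of 64, and the burst pattern are assumptions.

```python
from collections import deque


class ToyNic:
    def __init__(self, budget):
        self.rx_ring = deque()
        self.irq_enabled = True
        self.interrupts = 0
        self.delivered = []
        self.budget = budget             # max packets handled per poll call

    def packet_arrives(self, pkt):
        self.rx_ring.append(pkt)
        if self.irq_enabled:
            self.interrupts += 1
            self.irq_enabled = False     # mask RX interrupts and switch to polling

    def poll(self):
        """Process up to `budget` packets; re-enable the interrupt once drained."""
        done = 0
        while self.rx_ring and done < self.budget:
            self.delivered.append(self.rx_ring.popleft())
            done += 1
        if not self.rx_ring:
            self.irq_enabled = True
        return done


nic = ToyNic(budget=64)
for i in range(1000):
    nic.packet_arrives(i)
    if i % 100 == 99:                    # in this toy burst, polling runs every 100 arrivals
        while nic.poll() == nic.budget:  # keep polling while full batches come back
            pass

print(nic.interrupts, len(nic.delivered))
```

In this run, 1,000 packets cost 10 interrupts instead of 1,000. At low packet rates the ring drains immediately and the model falls back to roughly one interrupt per packet, which keeps latency low when the CPU is not under pressure.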
Pitfall 5: Not distinguishing interrupts from traps
- Problem: Confusing hardware interrupts with software traps (system calls, exceptions)
- Solution: Understand that interrupts are hardware events, traps are software exceptions
- Example: System calls use traps (software), not interrupts (hardware)
Interview Questions
Beginner
Q: What is the difference between an interrupt and a trap?
A: An interrupt is a hardware event that occurs asynchronously (e.g., keyboard press, network packet arrival, timer tick). The hardware generates an interrupt signal, and the CPU stops what it's doing to handle it. A trap is a software exception that occurs synchronously, as a direct result of the instruction being executed. Some traps are deliberate (a program executing a system call instruction), while others are a side effect of a faulting instruction (division by zero, page fault). Both cause the CPU to switch to kernel mode, but interrupts are hardware-driven and asynchronous, while traps are software-driven and synchronous.
Intermediate
Q: Why can high interrupt rates degrade system performance, and how can you mitigate this?
A: High interrupt rates degrade performance because:
- Overhead: Each interrupt requires saving/restoring CPU state (a cost similar to a lightweight context switch)
- Cache pollution: Interrupt handlers can evict useful data from CPU cache
- CPU time: CPU spends time handling interrupts instead of doing useful work
- Interrupt storms: Extremely high rates can overwhelm the CPU
Mitigation strategies:
- Interrupt coalescing: Batch multiple interrupts together (e.g., process 10 packets per interrupt instead of 1)
- NAPI (Linux): For network devices, use polling mode at high packet rates instead of interrupt-per-packet
- Priority interrupts: Use interrupt priorities to defer non-critical interrupts
- Bottom halves: Move heavy processing out of interrupt handler to deferred work
- Hardware offloading: Use hardware features to reduce interrupt rate (e.g., TCP offloading)
For example, a network card generating 100,000 interrupts/second (one per packet) can consume significant CPU. Using NAPI or interrupt coalescing to batch 100 packets per interrupt reduces interrupt rate to 1,000/second, dramatically improving performance.
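The arithmetic in that example can be checked with a short calculation. The per-interrupt cost of 2 microseconds is an assumed figure for illustration only, not a number from the text or a measurement. The function names are likewise hypothetical.

```python
def interrupt_rate(packets_per_second, packets_per_interrupt):
    """Interrupts per second when each interrupt covers a batch of packets."""
    return packets_per_second / packets_per_interrupt


def cpu_fraction(interrupts_per_second, cost_us_per_interrupt):
    """Fraction of one CPU spent on fixed per-interrupt overhead."""
    return interrupts_per_second * cost_us_per_interrupt / 1_000_000


COST_US = 2.0    # assumed fixed entry/exit cost per interrupt, in microseconds

per_packet = interrupt_rate(100_000, 1)      # one interrupt per packet
batched = interrupt_rate(100_000, 100)       # 100 packets per interrupt

print(per_packet, cpu_fraction(per_packet, COST_US))
print(batched, cpu_fraction(batched, COST_US))
```

Under that assumption, the fixed overhead drops from about 20% of a CPU to about 0.2%. Batching only amortizes the per-interrupt cost; the work of processing each packet is still there.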
Senior
Q: How would you design an interrupt handling system for a real-time embedded system that needs to handle multiple high-priority interrupts with strict timing requirements?
A: I would design a priority-based interrupt handling system:
Interrupt priorities:
- Assign priorities based on timing requirements (highest for critical real-time events)
- Use hardware interrupt controllers (like ARM GIC) that support priority-based preemption
- Ensure critical interrupts can preempt lower-priority ones
Two-level interrupt handling:
- Top half (ISR): Minimal work in interrupt handler (acknowledge interrupt, save minimal state)
- Bottom half: Defer non-critical work to bottom half or task context
- Use work queues or tasklets for deferred work
Interrupt nesting control:
- Allow nested interrupts for higher-priority interrupts
- Disable lower-priority interrupts during critical sections
- Use interrupt masking carefully to prevent priority inversion
Deterministic timing:
- Measure and bound interrupt handler execution time
- Use worst-case execution time (WCET) analysis
- Ensure interrupt handlers complete within timing constraints
Interrupt coalescing (selective):
- Use coalescing only for non-critical interrupts
- Keep critical interrupts immediate for low latency
- Balance latency vs. CPU overhead
Hardware support:
- Use hardware features (DMA, interrupt controllers) to reduce software overhead
- Offload work to hardware when possible
- Use dedicated interrupt lines for critical events
Monitoring and debugging:
- Track interrupt latencies and handler execution times
- Monitor interrupt rates and CPU time spent in interrupts
- Use hardware timers to measure interrupt response time
Testing:
- Test worst-case interrupt scenarios
- Verify timing constraints under load
- Test interrupt nesting and priority handling
This design ensures deterministic, low-latency interrupt handling while maintaining system responsiveness.
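The monitoring part of this design can be sketched as a small bookkeeping structure that records the worst observed handler time per interrupt and compares it against a budget. Everything here is hypothetical: the `IrqTimingMonitor` class, the interrupt names, the budgets, and the sample durations. A real system would take the measurements from a hardware timer.

```python
class IrqTimingMonitor:
    def __init__(self, budgets_us):
        self.budgets_us = budgets_us                     # irq name -> allowed time (us)
        self.worst_us = {irq: 0 for irq in budgets_us}   # irq name -> worst observed (us)

    def record(self, irq, duration_us):
        """Keep the worst handler execution time seen so far for this interrupt."""
        self.worst_us[irq] = max(self.worst_us[irq], duration_us)

    def violations(self):
        """Interrupts whose worst observed time exceeds their budget."""
        return sorted(
            irq for irq, worst in self.worst_us.items()
            if worst > self.budgets_us[irq]
        )


monitor = IrqTimingMonitor({"motor_fault": 10, "sensor_sample": 50, "uart_rx": 200})

# Made-up measurements in microseconds.
samples = [
    ("motor_fault", 6), ("sensor_sample", 41), ("uart_rx", 180),
    ("sensor_sample", 57), ("motor_fault", 9),
]
for irq, duration in samples:
    monitor.record(irq, duration)

print(monitor.worst_us)
print(monitor.violations())
```

Observed maxima are only a lower bound on the true worst case, so a check like this complements WCET analysis and does not replace it.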
Key Takeaways
- Interrupts: Hardware events that interrupt CPU execution, require saving/restoring state
- Traps: Software exceptions (system calls, exceptions) that trigger mode switches
- Interrupt handling: Save state → handle interrupt → restore state (has overhead)
- Interrupt nesting: Interrupt handlers can be interrupted, requires careful handling
- Interrupt disable: OS disables interrupts during critical sections (must be brief)
- Best practices: Minimize interrupt handler time, use coalescing for high rates, optimize for performance
Related Topics
System Calls
How system calls use traps to switch from user mode to kernel mode
Kernel Mode vs User Mode
How interrupts and traps trigger mode switches
I/O Management
How interrupts are used to handle I/O events from devices
Context Switching
How interrupts trigger context switches for process scheduling
Scheduling Algorithms
How timer interrupts trigger scheduling decisions