Topic Overview
Operating Systems: Processes, Memory & Scheduling
Understand OS fundamentals: processes/threads, memory, scheduling, and how kernels manage resources.
Why Engineers Care About This
Think of the operating system as a restaurant manager. You're the customer (your application), and you want food (CPU time, memory, disk access). The manager doesn't just hand you whatever you ask for—they coordinate with the kitchen (CPU), manage the waitlist (process scheduling), handle reservations (memory allocation), and make sure one customer doesn't hog all the tables (resource isolation).
When your API suddenly becomes slow, or your database connections get exhausted, or your service crashes with "out of memory" errors, you're hitting the boundaries where your code meets the operating system. Understanding these boundaries is what separates engineers who can debug production issues from those who can't.
In interviews, when someone asks "How would you design a system that handles 10,000 concurrent requests?", they're really asking: "Do you understand that your application runs on an OS that has limits? Do you know where those limits are, and how to work with them instead of against them?"
Core Intuitions You Must Build
- Processes are isolated; threads share memory. A process is like a separate apartment: it has its own address space. Threads are like roommates sharing that apartment. When one thread crashes, it can take down the whole process. When one process crashes, others keep running. This is why microservices run as separate processes, not threads.
- The kernel boundary is a bottleneck. Every time your code reads a file, makes a network call, or asks the OS for more memory, it makes a system call and crosses into kernel space. This boundary crossing has a cost, and a blocking call parks the calling thread until the kernel has an answer. When thousands of requests all block on syscalls at once, that waiting becomes the bottleneck. This is why async I/O matters: it lets one thread keep many I/O operations in flight instead of blocking on each one.
- Memory is a lie. Your process thinks it has a huge, contiguous address space. In reality, the OS maps virtual addresses to physical RAM (or disk, if you're swapping). When you allocate memory, the OS typically does not back it with physical RAM until you first write to it. This is why memory problems can be subtle: you might free memory in your code, but the allocator often keeps it for reuse instead of returning it to the OS, so the process's footprint does not shrink.
- Scheduling is about fairness and throughput. The OS scheduler is like a traffic cop at a busy intersection. It has to decide: do I let this one car through quickly, or do I rotate through everyone fairly? Different scheduling algorithms optimize for different things. Real-time systems prioritize latency. Batch systems prioritize throughput. Web servers need a balance.
- I/O waits are where time goes. Your CPU can execute billions of instructions per second. But when it's waiting for a disk read or network packet, it's doing nothing. This is why monitoring can show "99% CPU idle" even when your system feels slow: the work is stuck waiting for I/O. Understanding this changes how you design systems.
- Concurrency primitives leak into your code. Locks, semaphores, and mutexes are OS-backed concepts that show up in your application code. When a thread blocks on a contended mutex, the OS puts it to sleep and wakes it up once the lock is free. Understanding how the OS implements these helps you use them correctly.
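The first intuition is easy to check for yourself. The sketch below is illustrative only and assumes a POSIX system where os.fork is available; the function names are made up for this example. A thread's write to a list is visible to the caller, while a forked child's write lands in the child's own copy of the address space.

```python
import os
import threading


def thread_shares_memory() -> list:
    """A thread appends to the same list object the caller sees."""
    shared = []
    worker = threading.Thread(target=shared.append, args=("from thread",))
    worker.start()
    worker.join()
    return shared


def process_is_isolated() -> list:
    """A forked child appends to its own copy; the parent's list is unchanged."""
    data = []
    pid = os.fork()  # POSIX only (an assumption of this sketch)
    if pid == 0:
        # Child: this write lands in the child's private copy of the list.
        data.append("from child")
        os._exit(0)  # exit immediately without running the parent's cleanup
    os.waitpid(pid, 0)  # Parent: wait for the child to finish
    return data
```

Calling thread_shares_memory() returns the list with the thread's entry in it, while process_is_isolated() returns an empty list, because the child's append never reaches the parent.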
Subtopics (Taught Through Real Scenarios)
Processes and Process Management
What people usually get wrong:
Most engineers think processes are just "running programs." That's technically true, but it misses the point. Processes are isolation boundaries. When you design a microservice architecture, you're using process isolation to prevent one service from crashing another. But many engineers don't realize that process creation has overhead—forking a process is expensive. This is why you don't spawn a new process for every HTTP request.
How this breaks systems in the real world:
Picture this: You're running a Node.js API that handles file uploads. Each upload needs to be processed, so you spawn a child process to do the heavy lifting. Under normal load, this works fine. But during a traffic spike, you spawn 1000 child processes in 10 seconds. Each process needs memory, file descriptors, and CPU time. Suddenly, your server runs out of memory or hits a process limit. Linux caps both the number of process IDs system-wide (kernel.pid_max, historically 32,768 by default, though many modern distributions raise it) and the number of processes per user (ulimit -u), and the per-user cap or available RAM is usually what you hit first. New requests can't spawn processes. Your API starts returning 500 errors. The fix? Use a worker pool with a fixed number of processes, or better yet, use threads or async processing.
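One way to picture the worker-pool fix is a hard cap on how many children run at once. This is a minimal sketch in Python, not the Node.js code from the story; process_uploads, CHILD_CODE, and the cap of 4 are hypothetical, and the child simply upper-cases its argument as a stand-in for real processing.

```python
import subprocess
import sys
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the heavy per-upload work: a child Python
# process that upper-cases its argument.
CHILD_CODE = "import sys; print(sys.argv[1].upper())"


def process_uploads(names, max_children=4):
    """Run one child process per upload, never more than max_children at once.

    Returns (results, peak), where peak is the highest number of children
    observed in flight at the same time.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def run_one(name):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            done = subprocess.run(
                [sys.executable, "-c", CHILD_CODE, name],
                capture_output=True, text=True, check=True,
            )
            return done.stdout.strip()
        finally:
            with lock:
                running -= 1

    # Each worker thread just waits on its child, so max_workers is the
    # cap on concurrent child processes. Extra uploads wait in the queue.
    with ThreadPoolExecutor(max_workers=max_children) as pool:
        results = list(pool.map(run_one, names))
    return results, peak
```

With this shape, a spike of 1000 uploads still produces at most max_children processes; the rest wait their turn instead of exhausting the machine.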
What interviewers are really listening for:
When they ask "How would you handle 10,000 concurrent requests?", they want to hear you think about process limits, memory per process, and context switching overhead. Junior engineers say "spawn more processes." Senior engineers say "processes are heavy, use a thread pool or async I/O, and understand your OS limits." They're testing whether you know that the OS is a resource manager, not a magic box with infinite capacity.
Threads and Concurrency
What people usually get wrong:
Engineers often confuse "threads" with "parallelism." Threads let you do concurrent work, but true parallelism only happens if you have multiple CPU cores. On a single-core machine, threads just give the illusion of parallelism through time-slicing. Also, many engineers don't realize that thread creation has overhead too—not as much as processes, but still significant. Creating thousands of threads is a recipe for disaster.
How this breaks systems in the real world:
Here's a classic production story: A Java service was handling requests by creating a new thread for each request. Under normal load, this worked. But during a traffic spike, the service tried to create 50,000 threads. It hit the OS limit on threads (or ran out of memory for thread stacks), and the JVM started throwing "java.lang.OutOfMemoryError: unable to create new native thread". The service went down. The fix? Use a thread pool with a bounded size. There is a second lesson here too: threads share memory, so one buggy thread can corrupt data used by others. This is why race conditions are so dangerous: they're hard to reproduce and can cause data corruption.
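Both lessons fit in a few lines. The sketch below uses Python rather than Java, and handle_requests is a hypothetical name: a fixed-size pool handles every request, and a lock guards the shared counter so the threads cannot race on it.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_requests(request_count=1000, max_threads=8):
    """Handle many requests on a fixed number of threads.

    Returns (handled, threads_used).
    """
    lock = threading.Lock()
    handled = 0
    thread_ids = set()

    def handle(_request_id):
        nonlocal handled
        # Shared state is only touched while holding the lock; without it,
        # the read-modify-write on `handled` could race between threads.
        with lock:
            handled += 1
            thread_ids.add(threading.get_ident())

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # Extra requests queue inside the executor instead of each
        # getting a brand-new thread.
        list(pool.map(handle, range(request_count)))
    return handled, len(thread_ids)
```

Note that the executor's internal queue is unbounded, so a real service would likely also cap queue depth or reject work when it falls too far behind.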
What interviewers are really listening for:
They want to hear you talk about thread safety, race conditions, and when to use threads vs processes vs async I/O. Junior engineers say "use threads for concurrency." Senior engineers say "threads are for CPU-bound work, async I/O is for I/O-bound work, and always use thread pools with bounds." They're testing whether you understand the trade-offs and the OS limits.
Memory Management
What people usually get wrong:
Most engineers think memory management is about "malloc and free." That's the surface. The deeper truth is that the OS manages memory in pages (usually 4KB chunks), and your process sees a virtual address space that may not map to physical RAM. When you allocate memory, you're not necessarily getting RAM—you're getting a promise of RAM. The OS only gives you real RAM when you actually write to that memory. This is why you can "allocate" 1GB instantly, but writing to it takes time.
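You can watch the "promise of RAM" behavior with a rough experiment. This sketch assumes a Unix-like system (the resource module is not available on Windows) and a typical demand-paging setup; touch_pages is a hypothetical helper. It maps anonymous memory, then writes one byte per page so the OS has to back each page.

```python
import mmap
import resource


def touch_pages(size_bytes=64 * 1024 * 1024):
    """Map anonymous memory, then write one byte per page.

    Returns (pages_touched, peak_rss_before, peak_rss_after). ru_maxrss is
    reported in kilobytes on Linux and bytes on macOS, so compare the two
    values to each other rather than reading them as absolute sizes.
    """
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    region = mmap.mmap(-1, size_bytes)  # anonymous mapping, not a file
    pages = 0
    for offset in range(0, size_bytes, mmap.PAGESIZE):
        region[offset] = 1  # the first write to a page forces the OS to back it
        pages += 1
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    region.close()
    return pages, before, after
```

On a typical Linux box the mapping call returns almost immediately, and the peak resident size only grows as the loop touches pages.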
How this breaks systems in the real world:
A Python service was processing large JSON files. It loaded entire files into memory, processed them, and then moved to the next file. Under normal load, this worked. But when processing 100 files concurrently, the service started swapping: the OS moved memory pages to disk because RAM was full. Swapping is orders of magnitude slower than RAM access, on the order of a thousand times on an SSD and far worse on a spinning disk. The service became unusable, taking minutes to process files that should take seconds. The fix? Process files in chunks, or limit concurrent processing. But the real lesson is: the OS will swap when RAM is full, and swapping kills performance.
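Processing in chunks can be as simple as reading one record at a time. The sketch below assumes the data is newline-delimited JSON (one object per line), which the story does not state; sum_field and the field name are hypothetical. Memory use then tracks the size of one record, not the size of the file.

```python
import json


def sum_field(lines, field):
    """Sum one numeric field across JSON Lines input, one record at a time.

    `lines` can be any iterable of strings, including an open file object,
    which yields one line at a time instead of loading the whole file.
    """
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # only this one record is held in memory
        total += record.get(field, 0)
    return total
```

In use, you would pass an open file handle, for example: with open(path, encoding="utf-8") as handle: sum_field(handle, "size"). A single large JSON document would need a streaming parser instead, which is outside this sketch.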
What interviewers are really listening for:
They want to hear you think about memory limits, swapping, and how memory allocation actually works. Junior engineers say "just allocate more memory." Senior engineers say "understand your memory footprint, watch for memory leaks, and design for the memory you have, not the memory you wish you had." They're testing whether you know that memory is a finite resource managed by the OS.
I/O and System Calls
What people usually get wrong:
Engineers often think "I/O is just reading and writing files." But I/O includes network calls, database queries, and any operation that waits for something outside your process. The key insight is that I/O is slow compared to CPU operations. A disk read takes anywhere from tens of microseconds on an SSD to milliseconds on a spinning disk. A network call can take hundreds of milliseconds. During that time, the thread that made the call does nothing useful. This is why blocking I/O is wasteful: you're paying for threads, and sometimes whole CPUs, that sit waiting.
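To see why overlapping waits matters, here is a small sketch using Python's asyncio. The fake_io_call function is a stand-in (an assumption, not a real network call) that waits without using the CPU, and all names are hypothetical.

```python
import asyncio
import time


async def fake_io_call(delay_s):
    """Stand-in for a network or database call: waits without using the CPU."""
    await asyncio.sleep(delay_s)
    return delay_s


async def run_concurrently(call_count, delay_s):
    """Start all calls at once on a single thread and wait for them together."""
    return await asyncio.gather(*(fake_io_call(delay_s) for _ in range(call_count)))


def timed_run(call_count=100, delay_s=0.05):
    """Return (results, elapsed_seconds) for call_count overlapping waits."""
    start = time.perf_counter()
    results = asyncio.run(run_concurrently(call_count, delay_s))
    return results, time.perf_counter() - start
```

Run back to back, 100 waits of 50 ms would take about 5 seconds. Overlapped on one thread, they should finish in roughly the time of a single wait, because nothing is blocked while the others are pending.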
How this breaks systems in the real world:
A web service was making synchronous database calls. Each request would query the database, wait for the response, then return. Under normal load, this worked fine: each request took 50ms. A pool of 100 connections, each held for 50ms, can sustain about 100 / 0.05s = 2,000 queries per second, and a steady 1,000 requests per second keeps only about 50 connections busy. But during a traffic spike, requests arrived faster than that ceiling. The connection pool (100 connections) was exhausted within seconds. New requests had to wait for connections to free up. Response times went from 50ms to 5 seconds. The fix? Use async I/O so waiting requests do not tie up threads, and size the connection pool and its wait timeout deliberately. But the real lesson is: blocking I/O doesn't scale. You need non-blocking I/O or async I/O to handle high concurrency.
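A quick way to sanity-check pool sizing for a scenario like this is Little's law: the average number of busy connections equals the arrival rate times how long each one is held. The helper names below are hypothetical, and the sketch assumes steady arrivals and a fixed hold time, which real traffic rarely has.

```python
def connections_in_use(requests_per_second, hold_ms):
    """Little's law: average busy connections = arrival rate x time held."""
    return requests_per_second * hold_ms / 1000


def max_sustainable_rate(pool_size, hold_ms):
    """Highest steady request rate a pool can serve before callers queue."""
    return pool_size * 1000 / hold_ms
```

With the story's figures, connections_in_use(1000, 50) is 50 and max_sustainable_rate(100, 50) is 2,000 requests per second. The pool runs dry once bursts exceed that rate, or once queries slow down and each connection is held longer.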
What interviewers are really listening for:
They want to hear you talk about blocking vs non-blocking I/O, async patterns, and how I/O waits affect system design. Junior engineers say "just make the calls faster." Senior engineers say "I/O is the bottleneck, use async I/O or connection pooling, and understand that syscalls have overhead." They're testing whether you know that I/O waits are where performance problems hide.
Scheduling and CPU Management
What people usually get wrong:
Engineers often think the CPU scheduler just "runs processes in order." But scheduling is complex. The OS has to balance fairness (every process gets a turn) with throughput (get work done quickly) with latency (respond quickly to interactive tasks). Different scheduling algorithms optimize for different goals. Linux, for example, used the Completely Fair Scheduler (CFS) as its default for many years, and other modern OSes use schedulers that likewise try to balance all three.
How this breaks systems in the real world:
A server was running a batch job (processing logs) and a web API. The batch job was CPU-intensive and ran for hours. The web API needed to respond quickly to requests. Under the default scheduler, the batch job competed for the CPU on equal terms with the API: it was always ready to run, so it used every time slice it was offered. The web API requests had to wait their turn, causing high latency. Users complained. The fix? Lower the batch job's priority with nice values, or pin it to separate cores with CPU affinity, so the web API gets the CPU first. But the real lesson is: the scheduler tries to be fair, but "fair" might not mean "what you want." You need to understand scheduling to tune performance.
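Lowering a batch job's priority can be sketched with nice values. This example assumes a Unix-like system; BATCH_CODE and run_batch_with_nice are hypothetical, and the child here only reports its niceness instead of doing real work.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the batch job: a child that lowers its own
# priority with os.nice, then prints the niceness it ended up with.
BATCH_CODE = "import os, sys; print(os.nice(int(sys.argv[1])))"


def current_niceness():
    """os.nice(0) adds nothing and returns this process's current niceness."""
    return os.nice(0)


def run_batch_with_nice(increment=10):
    """Start the batch child at lower priority; return the child's niceness."""
    done = subprocess.run(
        [sys.executable, "-c", BATCH_CODE, str(increment)],
        capture_output=True, text=True, check=True,
    )
    return int(done.stdout.strip())
```

A higher niceness means a lower scheduling priority, so the web API would win when both want the CPU. On Linux, os.sched_setaffinity offers the other fix from the story, restricting the batch job to specific cores.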
What interviewers are really listening for:
They want to hear you think about CPU utilization, context switching overhead, and how scheduling affects latency. Junior engineers say "just use more CPUs." Senior engineers say "understand CPU scheduling, use CPU affinity for critical tasks, and measure context switching overhead." They're testing whether you know that CPU time is a resource that needs to be managed.
Failure Stories You'll Recognize
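The Fork Bomb: A script accidentally called itself recursively, spawning new processes in an infinite loop. Each process spawned two more processes. Within seconds, the server had thousands of processes competing for CPU time. The system became unresponsive. The fix? Kill the whole process tree at once (killing only the original parent leaves its descendants running and still spawning), or reboot. The lesson? Creating a single process is cheap enough that a runaway loop can create thousands in seconds, and unlimited process creation will kill your system. Always have limits, such as a per-user process cap (ulimit -u).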
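Before relying on limits, it helps to know what they currently are. This sketch reads the per-user process limit through Python's resource module and, on Linux only, the system-wide PID ceiling from /proc; process_limits and describe are hypothetical helper names.

```python
import resource


def process_limits():
    """Return the per-user process limit and, on Linux, the system PID ceiling."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    try:
        with open("/proc/sys/kernel/pid_max", "r", encoding="ascii") as handle:
            pid_max = int(handle.read().strip())
    except (OSError, ValueError):
        pid_max = None  # not Linux, or /proc is unavailable
    return {"nproc_soft": soft, "nproc_hard": hard, "pid_max": pid_max}


def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)
```

If the soft limit comes back as unlimited, nothing at this level stops a runaway script from filling the process table.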
The Memory Leak That Looked Like a Feature: A service was "working fine" for months, then suddenly started crashing with out-of-memory errors. Investigation showed that the service was slowly leaking memory—allocating memory but not freeing it. The leak was small (a few KB per request), so it took months to fill up RAM. But once RAM was full, the OS started swapping, and performance died. The fix? Find and fix the leak. The lesson? Memory leaks are sneaky. They don't show up until it's too late. Always monitor memory usage over time.
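Monitoring memory over time can start small. The sketch below uses Python's tracemalloc to measure how much traced memory grows across a batch of requests; the leaky handler and the _cache list are hypothetical, built to leak about 1 KB per call like the story.

```python
import tracemalloc

_cache = []  # hypothetical leak: grows on every request and is never trimmed


def handle_request():
    """Simulated handler that leaks about 1 KB per call."""
    _cache.append(bytes(1024))


def leaked_bytes(request_count=1000):
    """Return how much traced memory grew across request_count requests."""
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(request_count):
        handle_request()
    after, _peak = tracemalloc.get_traced_memory()
    if started_here:
        tracemalloc.stop()
    return after - before
```

A healthy handler should show growth near zero once it is warmed up. Growth that scales with the number of requests, as it does here, is the signature of a leak.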
The Thread Pool Exhaustion: A service was using a thread pool with 100 threads. Each thread handled one request at a time. Under normal load, this worked. But during a traffic spike, 200 requests arrived simultaneously. The first 100 got threads immediately. The other 100 had to wait. But those waiting requests held HTTP connections open, and the load balancer thought the service was slow, so it sent more requests. The thread pool stayed exhausted, and the service became unresponsive. The fix? Increase thread pool size, or better yet, use async I/O. The lesson? Bounded resources need proper sizing and monitoring.
The Disk I/O Storm: A service was writing logs to disk synchronously. Under normal load, this was fine. But during a traffic spike, thousands of log writes queued up. The disk couldn't keep up. Each write took longer and longer. Eventually, the disk I/O queue was so long that the service couldn't write logs fast enough, and it started blocking on log writes. Response times spiked. The fix? Use async logging or write to a faster medium (like a local buffer that flushes to disk). The lesson? Disk I/O is slow. Don't block on it if you care about latency.
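Async logging can be sketched with the standard library's QueueHandler and QueueListener. In this example the request path only puts records on an in-memory queue, and a background thread does the slow write. ListHandler is a hypothetical sink standing in for a file handler, and build_async_logger is a made-up helper name.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Hypothetical sink; stands in for a slow file or disk handler."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def build_async_logger(name, sink):
    """Return (logger, listener). Log calls only enqueue a record; the
    listener's background thread does the actual write to the sink."""
    log_queue = queue.Queue()  # unbounded here; a real service might cap it
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger's handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()
    return logger, listener
```

Call listener.stop() at shutdown so queued records are flushed before the process exits. The trade-off is that records still in the queue are lost if the process dies abruptly.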
How InterviewCrafted Will Teach This
We'll teach operating systems through stories, not definitions. Instead of memorizing "a process is an instance of a running program," you'll learn through scenarios like "what happens when your API tries to spawn 10,000 processes?" You'll see how OS concepts show up in real system design problems.
Through Stories:
- We'll walk through production incidents where OS limits caused outages
- You'll see how microservices use process isolation, and when it helps vs hurts
- You'll learn why async I/O matters through stories of blocking I/O failures
Through Thought Experiments:
- "What if your service needs to handle 1 million concurrent connections? What OS limits will you hit?"
- "How would you design a system where one component is CPU-bound and another is I/O-bound?"
- "What happens to your API when the database connection pool is exhausted?"
Through Trade-offs:
- Processes vs threads vs async: when to use each, and why
- Memory vs CPU vs I/O: which is your bottleneck, and how to optimize
- Latency vs throughput: how OS scheduling affects both
Through "What Would You Do?" Moments:
- Your service is using 90% CPU but response times are slow. What do you investigate?
- Your service crashes with "out of memory" but you're only using 50% of RAM. What's happening?
- Your database queries are fast, but your API is slow. Where is the time going?
The goal isn't to memorize OS internals. It's to build intuition about how your code interacts with the OS, so you can design systems that work with OS limits instead of against them. When an interviewer asks "how would you scale this?", you'll think about processes, memory, I/O, and scheduling—not just "add more servers."