Operating Systems Topic

Process vs Thread

Understand the fundamental differences between processes and threads: isolation, memory sharing, context switching, and when to use each.

Beginner · 10 min read

Why This Matters

Think of processes as separate apartments and threads as roommates sharing an apartment. If one roommate (thread) breaks something, everyone in that apartment is affected. But if someone in a different apartment (process) has a problem, your apartment is fine.

This distinction matters because it determines how your system fails. When a thread crashes, it can take down the entire process. When a process crashes, other processes keep running. This is why microservices run as separate processes—one service crashing doesn't kill the others.

In interviews, when someone asks "How would you design a system that handles 10,000 concurrent requests?", they're testing whether you understand this trade-off. Do you use processes for isolation, or threads for efficiency? The answer depends on what you're optimizing for.

What Engineers Usually Get Wrong

Most engineers think "threads are faster, so use threads." But that's missing the point. Threads are faster to create and switch between, but they share memory. This means one buggy thread can corrupt data used by other threads. Processes are slower to create, but they're isolated. One process crashing doesn't affect others.

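Engineers also often confuse "concurrency" with "parallelism." Concurrency means multiple tasks make progress over the same period; parallelism means they execute at the same instant, which requires multiple CPU cores. On a single-core machine, threads (and processes) just take turns. On a multi-core machine, both threads and processes can run in parallel, unless the runtime prevents it, as CPython's Global Interpreter Lock does for threads executing Python bytecode.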

How This Breaks Systems in the Real World

A Java web service handled requests by creating a new thread for each one. Under normal load, this worked fine. But during a traffic spike, the service tried to create 50,000 threads and hit the system's thread limit (on Linux the ceiling depends on kernel settings, per-user limits, and the memory available for thread stacks; it is often in the tens of thousands). The JVM started failing with "unable to create native thread" errors. The service went down.

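The fix? Use a thread pool with a bounded size (say, 1000 threads), so excess requests wait in a queue instead of spawning more threads. The first lesson is that threads are not free: each one consumes stack memory and kernel resources. The second is that threads share memory, so one buggy thread can corrupt data used by others. This is why race conditions are so dangerous: they're hard to reproduce and can cause data corruption.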

Another story: A Python service was doing CPU-heavy processing on files. It used threads because "threads are faster." But CPython has the Global Interpreter Lock (GIL), which means only one thread can execute Python bytecode at a time. So the threads weren't actually running in parallel; they were just taking turns. The service was slow. The fix? Use processes instead. Each process has its own Python interpreter and its own GIL, so they can actually run in parallel on multiple CPU cores.

Process

Definition: An independent program in execution with its own memory space.

Characteristics:

Isolated memory: Each process has its own address space
Independent execution: Processes don't share memory (by default)
Heavyweight: Higher overhead for creation and context switching
Process ID (PID): Unique identifier for each process
Protection: One process cannot directly access another's memory

Memory Layout:

Process A                    Process B
┌─────────────┐             ┌─────────────┐
│   Stack     │             │   Stack     │
│   Heap      │             │   Heap      │
│   Data      │             │   Data      │
│   Code      │             │   Code      │
└─────────────┘             └─────────────┘
     ↓                           ↓
  Separate                      Separate
  Address Space              Address Space

Process Creation Example (Python)

import os
import multiprocessing

def worker_process(name):
    """Worker function for a process"""
    print(f"Process {name} (PID: {os.getpid()})")
    # Each process has its own memory space
    data = [1, 2, 3]  # Isolated to this process
    print(f"Process {name} data: {data}")

if __name__ == '__main__':
    # Create processes
    p1 = multiprocessing.Process(target=worker_process, args=('A',))
    p2 = multiprocessing.Process(target=worker_process, args=('B',))
    
    p1.start()
    p2.start()
    
    p1.join()
    p2.join()

Output (PIDs will differ, and the two processes' lines may interleave):

Process A (PID: 1234)
Process A data: [1, 2, 3]
Process B (PID: 1235)
Process B data: [1, 2, 3]

Thread

Definition: A lightweight unit of execution within a process that shares the process's memory space.

Characteristics:

Shared memory: All threads in a process share the same address space
Lightweight: Lower overhead for creation and context switching
Thread ID (TID): Unique identifier within a process
Communication: Threads can directly access shared memory
Synchronization needed: Requires locks, mutexes to prevent race conditions

Memory Layout:

Process
┌─────────────────────────────────┐
│         Shared Memory            │
│  ┌─────────┐  ┌─────────┐       │
│  │ Thread1 │  │ Thread2 │       │
│  │  Stack  │  │  Stack  │       │
│  └─────────┘  └─────────┘       │
│         Shared Heap              │
│         Shared Data              │
│         Shared Code              │
└─────────────────────────────────┘

Thread Creation Example (Python)

import threading

shared_data = []  # Shared by all threads

def worker_thread(name):
    """Worker function for a thread"""
    print(f"Thread {name} (TID: {threading.current_thread().ident})")
    # All threads share the same memory
    shared_data.append(name)
    print(f"Shared data: {shared_data}")

# Create threads
t1 = threading.Thread(target=worker_thread, args=('A',))
t2 = threading.Thread(target=worker_thread, args=('B',))

t1.start()
t2.start()

t1.join()
t2.join()

print(f"Final shared data: {shared_data}")

Output (thread IDs will differ, and the two threads' lines may interleave):

Thread A (TID: 140234567890432)
Shared data: ['A']
Thread B (TID: 140234567891456)
Shared data: ['A', 'B']
Final shared data: ['A', 'B']

Thread Synchronization Example

import threading

counter = 0
lock = threading.Lock()  # Mutex for synchronization

def increment():
    global counter
    for _ in range(100000):
        with lock:  # Acquire lock
            counter += 1
        # Lock released automatically

# Create threads
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start()
t2.start()

t1.join()
t2.join()

print(f"Final counter: {counter}")  # Should be 200000
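To see what the lock protects against, the sketch below forces the bad interleaving on purpose. It is an illustration under stated assumptions: `run_two_updates` and the `state` dictionary are hypothetical, and the `threading.Barrier` exists only to make both threads read before either writes, so the lost update happens every time instead of once in a while.

```python
import threading

def run_two_updates(use_lock):
    """Two threads each add 1 to a shared balance; returns the final value."""
    state = {"balance": 0}
    lock = threading.Lock()
    # Used only by the unsafe variant to force the worst-case interleaving.
    barrier = threading.Barrier(2)

    def unsafe_update():
        current = state["balance"]      # read
        barrier.wait()                  # both threads now hold the same stale value
        state["balance"] = current + 1  # write: the later write overwrites the earlier

    def safe_update():
        with lock:                      # read-modify-write is one critical section
            state["balance"] = state["balance"] + 1

    target = safe_update if use_lock else unsafe_update
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["balance"]

print(run_two_updates(use_lock=False))  # 1: one update was lost
print(run_two_updates(use_lock=True))   # 2: both updates applied
```

In real code nothing forces this ordering, which is exactly why the bug shows up rarely and is hard to reproduce.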

Key Differences

Aspect	Process	Thread
Memory	Isolated address space	Shared address space
Communication	IPC (pipes, sockets, shared memory)	Shared memory (direct access)
Overhead	High (separate memory, resources)	Low (shared memory)
Context Switch	Expensive (switch address space, TLB and cache effects)	Cheaper (save/restore registers)
Fault Isolation	One process crash doesn't affect others	One thread crash can affect all threads
Creation Time	Slow	Fast
Synchronization	Needed only for explicitly shared resources (files, shared memory)	Required (shared memory)

When to Use Processes vs Threads

Use Processes When:

Fault isolation needed: One failure shouldn't crash the entire system
CPU-bound tasks: Parallel computation on multiple CPUs
Independent tasks: Tasks don't need to share data
Security: Isolated execution environments

Example: Web server where each worker runs as a separate process, so a crash while handling one request can't take down the others
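Fault isolation between processes can be observed directly. The sketch below is an assumption-laden illustration: it launches two child Python interpreters with `subprocess`, one of which aborts abruptly (standing in for a native crash), and shows that the parent and the second child are unaffected. `CRASHING_WORKER`, `HEALTHY_WORKER`, and `run_worker` are hypothetical names.

```python
import subprocess
import sys

# Hypothetical workers: one dies abruptly, the other does its job.
CRASHING_WORKER = "import os; os.abort()"
HEALTHY_WORKER = "print('still serving')"

def run_worker(code):
    """Run a snippet in a separate Python process; return (exit code, stdout)."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return completed.returncode, completed.stdout.strip()

crash_code, _ = run_worker(CRASHING_WORKER)
ok_code, ok_output = run_worker(HEALTHY_WORKER)

# The parent is still running and can report on both children.
print(f"crashed worker exit code: {crash_code}")
print(f"healthy worker exit code: {ok_code}, output: {ok_output!r}")
```

The exact non-zero exit code of the aborted child depends on the operating system, so the example only relies on it being non-zero.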

Use Threads When:

I/O-bound tasks: Waiting for network, disk I/O
Shared data: Tasks need to share memory efficiently
Lightweight concurrency: Many concurrent tasks
GUI applications: Responsive UI while processing

Example: Web server where each request is handled by a thread from a bounded pool, since handlers spend most of their time waiting on I/O
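The I/O-bound case can be sketched as follows. This is a minimal illustration, assuming `time.sleep` as a stand-in for a network wait (sleeping releases the GIL, as blocking I/O does); `fake_fetch` and the example URLs are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(url):
    """Hypothetical fetch: sleeps instead of doing real network I/O."""
    time.sleep(0.1)
    return f"response from {url}"

urls = [f"https://example.com/{i}" for i in range(8)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(fake_fetch, urls))
elapsed = time.perf_counter() - start

# Run one after another, the eight waits would take about 0.8 seconds;
# with eight threads they overlap.
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

The threads spend nearly all their time blocked, so they overlap well even under the GIL. The same structure would not speed up CPU-bound work in CPython.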

Context Switching

Process Context Switch

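Save: Outgoing process state
  - CPU registers, program counter, stack pointer
  - Stored in the process control block (PCB), which already tracks memory mappings and open files
  
Restore: Incoming process state
  - Switch to the new address space (load new page tables)
  - Flush or invalidate TLB (Translation Lookaside Buffer) entries, unless the CPU tags them per address space
  - Restore registers
  
Cost: Higher (address-space switch plus cold TLB and caches)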

Thread Context Switch

Save: Thread-specific state
  - CPU registers
  - Stack pointer
  - Thread control block (TCB)
  
Restore: New thread state
  - Same memory space (no TLB flush)
  - Restore registers
  
Cost: Lower (no address-space switch; typically a fraction of a microsecond up to a few microseconds)

Performance (rough orders of magnitude; actual figures depend on hardware, kernel, and cache state):

Process context switch: ~1-10 microseconds
Thread context switch: ~0.1-1 microsecond
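To put these figures in perspective, the direct cost can be turned into a fraction of CPU time. The arithmetic below is a rough sketch: `switch_overhead_fraction` is a hypothetical helper, the switch rate of 20,000 per second is an assumed workload, and the calculation ignores indirect costs such as cache misses after the switch.

```python
def switch_overhead_fraction(switches_per_second, cost_microseconds):
    """Fraction of one core's time spent on the direct cost of context switches."""
    return switches_per_second * cost_microseconds / 1_000_000

# Assumed workload: 20,000 switches per second on one core.
print(switch_overhead_fraction(20_000, 5))    # 0.1  -> about 10% of the core at 5 us per switch
print(switch_overhead_fraction(20_000, 0.5))  # 0.01 -> about 1% of the core at 0.5 us per switch
```

At low switch rates either cost is negligible; the difference starts to matter only when a system switches tens of thousands of times per second.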

Failure Stories You'll Recognize

The Thread Pool Exhaustion: A service created threads on demand. During a traffic spike, it tried to create 10,000 threads. The OS limit was 8,000. The service crashed. The fix? Use a bounded thread pool.

The Race Condition That Corrupted Data: A service used threads to process orders. Two threads tried to update the same order simultaneously. One thread's changes overwrote the other's. Orders were lost. The fix? Use locks or use processes (which don't share memory).

The Process Fork Bomb: A script accidentally called itself recursively, spawning new processes in an infinite loop. Within seconds, the server had thousands of processes competing for CPU time. The system became unresponsive. The fix? Kill the parent process, or reboot. The lesson? A single process is cheap enough to create that a runaway loop can spawn thousands in seconds, so cap process counts (for example, with per-user limits) before unlimited creation kills your system.

What Interviewers Are Really Testing

They want to hear you think about isolation, memory sharing, and when to use each. Junior engineers say "use threads for concurrency." Senior engineers say "processes for isolation and CPU-bound work, threads for I/O-bound work, and always use pools with bounds."

When they ask "How would you design a concurrent web server?", they're testing:

Do you understand that threads share memory and need synchronization?
Do you know that processes are isolated but have higher overhead?
Can you choose the right tool for the job?

Interview Questions

Beginner

Q: What is the difference between a process and a thread?

Process:

Independent program in execution
Has its own isolated memory space
Heavyweight (higher overhead)
Process ID (PID) for identification
One process crash doesn't affect others
Communication via IPC (Inter-Process Communication)

Thread:

Lightweight unit of execution within a process
Shares memory space with other threads in the same process
Lightweight (lower overhead)
Thread ID (TID) for identification
One thread crash can affect all threads in the process
Communication via shared memory (direct access)

Key Difference:

Process: Isolated memory, independent execution
Thread: Shared memory, requires synchronization

Example:

Process: Like separate apartments (isolated)
Thread: Like roommates in the same apartment (shared)

Intermediate

Q: When would you use processes vs threads? Explain with examples.

Use Processes When:

Fault Isolation

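import multiprocessing

# Web server: each request runs in its own process.
# If one handler crashes, the others continue.
# handle_request and requests are assumed to be defined elsewhere.
workers = []
for request in requests:
    process = multiprocessing.Process(target=handle_request, args=(request,))
    process.start()
    workers.append(process)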

CPU-Bound Tasks (Python)

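import multiprocessing

# Parallel computation on multiple CPUs.
# CPython's GIL keeps threads from running Python bytecode in parallel, so use processes.
# compute_heavy_task and data are assumed to be defined elsewhere.
if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        results = pool.map(compute_heavy_task, data)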

Independent Tasks

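import multiprocessing

# Tasks don't need to share data; each process has its own memory.
# independent_tasks is assumed to be a list of callables defined elsewhere.
processes = []
for task in independent_tasks:
    p = multiprocessing.Process(target=task)
    p.start()
    processes.append(p)
for p in processes:
    p.join()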

Use Threads When:

I/O-Bound Tasks

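import threading

# Network requests, file I/O: threads wait while I/O happens.
# fetch_url and urls are assumed to be defined elsewhere.
threads = []
for url in urls:
    t = threading.Thread(target=fetch_url, args=(url,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()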

Shared Data

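import queue
import threading

# Multiple threads work on a shared queue.Queue, which does its own locking.
# produce and consume are assumed to be defined elsewhere.
shared_queue = queue.Queue()
producer = threading.Thread(target=produce, args=(shared_queue,))
consumer = threading.Thread(target=consume, args=(shared_queue,))
producer.start()
consumer.start()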

GUI Applications

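import threading

# Keep UI responsive while processing.
# heavy_computation and update_ui are assumed to be defined elsewhere.
def process_data():
    # Long-running task runs off the UI thread
    result = heavy_computation()
    # Most GUI toolkits require UI changes on the main thread, so
    # update_ui should hand the result back to it rather than draw directly
    update_ui(result)

thread = threading.Thread(target=process_data)
thread.start()  # UI remains responsive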

Rule of Thumb:

Processes: CPU-bound, fault isolation, independent tasks
Threads: I/O-bound, shared data, lightweight concurrency

Senior

Q: Design a concurrent web server that handles 10,000 concurrent connections. Should you use processes or threads? How do you handle context switching, memory management, and fault isolation?

Hybrid Approach: Process Pool + Thread Pool

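// Pseudocode sketch: ThreadPool, ProcessPool, and ConnectionManager are hypothetical classes, not a real library API
class ConcurrentWebServer {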
  private threadPool: ThreadPool;
  private processPool: ProcessPool;
  private connectionManager: ConnectionManager;
  
  constructor() {
    // Hybrid approach: Processes for isolation, threads for I/O
    this.processPool = new ProcessPool({
      size: os.cpus().length,  // One process per CPU
      strategy: 'prefork'
    });
    
    // Thread pool within each process
    this.threadPool = new ThreadPool({
      size: 1000,  // 1000 threads per process
      queueSize: 10000
    });
    
    this.connectionManager = new ConnectionManager();
  }
  
  async handleRequest(request: Request): Promise<Response> {
    // 1. Accept connection (I/O-bound, use thread)
    const connection = await this.acceptConnection(request);
    
    // 2. Assign to process (load balanced)
    const process = this.processPool.getProcess();
    
    // 3. Handle in thread pool (I/O-bound)
    return await process.handleInThread(connection, async () => {
      // Process request (I/O: database, network)
      const response = await this.processRequest(connection);
      return response;
    });
  }
}

Design Decisions:

Hybrid Approach: Processes for isolation, threads for I/O
- Processes: Fault isolation, one per CPU core
- Threads: I/O concurrency, many per process
Context Switching Optimization
- Use epoll/kqueue (event-driven I/O)
- Minimize context switches
- Thread pool to reuse threads
Memory Management
- Each process: Isolated memory (crash doesn't affect others)
- Shared memory: Only for connection state (if needed)
- Connection pooling: Reuse connections
Fault Isolation
- Process crash: Only affects connections in that process
- Thread crash: An exception caught inside a thread affects only that request; a hard fault (such as a segfault) takes down the whole process
- Health checks: Restart failed processes
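The "restart failed processes" decision above can be sketched as a small supervisor loop. This is an illustration under stated assumptions, not a production design: `WORKER_CODE` is a hypothetical worker that fails on its first attempt and succeeds afterwards, and `supervise` and `max_restarts` are names invented for the example.

```python
import subprocess
import sys

# Hypothetical worker: exits with an error on attempt 1, cleanly afterwards.
WORKER_CODE = """
import sys
attempt = int(sys.argv[1])
sys.exit(1 if attempt == 1 else 0)
"""

def supervise(max_restarts=3):
    """Run the worker, restarting it after a failure; return the attempt that succeeded."""
    for attempt in range(1, max_restarts + 2):
        exit_code = subprocess.run(
            [sys.executable, "-c", WORKER_CODE, str(attempt)]
        ).returncode
        if exit_code == 0:
            return attempt
        print(f"worker exited with code {exit_code}, restarting")
    raise RuntimeError("worker kept failing; giving up")

print(f"worker succeeded on attempt {supervise()}")
```

Bounding the number of restarts matters for the same reason as bounding pools: a worker that crashes immediately on startup would otherwise be respawned in a tight loop.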

Alternative: Event-Driven (Node.js style)

// Single-threaded event loop
// Handles 10,000 connections with async I/O
// Far fewer context switches than thread-per-connection
// But: One crash affects all connections
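The same event-driven idea can be sketched in Python with `asyncio`. This is a minimal illustration, assuming `asyncio.sleep` as a stand-in for asynchronous I/O; `handle_connection` and `main` are hypothetical names. It records the thread each handler runs on to show that all 10,000 simulated connections share one thread.

```python
import asyncio
import threading

async def handle_connection(conn_id):
    """Hypothetical handler: awaits simulated I/O, then reports its thread."""
    await asyncio.sleep(0.01)  # stand-in for async network or database I/O
    return conn_id, threading.get_ident()

async def main(connection_count):
    # All handlers are scheduled on the same event loop and interleave at each await.
    return await asyncio.gather(
        *(handle_connection(i) for i in range(connection_count))
    )

results = asyncio.run(main(10_000))
thread_ids = {thread_id for _, thread_id in results}
print(f"{len(results)} connections handled on {len(thread_ids)} thread(s)")
```

Because every handler shares one thread, a handler that blocks or performs heavy computation stalls all the others, which is the flip side of the low overhead.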

Trade-offs:

Processes + Threads: Better fault isolation, higher overhead
Event-driven: Lower overhead, less fault isolation
Hybrid: Balance of both

Examples

Example 1: Process Isolation

Scenario: Web server with multiple services

Using processes:

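from multiprocessing import Process

# Each service is a separate process.
# serve_web, serve_api, and serve_db are assumed to be defined elsewhere.
web_server = Process(target=serve_web)
api_server = Process(target=serve_api)
db_server = Process(target=serve_db)
for service in (web_server, api_server, db_server):
    service.start()

# If web_server crashes, api_server and db_server continue
# Isolated memory, independent execution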

Using threads:

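from threading import Thread

# All services in the same process.
# serve_web, serve_api, and serve_db are assumed to be defined elsewhere.
web_thread = Thread(target=serve_web)
api_thread = Thread(target=serve_api)
db_thread = Thread(target=serve_db)
for service in (web_thread, api_thread, db_thread):
    service.start()

# If web_thread triggers a fatal fault (e.g. a segfault in native code), the entire process dies
# Shared memory, one crash affects all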

Example 2: Context Switching Cost

Process context switch:

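Save: CPU registers; switch page tables and memory mappings
Time: ~1-10 microseconds (rough order of magnitude)

Thread context switch:

Save: CPU registers only (same memory space)
Time: ~0.1-1 microsecond (rough order of magnitude)

Performance: Thread switching is often several times faster, though the gap depends on hardware, kernel, and cache state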

Example 3: Memory Sharing

Processes (isolated):

# Process A
data = [1, 2, 3]  # In Process A's memory

# Process B
data = [4, 5, 6]  # In Process B's memory (different)

# No sharing, must use IPC
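Because processes don't share memory, data has to be copied across an IPC channel. The sketch below uses pipes, one of several IPC options: the parent sends a list to a child Python process over stdin and reads the result back from stdout. `CHILD_CODE` and `sum_in_child` are hypothetical names, and JSON is an assumed wire format.

```python
import json
import subprocess
import sys

# Code run by the child: it only sees what arrives on its stdin pipe.
CHILD_CODE = """
import json, sys
numbers = json.load(sys.stdin)
json.dump({"total": sum(numbers)}, sys.stdout)
"""

def sum_in_child(numbers):
    """Send numbers to a child process over a pipe and return the total it computes."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=json.dumps(numbers),  # serialized and written to the child's stdin
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)["total"]

data = [1, 2, 3]
print(sum_in_child(data))  # 6
print(data)                # [1, 2, 3]: the child worked on a copy, not on this list
```

The serialization step is the price of isolation: threads could read `data` directly, while processes must encode it, send it, and decode it.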

Threads (shared):

# Shared data
shared_data = [1, 2, 3]

# Thread A
shared_data.append(4)  # Modifies shared memory

# Thread B
shared_data.append(5)  # Sees Thread A's changes

# Requires synchronization (locks)

Common Pitfalls

Pitfall 1: Using threads for everything

Problem: Threads share memory, one buggy thread can crash entire process
Solution: Use processes for fault isolation, threads for I/O concurrency
Example: Using threads for independent services (one crash kills all)

Pitfall 2: Creating too many processes

Problem: Process creation and context switching are expensive
Solution: Use process pools, limit process count, use threads for I/O
Example: Creating 10,000 processes exhausts system resources

Pitfall 3: Not synchronizing threads

Problem: Threads share memory, race conditions cause data corruption
Solution: Use mutexes, semaphores, or lock-free data structures
Example: Multiple threads modifying shared counter without locks

Pitfall 4: Confusing concurrency with parallelism

Problem: Threads provide concurrency, but parallelism requires multiple CPUs
Solution: Understand that threads on single CPU just take turns
Example: Creating many threads on single-core CPU doesn't improve performance

Pitfall 5: Ignoring resource limits

Problem: System has limits on processes and threads
Solution: Monitor resource usage, use pools, understand system limits
Example: Hitting the max process/thread limit (a common default PID ceiling on Linux is 32,768; per-user limits are often lower)
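On Unix-like systems, a program can inspect some of these limits at runtime. The sketch below assumes the platform provides Python's `resource` module (it is not available on Windows) and that the named limits exist there; `describe_limit` is a hypothetical helper.

```python
import resource

def describe_limit(name):
    """Return the soft and hard values of a resource limit as readable text."""
    soft, hard = resource.getrlimit(getattr(resource, name))

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"

# RLIMIT_NPROC: how many processes (and, on Linux, threads) this user may have.
# RLIMIT_NOFILE: how many open file descriptors this process may have.
print(describe_limit("RLIMIT_NPROC"))
print(describe_limit("RLIMIT_NOFILE"))
```

Sizing thread and process pools well below the reported soft limits leaves headroom for the rest of the system.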

Key Takeaways

Process: Isolated memory space, independent execution, heavyweight, fault isolation

Thread: Shared memory space, lightweight, requires synchronization, one crash can affect all

Use processes for: CPU-bound tasks, fault isolation, independent tasks

Use threads for: I/O-bound tasks, shared data, lightweight concurrency

Context switching: Process switch is more expensive (switches address space, disturbs TLB and caches), thread switch is cheaper (save/restore registers only)

Communication: Processes use IPC, threads use shared memory

Synchronization: Threads need locks/mutexes; processes need them only for resources they explicitly share (files, shared memory)

Best practice: Use processes for isolation, threads for I/O concurrency, hybrid approach for high-performance servers
