Process vs Thread
Understand the fundamental differences between processes and threads: isolation, memory sharing, context switching, and when to use each.
Processes and threads are fundamental concepts in operating systems for achieving concurrency. Understanding their differences is crucial for designing efficient, scalable systems.
Process
Definition: An independent program in execution with its own memory space.
Characteristics:
- Isolated memory: Each process has its own address space
- Independent execution: Each process is scheduled on its own and shares no memory with other processes unless shared memory is set up explicitly
- Heavyweight: Higher overhead for creation and context switching
- Process ID (PID): Unique identifier for each process
- Protection: One process cannot directly access another's memory
Memory Layout:
   Process A          Process B
┌─────────────┐    ┌─────────────┐
│    Stack    │    │    Stack    │
│    Heap     │    │    Heap     │
│    Data     │    │    Data     │
│    Code     │    │    Code     │
└─────────────┘    └─────────────┘
       ↓                  ↓
   Separate           Separate
 Address Space      Address Space
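To make the isolation concrete, here is a minimal sketch in which a child process changes a module-level variable while the parent's copy stays untouched. The names `counter`, `bump`, and `run_isolation_demo` are illustrative, and the explicit `fork` start method is an assumption that only holds on Linux and other POSIX systems.

```python
import multiprocessing

counter = 0  # module-level state; the child gets its own copy of it


def bump(result_queue):
    """Runs in the child: modify the child's copy and report its value."""
    global counter
    counter += 100
    result_queue.put(counter)


def run_isolation_demo():
    # Assumption: POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    result_queue = ctx.Queue()
    child = ctx.Process(target=bump, args=(result_queue,))
    child.start()
    child_value = result_queue.get()  # value seen inside the child
    child.join()
    return child_value, counter       # counter here is the parent's copy


if __name__ == "__main__":
    print(run_isolation_demo())  # (100, 0)
```

The child reports 100, yet the parent still sees 0, because the two processes do not share an address space. On Windows, the default start method with the `__main__` guard would be the starting point instead.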
Thread
Definition: A lightweight unit of execution within a process that shares the process's memory space.
Characteristics:
- Shared memory: All threads in a process share the same address space
- Lightweight: Lower overhead for creation and context switching
- Thread ID (TID): Unique identifier within a process
- Communication: Threads can directly access shared memory
- Synchronization needed: Requires locks, mutexes to prevent race conditions
Memory Layout:
             Process
┌─────────────────────────────────┐
│          Shared Memory          │
│    ┌─────────┐   ┌─────────┐    │
│    │ Thread1 │   │ Thread2 │    │
│    │  Stack  │   │  Stack  │    │
│    └─────────┘   └─────────┘    │
│           Shared Heap           │
│           Shared Data           │
│           Shared Code           │
└─────────────────────────────────┘
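The layout above can be illustrated with a small sketch: every thread appends to one shared list, while each thread's local variable lives on its own stack. The names `shared_log`, `record`, and `run_sharing_demo` are made up for this example.

```python
import threading

shared_log = []               # one object on the heap, visible to every thread
log_lock = threading.Lock()   # guards shared_log


def record(name, repeats):
    local_count = 0           # local variable: each thread has its own
    for _ in range(repeats):
        local_count += 1
    with log_lock:
        shared_log.append((name, local_count))


def run_sharing_demo():
    shared_log.clear()
    threads = [
        threading.Thread(target=record, args=(f"T{i}", i + 1))
        for i in range(3)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)


if __name__ == "__main__":
    print(run_sharing_demo())  # [('T0', 1), ('T1', 2), ('T2', 3)]
```

All three entries land in the same list, but each `local_count` reflects only the work of its own thread.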
Key Differences
| Aspect | Process | Thread |
|---|---|---|
| Memory | Isolated address space | Shared address space |
| Communication | IPC (pipes, sockets, shared memory) | Shared memory (direct access) |
| Overhead | High (separate memory, resources) | Low (shared memory) |
| Context Switch | Expensive (switch address space; TLB entries may be flushed) | Cheaper (same address space; save/restore registers and stack pointer) |
| Fault Isolation | One process crash doesn't affect others | One thread crash can affect all threads |
| Creation Time | Slow | Fast |
| Synchronization | Needed only for explicitly shared resources (shared memory, files) | Required for any shared mutable state |
Examples
Process Creation (Python)
import os
import multiprocessing

def worker_process(name):
    """Worker function for a process"""
    print(f"Process {name} (PID: {os.getpid()})")
    # Each process has its own memory space
    data = [1, 2, 3]  # Isolated to this process
    print(f"Process {name} data: {data}")

if __name__ == '__main__':
    # Create processes
    p1 = multiprocessing.Process(target=worker_process, args=('A',))
    p2 = multiprocessing.Process(target=worker_process, args=('B',))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
Output (PIDs and the order of lines vary between runs):
Process A (PID: 1234)
Process A data: [1, 2, 3]
Process B (PID: 1235)
Process B data: [1, 2, 3]
Thread Creation (Python)
import threading

shared_data = []  # Shared by all threads

def worker_thread(name):
    """Worker function for a thread"""
    print(f"Thread {name} (TID: {threading.current_thread().ident})")
    # All threads share the same memory
    shared_data.append(name)
    print(f"Shared data: {shared_data}")

# Create threads
t1 = threading.Thread(target=worker_thread, args=('A',))
t2 = threading.Thread(target=worker_thread, args=('B',))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Final shared data: {shared_data}")
Output (thread IDs vary; lines from the two threads may interleave):
Thread A (TID: 140234567890432)
Shared data: ['A']
Thread B (TID: 140234567891456)
Shared data: ['A', 'B']
Final shared data: ['A', 'B']
Process Communication (IPC)
import multiprocessing

def producer(q):
    """Producer process"""
    for i in range(5):
        q.put(f"Item {i}")
        print(f"Produced: Item {i}")

def consumer(q):
    """Consumer process"""
    while True:
        item = q.get()
        if item is None:  # Sentinel: no more items
            break
        print(f"Consumed: {item}")

if __name__ == '__main__':
    # Queue for inter-process communication
    q = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(q,))
    p2 = multiprocessing.Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    q.put(None)  # Signal to stop
    p2.join()
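Queues are not the only IPC mechanism; processes can also opt in to shared memory, and once they do, they need synchronization just as threads do. The following is a minimal sketch using `multiprocessing.Value` and its built-in lock. The function names are illustrative, and the explicit `fork` start method is an assumption that holds on POSIX systems only.

```python
import multiprocessing


def add_many(shared_total, repeats):
    """Increment a counter that lives in shared memory."""
    for _ in range(repeats):
        # get_lock() returns the lock that guards this shared value
        with shared_total.get_lock():
            shared_total.value += 1


def run_shared_counter(workers=4, repeats=1000):
    # Assumption: POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    shared_total = ctx.Value("i", 0)  # C int in shared memory, starts at 0
    procs = [
        ctx.Process(target=add_many, args=(shared_total, repeats))
        for _ in range(workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared_total.value


if __name__ == "__main__":
    print(run_shared_counter())  # 4000
```

Without the lock, the read-modify-write in `shared_total.value += 1` could interleave across processes and lose updates.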
Thread Synchronization
import threading

counter = 0
lock = threading.Lock()  # Mutex for synchronization

def increment():
    global counter
    for _ in range(100000):
        with lock:  # Acquire lock
            counter += 1
        # Lock released automatically when the with block exits

# Create threads
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Final counter: {counter}")  # Should be 200000
When to Use Processes vs Threads
Use Processes When:
- Fault isolation needed: One failure shouldn't crash the entire system
- CPU-bound tasks: Parallel computation on multiple CPUs
- Independent tasks: Tasks don't need to share data
- Security: Isolated execution environments
Example: A prefork web server in which each request is handled by a separate worker process, so a crash in one handler cannot corrupt the others
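As a rough illustration of the CPU-bound case, the sketch below splits a sum of squares across worker processes and combines the partial results through a queue. `sum_squares` and `parallel_sum_squares` are hypothetical names, and the explicit `fork` start method assumes a POSIX system.

```python
import multiprocessing


def sum_squares(start, stop, out_queue):
    """CPU-bound work on one slice of the range."""
    out_queue.put(sum(n * n for n in range(start, stop)))


def parallel_sum_squares(limit, workers=4):
    # Assumption: POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    out_queue = ctx.Queue()
    step = max(1, -(-limit // workers))  # ceiling division, at least 1
    procs = []
    for start in range(0, limit, step):
        stop = min(start + step, limit)
        p = ctx.Process(target=sum_squares, args=(start, stop, out_queue))
        p.start()
        procs.append(p)
    # Collect one partial result per worker before joining
    total = sum(out_queue.get() for _ in procs)
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    print(parallel_sum_squares(1000))  # 332833500
```

Each slice runs in its own interpreter, so the slices can execute on separate cores at the same time.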
Use Threads When:
- I/O-bound tasks: Waiting for network, disk I/O
- Shared data: Tasks need to share memory efficiently
- Lightweight concurrency: Many concurrent tasks
- GUI applications: Responsive UI while processing
Example: A web server in which each request is handled by a thread from a pool, so handlers can cheaply share caches and connection pools
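For the I/O-bound case, a thread pool is often enough. The sketch below simulates network waits with `time.sleep`; `fake_fetch` and `fetch_all` are stand-ins for real I/O calls, not part of any library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(resource_id, delay=0.2):
    """Stand-in for a network call; sleeping releases the GIL like real I/O."""
    time.sleep(delay)
    return f"response-{resource_id}"


def fetch_all(resource_ids, max_workers=4):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order in its results
        results = list(pool.map(fake_fetch, resource_ids))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = fetch_all(range(4))
    print(results)
    print(f"elapsed: {elapsed:.2f}s (sequential would take about 0.8s)")
```

Because the threads spend their time waiting rather than computing, the four simulated requests overlap and the total time stays close to a single request's delay.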
Context Switching
Process Context Switch
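Save: Outgoing process state
- CPU registers (including program counter and stack pointer)
- Scheduling state in the process control block (PCB)
- Memory mappings and open files are not copied; they stay referenced from the PCB
Restore: Incoming process state
- Switch address space (load the new process's page tables)
- Flush the TLB (Translation Lookaside Buffer), unless the CPU tags entries per address space
- Restore registers
Cost: High (typically a few microseconds, plus cache and TLB misses afterwards)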
Thread Context Switch
Save: Thread-specific state
- CPU registers
- Stack pointer
- Thread control block (TCB)
Restore: New thread state
- Same memory space (no TLB flush)
- Restore registers
Cost: Lower (typically a fraction of a microsecond up to about a microsecond)
Performance (rough orders of magnitude; actual figures depend on hardware and OS):
- Process context switch: ~1-10 microseconds
- Thread context switch: ~0.1-1 microsecond
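Context switches can be observed from inside a program. The sketch below reads the per-process counters exposed by the standard `resource` module, which is available on Unix-like systems only. `switches_during` and `sleepy` are illustrative helpers, and the counts depend on the machine and its load.

```python
import resource
import time


def context_switches():
    """Return (voluntary, involuntary) context switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def switches_during(fn):
    """Run fn() and report how many context switches happened meanwhile."""
    vol_before, invol_before = context_switches()
    fn()
    vol_after, invol_after = context_switches()
    return vol_after - vol_before, invol_after - invol_before


def sleepy():
    # Each sleep gives up the CPU, which normally counts as a voluntary switch
    for _ in range(50):
        time.sleep(0.001)


if __name__ == "__main__":
    voluntary, involuntary = switches_during(sleepy)
    print(f"voluntary={voluntary} involuntary={involuntary}")
```

Voluntary switches happen when the code blocks (sleep, I/O, lock wait); involuntary ones happen when the scheduler preempts it.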
Common Pitfalls
- Using threads for CPU-bound tasks: In standard CPython builds, the Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time. Fix: Use processes for CPU-bound tasks
- Race conditions in threads: Shared memory without synchronization. Fix: Use locks, mutexes, semaphores
- Process overhead: Creating too many processes. Fix: Use a bounded process pool, or thread pools for I/O-bound tasks
- Memory leaks in threads: Shared memory not cleaned up. Fix: Proper resource management
- Deadlocks: Multiple locks acquired in wrong order. Fix: Always acquire locks in same order
- Not handling process/thread failures: One failure can cascade. Fix: Implement proper error handling, isolation
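The deadlock fix in the list above (always acquire locks in the same order) can be sketched as follows. The `Account` class and `transfer` function are hypothetical; the point is that both threads lock the account with the lower `account_id` first, regardless of transfer direction.

```python
import threading


class Account:
    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    if src is dst:
        return  # nothing to do, and locking the same lock twice would hang
    # Global lock order: lower account_id first, whatever the direction
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def repeat_transfer(src, dst, rounds):
    for _ in range(rounds):
        transfer(src, dst, 1)


def run_transfers(rounds=1000):
    a, b = Account(1, 1000), Account(2, 1000)
    # The two threads transfer in opposite directions at the same time
    t1 = threading.Thread(target=repeat_transfer, args=(a, b, rounds))
    t2 = threading.Thread(target=repeat_transfer, args=(b, a, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_transfers())  # (1000, 1000)
```

If each thread instead locked its own source account first, one thread could hold `a.lock` while waiting for `b.lock` and the other the reverse, and neither would ever proceed.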
Interview Questions
Beginner
Q: What is the difference between a process and a thread?
A:
Process:
- Independent program in execution
- Has its own isolated memory space
- Heavyweight (higher overhead)
- Process ID (PID) for identification
- One process crash doesn't affect others
- Communication via IPC (Inter-Process Communication)
Thread:
- Lightweight unit of execution within a process
- Shares memory space with other threads in the same process
- Lightweight (lower overhead)
- Thread ID (TID) for identification
- One thread crash can affect all threads in the process
- Communication via shared memory (direct access)
Key Difference:
- Process: Isolated memory, independent execution
- Thread: Shared memory, requires synchronization
Example:
Process: Like separate apartments (isolated)
Thread: Like rooms in the same apartment (shared)
Intermediate
Q: When would you use processes vs threads? Explain with examples.
A:
Use Processes When:
- Fault Isolation
    # Web server: each request = process
    # If one request crashes, others continue
    for request in requests:
        process = multiprocessing.Process(target=handle_request, args=(request,))
        process.start()
- CPU-Bound Tasks (Python)
    # Parallel computation on multiple CPUs
    # The CPython GIL limits threads, so use processes
    with multiprocessing.Pool() as pool:
        results = pool.map(compute_heavy_task, data)
- Independent Tasks
    # Tasks don't need to share data
    # Each process has its own memory
    processes = []
    for task in independent_tasks:
        p = multiprocessing.Process(target=task)
        p.start()
        processes.append(p)
Use Threads When:
- I/O-Bound Tasks
    # Network requests, file I/O
    # Threads wait while I/O happens
    threads = []
    for url in urls:
        t = threading.Thread(target=fetch_url, args=(url,))
        t.start()
        threads.append(t)
- Shared Data
    # Multiple threads work on a shared data structure
    shared_queue = queue.Queue()
    producer = threading.Thread(target=produce, args=(shared_queue,))
    consumer = threading.Thread(target=consume, args=(shared_queue,))
- GUI Applications
    # Keep UI responsive while processing
    def process_data():
        # Long-running task
        result = heavy_computation()
        # Most GUI toolkits require this call to be scheduled on the UI thread
        update_ui(result)

    thread = threading.Thread(target=process_data)
    thread.start()
    # UI remains responsive
Rule of Thumb:
- Processes: CPU-bound, fault isolation, independent tasks
- Threads: I/O-bound, shared data, lightweight concurrency
Senior
Q: Design a concurrent web server that handles 10,000 concurrent connections. Should you use processes or threads? How do you handle context switching, memory management, and fault isolation?
A:
// Architectural sketch, not runnable code: ProcessPool, ThreadPool, WorkerProcess,
// Connection, Task, processRequest, and the epoll wrapper are placeholders.
class ConcurrentWebServer {
  private threadPool: ThreadPool;
  private processPool: ProcessPool;
  private connectionManager: ConnectionManager;

  constructor() {
    // Hybrid approach: Processes for isolation, threads for I/O
    this.processPool = new ProcessPool({
      size: os.cpus().length, // One process per CPU
      strategy: 'prefork'
    });
    // Thread pool within each process
    this.threadPool = new ThreadPool({
      size: 1000, // 1000 threads per process
      queueSize: 10000
    });
    this.connectionManager = new ConnectionManager();
  }

  // Architecture: Process Pool + Thread Pool
  async handleRequest(request: Request): Promise<Response> {
    // 1. Accept connection (I/O-bound, use thread)
    const connection = await this.connectionManager.acceptConnection(request);
    // 2. Assign to process (load balanced)
    const process = this.processPool.getProcess();
    // 3. Handle in thread pool (I/O-bound)
    return await process.handleInThread(connection, async () => {
      // Process request (I/O: database, network)
      return await this.processRequest(connection);
    });
  }
}

// Process Pool (Fault Isolation)
class ProcessPool {
  private processes: WorkerProcess[];

  getProcess(): WorkerProcess {
    // Load balance across processes
    return this.processes[this.getLeastLoaded()];
  }
  // Each process handles ~1000 connections
  // If one process crashes, others continue
}

// Thread Pool (I/O Concurrency)
class ThreadPool {
  private threads: Thread[];
  private taskQueue: Queue<Task>;

  async execute(task: Task): Promise<Result> {
    // Get available thread
    const thread = this.getAvailableThread();
    // Execute task (I/O-bound)
    return await thread.execute(task);
  }
}

// Connection Management
class ConnectionManager {
  private connections: Map<number, Connection>;

  async acceptConnection(request: Request): Promise<Connection> {
    // Use epoll/kqueue for efficient I/O
    const fd = await this.epoll.wait();
    return new Connection(fd);
  }
}
Design Decisions:
- Hybrid Approach: Processes for isolation, threads for I/O
  - Processes: Fault isolation, one per CPU core
  - Threads: I/O concurrency, many per process
- Context Switching Optimization
  - Use epoll/kqueue (event-driven I/O)
  - Minimize context switches
  - Thread pool to reuse threads
- Memory Management
  - Each process: Isolated memory (crash doesn't affect others)
  - Shared memory: Only for connection state (if needed)
  - Connection pooling: Reuse connections
- Fault Isolation
  - Process crash: Only affects connections in that process
  - Thread crash: An exception caught inside a handler affects only that thread's connection; a fatal fault (e.g. a segfault) takes down the whole process
  - Health checks: Restart failed processes
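The "restart failed processes" idea can be sketched with a tiny supervisor. This is a simplified, sequential illustration rather than a production design: `flaky_worker` and `supervise` are hypothetical names, the crash is simulated with `os._exit(1)`, and the explicit `fork` start method assumes a POSIX system.

```python
import multiprocessing
import os


def flaky_worker(should_crash):
    """Simulated worker: exits abruptly with status 1 when told to crash."""
    if should_crash:
        os._exit(1)
    # Returning normally gives exit code 0


def supervise(crash_plan):
    """Run one worker per plan entry; restart once any worker that fails."""
    # Assumption: POSIX system where the "fork" start method is available.
    ctx = multiprocessing.get_context("fork")
    restarts = 0
    exit_codes = []
    for should_crash in crash_plan:
        worker = ctx.Process(target=flaky_worker, args=(should_crash,))
        worker.start()
        worker.join()
        if worker.exitcode != 0:
            # The supervisor itself is unaffected and can start a replacement
            restarts += 1
            worker = ctx.Process(target=flaky_worker, args=(False,))
            worker.start()
            worker.join()
        exit_codes.append(worker.exitcode)
    return restarts, exit_codes


if __name__ == "__main__":
    print(supervise([False, True, False]))  # (1, [0, 0, 0])
```

The crash of the second worker is visible to the parent only as a non-zero exit code; the parent's own memory and the other workers are not affected.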
Alternative: Event-Driven (Node.js style)
// Single-threaded event loop
// Handles 10,000 connections with async I/O
// Far fewer context switches than a thread per connection
// But: One crash affects all connections
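As a rough Python analogue of the event-driven style, the sketch below uses `asyncio` to serve many simulated connections on a single thread. `handle_connection`, `serve`, and `run_event_loop_demo` are illustrative names, and `asyncio.sleep` stands in for real socket I/O.

```python
import asyncio


async def handle_connection(conn_id):
    # Simulated I/O wait; control returns to the event loop while waiting
    await asyncio.sleep(0.01)
    return conn_id


async def serve(connection_count):
    # All handlers run concurrently on one thread
    results = await asyncio.gather(
        *(handle_connection(i) for i in range(connection_count))
    )
    return len(results)


def run_event_loop_demo(connection_count=10_000):
    return asyncio.run(serve(connection_count))


if __name__ == "__main__":
    print(run_event_loop_demo())  # 10000
```

No extra threads or processes are created, which keeps per-connection overhead low, but a handler that blocks or crashes the loop stalls every connection it serves.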
Trade-offs:
- Processes + Threads: Better fault isolation, higher overhead
- Event-driven: Lower overhead, less fault isolation
- Hybrid: Balance of both
Key Takeaways
- Process: Isolated memory space, independent execution, heavyweight, fault isolation
- Thread: Shared memory space, lightweight, requires synchronization, one crash can affect all
- Use processes for: CPU-bound tasks, fault isolation, independent tasks
- Use threads for: I/O-bound tasks, shared data, lightweight concurrency
- Context switching: Process switch is expensive (address space change, possible TLB flush), thread switch is cheaper (save/restore registers within the same address space)
- Communication: Processes use IPC, threads use shared memory
- Synchronization: Threads need locks/mutexes for shared state; processes need them only for explicitly shared resources (shared memory, files)
- Best practice: Use processes for isolation, threads for I/O concurrency, hybrid approach for high-performance servers