Topic Overview
B-Trees & Balanced Multi-way Trees
Master B-trees: disk-friendly balanced trees used in databases and file systems. Learn how B-trees optimize for disk I/O and database indexing.
B-Trees & Balanced Multi-way Trees
Why This Matters
B-trees are self-balancing multi-way search trees designed for systems that read and write large blocks of data, like databases and file systems. Unlike binary search trees optimized for in-memory operations, B-trees minimize disk I/O by storing multiple keys per node and matching node size to disk block size.
This matters because disk I/O is thousands of times slower than memory access. A binary search tree might require O(log n) disk reads, but each read is slow. A B-tree reduces the number of disk reads by storing many keys per node, making each disk read more efficient. This is why virtually every database (PostgreSQL, MySQL, MongoDB) uses B-trees or B-tree variants for indexing.
In interviews, B-tree questions test whether you understand how data structures must adapt to hardware constraints. Real-world systems don't just optimize for time complexity—they optimize for what's actually slow (disk I/O, network latency). Understanding B-trees shows you can think beyond abstract algorithms to practical system design.
What Engineers Usually Get Wrong
Most engineers think "B-trees are just BSTs with more children per node." While true structurally, the real insight is that B-trees are designed to minimize disk I/O, not just to be balanced. Engineers often implement B-trees without understanding why the order (branching factor) matters—it should match disk block size.
Engineers also confuse B-trees with B+ trees. B-trees store data in all nodes. B+ trees (used by most databases) store data only in leaves and use internal nodes as indexes. This separation optimizes for range scans, which are common in databases.
Another common mistake is using B-trees for in-memory data structures. B-trees have overhead (more complex insertion/deletion) that's only worth it when disk I/O is the bottleneck. For in-memory data, balanced BSTs (AVL, Red-Black) are simpler and faster.
How This Breaks Systems in the Real World
A database was using a simple hash table for indexing. This worked fine for exact lookups, but range queries (like "find all users with age 25-30") required scanning the entire table. The database added a B-tree index on the age column. Now range queries became O(log n + k) where k is result size, making queries fast even with millions of records. The B-tree's sorted structure enabled efficient range scans that hash tables couldn't provide.
Another story: A file system was storing directory entries in a simple array. As directories grew to thousands of files, listing files became slow (O(n) scan). The fix? Use a B-tree to organize directory entries. File lookups became O(log n), and the B-tree's structure matched disk block size, minimizing I/O operations. This is why modern file systems (ext4, NTFS) use B-trees for directory indexing.
What is a B-Tree?
A B-tree of order m is a multi-way search tree with these properties:
- All leaves at same level: Tree is perfectly balanced
- Node capacity: Each node has at most
m-1keys and at mostmchildren - Minimum keys: Root has at least 1 key; internal nodes have at least
⌈m/2⌉ - 1keys - Sorted keys: Keys in each node are sorted
- Child separation: Keys in child nodes are separated by parent keys
Structure Example (Order 3)
[20, 40]
/ | \
[10] [30] [50, 60]
/ \ / \ / | \
[5] [15][25][35][45][55][65]
Each node can have up to 2 keys (m-1) and 3 children (m).
B-Tree Operations
Search
1class BTreeNode {2 keys: number[];3 children: BTreeNode[];4 isLeaf: boolean;56 constructor(isLeaf: boolean) {7 this.keys = [];8 this.children = [];9 this.isLeaf = isLeaf;10 }11}1213class BTree {14 private root: BTreeNode;15 private order: number; // Maximum children per node1617 constructororder
Insertion
Insertion in B-trees is more complex because nodes can overflow. When a node becomes full, it must be split.
1insert(key: number): void {2 const root = this.root;34 // If root is full, split it5 if (root.keys.length === this.order - 1) {6 const newRoot = new BTreeNode(false);7 newRoot.children.push(root);8 this.splitChild(newRoot, 0);9 this.root = newRoot;10 }1112 this.insertNonFullroot key
B-Tree vs Binary Search Tree
| Aspect | Binary Search Tree | B-Tree |
|---|---|---|
| Children per node | 2 | m (typically 100-1000) |
| Height | O(log n) | O(log_m n) - much shorter |
| Disk I/O | O(log n) reads | O(log_m n) reads - fewer |
| Node size | Small (fits in cache) | Large (matches disk block) |
| Use case | In-memory data | Disk-based data |
| Complexity | Simple | More complex |
Why B-Trees Win for Disk
- Fewer disk reads: Height is O(log_m n) vs O(log n) for BST
- Block-aligned: Each node fits in one disk block (typically 4KB)
- Sequential access: Reading one block gets many keys
- Cache efficiency: Fewer but larger reads are better for disk
B+ Trees (Database Variant)
Most databases use B+ trees, a variant where:
- Data only in leaves: Internal nodes are indexes only
- Leaf nodes linked: Leaves form a linked list for range scans
- Better for databases: Separates index from data, optimizes range queries
B+ Tree Structure
[20, 40] (Internal nodes - indexes only)
/ | \
[10] [30] [50, 60]
/ \ / \ / | \
[5→10→15] [25→30→35] [45→50→55→60→65] (Leaves - contain data, linked)
Common Pitfalls
- Wrong order: Choosing order that doesn't match disk block size wastes space
- Incorrect split: Not handling split correctly causes tree corruption
- Merge complexity: Deletion requires merging nodes, which is complex
- Over-engineering: Using B-trees for in-memory data adds unnecessary complexity
- Range queries: Not optimizing leaf structure for range scans
Failure Stories You'll Recognize
A database was using a hash table for its primary index. This worked fine for exact lookups, but range queries required full table scans. As the table grew to millions of rows, range queries became prohibitively slow. The fix? Replace hash index with a B+ tree index. Range queries became O(log n + k) where k is result size, making queries fast. The B+ tree's sorted structure and linked leaves enabled efficient range scans.
Another story: A file system was storing large directories (10,000+ files) in a simple array. File lookups required scanning the array (O(n)). As directories grew, file operations became slow. The fix? Use a B-tree to organize directory entries. Lookups became O(log n), and the B-tree structure matched disk block size, minimizing I/O. This is why modern file systems use B-trees.
What Interviewers Are Really Testing
- System thinking: Can you adapt data structures to hardware constraints?
- Database knowledge: Do you understand how databases actually work?
- Trade-offs: Can you explain when B-trees are worth the complexity?
- Design decisions: Can you explain why B-trees are used in real systems?
- Implementation: Can you implement B-tree operations correctly?
Interview Questions
Intermediate
- Explain why B-trees are used in databases instead of binary search trees
- Describe the structure of a B-tree node
- Explain the split operation in B-tree insertion
Senior
- Implement B-tree search and insertion
- Design a B+ tree and explain why it's better for databases
- Compare B-trees, B+ trees, and LSM trees for database indexing
- Explain how B-tree order relates to disk block size
- Design a system that uses B-trees for time-series data
- B-trees minimize disk I/O: Fewer but larger reads are better for disk
- Order matters: Should match disk block size (typically 100-1000)
- Height advantage: O(log_m n) height means fewer disk reads than BST
- Database standard: Virtually all databases use B-trees or B+ trees
- Not for memory: B-trees are overkill for in-memory data structures
- B+ trees for databases: Separate index from data, optimize range scans
- Split and merge: Complex operations required to maintain balance
- File systems: Modern file systems use B-trees for directory indexing
- Range queries: B-trees excel at range queries (sorted structure)
- Hardware-aware: Understanding B-trees shows system-level thinking
- Trees - Understanding tree structures
- Binary Search Trees - Comparison with BSTs
- Databases - Database indexing
- File Systems - File system internals
What's next?