Topic Overview

B-Trees & Balanced Multi-way Trees

Master B-trees: disk-friendly balanced trees used in databases and file systems. Learn how B-trees optimize for disk I/O and database indexing.

Senior16 min read

B-Trees & Balanced Multi-way Trees

Why This Matters

B-trees are self-balancing multi-way search trees designed for systems that read and write large blocks of data, like databases and file systems. Unlike binary search trees optimized for in-memory operations, B-trees minimize disk I/O by storing multiple keys per node and matching node size to disk block size.

This matters because disk I/O is thousands of times slower than memory access. A binary search tree might require O(log n) disk reads, but each read is slow. A B-tree reduces the number of disk reads by storing many keys per node, making each disk read more efficient. This is why virtually every database (PostgreSQL, MySQL, MongoDB) uses B-trees or B-tree variants for indexing.

In interviews, B-tree questions test whether you understand how data structures must adapt to hardware constraints. Real-world systems don't just optimize for time complexity—they optimize for what's actually slow (disk I/O, network latency). Understanding B-trees shows you can think beyond abstract algorithms to practical system design.

What Engineers Usually Get Wrong

Most engineers think "B-trees are just BSTs with more children per node." While true structurally, the real insight is that B-trees are designed to minimize disk I/O, not just to be balanced. Engineers often implement B-trees without understanding why the order (branching factor) matters—it should match disk block size.

Engineers also confuse B-trees with B+ trees. B-trees store data in all nodes. B+ trees (used by most databases) store data only in leaves and use internal nodes as indexes. This separation optimizes for range scans, which are common in databases.

Another common mistake is using B-trees for in-memory data structures. B-trees have overhead (more complex insertion/deletion) that's only worth it when disk I/O is the bottleneck. For in-memory data, balanced BSTs (AVL, Red-Black) are simpler and faster.

How This Breaks Systems in the Real World

A database was using a simple hash table for indexing. This worked fine for exact lookups, but range queries (like "find all users with age 25-30") required scanning the entire table. The database added a B-tree index on the age column. Now range queries became O(log n + k) where k is result size, making queries fast even with millions of records. The B-tree's sorted structure enabled efficient range scans that hash tables couldn't provide.

Another story: A file system was storing directory entries in a simple array. As directories grew to thousands of files, listing files became slow (O(n) scan). The fix? Use a B-tree to organize directory entries. File lookups became O(log n), and the B-tree's structure matched disk block size, minimizing I/O operations. This is why modern file systems (ext4, NTFS) use B-trees for directory indexing.

What is a B-Tree?

A B-tree of order m is a multi-way search tree with these properties:

All leaves at same level: Tree is perfectly balanced
Node capacity: Each node has at most m-1 keys and at most m children
Minimum keys: Root has at least 1 key; internal nodes have at least ⌈m/2⌉ - 1 keys
Sorted keys: Keys in each node are sorted
Child separation: Keys in child nodes are separated by parent keys

Structure Example (Order 3)

        [20, 40]
       /    |    \
   [10]  [30]  [50, 60]
   /  \   /  \   /  |  \
 [5] [15][25][35][45][55][65]

Each node can have up to 2 keys (m-1) and 3 children (m).

B-Tree Operations

Search

1class BTreeNode {
2  keys: number[];
3  children: BTreeNode[];
4  isLeaf: boolean;
5
6  constructor(isLeaf: boolean) {
7    this.keys = [];
8    this.children = [];
9    this.isLeaf = isLeaf;
10  }
11}
12
13class BTree {
14  private root: BTreeNode;
15  private order: number; // Maximum children per node
16
17  constructororder

Insertion

Insertion in B-trees is more complex because nodes can overflow. When a node becomes full, it must be split.

1insert(key: number): void {
2  const root = this.root;
3
4  // If root is full, split it
5  if (root.keys.length === this.order - 1) {
6    const newRoot = new BTreeNode(false);
7    newRoot.children.push(root);
8    this.splitChild(newRoot, 0);
9    this.root = newRoot;
10  }
11
12  this.insertNonFullroot key

B-Tree vs Binary Search Tree

Aspect	Binary Search Tree	B-Tree
Children per node	2	m (typically 100-1000)
Height	O(log n)	O(log_m n) - much shorter
Disk I/O	O(log n) reads	O(log_m n) reads - fewer
Node size	Small (fits in cache)	Large (matches disk block)
Use case	In-memory data	Disk-based data
Complexity	Simple	More complex

Why B-Trees Win for Disk

Fewer disk reads: Height is O(log_m n) vs O(log n) for BST
Block-aligned: Each node fits in one disk block (typically 4KB)
Sequential access: Reading one block gets many keys
Cache efficiency: Fewer but larger reads are better for disk

B+ Trees (Database Variant)

Most databases use B+ trees, a variant where:

Data only in leaves: Internal nodes are indexes only
Leaf nodes linked: Leaves form a linked list for range scans
Better for databases: Separates index from data, optimizes range queries

B+ Tree Structure

        [20, 40]          (Internal nodes - indexes only)
       /    |    \
   [10]  [30]  [50, 60]
   /  \   /  \   /  |  \
 [5→10→15] [25→30→35] [45→50→55→60→65]  (Leaves - contain data, linked)

Common Pitfalls

Wrong order: Choosing order that doesn't match disk block size wastes space
Incorrect split: Not handling split correctly causes tree corruption
Merge complexity: Deletion requires merging nodes, which is complex
Over-engineering: Using B-trees for in-memory data adds unnecessary complexity
Range queries: Not optimizing leaf structure for range scans

Failure Stories You'll Recognize

A database was using a hash table for its primary index. This worked fine for exact lookups, but range queries required full table scans. As the table grew to millions of rows, range queries became prohibitively slow. The fix? Replace hash index with a B+ tree index. Range queries became O(log n + k) where k is result size, making queries fast. The B+ tree's sorted structure and linked leaves enabled efficient range scans.

Another story: A file system was storing large directories (10,000+ files) in a simple array. File lookups required scanning the array (O(n)). As directories grew, file operations became slow. The fix? Use a B-tree to organize directory entries. Lookups became O(log n), and the B-tree structure matched disk block size, minimizing I/O. This is why modern file systems use B-trees.

What Interviewers Are Really Testing

System thinking: Can you adapt data structures to hardware constraints?
Database knowledge: Do you understand how databases actually work?
Trade-offs: Can you explain when B-trees are worth the complexity?
Design decisions: Can you explain why B-trees are used in real systems?
Implementation: Can you implement B-tree operations correctly?

Interview Questions

Intermediate

Explain why B-trees are used in databases instead of binary search trees
Describe the structure of a B-tree node
Explain the split operation in B-tree insertion

Senior

Implement B-tree search and insertion
Design a B+ tree and explain why it's better for databases
Compare B-trees, B+ trees, and LSM trees for database indexing
Explain how B-tree order relates to disk block size
Design a system that uses B-trees for time-series data

B-trees minimize disk I/O: Fewer but larger reads are better for disk
Order matters: Should match disk block size (typically 100-1000)
Height advantage: O(log_m n) height means fewer disk reads than BST
Database standard: Virtually all databases use B-trees or B+ trees
Not for memory: B-trees are overkill for in-memory data structures
B+ trees for databases: Separate index from data, optimize range scans
Split and merge: Complex operations required to maintain balance
File systems: Modern file systems use B-trees for directory indexing
Range queries: B-trees excel at range queries (sorted structure)
Hardware-aware: Understanding B-trees shows system-level thinking

Trees - Understanding tree structures
Binary Search Trees - Comparison with BSTs
Databases - Database indexing
File Systems - File system internals

B-Trees & Balanced Multi-way Trees