Topic Overview
Hash Tables: Fundamentals, Operations & Complexity
Master hash tables: hashing, collision resolution, load factor, and hash table operations. Understand when to use hash tables for O(1) average operations.
Hash Tables
Why This Matters
Hash tables provide O(1) average time for insert, delete, and search operations. They work by mapping keys to array indices using a hash function, enabling direct access to values.
This matters because hash tables are one of the most widely used data structures. They're used in databases (indexing), caches (Redis, Memcached), programming languages (dictionaries, maps), compilers (symbol tables), and distributed systems (consistent hashing). Many problems that seem O(n²) become O(n) when you use a hash table.
In interviews, hash table problems test your understanding of hashing, collision resolution, and your ability to recognize when a hash table can optimize an algorithm. Many "two sum" or "find duplicates" problems are hash table problems in disguise.
What Engineers Usually Get Wrong
Most engineers think "hash tables are always O(1)." But that's only average case. Worst case is O(n) if all keys hash to the same bucket (collision). Also, hash functions must be good—bad hash functions cause many collisions, degrading performance.
Engineers also don't understand load factor. When a hash table gets too full (high load factor), performance degrades. Resizing (rehashing) is expensive but necessary. Understanding when to resize is crucial for performance.
Another common mistake is not handling collisions correctly. Two keys can hash to the same index. How you handle this (chaining vs open addressing) affects performance and memory usage.
How This Breaks Systems in the Real World
A service was using a hash table to cache user sessions. The hash function was poorly designed—it only used the first character of the user ID. All user IDs starting with 'A' hashed to the same bucket. This created a linked list in that bucket. Lookups became O(n) instead of O(1). During peak traffic, the service became slow. The fix? Use a better hash function that distributes keys evenly.
Another story: A database was using a hash table for indexing. The load factor reached 0.9 (90% full). Collisions increased dramatically. Lookups became slow. The database didn't automatically resize. The fix? Monitor load factor and resize when it exceeds 0.75. This keeps performance good.
Hash Table Fundamentals
How Hash Tables Work
- Hash Function: Maps key to array index
- Array (Buckets): Stores key-value pairs
- Collision Resolution: Handles when multiple keys hash to same index
Basic Structure
Key → Hash Function → Index → Bucket → Value
Example:
"apple" → hash("apple") → 3 → buckets[3] → "red fruit"
"banana" → hash("banana") → 7 → buckets[7] → "yellow fruit"
Hash Functions
A good hash function:
- Deterministic: Same key always produces same hash
- Uniform distribution: Keys distributed evenly across buckets
- Fast computation: O(1) time
- Minimal collisions: Few keys hash to same index
Simple Hash Function (String)
1function simpleHash(key: string, size: number): number {2 let hash = 0;3 for (let i = 0; i < key.length; i++) {4 hash = (hash + key.charCodeAt(i)) % size;5 }6 return hash;7}89// Problem: Poor distribution, many collisions10// Time: O(k) where k is key length11// Better: Use polynomial rolling hash
Polynomial Rolling Hash
1function polynomialHash(key: string, size: number): number {2 const base = 31;3 const mod = 1e9 + 7;4 let hash = 0;5 let power = 1;67 for (let i = 0; i < key.length; i++) {8 hash = (hash + (key.charCodeAt(i) * power) % mod) % mod;9 power = (power * base mod
Integer Hash Function
1function integerHash(key: number, size: number): number {2 // Multiplication method3 const A = 0.6180339887; // (√5 - 1) / 2 (golden ratio)4 return Math.floor(size * ((key * A) % 1));5}67// Or simple modulo (if keys are well-distributed)8function simpleIntegerHash(key: number, size: number): number {9 return key % size;10}1112// Time: O(1)13// Edge cases: negative numbers, zero
Collision Resolution
1. Chaining (Separate Chaining)
Each bucket is a linked list (or array) of key-value pairs.
1class HashTableChaining {2 private buckets: Array<Array<[string, any]>>;3 private size: number;4 private loadFactor: number = 0.75;5 private count: number = 0;67 constructor(initialSize: number = 16) {8 this.size = initialSize;9 this.buckets = new Array(this.size).fill(null)
2. Open Addressing
Store key-value pairs directly in array. When collision occurs, find next available slot.
Linear Probing
1class HashTableLinearProbing {2 private buckets: Array<[string, any] | null>;3 private size: number;4 private count: number = 0;5 private loadFactor: number = 0.75;67 constructor(initialSize: number = 16) {8 this.size = initialSize;9 this.buckets = new Array(this.size).fill(null)
Quadratic Probing
1// Instead of (index + 1), use (index + i²)2index = (hash(key) + i * i) % size;34// Reduces clustering compared to linear probing5// But can fail to find slot even if table isn't full
Double Hashing
1// Use second hash function for step size2const step = hash2(key);3index = (hash1(key) + i * step) % size;45// Best open addressing method (fewer clusters)
Load Factor and Resizing
Load Factor = Number of elements / Number of buckets
- Low load factor (< 0.5): Wasted space, but fast operations
- High load factor (> 0.75): More collisions, slower operations
- Optimal: 0.5 - 0.75 (balance between space and performance)
Resizing Strategy
1// When load factor > threshold, resize2if (count / size > loadFactor) {3 // Double size and rehash all elements4 resize();5}67// Resizing is O(n) but amortized O(1) per operation
Examples
Example 1: Two Sum
1function twoSum(nums: number[], target: number): number[] {2 const map = new Map<number, number>(); // value -> index34 for (let i = 0; i < nums.length; i++) {5 const complement = target - nums[i];6 if (map.has(complement)) {7 return [map.get(complement i
Example 2: Group Anagrams
1function groupAnagrams(strs: string[]): string[][] {2 const map = new Map<string, string[]>();34 for (const str of strs) {5 // Sort string to get canonical form6 const sorted = str.split('').sort().join('');78 if (!map.has(sorted)
Example 3: Longest Consecutive Sequence
1function longestConsecutive(nums: number[]): number {2 const numSet = new Set(nums);3 let longest = 0;45 for (const num of numSet) {6 // Only start from beginning of sequence7 if (!numSet.has(num - 1)) {8 let current = num;9 let length = 1;1011 // Count consecutive numbers12 while (numSet.has(current + 1
// Time: O(n) - each number visited at most twice // Space: O(n) // Edge cases: empty array, no consecutive, all consecutive
---
## Common Pitfalls
- **Poor hash function**: Causes many collisions, degrading to O(n)
- **Not handling collisions**: Assuming unique hashes causes bugs
- **High load factor**: Not resizing causes performance degradation
- **Resizing too frequently**: Resizing on every insert is expensive
- **Not rehashing after resize**: Old hash values become invalid
- **Memory leaks**: In chaining, not properly deleting nodes
- **Clustering in open addressing**: Linear probing causes clustering, use double hashing
- **Not handling deleted slots**: In open addressing, need tombstones or shifting
---
## Failure Stories You'll Recognize
**The Bad Hash Function:** A service used a hash function that only considered the first character of keys. All keys starting with 'A' hashed to the same bucket, creating a linked list. Lookups became O(n). The fix? Use a better hash function (polynomial rolling hash) that considers all characters.
**The High Load Factor:** A database hash table reached 0.9 load factor. Collisions increased dramatically. Lookups became slow. The fix? Monitor load factor and resize when it exceeds 0.75. This keeps performance good.
**The Resize Bug:** A service resized its hash table but didn't rehash elements. Old hash values pointed to wrong buckets. Lookups failed. The fix? After resize, rehash all elements with new table size.
### What Interviewers Are Really Testing
They want to see you understand hashing and collision resolution. Junior engineers assume O(1) always. Senior engineers understand average vs worst case and choose appropriate collision resolution.
When they ask "implement a hash table," they're testing:
- Do you understand hash functions?
- Can you handle collisions (chaining vs open addressing)?
- Do you understand load factor and resizing?
---
## Interview Questions
### Beginner
**Q: What is a hash table and what are its time complexities?**
**A:**
**Hash Table**: Data structure that maps keys to values using a hash function.
**Operations:**
- **Insert**: O(1) average, O(n) worst case
- **Search**: O(1) average, O(n) worst case
- **Delete**: O(1) average, O(n) worst case
**Average Case**: O(1) - hash function distributes keys evenly, few collisions
**Worst Case**: O(n) - all keys hash to same bucket (degenerate to linked list)
**Key Components:**
- **Hash function**: Maps key to array index
- **Buckets**: Array storing key-value pairs
- **Collision resolution**: Handles when multiple keys hash to same index
**Use Cases**: Fast lookups, caching, indexing, symbol tables, counting frequencies
---
### Intermediate
**Q: Explain different collision resolution strategies. What are their trade-offs?**
**A:**
**1. Chaining (Separate Chaining)**
- Each bucket is a linked list/array
- Colliding keys stored in same bucket
- **Pros**: Simple, handles any number of collisions, easy deletion
- **Cons**: Extra memory for pointers, cache misses (non-contiguous)
- **Time**: O(1 + α) average where α is load factor
**2. Open Addressing**
- Store directly in array, find next available slot on collision
- **Linear Probing**: `(index + 1) % size`
- **Pros**: Better cache locality, no extra memory
- **Cons**: Clustering (consecutive filled slots), performance degrades
- **Quadratic Probing**: `(index + i²) % size`
- **Pros**: Reduces clustering
- **Cons**: Can fail to find slot even if table not full
- **Double Hashing**: `(hash1(key) + i * hash2(key)) % size`
- **Pros**: Best distribution, fewest clusters
- **Cons**: More complex, need second hash function
**Trade-offs:**
- **Chaining**: Better for high load factors, easier to implement
- **Open Addressing**: Better cache locality, less memory overhead, but clustering issues
---
### Senior
**Q: Design a distributed hash table that handles node failures and data replication. How would you implement consistent hashing?**
**A:**
**Challenge**: Traditional hashing breaks when nodes are added/removed (all keys need rehashing).
**Solution: Consistent Hashing**
```typescript
class ConsistentHash {
private ring: Map<number, string> = new Map(); // hash → node
private nodes: Set<string> = new Set();
private replicas: number = 3; // Virtual nodes per physical node
addNode(node: string): void {
this.nodes.add(node);
// Create virtual nodes
for (let i = 0; i < this.replicas; i++) {
const hash = this.hash(`${node}:${i}`);
this.ring.set(hash, node);
}
}
removeNode(node: string): void {
this.nodes.delete(node);
// Remove virtual nodes
for (let i = 0; i < this.replicas; i++) {
const hash = this.hash(`${node}:${i}`);
this.ring.delete(hash);
}
}
getNode(key: string): string {
if (this.ring.size === 0) throw new Error("No nodes");
const keyHash = this.hash(key);
const sortedHashes = Array.from(this.ring.keys()).sort((a, b) => a - b);
// Find first node with hash >= keyHash
for (const hash of sortedHashes) {
if (hash >= keyHash) {
return this.ring.get(hash)!;
}
}
// Wrap around (keyHash > all node hashes)
return this.ring.get(sortedHashes[0])!;
}
private hash(key: string): number {
// Use SHA-256 or MD5, then convert to number
// Simplified for example
let hash = 0;
for (let i = 0; i < key.length; i++) {
hash = ((hash << 5) - hash + key.charCodeAt(i)) & 0xffffffff;
}
return Math.abs(hash);
}
}
// Time: getNode O(log n) with sorted array, O(1) with balanced BST
// Space: O(n * replicas)
// Benefits: Only 1/n keys need rehashing when node added/removed
How it works:
- Hash nodes and keys to same ring (0 to 2³²-1)
- Key maps to first node with hash ≥ key hash (clockwise)
- Virtual nodes improve distribution
- When node added/removed, only nearby keys affected
Benefits:
- Minimal rehashing (only 1/n keys affected)
- Load balancing (virtual nodes)
- Fault tolerance (replication)
- Hash table: Maps keys to values using hash function, provides O(1) average operations
- Hash function: Must be deterministic, uniform, fast. Poor hash functions cause collisions
- Collision resolution: Chaining (linked list per bucket) vs Open addressing (find next slot)
- Load factor: Ratio of elements to buckets. Optimal 0.5-0.75. Resize when > 0.75
- Time complexity: O(1) average, O(n) worst case (all collisions)
- Space complexity: O(n) for n key-value pairs
- Resizing: Double size and rehash when load factor exceeded. Amortized O(1)
- Use cases: Fast lookups, caching, indexing, frequency counting, symbol tables
- Consistent hashing: For distributed systems, minimizes rehashing when nodes change
- Edge cases: Empty table, key not found, all collisions, resize timing
How InterviewCrafted Will Teach This
We'll teach hash tables through lookup optimization problems, not abstract definitions. Instead of memorizing "a hash table uses a hash function," you'll learn through scenarios like "how do you find two numbers that sum to target?" or "how do you group anagrams?"
You'll see how hash tables are used in real systems (caches, databases, distributed systems) and understand when to use them vs other structures. When an interviewer asks "find duplicates," you'll think "hash set"—not "nested loops."
- Arrays - Hash tables are typically implemented using arrays (buckets). Understanding arrays helps understand hash table internals.
- Linked Lists - Used in chaining collision resolution. Each bucket can be a linked list of colliding keys.
- Time & Space Complexity - Understanding Big-O notation is essential for analyzing hash table performance (O(1) average vs O(n) worst case).
- Graphs - Hash tables are used to implement adjacency lists and track visited nodes in graph algorithms.
- Bloom Filters - Probabilistic data structure that uses multiple hash functions, building on hash table concepts.
Key Takeaways
Hash table: Maps keys to values using hash function, provides O(1) average operations
Hash function: Must be deterministic, uniform, fast. Poor hash functions cause collisions
Collision resolution: Chaining (linked list per bucket) vs Open addressing (find next slot)
Load factor: Ratio of elements to buckets. Optimal 0.5-0.75. Resize when > 0.75
Time complexity: O(1) average, O(n) worst case (all collisions)
Space complexity: O(n) for n key-value pairs
Resizing: Double size and rehash when load factor exceeded. Amortized O(1)
Use cases: Fast lookups, caching, indexing, frequency counting, symbol tables
Consistent hashing: For distributed systems, minimizes rehashing when nodes change
Edge cases: Empty table, key not found, all collisions, resize timing
Related Topics
Arrays
Hash tables are typically implemented using arrays (buckets). Understanding arrays helps understand hash table internals.
Linked Lists
Used in chaining collision resolution. Each bucket can be a linked list of colliding keys.
Time & Space Complexity
Understanding Big-O notation is essential for analyzing hash table performance (O(1) average vs O(n) worst case).
Graphs
Hash tables are used to implement adjacency lists and track visited nodes in graph algorithms.
Bloom Filters
Probabilistic data structure that uses multiple hash functions, building on hash table concepts.
What's next?