Topic Overview
Hashing: Patterns, Complexity & Interview Use Cases
Master hash functions, hash tables, collision resolution, and hashing algorithms for efficient data storage and retrieval.
Hashing is a technique that maps data of arbitrary size to fixed-size values, enabling O(1) average-case lookup, insertion, and deletion operations.
Why This Matters in Interviews
Hashing is fundamental because it enables:
- Fast lookups: O(1) average time complexity for search operations
- Problem optimization: Many problems can be solved efficiently using hash tables
- System design: Understanding hashing is crucial for distributed systems, caching, and databases
- Algorithm design: Hash-based algorithms appear frequently in interviews
Interviewers test hashing knowledge to assess your understanding of trade-offs between time and space complexity, and your ability to choose appropriate data structures.
Core Concepts
- Hash function: Maps keys to array indices
- Hash table: Data structure using hashing for fast access
- Collision: When two keys hash to the same index
- Collision resolution: Chaining vs open addressing
- Load factor: Ratio of entries to buckets (affects performance)
- Perfect hashing: Hash function with no collisions
- Consistent hashing: Used in distributed systems
Detailed Explanation
Hash Functions
Properties of good hash function:
- Deterministic: Same input always produces same output
- Uniform distribution: Keys distributed evenly across buckets
- Fast computation: O(1) time complexity
- Avalanche effect: Small input changes cause large output changes
Common hash functions:
1// Division method2function hashDivision(key: number, size: number): number {3 return key % size;4}56// Multiplication method7function hashMultiplication(key: number, size: number): number {8 const A = 0.6180339887; // (√5 - 1) / 29 return Math.floor(size * ((key * A) % 1));10}1112// String hashing (polynomial rolling hash)13function s size
Hash Table Implementation
Chaining (Separate Chaining):
1class HashTableChaining<K, V> {2 private buckets: Array<Array<[K, V]>>;3 private size: number;4 private capacity: number;5 private loadFactor: number = 0.75;67 constructor(initialCapacity: number = 16) {8 this.capacity = initialCapacity;9 this.buckets = new Array(this.capacity).fill
Open Addressing (Linear Probing):
1class HashTableOpenAddressing<K, V> {2 private buckets: Array<[K, V] | null>;3 private capacity: number;4 private size: number;5 private DELETED = Symbol('DELETED');67 constructor(initialCapacity: number = 16) {8 this.capacity = initialCapacity;9 this.buckets = new Array(this.capacity)
Examples
Two Sum Problem
1function twoSum(nums: number[], target: number): number[] {2 const map = new Map<number, number>();34 for (let i = 0; i < nums.length; i++) {5 const complement = target - nums[i];67 if (map.has(complement)) {8 return [map.get(complement i
Group Anagrams
1function groupAnagrams(strs: string[]): string[][] {2 const map = new Map<string, string[]>();34 for (const str of strs) {5 const key = str.split('').sort().join('');67 if (!map.has(key)) {8 mapkey
Longest Consecutive Sequence
1function longestConsecutive(nums: number[]): number {2 const numSet = new Set(nums);3 let maxLength = 0;45 for (const num of numSet) {6 // Only start counting from the beginning of a sequence7 if (!numSet.has(num - 1)) {8 let currentNum = num;9 let currentLength = 1;1011 while (numSet.has(currentNum + 1)
Common Pitfalls
- Poor hash function: Causes clustering and collisions. Fix: Use well-tested hash functions
- Not handling collisions: Assuming perfect hashing. Fix: Always implement collision resolution
- Load factor too high: Degrades to O(n) performance. Fix: Resize when load factor > 0.75
- Mutable keys: Changing keys after insertion breaks hash table. Fix: Use immutable keys or rehash
- Hash collisions in equals: Two equal objects must have same hash. Fix: Override both hashCode and equals
- Not resizing: Table becomes inefficient. Fix: Implement dynamic resizing
Interview Questions
Beginner
Q: Explain how a hash table works. What is the time complexity of basic operations?
A:
Hash Table Overview:
- Hash function: Maps keys to array indices
- Buckets: Array of buckets to store key-value pairs
- Collision resolution: Handles when multiple keys hash to same index
Time Complexity:
- Insert: O(1) average, O(n) worst case (all collisions)
- Search: O(1) average, O(n) worst case
- Delete: O(1) average, O(n) worst case
Space Complexity: O(n)
How it works:
- Compute hash of key → get index
- Store/retrieve value at that index
- Handle collisions if multiple keys map to same index
Collision Resolution:
- Chaining: Store multiple entries in same bucket (linked list)
- Open addressing: Find next available slot (linear/quadratic probing)
Intermediate
Q: Implement a hash table with chaining. How do you handle resizing and what is the optimal load factor?
A:
class HashTable<K, V> {
private buckets: Array<Array<[K, V]>>;
private size: number;
private capacity: number;
private loadFactor: number = 0.75;
constructor(initialCapacity: number = 16) {
this.capacity = initialCapacity;
this.buckets = new Array(this.capacity).fill(null).map(() => []);
this.size = 0;
}
private hash(key: K): number {
// Use built-in or custom hash function
return Math.abs(String(key).split('').reduce((a, b) => {
a = ((a << 5) - a) + b.charCodeAt(0);
return a & a;
}, 0)) % this.capacity;
}
put(key: K, value: V): void {
const index = this.hash(key);
const bucket = this.buckets[index];
// Update if exists
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket[i][1] = value;
return;
}
}
// Add new
bucket.push([key, value]);
this.size++;
// Resize if needed
if (this.size / this.capacity > this.loadFactor) {
this.resize();
}
}
get(key: K): V | undefined {
const index = this.hash(key);
const bucket = this.buckets[index];
for (const [k, v] of bucket) {
if (k === key) return v;
}
return undefined;
}
private resize(): void {
const oldBuckets = this.buckets;
this.capacity *= 2;
this.buckets = new Array(this.capacity).fill(null).map(() => []);
this.size = 0;
// Rehash all entries
for (const bucket of oldBuckets) {
for (const [key, value] of bucket) {
this.put(key, value);
}
}
}
}
Load Factor:
- Optimal: 0.75 (balance between space and collisions)
- Too low: Wastes space
- Too high: Many collisions, degrades performance
Resizing:
- Trigger: When load factor > threshold
- New capacity: Usually double (2x)
- Rehashing: All entries must be rehashed with new capacity
Senior
Q: Design a distributed hash table system that handles node failures, data replication, and consistent hashing. How do you ensure data availability and handle rebalancing?
A:
class DistributedHashTable {
private nodes: Map<string, Node>;
private ring: ConsistentHashRing;
private replicationFactor: number = 3;
constructor() {
this.nodes = new Map();
this.ring = new ConsistentHashRing();
}
addNode(nodeId: string, node: Node): void {
this.nodes.set(nodeId, node);
this.ring.addNode(nodeId);
this.rebalance();
}
removeNode(nodeId: string): void {
this.nodes.delete(nodeId);
this.ring.removeNode(nodeId);
this.rebalance();
}
put(key: string, value: any): void {
const primaryNode = this.ring.getNode(key);
const replicaNodes = this.ring.getReplicas(key, this.replicationFactor);
// Write to primary and replicas
const nodesToWrite = [primaryNode, ...replicaNodes];
// Write in parallel
Promise.all(
nodesToWrite.map(nodeId =>
this.nodes.get(nodeId)!.put(key, value)
)
);
}
get(key: string): Promise<any> {
const primaryNode = this.ring.getNode(key);
const replicaNodes = this.ring.getReplicas(key, this.replicationFactor);
// Try primary first, then replicas
const nodesToTry = [primaryNode, ...replicaNodes];
for (const nodeId of nodesToTry) {
try {
const value = await this.nodes.get(nodeId)!.get(key);
if (value !== null) {
// Repair if read from replica
if (nodeId !== primaryNode) {
this.nodes.get(primaryNode)!.put(key, value);
}
return value;
}
} catch (error) {
// Node failed, try next
continue;
}
}
return null;
}
private rebalance(): void {
// Redistribute keys when nodes added/removed
// Only move keys that need to change ownership
const keysToMove = this.ring.getKeysToMove();
for (const { key, oldNode, newNode } of keysToMove) {
const value = await this.nodes.get(oldNode)!.get(key);
if (value) {
await this.nodes.get(newNode)!.put(key, value);
await this.nodes.get(oldNode)!.delete(key);
}
}
}
}
class ConsistentHashRing {
private ring: Map<number, string> = new Map();
private sortedKeys: number[] = [];
addNode(nodeId: string): void {
// Add multiple virtual nodes for better distribution
for (let i = 0; i < 150; i++) {
const hash = this.hash(`${nodeId}:${i}`);
this.ring.set(hash, nodeId);
this.sortedKeys.push(hash);
}
this.sortedKeys.sort((a, b) => a - b);
}
removeNode(nodeId: string): void {
for (let i = 0; i < 150; i++) {
const hash = this.hash(`${nodeId}:${i}`);
this.ring.delete(hash);
const index = this.sortedKeys.indexOf(hash);
if (index > -1) {
this.sortedKeys.splice(index, 1);
}
}
}
getNode(key: string): string {
const hash = this.hash(key);
const index = this.findFirstGreater(hash);
const ringKey = this.sortedKeys[index % this.sortedKeys.length];
return this.ring.get(ringKey)!;
}
getReplicas(key: string, count: number): string[] {
const replicas: string[] = [];
const hash = this.hash(key);
let index = this.findFirstGreater(hash);
while (replicas.length < count) {
const ringKey = this.sortedKeys[index % this.sortedKeys.length];
const nodeId = this.ring.get(ringKey)!;
if (!replicas.includes(nodeId)) {
replicas.push(nodeId);
}
index++;
}
return replicas;
}
private hash(key: string): number {
// Use MD5 or SHA-1 hash
return this.simpleHash(key);
}
private simpleHash(key: string): number {
let hash = 0;
for (let i = 0; i < key.length; i++) {
hash = ((hash << 5) - hash) + key.charCodeAt(i);
hash = hash & hash;
}
return Math.abs(hash);
}
private findFirstGreater(hash: number): number {
// Binary search for first key >= hash
let left = 0;
let right = this.sortedKeys.length;
while (left < right) {
const mid = Math.floor((left + right) / 2);
if (this.sortedKeys[mid] < hash) {
left = mid + 1;
} else {
right = mid;
}
}
return left;
}
getKeysToMove(): Array<{ key: string; oldNode: string; newNode: string }> {
// Calculate which keys need to move
// Implementation depends on tracking key ownership
return [];
}
}
Features:
- Consistent hashing: Minimal key movement on node changes
- Replication: Multiple copies for availability
- Rebalancing: Efficient key redistribution
- Failure handling: Read from replicas if primary fails
-
Hash tables: O(1) average time for insert, search, delete
-
Hash functions: Should distribute keys uniformly
-
Collision resolution: Chaining (simple) vs open addressing (space efficient)
-
Load factor: Keep below 0.75 for good performance
-
Resizing: Double capacity and rehash when load factor exceeded
-
Applications: Fast lookups, counting, caching, distributed systems
-
Trade-offs: Space vs time, collision handling complexity
-
Arrays - Hash tables are often implemented using arrays
-
Time & Space Complexity - Analyzing hash table performance
-
Hash Tables - Deep dive into hash table data structure
-
Searching Algorithms - Hash tables enable O(1) search
-
Distributed Systems - Consistent hashing for distributed systems
Key Takeaways
Hash tables: O(1) average time for insert, search, delete
Hash functions: Should distribute keys uniformly
Collision resolution: Chaining (simple) vs open addressing (space efficient)
Load factor: Keep below 0.75 for good performance
Resizing: Double capacity and rehash when load factor exceeded
Applications: Fast lookups, counting, caching, distributed systems
Trade-offs: Space vs time, collision handling complexity
What's next?