Distributed Search System Design (Sharding, Replicas & Scale)
Design a distributed search engine like Google that can index billions of web pages and provide fast, relevant search results. The system must handle web crawling, indexing, ranking, and distributed query processing.
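The heart of such a system is an inverted index mapping each term to the documents that contain it. A toy in-memory sketch (the `InvertedIndex` class and its methods are illustrative, not a stated API) shows the indexing and query path in miniature:

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add(self, doc_id, text):
        # Tokenize naively by whitespace; real systems normalize, stem, etc.
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """AND semantics: intersect postings lists, rank by summed term frequency."""
        terms = query.lower().split()
        if not terms or any(t not in self.postings for t in terms):
            return []
        docs = set(self.postings[terms[0]])
        for t in terms[1:]:
            docs &= set(self.postings[t])
        scored = [(sum(self.postings[t][d] for t in terms), d) for d in docs]
        return [d for _, d in sorted(scored, reverse=True)]

idx = InvertedIndex()
idx.add(1, "distributed search engine design")
idx.add(2, "search engine ranking and indexing")
idx.add(3, "web crawling at scale")
print(idx.search("search engine"))  # docs 1 and 2 match; doc 3 does not
```

Real engines replace the frequency sum with a relevance model (e.g. BM25 plus link-based signals) and keep postings compressed on disk, but the intersect-then-rank shape is the same.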
Constraints
Functional
Crawl and index billions of pages, search indexed content, rank by relevance, autocomplete/suggestions, optional image/news search and personalization
Non-functional
Search latency < 200 ms; billions of documents; hundreds of thousands of queries/s at peak; regular index updates; high precision; 99.9% uptime
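Meeting the latency target at this document count usually means document-partitioned shards queried in parallel, each backed by replicas for throughput and availability. A minimal scatter-gather sketch under those assumptions (shard count, scoring, and helper names are all hypothetical):

```python
import heapq

NUM_SHARDS = 4  # illustrative; real deployments use thousands

def shard_of(doc_id: int) -> int:
    # Document partitioning: each document lives on exactly one shard.
    return doc_id % NUM_SHARDS

def build_shards(scored_docs):
    """scored_docs: iterable of (doc_id, relevance score) pairs."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for doc_id, score in scored_docs:
        shards[shard_of(doc_id)][doc_id] = score
    return shards

def search_shard(shard, k):
    # Each shard (served by any of its replicas) returns its local top-k.
    return heapq.nlargest(k, shard.items(), key=lambda kv: kv[1])

def scatter_gather(shards, k):
    # Fan out to all shards (in parallel in practice), then merge the
    # partial top-k lists into a global top-k.
    partials = [search_shard(s, k) for s in shards]
    merged = [item for part in partials for item in part]
    return heapq.nlargest(k, merged, key=lambda kv: kv[1])

docs = [(1, 0.9), (2, 0.5), (3, 0.8), (4, 0.7), (5, 0.95)]
shards = build_shards(docs)
print(scatter_gather(shards, 3))  # [(5, 0.95), (1, 0.9), (3, 0.8)]
```

Because every shard only needs to return its local top-k, tail latency is bounded by the slowest shard, which is why replicas and hedged requests matter for the 200 ms goal.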
Scale
100B pages, ~50 KB/page → ~5 PB raw (less after compression); 10B queries/day (~116K/s average, peak ~200K/s); ~1B pages updated/day → ~12K pages/s crawl; ~500 TB–1 PB index
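The scale figures above are simple back-of-envelope products; a quick sanity check, using only the numbers stated in the problem:

```python
SECONDS_PER_DAY = 86_400

pages = 100e9
page_size_bytes = 50e3
raw_corpus_bytes = pages * page_size_bytes      # 5e15 bytes = 5 PB raw

queries_per_day = 10e9
avg_qps = queries_per_day / SECONDS_PER_DAY     # ~116K queries/s average
peak_qps = 2 * avg_qps                          # ~2x average -> ~230K/s, i.e. ~200K/s order

updates_per_day = 1e9
crawl_rate = updates_per_day / SECONDS_PER_DAY  # ~11.6K pages/s

print(f"corpus: {raw_corpus_bytes/1e15:.1f} PB, "
      f"avg {avg_qps/1e3:.0f}K qps, crawl {crawl_rate/1e3:.1f}K pages/s")
```

The peak-to-average multiplier of 2 is an assumption chosen to land near the stated ~200K/s peak; real traffic curves vary by geography and time of day.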
Stages ahead
1. Requirement Analysis
2. API Design
3. High-Level Design
4. HLD Extensions
5. Trade-offs