Distributed Search System Design (Sharding, Replicas & Scale)

Design a distributed search engine like Google that can index billions of web pages and provide fast, relevant search results. The system must handle web crawling, indexing, ranking, and distributed query processing.

Constraints

Functional

Crawl and index billions of pages, search indexed content, rank by relevance, autocomplete/suggestions, optional image/news search and personalization

Non-functional

< 200 ms search latency, billions of documents, hundreds of thousands of queries/s at peak, regular index updates, high precision, 99.9% uptime

Scale

100 B pages × ~50 KB/page → ~5 PB stored (compressed); 10 B queries/day, peak ~200 K/s; ~1 B pages updated/day → ~12 K pages/s crawl rate; index size ~500 TB–1 PB
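
The scale figures above follow from simple back-of-envelope arithmetic. A minimal sketch, using only the inputs stated in this section (page count, page size, daily query and update volumes):

```python
# Back-of-envelope capacity estimates derived from the Scale figures above.
# All inputs are the stated assumptions, not measured data.

PAGES = 100e9             # 100 B pages
PAGE_SIZE_BYTES = 50e3    # ~50 KB per stored page
QUERIES_PER_DAY = 10e9    # 10 B queries/day
UPDATED_PER_DAY = 1e9     # ~1 B pages refreshed/day
SECONDS_PER_DAY = 86_400

# Raw document storage in petabytes
storage_pb = PAGES * PAGE_SIZE_BYTES / 1e15

# Average query rate; peak (~200 K/s) is roughly 2x the average
avg_qps = QUERIES_PER_DAY / SECONDS_PER_DAY

# Sustained crawl rate needed to refresh ~1 B pages per day
crawl_rate = UPDATED_PER_DAY / SECONDS_PER_DAY

print(f"storage:    {storage_pb:.0f} PB")          # ~5 PB
print(f"avg QPS:    {avg_qps:,.0f}/s")             # ~116 K/s
print(f"crawl rate: {crawl_rate:,.0f} pages/s")    # ~12 K pages/s
```

Note that average QPS (~116 K/s) sits below the stated ~200 K/s peak, which is the number the serving tier must actually be provisioned for.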

Stages ahead

1. Requirement Analysis
2. API Design
3. High-Level Design
4. HLD Extensions
5. Trade-offs