NoSQL Basics
Learn the fundamentals of NoSQL databases and when to choose them over relational databases, a common topic in system design interviews.
NoSQL (Not Only SQL) databases provide alternatives to traditional relational databases, offering different data models and trade-offs.
Why NoSQL?
Limitations of relational databases:
- Schema rigidity
- Scaling challenges (especially writes)
- Complex JOINs across distributed systems
- Overhead for simple use cases
NoSQL advantages:
- Flexible schemas
- Horizontal scaling
- High performance for specific workloads
- Simpler data models
Types of NoSQL Databases
Document Stores
Store data as documents (typically JSON/BSON).
Examples: MongoDB, CouchDB, DynamoDB
{
"user_id": "123",
"name": "John Doe",
"orders": [
{"order_id": "o1", "total": 100},
{"order_id": "o2", "total": 200}
]
}
Use when: Content management, user profiles, catalogs
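The document above can be sketched in Python, using a plain dict as a stand-in for a stored document. This is only an illustration of the data model, not a real driver: the `users` collection and the `find_user` and `order_total` helpers are hypothetical names. The point is that the orders travel with the user, so a single lookup returns everything.

```python
from typing import Optional

# In-memory stand-in for a document collection, keyed by user_id (hypothetical).
users = {
    "123": {
        "user_id": "123",
        "name": "John Doe",
        "orders": [
            {"order_id": "o1", "total": 100},
            {"order_id": "o2", "total": 200},
        ],
    }
}


def find_user(user_id: str) -> Optional[dict]:
    """Return the whole document for user_id, or None if it does not exist."""
    return users.get(user_id)


def order_total(doc: dict) -> int:
    """Sum the totals of the orders embedded in a user document."""
    return sum(order["total"] for order in doc.get("orders", []))
```

No JOIN is needed to total a user's orders, because the orders are embedded in the document that `find_user` returns.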
Key-Value Stores
Simple key-value pairs, extremely fast lookups.
Examples: Redis, Memcached, DynamoDB
key: "user:123"
value: {"name": "John", "email": "john@example.com"}
Use when: Caching, session storage, real-time data
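For the caching and session use cases above, a rough sketch of a key-value store with per-key expiry might look like this. `TTLStore` is a hypothetical toy class, not the Redis or Memcached API, and the `now` parameter exists only so the expiry logic can be exercised without waiting.

```python
import time
from typing import Any, Optional


class TTLStore:
    """Toy key-value store with optional per-key expiry (not a real client)."""

    def __init__(self) -> None:
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key: str, value: Any,
            ttl_seconds: Optional[float] = None,
            now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        expires_at = None if ttl_seconds is None else now + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key: str, now: Optional[float] = None) -> Any:
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value
```

Every operation is a single lookup by key, which is why this model is fast and also why it cannot answer questions such as "find all users named John" without extra indexes.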
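Wide-Column Stores (Columnar)
Store each row under a row key, with columns grouped into column families; rows in the same table can have different columns. These systems are often called columnar, but they differ from column-oriented analytics engines that store each column contiguously on disk.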
Examples: Cassandra, HBase, Bigtable
Use when: Time-series data, analytics, write-heavy workloads
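As a rough illustration of this layout for time-series data, the sketch below groups rows by a partition key and orders them by a clustering key within each partition. `WideColumnTable` and its method names are hypothetical, and the sort-on-read is a simplification: real stores keep rows sorted on disk.

```python
class WideColumnTable:
    """Toy layout: partition key -> clustering key -> columns."""

    def __init__(self):
        self._partitions = {}  # partition_key -> {clustering_key: columns}

    def insert(self, partition_key, clustering_key, **columns):
        # Rows in the same partition may carry different columns.
        rows = self._partitions.setdefault(partition_key, {})
        rows.setdefault(clustering_key, {}).update(columns)

    def range_query(self, partition_key, start, end):
        """Return rows of one partition with start <= clustering key <= end."""
        rows = self._partitions.get(partition_key, {})
        return [(ck, rows[ck]) for ck in sorted(rows) if start <= ck <= end]


readings = WideColumnTable()
readings.insert("sensor-1", 10, temp=20.5)
readings.insert("sensor-1", 30, temp=21.0, humidity=40)
readings.insert("sensor-1", 20, temp=20.8)
readings.insert("sensor-2", 15, temp=5.0)
```

A range query touches only one partition, which is what makes "latest readings for one sensor" cheap and queries that span all partitions expensive.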
Graph Databases
Store entities and relationships as a graph.
Examples: Neo4j, Amazon Neptune
Use when: Social networks, recommendation engines, fraud detection
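To make the graph model concrete, here is a small adjacency-set sketch of a follow graph with a two-hop "friends of friends" query, the kind of traversal behind recommendations. `FollowGraph` is a hypothetical in-memory class, not a graph database client, and excluding direct follows from the result is a design choice made for this example.

```python
from collections import defaultdict


class FollowGraph:
    """Toy directed graph: each user maps to the set of users they follow."""

    def __init__(self):
        self._follows = defaultdict(set)

    def follow(self, follower, followee):
        self._follows[follower].add(followee)

    def friends_of_friends(self, user):
        """Users two hops away, excluding the user and their direct follows."""
        direct = self._follows.get(user, set())
        two_hops = set()
        for friend in direct:
            two_hops |= self._follows.get(friend, set())
        return two_hops - direct - {user}
```

Each hop is a direct set lookup rather than a JOIN over a relationship table, which is the property graph databases optimize for.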
CAP Theorem
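In distributed systems, you can guarantee at most two of three properties. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts:
- Consistency: Every read returns the most recent write, so all nodes appear to hold the same data
- Availability: Every request to a working node receives a response
- Partition tolerance: The system keeps operating when messages between nodes are lost or delayed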
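NoSQL databases make different CAP choices. These labels reflect typical default configurations; several of these systems let you tune consistency per operation:
- CP: MongoDB, HBase (consistency + partition tolerance)
- AP: Cassandra, DynamoDB (availability + partition tolerance)
- CA: Traditional single-node RDBMS (not distributed, so partitions do not arise)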
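The difference between a CP and an AP choice shows up only while a partition is in progress. The sketch below is a deliberately simplified model, with a hypothetical `Replica` class and `replicate` helper: a partitioned CP replica refuses to answer, while a partitioned AP replica answers from local state that may be stale.

```python
class Replica:
    """Toy replica that behaves as CP or AP when cut off from its peers."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.partitioned = False  # True when this replica cannot reach peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Cannot confirm the value is current, so give up availability.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP, or a healthy network: answer from local state (may be stale).
        return self.value


def replicate(value, replicas):
    """Apply a write to every replica that is currently reachable."""
    for replica in replicas:
        if not replica.partitioned:
            replica.value = value
```

With both replicas holding "v1", partitioning them and then writing "v2" elsewhere leaves the AP replica returning "v1" and the CP replica raising an error: stale data in one case, no data in the other.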
When to Use NoSQL
Good Fit
- High write throughput: Logging, metrics, IoT data
- Flexible schema: Rapidly evolving data structures
- Horizontal scaling: Need to scale across many servers
- Simple queries: Key lookups, document retrieval
- Large datasets: Big data, analytics
Not a Good Fit
- Complex relationships: Many JOINs, foreign keys
- ACID transactions: Financial systems, critical data
- Complex queries: Ad-hoc reporting and analytics that rely on JOINs
- Established schema: Well-defined, stable data model
Common Patterns
Eventual Consistency
Replicas may temporarily return different values for the same key, but if no new updates arrive, they all converge to the same value.
User A updates profile → Replicates to Node 1, 2, 3
User B reads from Node 2 (not yet updated) → Sees old data
Eventually, all nodes converge to same state
Acceptable for: Social media feeds, comments, non-critical updates
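The scenario above can be simulated in a few lines. This sketch assumes a single writer and in-order delivery, so it avoids conflict resolution entirely, which real systems must handle; `Cluster` and its methods are hypothetical names.

```python
from collections import deque


class Cluster:
    """Toy cluster: a write lands on one node and reaches the others later."""

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]
        self._pending = deque()  # (target_node, key, value) not yet applied

    def write(self, node, key, value):
        self.nodes[node][key] = value
        for other in range(len(self.nodes)):
            if other != node:
                self._pending.append((other, key, value))

    def read(self, node, key):
        return self.nodes[node].get(key)

    def replicate_one(self):
        """Deliver one pending update; return False when none are left."""
        if not self._pending:
            return False
        target, key, value = self._pending.popleft()
        self.nodes[target][key] = value
        return True

    def settle(self):
        """Deliver every pending update, after which all nodes agree."""
        while self.replicate_one():
            pass
```

A read from another node between `write` and `settle` returns the old value, which is exactly the window User B hits in the example.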
Denormalization
Store redundant data to avoid JOINs.
// Instead of JOINing orders and users
{
"order_id": "o123",
"user_name": "John Doe", // Denormalized
"user_email": "john@example.com", // Denormalized
"total": 100
}
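Denormalization trades cheaper reads for more expensive writes. The sketch below copies user fields into each order at creation time and then shows the fan-out needed when the source data changes. `OrderStore` is a hypothetical class, and the extra `user_id` field on the order is an assumption added so the copies can be found later; whether historical orders should be rewritten at all is a business decision.

```python
class OrderStore:
    """Toy store that embeds user fields in every order document."""

    def __init__(self, users):
        self.users = users  # user_id -> {"name": ..., "email": ...}
        self.orders = {}    # order_id -> denormalized order document

    def create_order(self, order_id, user_id, total):
        user = self.users[user_id]
        self.orders[order_id] = {
            "order_id": order_id,
            "user_id": user_id,            # assumption: kept to locate copies
            "user_name": user["name"],     # denormalized
            "user_email": user["email"],   # denormalized
            "total": total,
        }
        return self.orders[order_id]

    def update_user_email(self, user_id, new_email):
        """Update the user and every order copy; return orders rewritten."""
        self.users[user_id]["email"] = new_email
        rewritten = 0
        for order in self.orders.values():
            if order["user_id"] == user_id:
                order["user_email"] = new_email
                rewritten += 1
        return rewritten
```

Reading an order needs no second lookup, but one email change touches as many documents as the user has orders.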
Sharding
Distribute data across multiple servers.
Shard 1: user_id 1-1000
Shard 2: user_id 1001-2000
Shard 3: user_id 2001-3000
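The range scheme above, and a hash-based alternative, can be sketched as two routing functions. The shard names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

# Inclusive user_id ranges, mirroring the three shards listed above.
RANGES = [
    (1, 1000, "shard-1"),
    (1001, 2000, "shard-2"),
    (2001, 3000, "shard-3"),
]


def range_shard(user_id: int) -> str:
    """Route by range: simple, but new (high) ids all hit the last shard."""
    for low, high, shard in RANGES:
        if low <= user_id <= high:
            return shard
    raise KeyError(f"no shard covers user_id {user_id}")


def hash_shard(user_id: int, shard_count: int = 3) -> str:
    """Route by hash: spreads keys evenly, but loses range locality."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % shard_count
    return f"shard-{index + 1}"
```

Range sharding keeps neighboring ids together, which helps range scans but can create hot shards. Hashing with a plain modulo balances load, but changing `shard_count` remaps most keys, which is the problem consistent hashing is meant to reduce.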
Migration Considerations
From SQL to NoSQL
- Identify access patterns: How is data queried?
- Denormalize: Flatten relational structures
- Choose appropriate model: Document vs. key-value vs. graph
- Plan for consistency: Eventual vs. strong consistency
Hybrid Approach
Many systems use both:
- SQL: User accounts, transactions, critical data
- NoSQL: Logs, sessions, analytics, caching
Best Practices
- Start with SQL: Use NoSQL when you have a specific need
- Understand trade-offs: NoSQL isn't always faster or simpler
- Plan for scale: Design for horizontal scaling from the start
- Monitor consistency: Track replication lag, consistency metrics
- Backup strategy: NoSQL backups can be more complex
Common Pitfalls
- Over-normalization: Applying SQL patterns (many small linked records) to NoSQL, which forces JOIN-like lookups in application code
- Ignoring consistency: Not understanding eventual consistency implications
- Schema-less doesn't mean no schema: Still need data validation
- Tool selection: Choosing NoSQL because it's "modern", not because it fits the workload
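On the point that schema-less does not mean no schema: when the database does not enforce structure, the application has to. A minimal sketch of such a check for the user document shown earlier follows; the `validate_user` function and its rules are hypothetical, and a real project might use a schema library or the database's own validation features instead.

```python
def validate_user(doc):
    """Return a list of problems; an empty list means the document passes."""
    errors = []
    required = {"user_id": str, "name": str}
    for field, expected in required.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field} must be {expected.__name__}")

    orders = doc.get("orders", [])  # orders are optional
    if not isinstance(orders, list):
        errors.append("orders must be a list")
    else:
        for i, order in enumerate(orders):
            if not isinstance(order, dict) or "order_id" not in order:
                errors.append(f"orders[{i}] needs an order_id")
    return errors
```

Running this before every write keeps malformed documents out of the store, since nothing downstream will reject them.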
Interview Questions
1. Beginner Question
Q: What is NoSQL, and when should you use it instead of a relational database?
A: NoSQL (Not Only SQL) refers to non-relational databases that use different data models than traditional SQL databases.
When to use NoSQL:
- Flexible schema: Rapidly evolving data structures
- High write throughput: Logging, metrics, IoT data
- Horizontal scaling: Need to scale across many servers
- Simple queries: Key lookups, document retrieval
- Large datasets: Big data, analytics
When to use SQL:
- Complex relationships: Many JOINs, foreign keys
- ACID transactions: Financial systems, critical data
- Complex queries: Ad-hoc reporting and analytics that rely on JOINs
- Established schema: Well-defined, stable data model
Example: Use MongoDB (NoSQL) for user profiles with varying attributes, but PostgreSQL (SQL) for financial transactions requiring ACID guarantees.
2. Intermediate Question
Q: Explain the CAP theorem and how it applies to NoSQL databases.
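A: The CAP theorem states that in a distributed system, you can guarantee at most two of three properties. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts:
- Consistency: Every read returns the most recent write, so all nodes appear to hold the same data
- Availability: Every request to a working node receives a response
- Partition tolerance: The system keeps operating when messages between nodes are lost or delayed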
NoSQL database choices:
- CP (Consistency + Partition tolerance):
  - MongoDB, HBase
  - Sacrifices availability during partitions
  - Good for: Financial data, critical systems
- AP (Availability + Partition tolerance):
  - Cassandra, DynamoDB
  - Sacrifices consistency (eventual consistency)
  - Good for: Social feeds, analytics, high availability needs
- CA (Consistency + Availability):
  - Traditional RDBMS (single node)
  - Not distributed, so partition tolerance doesn't apply
Example: Amazon DynamoDB leans AP. It remains available during network partitions, and its default reads are eventually consistent, so they may return slightly stale data; strongly consistent reads are available as an option.
Follow-up: What does "eventual consistency" mean?
- Data may be temporarily inconsistent across nodes
- All nodes will eventually converge to the same state
- Acceptable for many use cases (social feeds, comments)
3. Senior-Level System Question
Q: Design a social media platform supporting 1B users, 10B posts, and 100B likes. How would you use NoSQL databases in the architecture?
A:
Hybrid architecture (SQL + NoSQL):
- User profiles (MongoDB - Document Store):
  // Flexible schema for varying user attributes
  {
    user_id: "123",
    name: "John",
    bio: "...",
    preferences: { theme: "dark", notifications: true },
    social_links: { twitter: "@john", github: "john" }
  }
  Why: Flexible schema, easy to add new fields, good for user-generated content
- Posts and feeds (Cassandra - Wide-Column):
  -- Partition by user_id for feed queries
  CREATE TABLE posts (
    user_id UUID,
    post_id UUID,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, created_at, post_id)
  );
  Why: High write throughput, horizontal scaling, time-series data
- Likes and counters (Redis - Key-Value):
  SET post:123:likes 5000
  SADD post:123:liked_by user:456 user:789
  Why: Extremely fast reads/writes, real-time counters
- Social graph (Neo4j - Graph Database):
  // Find friends of friends
  MATCH (user:User {id: 123})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
  RETURN fof
  Why: Efficient for relationship queries (who follows whom, recommendations)
- Search (Elasticsearch - Document Store):
  - Full-text search on posts
  - Denormalized data for fast search
  - Why: Optimized for search, not transactional
- Critical data (PostgreSQL - SQL):
  - User authentication
  - Payment transactions
  - Why: ACID guarantees, complex relationships
Data flow:
User creates post → Write to Cassandra (fast)
→ Update search index (async)
→ Update user's MongoDB profile (async)
User likes post → Increment counter in Redis (fast)
→ Write to Cassandra for persistence (async)
→ Update feed rankings (async)
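The "like" path in this data flow can be sketched as a fast synchronous step followed by deferred work. Everything here is an in-memory stand-in: `LikePipeline` is a hypothetical class, the dict plays the role of the Redis counters, the list plays the role of the Cassandra table, and the deque plays the role of a message queue.

```python
from collections import deque


class LikePipeline:
    """Toy write path: bump a counter now, defer slower work to a queue."""

    def __init__(self):
        self.like_counts = {}      # stand-in for the Redis counters
        self.durable_likes = []    # stand-in for the Cassandra table
        self.ranking_updates = []  # stand-in for the feed-ranking step
        self._jobs = deque()       # stand-in for a message queue

    def like_post(self, post_id, user_id):
        # Fast path: the only work done before replying to the user.
        self.like_counts[post_id] = self.like_counts.get(post_id, 0) + 1
        self._jobs.append(("persist_like", post_id, user_id))
        self._jobs.append(("update_ranking", post_id, user_id))
        return self.like_counts[post_id]

    def run_background_jobs(self):
        """Drain the queue; return how many jobs were processed."""
        processed = 0
        while self._jobs:
            job, post_id, user_id = self._jobs.popleft()
            if job == "persist_like":
                self.durable_likes.append((post_id, user_id))
            elif job == "update_ranking":
                self.ranking_updates.append(post_id)
            processed += 1
        return processed
```

Until the background jobs run, the counter is ahead of the durable store, so a crash in that window can lose likes unless the queue itself is durable.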
Consistency strategy:
- Strong consistency: User auth, payments (PostgreSQL)
- Eventual consistency: Feeds, likes, counters (NoSQL)
- Caching: Hot data in Redis for sub-millisecond access
Scaling:
- Sharding: Partition by user_id across nodes
- Replication: Multiple replicas for availability
- Caching: Redis for frequently accessed data
Trade-offs:
- Complexity: Multiple database systems to manage
- Consistency: Eventual consistency requires handling stale data
- Performance: The specialized stores give higher write throughput and easier horizontal scaling, in exchange for the complexity and consistency costs above
Key Takeaways
- NoSQL databases use different data models (document, key-value, wide-column, graph), each optimized for specific use cases
- CAP theorem limits distributed systems: during a network partition, you must choose between consistency and availability
- Eventual consistency is acceptable for many use cases (social feeds, analytics) but not for financial data
- NoSQL excels at horizontal scaling and high write throughput, but SQL is better for complex queries and relationships
- Hybrid approaches often work best: use SQL for critical/transactional data, NoSQL for scale and flexibility
- Schema-less doesn't mean no validation: you still need application-level data validation
- Choose based on requirements, not trends: NoSQL isn't always better than SQL
- Understand trade-offs: many NoSQL systems relax ACID guarantees, or limit their scope, in exchange for performance and scalability
- Document stores are good for flexible schemas and content management
- Key-value stores excel at caching and simple lookups
- Wide-column (columnar) databases are optimized for write-heavy workloads, time-series data, and analytics
- Graph databases are ideal for relationship-heavy data (social networks, recommendations)