NoSQL Basics
Master NoSQL database fundamentals and when to choose them over relational databases. Essential for modern system design interviews.
Why This Matters
Think of NoSQL databases like different types of storage systems. Relational databases are like filing cabinets—organized, structured, but rigid. NoSQL databases are like different storage solutions: document stores are like folders (flexible structure), key-value stores are like labeled boxes (simple lookups), columnar databases are like spreadsheets (optimized for analytics). Each is designed for different use cases.
This matters because relational databases aren't always the right tool. For simple key-value lookups, a key-value store is often faster. For data whose shape changes frequently, a document store avoids constant schema migrations. For analytical scans, column-oriented storage reads only the columns a query needs. Understanding NoSQL helps you match the database to the workload.
In interviews, when someone asks "How would you design a database for X?", they're testing whether you understand the trade-offs between NoSQL and SQL. Do you know when a document store fits? Can you explain the CAP theorem? Many engineers default to a relational database for everything, then struggle when a workload becomes slow or hard to scale.
What Engineers Usually Get Wrong
A common misconception is that "NoSQL means no SQL queries." NoSQL is usually read as "Not Only SQL": it's about different data models, not about avoiding SQL. Some NoSQL databases support SQL-like query languages (Cassandra's CQL, for example). The key differences are the data model (document, key-value, columnar, graph) and the trade-offs it makes among consistency, availability, and partition tolerance.
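Engineers also misread the CAP theorem. It says that when a network partition happens, a distributed system must choose between consistency (every read sees the latest write) and availability (every request gets a response). Partitions can't be ruled out in a real network, so "pick any two" is misleading: the real choice is which property to give up while a partition lasts. A single-node relational database sidesteps the question because it isn't distributed; distributed relational setups behave differently depending on how replication is configured. Many NoSQL databases favor availability and partition tolerance and rely on eventual consistency. Knowing which way a database leans tells you how it will behave when the network fails.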
How This Breaks Systems in the Real World
A service was using a relational database for a high-traffic application. The database became a bottleneck: writes were slow, and scaling out was difficult. The team tried to optimize, but the relational model wasn't suited to the workload. The fix was to move that workload to a NoSQL database; for its simple, high-volume access pattern, a document store or key-value store was faster and easier to scale horizontally. The cost: the team had to migrate data and rewrite queries.
Another story: A service was using a NoSQL database without accounting for eventual consistency. Users would write data and immediately read it back, but the read sometimes hit a replica that hadn't received the write yet (replication lag), so users thought their writes had failed. There are three ways to fix this: route a user's reads to the primary for a short time after they write (read-your-writes), request strongly consistent reads where the database supports them, or accept eventual consistency and design the UI for it, for example by showing the user's own change immediately on the client.
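The first of those fixes, routing a user's reads to the primary for a short window after they write, can be sketched as follows. This is a minimal in-memory model, not a real database client: `Store`, `ReadYourWritesRouter`, `replicate`, and the 5-second window are hypothetical names and values chosen for illustration, and replication lag is simulated by an explicit `replicate()` call.

```python
import time
from typing import Callable, Dict, Optional


class Store:
    """A trivial in-memory node standing in for a database server."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)


class ReadYourWritesRouter:
    """Route a user's reads to the primary for `window` seconds after that user writes."""

    def __init__(self, primary: Store, replica: Store, window: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.primary = primary
        self.replica = replica
        self.window = window
        self.clock = clock
        self.last_write: Dict[str, float] = {}  # user_id -> time of that user's last write

    def write(self, user_id: str, key: str, value: str) -> None:
        self.primary.data[key] = value
        self.last_write[user_id] = self.clock()

    def replicate(self) -> None:
        # Stand-in for asynchronous replication: the replica catches up all at once.
        self.replica.data = dict(self.primary.data)

    def read(self, user_id: str, key: str) -> Optional[str]:
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and self.clock() - wrote_at < self.window:
            return self.primary.get(key)  # the writer always sees their own write
        return self.replica.get(key)      # other reads may be stale but offload the primary
```

With a fake clock, a user who just wrote reads the new value from the primary, while another user's read of the same key goes to the replica and returns `None` until `replicate()` runs. The fixed window is a guess at typical replication lag; one refinement is to compare replication positions instead of wall-clock time.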
Why NoSQL?
Limitations of relational databases:
- Schema rigidity
- Scaling challenges (especially writes)
- Complex JOINs across distributed systems
- Overhead for simple use cases
NoSQL advantages:
- Flexible schemas
- Horizontal scaling
- High performance for specific workloads
- Simpler data models
Types of NoSQL Databases
Document Stores
Store data as documents (typically JSON/BSON).
Examples: MongoDB, CouchDB, DynamoDB
{
"user_id": "123",
"name": "John Doe",
"orders": [
{"order_id": "o1", "total": 100},
{"order_id": "o2", "total": 200}
]
}
Use when: Content management, user profiles, catalogs
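To make the "flexible structure" point concrete, here is a minimal in-memory sketch in which plain Python dicts stand in for a document collection (`users` and `order_total` are illustrative names, not a database API, and the second user is made-up sample data). Two documents in the same collection carry different fields, and because orders are embedded, one lookup returns a user together with their orders, with no JOIN.

```python
from typing import Any, Dict

# A "collection" of documents keyed by user_id. Documents need not share a schema.
users: Dict[str, Dict[str, Any]] = {
    "123": {
        "user_id": "123",
        "name": "John Doe",
        "orders": [
            {"order_id": "o1", "total": 100},
            {"order_id": "o2", "total": 200},
        ],
    },
    "456": {
        "user_id": "456",
        "name": "Jane Roe",
        "company": "Acme",  # a field only this document has
        # no "orders" key at all: also acceptable in a schema-flexible store
    },
}


def order_total(user_id: str) -> int:
    """Sum the embedded orders: a single document read, no JOIN needed."""
    doc = users[user_id]
    return sum(order["total"] for order in doc.get("orders", []))
```

Because fields can be missing, reading code has to tolerate absent keys (here via `doc.get("orders", [])`).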
Key-Value Stores
Simple key-value pairs, extremely fast lookups.
Examples: Redis, Memcached, DynamoDB
key: "user:123"
value: {"name": "John", "email": "john@example.com"}
Use when: Caching, session storage, real-time data
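Caching and session storage usually pair key-value lookups with expiry. Below is a minimal in-memory sketch of that idea. `TTLStore` is a hypothetical class: Redis offers similar behavior through key expiration, but this is not its API. The clock is injectable so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLStore:
    """Toy key-value store in which each key may expire after `ttl` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}
        self._clock = clock

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiration: expired keys are removed when read
            return None
        return value
```

A session key such as `"session:abc"` can be written with `ttl=30`, while a cached profile under `"user:123"` is written without a TTL and stays until overwritten.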
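Columnar (Wide-Column) Databases
Wide-column stores group rows by a partition key and let each row hold a sparse, flexible set of named columns. They are often called "columnar", but they differ from column-oriented analytical databases (such as ClickHouse or Amazon Redshift), which physically store each column contiguously to speed up aggregations.
Examples (wide-column): Cassandra, HBase, Bigtable
Use when: Time-series data, write-heavy workloads, large-scale lookups by key; for analytical scans over a few columns, a column-oriented analytical database is usually the better fit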
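Wide-column stores such as Cassandra decide which node holds a row by its partition key, and keep rows inside a partition sorted by clustering columns, which is what makes "latest N readings for one sensor" cheap. The sketch below models only that layout in plain Python. `WideColumnTable`, the `sensor-1` keys, and integer timestamps are illustrative assumptions; it ignores replication, compaction, and on-disk storage.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class WideColumnTable:
    """Rows grouped by partition key and sorted by a clustering key (a timestamp here)."""

    def __init__(self) -> None:
        # partition key -> list of (clustering key, columns), kept sorted by clustering key
        self._partitions: Dict[str, List[Tuple[int, dict]]] = defaultdict(list)

    def insert(self, partition_key: str, ts: int, columns: dict) -> None:
        rows = self._partitions[partition_key]
        keys = [row[0] for row in rows]
        # Insert in timestamp order so range reads within a partition need no sorting.
        rows.insert(bisect.bisect_right(keys, ts), (ts, columns))

    def latest(self, partition_key: str, n: int) -> List[Tuple[int, dict]]:
        """Newest n rows of one partition, newest first; touches only that partition."""
        if n <= 0:
            return []
        return list(reversed(self._partitions[partition_key][-n:]))
```

Reads that stay within one partition are efficient; a query that needs rows from every partition (for example, "all sensors above 30 degrees") would scan the whole table, which is why these stores are modeled around known queries.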
Graph Databases
Store entities and relationships as a graph.
Examples: Neo4j, Amazon Neptune
Use when: Social networks, recommendation engines, fraud detection
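The typical graph query, such as "accounts followed by the accounts I follow", is a short traversal over relationships. Below is a minimal adjacency-list sketch in Python. The `follows` data and the `suggest` function are illustrative; a graph database would run this as a traversal over stored relationships rather than over an in-memory dict, whereas a relational schema would need one self-JOIN of a follows table per hop.

```python
from typing import Dict, Set

# user -> set of users they follow (illustrative data)
follows: Dict[str, Set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"dave", "alice"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": {"alice"},
}


def suggest(user: str) -> Set[str]:
    """Two-hop traversal: people followed by the people `user` follows,
    excluding `user` and anyone `user` already follows."""
    direct = follows.get(user, set())
    two_hop: Set[str] = set()
    for friend in direct:
        two_hop |= follows.get(friend, set())
    return two_hop - direct - {user}
```

For `alice`, the second hop reaches `dave`, `erin`, and `alice` herself; after excluding herself and her existing follows, the suggestions are `dave` and `erin`.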
CAP Theorem
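In a distributed system, you can guarantee at most two of these three properties at the same time:
- Consistency: Every read sees the most recent successful write (all nodes behave like a single copy)
- Availability: Every request to a non-failed node gets a non-error response
- Partition tolerance: The system keeps operating when the network drops or delays messages between nodes
Because network partitions can't be ruled out in a real distributed system, the practical choice is what to give up while a partition lasts: consistency or availability.
NoSQL databases make different default CAP choices (many are tunable per request or per configuration, so treat these labels as defaults):
- CP: MongoDB, HBase (consistency + partition tolerance)
- AP: Cassandra, DynamoDB (availability + partition tolerance)
- CA: Traditional single-node RDBMS (not distributed, so there is no partition to tolerate)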
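The CP/AP labels only matter once a partition happens. The toy model below has two replicas and a `partitioned` flag. In `"CP"` mode, a write that can't reach the other replica is rejected (consistent but unavailable). In `"AP"` mode, it is accepted locally and the replicas diverge (available but inconsistent). `Cluster`, `Unavailable`, and the two modes are hypothetical; real systems use quorums, leader election, and conflict resolution that this sketch leaves out.

```python
from typing import Dict, Optional


class Unavailable(Exception):
    """Raised when a CP cluster refuses a request it cannot keep consistent."""


class Cluster:
    """Two replicas, n1 and n2, that may be cut off from each other."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas: Dict[str, Dict[str, str]] = {"n1": {}, "n2": {}}
        self.partitioned = False

    def write(self, node: str, key: str, value: str) -> None:
        if not self.partitioned:
            for data in self.replicas.values():  # healthy network: copy to both replicas
                data[key] = value
        elif self.mode == "CP":
            # The peer can't confirm the write, so refuse it rather than diverge.
            raise Unavailable(f"{node} cannot reach its peer")
        else:
            # AP: accept locally; the peer keeps serving the old value.
            self.replicas[node][key] = value

    def read(self, node: str, key: str) -> Optional[str]:
        return self.replicas[node].get(key)
```

In `CP` mode the replicas never disagree, at the cost of failed writes during the partition. In `AP` mode every write succeeds, but `n2` returns the old value, and the two copies must be reconciled once the partition heals.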
When to Use NoSQL
Good Fit
- High write throughput: Logging, metrics, IoT data
- Flexible schema: Rapidly evolving data structures
- Horizontal scaling: Need to scale across many servers
- Simple queries: Key lookups, document retrieval
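- Large datasets: High-volume data accessed through known, simple patterns
Not a Good Fit
- Complex relationships: Many JOINs, foreign keys
- Multi-record ACID transactions: Financial systems, critical data (some NoSQL databases, such as MongoDB and DynamoDB, now offer transactions, often with more restrictions than an RDBMS)
- Complex queries: Ad-hoc reporting, flexible analytical queries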
- Established schema: Well-defined, stable data model
Common Patterns
Eventual Consistency
Data may be temporarily inconsistent but will become consistent.
User A updates profile on Node 1 → change replicates asynchronously to Nodes 2 and 3
User B reads from Node 2 (not yet updated) → Sees old data
Eventually, all nodes converge to the same state
Acceptable for: Social media feeds, comments, non-critical updates
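The sequence above can be simulated with asynchronous replication: each write lands on one node and is queued for the others, so a read from a lagging node returns old data until the queue drains. This is a minimal sketch. `EventuallyConsistentStore` and `deliver_all` are hypothetical names, and conflicts are resolved by a single global version counter (a last-writer-wins rule assumed here for simplicity; real systems use timestamps or vector clocks).

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Tuple

Entry = Tuple[int, str]  # (version, value)


class EventuallyConsistentStore:
    def __init__(self, node_names: Iterable[str]) -> None:
        self.nodes: Dict[str, Dict[str, Entry]] = {n: {} for n in node_names}
        self.pending: Deque[Tuple[str, str, Entry]] = deque()  # (target node, key, entry)
        self.version = 0

    def write(self, node: str, key: str, value: str) -> None:
        self.version += 1
        entry = (self.version, value)
        self.nodes[node][key] = entry
        for other in self.nodes:
            if other != node:
                self.pending.append((other, key, entry))  # replicate later, not now

    def read(self, node: str, key: str) -> Optional[str]:
        entry = self.nodes[node].get(key)
        return entry[1] if entry else None

    def deliver_all(self) -> None:
        """Drain the replication queue; the higher version wins (last-writer-wins)."""
        while self.pending:
            target, key, entry = self.pending.popleft()
            current = self.nodes[target].get(key)
            if current is None or entry[0] > current[0]:
                self.nodes[target][key] = entry
```

Before `deliver_all()` runs, a read from a node that hasn't received the update returns old (or no) data; afterwards every node holds the same latest value, which is the "eventually converge" guarantee.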
Denormalization
Store redundant data to avoid JOINs. Instead of joining orders and users at read time, copy the user fields the order view needs (here user_name and user_email) into each order document:
{
"order_id": "o123",
"user_name": "John Doe",
"user_email": "john@example.com",
"total": 100
}
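Denormalization moves cost to writes: when a user's name or email changes, every order that copied it must be updated. This is a minimal sketch using in-memory lists. The `users`/`orders` data and the `update_user_email` helper are illustrative, and it assumes each order also keeps a `user_id` reference so its copies can be found later. In a real store this fan-out may run asynchronously, so copies can be briefly stale.

```python
from typing import Dict, List

users: Dict[str, Dict[str, str]] = {
    "u1": {"name": "John Doe", "email": "john@example.com"},
}

# Orders carry copies of user fields, plus user_id (assumed) so the copies can be located.
orders: List[Dict] = [
    {"order_id": "o123", "user_id": "u1", "user_name": "John Doe",
     "user_email": "john@example.com", "total": 100},
    {"order_id": "o124", "user_id": "u1", "user_name": "John Doe",
     "user_email": "john@example.com", "total": 40},
]


def update_user_email(user_id: str, new_email: str) -> int:
    """Update the source record and every denormalized copy; return how many copies changed."""
    users[user_id]["email"] = new_email
    changed = 0
    for order in orders:
        if order["user_id"] == user_id:
            order["user_email"] = new_email
            changed += 1
    return changed
```

Reading an order never needs a JOIN, but one email change touched two extra records here, and the cost grows with the number of copies.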
Sharding
Distribute data across multiple servers.
Shard 1: user_id 1-1000
Shard 2: user_id 1001-2000
Shard 3: user_id 2001-3000
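Routing a key to its shard under the range scheme above is a sorted-boundary lookup. The sketch below (hypothetical `RANGE_UPPER_BOUNDS`, `range_shard`, and `hash_shard` names) shows that lookup alongside the common alternative of hashing the key, which spreads sequential IDs evenly but means a scan over a range of user IDs touches every shard. `hashlib.sha256` is used only as a stable hash; Python's built-in `hash()` is salted per process for strings, so it could route the same key differently in different processes.

```python
import bisect
import hashlib

# Inclusive upper bound of each range shard, matching the layout above.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]
RANGE_SHARDS = ["shard-1", "shard-2", "shard-3"]


def range_shard(user_id: int) -> str:
    """Range sharding: the first shard whose upper bound is >= user_id."""
    i = bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)
    if user_id < 1 or i == len(RANGE_SHARDS):
        raise KeyError(f"no shard covers user_id {user_id}")
    return RANGE_SHARDS[i]


def hash_shard(user_id: int, num_shards: int = 3) -> str:
    """Hash sharding: a stable hash of the key, modulo the number of shards."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return f"shard-{int.from_bytes(digest[:8], 'big') % num_shards + 1}"
```

Range sharding keeps neighboring IDs together but sends all new users to the last shard; modulo hashing balances load but remaps most keys when `num_shards` changes, which is why consistent hashing is often used instead.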
Migration Considerations
From SQL to NoSQL
- Identify access patterns: How is data queried?
- Denormalize: Flatten relational structures
- Choose appropriate model: Document vs. key-value vs. graph
- Plan for consistency: Eventual vs. strong consistency
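The "Denormalize" step often means turning rows linked by foreign keys into one document per access pattern. Assuming the dominant read is "load a user with their orders", a minimal sketch (the table contents and `to_user_documents` are illustrative) groups order rows under their user and produces documents shaped like the user document shown earlier.

```python
from collections import defaultdict
from typing import Dict, List

# Rows as they might come out of two relational tables (illustrative data).
user_rows = [
    {"id": "123", "name": "John Doe"},
    {"id": "456", "name": "Jane Roe"},
]
order_rows = [
    {"order_id": "o1", "user_id": "123", "total": 100},
    {"order_id": "o2", "user_id": "123", "total": 200},
]


def to_user_documents(users: List[dict], orders: List[dict]) -> List[dict]:
    """Embed each user's orders inside the user document (one read, no JOIN)."""
    by_user: Dict[str, List[dict]] = defaultdict(list)
    for o in orders:
        # The foreign key is dropped: nesting now expresses the relationship.
        by_user[o["user_id"]].append({"order_id": o["order_id"], "total": o["total"]})
    return [
        {"user_id": u["id"], "name": u["name"], "orders": by_user.get(u["id"], [])}
        for u in users
    ]
```

Embedding works while the embedded list stays bounded. MongoDB, for example, caps a single document at 16 MB, so an unbounded order history would more likely live in its own collection keyed by `user_id`.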
Hybrid Approach
Many systems use both:
- SQL: User accounts, transactions, critical data
- NoSQL: Logs, sessions, analytics, caching
Best Practices
- Start with SQL: Use NoSQL when you have a specific need
- Understand trade-offs: NoSQL isn't always faster or simpler
- Plan for scale: If you choose NoSQL, design partition keys for horizontal scaling from the start
- Monitor consistency: Track replication lag, consistency metrics
- Backup strategy: NoSQL backups can be more complex
Common Pitfalls
- Over-normalization: Applying SQL patterns to NoSQL
- Ignoring consistency: Not understanding eventual consistency implications
- Schema-less doesn't mean no schema: Still need data validation
- Tool selection: Choosing NoSQL because it's "modern" not because it fits
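Because the database may not enforce a schema, validation moves into the application (or into optional database-side schema validation, where the database supports it). A minimal hand-rolled sketch follows. `REQUIRED_FIELDS`, `OPTIONAL_FIELDS`, and `validate` are hypothetical, and a real project would more likely use a schema library.

```python
from typing import Any, Dict, List

# Required fields and their expected types (illustrative schema).
REQUIRED_FIELDS: Dict[str, type] = {"user_id": str, "name": str}
# Optional fields are type-checked only when present.
OPTIONAL_FIELDS: Dict[str, type] = {"email": str, "orders": list}


def validate(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the document is acceptable."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing required field '{field}'")
        elif not isinstance(doc[field], expected):
            errors.append(f"'{field}' should be {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        if field in doc and not isinstance(doc[field], expected):
            errors.append(f"'{field}' should be {expected.__name__}")
    return errors
```

Calling `validate` before every insert or update gives schema-less storage a schema at the application boundary; unknown extra fields are still allowed, which preserves the flexibility.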
Interview Questions
1. Beginner Question
Q: What is NoSQL, and when should you use it instead of a relational database?
A: NoSQL (Not Only SQL) refers to non-relational databases that use different data models than traditional SQL databases.
When to use NoSQL:
- Flexible schema: Rapidly evolving data structures
- High write throughput: Logging, metrics, IoT data
- Horizontal scaling: Need to scale across many servers
- Simple queries: Key lookups, document retrieval
- Large datasets: High-volume data accessed through known, simple patterns
When to use SQL:
- Complex relationships: Many JOINs, foreign keys
- Multi-record ACID transactions: Financial systems, critical data
- Complex queries: Ad-hoc reporting, flexible analytical queries
- Established schema: Well-defined, stable data model
Example: Use MongoDB (NoSQL) for user profiles with varying attributes, but PostgreSQL (SQL) for financial transactions requiring ACID guarantees.
2. Intermediate Question
Q: Explain the CAP theorem and how it applies to NoSQL databases.
A: The CAP theorem states that in a distributed system, you can guarantee at most two of three properties at the same time:
- Consistency: Every read sees the most recent successful write
- Availability: Every request to a non-failed node gets a non-error response
- Partition tolerance: The system keeps working when messages between nodes are lost or delayed
Since partitions can't be prevented in a real network, the practical question is whether a system gives up consistency or availability while a partition lasts.
NoSQL database choices (default behavior; many systems let you tune this):
- CP (Consistency + Partition tolerance):
  - MongoDB, HBase
  - Sacrifices availability during partitions (requests that can't be served consistently fail)
  - Good for: Data where stale reads are unacceptable, such as account balances
- AP (Availability + Partition tolerance):
  - Cassandra, DynamoDB
  - Sacrifices consistency (eventual consistency)
  - Good for: Social feeds, analytics, high availability needs
- CA (Consistency + Availability):
  - Traditional RDBMS (single node)
  - Not distributed, so there are no partitions to tolerate
Example: Amazon DynamoDB leans AP. By default its reads are eventually consistent and may return slightly stale data, though it also offers strongly consistent reads as an option.
Follow-up: What does "eventual consistency" mean?
- Data may be temporarily inconsistent across nodes
- All nodes will eventually converge to the same state
- Acceptable for many use cases (social feeds, comments)
3. Senior-Level System Question
Q: Design a social media platform supporting 1B users, 10B posts, and 100B likes. How would you use NoSQL databases in the architecture?
A:
Hybrid architecture (SQL + NoSQL):
- User profiles (MongoDB - Document Store):
  // Flexible schema for varying user attributes
  {
    user_id: "123",
    name: "John",
    bio: "...",
    preferences: { theme: "dark", notifications: true },
    social_links: { twitter: "@john", github: "john" }
  }
  Why: Flexible schema, easy to add new fields, good for user-generated content
- Posts and feeds (Cassandra - Wide-Column Store):
  -- Partition by user_id: each user's posts stay together, newest first
  CREATE TABLE posts (
    user_id UUID,
    post_id UUID,
    content TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, created_at, post_id)
  ) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);
  Why: High write throughput, horizontal scaling, time-series data
- Likes and counters (Redis - Key-Value):
  SET post:123:likes 5000
  INCR post:123:likes
  SADD post:123:liked_by user:456 user:789
  Why: Extremely fast reads/writes; INCR is atomic, so concurrent likes don't overwrite each other
- Social graph (Neo4j - Graph Database):
  // Find friends of friends the user doesn't already follow
  MATCH (user:User {id: 123})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
  WHERE fof <> user AND NOT (user)-[:FOLLOWS]->(fof)
  RETURN DISTINCT fof
  Why: Efficient for relationship queries (who follows whom, recommendations)
- Search (Elasticsearch - Document Store):
  - Full-text search on posts
  - Denormalized data for fast search
  - Why: Optimized for search, not transactional
- Critical data (PostgreSQL - SQL):
  - User authentication
  - Payment transactions
  - Why: ACID guarantees, complex relationships
Data flow:
User creates post → Write to Cassandra (fast)
→ Update search index (async)
→ Update user's MongoDB profile (async)
User likes post → Increment counter in Redis (fast)
→ Write to Cassandra for persistence (async)
→ Update feed rankings (async)
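The post-creation path above (one awaited write, then background side effects) can be sketched with `asyncio`. The store functions here are hypothetical stand-ins that append to a `log` list instead of calling Cassandra, a search index, or MongoDB. The side effects live only in this process, so a crash loses pending ones; that is why such fan-out is commonly moved to a durable queue in production.

```python
import asyncio
from typing import List

log: List[str] = []  # records which (hypothetical) stores were touched, in order


async def write_post_to_cassandra(post_id: str) -> None:
    await asyncio.sleep(0)  # stand-in for a network round trip
    log.append(f"cassandra:{post_id}")


async def update_search_index(post_id: str) -> None:
    await asyncio.sleep(0.01)
    log.append(f"search:{post_id}")


async def update_profile_post_count(user_id: str) -> None:
    await asyncio.sleep(0.01)
    log.append(f"mongodb:{user_id}")


async def create_post(user_id: str, post_id: str) -> List[asyncio.Task]:
    # The primary write is awaited: the request succeeds only if it lands.
    await write_post_to_cassandra(post_id)
    # Secondary updates are scheduled in the background; the caller doesn't wait.
    # The task handles are returned so they are not garbage-collected mid-flight.
    return [
        asyncio.create_task(update_search_index(post_id)),
        asyncio.create_task(update_profile_post_count(user_id)),
    ]


async def main() -> List[str]:
    tasks = await create_post("123", "p1")
    at_response_time = list(log)  # what was persisted when we could reply to the user
    await asyncio.gather(*tasks)  # demo only: wait for the background work to finish
    return at_response_time
```

Running `asyncio.run(main())` shows that only the Cassandra write had completed when the response could be sent; the search and profile updates finish afterwards, which is exactly the window in which readers may see stale data.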
Consistency strategy:
- Strong consistency: User auth, payments (PostgreSQL)
- Eventual consistency: Feeds, likes, counters (NoSQL)
- Caching: Hot data in Redis for sub-millisecond access
Scaling:
- Sharding: Partition by user_id across nodes
- Replication: Multiple replicas for availability
- Caching: Redis for frequently accessed data
Trade-offs:
- Complexity: Multiple database systems to manage
- Consistency: Eventual consistency requires handling stale data
- Performance: NoSQL provides better write throughput and horizontal scaling
Key Takeaways
- NoSQL databases use different data models (document, key-value, columnar, graph), each optimized for specific use cases
- CAP theorem: during a network partition, a distributed system must give up either consistency or availability
- Eventual consistency is acceptable for many use cases (social feeds, analytics) but not for financial data
- NoSQL excels at horizontal scaling and high write throughput, but SQL is better for complex queries and relationships
- Hybrid approaches often work best: use SQL for critical/transactional data, NoSQL for scale and flexibility
- Schema-less doesn't mean no validation: you still need application-level data validation
- Choose based on requirements, not trends: NoSQL isn't always better than SQL
- Understand trade-offs: many NoSQL databases relax ACID guarantees (especially across multiple records) for performance and scalability
- Document stores are good for flexible schemas and content management
- Key-value stores excel at caching and simple lookups
- Wide-column stores suit write-heavy and time-series workloads; column-oriented databases are optimized for analytics
- Graph databases are ideal for relationship-heavy data (social networks, recommendations)
Related Topics
Document Stores
One type of NoSQL database. Understanding document stores helps understand NoSQL data modeling.
Key-Value Stores
Simple NoSQL database type. Understanding key-value stores helps understand NoSQL basics.
Columnar Databases
NoSQL database optimized for analytics. Understanding columnar databases helps understand NoSQL variety.
Time-Series Databases
Specialized NoSQL database for time-stamped data. Understanding time-series databases helps understand NoSQL specialization.
ACID Properties
NoSQL databases often relax ACID for performance. Understanding ACID helps understand NoSQL trade-offs.
Data Replication
NoSQL databases use replication for availability. Understanding replication helps understand NoSQL architectures.