NoSQL Basics

Master NoSQL database fundamentals and when to choose them over relational databases. Essential for modern system design interviews.

Why This Matters

Think of NoSQL databases as different kinds of storage. A relational database is like a filing cabinet: organized and structured, but rigid. NoSQL databases are more specialized: document stores are like folders (flexible structure), key-value stores are like labeled boxes (simple lookups), and columnar databases are like spreadsheets (optimized for analytics). Each is designed for different use cases.

This matters because relational databases aren't always the right tool. For simple lookups by key, a key-value store is usually faster. For data whose shape changes often, a document store is easier to work with. For analytics over large tables, columnar databases are optimized to scan only the columns a query needs. Understanding NoSQL helps you choose the right database for your use case.

In interviews, when someone asks "How would you design a database for X?", they're testing whether you understand NoSQL vs SQL trade-offs. Do you know when to use document stores? Do you understand the CAP theorem? Many engineers don't. They use relational databases for everything and then wonder why the system is slow or hard to scale.

What Engineers Usually Get Wrong

Most engineers think "NoSQL means no SQL queries." But NoSQL is usually read as "Not Only SQL": it is about different data models, not about avoiding SQL. Some NoSQL databases support SQL-like query languages (Cassandra's CQL, for example). The key differences are the data model (document, key-value, columnar, graph) and the trade-offs each system makes among consistency, availability, and partition tolerance.

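Engineers also misunderstand the CAP theorem. A distributed system cannot guarantee consistency, availability, and partition tolerance all at once. This is often summarized as "choose two," but network partitions cannot be ruled out in a distributed system, so the real choice is what to give up while a partition lasts: consistency or availability. A single-node relational database sidesteps the question because it is not distributed. Many NoSQL databases favor availability during a partition and settle for eventual consistency; others favor consistency and reject some requests. Understanding this helps you choose the right database.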

How This Breaks Systems in the Real World

A service was using a relational database for a high-traffic application. The database became a bottleneck: writes were slow, and scaling was difficult. The team tried to optimize, but the relational model wasn't suited to the workload. The fix was to move to a NoSQL database; for this use case, a document store or key-value store was faster and easier to scale. The cost was that the team had to migrate data and rewrite queries.

Another story: A service was using a NoSQL database but didn't understand eventual consistency. Users would write data, then immediately read it, but sometimes they didn't see their changes (replication lag). Users thought their writes failed. The fix? Understand eventual consistency. Read from the primary immediately after writes, or use consistent reads. Or accept eventual consistency and design the UI to handle it.
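The "read from the primary immediately after writes" fix can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any database's client API: `ReplicatedStore`, `Session`, and the `sticky_seconds` window are hypothetical names, and replication is simulated by an explicit `sync()` call.

```python
import time


class ReplicatedStore:
    """Toy primary/replica pair; the replica only catches up when sync() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        self.primary[key] = value  # the replica is now stale for this key

    def sync(self):
        self.replica = dict(self.primary)  # simulate replication catching up

    def read(self, key, consistent=False):
        source = self.primary if consistent else self.replica
        return source.get(key)


class Session:
    """Sends a user's reads to the primary for a short window after their own write."""

    def __init__(self, store, sticky_seconds=5.0, clock=time.monotonic):
        self.store = store
        self.sticky_seconds = sticky_seconds
        self.clock = clock
        self.last_write_at = None

    def write(self, key, value):
        self.store.write(key, value)
        self.last_write_at = self.clock()

    def read(self, key):
        recently_wrote = (
            self.last_write_at is not None
            and self.clock() - self.last_write_at < self.sticky_seconds
        )
        return self.store.read(key, consistent=recently_wrote)
```

Inside the window the user sees their own write even though the replica is stale; once the window expires, reads go back to the replica, so the window should be longer than the replication lag you expect.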

Why NoSQL?

Limitations of relational databases:

Schema rigidity
Scaling challenges (especially writes)
Complex JOINs across distributed systems
Overhead for simple use cases

NoSQL advantages:

Flexible schemas
Horizontal scaling
High performance for specific workloads
Simpler data models

Types of NoSQL Databases

Document Stores

Store data as documents (typically JSON/BSON).

Examples: MongoDB, CouchDB, DynamoDB

```
{
  "user_id": "123",
  "name": "John Doe",
  "orders": [
    {"order_id": "o1", "total": 100},
    {"order_id": "o2", "total": 200}
  ]
}
```

Use when: Content management, user profiles, catalogs
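To make the embedded-document idea concrete, here is a minimal Python sketch that treats plain dicts as documents. The `users` list, the second sample user, and the `users_with_order_over` helper are hypothetical and only for illustration; a real document store would run the equivalent filter on the server.

```python
# Documents in one collection may have different fields; orders are embedded.
users = [
    {
        "user_id": "123",
        "name": "John Doe",
        "orders": [
            {"order_id": "o1", "total": 100},
            {"order_id": "o2", "total": 200},
        ],
    },
    # A second, made-up document with an extra field and no orders at all.
    {"user_id": "456", "name": "Jane Roe", "nickname": "JR"},
]


def users_with_order_over(docs, minimum):
    """Return user_ids that have at least one embedded order above `minimum`."""
    return [
        doc["user_id"]
        for doc in docs
        if any(order["total"] > minimum for order in doc.get("orders", []))
    ]
```

Because each user's orders live inside the user document, the query needs no JOIN; the price is that documents with missing fields must be handled explicitly, as `doc.get("orders", [])` does here.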

Key-Value Stores

Simple key-value pairs, extremely fast lookups.

Examples: Redis, Memcached, DynamoDB

```
key: "user:123"
value: {"name": "John", "email": "john@example.com"}
```

Use when: Caching, session storage, real-time data
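The caching use case usually follows a cache-aside pattern: look in the key-value store first and fall back to the database on a miss. The sketch below uses an in-memory dict in place of Redis or Memcached; `TTLCache`, `get_user`, and the `load_from_db` callback are hypothetical names chosen for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a key-value store with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as a miss
            return None
        return value


def get_user(cache, user_id, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_from_db(user_id)
    if user is not None:
        cache.set(key, user, ttl_seconds)
    return user
```

The TTL bounds how stale a cached value can get; a shorter TTL means fresher data but more database reads.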

Columnar Databases

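Store data in column families: rows are grouped by a partition key, and each row can hold its own set of columns. The examples below are more precisely called wide-column stores; column-oriented analytics engines, which store each column contiguously for fast scans, are a related but separate category.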

Examples: Cassandra, HBase, Bigtable

Use when: Time-series data, analytics, write-heavy workloads
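One way to picture why stores such as Cassandra, HBase, and Bigtable suit time-series data: rows that share a partition key are kept sorted by a second key, so a time-range read is a contiguous slice. The sketch below is a simplified in-memory model, not how any of these systems is implemented; `WideColumnTable` and its method names are hypothetical.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Rows grouped by partition key and kept sorted by clustering key."""

    def __init__(self):
        # partition_key -> sorted list of (clustering_key, columns)
        self._partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, columns):
        rows = self._partitions[partition_key]
        keys = [key for key, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            # Same key already present: merge columns (upsert semantics).
            rows[i] = (clustering_key, {**rows[i][1], **columns})
        else:
            rows.insert(i, (clustering_key, dict(columns)))

    def range_query(self, partition_key, start, end):
        """Return rows with start <= clustering_key <= end in one partition."""
        rows = self._partitions.get(partition_key, [])
        keys = [key for key, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return rows[lo:hi]
```

For example, inserting readings for partition `"sensor-1"` at timestamps 3, 1, and 2 and then calling `range_query("sensor-1", 1, 2)` returns the two earliest rows in timestamp order, regardless of insertion order.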

Graph Databases

Store entities and relationships as a graph.

Examples: Neo4j, Amazon Neptune

Use when: Social networks, recommendation engines, fraud detection

CAP Theorem

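In a distributed system, you can guarantee at most two of three properties at the same time:

Consistency: Every read sees the most recent successful write
Availability: Every request to a working node gets a non-error response
Partition tolerance: The system keeps operating when the network drops messages between nodes

Because partitions cannot be prevented, the practical choice is between consistency and availability while a partition lasts. NoSQL databases make different choices, and many let you tune them per operation; the labels below describe typical defaults:

CP: MongoDB, HBase (consistency + partition tolerance)
AP: Cassandra, DynamoDB (availability + partition tolerance)
CA: Traditional single-node RDBMS (not distributed, so partitions do not arise)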
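The difference between a CP and an AP choice is easiest to see under a simulated partition. The following toy model is a sketch under strong simplifications (two replicas, one value, a `partitioned` flag instead of a real network); `TwoNodeRegister` and `Unavailable` are hypothetical names, and with only two nodes neither side has a majority, so the CP variant refuses requests on both sides.

```python
class Unavailable(Exception):
    """Raised when a CP system refuses a request during a partition."""


class TwoNodeRegister:
    """Two replicas of a single value; `mode` is "CP" or "AP"."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one slot per replica
        self.partitioned = False

    def write(self, node_index, value):
        if not self.partitioned:
            self.values = [value, value]  # healthy network: both replicas updated
            return
        if self.mode == "CP":
            raise Unavailable("cannot reach peer; refusing write")
        self.values[node_index] = value  # AP: accept locally, replicas diverge

    def read(self, node_index):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest value; refusing read")
        return self.values[node_index]
```

During a partition the CP register raises `Unavailable` rather than risk a stale answer, while the AP register keeps answering and lets the two replicas disagree until the partition heals and they are reconciled.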

When to Use NoSQL

Good Fit

High write throughput: Logging, metrics, IoT data
Flexible schema: Rapidly evolving data structures
Horizontal scaling: Need to scale across many servers
Simple queries: Key lookups, document retrieval
Large datasets: Big data, analytics

Not a Good Fit

Complex relationships: Many JOINs, foreign keys
ACID transactions: Financial systems, critical data
Complex queries: Ad-hoc reporting, analytics that JOIN many entities
Established schema: Well-defined, stable data model

Common Patterns

Eventual Consistency

Data may be temporarily inconsistent but will become consistent.

User A updates profile → Write lands on Node 1; replication to Nodes 2 and 3 is still in flight
User B reads from Node 2 (not yet updated) → Sees old data
Eventually, all nodes converge to the same state

Acceptable for: Social media feeds, comments, non-critical updates
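How replicas "become consistent" depends on the database. One common and simple rule is last-write-wins, sketched below in Python. This is an illustration under assumptions, not a description of any particular product: each replica is a dict of `key -> (timestamp, value)`, timestamps are taken as comparable across nodes, and ties are broken by comparing values so that every replica picks the same winner.

```python
def merge(local, remote):
    """Last-write-wins merge of two replicas shaped {key: (timestamp, value)}."""
    merged = dict(local)
    for key, entry in remote.items():
        # Tuples compare by timestamp first, then by value as a tie-breaker.
        if key not in merged or entry > merged[key]:
            merged[key] = entry
    return merged


# User A's update (timestamp 2) has reached node1 but not node2 yet.
node1 = {"profile:A": (2, "new bio")}
node2 = {"profile:A": (1, "old bio")}

stale_read = node2["profile:A"][1]  # User B reads node2 before replication

# Replicas exchange state in both directions and converge.
node2 = merge(node2, node1)
node1 = merge(node1, node2)
```

After the exchange both nodes hold the timestamp-2 value. Last-write-wins silently discards the older of two concurrent updates, which is part of why it suits feeds and comments better than balances.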

Denormalization

Store redundant data to avoid JOINs.

Instead of JOINing orders and users at read time, copy the user fields (user_name and user_email below) into each order document:

```
{
  "order_id": "o123",
  "user_name": "John Doe",
  "user_email": "john@example.com",
  "total": 100
}
```
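The cost of denormalization shows up on writes: when the source value changes, every copy has to be found and updated. A minimal Python sketch, using in-memory dicts in place of collections; the `user_id` field on orders and the function names are assumptions added for illustration.

```python
users = {"u1": {"name": "John Doe", "email": "john@example.com"}}
orders = {}


def create_order(order_id, user_id, total):
    """Copy user fields into the order so reads need no JOIN."""
    user = users[user_id]
    orders[order_id] = {
        "order_id": order_id,
        "user_id": user_id,             # kept so copies can be located later
        "user_name": user["name"],      # denormalized copy
        "user_email": user["email"],    # denormalized copy
        "total": total,
    }


def change_email(user_id, new_email):
    """Update the source record, then fan out to every denormalized copy."""
    users[user_id]["email"] = new_email
    updated = 0
    for order in orders.values():
        if order["user_id"] == user_id:
            order["user_email"] = new_email
            updated += 1
    return updated
```

Reading an order is a single lookup, but `change_email` touches one document per order. Whether to fan out at all is a design choice: some systems deliberately keep the copy as a snapshot of what the value was when the order was placed.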

Sharding

Distribute data across multiple servers.

Shard 1: user_id 1-1000
Shard 2: user_id 1001-2000
Shard 3: user_id 2001-3000
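The layout above is range sharding. A router for it, next to a hash-based alternative, might look like the sketch below; the function names are hypothetical, and the hash variant assumes a fixed shard count (changing `num_shards` would remap most keys, which is what consistent hashing is designed to avoid).

```python
import bisect
import hashlib

# Inclusive upper bound of each shard's user_id range: 1-1000, 1001-2000, 2001-3000.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]


def range_shard(user_id):
    """Map a user_id to a 1-based shard number by range."""
    if not 1 <= user_id <= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id) + 1


def hash_shard(user_id, num_shards=3):
    """Map a user_id to a 1-based shard number using a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards + 1
```

Range sharding keeps neighboring ids together, which helps range scans but can overload the shard that receives the newest ids. Hash sharding spreads load more evenly at the cost of scattering ranges across shards.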

Migration Considerations

From SQL to NoSQL

Identify access patterns: How is data queried?
Denormalize: Flatten relational structures
Choose appropriate model: Document vs. key-value vs. graph
Plan for consistency: Eventual vs. strong consistency

Hybrid Approach

Many systems use both:

SQL: User accounts, transactions, critical data
NoSQL: Logs, sessions, analytics, caching

Best Practices

Start with SQL: Use NoSQL when you have a specific need
Understand trade-offs: NoSQL isn't always faster or simpler
Plan for scale: Design for horizontal scaling from the start
Monitor consistency: Track replication lag, consistency metrics
Backup strategy: NoSQL backups can be more complex

Common Pitfalls

Over-normalization: Applying SQL patterns to NoSQL
Ignoring consistency: Not understanding eventual consistency implications
Schema-less doesn't mean no schema: Still need data validation
Tool selection: Choosing NoSQL because it's "modern" not because it fits
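For the "schema-less doesn't mean no schema" pitfall, validation usually moves into the application. A minimal sketch follows; the field lists and `validate_profile` are hypothetical, and a production system might instead use a schema library or the database's own validation features.

```python
REQUIRED_FIELDS = {"user_id": str, "name": str}
OPTIONAL_FIELDS = {"bio": str, "preferences": dict}


def validate_profile(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing required field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        if field in doc and not isinstance(doc[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    return errors
```

Running the check before every write keeps malformed documents out of the store. Unknown extra fields are allowed here on purpose, so the schema can still evolve.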

Interview Questions

1. Beginner Question

Q: What is NoSQL, and when should you use it instead of a relational database?

A: NoSQL (Not Only SQL) refers to non-relational databases that use different data models than traditional SQL databases.

When to use NoSQL:

Flexible schema: Rapidly evolving data structures
High write throughput: Logging, metrics, IoT data
Horizontal scaling: Need to scale across many servers
Simple queries: Key lookups, document retrieval
Large datasets: Big data, analytics

When to use SQL:

Complex relationships: Many JOINs, foreign keys
ACID transactions: Financial systems, critical data
Complex queries: Ad-hoc reporting, analytics that JOIN many entities
Established schema: Well-defined, stable data model

Example: Use MongoDB (NoSQL) for user profiles with varying attributes, but PostgreSQL (SQL) for financial transactions requiring ACID guarantees.

2. Intermediate Question

Q: Explain the CAP theorem and how it applies to NoSQL databases.

A: CAP theorem states that a distributed system can guarantee at most two of three properties at the same time:

Consistency: Every read sees the most recent successful write
Availability: Every request to a working node gets a non-error response
Partition tolerance: The system keeps operating when the network drops messages between nodes

Since partitions cannot be ruled out, the real decision is whether to give up consistency or availability while one lasts.

NoSQL database choices (typical defaults; many systems are tunable):

CP (Consistency + Partition tolerance):
- MongoDB, HBase
- Sacrifices availability during partitions
- Good for: Financial data, critical systems

AP (Availability + Partition tolerance):
- Cassandra, DynamoDB
- Sacrifices consistency (eventual consistency)
- Good for: Social feeds, analytics, high availability needs

CA (Consistency + Availability):
- Traditional RDBMS (single node)
- Not distributed, so partition tolerance doesn't apply

Example: Amazon DynamoDB leans AP. Its reads are eventually consistent by default, so it remains available during network partitions but may serve slightly stale data; strongly consistent reads can be requested per operation.

Follow-up: What does "eventual consistency" mean?

Data may be temporarily inconsistent across nodes
All nodes will eventually converge to the same state
Acceptable for many use cases (social feeds, comments)

3. Senior-Level System Question

Q: Design a social media platform supporting 1B users, 10B posts, and 100B likes. How would you use NoSQL databases in the architecture?

A: Use a hybrid architecture (SQL + NoSQL), matching each kind of data to the store that fits its access pattern:

User profiles (MongoDB - Document Store):

```
// Flexible schema for varying user attributes
{
  user_id: "123",
  name: "John",
  bio: "...",
  preferences: { theme: "dark", notifications: true },
  social_links: { twitter: "@john", github: "john" }
}
```

Why: Flexible schema, easy to add new fields, good for user-generated content

Posts and feeds (Cassandra - Columnar):

```
-- Partition by user_id so one user's posts live together, newest first
CREATE TABLE posts (
  user_id UUID,
  post_id UUID,
  content TEXT,
  created_at TIMESTAMP,
  PRIMARY KEY (user_id, created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);
```

Why: High write throughput, horizontal scaling, time-series data

Likes and counters (Redis - Key-Value):

```
SET post:123:likes 5000
INCR post:123:likes
SADD post:123:liked_by user:456 user:789
```

Why: Extremely fast reads/writes, real-time counters

Social graph (Neo4j - Graph Database):

```
// Find friends of friends
MATCH (user:User {id: 123})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
RETURN fof
```

Why: Efficient for relationship queries (who follows whom, recommendations)
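The friends-of-friends query above is a two-hop traversal. The same idea over an in-memory adjacency map looks like this in Python; the `follows` sample data and both function names are made up for illustration, and a graph database would run the traversal on its own storage rather than in application code.

```python
# follower id -> set of ids that user follows (hypothetical sample data)
follows = {
    123: {1, 2},
    1: {2, 3},
    2: {123, 4},
}


def friends_of_friends(graph, user_id):
    """Everyone reachable in exactly two FOLLOWS hops (may include user_id itself)."""
    result = set()
    for friend in graph.get(user_id, set()):
        result |= graph.get(friend, set())
    return result


def follow_suggestions(graph, user_id):
    """Two-hop neighbors the user does not already follow, excluding the user."""
    already = graph.get(user_id, set())
    return friends_of_friends(graph, user_id) - already - {user_id}
```

With the sample data, user 123's two-hop set contains 2, 3, 4, and 123 itself, while the suggestions shrink to 3 and 4. Recommendation queries typically need that extra filtering step.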

Search (Elasticsearch - Document Store):
- Full-text search on posts
- Denormalized data for fast search
- Why: Optimized for search, not transactional

Critical data (PostgreSQL - SQL):
- User authentication
- Payment transactions
- Why: ACID guarantees, complex relationships

Data flow:

```
User creates post → Write to Cassandra (fast)
                 → Update search index (async)
                 → Update user's MongoDB profile (async)

User likes post → Increment counter in Redis (fast)
                → Write to Cassandra for persistence (async)
                → Update feed rankings (async)
```
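The "fast write now, everything else async" shape of this data flow can be sketched with an in-process queue. Here dicts stand in for Cassandra and the search index, and a `deque` stands in for a message broker; all names are hypothetical, and a real system would use a durable queue with retries.

```python
from collections import deque

posts = {}          # stands in for the primary post store
search_index = {}   # word -> set of post_ids; stands in for the search index
pending = deque()   # stands in for a message queue


def create_post(user_id, post_id, content):
    """Synchronous path: persist the post, then enqueue the slower follow-up work."""
    posts[(user_id, post_id)] = content
    pending.append(("index_post", post_id, content))
    return post_id


def drain():
    """Asynchronous path: a worker processes queued tasks; returns how many ran."""
    processed = 0
    while pending:
        task, post_id, content = pending.popleft()
        if task == "index_post":
            for word in content.lower().split():
                search_index.setdefault(word, set()).add(post_id)
        processed += 1
    return processed
```

Between `create_post` returning and `drain` running, the post exists but cannot be found through search. That gap is the eventual consistency the design accepts in exchange for a fast write path.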

Consistency strategy:

Strong consistency: User auth, payments (PostgreSQL)
Eventual consistency: Feeds, likes, counters (NoSQL)
Caching: Hot data in Redis for sub-millisecond access

Scaling:

Sharding: Partition by user_id across nodes
Replication: Multiple replicas for availability
Caching: Redis for frequently accessed data

Trade-offs:

Complexity: Multiple database systems to manage
Consistency: Eventual consistency requires handling stale data
Performance: NoSQL provides better write throughput and horizontal scaling


Related Topics

Document Stores - One type of NoSQL database. Understanding document stores helps understand NoSQL data modeling.
Key-Value Stores - Simple NoSQL database type. Understanding key-value stores helps understand NoSQL basics.
Columnar Databases - NoSQL database optimized for analytics. Understanding columnar databases helps understand NoSQL variety.
Time-Series Databases - Specialized NoSQL database for time-stamped data. Understanding time-series databases helps understand NoSQL specialization.
ACID Properties - NoSQL databases often relax ACID for performance. Understanding ACID helps understand NoSQL trade-offs.
Data Replication - NoSQL databases use replication for availability. Understanding replication helps understand NoSQL architectures.

Key Takeaways

NoSQL databases use different data models—document, key-value, columnar, graph—each optimized for specific use cases

CAP theorem limits distributed systems: when a network partition occurs, you must choose between consistency and availability

Eventual consistency is acceptable for many use cases (social feeds, analytics) but not for financial data

NoSQL excels at horizontal scaling and high write throughput, but SQL is better for complex queries and relationships

Hybrid approaches work best—use SQL for critical/transactional data, NoSQL for scale and flexibility

Schema-less doesn't mean no validation—still need application-level data validation

Choose based on requirements, not trends—NoSQL isn't always better than SQL

Understand trade-offs: many NoSQL databases relax ACID guarantees in exchange for performance and scalability

Document stores are good for flexible schemas and content management

Key-value stores excel at caching and simple lookups

Columnar databases are optimized for analytics and time-series data

Graph databases are ideal for relationship-heavy data (social networks, recommendations)
