NoSQL Basics

Master NoSQL database fundamentals and when to choose them over relational databases. Essential for modern system design interviews.

Why This Matters

Think of NoSQL databases like different types of storage systems. Relational databases are like filing cabinets—organized, structured, but rigid. NoSQL databases are like different storage solutions: document stores are like folders (flexible structure), key-value stores are like labeled boxes (simple lookups), columnar databases are like spreadsheets (optimized for analytics). Each is designed for different use cases.

This matters because relational databases aren't always the right tool. A key-value store typically serves simple lookups by key with less overhead. Document stores accommodate records whose fields vary. Column-oriented systems speed up analytical scans. Understanding these options helps you choose the right database for your use case.

In interviews, when someone asks "How would you design a database for X?", they're testing whether you understand NoSQL vs SQL trade-offs. Do you know when to use document stores? Do you understand the CAP theorem? A common failure mode is using a relational database for everything without checking whether the workload fits, then being surprised when it is slow or hard to scale.

What Engineers Usually Get Wrong

Most engineers think "NoSQL means no SQL queries." But NoSQL means "Not Only SQL"—it's about different data models, not about avoiding SQL. Some NoSQL databases support SQL-like queries. The key difference is the data model (document, key-value, columnar, graph) and trade-offs (consistency, availability, partitioning).

The CAP theorem is another common stumbling block. It is often summarized as "pick two of consistency, availability, and partition tolerance," but network partitions cannot be ruled out in a distributed system, so the practical choice is between consistency and availability while a partition lasts. A single-node relational database sidesteps the question because it is not distributed. Many NoSQL databases favor availability and accept eventual consistency; others favor consistency and may reject requests during a partition. Understanding this helps you choose the right database.

How This Breaks Systems in the Real World

A service was using a relational database for a high-traffic application. The database became a bottleneck. Writes were slow, and scaling was difficult. The team tried to optimize, but the relational model wasn't suited to the workload. The fix was to move that workload to a NoSQL database: for this access pattern, a document store or key-value store was faster and easier to scale. The cost was migrating the data and rewriting the queries.

Another story: A service was using a NoSQL database but didn't understand eventual consistency. Users would write data, then immediately read it, but sometimes they didn't see their changes (replication lag). Users thought their writes failed. The fix? Understand eventual consistency. Read from the primary immediately after writes, or use consistent reads. Or accept eventual consistency and design the UI to handle it.
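The "read from the primary after a write" fix can be sketched with a toy in-memory model. This is a minimal illustration, not a real database client: `ReplicatedStore`, its `replicate` method, and the `consistent` flag are hypothetical names invented for the example.

```python
class ReplicatedStore:
    """Toy primary/replica pair with manual replication (hypothetical, not a real client)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply all pending writes to the replica (simulates lag catching up)."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read(self, key, consistent=False):
        """Read from the replica by default; consistent=True goes to the primary."""
        source = self.primary if consistent else self.replica
        return source.get(key)


store = ReplicatedStore()
store.write("profile:123", "new bio")

stale = store.read("profile:123")                   # replica has not caught up yet
fresh = store.read("profile:123", consistent=True)  # primary already has the write
store.replicate()
converged = store.read("profile:123")               # replica now matches the primary
```

The default read misses the write until replication runs, which is the behavior users mistook for a failed write. Routing only the reads that immediately follow a write to the primary keeps most read traffic on replicas.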


Why NoSQL?

Limitations of relational databases:

  • Schema rigidity
  • Scaling challenges (especially writes)
  • Complex JOINs across distributed systems
  • Overhead for simple use cases

NoSQL advantages:

  • Flexible schemas
  • Horizontal scaling
  • High performance for specific workloads
  • Simpler data models

Types of NoSQL Databases

Document Stores

Store data as documents (typically JSON/BSON).

Examples: MongoDB, CouchDB, DynamoDB

{
  "user_id": "123",
  "name": "John Doe",
  "orders": [
    {"order_id": "o1", "total": 100},
    {"order_id": "o2", "total": 200}
  ]
}

Use when: Content management, user profiles, catalogs
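As a rough illustration of why embedded documents suit these cases, the sketch below loads the document shown above with Python's standard `json` module and reads the nested orders without any JOIN. The `loyalty_tier` field is a hypothetical addition used only to show schema flexibility.

```python
import json

user_doc = json.loads("""
{
  "user_id": "123",
  "name": "John Doe",
  "orders": [
    {"order_id": "o1", "total": 100},
    {"order_id": "o2", "total": 200}
  ]
}
""")

# One lookup returns the user and their orders together; no JOIN is needed.
order_ids = [order["order_id"] for order in user_doc["orders"]]
lifetime_total = sum(order["total"] for order in user_doc["orders"])

# Fields can be added to a single document without a schema migration.
user_doc["loyalty_tier"] = "gold"  # hypothetical field
```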

Key-Value Stores

Simple key-value pairs, extremely fast lookups.

Examples: Redis, Memcached, DynamoDB

key: "user:123"
value: {"name": "John", "email": "john@example.com"}

Use when: Caching, session storage, real-time data
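Session storage and caching usually rely on keys that expire. The following is a minimal in-memory sketch of that idea, assuming a hypothetical `TTLCache` class; real key-value stores provide expiry natively and handle eviction differently.

```python
import time


class TTLCache:
    """Minimal in-memory key-value store with per-key expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be shown without sleeping

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value


now = [0.0]  # fake clock for the demo
cache = TTLCache(clock=lambda: now[0])
cache.set("session:abc", {"user_id": "123"}, ttl_seconds=30)
hit = cache.get("session:abc")
now[0] = 31.0  # advance past the TTL
miss = cache.get("session:abc")
```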

Columnar Databases

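Two related designs share this label. Wide-column stores keep rows under a partition key and let each row hold a different set of columns. Column-oriented (columnar) engines store each column's values together on disk, which speeds up analytical scans.

Examples (wide-column): Cassandra, HBase, Bigtable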

Use when: Time-series data, analytics, write-heavy workloads
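Cassandra-style stores are often described as partitions of rows kept sorted by a clustering key, which is what makes time-range reads cheap. A minimal sketch of that layout follows; `WideColumnTable` and the sensor data are hypothetical, and real systems add replication, on-disk formats, and compaction.

```python
import bisect
from collections import defaultdict


class WideColumnTable:
    """Toy model: rows grouped by partition key, kept sorted by clustering key."""

    def __init__(self):
        # partition_key -> (sorted clustering keys, rows in the same order)
        self._partitions = defaultdict(lambda: ([], []))

    def insert(self, partition_key, clustering_key, row):
        keys, rows = self._partitions[partition_key]
        i = bisect.bisect_right(keys, clustering_key)
        keys.insert(i, clustering_key)
        rows.insert(i, row)

    def range_query(self, partition_key, start, end):
        """Return rows with start <= clustering_key < end from one partition."""
        keys, rows = self._partitions.get(partition_key, ([], []))
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return rows[lo:hi]


table = WideColumnTable()
table.insert("sensor-1", 30, {"temp": 21.5})
table.insert("sensor-1", 10, {"temp": 20.0})
table.insert("sensor-1", 20, {"temp": 20.7})
table.insert("sensor-2", 15, {"temp": 18.2})

window = table.range_query("sensor-1", 10, 30)
```

Writes are appended into the right partition in any order, and a time-window read touches only one partition and one contiguous slice of it.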

Graph Databases

Store entities and relationships as a graph.

Examples: Neo4j, Amazon Neptune

Use when: Social networks, recommendation engines, fraud detection
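The kind of traversal a graph database optimizes can be sketched with a plain adjacency map. This is only an illustration with made-up accounts; a graph database would index relationships so the same traversal stays fast at scale.

```python
# who follows whom (hypothetical sample data)
follows = {
    "alice": {"bob", "carol"},
    "bob": {"carol", "dave"},
    "carol": {"erin", "alice"},
    "dave": set(),
    "erin": set(),
}


def friends_of_friends(graph, user):
    """Accounts two hops away that the user does not already follow."""
    direct = graph.get(user, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {user}


suggestions = friends_of_friends(follows, "alice")
```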


CAP Theorem

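In a distributed system, a network partition forces a choice between staying consistent and staying available. The three properties are:

  • Consistency: Every read sees the most recent successful write
  • Availability: Every request to a non-failing node gets a response
  • Partition tolerance: The system keeps working when messages between nodes are lost or delayed

Partitions cannot be ruled out in practice, so the real decision is what to give up while one is happening. Common classifications, based on default behavior (most of these systems are tunable):

  • CP: MongoDB, HBase (reject or delay some requests to stay consistent)
  • AP: Cassandra, DynamoDB (keep serving requests, possibly with stale data)
  • CA: Single-node RDBMS (not distributed, so partitions between replicas do not arise)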
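The difference between a CP and an AP choice shows up only while the network is partitioned. The toy two-node model below is a sketch under simplifying assumptions (synchronous replication when healthy, a single value, hypothetical `Node` and `Unavailable` names); it does not reflect how any particular database is implemented.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP"
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the link to the peer is down

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                raise Unavailable("cannot reach peer; refusing write")
            self.value = value    # AP: accept locally, replicas diverge
            return
        self.value = value
        self.peer.value = value   # healthy link: replicate synchronously


def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a, b):
    a.partitioned = b.partitioned = True


cp_a, cp_b = make_pair("CP")
cp_a.write("v1")
partition(cp_a, cp_b)
try:
    cp_a.write("v2")
    cp_rejected = False
except Unavailable:
    cp_rejected = True    # consistent, but not available for writes

ap_a, ap_b = make_pair("AP")
ap_a.write("v1")
partition(ap_a, ap_b)
ap_a.write("v2")          # available, but the two nodes now disagree
ap_diverged = ap_a.value != ap_b.value
```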

When to Use NoSQL

Good Fit

  • High write throughput: Logging, metrics, IoT data
  • Flexible schema: Rapidly evolving data structures
  • Horizontal scaling: Need to scale across many servers
  • Simple queries: Key lookups, document retrieval
  • Large datasets: Big data, analytics

Not a Good Fit

  • Complex relationships: Many JOINs, foreign keys
  • ACID transactions: Financial systems, critical data
  • Complex queries: Ad-hoc reporting and analytics that need JOINs across entities
  • Established schema: Well-defined, stable data model

Common Patterns

Eventual Consistency

Data may be temporarily inconsistent but will become consistent.

User A updates profile → Replicates to Node 1, 2, 3
User B reads from Node 2 (not yet updated) → Sees old data
Eventually, all nodes converge to same state

Acceptable for: Social media feeds, comments, non-critical updates
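How replicas "eventually" agree depends on the system. One simple approach is last-write-wins: each value carries a version, and nodes that exchange state keep the higher version. The sketch below assumes that strategy and integer versions; real systems may use timestamps, vector clocks, or other conflict resolution.

```python
def merge(local, remote):
    """Last-write-wins merge: for each key keep the (version, value) with the higher version."""
    merged = dict(local)
    for key, (version, value) in remote.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


node1 = {"profile:A": (2, "new bio")}  # already received the update
node2 = {"profile:A": (1, "old bio")}  # still stale

stale_read = node2["profile:A"][1]     # what User B sees before convergence

# Anti-entropy: the nodes exchange state in both directions.
node1, node2 = merge(node1, node2), merge(node2, node1)
```

After the exchange both nodes hold version 2, so later reads from either node return the new bio.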

Denormalization

Store redundant data to avoid JOINs.

// Instead of JOINing orders and users
{
  "order_id": "o123",
  "user_name": "John Doe",  // Denormalized
  "user_email": "john@example.com",  // Denormalized
  "total": 100
}
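Denormalization trades cheaper reads for more expensive writes. The sketch below shows both sides with made-up orders; the `user_id` field and the helper names are assumptions added for the example.

```python
orders = [
    {"order_id": "o123", "user_id": "u1", "user_name": "John Doe",
     "user_email": "john@example.com", "total": 100},
    {"order_id": "o124", "user_id": "u1", "user_name": "John Doe",
     "user_email": "john@example.com", "total": 250},
    {"order_id": "o125", "user_id": "u2", "user_name": "Ana Lima",
     "user_email": "ana@example.com", "total": 75},
]


def order_summary(order):
    """Read path: everything needed is in the order document, no user lookup."""
    return f'{order["order_id"]}: {order["user_name"]} <{order["user_email"]}> ${order["total"]}'


def change_email(orders, user_id, new_email):
    """Write path: every copy of the email must be updated; returns the number touched."""
    touched = 0
    for order in orders:
        if order["user_id"] == user_id:
            order["user_email"] = new_email
            touched += 1
    return touched


summary = order_summary(orders[0])
touched = change_email(orders, "u1", "john.doe@example.com")
```

A single email change fans out to every order that copied it, which is why denormalization fits data that is read often and changed rarely.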

Sharding

Distribute data across multiple servers.

Shard 1: user_id 1-1000
Shard 2: user_id 1001-2000
Shard 3: user_id 2001-3000
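The listing above is range sharding. A common alternative is hash sharding, which spreads sequential ids across shards at the cost of efficient range scans. Both can be sketched in a few lines; the shard count and function names here are assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 3  # assumed shard count for the sketch


def range_shard(user_id, shard_size=1000):
    """Range sharding: ids 1-1000 -> shard 1, 1001-2000 -> shard 2, and so on."""
    return (user_id - 1) // shard_size + 1


def hash_shard(user_id, num_shards=NUM_SHARDS):
    """Hash sharding: a stable hash spreads sequential ids across shards."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards + 1
```

With range sharding, newly created users all land on the highest shard, which can become a hot spot. With modulo hashing, changing the number of shards remaps most keys, which is the problem consistent hashing is designed to reduce.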

Migration Considerations

From SQL to NoSQL

  1. Identify access patterns: How is data queried?
  2. Denormalize: Flatten relational structures
  3. Choose appropriate model: Document vs. key-value vs. graph
  4. Plan for consistency: Eventual vs. strong consistency

Hybrid Approach

Many systems use both:

  • SQL: User accounts, transactions, critical data
  • NoSQL: Logs, sessions, analytics, caching

Best Practices

  1. Start with SQL: Use NoSQL when you have a specific need
  2. Understand trade-offs: NoSQL isn't always faster or simpler
  3. Plan for scale: Design for horizontal scaling from the start
  4. Monitor consistency: Track replication lag, consistency metrics
  5. Backup strategy: NoSQL backups can be more complex

Common Pitfalls

  • Over-normalization: Applying SQL patterns to NoSQL
  • Ignoring consistency: Not understanding eventual consistency implications
  • Schema-less doesn't mean no schema: Still need data validation
  • Tool selection: Choosing NoSQL because it's "modern" not because it fits
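The "schema-less doesn't mean no schema" pitfall is usually handled by validating documents in the application before they are written. A minimal sketch, assuming a hypothetical `REQUIRED_FIELDS` schema for the user document; production code would more likely use a schema library or the database's own validation feature.

```python
REQUIRED_FIELDS = {"user_id": str, "name": str, "orders": list}  # assumed application schema


def validate_user(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


good = validate_user({"user_id": "123", "name": "John Doe", "orders": []})
bad = validate_user({"user_id": 123, "name": "John Doe"})
```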

Interview Questions

1. Beginner Question

Q: What is NoSQL, and when should you use it instead of a relational database?

A: NoSQL (Not Only SQL) refers to non-relational databases that use different data models than traditional SQL databases.

When to use NoSQL:

  • Flexible schema: Rapidly evolving data structures
  • High write throughput: Logging, metrics, IoT data
  • Horizontal scaling: Need to scale across many servers
  • Simple queries: Key lookups, document retrieval
  • Large datasets: Big data, analytics

When to use SQL:

  • Complex relationships: Many JOINs, foreign keys
  • ACID transactions: Financial systems, critical data
  • Complex queries: Ad-hoc reporting and analytics that need JOINs across entities
  • Established schema: Well-defined, stable data model

Example: Use MongoDB (NoSQL) for user profiles with varying attributes, but PostgreSQL (SQL) for financial transactions requiring ACID guarantees.

2. Intermediate Question

Q: Explain the CAP theorem and how it applies to NoSQL databases.

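A: The CAP theorem describes a trade-off among three properties of a distributed system. When a network partition occurs, the system can preserve consistency or availability, but not both:

  • Consistency: Every read sees the most recent successful write
  • Availability: Every request to a non-failing node gets a response
  • Partition tolerance: The system keeps working when messages between nodes are lost or delayed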

NoSQL database choices:

  1. CP (Consistency + Partition tolerance):

    • MongoDB, HBase
    • Sacrifices availability during partitions
    • Good for: Financial data, critical systems

  2. AP (Availability + Partition tolerance):

    • Cassandra, DynamoDB
    • Sacrifices consistency during partitions (eventual consistency)
    • Good for: Social feeds, analytics, high availability needs

  3. CA (Consistency + Availability):

    • Traditional RDBMS (single node)
    • Not distributed, so partition tolerance doesn't apply

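Example: Amazon DynamoDB is usually classified as AP. Its reads are eventually consistent by default, so a read shortly after a write may return slightly stale data. Strongly consistent reads can be requested per operation at a higher read cost.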

Follow-up: What does "eventual consistency" mean?

  • Data may be temporarily inconsistent across nodes
  • All nodes will eventually converge to the same state
  • Acceptable for many use cases (social feeds, comments)

3. Senior-Level System Question

Q: Design a social media platform supporting 1B users, 10B posts, and 100B likes. How would you use NoSQL databases in the architecture?

A:

Hybrid architecture (SQL + NoSQL):

  1. User profiles (MongoDB - Document Store):

    // Flexible schema for varying user attributes
    {
      user_id: "123",
      name: "John",
      bio: "...",
      preferences: { theme: "dark", notifications: true },
      social_links: { twitter: "@john", github: "john" }
    }
    

    Why: Flexible schema, easy to add new fields, good for user-generated content

  2. Posts and feeds (Cassandra - Wide-Column):

    -- Partition by user_id; newest posts first within each partition
    CREATE TABLE posts (
      user_id UUID,
      post_id UUID,
      content TEXT,
      created_at TIMESTAMP,
      PRIMARY KEY ((user_id), created_at, post_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, post_id DESC);
    

    Why: High write throughput, horizontal scaling, time-series data

  3. Likes and counters (Redis - Key-Value):

    SET post:123:likes 5000
    INCR post:123:likes
    SADD post:123:liked_by user:456 user:789
    

    Why: Extremely fast reads/writes, real-time counters

  4. Social graph (Neo4j - Graph Database):

    // Find friends of friends the user does not already follow
    MATCH (user:User {id: 123})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
    WHERE fof <> user AND NOT (user)-[:FOLLOWS]->(fof)
    RETURN DISTINCT fof
    

    Why: Efficient for relationship queries (who follows whom, recommendations)

  5. Search (Elasticsearch - Search Engine):

    • Full-text search on posts
    • Denormalized data for fast search
    • Why: Optimized for search, not transactional

  6. Critical data (PostgreSQL - SQL):

    • User authentication
    • Payment transactions
    • Why: ACID guarantees, complex relationships

Data flow:

User creates post → Write to Cassandra (fast)
                 → Update search index (async)
                 → Update user's MongoDB profile (async)

User likes post → Increment counter in Redis (fast)
                → Write to Cassandra for persistence (async)
                → Update feed rankings (async)
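The "increment fast, persist asynchronously" flow for likes is a write-behind pattern. The sketch below models it with two in-memory counters standing in for the fast store and the durable store; the `LikeCounter` class is hypothetical, and a real system would also need to handle crashes between a like and the next flush.

```python
from collections import Counter


class LikeCounter:
    """Fast in-memory increments with periodic flush to a durable store (both are stand-ins)."""

    def __init__(self):
        self.hot = Counter()        # stands in for the fast counter store
        self.durable = Counter()    # stands in for the persistent store
        self.unflushed = Counter()  # increments not yet persisted

    def like(self, post_id):
        self.hot[post_id] += 1
        self.unflushed[post_id] += 1
        return self.hot[post_id]

    def flush(self):
        """Apply accumulated increments to the durable store in one batch."""
        self.durable.update(self.unflushed)
        self.unflushed.clear()


counter = LikeCounter()
for _ in range(3):
    counter.like("post:123")
before_flush = counter.durable["post:123"]
counter.flush()
after_flush = counter.durable["post:123"]
```

Batching turns many small writes into one, but anything still unflushed is lost if the fast store fails, which is acceptable for like counts and not for payments.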

Consistency strategy:

  • Strong consistency: User auth, payments (PostgreSQL)
  • Eventual consistency: Feeds, likes, counters (NoSQL)
  • Caching: Hot data in Redis for sub-millisecond access

Scaling:

  • Sharding: Partition by user_id across nodes
  • Replication: Multiple replicas for availability
  • Caching: Redis for frequently accessed data

Trade-offs:

  • Complexity: Multiple database systems to manage
  • Consistency: Eventual consistency requires handling stale data
  • Performance: NoSQL provides better write throughput and horizontal scaling

Related Topics

  • Document Stores - One type of NoSQL database. Understanding document stores helps understand NoSQL data modeling.

  • Key-Value Stores - Simple NoSQL database type. Understanding key-value stores helps understand NoSQL basics.

  • Columnar Databases - NoSQL database optimized for analytics. Understanding columnar databases helps understand NoSQL variety.

  • Time-Series Databases - Specialized NoSQL database for time-stamped data. Understanding time-series databases helps understand NoSQL specialization.

  • ACID Properties - NoSQL databases often relax ACID for performance. Understanding ACID helps understand NoSQL trade-offs.

  • Data Replication - NoSQL databases use replication for availability. Understanding replication helps understand NoSQL architectures.

Key Takeaways

NoSQL databases use different data models—document, key-value, columnar, graph—each optimized for specific use cases

CAP theorem constrains distributed systems: during a network partition, a system must choose between consistency and availability

Eventual consistency is acceptable for many use cases (social feeds, analytics) but not for financial data

NoSQL excels at horizontal scaling and high write throughput, but SQL is better for complex queries and relationships

Hybrid approaches work best—use SQL for critical/transactional data, NoSQL for scale and flexibility

Schema-less doesn't mean no validation—still need application-level data validation

Choose based on requirements, not trends—NoSQL isn't always better than SQL

Understand trade-offs: many NoSQL databases relax ACID guarantees in exchange for performance and scalability

Document stores are good for flexible schemas and content management

Key-value stores excel at caching and simple lookups

Columnar databases are optimized for analytics and time-series data

Graph databases are ideal for relationship-heavy data (social networks, recommendations)
