NoSQL Basics

Master NoSQL database fundamentals and when to choose them over relational databases. Essential for modern system design interviews.

NoSQL (Not Only SQL) databases provide alternatives to traditional relational databases, offering different data models and trade-offs.


Why NoSQL?

Limitations of relational databases:

  • Schema rigidity
  • Scaling challenges (especially writes)
  • Complex JOINs across distributed systems
  • Overhead for simple use cases

NoSQL advantages:

  • Flexible schemas
  • Horizontal scaling
  • High performance for specific workloads
  • Simpler data models

Types of NoSQL Databases

Document Stores

Store data as documents (typically JSON/BSON).

Examples: MongoDB, CouchDB, DynamoDB

{
  "user_id": "123",
  "name": "John Doe",
  "orders": [
    {"order_id": "o1", "total": 100},
    {"order_id": "o2", "total": 200}
  ]
}

Use when: Content management, user profiles, catalogs
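
As a rough illustration of why the document model suits this kind of data, the sketch below loads the user document shown above into a plain Python dictionary and reads the embedded orders without any JOIN. No real database driver is involved, and the helper name `order_total` is hypothetical.

```python
import json

# The example user document, as it might arrive from a document store.
raw = """
{
  "user_id": "123",
  "name": "John Doe",
  "orders": [
    {"order_id": "o1", "total": 100},
    {"order_id": "o2", "total": 200}
  ]
}
"""

def order_total(document: dict) -> int:
    """Sum the totals of the orders embedded in a user document."""
    return sum(order["total"] for order in document.get("orders", []))

user = json.loads(raw)
print(user["name"], order_total(user))  # John Doe 300
```

Because the orders travel with the user, one read answers the question. The cost is that a document grows as orders accumulate.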

Key-Value Stores

Simple key-value pairs, extremely fast lookups.

Examples: Redis, Memcached, DynamoDB

key: "user:123"
value: {"name": "John", "email": "john@example.com"}

Use when: Caching, session storage, real-time data
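
To make the caching and session use case concrete, here is a minimal in-memory sketch of a key-value store with per-key expiry. It only imitates the general idea behind stores such as Redis or Memcached; the class `TTLStore` and its methods are assumptions made for illustration, not any product's API.

```python
import time
from typing import Any, Optional

class TTLStore:
    """In-memory key-value store whose entries expire after a time-to-live."""

    def __init__(self) -> None:
        self._data = {}  # key -> (value, absolute expiry time)

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Store the value together with the moment it stops being valid.
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Lazily evict expired entries when they are read.
            del self._data[key]
            return None
        return value

store = TTLStore()
store.set("user:123", {"name": "John", "email": "john@example.com"}, ttl_seconds=60)
print(store.get("user:123"))
print(store.get("user:999"))  # None, the key was never set
```

The whole interface is get and set by key, which is what keeps lookups cheap and is also why richer queries need a different model.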

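Wide-Column Stores (Columnar)

Group data into rows identified by a partition key, where each row can hold a different, sparse set of columns organized into column families. They are often called "columnar", but they differ from column-oriented analytical databases, which store each column's values together on disk.

Examples: Cassandra, HBase, Bigtable

Use when: Time-series data, write-heavy workloads, large-scale analytics pipelines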
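
The sketch below is a toy model of how stores such as Cassandra, HBase, and Bigtable lay out data: a partition key selects a group of rows, a second key orders rows inside the partition, and each row may carry different columns. The class and method names are hypothetical, and sorting at read time is a simplification (real stores keep rows sorted on disk).

```python
from collections import defaultdict

class WideColumnTable:
    """Toy wide-column table: partition key -> {clustering key -> columns}."""

    def __init__(self) -> None:
        self._partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, **columns) -> None:
        # Upsert: rows in the same partition may hold different columns.
        row = self._partitions[partition_key].setdefault(clustering_key, {})
        row.update(columns)

    def read_range(self, partition_key, start, end) -> list:
        # Return rows of one partition whose clustering key is in [start, end].
        rows = self._partitions.get(partition_key, {})
        return [(key, rows[key]) for key in sorted(rows) if start <= key <= end]

metrics = WideColumnTable()
metrics.insert("sensor-1", 1000, temperature=21.5)
metrics.insert("sensor-1", 1060, temperature=21.7, humidity=40)
metrics.insert("sensor-2", 1000, temperature=19.0)
print(metrics.read_range("sensor-1", 1000, 1060))
```

Reading a time range for one sensor touches a single partition, which is why this layout fits time-series data.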

Graph Databases

Store entities and relationships as a graph.

Examples: Neo4j, Amazon Neptune

Use when: Social networks, recommendation engines, fraud detection
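
A small sketch of the kind of traversal that graph databases optimize: finding the accounts followed by the people a user follows. It uses an in-memory adjacency list rather than a graph database, and the function names and sample users are made up for illustration.

```python
from collections import defaultdict

# Directed "follows" edges: follower -> set of accounts they follow.
follows = defaultdict(set)

def add_follow(follower: str, followee: str) -> None:
    follows[follower].add(followee)

def friends_of_friends(user: str) -> set:
    """Accounts two hops away, excluding direct follows and the user."""
    direct = follows.get(user, set())
    second_hop = set()
    for friend in direct:
        second_hop |= follows.get(friend, set())
    return second_hop - direct - {user}

add_follow("alice", "bob")
add_follow("alice", "dave")
add_follow("bob", "carol")
add_follow("bob", "alice")
add_follow("dave", "carol")
print(friends_of_friends("alice"))  # {'carol'}
```

In a relational schema the same query needs a self-JOIN per hop, which is the cost a graph model avoids.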


CAP Theorem

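In a distributed system, a network partition forces a choice between consistency and availability. This is often summarized as "pick two of three" properties:

  • Consistency: Every read returns the most recent write, as if there were a single copy of the data
  • Availability: Every request to a non-failing node receives a response
  • Partition tolerance: The system keeps operating when messages between nodes are lost or delayed

NoSQL databases make different CAP choices. The usual classification reflects default configurations, and many of these systems are tunable:

  • CP: MongoDB, HBase (consistency + partition tolerance)
  • AP: Cassandra, DynamoDB (availability + partition tolerance)
  • CA: Traditional single-node RDBMS (not distributed, so partitions do not arise)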
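
These labels describe typical behavior rather than fixed properties. Cassandra, for example, lets clients choose per request how many replicas must acknowledge a read or write. A common rule of thumb, which goes beyond the text above, is that with N replicas, read quorum R, and write quorum W, every read overlaps the latest acknowledged write when R + W > N. The function name below is hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum intersects every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n

# Three replicas with quorum reads and writes: any 2 readers overlap any 2 writers.
print(quorums_overlap(n=3, r=2, w=2))  # True
# Single-replica reads and writes: a read may miss the latest write.
print(quorums_overlap(n=3, r=1, w=1))  # False
```

Lower R and W values favor availability and latency; higher values favor consistency.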

When to Use NoSQL

Good Fit

  • High write throughput: Logging, metrics, IoT data
  • Flexible schema: Rapidly evolving data structures
  • Horizontal scaling: Need to scale across many servers
  • Simple queries: Key lookups, document retrieval
  • Large datasets: Big data, analytics

Not a Good Fit

  • Complex relationships: Many JOINs, foreign keys
  • ACID transactions: Financial systems, critical data
  • Complex queries: Ad-hoc reporting, analytics
  • Established schema: Well-defined, stable data model

Common Patterns

Eventual Consistency

Replicas may temporarily disagree after a write, but once updates stop arriving, all replicas converge to the same value.

User A updates profile → Replicates to Node 1, 2, 3
User B reads from Node 2 (not yet updated) → Sees old data
Eventually, all nodes converge to same state

Acceptable for: Social media feeds, comments, non-critical updates
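
The scenario above can be simulated in a few lines. This is a sketch under simplifying assumptions: three in-memory replicas, a write that is acknowledged after reaching one of them, and last-write-wins resolution by version number. All names are hypothetical.

```python
class Replica:
    """Keeps one value per key, resolving conflicts by highest version."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value) -> None:
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

replicas = [Replica("node1"), Replica("node2"), Replica("node3")]
pending = []  # undelivered replication messages

def write(key, version, value) -> None:
    # Acknowledge after one replica; the others receive the update later.
    replicas[0].apply(key, version, value)
    for replica in replicas[1:]:
        pending.append((replica, key, version, value))

def deliver_all() -> None:
    # Simulate replication catching up.
    while pending:
        replica, key, version, value = pending.pop(0)
        replica.apply(key, version, value)

write("profile:A", 1, "old bio")
deliver_all()
write("profile:A", 2, "new bio")
print(replicas[1].read("profile:A"))  # old bio (stale read)
deliver_all()
print([r.read("profile:A") for r in replicas])  # ['new bio', 'new bio', 'new bio']
```

The stale read in the middle is the window an application has to tolerate when it accepts eventual consistency.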

Denormalization

Store redundant data to avoid JOINs.

// Instead of JOINing orders and users
{
  "order_id": "o123",
  "user_name": "John Doe",  // Denormalized
  "user_email": "john@example.com",  // Denormalized
  "total": 100
}
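
A minimal sketch of how such a document might be produced at write time from normalized records. The `users` and `orders` structures and the function `denormalize_order` are assumptions for illustration only.

```python
# Normalized source data, as it might look in relational tables.
users = {"u1": {"name": "John Doe", "email": "john@example.com"}}
orders = [{"order_id": "o123", "user_id": "u1", "total": 100}]

def denormalize_order(order: dict, users_by_id: dict) -> dict:
    """Copy the user fields that order reads need into the order document."""
    user = users_by_id[order["user_id"]]
    return {
        "order_id": order["order_id"],
        "user_name": user["name"],    # duplicated from the user record
        "user_email": user["email"],  # duplicated from the user record
        "total": order["total"],
    }

order_docs = [denormalize_order(order, users) for order in orders]
print(order_docs[0])
```

The trade-off is on the write side: if the user changes their email, every copy must be updated or allowed to go stale.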

Sharding

Partition data across multiple servers so that each server holds only a subset. The example below shards by user_id range:

Shard 1: user_id 1-1000
Shard 2: user_id 1001-2000
Shard 3: user_id 2001-3000
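
The range layout above can be expressed as a lookup function. The sketch also shows hash-based sharding, an alternative not covered in the text above, which spreads sequential IDs more evenly at the cost of efficient range scans. Function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from bisect import bisect_left

# Inclusive upper bound of each range shard, matching the layout above.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]

def range_shard(user_id: int) -> int:
    """Return the 1-based shard number for a user_id under range sharding."""
    if not 1 <= user_id <= RANGE_UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside the sharded range")
    return bisect_left(RANGE_UPPER_BOUNDS, user_id) + 1

def hash_shard(user_id: int, shard_count: int = 3) -> int:
    """Return the 1-based shard number for a user_id under hash sharding."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count + 1

print(range_shard(1), range_shard(1000), range_shard(1001), range_shard(3000))  # 1 1 2 3
print(hash_shard(1001) == hash_shard(1001))  # True, the mapping is deterministic
```

With range sharding, new users all land on the last shard, which can become a write hot spot. Hashing avoids that, but simple modulo hashing moves most keys when the shard count changes.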

Migration Considerations

From SQL to NoSQL

  1. Identify access patterns: How is data queried?
  2. Denormalize: Flatten relational structures
  3. Choose appropriate model: Document vs. key-value vs. graph
  4. Plan for consistency: Eventual vs. strong consistency

Hybrid Approach

Many systems use both:

  • SQL: User accounts, transactions, critical data
  • NoSQL: Logs, sessions, analytics, caching

Best Practices

  1. Start with SQL: Use NoSQL when you have a specific need
  2. Understand trade-offs: NoSQL isn't always faster or simpler
  3. Plan for scale: Design for horizontal scaling from the start
  4. Monitor consistency: Track replication lag, consistency metrics
  5. Backup strategy: NoSQL backups can be more complex

Common Pitfalls

  • Over-normalization: Applying relational modeling habits (many small, cross-referenced collections) to NoSQL, which forces JOINs into application code
  • Ignoring consistency: Not understanding eventual consistency implications
  • Schema-less doesn't mean no schema: Still need data validation
  • Tool selection: Choosing NoSQL because it's "modern", not because it fits the workload
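
On the point that schema-less stores still need validation, here is one way application-level checks might look. The field lists and the function `validate_user` are assumptions for illustration; real projects often use a schema library or the database's own validation features instead.

```python
REQUIRED_FIELDS = {"user_id": str, "name": str}
OPTIONAL_FIELDS = {"bio": str, "preferences": dict}

def validate_user(document: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in document:
            errors.append(f"missing required field: {field}")
        elif not isinstance(document[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        if field in document and not isinstance(document[field], expected):
            errors.append(f"{field} must be {expected.__name__}")
    return errors

print(validate_user({"user_id": "123", "name": "John"}))  # []
print(validate_user({"user_id": 123}))
```

Running the check before every write keeps malformed documents out of the store, where they would otherwise surface later as read-time bugs.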

Interview Questions

1. Beginner Question

Q: What is NoSQL, and when should you use it instead of a relational database?

A: NoSQL (Not Only SQL) refers to non-relational databases that use different data models than traditional SQL databases.

When to use NoSQL:

  • Flexible schema: Rapidly evolving data structures
  • High write throughput: Logging, metrics, IoT data
  • Horizontal scaling: Need to scale across many servers
  • Simple queries: Key lookups, document retrieval
  • Large datasets: Big data, analytics

When to use SQL:

  • Complex relationships: Many JOINs, foreign keys
  • ACID transactions: Financial systems, critical data
  • Complex queries: Ad-hoc reporting, analytics
  • Established schema: Well-defined, stable data model

Example: Use MongoDB (NoSQL) for user profiles with varying attributes, but PostgreSQL (SQL) for financial transactions requiring ACID guarantees.

2. Intermediate Question

Q: Explain the CAP theorem and how it applies to NoSQL databases.

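A: The CAP theorem states that when a network partition occurs, a distributed system must choose between consistency and availability. It is often summarized as guaranteeing at most two of three properties:

  • Consistency: Every read returns the most recent write, as if there were a single copy of the data
  • Availability: Every request to a non-failing node receives a response
  • Partition tolerance: The system keeps operating when messages between nodes are lost or delayed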

NoSQL database choices:

  1. CP (Consistency + Partition tolerance):

    • MongoDB, HBase
    • Sacrifices availability during partitions
    • Good for: Financial data, critical systems
  2. AP (Availability + Partition tolerance):

    • Cassandra, DynamoDB
    • Sacrifices consistency (eventual consistency)
    • Good for: Social feeds, analytics, high availability needs
  3. CA (Consistency + Availability):

    • Traditional RDBMS (single node)
    • Not distributed, so partition tolerance doesn't apply

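Example: Amazon DynamoDB is usually classified as AP. By default it serves eventually consistent reads, which stay available during network partitions but may return slightly stale data. Strongly consistent reads are available as an option.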

Follow-up: What does "eventual consistency" mean?

  • Data may be temporarily inconsistent across nodes
  • All nodes will eventually converge to the same state
  • Acceptable for many use cases (social feeds, comments)

3. Senior-Level System Question

Q: Design a social media platform supporting 1B users, 10B posts, and 100B likes. How would you use NoSQL databases in the architecture?

A:

Hybrid architecture (SQL + NoSQL):

  1. User profiles (MongoDB - Document Store):

    // Flexible schema for varying user attributes
    {
      user_id: "123",
      name: "John",
      bio: "...",
      preferences: { theme: "dark", notifications: true },
      social_links: { twitter: "@john", github: "john" }
    }
    

    Why: Flexible schema, easy to add new fields, good for user-generated content

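  2. Posts and feeds (Cassandra - Wide-Column):

    -- Partition by user_id so one user's posts live together;
    -- cluster by created_at so the newest posts are read first
    CREATE TABLE posts (
      user_id UUID,
      post_id UUID,
      content TEXT,
      created_at TIMESTAMP,
      PRIMARY KEY ((user_id), created_at, post_id)
    ) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);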
    

    Why: High write throughput, horizontal scaling, time-series data

  3. Likes and counters (Redis - Key-Value):

    SET post:123:likes 5000
    SADD post:123:liked_by user:456 user:789
    

    Why: Extremely fast reads/writes, real-time counters

  4. Social graph (Neo4j - Graph Database):

    // Find friends of friends, excluding the user themselves
    MATCH (user:User {id: 123})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(fof)
    WHERE fof <> user
    RETURN DISTINCT fof
    

    Why: Efficient for relationship queries (who follows whom, recommendations)

  5. Search (Elasticsearch - Search Engine):

    • Full-text search on posts
    • Denormalized data for fast search
    • Why: Optimized for search, not transactional
  6. Critical data (PostgreSQL - SQL):

    • User authentication
    • Payment transactions
    • Why: ACID guarantees, complex relationships

Data flow:

User creates post → Write to Cassandra (fast)
                 → Update search index (async)
                 → Update user's MongoDB profile (async)

User likes post → Increment counter in Redis (fast)
                → Write to Cassandra for persistence (async)
                → Update feed rankings (async)
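
The like flow above can be sketched as a fast in-memory path plus a deferred persistence step. Plain Python structures stand in for Redis, the job queue, and the Cassandra table here, and every name is hypothetical. The duplicate check is an added assumption about how repeated likes from the same user are handled.

```python
from collections import defaultdict, deque

like_counts = defaultdict(int)  # stands in for Redis counters
liked_by = defaultdict(set)     # stands in for Redis sets
persist_queue = deque()         # stands in for an async job queue

def like_post(post_id: str, user_id: str) -> bool:
    """Record a like on the fast path; return False for a repeat like."""
    if user_id in liked_by[post_id]:
        return False
    liked_by[post_id].add(user_id)
    like_counts[post_id] += 1
    # Durable persistence happens later, off the request path.
    persist_queue.append((post_id, user_id))
    return True

def drain_persist_queue(durable_store: list) -> int:
    """Flush queued likes to the durable store; return how many were written."""
    written = 0
    while persist_queue:
        durable_store.append(persist_queue.popleft())
        written += 1
    return written

durable = []  # stands in for the durable likes table
like_post("post:123", "user:456")
like_post("post:123", "user:789")
like_post("post:123", "user:456")  # repeat like, ignored
print(like_counts["post:123"])       # 2
print(drain_persist_queue(durable))  # 2
```

Between the counter update and the queue drain, the durable store lags behind the counter. That gap is the eventual consistency this design accepts for likes.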

Consistency strategy:

  • Strong consistency: User auth, payments (PostgreSQL)
  • Eventual consistency: Feeds, likes, counters (NoSQL)
  • Caching: Hot data in Redis for sub-millisecond access

Scaling:

  • Sharding: Partition by user_id across nodes
  • Replication: Multiple replicas for availability
  • Caching: Redis for frequently accessed data

Trade-offs:

  • Complexity: Multiple database systems to manage
  • Consistency: Eventual consistency requires handling stale data
  • Performance: NoSQL provides better write throughput and horizontal scaling, paid for with the complexity and consistency costs above

Key Takeaways

  • NoSQL databases use different data models (document, key-value, wide-column, graph), each optimized for specific use cases
  • CAP theorem limits distributed systems: during a network partition, you must choose between consistency and availability
  • Eventual consistency is acceptable for many use cases (social feeds, analytics) but not for financial data
  • NoSQL excels at horizontal scaling and high write throughput, but SQL is better for complex queries and relationships
  • Hybrid approaches often work best: use SQL for critical/transactional data, NoSQL for scale and flexibility
  • Schema-less doesn't mean no validation: you still need application-level data validation
  • Choose based on requirements, not trends: NoSQL isn't always better than SQL
  • Understand trade-offs: many NoSQL systems relax ACID guarantees in exchange for performance and scalability
  • Document stores are good for flexible schemas and content management
  • Key-value stores excel at caching and simple lookups
  • Wide-column (columnar) stores are optimized for write-heavy and time-series workloads
  • Graph databases are ideal for relationship-heavy data (social networks, recommendations)
