Database Topic
Document Stores
Master document-oriented databases like MongoDB. Essential for NoSQL and modern database design interviews.
Document stores are NoSQL databases that store data as documents, typically in JSON, BSON, or XML format.
What is a Document?
A document is a self-contained unit of data, similar to a row in SQL but more flexible.
{
"_id": "user_123",
"name": "John Doe",
"email": "john@example.com",
"address": {
"street": "123 Main St",
"city": "New York",
"zip": "10001"
},
"orders": [
{
"order_id": "o1",
"date": "2024-01-15",
"total": 100.50,
"items": [
{"product": "Widget", "quantity": 2, "price": 50.25}
]
}
],
"preferences": {
"theme": "dark",
"notifications": true
}
}
Key Characteristics
Schema Flexibility
Documents in the same collection can have different structures:
// Document 1
{"_id": "1", "name": "John", "age": 30}
// Document 2
{"_id": "2", "name": "Jane", "email": "jane@example.com", "tags": ["vip", "premium"]}
Benefit: Easy schema evolution, no migrations needed.
Challenge: Application must handle varying structures.
Embedded vs. Referenced
Embedded (denormalized):
{
"user_id": "123",
"name": "John",
"orders": [
{"order_id": "o1", "total": 100},
{"order_id": "o2", "total": 200}
]
}
Referenced (normalized):
// Users collection
{"_id": "123", "name": "John"}
// Orders collection
{"_id": "o1", "user_id": "123", "total": 100}
{"_id": "o2", "user_id": "123", "total": 200}
Rule of thumb: Embed when data is accessed together, reference when data is large or shared.
Popular Document Stores
MongoDB
- Most popular document database
- Rich query language
- Aggregation pipeline
- Horizontal scaling with sharding
// MongoDB query
db.users.find({ "address.city": "New York" })
db.users.aggregate([
{ $match: { "orders.total": { $gt: 100 } } },
{ $group: { _id: "$name", total_spent: { $sum: "$orders.total" } } }
])
CouchDB
- Multi-master replication
- RESTful API
- Built-in conflict resolution
- Good for offline-first applications
DynamoDB (AWS)
- Fully managed
- Serverless scaling
- Integrated with AWS ecosystem
- Pay-per-use pricing
Data Modeling Patterns
One-to-Few: Embed
{
"blog_post_id": "p1",
"title": "My Post",
"author": "John",
"comments": [
{"user": "Alice", "text": "Great post!"},
{"user": "Bob", "text": "Thanks for sharing"}
]
}
When: Comments are always loaded with the post, limited in number.
One-to-Many: Reference
// Posts collection
{"_id": "p1", "title": "My Post", "author": "John"}
// Comments collection
{"_id": "c1", "post_id": "p1", "user": "Alice", "text": "Great post!"}
{"_id": "c2", "post_id": "p1", "user": "Bob", "text": "Thanks"}
When: Comments can be numerous, accessed independently, or shared across posts.
Many-to-Many: Reference Array
// Users collection
{"_id": "u1", "name": "John", "group_ids": ["g1", "g2"]}
// Groups collection
{"_id": "g1", "name": "Developers", "member_ids": ["u1", "u2"]}
Querying Documents
Basic Queries
// Equality
db.users.find({ "name": "John" })
// Comparison
db.users.find({ "age": { $gt: 18 } })
// Array contains
db.users.find({ "tags": "vip" })
// Nested field
db.users.find({ "address.city": "New York" })
Aggregation Pipeline
db.orders.aggregate([
// Match stage: filter documents
{ $match: { "status": "completed" } },
// Group stage: group and aggregate
{ $group: {
_id: "$user_id",
total_spent: { $sum: "$total" },
order_count: { $sum: 1 }
}},
// Sort stage: order results
{ $sort: { total_spent: -1 } },
// Limit stage: top N
{ $limit: 10 }
])
Indexing
Document stores support various index types:
// Single field index
db.users.createIndex({ "email": 1 })
// Compound index
db.users.createIndex({ "last_name": 1, "first_name": 1 })
// Text index (full-text search)
db.posts.createIndex({ "title": "text", "content": "text" })
// Geospatial index
db.locations.createIndex({ "coordinates": "2dsphere" })
When to Use Document Stores
Good Fit
- Content management: Blogs, CMS, wikis
- User profiles: Flexible user data
- Catalogs: Product catalogs with varying attributes
- Real-time analytics: Event logging, metrics
- Mobile apps: Flexible schema for rapid iteration
Not a Good Fit
- Complex transactions: Multi-document ACID transactions are limited
- Heavy JOINs: Document stores don't support SQL-style JOINs
- Strict schema: When data structure must be consistent
- Reporting: Complex analytical queries are harder
Best Practices
- Design for access patterns: Structure documents based on how you query
- Limit document size: Large documents are slow to transfer and update
- Use indexes wisely: Index fields you query frequently
- Plan for growth: Consider document size limits (MongoDB: 16MB)
- Validate at application level: Schema-less doesn't mean no validation
Common Patterns
Versioning
{
"_id": "user_123",
"version": 2,
"name": "John Doe",
"previous_versions": [
{"version": 1, "name": "John D."}
]
}
Soft Deletes
{
"_id": "post_123",
"title": "My Post",
"deleted": false,
"deleted_at": null
}
Polymorphic Collections
// All in "content" collection
{"_id": "1", "type": "article", "title": "...", "body": "..."}
{"_id": "2", "type": "video", "title": "...", "url": "..."}
{"_id": "3", "type": "image", "title": "...", "src": "..."}
Interview Questions
1. Beginner Question
Q: What is a document store, and how does it differ from a relational database?
A: A document store is a NoSQL database that stores data as documents (typically JSON/BSON), rather than rows and columns.
Key differences:
- Schema flexibility: Documents can have different structures in the same collection
- No JOINs: Related data is often embedded in documents
- Horizontal scaling: Easier to scale across multiple servers
- Query language: Uses document-based queries instead of SQL
Example:
// Document store (MongoDB)
{
_id: "user123",
name: "John",
orders: [
{order_id: "o1", total: 100},
{order_id: "o2", total: 200}
]
}
// Relational database (SQL)
// Requires separate tables: users, orders, order_items
2. Intermediate Question
Q: When should you embed data vs. reference it in a document store?
A:
Embed when:
- Data is accessed together (e.g., user and their profile)
- One-to-few relationship (e.g., user and their addresses)
- Data doesn't change frequently
- Data size is small
Reference when:
- One-to-many or many-to-many relationship
- Data is large (e.g., large arrays)
- Data is shared across documents
- Data changes independently
Example:
// Embed: User and their preferences (small, accessed together)
{
_id: "user123",
name: "John",
preferences: {theme: "dark", notifications: true}
}
// Reference: User and their orders (many orders, large data)
{
_id: "user123",
name: "John",
order_ids: ["o1", "o2", "o3"] // Reference to orders collection
}
Trade-off: Embedding improves read performance but can cause data duplication. Referencing reduces duplication but requires additional queries.
3. Senior-Level System Question
Q: Design a content management system (CMS) using MongoDB. The system needs to handle 1M articles, support versioning, handle comments, and enable full-text search. How would you model the data?
A:
Data modeling strategy:
-
Articles collection (embedded approach for content):
{ _id: ObjectId("..."), slug: "article-title", title: "Article Title", content: "...", // Full article content author_id: "user123", published_at: ISODate("..."), status: "published", // draft, published, archived tags: ["tech", "database"], metadata: { reading_time: 5, word_count: 1000 }, // Embedded: Current version version: 2, // Reference: Previous versions version_history: [ {version: 1, created_at: ISODate("..."), snapshot_id: "..."} ] } -
Comments collection (reference approach):
{ _id: ObjectId("..."), article_id: ObjectId("..."), // Reference to article user_id: "user456", content: "Great article!", created_at: ISODate("..."), parent_id: null // For nested comments } -
Search index (Elasticsearch):
- Denormalized article data for fast search
- Full-text indexing on title, content, tags
- Updated asynchronously when articles change
-
Indexes:
// Fast lookups db.articles.createIndex({slug: 1}, {unique: true}) db.articles.createIndex({author_id: 1, published_at: -1}) db.articles.createIndex({status: 1, published_at: -1}) db.articles.createIndex({tags: 1}) // Array index // Comments db.comments.createIndex({article_id: 1, created_at: -1}) -
Versioning strategy:
// Store versions in separate collection for large articles { article_id: ObjectId("..."), version: 1, content: "...", created_at: ISODate("...") } -
Query patterns:
// Get article with comments const article = await db.articles.findOne({slug: "..."}) const comments = await db.comments.find({article_id: article._id}) .sort({created_at: -1}) .limit(50) // Search (use Elasticsearch) const results = await elasticsearch.search({ index: "articles", body: {query: {match: {content: "search term"}}} })
Optimizations:
- Caching: Cache popular articles in Redis
- Pagination: Use cursor-based pagination for comments
- Async updates: Update search index asynchronously
- Sharding: Shard articles by date or author for scale
Key Takeaways
- Document stores use flexible schemas—documents can have different structures in the same collection
- Embed vs. reference—embed for one-to-few, reference for one-to-many or large data
- No JOINs—related data is embedded or requires application-level joins
- Schema design matters—even without strict schemas, plan your document structure
- Indexes are still important—index frequently queried fields for performance
- Denormalization is common—store redundant data to avoid multiple queries
- Horizontal scaling—document stores scale well across multiple servers
- Query patterns—design documents based on how you'll query them
- Versioning—use separate collections or embedded arrays for document versioning
- Full-text search—often requires separate search engine (Elasticsearch) for complex queries
Keep exploring
Database concepts build on each other. Explore related topics to deepen your understanding of how data systems work.