Time-Series Databases
Master time-series databases for metrics, logs, and IoT data. Essential for observability and monitoring system design interviews.
Time-series databases are optimized for storing and querying data points indexed by time, such as metrics, logs, and sensor data.
What is Time-Series Data?
Data points with timestamps, typically collected over time:
timestamp | metric_name | value | tags
2024-01-15 10:00:00 | cpu_usage | 45.2 | host=server1
2024-01-15 10:00:05 | cpu_usage | 46.1 | host=server1
2024-01-15 10:00:10 | cpu_usage | 44.8 | host=server1
2024-01-15 10:00:00 | memory_usage | 62.5 | host=server1
2024-01-15 10:00:05 | memory_usage | 63.2 | host=server1
Characteristics:
- High write volume (millions of points per second)
- Append-only (rarely updated, mostly inserted)
- Time-ordered queries
- Aggregations over time windows
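As a rough illustration (the Point class and field names below are ours, not any database's schema), the same rows can be modeled as append-only records:
from dataclasses import dataclass, field
from datetime import datetime, timezone

# A minimal in-memory representation of a time-series point,
# matching the timestamp | metric_name | value | tags layout above.
@dataclass
class Point:
    time: datetime
    metric: str
    value: float
    tags: dict = field(default_factory=dict)

# Append-only: new points are added; existing ones are rarely updated.
series: list[Point] = []
series.append(Point(datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc),
                    "cpu_usage", 45.2, {"host": "server1"}))
series.append(Point(datetime(2024, 1, 15, 10, 0, 5, tzinfo=timezone.utc),
                    "cpu_usage", 46.1, {"host": "server1"}))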
Why Specialized Databases?
Traditional Databases Struggle With
- High write throughput: Millions of inserts per second
- Time-based queries: Range queries, downsampling, aggregations
- Data retention: Efficiently deleting old data
- Compression: Time-series data compresses very well
Time-Series Databases Excel At
- Append-optimized storage: Optimized for writes
- Time-indexed queries: Fast time-range scans
- Automatic downsampling: Pre-aggregate data at different resolutions
- Data lifecycle: Automatic expiration of old data
Popular Time-Series Databases
InfluxDB
Purpose-built for time-series data.
-- Create database
CREATE DATABASE metrics;
-- Write data (Line Protocol)
cpu,host=server1,region=us-east value=45.2 1705312800000000000
memory,host=server1,region=us-east value=62.5 1705312800000000000
-- Query
SELECT mean("value")
FROM "cpu"
WHERE time >= now() - 1h
GROUP BY time(5m), "host";
Features:
- High write throughput
- Downsampling and retention policies
- Continuous queries
- TICK stack integration
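As a sketch, the same Line Protocol points can be written over InfluxDB 1.x's HTTP /write endpoint (host, port, and database name are assumptions for illustration):
import requests  # assumes an InfluxDB 1.x instance at localhost:8086

# Write two points using the Line Protocol; timestamps are in nanoseconds.
lines = "\n".join([
    "cpu,host=server1,region=us-east value=45.2 1705312800000000000",
    "memory,host=server1,region=us-east value=62.5 1705312800000000000",
])
resp = requests.post(
    "http://localhost:8086/write",
    params={"db": "metrics", "precision": "ns"},
    data=lines,
)
resp.raise_for_status()  # InfluxDB returns 204 No Content on success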
TimescaleDB
PostgreSQL extension for time-series.
-- Create hypertable
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
device_id INT,
temperature FLOAT,
humidity FLOAT
);
SELECT create_hypertable('metrics', 'time');
-- Query
SELECT
time_bucket('1 hour', time) as hour,
device_id,
avg(temperature) as avg_temp
FROM metrics
WHERE time >= now() - INTERVAL '24 hours'
GROUP BY hour, device_id;
Features:
- SQL interface (PostgreSQL compatible)
- Automatic partitioning by time
- Compression
- Continuous aggregates
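For instance, a continuous aggregate that keeps an hourly rollup of the metrics hypertable up to date could look roughly like this (a sketch assuming TimescaleDB 2.x and psycopg2; the connection string and view name are placeholders):
import psycopg2  # assumes a local TimescaleDB instance with the metrics hypertable above

# Continuous aggregate: TimescaleDB maintains this hourly rollup automatically,
# so dashboards can query the small pre-aggregated view instead of raw rows.
CREATE_CAGG = """
CREATE MATERIALIZED VIEW IF NOT EXISTS metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(temperature) AS avg_temp
FROM metrics
GROUP BY bucket, device_id;
"""

conn = psycopg2.connect("dbname=tsdb user=postgres")
conn.autocommit = True  # continuous aggregates cannot be created inside a transaction
with conn.cursor() as cur:
    cur.execute(CREATE_CAGG)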
Prometheus
Monitoring and alerting toolkit with time-series database.
# Query
cpu_usage{host="server1"}[5m]
# Aggregation
rate(http_requests_total[5m])
# Functions
avg_over_time(cpu_usage[1h])
Features:
- Pull-based metrics collection
- PromQL query language
- Alerting rules
- Integration with Grafana
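As a sketch of the pull model, an application can expose a gauge for Prometheus to scrape using the official Python client (the port, label value, and simulated readings are illustrative):
import random, time
from prometheus_client import Gauge, start_http_server

# Expose a gauge that Prometheus scrapes periodically (pull-based collection).
cpu_usage = Gauge("cpu_usage", "CPU usage percent", ["host"])

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        cpu_usage.labels(host="server1").set(random.uniform(40, 50))
        time.sleep(5)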
Amazon Timestream
Fully managed time-series database on AWS.
Tables are created through the AWS console, CLI, or SDK rather than with DDL, and they have no fixed schema: dimensions and measures are defined by the records you write. Queries use SQL with built-in time-series functions:
SELECT
BIN(time, 5m) as time_bin,
device_id,
AVG(measure_value) as avg_value
FROM metrics
WHERE time >= ago(1h)
GROUP BY time_bin, device_id;
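A rough write sketch using boto3 (the database and table names, region, and dimension values below are assumptions for illustration):
import time
import boto3  # assumes AWS credentials and an existing Timestream database/table

# Timestream tables are schemaless: dimensions (tags) and measures
# are supplied per record at write time.
client = boto3.client("timestream-write", region_name="us-east-1")

client.write_records(
    DatabaseName="metrics_db",   # hypothetical names
    TableName="metrics",
    Records=[{
        "Dimensions": [{"Name": "device_id", "Value": "device-1"}],
        "MeasureName": "temperature",
        "MeasureValue": "22.5",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # epoch milliseconds
        "TimeUnit": "MILLISECONDS",
    }],
)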
Data Modeling
Metrics and Tags
Metric: cpu_usage
Tags: host=server1, region=us-east, env=production
Value: 45.2
Time: 2024-01-15 10:00:00
Tags: Used for filtering and grouping (like dimensions)
Values: The actual measurement
Series
A unique combination of metric name and tags:
Series 1: cpu_usage{host=server1, region=us-east}
Series 2: cpu_usage{host=server2, region=us-east}
Series 3: cpu_usage{host=server1, region=us-west}
Cardinality
High cardinality (many unique series) impacts performance:
Low cardinality: cpu_usage{host=server1}
High cardinality: cpu_usage{host=server1, user_id=12345, request_id=abc123}
Best practice: Limit tag cardinality; store high-cardinality identifiers (user IDs, request IDs) as field values rather than tags.
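A quick back-of-the-envelope sketch of why this matters: the series count multiplies across tag values, so a single high-cardinality tag dominates the total (the counts below are made up):
from math import prod

# Rough series-count estimate: cardinality multiplies across tags.
tag_value_counts = {
    "host": 500,         # fine as a tag
    "region": 5,         # fine as a tag
    "user_id": 100_000,  # should be a field value, not a tag
}

series = prod(tag_value_counts.values())
print(f"~{series:,} unique series")                    # ~250,000,000 with user_id as a tag
print(f"~{500 * 5:,} unique series without user_id")   # ~2,500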
Common Operations
Downsampling
Reduce data resolution over time:
Raw data: 1 point per second
After 1 hour: 1 point per minute (average)
After 1 day: 1 point per hour (average)
After 1 week: 1 point per day (average)
Benefits:
- Reduced storage
- Faster queries on historical data
- Maintains trends while discarding detail
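A minimal sketch of the idea, independent of any particular database: bucket raw points by minute and keep only the average (the function and variable names are ours):
from collections import defaultdict
from datetime import datetime, timezone

# Downsample raw points to 1-minute averages: bucket by minute, then average.
def downsample_1m(points):  # points: iterable of (datetime, float)
    buckets = defaultdict(list)
    for ts, value in points:
        minute = ts.replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {minute: sum(vals) / len(vals) for minute, vals in sorted(buckets.items())}

raw = [
    (datetime(2024, 1, 15, 10, 0, 0, tzinfo=timezone.utc), 45.2),
    (datetime(2024, 1, 15, 10, 0, 30, tzinfo=timezone.utc), 46.1),
    (datetime(2024, 1, 15, 10, 1, 0, tzinfo=timezone.utc), 44.8),
]
print(downsample_1m(raw))  # {10:00: 45.65, 10:01: 44.8}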
Retention Policies
Automatically delete old data:
Raw data: Keep for 7 days
Downsampled 1h: Keep for 30 days
Downsampled 1d: Keep for 1 year
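A toy sketch of tiered retention for an in-memory store (the tier names mirror the example above; the pruning logic is illustrative only, as real databases typically drop whole time partitions instead of individual points):
from datetime import datetime, timedelta, timezone

# Tiered retention: each resolution keeps data for a different window.
RETENTION = {
    "raw":            timedelta(days=7),
    "downsampled_1h": timedelta(days=30),
    "downsampled_1d": timedelta(days=365),
}

def prune(store, tier, now=None):
    """Drop points older than the tier's retention window (store: {tier: {ts: value}})."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION[tier]
    store[tier] = {ts: v for ts, v in store[tier].items() if ts >= cutoff}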
Aggregations
-- Average over time window
SELECT avg(value) FROM metrics
WHERE time >= now() - 1h
GROUP BY time(5m);
-- Rate of change
SELECT derivative(mean(value), 1m)
FROM metrics
WHERE time >= now() - 1h;
-- Percentiles
SELECT percentile(value, 95)
FROM metrics
WHERE time >= now() - 1h;
Query Patterns
Range Queries
-- Last hour
SELECT * FROM metrics
WHERE time >= now() - 1h;
-- Specific time range
SELECT * FROM metrics
WHERE time >= '2024-01-15 10:00:00'
AND time < '2024-01-15 11:00:00';
Filtering by Tags
-- Filter by tag
SELECT * FROM metrics
WHERE host = 'server1'
AND time >= now() - 1h;
-- Multiple tags
SELECT * FROM metrics
WHERE region = 'us-east'
AND env = 'production'
AND time >= now() - 1h;
Aggregations
-- Group by tag
SELECT host, avg(value)
FROM metrics
WHERE time >= now() - 1h
GROUP BY host;
-- Time-based grouping
SELECT
time_bucket('5 minutes', time) as interval,
avg(value) as avg_value
FROM metrics
WHERE time >= now() - 1h
GROUP BY interval;
Use Cases
Monitoring and Observability
- System metrics: CPU, memory, disk, network
- Application metrics: Request rate, latency, error rate
- Business metrics: Revenue, user signups, conversions
IoT and Sensors
- Temperature sensors: Building climate control
- GPS tracking: Vehicle locations over time
- Smart meters: Energy consumption
Financial Data
- Stock prices: Price movements over time
- Trading volumes: Market activity
- Exchange rates: Currency fluctuations
Log Analysis
- Application logs: Error rates, request patterns
- Security logs: Failed login attempts, suspicious activity
- Audit logs: User actions over time
Best Practices
- Limit cardinality: Too many unique series slow down queries
- Use appropriate retention: Delete old data you don't need
- Downsample early: Pre-aggregate data at ingestion time
- Batch writes: Group multiple points in single write
- Index time first: Time is the primary access pattern
- Compress data: Time-series data compresses very well
Performance Considerations
Write Optimization
- Batch inserts: Group multiple points
- Async writes: Don't block on every insert
- Compression: Enable compression for storage efficiency
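A minimal sketch of batched, asynchronous writes (the BatchWriter class and flush_fn hook are ours; a real client would POST each batch to the database's write endpoint):
import time, threading, queue

# Batch + async writes: callers enqueue points without blocking, and a background
# thread flushes them in groups (by size or by time), cutting per-point overhead.
class BatchWriter:
    def __init__(self, flush_fn, batch_size=500, flush_interval=5.0):
        self._queue = queue.Queue()
        self._flush_fn = flush_fn          # e.g. a function that sends points to the TSDB
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        threading.Thread(target=self._run, daemon=True).start()

    def write(self, point):
        self._queue.put(point)             # returns immediately; the flush happens later

    def _run(self):
        batch, last_flush = [], time.monotonic()
        while True:
            try:
                batch.append(self._queue.get(timeout=0.5))
            except queue.Empty:
                pass
            due = time.monotonic() - last_flush >= self._flush_interval
            if batch and (len(batch) >= self._batch_size or due):
                self._flush_fn(batch)
                batch, last_flush = [], time.monotonic()

writer = BatchWriter(flush_fn=lambda pts: print(f"flushing {len(pts)} points"))
writer.write({"metric": "cpu_usage", "value": 45.2})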
Query Optimization
- Time ranges: Always specify time ranges in queries
- Limit series: Filter by tags to reduce data scanned
- Use downsampled data: Query aggregated data when possible
- Index tags: Index frequently filtered tags
Trade-offs
Advantages
- High write throughput: Millions of points per second
- Efficient storage: Excellent compression ratios
- Time-optimized queries: Fast range scans and aggregations
- Automatic lifecycle: Built-in retention and downsampling
Limitations
- Limited updates: Not optimized for updating existing points
- Complex queries: Less flexible than general-purpose databases
- Learning curve: Specialized query languages (PromQL, Flux)
Integration Patterns
Metrics Collection
Application → StatsD/Telegraf → Time-Series DB → Grafana
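As an illustration of the first hop, a gauge can be pushed to a local StatsD/Telegraf agent over UDP using its plain text protocol (the host, port, and metric name are assumptions):
import socket

# StatsD speaks a simple text protocol over UDP: "name:value|type" ("g" = gauge).
def send_gauge(name, value, host="127.0.0.1", port=8125):
    payload = f"{name}:{value}|g".encode()
    socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(payload, (host, port))

send_gauge("server1.cpu_usage", 45.2)  # the agent forwards this to the time-series DB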
Log Aggregation
Applications → Log shipper → Time-Series DB → Dashboards
Alerting
Time-Series DB → Alerting Rules → Notification System
Interview Questions
1. Beginner Question
Q: What is a time-series database, and when should you use one?
A: A time-series database is optimized for storing and querying data points indexed by time (metrics, logs, sensor data).
When to use:
- High write throughput: Millions of data points per second
- Time-ordered queries: Range queries, aggregations over time
- Automatic downsampling: Reduce data resolution over time
- Data retention: Efficiently delete old data
Example use cases:
- System metrics (CPU, memory, disk)
- Application metrics (request rate, latency, errors)
- IoT sensor data (temperature, GPS)
- Financial data (stock prices, trading volumes)
When NOT to use:
- Transactional data (use relational databases)
- Complex relationships (use graph databases)
- Ad-hoc queries (use data warehouses)
2. Intermediate Question
Q: Explain how time-series databases handle high write throughput and data retention.
A:
High write throughput:
- Append-only storage: Optimized for writes, not updates
- Batch writes: Group multiple points in single write
- Compression: Time-series data compresses very well (similar values)
- No indexes on every write: Indexes built asynchronously
Data retention:
-- Retention policy: Keep raw data for 7 days
CREATE RETENTION POLICY raw_data ON metrics DURATION 7d REPLICATION 1;
-- Downsampled data: Keep 1-minute averages for 30 days
CREATE RETENTION POLICY downsampled_1m ON metrics DURATION 30d REPLICATION 1;
-- Further downsampled: Keep 1-hour averages for 1 year
CREATE RETENTION POLICY downsampled_1h ON metrics DURATION 365d REPLICATION 1;
Automatic downsampling:
-- Continuous query that downsamples automatically
CREATE CONTINUOUS QUERY downsample_1m ON metrics
BEGIN
SELECT mean(value) INTO metrics."downsampled_1m".cpu_usage
FROM metrics."raw_data".cpu_usage
GROUP BY time(1m), *
END
Benefits:
- Raw data: High detail, short retention
- Downsampled: Lower detail, longer retention
- Automatic: No manual cleanup needed
3. Senior-Level System Question
Q: Design a monitoring system for a microservices architecture with 1000 services, each emitting 1000 metrics/second. The system needs to support: real-time dashboards, alerting, and 1-year historical analysis. How would you architect this?
A:
Architecture: Multi-tier time-series storage
1. Data ingestion (high throughput):
Services → StatsD/Telegraf → Kafka → Time-Series DB
- Kafka buffer: Handles traffic spikes, provides backpressure
- Batching: Collect metrics for 10-30 seconds before writing
- Sharding: Partition by service/metric type
2. Storage tiers:
Tier 1: Hot storage (InfluxDB/TimescaleDB), last 7 days
- Raw data, high resolution (1 second)
- Fast queries for dashboards
Tier 2: Warm storage (ClickHouse), 7-90 days
- Downsampled to 1-minute resolution
- Good for historical analysis
Tier 3: Cold storage (S3 + Parquet), 90-365 days
- Downsampled to 1-hour resolution
- Cheap storage, slower queries
3. Real-time layer (Redis):
# Cache recent metrics for dashboards
ONE_HOUR = 3600  # seconds
def get_metric(metric_name, time_range):
    if time_range < ONE_HOUR:
        # Get from Redis (in-memory, fast)
        return redis.get(f"metric:{metric_name}:{time_range}")
    else:
        # Query the time-series database
        return query_tsdb(metric_name, time_range)
4. Downsampling pipeline:
# Continuous downsampling jobs
def downsample_metrics():
    # Every hour: downsample 1-second data to 1-minute
    raw_data = query_tsdb("raw", last_hour, resolution="1s")
    downsampled = aggregate(raw_data, "1m", function="mean")
    write_to_tsdb("downsampled_1m", downsampled)
    # Every day: downsample 1-minute to 1-hour
    minute_data = query_tsdb("downsampled_1m", last_day)
    hour_data = aggregate(minute_data, "1h", function="mean")
    write_to_s3("downsampled_1h", hour_data)  # Cold storage
5. Query optimization:
- Materialized views: Pre-aggregate common queries
- Time-based partitioning: Partition by day/week
- Compression: Use columnar format (Parquet) for cold storage
6. Alerting system:
# Real-time alerting on recent data
def check_alerts():
    recent_metrics = query_redis("metrics:recent")
    for alert_rule in alert_rules:
        if evaluate(alert_rule, recent_metrics):
            send_alert(alert_rule)
Scalability:
- Write path: 1M metrics/sec → Kafka (handles spikes) → Batch writes to TSDB
- Read path: Redis cache → Hot TSDB → Warm storage → Cold storage
- Storage: Compression + downsampling reduces storage by 1000x
Trade-offs:
- Latency vs. Cost: Hot storage is fast but expensive
- Detail vs. Retention: More downsampling = longer retention
- Complexity: Multiple storage tiers require data pipeline
Key Takeaways
- Time-series databases are optimized for time-stamped data—metrics, logs, sensor data
- Append-only storage enables high write throughput (millions of points per second)
- Automatic downsampling reduces data resolution over time (1s → 1m → 1h) for longer retention
- Retention policies automatically delete old data to manage storage costs
- High cardinality is expensive—limit unique series (tags) for better performance
- Use for monitoring and observability—system metrics, application metrics, IoT data
- Not for transactional data—use relational databases for that
- Compression is excellent—time-series data compresses 10-100x better than random data
- Time-based partitioning is essential for large datasets
- Batch writes are much more efficient than individual point writes
- Downsample early—pre-aggregate data at ingestion time when possible
- Multi-tier storage (hot/warm/cold) balances performance and cost