Time-Series Databases

Master time-series databases for metrics, logs, and IoT data. Essential for observability and monitoring system design interviews.

Time-series databases are optimized for storing and querying data points indexed by time, such as metrics, logs, and sensor data.


What is Time-Series Data?

Data points with timestamps, typically collected over time:

timestamp                | metric_name    | value | tags
2024-01-15 10:00:00     | cpu_usage      | 45.2  | host=server1
2024-01-15 10:00:05     | cpu_usage      | 46.1  | host=server1
2024-01-15 10:00:10     | cpu_usage      | 44.8  | host=server1
2024-01-15 10:00:00     | memory_usage   | 62.5  | host=server1
2024-01-15 10:00:05     | memory_usage   | 63.2  | host=server1

Characteristics:

  • High write volume (millions of points per second)
  • Append-only (rarely updated, mostly inserted)
  • Time-ordered queries
  • Aggregations over time windows
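
As a rough mental model, each row above is a point: a metric name, a timestamp, a value, and a set of tags. A tiny illustrative Python sketch (the class and field names are made up for this example):

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Point:
    """One time-series data point: what was measured, when, its value, and its tags."""
    metric: str                # e.g. "cpu_usage"
    timestamp: datetime        # when the measurement was taken
    value: float               # the measurement itself
    tags: tuple = ()           # e.g. (("host", "server1"),) -- used for filtering/grouping

p = Point("cpu_usage", datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc), 45.2,
          (("host", "server1"),))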

Why Specialized Databases?

Traditional Databases Struggle With

  • High write throughput: Millions of inserts per second
  • Time-based queries: Range queries, downsampling, aggregations
  • Data retention: Efficiently deleting old data
  • Compression: Generic row-oriented storage doesn't exploit how well time-series data compresses

Time-Series Databases Excel At

  • Append-optimized storage: Optimized for writes
  • Time-indexed queries: Fast time-range scans
  • Automatic downsampling: Pre-aggregate data at different resolutions
  • Data lifecycle: Automatic expiration of old data

InfluxDB

Purpose-built for time-series data.

-- InfluxQL (InfluxDB 1.x): create a database
CREATE DATABASE metrics;

# Line Protocol: points sent to the HTTP write endpoint (not SQL)
cpu,host=server1,region=us-east value=45.2 1705312800000000000
memory,host=server1,region=us-east value=62.5 1705312800000000000

-- InfluxQL: average per host in 5-minute windows over the last hour
SELECT mean("value")
FROM "cpu"
WHERE time >= now() - 1h
GROUP BY time(5m), "host";

Features:

  • High write throughput
  • Downsampling and retention policies
  • Continuous queries
  • TICK stack integration
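
For completeness, a minimal write sketch assuming InfluxDB 2.x and its official influxdb-client Python package, which organizes data into buckets rather than the 1.x databases shown above; the URL, token, org, and bucket are placeholders:

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connection details are placeholders.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Builds the same kind of Line Protocol point as above: measurement, tags, field.
point = (
    Point("cpu")
    .tag("host", "server1")
    .tag("region", "us-east")
    .field("value", 45.2)
)
write_api.write(bucket="metrics", record=point)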

TimescaleDB

PostgreSQL extension for time-series.

-- Create hypertable
CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    device_id INT,
    temperature FLOAT,
    humidity FLOAT
);

SELECT create_hypertable('metrics', 'time');

-- Query
SELECT 
    time_bucket('1 hour', time) as hour,
    device_id,
    avg(temperature) as avg_temp
FROM metrics
WHERE time >= now() - INTERVAL '24 hours'
GROUP BY hour, device_id;

Features:

  • SQL interface (PostgreSQL compatible)
  • Automatic partitioning by time
  • Compression
  • Continuous aggregates
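
Because TimescaleDB is plain PostgreSQL underneath, any Postgres driver works; a minimal sketch with psycopg2 that batch-inserts readings into the hypertable above and runs the time_bucket query (the connection string is a placeholder):

import psycopg2
from psycopg2.extras import execute_values

# Connection string is a placeholder; 'metrics' is the hypertable created above.
conn = psycopg2.connect("dbname=tsdb user=postgres")
with conn, conn.cursor() as cur:
    # Batch insert: TimescaleDB routes each row to the right time chunk automatically.
    rows = [("2024-01-15 10:00:00+00", 1, 21.5, 40.0),
            ("2024-01-15 10:00:05+00", 1, 21.6, 39.8)]
    execute_values(cur,
        "INSERT INTO metrics (time, device_id, temperature, humidity) VALUES %s", rows)

    # Hourly average temperature per device over the last 24 hours.
    cur.execute("""
        SELECT time_bucket('1 hour', time) AS hour, device_id, avg(temperature)
        FROM metrics
        WHERE time >= now() - INTERVAL '24 hours'
        GROUP BY hour, device_id
    """)
    for hour, device_id, avg_temp in cur.fetchall():
        print(hour, device_id, avg_temp)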

Prometheus

Monitoring and alerting toolkit with time-series database.

# Query
cpu_usage{host="server1"}[5m]

# Aggregation
rate(http_requests_total[5m])

# Functions
avg_over_time(cpu_usage[1h])

Features:

  • Pull-based metrics collection
  • PromQL query language
  • Alerting rules
  • Integration with Grafana
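
To make the pull model concrete, here is a minimal sketch using the official prometheus_client Python library: the application exposes /metrics and the Prometheus server scrapes it on its own schedule (metric and label names are illustrative):

import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Metrics the Prometheus server will scrape (names and labels are illustrative).
REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["method"])
CPU_USAGE = Gauge("cpu_usage", "CPU usage percent", ["host"])

if __name__ == "__main__":
    start_http_server(8000)   # exposes http://localhost:8000/metrics for scraping
    while True:
        REQUESTS.labels(method="GET").inc()                            # counter: only goes up
        CPU_USAGE.labels(host="server1").set(random.uniform(30, 70))   # gauge: arbitrary value
        time.sleep(5)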

Amazon Timestream

Fully managed time-series database on AWS.

-- Tables are created through the AWS console/API (CreateTable) rather than SQL DDL;
-- dimensions and measures are defined implicitly by the records you write.

SELECT 
    BIN(time, 5m) as time_bin,
    device_id,
    AVG(measure_value) as avg_value
FROM metrics
WHERE time >= ago(1h)
GROUP BY time_bin, device_id;
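
Writes typically go through the WriteRecords API; a minimal boto3 sketch, where the region, database, table, and dimension names are placeholders and credentials come from the standard AWS configuration:

import time
import boto3

# Region, database, table, and dimension names are placeholders.
client = boto3.client("timestream-write", region_name="us-east-1")

record = {
    "Dimensions": [{"Name": "device_id", "Value": "device-1"}],
    "MeasureName": "temperature",
    "MeasureValue": "21.7",
    "MeasureValueType": "DOUBLE",
    "Time": str(int(time.time() * 1000)),   # milliseconds since the epoch
    "TimeUnit": "MILLISECONDS",
}
client.write_records(DatabaseName="iot", TableName="metrics", Records=[record])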

Data Modeling

Metrics and Tags

Metric: cpu_usage
Tags:   host=server1, region=us-east, env=production
Value:  45.2
Time:   2024-01-15 10:00:00

Tags:   Used for filtering and grouping (like dimensions)
Values: The actual measurement

Series

A unique combination of metric name and tags:

Series 1: cpu_usage{host=server1, region=us-east}
Series 2: cpu_usage{host=server2, region=us-east}
Series 3: cpu_usage{host=server1, region=us-west}

Cardinality

High cardinality (many unique series) impacts performance:

Low cardinality:  cpu_usage{host=server1}
High cardinality: cpu_usage{host=server1, user_id=12345, request_id=abc123}

Best practice: Keep tag cardinality bounded; store high-cardinality identifiers (user IDs, request IDs) as field values rather than tags.
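
The series count for one metric is roughly the product of the distinct values of each tag, which is why a single unbounded tag explodes cardinality. A small illustrative sketch:

def series_count(tag_values: dict) -> int:
    """Unique series for one metric = product of distinct values per tag."""
    count = 1
    for values in tag_values.values():
        count *= len(values)
    return count

# Bounded tags: manageable.
print(series_count({"host": range(100), "region": ["us-east", "us-west"]}))        # 200

# An unbounded tag (e.g. user_id) multiplies the series count enormously.
print(series_count({"host": range(100), "region": ["us-east", "us-west"],
                    "user_id": range(1_000_000)}))                                 # 200,000,000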


Common Operations

Downsampling

Reduce data resolution over time:

Raw data (recent):   1 point per second
Older than 1 hour:   1 point per minute (average)
Older than 1 day:    1 point per hour (average)
Older than 1 week:   1 point per day (average)

Benefits:

  • Reduced storage
  • Faster queries on historical data
  • Maintains trends while discarding detail
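
Conceptually, downsampling groups raw points into fixed time windows and keeps one aggregate per window; a minimal pure-Python sketch of the idea:

from collections import defaultdict

def downsample(points, window_seconds, agg=lambda vs: sum(vs) / len(vs)):
    """points: iterable of (unix_timestamp, value); returns one aggregated point per window."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % window_seconds)   # align the timestamp to the window boundary
        buckets[bucket_start].append(value)
    return sorted((ts, agg(values)) for ts, values in buckets.items())

raw = [(1705312800, 45.2), (1705312805, 46.1), (1705312810, 44.8), (1705312860, 50.0)]
print(downsample(raw, 60))   # one averaged point per minute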

Retention Policies

Automatically delete old data:

Raw data:        Keep for 7 days
Downsampled 1h:  Keep for 30 days
Downsampled 1d:  Keep for 1 year
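
Under the hood this usually amounts to a scheduled job that drops whole time partitions once they pass their tier's cutoff; a simplified sketch of the idea, with tier names and durations following the table above:

import time

# Retention per tier, in seconds (mirrors the table above: 7 days, 30 days, 1 year).
RETENTION_SECONDS = {
    "raw": 7 * 86400,
    "downsampled_1h": 30 * 86400,
    "downsampled_1d": 365 * 86400,
}

def expired_partitions(partitions, now=None):
    """partitions: iterable of (tier, partition_start_ts); yields those past their tier's cutoff."""
    now = now if now is not None else time.time()
    for tier, start_ts in partitions:
        if now - start_ts > RETENTION_SECONDS[tier]:
            # In a real TSDB the whole time partition (chunk/shard) is dropped, which is cheap.
            yield tier, start_ts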

Aggregations

-- Average over time window
SELECT avg(value) FROM metrics 
WHERE time >= now() - 1h 
GROUP BY time(5m);

-- Rate of change
SELECT derivative(mean(value), 1m) 
FROM metrics 
WHERE time >= now() - 1h;

-- Percentiles
SELECT percentile(value, 95) 
FROM metrics 
WHERE time >= now() - 1h;

Query Patterns

Range Queries

-- Last hour
SELECT * FROM metrics 
WHERE time >= now() - 1h;

-- Specific time range
SELECT * FROM metrics 
WHERE time >= '2024-01-15 10:00:00' 
  AND time < '2024-01-15 11:00:00';

Filtering by Tags

-- Filter by tag
SELECT * FROM metrics 
WHERE host = 'server1' 
  AND time >= now() - 1h;

-- Multiple tags
SELECT * FROM metrics 
WHERE region = 'us-east' 
  AND env = 'production'
  AND time >= now() - 1h;

Aggregations

-- Group by tag
SELECT host, avg(value) 
FROM metrics 
WHERE time >= now() - 1h 
GROUP BY host;

-- Time-based grouping
SELECT 
    time_bucket('5 minutes', time) as interval,
    avg(value) as avg_value
FROM metrics
WHERE time >= now() - 1h
GROUP BY interval;

Use Cases

Monitoring and Observability

  • System metrics: CPU, memory, disk, network
  • Application metrics: Request rate, latency, error rate
  • Business metrics: Revenue, user signups, conversions

IoT and Sensors

  • Temperature sensors: Building climate control
  • GPS tracking: Vehicle locations over time
  • Smart meters: Energy consumption

Financial Data

  • Stock prices: Price movements over time
  • Trading volumes: Market activity
  • Exchange rates: Currency fluctuations

Log Analysis

  • Application logs: Error rates, request patterns
  • Security logs: Failed login attempts, suspicious activity
  • Audit logs: User actions over time

Best Practices

  1. Limit cardinality: Too many unique series slow down queries
  2. Use appropriate retention: Delete old data you don't need
  3. Downsample early: Pre-aggregate data at ingestion time
  4. Batch writes: Group multiple points into a single write request
  5. Index time first: Time is the primary access pattern
  6. Compress data: Time-series data compresses very well

Performance Considerations

Write Optimization

  • Batch inserts: Group multiple points per request (see the sketch after this list)
  • Async writes: Don't block on every insert
  • Compression: Enable compression for storage efficiency
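
A minimal sketch of the batching/async idea, using an in-process queue and a background flusher; the flush function here is a stand-in for a real TSDB client's batch-write call, and all names are illustrative:

import queue
import threading
import time

def flush(batch):
    # Stand-in for a real TSDB client's batch-write call.
    print(f"writing {len(batch)} points")

class BatchingWriter:
    """Buffers points and flushes them in batches off the caller's hot path."""

    def __init__(self, max_batch=500, max_delay=1.0):
        self.q = queue.Queue()
        self.max_batch = max_batch
        self.max_delay = max_delay
        threading.Thread(target=self._run, daemon=True).start()

    def write(self, point):
        # Non-blocking for the caller (async write).
        self.q.put(point)

    def _run(self):
        batch, deadline = [], time.time() + self.max_delay
        while True:
            try:
                timeout = max(0.0, deadline - time.time())
                batch.append(self.q.get(timeout=timeout))
            except queue.Empty:
                pass
            # Flush when the batch is full or the max delay has elapsed.
            if len(batch) >= self.max_batch or time.time() >= deadline:
                if batch:
                    flush(batch)
                batch, deadline = [], time.time() + self.max_delay

writer = BatchingWriter()
writer.write(("cpu_usage", time.time(), 42.0))   # returns immediately; flushed in the background
time.sleep(1.5)                                  # give the background flusher time to run (demo only)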

Query Optimization

  • Time ranges: Always specify time ranges in queries
  • Limit series: Filter by tags to reduce data scanned
  • Use downsampled data: Query aggregated data when possible
  • Index tags: Index frequently filtered tags

Trade-offs

Advantages

  • High write throughput: Millions of points per second
  • Efficient storage: Excellent compression ratios
  • Time-optimized queries: Fast range scans and aggregations
  • Automatic lifecycle: Built-in retention and downsampling

Limitations

  • Limited updates: Not optimized for updating existing points
  • Complex queries: Less flexible than general-purpose databases
  • Learning curve: Specialized query languages (PromQL, Flux)

Integration Patterns

Metrics Collection

Application → StatsD/Telegraf → Time-Series DB → Grafana
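
On the application side, the first hop is usually a one-line metric emission; a minimal sketch assuming the statsd Python package and a local StatsD/Telegraf agent on its default UDP port (host, port, and prefix are placeholders):

import statsd

# StatsD/Telegraf listens on UDP 8125 by default; metrics are fire-and-forget.
client = statsd.StatsClient("localhost", 8125, prefix="checkout_service")

client.incr("http_requests")               # counter
client.timing("request_latency_ms", 42)    # timer (milliseconds)
client.gauge("queue_depth", 17)            # gauge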

Log Aggregation

Applications → Log shipper → Time-Series DB → Dashboards

Alerting

Time-Series DB → Alerting Rules → Notification System

Interview Questions

1. Beginner Question

Q: What is a time-series database, and when should you use one?

A: A time-series database is optimized for storing and querying data points indexed by time (metrics, logs, sensor data).

When to use:

  • High write throughput: Millions of data points per second
  • Time-ordered queries: Range queries, aggregations over time
  • Automatic downsampling: Reduce data resolution over time
  • Data retention: Efficiently delete old data

Example use cases:

  • System metrics (CPU, memory, disk)
  • Application metrics (request rate, latency, errors)
  • IoT sensor data (temperature, GPS)
  • Financial data (stock prices, trading volumes)

When NOT to use:

  • Transactional data (use relational databases)
  • Complex relationships (use graph databases)
  • Ad-hoc queries (use data warehouses)

2. Intermediate Question

Q: Explain how time-series databases handle high write throughput and data retention.

A:

High write throughput:

  • Append-only storage: Optimized for writes, not updates
  • Batch writes: Group multiple points in a single write
  • Compression: Time-series data compresses very well (similar values)
  • No indexes on every write: Indexes built asynchronously

Data retention:

-- Retention policy: Keep raw data for 7 days
CREATE RETENTION POLICY raw_data ON metrics DURATION 7d REPLICATION 1;

-- Downsampled data: Keep 1-minute averages for 30 days
CREATE RETENTION POLICY downsampled_1m ON metrics DURATION 30d REPLICATION 1;

-- Further downsampled: Keep 1-hour averages for 1 year
CREATE RETENTION POLICY downsampled_1h ON metrics DURATION 365d REPLICATION 1;

Automatic downsampling:

-- Continuous query that downsamples automatically
CREATE CONTINUOUS QUERY downsample_1m ON metrics
BEGIN
  SELECT mean("value") AS value
  INTO metrics."downsampled_1m"."cpu"
  FROM metrics."raw_data"."cpu"
  GROUP BY time(1m), *
END

Benefits:

  • Raw data: High detail, short retention
  • Downsampled: Lower detail, longer retention
  • Automatic: No manual cleanup needed

3. Senior-Level System Question

Q: Design a monitoring system for a microservices architecture with 1000 services, each emitting 1000 metrics/second. The system needs to support: real-time dashboards, alerting, and 1-year historical analysis. How would you architect this?

A:

Architecture: Multi-tier time-series storage

  1. Data ingestion (High throughput):

    Services → StatsD/Telegraf → Kafka → Time-Series DB
    
    • Kafka buffer: Handles traffic spikes, provides backpressure
    • Batching: Collect metrics for 10-30 seconds before writing
    • Sharding: Partition by service/metric type
  2. Storage tiers:

    -- Tier 1: Hot storage (InfluxDB/TimescaleDB) - Last 7 days
    -- Raw data, high resolution (1 second)
    -- Fast queries for dashboards
    
    -- Tier 2: Warm storage (ClickHouse) - 7-90 days
    -- Downsampled to 1-minute resolution
    -- Good for historical analysis
    
    -- Tier 3: Cold storage (S3 + Parquet) - 90-365 days
    -- Downsampled to 1-hour resolution
    -- Cheap storage, slower queries
    
  3. Real-time layer (Redis):

    # Cache recent metrics for dashboards (the redis client and query_tsdb are illustrative helpers)
    ONE_HOUR = 3600  # seconds

    def get_metric(metric_name, time_range_seconds):
        if time_range_seconds < ONE_HOUR:
            # Recent data: serve from Redis (in-memory, fast)
            return redis.get(f"metric:{metric_name}:{time_range_seconds}")
        else:
            # Older data: query the time-series database
            return query_tsdb(metric_name, time_range_seconds)
    
  4. Downsampling pipeline:

    # Continuous downsampling jobs
    def downsample_metrics():
        # Every hour: Downsample 1-second data to 1-minute
        raw_data = query_tsdb("raw", last_hour, resolution="1s")
        downsampled = aggregate(raw_data, "1m", function="mean")
        write_to_tsdb("downsampled_1m", downsampled)
        
        # Every day: Downsample 1-minute to 1-hour
        minute_data = query_tsdb("downsampled_1m", last_day)
        hour_data = aggregate(minute_data, "1h", function="mean")
        write_to_s3("downsampled_1h", hour_data)  # Cold storage
    
  5. Query optimization:

    • Materialized views: Pre-aggregate common queries
    • Time-based partitioning: Partition by day/week
    • Compression: Use columnar format (Parquet) for cold storage
  6. Alerting system:

    # Real-time alerting on recent data
    def check_alerts():
        recent_metrics = query_redis("metrics:recent")
        for alert_rule in alert_rules:
            if evaluate(alert_rule, recent_metrics):
                send_alert(alert_rule)
    

Scalability:

  • Write path: 1M metrics/sec → Kafka (handles spikes) → Batch writes to TSDB
  • Read path: Redis cache → Hot TSDB → Warm storage → Cold storage
  • Storage: Compression + downsampling cut storage by orders of magnitude (downsampling 1-second data to 1-hour resolution alone is a 3600x reduction)

Trade-offs:

  • Latency vs. Cost: Hot storage is fast but expensive
  • Detail vs. Retention: More downsampling = longer retention
  • Complexity: Multiple storage tiers require data pipeline

Key Takeaways

  • Time-series databases are optimized for time-stamped data—metrics, logs, sensor data
  • Append-only storage enables high write throughput (millions of points per second)
  • Automatic downsampling reduces data resolution over time (1s → 1m → 1h) for longer retention
  • Retention policies automatically delete old data to manage storage costs
  • High cardinality is expensive—limit unique series (tags) for better performance
  • Use for monitoring and observability—system metrics, application metrics, IoT data
  • Not for transactional data—use relational databases for that
  • Compression is excellent—adjacent timestamps and values are similar, so time-series data often compresses 10x or more with delta and delta-of-delta encoding
  • Time-based partitioning is essential for large datasets
  • Batch writes are much more efficient than individual point writes
  • Downsample early—pre-aggregate data at ingestion time when possible
  • Multi-tier storage (hot/warm/cold) balances performance and cost
