
Time-Series Databases

Master time-series databases for metrics, logs, and IoT data. Essential for observability and monitoring system design interviews.


Why This Matters

Think of a time-series database like a data logger for a car. Every second, it records speed, RPM, temperature, and so on. Over time, you accumulate millions of data points. A time-series database is optimized for exactly this: storing timestamped points, querying by time range, and aggregating over time windows.

This matters because many systems generate time-series data: metrics (CPU, memory, requests per second), logs (application logs, system logs), IoT sensors (temperature, humidity, motion). Time-series databases handle high write throughput (millions of points per second) and time-based queries efficiently. Regular databases struggle with this workload.

In interviews, when someone asks "How would you store metrics?", they're testing whether you understand time-series databases. Do you know when to use them? Do you understand their characteristics? Most engineers don't. They use regular databases and wonder why writes are slow.

What Engineers Usually Get Wrong

Most engineers think "time-series databases are just databases with timestamps." But time-series databases are optimized differently: they're append-only (rarely updated), they compress data by time (similar values compress well), they automatically downsample old data, and they're optimized for time-range queries. Understanding this helps you use them effectively.
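
A quick sketch of why that compression works: timestamps and slowly changing gauges delta-encode into tiny, repetitive values. This is illustrative Python, not any particular engine's encoding:

# Regularly spaced timestamps delta-encode to small, repetitive values.
# Real engines go further (delta-of-delta, XOR encoding of floats),
# but the principle is the same.
timestamps = [1705312800, 1705312805, 1705312810, 1705312815]
deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
print(deltas)  # [5, 5, 5] -- constant deltas compress to almost nothing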

Engineers also miss the trade-offs. Time-series databases are fast for time-range queries but weak at ad-hoc relational queries such as joins and multi-table lookups. They handle high write throughput but aren't built for updating existing points. Use them for metrics, logs, and sensor data; don't use them for transactional workloads.

How This Breaks Systems in the Real World

A service was storing metrics in a relational database. Each metric was a row with a timestamp. With millions of metrics per day, writes were slow. The database became a bottleneck. Queries for "average CPU over last hour" required scanning millions of rows. The fix? Use a time-series database. It's optimized for this workload—append-only writes are fast, time-range queries are efficient, and data compresses well.

Another story: A service was using a time-series database but didn't configure retention. Old data accumulated, consuming disk space. The database ran out of space. The fix? Configure retention policies. Automatically delete or downsample old data. Time-series databases are designed for recent data—old data should be archived or deleted.


What is Time-Series Data?

Data points indexed by timestamp, typically collected at regular intervals:

timestamp                | metric_name    | value | tags
2024-01-15 10:00:00     | cpu_usage      | 45.2  | host=server1
2024-01-15 10:00:05     | cpu_usage      | 46.1  | host=server1
2024-01-15 10:00:10     | cpu_usage      | 44.8  | host=server1
2024-01-15 10:00:00     | memory_usage   | 62.5  | host=server1
2024-01-15 10:00:05     | memory_usage   | 63.2  | host=server1

Characteristics:

  • High write volume (millions of points per second)
  • Append-only (rarely updated, mostly inserted)
  • Time-ordered queries
  • Aggregations over time windows

Why Specialized Databases?

Traditional Databases Struggle With

  • High write throughput: Millions of inserts per second
  • Time-based queries: Range queries, downsampling, aggregations
  • Data retention: Efficiently deleting old data
  • Compression: Time-series data compresses very well

Time-Series Databases Excel At

  • Append-optimized storage: Sequential, time-ordered writes instead of in-place updates
  • Time-indexed queries: Fast time-range scans
  • Automatic downsampling: Pre-aggregate data at different resolutions
  • Data lifecycle: Automatic expiration of old data

InfluxDB

Purpose-built for time-series data.

-- Create database (InfluxQL, InfluxDB 1.x)
CREATE DATABASE metrics;

-- Write data via the HTTP write endpoint (Line Protocol, not a query)
cpu,host=server1,region=us-east value=45.2 1705312800000000000
memory,host=server1,region=us-east value=62.5 1705312800000000000

-- Query
SELECT mean("value") 
FROM "cpu" 
WHERE time >= now() - 1h 
GROUP BY time(5m), "host";

Features:

  • High write throughput
  • Downsampling and retention policies
  • Continuous queries
  • TICK stack integration
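
For reference, the same write from Python, as a minimal sketch using the official influxdb-client package (InfluxDB 2.x API); the URL, token, org, and bucket are placeholders:

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connection details are placeholders for your own deployment.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# Same point as the Line Protocol example above.
point = (
    Point("cpu")
    .tag("host", "server1")
    .tag("region", "us-east")
    .field("value", 45.2)
)
write_api.write(bucket="metrics", record=point)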

TimescaleDB

PostgreSQL extension for time-series.

-- Create hypertable
CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    device_id INT,
    temperature FLOAT,
    humidity FLOAT
);

SELECT create_hypertable('metrics', 'time');

-- Query
SELECT 
    time_bucket('1 hour', time) as hour,
    device_id,
    avg(temperature) as avg_temp
FROM metrics
WHERE time >= now() - INTERVAL '24 hours'
GROUP BY hour, device_id;

Features:

  • SQL interface (PostgreSQL compatible)
  • Automatic partitioning by time
  • Compression
  • Continuous aggregates
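
Because TimescaleDB is PostgreSQL, batched writes work through any Postgres driver. A minimal sketch with psycopg2's execute_values against the hypertable above (the connection string is a placeholder):

import psycopg2
from psycopg2.extras import execute_values
from datetime import datetime, timezone

conn = psycopg2.connect("dbname=metrics")  # placeholder DSN

# One multi-row INSERT instead of thousands of single-row round trips.
rows = [
    (datetime.now(timezone.utc), device_id, 21.5 + device_id, 60.0)
    for device_id in range(1000)
]
with conn, conn.cursor() as cur:
    execute_values(
        cur,
        "INSERT INTO metrics (time, device_id, temperature, humidity) VALUES %s",
        rows,
    )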

Prometheus

Monitoring and alerting toolkit with time-series database.

# Range vector: the last 5 minutes of cpu_usage samples
cpu_usage{host="server1"}[5m]

# Per-second rate of a counter
rate(http_requests_total[5m])

# Average over a time window
avg_over_time(cpu_usage[1h])

Features:

  • Pull-based metrics collection
  • PromQL query language
  • Alerting rules
  • Integration with Grafana
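
On the application side, you expose metrics over HTTP and Prometheus scrapes them. A minimal sketch with the official prometheus_client package; the metric names mirror the queries above:

import random
import time

from prometheus_client import Counter, Gauge, start_http_server

REQUESTS = Counter("http_requests_total", "Total HTTP requests", ["method"])
CPU = Gauge("cpu_usage", "CPU usage percent", ["host"])

# Expose /metrics on port 8000; Prometheus pulls from here on its scrape interval.
start_http_server(8000)

while True:
    REQUESTS.labels(method="GET").inc()
    CPU.labels(host="server1").set(random.uniform(0, 100))
    time.sleep(1)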

Amazon Timestream

Fully managed time-series database on AWS.

-- Timestream has no CREATE TABLE DDL: databases and tables are created
-- through the AWS console or SDK, and records are schemaless on write.
-- Queries use a SQL dialect with time-series helpers such as BIN and ago:

SELECT 
    BIN(time, 5m) as time_bin,
    device_id,
    AVG(measure_value) as avg_value
FROM metrics
WHERE time >= ago(1h)
GROUP BY time_bin, device_id;
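
Writes go through the AWS SDK rather than SQL. A sketch with boto3's timestream-write client; the database and table names are placeholders:

import time

import boto3

client = boto3.client("timestream-write", region_name="us-east-1")

client.write_records(
    DatabaseName="metrics_db",   # placeholder
    TableName="metrics",         # placeholder
    Records=[
        {
            "Dimensions": [{"Name": "device_id", "Value": "device-1"}],
            "MeasureName": "temperature",
            "MeasureValue": "22.5",
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # milliseconds by default
        }
    ],
)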

Data Modeling

Metrics and Tags

Metric: cpu_usage
Tags:   host=server1, region=us-east, env=production
Value:  45.2
Time:   2024-01-15 10:00:00

Tags:   Used for filtering and grouping (like dimensions)
Values: The actual measurement

Series

A unique combination of metric name and tags:

Series 1: cpu_usage{host=server1, region=us-east}
Series 2: cpu_usage{host=server2, region=us-east}
Series 3: cpu_usage{host=server1, region=us-west}

Cardinality

High cardinality (many unique series) impacts performance:

Low cardinality:  cpu_usage{host=server1}
High cardinality: cpu_usage{host=server1, user_id=12345, request_id=abc123}

Best practice: Keep tag cardinality bounded; store high-cardinality identifiers (user IDs, request IDs) as field values or in logs, not as tags.
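
Series count is roughly the product of each tag's cardinality, which is why a single unbounded tag can blow up the database. A back-of-the-envelope check:

hosts = 100          # host tag: bounded
regions = 2          # region tag: bounded
users = 1_000_000    # user_id tag: unbounded -- do not tag on this

print(hosts * regions)          # 200 series: fine
print(hosts * regions * users)  # 200,000,000 series: cardinality explosion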


Common Operations

Downsampling

Reduce data resolution over time:

Raw data:    1 point per second
After 1 hour: 1 point per minute (average)
After 1 day:  1 point per hour (average)
After 1 week: 1 point per day (average)

Benefits:

  • Reduced storage
  • Faster queries on historical data
  • Maintains trends while discarding detail
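
A minimal sketch of the same ladder with pandas, assuming the raw points fit in memory (real databases do this server-side with continuous queries or aggregates):

import numpy as np
import pandas as pd

# One day of 1-second samples (synthetic random walk as placeholder data).
idx = pd.date_range("2024-01-15", periods=86_400, freq="s")
raw = pd.Series(np.random.randn(86_400).cumsum(), index=idx)

per_minute = raw.resample("1min").mean()  # 86,400 points -> 1,440
per_hour = raw.resample("1h").mean()      # 1,440 points  -> 24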

Retention Policies

Automatically delete old data:

Raw data:        Keep for 7 days
Downsampled 1h:  Keep for 30 days
Downsampled 1d:  Keep for 1 year

Aggregations

-- Average over time window
SELECT avg(value) FROM metrics 
WHERE time >= now() - 1h 
GROUP BY time(5m);

-- Rate of change
SELECT derivative(mean(value), 1m) 
FROM metrics 
WHERE time >= now() - 1h 
GROUP BY time(5m);

-- Percentiles
SELECT percentile(value, 95) 
FROM metrics 
WHERE time >= now() - 1h;

Query Patterns

Range Queries

-- Last hour
SELECT * FROM metrics 
WHERE time >= now() - 1h;

-- Specific time range
SELECT * FROM metrics 
WHERE time >= '2024-01-15 10:00:00' 
  AND time < '2024-01-15 11:00:00';

Filtering by Tags

-- Filter by tag
SELECT * FROM metrics 
WHERE host = 'server1' 
  AND time >= now() - 1h;

-- Multiple tags
SELECT * FROM metrics 
WHERE region = 'us-east' 
  AND env = 'production'
  AND time >= now() - 1h;

Aggregations

-- Group by tag
SELECT host, avg(value) 
FROM metrics 
WHERE time >= now() - 1h 
GROUP BY host;

-- Time-based grouping
SELECT 
    time_bucket('5 minutes', time) as interval,
    avg(value) as avg_value
FROM metrics
WHERE time >= now() - 1h
GROUP BY interval;

Use Cases

Monitoring and Observability

  • System metrics: CPU, memory, disk, network
  • Application metrics: Request rate, latency, error rate
  • Business metrics: Revenue, user signups, conversions

IoT and Sensors

  • Temperature sensors: Building climate control
  • GPS tracking: Vehicle locations over time
  • Smart meters: Energy consumption

Financial Data

  • Stock prices: Price movements over time
  • Trading volumes: Market activity
  • Exchange rates: Currency fluctuations

Log Analysis

  • Application logs: Error rates, request patterns
  • Security logs: Failed login attempts, suspicious activity
  • Audit logs: User actions over time

Best Practices

  1. Limit cardinality: Too many unique series slow down queries
  2. Use appropriate retention: Delete old data you don't need
  3. Downsample early: Pre-aggregate data at ingestion time
  4. Batch writes: Group multiple points into a single write
  5. Index time first: Time is the primary access pattern
  6. Compress data: Time-series data compresses very well

Performance Considerations

Write Optimization

  • Batch inserts: Group multiple points into one write (see the sketch after this list)
  • Async writes: Don't block on every insert
  • Compression: Enable compression for storage efficiency
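
A sketch of a batching buffer; write_fn stands in for whatever client call actually persists a list of points (an assumed helper, not a specific library API):

import threading

class BatchWriter:
    """Buffer points in memory and flush them in one bulk write."""

    def __init__(self, write_fn, flush_size=5000):
        self.write_fn = write_fn      # assumed helper: persists a list of points
        self.flush_size = flush_size
        self.buffer = []
        self.lock = threading.Lock()

    def add(self, point):
        batch = None
        with self.lock:
            self.buffer.append(point)
            if len(self.buffer) >= self.flush_size:
                batch, self.buffer = self.buffer, []
        if batch:
            # One bulk write amortizes network and commit overhead
            # across thousands of points.
            self.write_fn(batch)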

Query Optimization

  • Time ranges: Always specify time ranges in queries
  • Limit series: Filter by tags to reduce data scanned
  • Use downsampled data: Query aggregated data when possible
  • Index tags: Index frequently filtered tags

Trade-offs

Advantages

  • High write throughput: Millions of points per second
  • Efficient storage: Excellent compression ratios
  • Time-optimized queries: Fast range scans and aggregations
  • Automatic lifecycle: Built-in retention and downsampling

Limitations

  • Limited updates: Not optimized for updating existing points
  • Complex queries: Less flexible than general-purpose databases
  • Learning curve: Specialized query languages (PromQL, Flux)

Integration Patterns

Metrics Collection

Application → StatsD/Telegraf → Time-Series DB → Grafana

Log Aggregation

Applications → Log shipper → Time-Series DB → Dashboards

Alerting

Time-Series DB → Alerting Rules → Notification System

Interview Questions

1. Beginner Question

Q: What is a time-series database, and when should you use one?

A: A time-series database is optimized for storing and querying data points indexed by time (metrics, logs, sensor data).

When to use:

  • High write throughput: Millions of data points per second
  • Time-ordered queries: Range queries, aggregations over time
  • Automatic downsampling: Reduce data resolution over time
  • Data retention: Efficiently delete old data

Example use cases:

  • System metrics (CPU, memory, disk)
  • Application metrics (request rate, latency, errors)
  • IoT sensor data (temperature, GPS)
  • Financial data (stock prices, trading volumes)

When NOT to use:

  • Transactional data (use relational databases)
  • Complex relationships (use graph databases)
  • Ad-hoc queries (use data warehouses)

2. Intermediate Question

Q: Explain how time-series databases handle high write throughput and data retention.

A:

High write throughput:

  • Append-only storage: Optimized for writes, not updates
  • Batch writes: Group multiple points into a single write
  • Compression: Time-series data compresses very well (similar values)
  • Minimal indexing on the write path: Data arrives in time order, so writes avoid heavy index maintenance

Data retention:

-- Retention policy: Keep raw data for 7 days
CREATE RETENTION POLICY raw_data ON metrics DURATION 7d REPLICATION 1;

-- Downsampled data: Keep 1-minute averages for 30 days
CREATE RETENTION POLICY downsampled_1m ON metrics DURATION 30d REPLICATION 1;

-- Further downsampled: Keep 1-hour averages for 1 year
CREATE RETENTION POLICY downsampled_1h ON metrics DURATION 365d REPLICATION 1;

Automatic downsampling:

-- Continuous query that downsamples automatically
CREATE CONTINUOUS QUERY downsample_1m ON metrics
BEGIN
  SELECT mean(value) INTO metrics."downsampled_1m"."cpu"
  FROM metrics."raw_data"."cpu"
  GROUP BY time(1m), *
END

Benefits:

  • Raw data: High detail, short retention
  • Downsampled: Lower detail, longer retention
  • Automatic: No manual cleanup needed

3. Senior-Level System Question

Q: Design a monitoring system for a microservices architecture with 1000 services, each emitting 1000 metrics/second. The system needs to support: real-time dashboards, alerting, and 1-year historical analysis. How would you architect this?

A:

Architecture: Multi-tier time-series storage

  1. Data ingestion (High throughput):

    Services → StatsD/Telegraf → Kafka → Time-Series DB
    
    • Kafka buffer: Handles traffic spikes, provides backpressure
    • Batching: Collect metrics for 10-30 seconds before writing
    • Sharding: Partition by service/metric type
  2. Storage tiers:

    -- Tier 1: Hot storage (InfluxDB/TimescaleDB) - Last 7 days
    -- Raw data, high resolution (1 second)
    -- Fast queries for dashboards
    
    -- Tier 2: Warm storage (ClickHouse) - 7-90 days
    -- Downsampled to 1-minute resolution
    -- Good for historical analysis
    
    -- Tier 3: Cold storage (S3 + Parquet) - 90-365 days
    -- Downsampled to 1-hour resolution
    -- Cheap storage, slower queries
    
  3. Real-time layer (Redis):

    # Cache recent metrics for dashboards
    from datetime import timedelta

    def get_metric(metric_name, time_range):
        if time_range < timedelta(hours=1):
            # Recent range: serve from Redis (in-memory, fast)
            return redis.get(f"metric:{metric_name}:{time_range}")
        # Older ranges: query the time-series database
        return query_tsdb(metric_name, time_range)
    
  4. Downsampling pipeline:

    # Continuous downsampling jobs
    def downsample_metrics():
        # Every hour: Downsample 1-second data to 1-minute
        raw_data = query_tsdb("raw", last_hour, resolution="1s")
        downsampled = aggregate(raw_data, "1m", function="mean")
        write_to_tsdb("downsampled_1m", downsampled)
        
        # Every day: Downsample 1-minute to 1-hour
        minute_data = query_tsdb("downsampled_1m", last_day)
        hour_data = aggregate(minute_data, "1h", function="mean")
        write_to_s3("downsampled_1h", hour_data)  # Cold storage
    
  5. Query optimization:

    • Materialized views: Pre-aggregate common queries
    • Time-based partitioning: Partition by day/week
    • Compression: Use columnar format (Parquet) for cold storage
  6. Alerting system:

    # Real-time alerting on recent data
    def check_alerts():
        recent_metrics = query_redis("metrics:recent")
        for alert_rule in alert_rules:
            if evaluate(alert_rule, recent_metrics):
                send_alert(alert_rule)
    

Scalability:

  • Write path: 1M metrics/sec → Kafka (handles spikes) → Batch writes to TSDB
  • Read path: Redis cache → Hot TSDB → Warm storage → Cold storage
  • Storage: Compression plus downsampling can cut storage by orders of magnitude

Trade-offs:

  • Latency vs. Cost: Hot storage is fast but expensive
  • Detail vs. Retention: More downsampling = longer retention
  • Complexity: Multiple storage tiers require data pipeline

Related Topics

  • NoSQL Basics - Time-series databases are a type of NoSQL database. Understanding NoSQL basics helps understand time-series characteristics.

  • Columnar Databases - Time-series databases often use columnar storage. Understanding columnar databases helps understand time-series internals.

  • Query Optimization - Time-series queries have specific optimization patterns. Understanding query optimization helps optimize time-series queries.

  • Partitioning - Time-series databases partition by time. Understanding partitioning helps design time-series schemas.

  • Data Replication - Time-series databases use replication for availability. Understanding replication helps design distributed time-series systems.

Key Takeaways

Time-series databases are optimized for time-stamped data—metrics, logs, sensor data

Append-only storage enables high write throughput (millions of points per second)

Automatic downsampling reduces data resolution over time (1s → 1m → 1h) for longer retention

Retention policies automatically delete old data to manage storage costs

High cardinality is expensive—limit unique series (tags) for better performance

Use for monitoring and observability—system metrics, application metrics, IoT data

Not for transactional data—use relational databases for that

Compression is excellent—time-series data compresses 10-100x better than random data

Time-based partitioning is essential for large datasets

Batch writes are much more efficient than individual point writes

Downsample early—pre-aggregate data at ingestion time when possible

Multi-tier storage (hot/warm/cold) balances performance and cost
