Distributed Logging
Learn how to collect, aggregate, and analyze logs from distributed systems.
Topic Overview
Distributed logging is the practice of collecting, aggregating, and analyzing logs from the many services that make up a distributed system.
Challenges
- Volume: Thousands of services generate massive log volumes
- Correlation: A single request spans many services, so its logs must be tied together
- Storage: Large amounts of log data must be stored and kept searchable
- Format: Different services emit logs in different formats
Log Aggregation
Centralized Logging
All services send their logs to a central system (e.g., the ELK stack, Splunk, or Datadog).
class CentralizedLogger {
  constructor(
    private serviceName: string,
    private logAggregator: { send(entry: object): Promise<void> }
  ) {}

  async log(level: string, message: string, metadata: Record<string, any>): Promise<void> {
    const logEntry = {
      timestamp: Date.now(),
      service: this.serviceName,
      level,
      message,
      ...metadata,
      traceId: this.getTraceId()
    };
    // Send to the central log aggregator
    await this.logAggregator.send(logEntry);
  }

  // Reads the trace ID from the current request context (elided here)
  private getTraceId(): string { return ''; }
}
Structured Logging
Use JSON format for easier parsing and querying.
class StructuredLogger {
  constructor(private traceId: string, private spanId: string) {}

  log(level: string, message: string, fields: Record<string, any>): void {
    const entry = {
      timestamp: new Date().toISOString(),
      level,
      message,
      service: 'payment-service',
      traceId: this.traceId,
      spanId: this.spanId,
      ...fields
    };
    // One JSON object per line is easy for aggregators to parse
    console.log(JSON.stringify(entry));
  }
}
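Each log line is then one self-describing JSON object, so aggregators can filter on any field. For example (field values are illustrative):
{"timestamp":"2024-01-15T10:30:00.000Z","level":"info","message":"Payment authorized","service":"payment-service","traceId":"abc123","spanId":"def456","amountCents":1999}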
Trace Correlation
Use trace IDs to correlate logs across services.
class TraceLogger {
  private traceId = '';

  async handleRequest(request: Request): Promise<Response> {
    // Reuse the caller's trace ID if present; otherwise this is the
    // entry point, so generate a new one
    this.traceId = request.headers['x-trace-id'] || this.generateTraceId();
    this.log('info', 'Request received', {
      traceId: this.traceId,
      method: request.method,
      path: request.path
    });
    // Pass the trace ID to downstream services so their logs share it
    const response = await this.callDownstream({
      ...request,
      headers: { ...request.headers, 'x-trace-id': this.traceId }
    });
    this.log('info', 'Request completed', {
      traceId: this.traceId,
      status: response.status
    });
    return response;
  }

  // log, generateTraceId, and callDownstream omitted for brevity
}
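Note that storing the trace ID on the logger instance breaks once a single process handles concurrent requests: a later request overwrites the field before an earlier one finishes. In Node.js, per-request context via AsyncLocalStorage is a common fix; a minimal sketch (withTrace and currentTraceId are illustrative helpers):
import { AsyncLocalStorage } from 'node:async_hooks';

const traceContext = new AsyncLocalStorage<{ traceId: string }>();

// Run a request handler inside its own context, so concurrent
// requests each see their own trace ID
function withTrace<T>(traceId: string, fn: () => Promise<T>): Promise<T> {
  return traceContext.run({ traceId }, fn);
}

// Read the trace ID anywhere below the handler, without passing it around
function currentTraceId(): string | undefined {
  return traceContext.getStore()?.traceId;
}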
Examples
ELK Stack Setup
// Simplified, illustrative model of a Logstash-style pipeline
// (real Logstash is configured in its own DSL, not TypeScript)
class LogstashPipeline {
  process(log: LogEntry): ProcessedLog {
    return {
      ...log,
      parsed: this.parseLog(log.message),     // e.g., grok or JSON parsing
      enriched: this.enrichWithMetadata(log), // add host/env/service metadata
      indexed: this.indexForSearch(log)
    };
  }
}
// Elasticsearch indexing into time-based indices (e.g., logs-2024.01.15)
class LogIndexer {
  async index(log: ProcessedLog): Promise<void> {
    await this.elasticsearch.index({
      index: `logs-${this.getDateIndex()}`, // daily indices simplify rollover and retention
      body: log
    });
  }
}
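Once indexed, every log entry for a single request can be pulled back with one query on the trace ID; an illustrative Elasticsearch-style search (assumes traceId is mapped as a keyword field):
// Fetch the full request flow for one trace, oldest entry first
await elasticsearch.search({
  index: 'logs-*', // search across all daily indices
  body: {
    query: { term: { traceId: 'abc123' } },
    sort: [{ timestamp: 'asc' }]
  }
});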
Common Pitfalls
- Not using structured logging: Free-text logs are hard to parse and query. Fix: Log JSON
- Missing trace IDs: Logs can't be correlated across services. Fix: Propagate trace IDs
- Logging too much: Hurts performance and inflates storage costs. Fix: Use appropriate log levels
- Not sampling: High-volume logs get expensive. Fix: Sample low-priority logs
- Logging sensitive data: Passwords and tokens end up in logs. Fix: Sanitize log fields, as in the sketch after this list
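A minimal redaction sketch, assuming a fixed list of sensitive field names (the list here is illustrative):
const SENSITIVE_FIELDS = new Set(['password', 'token', 'authorization', 'creditcard']);

// Recursively replace sensitive values before a log entry is emitted
function sanitize(value: any): any {
  if (Array.isArray(value)) return value.map(sanitize);
  if (value && typeof value === 'object') {
    const out: Record<string, any> = {};
    for (const [key, v] of Object.entries(value)) {
      out[key] = SENSITIVE_FIELDS.has(key.toLowerCase()) ? '[REDACTED]' : sanitize(v);
    }
    return out;
  }
  return value;
}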
Interview Questions
Beginner
Q: Why is logging challenging in distributed systems?
A:
Challenges:
- Volume: Many services generate huge log volumes
- Correlation: Hard to trace requests across services
- Format: Different services use different formats
- Storage: Need to store and search massive amounts of data
- Debugging: Finding relevant logs across services is difficult
Solution: Centralized logging with structured logs and trace correlation.
Intermediate
Q: How do you implement distributed logging with trace correlation?
A:
Implementation:
- Generate trace ID at request entry point
- Propagate trace ID in all service calls (headers)
- Include trace ID in all log entries
- Aggregate logs in central system
- Query by trace ID to see full request flow
class DistributedLogging {
  async handleRequest(request: Request): Promise<Response> {
    // This service is the entry point, so it starts the trace
    const traceId = this.generateTraceId();
    // Every log entry carries the trace ID
    this.logger.log('info', 'Request started', { traceId });
    // Downstream calls carry the trace ID in a header
    const result = await this.service.call({
      ...request,
      headers: { ...request.headers, 'x-trace-id': traceId }
    });
    this.logger.log('info', 'Request completed', { traceId, status: result.status });
    return result;
  }
}
Senior
Q: Design a distributed logging system for a microservices architecture with 1000+ services. How do you handle volume, correlation, and real-time analysis?
A:
Architecture:
- Log agents on each service
- Message queue for log transport (Kafka)
- Log aggregator (Logstash, Fluentd)
- Storage (Elasticsearch, S3)
- Query/Analysis (Kibana, Grafana)
Design:
// Log agent running alongside each service
class LogAgent {
  private buffer: LogEntry[] = [];

  constructor(private kafka: KafkaProducer) {}

  async log(entry: LogEntry): Promise<void> {
    // Buffer logs locally
    this.buffer.push(entry);
    // Batch sends to reduce per-message overhead
    if (this.buffer.length >= 100) {
      await this.flush();
    }
  }

  async flush(): Promise<void> {
    // A production agent would also flush on a timer and retry on failure
    await this.kafka.send('logs', this.buffer);
    this.buffer = [];
  }
}

// Log aggregator consuming batches from Kafka
class LogAggregator {
  async process(logs: LogEntry[]): Promise<void> {
    for (const log of logs) {
      // Parse and enrich
      const processed = await this.enrich(log);
      // Route to the appropriate index
      await this.index(processed);
    }
  }

  async enrich(log: LogEntry): Promise<ProcessedLog> {
    return {
      ...log,
      serviceMetadata: await this.getServiceMetadata(log.service),
      environment: this.getEnvironment(),
      parsedFields: this.parseStructuredFields(log)
    };
  }
}

// Sampling for high-volume logs
class SampledLogger {
  shouldLog(level: string): boolean {
    if (level === 'error') return true;                // always log errors
    if (level === 'warn') return Math.random() < 0.1;  // 10% of warnings
    if (level === 'info') return Math.random() < 0.01; // 1% of info
    return false;                                      // drop everything else
  }
}
Optimizations:
- Sampling: Sample low-priority logs (keep all errors)
- Batching: Batch logs before sending
- Compression: Compress logs in transit (e.g., producer-side compression in Kafka)
- Indexing strategy: Time-based indices; roll over old indices (see the sketch after this list)
- Retention: Archive old logs to cold storage (S3)
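A minimal sketch of daily index naming and a retention decision, assuming daily indices named logs-YYYY.MM.DD and the hot/cold split above (dateIndexName and retentionAction are illustrative helpers; in practice, lifecycle tooling such as Elasticsearch ILM automates this):
// Daily index name for a date, e.g. logs-2024.01.15
function dateIndexName(date: Date): string {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, '0');
  const d = String(date.getUTCDate()).padStart(2, '0');
  return `logs-${y}.${m}.${d}`;
}

// Indices older than hotDays move to cold storage (e.g., snapshot to S3);
// indices older than retentionDays are deleted
function retentionAction(
  indexDate: Date,
  now: Date,
  hotDays = 7,
  retentionDays = 90
): 'hot' | 'archive' | 'delete' {
  const ageDays = (now.getTime() - indexDate.getTime()) / 86_400_000;
  if (ageDays > retentionDays) return 'delete';
  if (ageDays > hotDays) return 'archive';
  return 'hot';
}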
Key Takeaways
- Centralized logging is essential for distributed systems
- Structured logging (JSON) enables easier parsing and querying
- Trace IDs propagated across services make logs correlatable
- Sampling reduces volume and cost for high-frequency logs
- Real-time analysis requires efficient indexing and querying
- Storage strategy: hot storage for recent logs, cold storage for old ones