Choosing Between Object Storage and Local TSDB - datnguyendv/monitoring_tools GitHub Wiki
This guide helps you decide when to use object storage (e.g., S3, GCS) vs local time-series databases (TSDBs) for long-term metrics storage in observability stacks.
Overview
Monitoring stacks need long-term storage for metrics to support historical analysis, compliance, and reliability. There are two main approaches:
- Object Storage: Cloud-native, durable, scalable
- Local TSDB: Disk-based storage tuned for query performance (e.g., Prometheus TSDB, VictoriaMetrics)
Object Storage (S3, GCS, Azure Blob)
Pros:
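- Effectively unlimited capacity, with no volumes to resize
- High durability (e.g., Amazon S3 is designed for 99.999999999%, or eleven nines, of object durability)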
- Ideal for cost-effective long-term retention (months–years)
- Integrates natively with Thanos, Cortex, Mimir
- Multi-region access possible
Cons:
- Query latency is higher (cold data access)
- Requires cache layers (e.g., Thanos Store Gateway index cache)
- Complex architecture with more moving parts
- Write amplification: data is written to local disk first, then uploaded, and often rewritten again by compaction
When to Use:
- You have multi-cluster Prometheus and need a global view
- Long-term storage >90 days
- You're in cloud-native environments (Kubernetes, AWS/GCP)
- You need cheap archival storage and can tolerate cold query latency
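The cache-layer point in the cons above can be illustrated with a minimal read-through LRU cache. This is only a sketch of the general idea, not how Thanos Store Gateway implements its index cache; `ReadThroughCache` and `fake_object_store_get` are hypothetical names, and the fetch function stands in for a real S3/GCS GET.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Tiny LRU cache in front of a slow fetch function (illustrative only)."""

    def __init__(self, fetch: Callable[[str], bytes], capacity: int = 128) -> None:
        self._fetch = fetch
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        value = self._fetch(key)  # slow path: would go to object storage
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value


def fake_object_store_get(key: str) -> bytes:
    # Hypothetical stand-in for an object storage GET; a real call crosses the network.
    return f"index-for-{key}".encode()


cache = ReadThroughCache(fake_object_store_get, capacity=2)
cache.get("block-a")  # miss: fetched from the (fake) object store
cache.get("block-a")  # hit: served from memory
```

Repeated dashboard queries over the same blocks benefit most from this pattern; one-off queries over old data still pay the cold-access cost.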
Local TSDB (e.g., VictoriaMetrics, Prometheus TSDB on disk)
Pros:
- Low-latency queries (esp. hot data)
- Simple deployment and less network dependency
- Optimized for real-time dashboards and alerts
- Great performance for high-ingest scenarios
Cons:
- Disk storage is finite and expensive to scale
- HA requires extra setup (replication/failover)
- Limited cross-cluster visibility
- Retention is capped by disk volume
When to Use:
- You need fast, real-time queries (e.g., SLAs, dashboards, SLOs)
- You're ingesting >1M samples/sec and require tight performance
- Use cases with short-to-medium retention (15–90d)
- You're on-premise or in hybrid environments with strong IOPS
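Because local retention is capped by disk volume, a rough sizing estimate helps when weighing the ingest rates and retention windows above. The sketch below uses the rule of thumb from the Prometheus storage documentation (retention seconds × samples per second × bytes per sample, with roughly 1 to 2 bytes per sample). The function names and the 2-byte default are assumptions for illustration; other TSDBs compress differently.

```python
SECONDS_PER_DAY = 86_400


def estimate_disk_bytes(
    samples_per_sec: float,
    retention_days: float,
    bytes_per_sample: float = 2.0,  # assumed upper end of the 1-2 byte rule of thumb
) -> float:
    """Approximate on-disk size for a given ingest rate and retention window."""
    return samples_per_sec * retention_days * SECONDS_PER_DAY * bytes_per_sample


def max_retention_days(
    disk_bytes: float,
    samples_per_sec: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Invert the estimate: how many days fit on a disk of the given size."""
    return disk_bytes / (samples_per_sec * SECONDS_PER_DAY * bytes_per_sample)


if __name__ == "__main__":
    size = estimate_disk_bytes(samples_per_sec=1_000_000, retention_days=90)
    print(f"{size / 1e12:.1f} TB")
```

With these assumptions, 1M samples/sec kept for 90 days comes to roughly 15.6 TB, before any headroom for replication, the write-ahead log, or compaction.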
Decision Matrix
Requirement | Best Fit |
---|---|
Query speed / low-latency | Local TSDB |
Long-term historical data (6mo+) | Object Storage |
Multi-cluster/global observability | Object Storage |
Simplicity / low operational burden | Local TSDB |
Cloud-native architecture | Object Storage |
On-prem / hybrid infrastructure | Local TSDB |
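The matrix above can be expressed as a small lookup, which is handy for capturing the decision in a design review or runbook. This is a minimal sketch: the requirement keys are hypothetical identifiers for the table rows, and returning "Hybrid" when requirements point in both directions is an assumption of this example, not a rule stated in the matrix.

```python
from typing import Iterable

# One entry per row of the decision matrix; keys are illustrative names.
DECISION_MATRIX = {
    "low_latency_queries": "Local TSDB",
    "long_term_history": "Object Storage",
    "multi_cluster_view": "Object Storage",
    "low_operational_burden": "Local TSDB",
    "cloud_native": "Object Storage",
    "on_prem_or_hybrid": "Local TSDB",
}


def recommend(requirements: Iterable[str]) -> str:
    """Return the best fit for a set of requirements.

    Raises KeyError for a requirement that is not in the matrix.
    """
    fits = {DECISION_MATRIX[r] for r in requirements}
    if not fits:
        raise ValueError("at least one requirement is needed")
    if len(fits) == 1:
        return fits.pop()
    # Requirements pull in both directions: assume a hybrid setup is worth evaluating.
    return "Hybrid"


if __name__ == "__main__":
    print(recommend(["low_latency_queries", "long_term_history"]))
```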
Hybrid Recommendation
- Use remote_write to a long-term object storage backend (e.g., Thanos Receive → S3)
- Keep a short local retention window (e.g., 7–30 days) in Prometheus or VictoriaMetrics
- Let Grafana/Thanos Query or VMSelect handle queries across both hot and cold data
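The hot/cold split in the recommendation above comes down to comparing a query's time range against the local retention window. The following sketch shows only that time-range idea; `QueryPlan` and `plan_query` are hypothetical names, and real query layers typically fan out to several stores and merge or deduplicate results rather than cutting the range at a hard boundary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

TimeRange = Tuple[float, float]  # (start, end) as Unix timestamps in seconds


@dataclass(frozen=True)
class QueryPlan:
    cold: Optional[TimeRange]  # portion served from object storage
    hot: Optional[TimeRange]   # portion served from the local TSDB


def plan_query(
    start: float, end: float, now: float, local_retention_seconds: float
) -> QueryPlan:
    """Split [start, end] at the edge of the local retention window."""
    if start > end:
        raise ValueError("start must not be after end")
    boundary = now - local_retention_seconds  # oldest timestamp still held locally
    if start >= boundary:
        return QueryPlan(cold=None, hot=(start, end))
    if end <= boundary:
        return QueryPlan(cold=(start, end), hot=None)
    return QueryPlan(cold=(start, boundary), hot=(boundary, end))


if __name__ == "__main__":
    seven_days = 7 * 86_400
    # A 30-day query with 7 days of local retention touches both tiers.
    print(plan_query(start=0, end=30 * 86_400, now=30 * 86_400,
                     local_retention_seconds=seven_days))
```

A shorter local window pushes more queries onto the cold path, so the retention setting is effectively a trade-off between disk cost and dashboard latency.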