Choosing Between Object Storage and Local TSDB - datnguyendv/monitoring_tools GitHub Wiki

This guide helps you decide when to use object storage (e.g., S3, GCS) vs local time-series databases (TSDBs) for long-term metrics storage in observability stacks.

Overview

Monitoring stacks need long-term storage for metrics to support historical analysis, compliance, and reliability. There are two main approaches:

  • Object Storage: Cloud-native, durable, scalable
  • Local TSDB: Disk-based storage optimized for fast queries (e.g., Prometheus TSDB, single-node VictoriaMetrics)

Object Storage (S3, GCS, Azure Blob)

Pros:

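  • Virtually unlimited capacity (no disks to provision or resize)
  • High durability (S3 and GCS are designed for 99.999999999%, or "11 nines", annual durability)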
  • Ideal for cost-effective long-term retention (months–years)
  • Integrates natively with Thanos, Cortex, Mimir
  • Multi-region access possible
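To put "11 nines" in perspective, here is a back-of-the-envelope calculation. It assumes the durability figure is an annual, independent per-object loss probability, which is how AWS frames S3's design target (real-world failures are not necessarily independent). The function name is just for illustration.

```python
def expected_annual_object_loss(num_objects: int, nines: int = 11) -> float:
    """Expected number of objects lost per year.

    Assumption: durability = annual probability that one object survives,
    so the per-object loss probability is 10**-nines, and losses are
    independent of each other.
    """
    loss_probability = 10.0 ** -nines  # 11 nines -> 1e-11
    return num_objects * loss_probability


if __name__ == "__main__":
    # e.g. 10 million stored TSDB blocks/chunks/index files
    loss = expected_annual_object_loss(10_000_000)
    print(f"expected objects lost per year: {loss:.0e}")       # 1e-04
    print(f"years per single lost object: {1 / loss:,.0f}")    # 10,000
```

In other words, with 10 million objects you would expect to lose one object roughly once every 10,000 years, which is why object storage is rarely the weak link for metric retention.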

Cons:

  • Query latency is higher (cold data access)
  • Requires cache layers (e.g., Thanos Store Gateway index cache)
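  • Complex architecture with more moving parts (e.g., sidecars, store gateways, compactors)
  • Write amplification: blocks are uploaded, then downloaded and rewritten during compaction (and, in Thanos, downsampling)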

When to Use:

  • You have multi-cluster Prometheus and need a global view
  • You need retention beyond ~90 days
  • You're in cloud-native environments (Kubernetes, AWS/GCP)
  • You need cheap archival storage and can tolerate cold query latency

Local TSDB (e.g., VictoriaMetrics, Prometheus, TSDB on disk)

Pros:

  • Low-latency queries (especially on hot data)
  • Simple deployment and less network dependency
  • Optimized for real-time dashboards and alerts
  • Great performance for high-ingest scenarios

Cons:

  • Disk storage is finite and expensive to scale
  • HA requires extra setup (replication/failover)
  • Limited cross-cluster visibility
  • Retention is capped by disk volume

When to Use:

  • You need fast, real-time queries (e.g., dashboards, alerting, SLO/SLA tracking)
  • You're ingesting >1M samples/sec and need consistently low query latency
  • Your retention needs are short to medium (15–90 days)
  • You're on-premises or in hybrid environments with fast local disks (high IOPS)
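Since retention on a local TSDB is capped by disk, it helps to size the volume up front. The Prometheus storage docs give the rule of thumb `retention_time_seconds * ingested_samples_per_second * bytes_per_sample`, with roughly 1–2 bytes per sample after compression. The sketch below applies that formula (defaulting to the pessimistic 2 bytes/sample); it ignores WAL, index, and compaction headroom, so treat the result as a lower bound.

```python
SECONDS_PER_DAY = 86_400


def needed_disk_bytes(retention_days: float,
                      samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rough local TSDB disk estimate for compressed samples only.

    Rule of thumb from the Prometheus storage docs:
    retention_seconds * ingested_samples_per_second * bytes_per_sample.
    Does not include WAL, index, or free space needed for compaction.
    """
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    for days in (15, 30, 90):
        tib = needed_disk_bytes(days, 1_000_000) / 2**40
        print(f"{days:>3}d at 1M samples/s: ~{tib:.1f} TiB")
```

At 1M samples/s and 2 bytes/sample, 15 days already needs about 2.6 TB (2.4 TiB) of sample data, and 90 days is six times that, which is where the "retention is capped by disk volume" trade-off starts to bite.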

Decision Matrix

Requirement                             Best Fit
--------------------------------------  --------------
Query speed / low latency               Local TSDB
Long-term historical data (6mo+)        Object Storage
Multi-cluster / global observability    Object Storage
Simplicity / low operational burden     Local TSDB
Cloud-native architecture               Object Storage
On-prem / hybrid infrastructure         Local TSDB
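If you want to bake the decision matrix into tooling (say, a platform team's intake checklist), it boils down to a lookup table. The requirement keys and the `recommend` helper below are hypothetical names for this sketch, and the rule "requirements on both sides mean hybrid" is a simplification that mirrors the Hybrid Recommendation section.

```python
from collections import Counter
from enum import Enum


class Backend(Enum):
    LOCAL_TSDB = "Local TSDB"
    OBJECT_STORAGE = "Object Storage"


# Hypothetical requirement keys, one per row of the decision matrix.
DECISION_MATRIX = {
    "low_latency_queries": Backend.LOCAL_TSDB,
    "long_term_history": Backend.OBJECT_STORAGE,   # 6mo+
    "global_view": Backend.OBJECT_STORAGE,         # multi-cluster
    "low_ops_burden": Backend.LOCAL_TSDB,
    "cloud_native": Backend.OBJECT_STORAGE,
    "on_prem_or_hybrid": Backend.LOCAL_TSDB,
}


def recommend(requirements: list[str]) -> str:
    """Return 'Local TSDB', 'Object Storage', or 'Hybrid'.

    Raises KeyError for requirement keys not in DECISION_MATRIX.
    """
    votes = Counter(DECISION_MATRIX[req] for req in requirements)
    local = votes[Backend.LOCAL_TSDB]
    remote = votes[Backend.OBJECT_STORAGE]
    if local and remote:
        return "Hybrid"  # mixed needs: hot data local, history in object storage
    if remote:
        return Backend.OBJECT_STORAGE.value
    return Backend.LOCAL_TSDB.value  # default, incl. no requirements: simplest option


if __name__ == "__main__":
    print(recommend(["low_latency_queries", "global_view"]))
```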

Hybrid Recommendation

  • Use remote_write to a long-term object storage backend (e.g., Thanos Receive → S3)
  • Keep a short local retention window (e.g., 7–30 days) in Prometheus or VictoriaMetrics (VM)
  • Let Grafana/Thanos Query or vmselect handle queries across both hot and cold data
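Conceptually, the query layer has to work out which tier holds which part of a requested time range. The sketch below splits a range at the local-retention cutoff; `split_query_range` is a hypothetical helper, not a Thanos or VictoriaMetrics API. In practice a layer like Thanos Query fans out to every store whose advertised time range overlaps the query and then merges and deduplicates the results, so the tiers usually overlap rather than meeting at a hard boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

TimeRange = Tuple[datetime, datetime]


def split_query_range(
    start: datetime,
    end: datetime,
    now: datetime,
    local_retention: timedelta,
) -> Tuple[Optional[TimeRange], Optional[TimeRange]]:
    """Split [start, end] into (cold, hot) parts at the local-retention cutoff.

    cold -> served from object storage (long-term backend)
    hot  -> served from the local TSDB (Prometheus / VictoriaMetrics)
    Either part is None when the query doesn't touch that tier.
    """
    if start >= end:
        raise ValueError("start must be before end")
    cutoff = now - local_retention  # oldest timestamp still kept locally
    cold = (start, min(end, cutoff)) if start < cutoff else None
    hot = (max(start, cutoff), end) if end > cutoff else None
    return cold, hot


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    cold, hot = split_query_range(now - timedelta(days=30), now, now, timedelta(days=7))
    print("object storage:", cold)
    print("local TSDB:    ", hot)
```

With a 7-day local window, a 30-day dashboard query reads 23 days from object storage and the most recent 7 days from the fast local tier.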