Choosing Between Object Storage and Local TSDB - datnguyendv/monitoring_tools GitHub Wiki

This guide helps you decide when to use object storage (e.g., S3, GCS) vs local time-series databases (TSDBs) for long-term metrics storage in observability stacks.

Overview

Monitoring stacks need long-term storage for metrics to support historical analysis, compliance, and reliability. There are two main approaches:

Object Storage: Cloud-native, durable, scalable
Local TSDB: Disk-based storage tuned for query performance (e.g., Prometheus TSDB, VictoriaMetrics)

Object Storage (S3, GCS, Azure Blob)

Pros:

Virtually unlimited capacity, with no volumes to resize
High durability (S3 and GCS are designed for 99.999999999%, i.e., eleven nines)
Ideal for cost-effective long-term retention (months–years)
Integrates natively with Thanos, Cortex, Mimir
Multi-region access possible

Cons:

Query latency is higher (cold data access)
Requires cache layers (e.g., Thanos Store Gateway index cache)
Complex architecture with more moving parts
Write-amplification due to chunk uploads

When to Use:

You have multi-cluster Prometheus and need a global view
Long-term storage >90 days
You're in cloud-native environments (Kubernetes, AWS/GCP)
You need cheap archival storage and can tolerate cold query latency

Local TSDB (e.g., VictoriaMetrics, Prometheus TSDB on disk)

Pros:

Low-latency queries (esp. hot data)
Simple deployment and less network dependency
Optimized for real-time dashboards and alerts
Great performance for high-ingest scenarios

Cons:

Disk storage is finite and expensive to scale
HA requires extra setup (replication/failover)
Limited cross-cluster visibility
Retention is capped by disk volume

When to Use:

You need fast, real-time queries (e.g., SLAs, dashboards, SLOs)
You're ingesting >1M samples/sec and require tight performance
Use cases with short-to-medium retention (15–90d)
You're on-premise or in hybrid environments with strong IOPS
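Because retention on a local TSDB is capped by disk volume, it can help to estimate the disk footprint before committing to a retention window. The sketch below uses the rough sizing rule from the Prometheus storage documentation (retention time in seconds × ingested samples per second × bytes per sample). The default of 2 bytes per sample is an assumption taken from the upper end of Prometheus' documented 1–2 byte average; VictoriaMetrics and other engines compress differently, and the estimate ignores WAL, index, and compaction headroom. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_disk_bytes(samples_per_sec: float,
                        retention_days: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough disk need: retention seconds * ingest rate * bytes per sample."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_sec * bytes_per_sample


def max_retention_days(disk_bytes: float,
                       samples_per_sec: float,
                       bytes_per_sample: float = 2.0) -> float:
    """Inverse of the rule above: how many days fit on a given volume."""
    bytes_per_day = samples_per_sec * bytes_per_sample * SECONDS_PER_DAY
    return disk_bytes / bytes_per_day


if __name__ == "__main__":
    # The guide's thresholds: 1M samples/sec at 15 and 90 days of retention.
    for days in (15, 90):
        need = required_disk_bytes(1_000_000, days)
        print(f"{days} days at 1M samples/sec: ~{need / 1e12:.2f} TB")
```

Under these assumptions, 1M samples/sec needs roughly 2.6 TB for 15 days and roughly 15.6 TB for 90 days, which gives a sense of where local disk stops being comfortable.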

Decision Matrix

Requirement	Best Fit
Query speed / low-latency	Local TSDB
Long-term historical data (6mo+)	Object Storage
Multi-cluster/global observability	Object Storage
Simplicity / low operational burden	Local TSDB
Cloud-native architecture	Object Storage
On-prem / hybrid infrastructure	Local TSDB
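The matrix above can be expressed as a small lookup, which is handy when a team wants to record its requirements and see which way they lean. This is a minimal sketch: the requirement keys and the `recommend` function are hypothetical names, and returning "Hybrid" when the requirements point in both directions is a design choice of this example rather than a rule from the table.

```python
# Decision matrix from the table above: requirement -> best fit.
DECISION_MATRIX = {
    "low_latency_queries": "Local TSDB",
    "long_term_history": "Object Storage",      # 6 months or more
    "multi_cluster_view": "Object Storage",
    "low_operational_burden": "Local TSDB",
    "cloud_native": "Object Storage",
    "on_prem_or_hybrid": "Local TSDB",
}


def recommend(requirements):
    """Return the best fit for a list of requirement keys.

    If every requirement maps to the same backend, return that backend.
    If the requirements pull in both directions, return "Hybrid".
    """
    if not requirements:
        raise ValueError("at least one requirement is needed")
    unknown = [r for r in requirements if r not in DECISION_MATRIX]
    if unknown:
        raise KeyError(f"unknown requirements: {unknown}")
    fits = {DECISION_MATRIX[r] for r in requirements}
    return fits.pop() if len(fits) == 1 else "Hybrid"


if __name__ == "__main__":
    print(recommend(["low_latency_queries", "on_prem_or_hybrid"]))
    print(recommend(["long_term_history", "multi_cluster_view"]))
    print(recommend(["low_latency_queries", "long_term_history"]))
```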

Hybrid Recommendation

Use remote_write to a long-term object storage backend (e.g., Thanos Receive → S3)
Keep a short local retention window (e.g., 7–30 days) in Prometheus or VictoriaMetrics
Let Grafana/Thanos Query or vmselect handle queries across both hot and cold data
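To illustrate what querying across hot and cold tiers means, the sketch below splits a query's time range at the local retention boundary: anything older than the boundary would be served from object storage, anything newer from the local TSDB. This only illustrates the idea and is not how Thanos Query or any other query layer is implemented; in a real deployment the query layer handles fan-out and deduplication itself. The function name and return shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def split_query_range(start, end, now, local_retention):
    """Split [start, end] into a cold part and a hot part.

    The boundary is `now - local_retention`. Data older than the boundary
    is assumed to live only in object storage (cold); newer data is assumed
    to still be on local disk (hot). Either part may be None.
    """
    if start > end:
        raise ValueError("start must not be after end")
    boundary = now - local_retention
    cold = (start, min(end, boundary)) if start < boundary else None
    hot = (max(start, boundary), end) if end > boundary else None
    return {"cold": cold, "hot": hot}


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    parts = split_query_range(
        start=datetime(2024, 1, 1, tzinfo=timezone.utc),
        end=now,
        now=now,
        local_retention=timedelta(days=15),
    )
    for tier, span in parts.items():
        print(tier, span)
```

With a 15-day local window, a query over the whole of January is split at January 16: the earlier half goes to the cold tier and the later half to the hot tier.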