Architecture Decisions & Comparisons - datnguyendv/monitoring_tools GitHub Wiki

This section compares the key architectural choices for building a scalable monitoring stack. It explains when to use remote_write versus remote_read, and how to choose between Thanos and VictoriaMetrics based on your use case, performance needs, and tolerance for operational complexity.

⚙️ Remote Write vs Remote Read

Remote Write

Definition: Prometheus or vmagent asynchronously sends collected metrics to a remote storage backend (e.g., VictoriaMetrics, Thanos Receive, Cortex).

Use cases:

Offloading time-series data to long-term storage
Reducing load on Prometheus
Decoupling scraping and querying
Multi-tenant ingestion setups

Pros:

Efficient and low-latency ingestion
Prometheus needs only short local retention, so it becomes close to stateless (good for scaling or HA)
Works well with vmagent to push from edge nodes or sidecars

Cons:

Prometheus does not query the remote backend on its own unless remote_read is also configured
Long-term queries usually require an additional query layer (e.g., vmselect, Thanos Query)
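
Because data written remotely is served by a separate query layer, dashboards and scripts usually talk to that layer's Prometheus-compatible HTTP API rather than to Prometheus. A minimal sketch of building such a request, assuming a Thanos Query or vmselect endpoint; the host names are hypothetical, and the ports are the components' usual defaults, which may differ in your deployment:

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus-compatible instant-query URL (/api/v1/query)."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


# Hypothetical hosts; adjust to your deployment.
THANOS_QUERY = "http://thanos-query.example:10902"
# In cluster mode, vmselect prefixes the API with /select/<accountID>/prometheus.
VMSELECT = "http://vmselect.example:8481/select/0/prometheus"

print(instant_query_url(THANOS_QUERY, "up"))
print(instant_query_url(VMSELECT, "sum(rate(http_requests_total[5m]))"))
```

The same helper works for both backends because each exposes the Prometheus query API; only the base URL changes.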

Remote Read

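Definition: Prometheus fetches historical samples from remote storage at query time and evaluates PromQL locally. The backend is typically the one used for remote_write, provided it implements the Prometheus remote read API; not every backend does (VictoriaMetrics, for example, does not).

Use cases: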

Display historical data directly in Prometheus UI
Minimal integration: no external query layer

Pros:

Seamless integration with existing Prometheus dashboards
Simple for hybrid queries that combine local and remote data

Cons:

Can be slow or inefficient for large time ranges, because raw samples are transferred to Prometheus and evaluated there
Not optimized for high cardinality or concurrent queries
Tight coupling between Prometheus and backend performance

Recommendation Summary

Use remote_write for scalable, long-term storage setups and HA designs.
Use remote_read only when you need legacy UI compatibility or minimal architectural changes.
Do not rely solely on remote_read in high-scale or distributed environments.
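
As a rough illustration of these recommendations, the sketch below builds a dictionary that mirrors the `remote_write` and `remote_read` sections of `prometheus.yml`. The URLs are hypothetical placeholders, the `queue_config` values are illustrative rather than tuned, and `remote_read` is added only when a read endpoint is passed in:

```python
import json
from typing import Optional


def remote_storage_config(write_url: str, read_url: Optional[str] = None) -> dict:
    """Return a dict mirroring the remote storage sections of prometheus.yml."""
    config = {
        "remote_write": [
            {
                "url": write_url,
                # Illustrative queue settings; tune them for your ingestion rate.
                "queue_config": {
                    "capacity": 10000,
                    "max_shards": 50,
                    "max_samples_per_send": 2000,
                },
            }
        ]
    }
    if read_url is not None:
        # read_recent=False: recent data is answered from the local TSDB only.
        config["remote_read"] = [{"url": read_url, "read_recent": False}]
    return config


# Hypothetical endpoints; replace with your backend's actual paths.
write_only = remote_storage_config("http://remote-storage.example/api/v1/write")
hybrid = remote_storage_config(
    "http://remote-storage.example/api/v1/write",
    "http://remote-storage.example/api/v1/read",
)
print(json.dumps(hybrid, indent=2))
```

The write-only variant matches the first recommendation; the hybrid variant is the minimal-change option and assumes the backend actually serves the remote read API.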

🔍 Thanos vs VictoriaMetrics

Overview Table

Feature	Thanos	VictoriaMetrics
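Architecture	Modular (Sidecar, Store Gateway, Query, Compactor)	Single-node binary or cluster (vminsert, vmselect, vmstorage)
Storage backend	Object storage (S3, GCS, etc.)	Local disk or block storage (S3/GCS for backups via vmbackup)
Query latency	Medium (older data is fetched from object storage)	Low (data is served from local disk)
Query language	PromQL	MetricsQL (PromQL-compatible, with extensions)
Deduplication	Yes (at query time via replica labels, and in the Compactor)	Yes (via `-dedup.minScrapeInterval`)
High availability	Built-in (replicated Query/Store components)	Built-in (cluster mode with `-replicationFactor`)
Alerting	Thanos Rule	vmalert (VMRule with the Kubernetes operator), sending to an external Alertmanager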
Setup complexity	High	Low to medium
Scaling model	Horizontal via component shards	Horizontal via cluster roles
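
To make the deduplication row more concrete: Thanos Query can deduplicate at query time by treating series that differ only in a configured replica label as a single series. The sketch below illustrates only that grouping step in plain Python. It is a toy model, not Thanos code, and the label name `replica` and the sample series are assumptions:

```python
from collections import defaultdict


def group_by_identity(series_labels, replica_label="replica"):
    """Group label sets that differ only in the replica label."""
    groups = defaultdict(list)
    for labels in series_labels:
        # The identity of a series is every label except the replica label.
        identity = tuple(sorted(
            (name, value) for name, value in labels.items() if name != replica_label
        ))
        groups[identity].append(labels.get(replica_label))
    return dict(groups)


# Two HA Prometheus replicas scraping the same target, plus one other series.
series = [
    {"__name__": "up", "job": "api", "replica": "prom-a"},
    {"__name__": "up", "job": "api", "replica": "prom-b"},
    {"__name__": "up", "job": "db", "replica": "prom-a"},
]

for identity, replicas in group_by_identity(series).items():
    print(dict(identity), "<-", replicas)
```

Each group would then be merged into one output series; how samples are picked within a group is backend-specific and not modeled here.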

When to Use Thanos

Use Thanos if:

You have multiple Prometheus instances and want a single federated query view across them
You need to store metrics in object storage (S3/GCS)
You want to maintain Prometheus-native format and separation of concerns
You are operating in a cloud-native Kubernetes environment
You need multi-tenant metrics aggregation across environments
Your infra has >10 Prometheus shards across clusters and requires unified long-term query access

When to Use VictoriaMetrics

Use VictoriaMetrics if:

You require high ingestion throughput and low query latency at scale
You prefer a simpler deployment model with fewer components
You want to avoid operational complexity of Thanos sidecars/stores
You are using vmagent at edge locations or VMs and want efficient remote_write
Your metric volume exceeds 1M samples/sec and you need optimized TSDB performance
You want to deploy in a hybrid environment (K8s + bare-metal/VM)

Recommendation Summary

Choose Thanos for long-term, cloud-native, multi-cluster setups, particularly when object storage and query federation across regions/clusters are important.
Choose VictoriaMetrics for high-throughput, cost-effective, and low-latency metric pipelines with simpler operations.
Suggested guideline:
- Small-to-medium scale (<100K samples/sec, single cluster): Prometheus + remote_write to single-node VictoriaMetrics is sufficient.
- Medium-to-large (100K–1M samples/sec): a VictoriaMetrics cluster is preferred over Thanos for performance.
- Multi-cluster, cloud-first teams (>5 clusters, 10+ Prometheus instances): Thanos offers better federation and query unification.
- Low-ops team with few infra engineers: VictoriaMetrics simplifies lifecycle management.
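
The thresholds above can be sketched as a small helper. This is only one possible reading of the guideline: the rule ordering (multi-cluster check first) is an assumption, the team-size rule is not encoded, and the ingestion rate is approximated as active series divided by scrape interval:

```python
def estimate_samples_per_sec(active_series: int, scrape_interval_s: float) -> float:
    """Approximate ingestion rate: each active series yields one sample per scrape."""
    return active_series / scrape_interval_s


def suggest_backend(samples_per_sec: float, clusters: int = 1,
                    prometheus_instances: int = 1) -> str:
    """Map the guideline thresholds to a suggestion (assumed rule ordering)."""
    if clusters > 5 and prometheus_instances >= 10:
        return "Thanos"
    if samples_per_sec < 100_000:
        return "VictoriaMetrics single-node"
    return "VictoriaMetrics cluster"


# Example: 1.5M active series scraped every 15 seconds.
rate = estimate_samples_per_sec(1_500_000, 15)
print(rate, "->", suggest_backend(rate))
print(suggest_backend(50_000, clusters=6, prometheus_instances=12))
```

Treat the output as a starting point for discussion, not a sizing decision; real workloads also depend on cardinality, retention, and query patterns.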