Day 9 Monitoring Maturity Model - vinoji2005/GitHub-Repository-Structure-90-Days-Observability-Mastery GitHub Wiki
Distributed tracing allows you to track a single request as it flows through multiple services.
It is useful for debugging microservices, identifying latency sources, and performing root cause analysis (RCA).
Trace: the complete journey of a request through the system.
Example:
Trace ID: fbc6b2ef87d17cba8c40f389e1d3411
Span: a single operation within a trace.
Each span has:
- span_id
- parent_span_id
- start/end time
- duration
- status
- metadata (attributes)
Span A (Frontend)
├── Span B (API)
│   ├── Span C (Auth)
│   └── Span D (Cart)
└── Span E (Logging)
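The parent/child relationship between spans can be sketched in a few lines of Python. This is a minimal illustration and not the OpenTelemetry SDK's data model: the `Span` class, the `build_tree` and `render` helpers, and the single-letter span IDs are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One operation within a trace (hypothetical, simplified model)."""
    span_id: str
    name: str
    parent_span_id: Optional[str] = None  # None marks the root span
    children: List["Span"] = field(default_factory=list)


def build_tree(spans: List[Span]) -> Span:
    """Attach each span to its parent and return the root span."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_span_id is None:
            root = s
        else:
            by_id[s.parent_span_id].children.append(s)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def render(span: Span, depth: int = 0) -> List[str]:
    """Return one indented line per span, visiting children depth-first."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# Same shape as the Frontend/API/Auth/Cart/Logging example (IDs are made up).
spans = [
    Span("a", "Frontend"),
    Span("b", "API", parent_span_id="a"),
    Span("c", "Auth", parent_span_id="b"),
    Span("d", "Cart", parent_span_id="b"),
    Span("e", "Logging", parent_span_id="a"),
]
tree_lines = render(build_tree(spans))
print("\n".join(tree_lines))
```

A tracing backend does essentially this grouping by `parent_span_id` before it draws a timeline, which is why a missing or wrong parent ID shows up as a broken or orphaned branch.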
Context propagation ensures trace continuity across services by passing the trace context between them, using the W3C Trace Context standard.
Example HTTP header:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
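To make the header layout concrete, here is a small sketch that splits a `traceparent` value into its four fields (version, trace ID, parent span ID, flags) and builds the header a service would send downstream. It assumes the version `00` layout only; the `TraceParent` type and the `parse_traceparent` and `child_traceparent` helpers are hypothetical names for this example, and in practice an OpenTelemetry propagator does this work.

```python
import re
import secrets
from typing import NamedTuple


class TraceParent(NamedTuple):
    version: str
    trace_id: str   # 32 lowercase hex characters
    parent_id: str  # 16 lowercase hex characters (the caller's span ID)
    sampled: bool   # lowest bit of the trace-flags field


# version-traceid-parentid-flags, all lowercase hex
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> TraceParent:
    """Parse a version-00 traceparent value, raising ValueError if malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":
        raise ValueError(f"unsupported version in this sketch: {version}")
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError("all-zero trace_id or parent_id is invalid")
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(incoming: TraceParent) -> str:
    """Build the header for an outgoing call: same trace ID, new span ID."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if incoming.sampled else "00"
    return f"{incoming.version}-{incoming.trace_id}-{new_span_id}-{flags}"


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
outgoing = child_traceparent(ctx)
print(ctx)
print(outgoing)
```

The trace ID stays constant across every hop while the span ID changes at each one; that is what lets the backend stitch spans from different services into a single trace.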
docker run -d --name jaeger \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest
Open Jaeger UI:
http://localhost:16686
pip install opentelemetry-instrumentation
opentelemetry-instrument python app.py
Collector config:
exporters:
  jaeger:
    endpoint: jaeger:14250
In Jaeger UI:
- Select a service
- View traces
- Open “Flame Graph” and “Trace Timeline”
Issue: Checkout API slow (4 seconds)
Trace Breakdown:
- API Gateway: 20ms
- Cart Service: 80ms
- Payment Service: 2900ms
- Database (inside Payment): 800ms
Root Cause: Slow downstream payment provider + inefficient DB query.
Tracing → found root cause in minutes.
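The arithmetic behind that conclusion can be checked with a short sketch. It uses only the durations listed in the breakdown and makes two assumptions that the source does not state: that "4 seconds" means exactly 4000 ms, and that the database time is nested inside the Payment Service span rather than added to it. The variable names are illustrative.

```python
# Durations in milliseconds, taken from the trace breakdown.
total_ms = 4000  # assumption: "4 seconds" treated as exactly 4000 ms
spans_ms = {
    "API Gateway": 20,
    "Cart Service": 80,
    "Payment Service": 2900,
}
db_inside_payment_ms = 800  # assumed to be nested in the Payment Service span

# Time the Payment Service spends outside its database call,
# for example waiting on the downstream payment provider.
payment_self_ms = spans_ms["Payment Service"] - db_inside_payment_ms

# Time covered by the listed spans, and time the breakdown leaves unexplained.
accounted_ms = sum(spans_ms.values())
unaccounted_ms = total_ms - accounted_ms

# The span with the largest duration and its share of the total request time.
slowest = max(spans_ms, key=spans_ms.get)
slowest_share = spans_ms[slowest] / total_ms

print(f"Slowest span: {slowest} ({slowest_share:.1%} of the request)")
print(f"Payment time outside the database: {payment_self_ms} ms")
print(f"Database time inside Payment: {db_inside_payment_ms} ms")
print(f"Not covered by the listed spans: {unaccounted_ms} ms")
```

Under these assumptions the Payment Service dominates the request, and most of its time falls outside the database call, which matches the stated root cause. The remaining time that no listed span covers is not explained by the breakdown and would need further spans to attribute.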
- What is a trace?
- What is a span?
- What is a trace ID?
- Why is context propagation important?
- Explain head vs tail sampling.
- What is a span tree?
- How do you integrate tracing with logs and metrics?
- How can tracing reduce MTTR?
- How do retries show up in tracing?
- Design a tracing pipeline using OTel Collector.
- Role of semantic conventions in large orgs.
- How to implement distributed tracing across 500+ microservices.
• Key learnings from Day 9:
• Concepts I need to revisit:
• Tools I will test:
• How tracing can help my project:
• Questions I still have: