Home - vinoji2005/GitHub-Repository-Structure-90-Days-Observability-Mastery GitHub Wiki
90 Days Observability & Monitoring Mastery
Welcome to the official wiki for the 90 Days Observability & Monitoring Mastery learning journey.
This wiki contains:
- Daily learning notes
- Hands-on labs
- Architecture diagrams
- KQL / PromQL queries
- Dashboards & Best Practices
- Final Observability Architecture (Capstone)
Use the navigation below to explore all modules.
📅 Roadmap Overview
Module 1: Foundations (Day 1–10)
- Day 1 - Monitoring vs Observability
- Day 2 - Logs, Metrics, Traces
- Day 3 - Golden Signals
- Day 4 - SLI, SLO, SLA
- Day 5 - Telemetry Pipeline
- Day 6 - OpenTelemetry Overview
- Day 7 - Event Correlation
- Day 8 - Why Monitoring Fails in Enterprises
- Day 9 - Monitoring Maturity Model
- Day 10 - Hands-on: Azure Monitor + Grafana Basics
Module 2: Infrastructure Monitoring (Day 11–20)
- Day 11 - Metrics Deep Dive: Counters, Gauges, Histograms, Summaries, Cardinality, Aggregations, Labels
- Day 12 - Linux Monitoring Deep Dive
- Day 13 - Windows Monitoring Deep Dive
- Day 14 - Container Monitoring
- Day 15 - Kubernetes Monitoring
- Day 16 - Kubernetes Golden Signals
- Day 17 - Network Monitoring
- Day 18 - Database Monitoring
- Day 19 - Storage Monitoring
- Day 20 - Infrastructure Health Dashboard
Module 3: APM (Application Performance Monitoring) (Day 21–35)
- Day 21 - Introduction to APM
- Day 22 - Instrumentation Techniques
- Day 23 - Distributed Tracing Basics
- Day 24 - Trace Context & Span Lifecycle
- Day 25 - API Monitoring
- Day 26 - Microservices Monitoring
- Day 27 - Function App Monitoring
- Day 28 - Dependency Monitoring
- Day 29 - Mobile APM
- Day 30 - Real User Monitoring (RUM)
- Day 31 - Core Web Vitals
- Day 32 - Error Monitoring
- Day 33 - APM + Logs Correlation
- Day 34 - Business Transaction Tracking
- Day 35 - APM Dashboard
Module 4: Synthetic & User Monitoring (Day 36–45)
- Day 36 - Synthetic Monitoring Overview
- Day 37 - Ping/HTTP/DNS Tests
- Day 38 - Multi-step Browser Tests
- Day 39 - API Contract Testing
- Day 40 - Function App Synthetic Tests
- Day 41 - User Journey Monitoring
- Day 42 - Real User Monitoring Advanced
- Day 43 - Session Replay
- Day 44 - Frontend Error Monitoring
- Day 45 - Synthetics Dashboard
Module 5: Logs, Metrics, Traces Deep Dive (Day 46–55)
- Day 46 - Log Types & Architecture
- Day 47 - Metrics Internals
- Day 48 - Trace Lifecycle
- Day 49 - Metric Cardinality Problems
- Day 50 - Log Storage Architecture
- Day 51 - KQL for Logs
- Day 52 - PromQL for Metrics
- Day 53 - Correlation IDs
- Day 54 - High-cardinality Optimization
- Day 55 - Log + Trace Correlation Dashboard
Module 6: Dashboards (Day 56–65)
- Day 56 - Dashboard Best Practices
- Day 57 - Grafana Essentials
- Day 58 - Grafana Polystat Panels
- Day 59 - Azure Monitor Workbooks
- Day 60 - Real-time Dashboards
- Day 61 - Heatmaps & Histograms
- Day 62 - Leadership Dashboards
- Day 63 - SRE Dashboards
- Day 64 - Multi-Environment Dashboards
- Day 65 - Application Health Dashboard
Module 7: Alerts & Incident Response (Day 66–72)
- Day 66 - Alerting Strategies
- Day 67 - Dynamic Alerts
- Day 68 - Alert Noise Reduction
- Day 69 - PagerDuty Integration
- Day 70 - On-call Best Practices
- Day 71 - Error Budget Alerts
- Day 72 - Auto-Remediation
Module 8: Cost Optimization (Day 73–76)
- Day 73 - Observability Cost Drivers
- Day 74 - Log Retention Strategy
- Day 75 - Metrics Optimization
- Day 76 - Cost-Efficient Monitoring Architecture
Module 9: Security + Observability (Day 77–80)
- Day 77 - SIEM vs Observability
- Day 78 - IAM & Audit Monitoring
- Day 79 - Cloud Security Monitoring
- Day 80 - Threat Detection with Telemetry
Module 10: SRE Practices (Day 81–84)
- Day 81 - What SRE Means
- Day 82 - Error Budgets
- Day 83 - Production Readiness
- Day 84 - SRE + Observability Integration
Module 11: Advanced Topics (Day 85–88)
- Day 85 - AIOps
- Day 86 - Capacity Planning
- Day 87 - Chaos Engineering
- Day 88 - Monitoring AI Workloads