Operations and Stackdriver - bobbae/gcp GitHub Wiki

Cloud Operations

AKA Stackdriver

Introduction

Monitor, trace, troubleshoot, and improve application performance on your Google Cloud environment.

https://cloud.google.com/stackdriver/docs

Introduction to basic logging and monitoring with Stackdriver with a quick demo.

https://www.youtube.com/watch?v=LVFr5qW4wO4

Cloud Logging

Cloud Logging allows you to store, search, analyze, monitor, and alert on logging data and events from Google Cloud and Amazon Web Services. Using Cloud Logging includes access to the BindPlane service, which you can use to collect logging data from over 150 common application components, on-premises systems, and hybrid cloud systems.

https://cloud.google.com/logging/docs

GCP Essentials: Cloud Logging.

https://www.youtube.com/watch?v=gyDp-Cl_MdA

Types of metrics

https://cloud.google.com/blog/products/operations/in-depth-explanation-of-operational-metrics-at-google-cloud

Cloud Monitoring

Cloud Monitoring collects metrics, events, and metadata from Google Cloud, Amazon Web Services (AWS), hosted uptime probes, and application instrumentation. Google Cloud's operations suite ingests that data and generates insights via dashboards, charts, and alerts.

https://cloud.google.com/monitoring/docs

Getting started with Cloud Monitoring

https://www.youtube.com/watch?v=wY8cmFY4ua8

Cloud Monitoring Agents

The Cloud Monitoring agent is a collectd-based daemon that gathers system and application metrics from virtual machine instances and sends them to Monitoring. By default, the Monitoring agent collects disk, CPU, network, and process metrics.

https://cloud.google.com/monitoring/agent

Dashboards

https://cloud.google.com/monitoring/charts/dashboards

Metrics Explorer

https://cloud.google.com/monitoring/charts/metrics-explorer

Cloud Trace

Cloud Trace is a distributed tracing system for Google Cloud that collects latency data from applications and displays it in near real-time in the Google Cloud Console.

https://cloud.google.com/trace/docs

Cloud Trace is a feature of the Google Cloud Platform that allows you to view the RPCs (remote procedure calls) invoked by your App Engine application and to view and analyze the time taken to complete each RPC and the overall latency of processing your applications requests.

https://www.youtube.com/watch?v=NCFDqeo7AeY

Cloud Debugger

Cloud Debugger is a feature of Google Cloud Platform that lets you inspect the state of an application, at any code location, without stopping or slowing down the running app. Cloud Debugger makes it easier to view the application state without adding logging statements.

https://cloud.google.com/debugger/docs

Cloud Debugger makes it easier to view the application state at any point in the code without any modifications to your code.

https://www.youtube.com/watch?v=DCtLE6zPMdQ

Cloud Profiler

Cloud Profiler is a statistical, low-overhead profiler that continuously gathers CPU usage and memory-allocation information from your production applications. It attributes that information to the application's source code, helping you identify the parts of the application consuming the most resources, and otherwise illuminating the performance characteristics of the code.

https://cloud.google.com/profiler/docs

Introduction to Stackdriver Profiler

https://www.youtube.com/watch?v=KXjPhadwr8k

Error Reporting

Error Reporting aggregates and displays errors produced in your running cloud services.

https://cloud.google.com/error-reporting/docs

Error reporting can be useful in identifying and resolving bugs in your application.

https://www.youtube.com/watch?v=GANi9eRxhHs

Service Level Monitoring

Service monitoring and the SLO API help you manage your services like Google manages its own services.

https://cloud.google.com/stackdriver/docs/solutions/slo-monitoring

Basics of Service Level Monitoring.

https://www.youtube.com/watch?v=u84TKyX8SfU

GCPDiag

https://gcpdiag.dev/

Qwiklabs

Operations Suite

Google Cloud's Operations Suite

Cloud Monitoring

Monitor and Log with Google Cloud Operations Suite

Cloud Trace

When supporting a production system that services HTTP requests or provides an API, it is important to measure the latency of your endpoints to detect when a system's performance is not operating within specification. In monolithic systems this single latency measure may be useful to detect and diagnose deteriorating behavior. With modern microservice architectures, however, this becomes much more difficult because a single request may result in numerous additional requests to other systems before the request can be fully handled. Deteriorating performance in an underlying system may impact all other systems that rely on it. While latency can be measured at each service endpoint, it can be difficult to correlate slow behavior in the public endpoint with a particular sub-service that is misbehaving.

Using Cloud Trace on Kubernetes Engine

Cloud Logging

Cloud Logging

Application Logs

Fundamentals of Cloud Logging

BigQuery Logging

Using BigQuery and Cloud Logging to Analyze BigQuery Usage

grafana

Using stackdriver with grafana.

https://grafana.com/docs/grafana/latest/datasources/google-cloud-monitoring/