AWS Containers 02 - keshavbaweja-git/guides GitHub Wiki
CloudWatch Container Insights
- https://aws.amazon.com/blogs/containers/fluent-bit-integration-in-cloudwatch-container-insights-for-eks/
- The CloudWatch agent collects container and host metrics, Prometheus metrics, and performance metrics
- Fluent Bit is the log processor and forwarder
- Automated dashboards provide visibility into nodes, clusters, services, pods/tasks, and containers, along with diagnostic information such as CrashLoopBackOff events
- Metrics collected by Container Insights are also available in the Metrics section of the CloudWatch console
CloudWatch Container Insights Deployment
- EKS + EC2: Two DaemonSets (cloudwatch-agent, fluentbit)
- EKS + Fargate: AWS Distro for Open Telemetry (ADOT) deployed as a StatefulSet
- ECS: Container Insights with CloudWatch agent
```shell
aws ecs put-account-setting --name "containerInsights" --value "enabled"
aws ecs update-cluster-settings --cluster myCICluster --settings name=containerInsights,value=enabled
```
- ECS: Container Insights with ADOT
Fluent Bit
- Open-source, multi-platform log processor and forwarder
- Collects data and logs from different sources, unifies them, and sends them to multiple destinations, including CloudWatch Logs
- Fully compatible with Docker and Kubernetes environments
- More resource-efficient than Fluentd
- The AWS for Fluent Bit image, which bundles Fluent Bit with AWS-related plugins, aims to provide a unified experience within the AWS ecosystem and lets Fluent Bit adopt new AWS features faster
Container logging
- It is recommended to push all logs, including application logs, through standard output (stdout) and standard error (stderr) whenever possible, using the Docker JSON logging driver
- In EKS, the Docker JSON logging driver is configured by default, and everything a containerized application writes to stdout or stderr is streamed into a JSON file under `/var/log/containers` on the worker node
- Container Insights classifies these logs into three categories by default and creates a dedicated input stream for each category within Fluent Bit and an independent log group within CloudWatch Logs
- Application logs: all application logs stored under `/var/log/containers/*.log` are streamed into the dedicated /aws/containerinsights/Cluster_Name/application log group
- Host logs: system logs for each EKS worker node (the contents of `/var/log/messages`, `/var/log/dmesg`, and `/var/log/secure`) are streamed into the /aws/containerinsights/Cluster_Name/host log group. Given the stateless and dynamic nature of containerized workloads, where EKS worker nodes are often terminated during scaling activities, streaming these logs in real time with Fluent Bit and keeping them available in CloudWatch Logs even after a node is terminated is critical for observability and for monitoring worker node health. It also lets you debug or troubleshoot cluster issues without logging into worker nodes in many cases, and analyze these logs in a more systematic way
- Data plane logs: EKS already provides control plane logs. With Fluent Bit integration in Container Insights, logs generated by the EKS data plane components that run on every worker node and are responsible for maintaining running pods are captured as data plane logs and streamed into the dedicated /aws/containerinsights/Cluster_Name/dataplane log group. kube-proxy, aws-node, and Docker runtime logs are saved into this log group. Together with control plane logs, data plane logs in CloudWatch Logs help provide a complete picture of your EKS clusters
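As a sketch of how such an input stream is defined, a Fluent Bit `tail` input for the application-log category paired with a CloudWatch Logs output might look like the following. The tag, parser, and region values are illustrative, not the exact configuration shipped with Container Insights:

```
[INPUT]
    Name              tail
    Tag               application.*
    Path              /var/log/containers/*.log
    Parser            docker
    Refresh_Interval  10

[OUTPUT]
    Name              cloudwatch_logs
    Match             application.*
    region            us-east-1
    log_group_name    /aws/containerinsights/Cluster_Name/application
    log_stream_prefix application-
    auto_create_group true
```

Each of the three categories gets its own input/output pair of this shape, which is what produces the three independent log groups described above.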
CloudWatch Dashboard for Fluent Bit
- Container Insights supports a new CloudWatch dashboard to monitor health and performance of your logging components, specifically Fluent Bit.
- Fluent Bit comes with a built-in HTTP server that can be used to query internal information and, more importantly, to expose Prometheus-style metrics via `/api/v1/metrics/prometheus` for each running plugin on each Kubernetes worker node
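Enabling that HTTP server is a small addition to the Fluent Bit service configuration (port 2020 is the conventional default):

```
[SERVICE]
    HTTP_Server  On
    HTTP_Listen  0.0.0.0
    HTTP_Port    2020
```

Once enabled, the metrics can be pulled from a node with, for example, `curl http://127.0.0.1:2020/api/v1/metrics/prometheus`.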
CloudWatch Logs Insights
- Because Container Insights forwards logs from multiple input streams at scale using Fluent Bit and groups them logically, you get a unified logging and analysis experience for your EKS clusters on AWS
- For example, with CloudWatch Logs Insights you can interactively search and analyze all logs generated by your EKS clusters, including application logs, and look for data points, patterns, and trends
- Using Container Insights together with Logs Insights provides the insight you need to understand how your applications and AWS resources are behaving, with no additional setup or maintenance on your side, plus fast, interactive tools to analyze and visualize the data in near real time
- Logs Insights can handle any log format, and it auto-discovers fields from JSON logs. Together, Container Insights and Logs Insights provide a powerful platform to address operational needs and identify areas for improvement within your EKS clusters
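For example, a Logs Insights query against the application log group could count recent errors per pod. The field names below assume the Kubernetes metadata that Fluent Bit attaches to each record; adjust them to your actual log format:

```
fields @timestamp, log, kubernetes.pod_name
| filter log like /ERROR/
| stats count(*) as error_count by kubernetes.pod_name
| sort error_count desc
| limit 20
```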
Container Insights for EKS Fargate with ADOT
- The ADOT Collector is built around the concept of a pipeline, which comprises three key component types: receivers, processors, and exporters
- A receiver is how data gets into the collector. It accepts data in a specified format (pull- or push-based), translates it into the internal format, and passes it to the processors and exporters defined in the pipeline
- A processor is an optional component that performs tasks such as batching, filtering, and transformations on data between being received and being exported
- An exporter determines which destination the metrics, logs, or traces are sent to
- The collector architecture allows multiple instances of such pipelines to be defined via YAML configuration
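A minimal sketch of such a YAML configuration, wiring one receiver through a processor to an exporter. Component options are elided and the namespace/log group values are illustrative, not the exact config used by Container Insights for Fargate:

```yaml
receivers:
  prometheus:
    config: {}            # scrape_configs go here

processors:
  batch:
    timeout: 60s          # batch metrics before export

exporters:
  awsemf:
    namespace: ContainerInsights
    log_group_name: '/aws/containerinsights/{ClusterName}/performance'

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [awsemf]
```

The `service.pipelines` section is where multiple pipelines could be declared side by side, each with its own receiver/processor/exporter chain.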
- The kubelet on each worker node in a Kubernetes cluster exposes resource metrics such as CPU, memory, disk, and network usage at the `/metrics/cadvisor` endpoint
- In the EKS Fargate networking architecture, however, a pod is not allowed to reach the kubelet on its worker node directly. The ADOT Collector therefore calls the Kubernetes API server, which proxies the connection to the kubelet and returns the kubelet's cAdvisor metrics for workloads on that node
- These metrics are made available in Prometheus format, so the collector uses an instance of the Prometheus Receiver as a drop-in replacement for a Prometheus server and scrapes them from the Kubernetes API server endpoint
- Using Kubernetes service discovery, the receiver can discover all worker nodes in an EKS cluster, so a single ADOT Collector instance suffices to collect resource metrics from every node
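Inside the Prometheus Receiver, this discovery-plus-proxy pattern is expressed with a `kubernetes_sd_configs` node role and relabeling that rewrites each discovered target to the API server's proxy path. A sketch (job name and TLS paths are illustrative; `$${1}` is the collector's escaping of the Prometheus `${1}` capture group):

```yaml
scrape_configs:
  - job_name: 'kubelets-cadvisor-metrics'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node                       # one target per worker node
    relabel_configs:
      # send every scrape to the API server...
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      # ...at the node's cAdvisor proxy path
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$${1}/proxy/metrics/cadvisor
```

The same path can be hit manually for a single node with `kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor"`, which is a handy way to verify what the receiver will see.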
- The metrics then go through a series of processors that perform filtering, renaming, data aggregation and conversion, and so on
- The final component in the pipeline is the AWS CloudWatch EMF Exporter, which converts the metrics to embedded metric format (EMF) and sends them directly to CloudWatch Logs using the PutLogEvents API
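For reference, an EMF log event embeds metric metadata alongside the values in a single JSON document; CloudWatch extracts the metrics on ingestion. A hypothetical Python sketch of building such an event (the `_aws` structure follows the EMF specification; the metric and dimension names here are illustrative, not the exact ones emitted by the exporter):

```python
import json
import time

def make_emf_event(cluster, pod, cpu_pct):
    """Build a CloudWatch embedded-metric-format (EMF) log event.

    The "_aws" block tells CloudWatch which top-level keys to extract
    as metrics and which to treat as dimensions when the event is ingested.
    """
    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": "ContainerInsights",
                "Dimensions": [["ClusterName", "PodName"]],
                "Metrics": [{"Name": "pod_cpu_utilization", "Unit": "Percent"}],
            }],
        },
        # dimension values and the metric value live at the top level
        "ClusterName": cluster,
        "PodName": pod,
        "pod_cpu_utilization": cpu_pct,
    }

event = make_emf_event("myEKSCluster", "my-app-7d9c5b8f6-x2x4z", 12.5)
print(json.dumps(event))
```

One such JSON document per log event is what the exporter batches into PutLogEvents calls; no separate PutMetricData traffic is needed.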