AWS_Monitoring - kamialie/knowledge_corner GitHub Wiki

CloudWatch

Application and infrastructure monitoring service.

What is CloudWatch

Enables monitoring and managing various metrics and configuring alarms based on metric data (e.g. resource utilization, application performance and operational health). Collects logs (including logs from custom applications), metrics, and events from most AWS services. Both built-in and custom metrics can be monitored. Information can be gathered from both AWS and on-prem resources.

A CloudWatch alarm uses topics to specify who gets the notification. Topics are created in SNS.

Targets:

  • resources
  • application
  • operations (combination of first 2)

Metric

CloudWatch provides metrics for almost every service in AWS. A metric is a variable to monitor; it belongs to a namespace. A dimension is an attribute of a metric (e.g. instance id, environment, etc); up to 30 dimensions per metric.

EC2 metrics update every 5 minutes by default. Detailed monitoring (paid option) enables 1 minute updates. Free Tier provides 10 detailed monitoring metrics. EC2 memory usage is not pushed by default - must be sent from an instance as custom metric.

Metrics are uniquely identified by a name, a namespace, and zero or more dimensions.

Namespace is simply a container for holding metrics, AWS creates many default ones for existing services, e.g. AWS/EC2.

Dimension acts as a filter. For example, InstanceId dimension can be used to search metrics for specific instance. Some service metrics can be aggregated across dimensions.
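
A minimal sketch of the identity rule above: the same metric name with different dimensions is a different metric. The `metric_key` helper and the sample values are hypothetical, used only for illustration.

```python
def metric_key(namespace, name, dimensions=None):
    """Build a hashable identity from namespace, name and dimensions.

    Dimension order does not matter, so a frozenset of pairs is used.
    """
    return (namespace, name, frozenset((dimensions or {}).items()))


# Same name and namespace, different InstanceId -> two distinct metrics.
cpu_a = metric_key("AWS/EC2", "CPUUtilization", {"InstanceId": "i-aaa"})
cpu_b = metric_key("AWS/EC2", "CPUUtilization", {"InstanceId": "i-bbb"})
```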


Custom metrics are pushed via PutMetricData API request. Metric data up to 2 weeks in the past or up to 2 hours ahead is also accepted.
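
A sketch of a PutMetricData payload that also checks the accepted timestamp window locally. The function name, namespace and metric name are assumptions; the returned dict is shaped so it could be passed to boto3 as `client.put_metric_data(**kwargs)`, which is not called here.

```python
from datetime import datetime, timedelta, timezone

MAX_PAST = timedelta(weeks=2)    # oldest accepted timestamp
MAX_FUTURE = timedelta(hours=2)  # furthest accepted future timestamp


def build_put_metric_data(namespace, name, value, dimensions, timestamp,
                          unit="None", now=None):
    """Return keyword arguments shaped for the PutMetricData API."""
    now = now or datetime.now(timezone.utc)
    if not (now - MAX_PAST <= timestamp <= now + MAX_FUTURE):
        raise ValueError("timestamp outside the accepted window")
    return {
        "Namespace": namespace,
        "MetricData": [{
            "MetricName": name,
            "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
            "Timestamp": timestamp,
            "Value": value,
            "Unit": unit,
        }],
    }
```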

Metric resolution sets the granularity of stored data points via the StorageResolution API parameter.

  • standard - 1 minute
  • high resolution (extra cost) - 1/5/10/30 seconds
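
A small sketch of how the resolution is chosen per data point: StorageResolution takes 60 for standard and 1 for high resolution. The `with_resolution` helper is hypothetical.

```python
def with_resolution(datum, high_resolution=False):
    """Return a copy of a MetricData entry with StorageResolution set."""
    return {**datum, "StorageResolution": 1 if high_resolution else 60}


datum = with_resolution({"MetricName": "QueueDepth", "Value": 7}, high_resolution=True)
```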

Log

CloudWatch Logs aggregates logs from various sources.

  • SDK
  • CloudWatch Logs Agent
  • CloudWatch Unified Agent
  • AWS services like Elastic Beanstalk, ECS, Lambda, VPC, etc

Logs can be sent to S3 (export), Kinesis, Lambda or OpenSearch. Logs subscription provides real-time log events; targets are Kinesis Data Streams, Kinesis Data Firehose and Lambda. Subscription filter can be additionally applied to choose which logs should be delivered. Cross account subscription is also possible.

Logs are stored in log groups (a group usually represents an app). Log streams within a group represent instances of the app, log files or containers. Log retention is set to never expire by default. Encryption is set via KMS at the group level. Logs can be tailed from AWS CLI.

Common issue - missing IAM permissions to send logs.

Filter expressions can be used for searching logs and triggering alarms. Logs Insights allows adding log query results to a dashboard.

Filter results with JSON-specific syntax. E.g. to see all events performed on the CloudWatch Logs service, pass the following string in the search bar:

{ $.eventSource = logs.amazonaws.com }
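
To make the selector semantics concrete, here is a local emulation of a single equality check like the one above. This is only an illustration of what `$.eventSource = value` means, not how CloudWatch evaluates patterns; `select` and `matches` are hypothetical helpers.

```python
import json


def select(event, selector):
    """Resolve a '$.a.b' style selector against a parsed JSON event."""
    value = event
    for part in selector.lstrip("$").strip(".").split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(raw_line, selector, expected):
    """True if the log line is JSON and the selected field equals expected."""
    try:
        event = json.loads(raw_line)
    except json.JSONDecodeError:
        return False  # non-JSON lines never match a JSON pattern
    return select(event, selector) == expected
```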

Agent

Program running on VM (EC2 or on-prem) that can send additional and more granular system-level metrics (RAM, processes, etc) and logs to CloudWatch (source VM must have appropriate permissions). Configurations can be fetched from SSM Parameter Store.

Logs Agent is now deprecated, users should switch to Unified Agent (can be configured using Parameter Store).

# Start the agent
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
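
A sketch of generating a minimal agent configuration file like the `config.json` referenced above. It collects memory usage and one log file; the log file path and log group name are placeholders, and a real setup would likely need more sections.

```python
import json


def build_agent_config(log_file, log_group):
    """Return a minimal unified agent config: memory metric plus one log file."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
            }
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [{
                        "file_path": log_file,
                        "log_group_name": log_group,
                        # The agent substitutes the EC2 instance id here.
                        "log_stream_name": "{instance_id}",
                    }]
                }
            }
        },
    }


config_json = json.dumps(build_agent_config("/var/log/app.log", "my-app"), indent=2)
```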

Metrics

Logs Metric Filter can create a custom metric based on a log pattern. Not retroactive: publishes metric data only for log events ingested after the filter was configured.
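
A sketch of the request for such a filter, shaped for the PutMetricFilter API (e.g. boto3 `logs.put_metric_filter(**kwargs)`, not called here). The function name and all argument values are assumptions.

```python
def build_metric_filter(log_group, filter_name, pattern, metric_namespace, metric_name):
    """Return kwargs that count one metric data point per matching log event."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "metricTransformations": [{
            "metricName": metric_name,
            "metricNamespace": metric_namespace,
            "metricValue": "1",  # the API expects a string
        }],
    }


kwargs = build_metric_filter("my-app", "errors", "ERROR", "Custom/App", "ErrorCount")
```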

Alarm

Triggers notifications based on metrics, including metrics produced by Logs metric filters.

States:

  • OK - not triggered
  • INSUFFICIENT_DATA - not enough data for evaluation
  • ALARM - threshold breached

Targets (actions):

  • EC2 - stop, start, reboot, recover
  • Auto Scaling
  • SNS
  • Systems Manager

Alarm on high resolution custom metric can be set to a period of 10 or 30 seconds. Regular alarm can be set to multiples of 60 seconds.
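
The period rule above can be written as a small check. `is_valid_alarm_period` is a hypothetical helper, not part of any AWS SDK.

```python
def is_valid_alarm_period(period_seconds, high_resolution):
    """Multiples of 60 are always allowed; 10 and 30 only for high-resolution metrics."""
    if period_seconds > 0 and period_seconds % 60 == 0:
        return True
    return high_resolution and period_seconds in (10, 30)
```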

Composite Alarm monitors the state of multiple alarms using AND or OR conditions.
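
A sketch of building the rule expression for a composite alarm, which is passed as `AlarmRule` to PutCompositeAlarm. The helper and the alarm names are hypothetical; real rules can also nest expressions and use other states.

```python
def alarm_rule(operator, alarm_names):
    """Join child alarms in ALARM state with a single AND/OR operator."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


rule = alarm_rule("AND", ["high-cpu", "high-latency"])
```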

Create an alarm for CloudTrail event:

  1. Select Logs -> Log Group Metrics -> IncomingLogEvents metric
  2. Set statistic to sum, period to 1 minute
  3. In conditions set threshold type to static, greater than or equal to 1
  4. Additional configuration - treat missing data as good (not breaching)
  5. Select SNS topic to deliver message
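
The console steps above roughly correspond to the following PutMetricAlarm request (e.g. boto3 `cloudwatch.put_metric_alarm(**kwargs)`, not called here). The function name, alarm name, log group and topic ARN are assumptions; one evaluation period is assumed since the steps do not specify it.

```python
def build_cloudtrail_alarm(alarm_name, log_group, topic_arn):
    """Return kwargs for an alarm on IncomingLogEvents of one log group."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Logs",
        "MetricName": "IncomingLogEvents",
        "Dimensions": [{"Name": "LogGroupName", "Value": log_group}],
        "Statistic": "Sum",                  # step 2
        "Period": 60,                        # step 2: 1 minute
        "EvaluationPeriods": 1,              # assumption, not in the steps
        "Threshold": 1,                      # step 3
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # step 4: missing data is good
        "AlarmActions": [topic_arn],         # step 5: SNS topic
    }
```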

Set alarm state to ALARM for testing:

$ aws cloudwatch set-alarm-state --alarm-name <name> --state-value ALARM --state-reason "testing"

Synthetics Canary

Custom script to monitor APIs, URLs or websites. Usually used to reproduce customer interaction. Scripts can be written in Node.js or Python and have access to a headless Google Chrome browser. Can run once or on a schedule.

Dashboard

Groups together key metrics and alarms in custom dashboards. Global service; can include graphs from other AWS accounts and regions. Can also be shared externally - set up as public, through email, or through an SSO provider via Cognito.

3 dashboards (up to 50 metrics) are free. $3 per dashboard per month for extra dashboards.
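
As a quick worked example of the pricing above (figures taken from this page and possibly out of date), the monthly dashboard cost is:

```python
FREE_DASHBOARDS = 3          # free dashboards, per the note above
PRICE_PER_EXTRA_USD = 3      # per extra dashboard per month, per the note above


def monthly_dashboard_cost(dashboards):
    """Cost in USD for the given number of dashboards."""
    return max(0, dashboards - FREE_DASHBOARDS) * PRICE_PER_EXTRA_USD
```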

Example

Log S3 bucket object deletion.

  1. Create CloudTrail trail - choose a location where to save trail (S3 bucket) and desired resources to monitor - all or specific S3 bucket
  2. Create Lambda function - perform specific action when event occurs; in this example write log, which can be later found in CloudWatch
  3. Create CloudWatch Events (now EventBridge) rule - specify event pattern (specific to resource) and target (in this example a Lambda function)
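
A sketch of steps 2 and 3: an event pattern matching S3 object deletions recorded by CloudTrail, and a Lambda handler that writes a log line. The bucket name is a placeholder and the field layout assumes a CloudTrail `DeleteObject` data event delivered through the rule.

```python
def build_event_pattern(bucket_name):
    """Event pattern for DeleteObject calls on one bucket."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["DeleteObject"],
            "requestParameters": {"bucketName": [bucket_name]},
        },
    }


def lambda_handler(event, context):
    """Log which object was deleted; stdout ends up in CloudWatch Logs."""
    params = (event.get("detail") or {}).get("requestParameters") or {}
    message = f"Deleted s3://{params.get('bucketName')}/{params.get('key')}"
    print(message)
    return message
```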

CloudTrail

Comprehensive API auditing and monitoring (event tracker and security analysis tool).

Logs most actions taken on AWS resources, such as API calls, AWS Console actions, and actions by other AWS services. That includes what resources were modified and who or what took which action. Automatically enabled when an account is created. By default, keeps all activity from the last 90 days - available in Event history. To retain logs, create metric events, trigger alerts and so on, create a trail - it saves logs to S3 and optionally to CloudWatch Logs for notifications.

Typically takes about 15 minutes to update after an API call. Review, filter, and download of account activity for the past 90 days is free.

Logs can be easily searched and filtered using CloudWatch Logs or Athena. Create an Athena table directly from the CloudTrail Event history page.

Table schema for querying CloudTrail logs:

  • the last line contains the full path to the S3 bucket where log files are saved.

Event types

Management events include configuration changes to AWS services (creating VPC, instances), reading resources (enumerating security groups), logging in to the management console, assuming a role and so on. Logged by default. Read and Write events can be separated.

Data events include object-level access to S3 and Lambda function executions. Generally high-volume events, therefore not logged by default. Read and Write events can be separated.
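
A sketch of enabling S3 data events on a trail while separating Read and Write, shaped for the PutEventSelectors API (e.g. boto3 `cloudtrail.put_event_selectors(**kwargs)`, not called here). The function name, trail name and bucket name are assumptions.

```python
def build_event_selectors(trail_name, bucket_name, read_write_type="WriteOnly"):
    """Return kwargs that log management events plus S3 object-level events."""
    if read_write_type not in ("ReadOnly", "WriteOnly", "All"):
        raise ValueError("unsupported ReadWriteType")
    return {
        "TrailName": trail_name,
        "EventSelectors": [{
            "ReadWriteType": read_write_type,
            "IncludeManagementEvents": True,
            "DataResources": [{
                "Type": "AWS::S3::Object",
                # Trailing slash: all objects in this bucket.
                "Values": [f"arn:aws:s3:::{bucket_name}/"],
            }],
        }],
    }
```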

Insights events are an optional feature that detects unusual activity in the account, e.g. hitting service limits, bursts of IAM actions, inaccurate resource provisioning, etc. Creates a baseline for normal management activity and analyzes Write events. Not enabled by default (comes with a cost). Results appear in the Console and can be sent to S3 and/or EventBridge.

Trusted Advisor

Inspects AWS infrastructure and provides real-time recommendations based on AWS best practices, in the following categories:

  • cost optimization
  • performance
  • security
  • fault tolerance
  • service limits

7 core Trusted Advisor checks (for free):

  • S3 bucket permissions
  • security groups
  • IAM use
  • MFA on root
  • EBS public snapshots
  • RDS public snapshots
  • service limits

Full Trusted Advisor (available on Business and Enterprise support plans) also provides programmatic access using support API and the ability to set CloudWatch alarms when reaching limits.

Config

Tracks compliance and configuration changes over time based on rules, automates the evaluation of recorded configurations against desired configuration. Free tier is not available.

Provides configuration history for resources, and keeps records of changes in S3. Works as a per-region service, but results can be aggregated across regions and accounts. Notifications about non-compliant resources can be sent to EventBridge. Configuration changes and compliance state can be sent to SNS.

AWS managed rules or custom rules can be specified. Provides Conformance packs with rules for specific compliance standards. Custom rules must be defined in Lambda. Rules can be evaluated after each configuration change and/or on schedule.
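
A sketch of the evaluation logic inside a custom rule Lambda. The rule itself (resources must carry an `Environment` tag) is a made-up example; the result would normally be reported back with `put_evaluations` together with the event's `resultToken`, which is left out here.

```python
import json


def evaluate(configuration_item):
    """Hypothetical rule: the resource must have an 'Environment' tag."""
    tags = configuration_item.get("tags") or {}
    return "COMPLIANT" if "Environment" in tags else "NON_COMPLIANT"


def build_evaluation(event):
    """Turn a Config invocation event into one evaluation record."""
    item = json.loads(event["invokingEvent"])["configurationItem"]
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate(item),
        "OrderingTimestamp": item["configurationItemCaptureTime"],
    }
```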

Remediation automation uses SSM Automation Documents to perform actions.
