AWS_Monitoring - kamialie/knowledge_corner GitHub Wiki
CloudWatch is an application and infrastructure monitoring service.
Enables monitoring and managing various metrics and configuring alarms based on metric data (e.g. resource utilization, application performance and operational health). Collects logs (from custom applications as well), metrics, and events from most AWS services. Both built-in and custom metrics can be monitored. Information can be gathered from both AWS and on-prem.
A CloudWatch alarm uses topics to specify who gets the notification. A topic is created in SNS.
Targets:
- resources
- application
- operations (combination of first 2)
CloudWatch provides metrics for almost every service in AWS. A metric is a variable to monitor; it belongs to a namespace. A dimension is an attribute of a metric (e.g. instance ID, environment, etc.); up to 30 dimensions per metric (the limit was 10 before 2022).
EC2 metrics update every 5 minutes by default. Detailed monitoring (paid option) enables 1-minute updates. Free Tier provides 10 detailed monitoring metrics. EC2 memory usage is not pushed by default; it must be sent from the instance as a custom metric.
Metrics are uniquely identified by a name, a namespace, and zero or more dimensions.
A namespace is simply a container for holding metrics; AWS creates many default ones for existing services, e.g. AWS/EC2.
A dimension acts as a filter. For example, the InstanceId dimension can be used to search metrics for a specific instance. Some service metrics can be aggregated across dimensions.
Custom metrics are pushed via the PutMetricData API request. Metric data with a timestamp up to 2 weeks in the past or up to 2 hours ahead is also accepted.
Metric resolution (update frequency) is set via the StorageResolution API parameter:
- standard - 1 minute
- high resolution (extra cost) - 1/5/10/30 seconds
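A minimal sketch of a custom metric payload, assuming Python with boto3 for the actual call. The namespace `MyApp`, the metric name and the instance ID are hypothetical. The dictionary mirrors the PutMetricData parameters; StorageResolution itself takes 1 (high resolution) or 60 (standard).

```python
from datetime import datetime, timezone


def build_metric_payload(namespace, name, value, dimensions,
                         unit="Percent", high_resolution=False):
    """Build keyword arguments for the CloudWatch PutMetricData request."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                # Dimensions are name/value pairs that identify the metric.
                "Dimensions": [
                    {"Name": key, "Value": val} for key, val in dimensions.items()
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(value),
                "Unit": unit,
                # 1 = high resolution, 60 = standard resolution.
                "StorageResolution": 1 if high_resolution else 60,
            }
        ],
    }


# Hypothetical names, for illustration only.
payload = build_metric_payload(
    "MyApp", "MemoryUtilization", 42.5, {"InstanceId": "i-0123456789abcdef0"}
)

# To publish for real (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**payload)
```

Keeping the payload builder separate from the API call makes it easy to inspect what would be sent before any credentials are involved.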
CloudWatch Logs aggregates logs from various sources:
- SDK
- CloudWatch Logs Agent
- CloudWatch Unified Agent
- AWS services like Elastic Beanstalk, ECS, Lambda, VPC, etc.
Logs can be sent to S3 (export), Kinesis, Lambda or OpenSearch. A logs subscription provides real-time log events; targets are Kinesis Data Streams, Kinesis Data Firehose and Lambda. A subscription filter can additionally be applied to choose which logs should be delivered. Cross-account subscription is also possible.
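A minimal sketch of a subscription filter request, assuming Python with boto3 for the actual call. The log group, filter name and ARNs are hypothetical. An empty filter pattern matches every log event; a role ARN is needed for Kinesis destinations but not for Lambda.

```python
def build_subscription_filter(log_group, filter_name, destination_arn,
                              pattern="", role_arn=None):
    """Build keyword arguments for the CloudWatch Logs PutSubscriptionFilter request."""
    params = {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,  # empty pattern = deliver all events
        "destinationArn": destination_arn,
    }
    if role_arn is not None:
        # Role that lets CloudWatch Logs write to the Kinesis destination.
        params["roleArn"] = role_arn
    return params


# Hypothetical values, for illustration only.
subscription = build_subscription_filter(
    "my-app",
    "errors-to-kinesis",
    "arn:aws:kinesis:us-east-1:123456789012:stream/app-logs",
    pattern="ERROR",
    role_arn="arn:aws:iam::123456789012:role/cwl-to-kinesis",
)

# To create it for real (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("logs").put_subscription_filter(**subscription)
```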
Logs are stored in log groups (a group usually represents an app). Log streams within a group represent instances of the app, log files or containers. Log retention is set to never expire by default. Encryption is set via KMS at the group level. Logs can be tailed from the AWS CLI.
Common issue: missing or incorrect IAM permissions to send logs.
Logs can use filter expressions for searching and alarm triggering. Logs Insights allows adding log query results to a dashboard.
Filter results with the JSON-specific syntax. For example, to see all events performed on the CloudWatch Logs service, pass the following string in the search bar:
{ $.eventSource = logs.amazonaws.com }
The CloudWatch agent is a program running on a VM (EC2 or on-prem) that can send additional and more granular system-level metrics (RAM, processes, etc.) and logs to CloudWatch (the source VM must have appropriate permissions). Configuration can be fetched from SSM Parameter Store.
The Logs Agent is now deprecated; users should switch to the Unified Agent (which can be configured using Parameter Store).
# Start the agent
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
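A minimal sketch of what the `config.json` referenced by the command above might contain, generated from Python. The log file path and log group name are hypothetical; the layout follows the agent's JSON configuration format, with `mem_used_percent` covering the memory metric that EC2 does not publish by default.

```python
import json


def build_agent_config(log_file, log_group, namespace="CWAgent"):
    """Build a small Unified Agent configuration: one memory metric, one log file."""
    return {
        "metrics": {
            "namespace": namespace,  # CWAgent is the agent's default namespace
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file,
                            "log_group_name": log_group,
                            # Placeholder the agent replaces with the instance ID.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


# Hypothetical path and group name, for illustration only.
config = build_agent_config("/var/log/app.log", "my-app")
config_json = json.dumps(config, indent=2)
```

The resulting JSON can be written to a file or stored as a Parameter Store value and then loaded with `fetch-config`.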
A Logs Metric Filter can create a custom metric based on a log pattern. It publishes metric data only for log events received after the filter was configured; it is not applied retroactively.
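A minimal sketch of a metric filter request, assuming Python with boto3 for the actual call. The log group, filter name, metric name and namespace are hypothetical; the pattern reuses the JSON filter syntax shown earlier. Each matching log event adds 1 to the metric.

```python
def build_metric_filter(log_group, filter_name, pattern, metric_name, namespace):
    """Build keyword arguments for the CloudWatch Logs PutMetricFilter request."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",   # value published per matching event
                "defaultValue": 0.0,  # value published when nothing matches
            }
        ],
    }


# Hypothetical names, for illustration only.
metric_filter = build_metric_filter(
    "cloudtrail-logs",
    "logs-service-events",
    "{ $.eventSource = logs.amazonaws.com }",
    "LogsServiceEvents",
    "MyApp/Audit",
)

# To create it for real (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("logs").put_metric_filter(**metric_filter)
```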
Alarms trigger notifications based on metrics or Logs metric filters.
States:
- OK - not triggered
- INSUFFICIENT_DATA - not enough data for evaluation
- ALARM - threshold breached
Targets (actions):
- EC2 - stop, terminate, reboot, recover
- Auto Scaling
- SNS
- Systems Manager
An alarm on a high-resolution custom metric can be set to a period of 10 or 30 seconds. A regular alarm can be set to multiples of 60 seconds.
Composite Alarm monitors the state of multiple alarms using AND or OR conditions.
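A minimal sketch of the two rules above in Python: a period check and a composite alarm rule expression. The alarm names are hypothetical. The rule string uses the `ALARM("name")` form that the PutCompositeAlarm request expects in its AlarmRule parameter.

```python
def is_valid_alarm_period(seconds, high_resolution=False):
    """Return True if the period is allowed for the given kind of metric."""
    if seconds > 0 and seconds % 60 == 0:
        return True  # multiples of 60 are valid for any alarm
    # 10 and 30 seconds are only valid for high-resolution metrics.
    return high_resolution and seconds in (10, 30)


def composite_rule(alarm_names, operator="AND"):
    """Combine child alarms into a composite alarm rule expression."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


# Hypothetical alarm names, for illustration only.
rule = composite_rule(["cpu-high", "mem-high"])

# To create it for real (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_composite_alarm(
#       AlarmName="cpu-and-mem-high", AlarmRule=rule)
```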
Create an alarm for a CloudTrail event:
- Select Logs -> Log Group Metrics -> IncomingLogEvents metric
- Set statistic to sum, period to 1 minute
- In conditions set threshold type to static, greater than or equal to 1
- Additional configuration - treat missing data as good (not breaching)
- Select an SNS topic to deliver the message
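The console steps above can be sketched as a PutMetricAlarm request, assuming Python with boto3 for the actual call. The alarm name, log group and SNS topic ARN are hypothetical, and the single evaluation period is an assumption not stated in the steps.

```python
def build_cloudtrail_alarm(alarm_name, log_group, sns_topic_arn):
    """Build keyword arguments for the CloudWatch PutMetricAlarm request."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Logs",
        "MetricName": "IncomingLogEvents",
        "Dimensions": [{"Name": "LogGroupName", "Value": log_group}],
        "Statistic": "Sum",
        "Period": 60,            # 1 minute
        "EvaluationPeriods": 1,  # assumption: alarm after one breaching period
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # missing data counts as good
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical names, for illustration only.
alarm = build_cloudtrail_alarm(
    "cloudtrail-activity",
    "cloudtrail-logs",
    "arn:aws:sns:us-east-1:123456789012:alerts",
)

# To create it for real (needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```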
Set alarm state to ALARM for testing:
$ aws cloudwatch set-alarm-state --alarm-name <name> --state-value ALARM --state-reason "testing"
A Synthetics canary is a custom script that monitors APIs, URLs or websites, usually used to reproduce customer interaction. The script can be written in Node.js or Python and has access to a headless Google Chrome browser. It can run once or on a schedule.
Dashboards group together and show key metrics and alarms. Dashboards are a global service and can include graphs from other AWS accounts and regions. They can also be shared externally: publicly, with specific email addresses, or through an SSO provider via Cognito.
3 dashboards (up to 50 metrics) are free. Extra dashboards cost $3 per dashboard per month.
Example: log S3 bucket object deletion.
- Create a CloudTrail trail - choose a location where to save the trail (S3 bucket) and the resources to monitor (all buckets or a specific S3 bucket)
- Create a Lambda function - performs a specific action when the event occurs; in this example it writes a log entry, which can later be found in CloudWatch
- Create a CloudWatch Events rule - specify the event pattern (specific to the resource) and the target (in this example the Lambda function)
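A minimal sketch of the rule's event pattern and the Lambda function from the steps above, in Python. The bucket name and object key are hypothetical. The pattern matches `DeleteObject` calls recorded by CloudTrail, and the handler prints a line that ends up in the function's CloudWatch log group.

```python
def build_event_pattern(bucket_name):
    """Event pattern matching S3 object deletions recorded by CloudTrail."""
    return {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["DeleteObject"],
            "requestParameters": {"bucketName": [bucket_name]},
        },
    }


def lambda_handler(event, context):
    """Log which object was deleted; print output goes to CloudWatch Logs."""
    detail = event.get("detail", {})
    params = detail.get("requestParameters") or {}
    message = (
        f"{detail.get('eventName')} on "
        f"s3://{params.get('bucketName')}/{params.get('key')}"
    )
    print(message)
    return message


# Hypothetical event, shaped like a CloudTrail record delivered by the rule.
sample_event = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "s3.amazonaws.com",
        "eventName": "DeleteObject",
        "requestParameters": {"bucketName": "my-bucket", "key": "reports/old.csv"},
    },
}
result = lambda_handler(sample_event, None)
```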
CloudTrail provides comprehensive API auditing and monitoring (event tracker and security analysis tool).
It logs most actions taken on AWS resources, such as API calls, AWS Console actions, and actions by other AWS services. That includes what resources were modified and who or what took which action. It is automatically enabled when an account is created. By default, it keeps all activity from the last 90 days, available in Event history. To retain logs, create metric events, trigger alerts and so on, create a trail, which saves logs to S3 and optionally to CloudWatch Logs for notifications.
Typically takes about 15 minutes to update after an API call. Review, filter, and download of account activity for the past 90 days is free.
Logs can be easily searched and filtered using CloudWatch Logs or Athena.
An Athena table can be created directly from the CloudTrail Event history page.
In the table schema for querying CloudTrail logs, the last line contains the full path to the S3 bucket where log files are saved.
Management events include configuration changes to AWS services (creating a VPC, instances), reading resources (enumerating security groups), logging in to the management console, assuming a role and so on. Logged by default. Read and Write events can be separated.
Data events include object-level access to S3 and Lambda function execution. These are generally high-volume events and are therefore not logged by default. Read and Write events can be separated.
Insights events are an optional feature that detects unusual activity in the account, e.g. hitting service limits, bursts of IAM actions, inaccurate resource provisioning, etc. The feature creates a baseline for normal management activity and analyzes Write events. Not enabled by default (comes at a cost). Results appear in the Console and can be sent to S3 and/or EventBridge.
Trusted Advisor inspects AWS infrastructure and provides real-time recommendations based on AWS best practices, in the following categories:
- cost optimization
- performance
- security
- fault tolerance
- service limits
7 core Trusted Advisor checks (for free):
- S3 bucket permissions
- security groups
- IAM use
- MFA on root
- EBS public snapshots
- RDS public snapshots
- service limits
Full Trusted Advisor (available on Business and Enterprise support plans) also provides programmatic access using the Support API and the ability to set CloudWatch alarms when reaching limits.
AWS Config tracks compliance and configuration changes over time based on rules and automates the evaluation of recorded configurations against the desired configuration. A free tier is not available.
Provides configuration history for resources and keeps records of changes in S3. Works as a per-region service, but results can be aggregated across regions and accounts. Notifications about non-compliant resources can be sent to EventBridge. Configuration changes and compliance state can be sent to SNS.
AWS managed rules or custom rules can be specified. Provides Conformance packs with rules for specific compliance standards. Custom rules must be defined in Lambda. Rules can be evaluated after each configuration change and/or on a schedule.
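A minimal sketch of a custom rule defined in Lambda, in Python. The required tag name is hypothetical and the rule logic is only an illustration. The handler reads the configuration item from the invoking event and reports the result back with the PutEvaluations request (via boto3, imported lazily so the evaluation logic can be tested on its own).

```python
import json

REQUIRED_TAG = "environment"  # hypothetical tag every resource must carry


def evaluate(configuration_item):
    """Return the compliance state of one recorded resource configuration."""
    tags = configuration_item.get("tags") or {}
    return "COMPLIANT" if REQUIRED_TAG in tags else "NON_COMPLIANT"


def build_evaluation(event):
    """Turn the event AWS Config sends to the function into one evaluation."""
    item = json.loads(event["invokingEvent"])["configurationItem"]
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": evaluate(item),
        "OrderingTimestamp": item["configurationItemCaptureTime"],
    }


def lambda_handler(event, context):
    """Entry point: report the evaluation back to AWS Config."""
    import boto3  # needs boto3 and AWS credentials at run time

    boto3.client("config").put_evaluations(
        Evaluations=[build_evaluation(event)],
        ResultToken=event["resultToken"],
    )


# Hypothetical event, for illustration only.
sample_event = {
    "invokingEvent": json.dumps({
        "configurationItem": {
            "resourceType": "AWS::EC2::Instance",
            "resourceId": "i-0123456789abcdef0",
            "configurationItemCaptureTime": "2024-01-01T00:00:00.000Z",
            "tags": {"team": "platform"},
        }
    }),
    "resultToken": "token",
}
evaluation = build_evaluation(sample_event)
```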
Remediation automation uses SSM Automation Documents to perform actions.