Assessment ‐ Advanced Actionplan - Campus-Castolo/m300 GitHub Wiki
🚀 Action Plan to Advanced – M300 Competency Completion Guide
This action plan outlines the precise technical steps I need to take to upgrade all remaining intermediate competencies to the Advanced level in the M300 matrix.
🔷 D1 – Netzwerkverbindungen konfigurieren und testen (Network Configuration & Testing)
🎯 Goals:
- Provide full visibility into network traffic and evidence of connectivity and access control.
- Monitor and document VPC traffic and security posture.
✅ Tasks:
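- Enable VPC Flow Logs

```hcl
resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
  name              = "/aws/vpc/flow-logs"
  retention_in_days = 14
}

resource "aws_flow_log" "vpc" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination      = aws_cloudwatch_log_group.vpc_flow_logs.arn
  log_destination_type = "cloud-watch-logs"
  # Required for CloudWatch Logs delivery: a role the VPC Flow Logs
  # service can assume (defined separately, not shown here).
  iam_role_arn         = aws_iam_role.vpc_flow_logs.arn
}
```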
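
When CloudWatch Logs is the destination, the flow log must be given an IAM role that the VPC Flow Logs service can assume. The following is a minimal sketch. The role name `vpc-flow-logs` is an assumption. The resource is named `aws_iam_role.vpc_flow_logs` so its ARN can be passed to the flow log's `iam_role_arn` argument.

```hcl
# Trust policy: let the VPC Flow Logs service assume this role.
data "aws_iam_policy_document" "flow_logs_assume" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["vpc-flow-logs.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "vpc_flow_logs" {
  name               = "vpc-flow-logs" # assumed name
  assume_role_policy = data.aws_iam_policy_document.flow_logs_assume.json
}

# Permissions to write flow-log records into CloudWatch Logs.
data "aws_iam_policy_document" "flow_logs_write" {
  statement {
    effect = "Allow"
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogGroups",
      "logs:DescribeLogStreams",
    ]
    # Broad scope for simplicity; can be narrowed to the flow-log group.
    resources = ["*"]
  }
}

resource "aws_iam_role_policy" "vpc_flow_logs" {
  name   = "vpc-flow-logs-write"
  role   = aws_iam_role.vpc_flow_logs.id
  policy = data.aws_iam_policy_document.flow_logs_write.json
}
```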
- ECS ↔ RDS Test
  - Use ECS Exec to open a shell in a running ECS task, then test the MySQL port: `nc -vz <rds-endpoint> 3306`
  - Take a screenshot of the output and document the result.
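
This test depends on two prerequisites, sketched below in Terraform:

- The RDS security group must accept MySQL traffic from the ECS task security group. The names `aws_security_group.rds` and `aws_security_group.ecs_tasks` are hypothetical. The rule resource requires AWS provider v5 or later.
- ECS Exec must be enabled on the service with `enable_execute_command = true`. The task role also needs the `ssmmessages` permissions.

Once both are in place, the shell can be opened with `aws ecs execute-command --cluster <cluster> --task <task-id> --container <container> --interactive --command "/bin/sh"`. The `nc` check also assumes `nc` is installed in the container image.

```hcl
# Hypothetical SG names; replace with the project's actual security groups.
# Allows MySQL (3306/tcp) into the RDS security group only from ECS tasks.
resource "aws_vpc_security_group_ingress_rule" "rds_from_ecs" {
  security_group_id            = aws_security_group.rds.id
  referenced_security_group_id = aws_security_group.ecs_tasks.id
  ip_protocol                  = "tcp"
  from_port                    = 3306
  to_port                      = 3306
  description                  = "MySQL from ECS tasks"
}

# ECS Exec is switched on inside the existing service definition, e.g.:
# resource "aws_ecs_service" "wordpress" {
#   ...
#   enable_execute_command = true
# }
```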
- Lambda ↔ RDS Snapshot Verification
  - In CloudWatch, verify the snapshot logs:
    - Log group: `/aws/lambda/rds-backup`
    - Look for `CreateDBSnapshot` events.
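
A CloudWatch Logs metric filter can count matching log lines, which makes this check repeatable instead of relying on a one-off console search. This sketch makes two assumptions:

- The backup Lambda writes the string `CreateDBSnapshot` to its log output.
- The log group already exists. Lambda creates it on first invocation unless Terraform manages it.

The metric name and namespace are hypothetical.

```hcl
# Counts log lines containing "CreateDBSnapshot" (assumes the Lambda logs it).
resource "aws_cloudwatch_log_metric_filter" "rds_snapshot_calls" {
  name           = "rds-backup-create-snapshot"
  log_group_name = "/aws/lambda/rds-backup"
  pattern        = "CreateDBSnapshot"

  metric_transformation {
    name      = "CreateDBSnapshotCount" # hypothetical metric name
    namespace = "M300/Backup"           # hypothetical namespace
    value     = "1"
  }
}
```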
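- (Optional) Add an ICMP allow rule and run a ping test between ECS and RDS, if policy allows it. RDS may not answer ICMP even when the rule is in place, so the TCP test above remains the primary proof.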
- Document all results in `corrected_network_topology.md`.
🔷 E2 – Betrieb und Überwachung von Services (Service Operations & Observability)
🎯 Goals:
- Validate observability, restore capability, and automated recovery behaviors.
✅ Tasks:
- Create CloudWatch Dashboards
  - ECS metrics: `CPUUtilization`, `MemoryUtilization`
  - RDS metrics: `FreeStorageSpace`, `DatabaseConnections`
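
The dashboard can be sketched in Terraform as follows. The ECS resource names match the autoscaling snippet in this plan. The region `eu-central-1` and the RDS resource name `aws_db_instance.main` are assumptions. `FreeStorageSpace` is reported in bytes and `DatabaseConnections` as a count, so the second metric is placed on the right y-axis.

```hcl
# Assumptions: region eu-central-1 and an RDS resource named aws_db_instance.main.
locals {
  dashboard_region = "eu-central-1"
}

resource "aws_cloudwatch_dashboard" "m300" {
  dashboard_name = "m300-overview"

  dashboard_body = jsonencode({
    widgets = [
      {
        # ECS service CPU and memory (both in percent)
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "ECS service CPU / Memory"
          region = local.dashboard_region
          stat   = "Average"
          period = 300
          metrics = [
            ["AWS/ECS", "CPUUtilization", "ClusterName", aws_ecs_cluster.main.name, "ServiceName", aws_ecs_service.wordpress.name],
            ["AWS/ECS", "MemoryUtilization", "ClusterName", aws_ecs_cluster.main.name, "ServiceName", aws_ecs_service.wordpress.name],
          ]
        }
      },
      {
        # RDS free storage (bytes, left axis) and connections (count, right axis)
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "RDS storage / connections"
          region = local.dashboard_region
          stat   = "Average"
          period = 300
          metrics = [
            ["AWS/RDS", "FreeStorageSpace", "DBInstanceIdentifier", aws_db_instance.main.identifier],
            ["AWS/RDS", "DatabaseConnections", "DBInstanceIdentifier", aws_db_instance.main.identifier, { yAxis = "right" }],
          ]
        }
      },
    ]
  })
}
```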
- Configure Autoscaling for ECS

```hcl
resource "aws_appautoscaling_target" "ecs" {
  max_capacity       = 4
  min_capacity       = 1
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.wordpress.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu" {
  policy_type = "TargetTrackingScaling"
  ...
}
```
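
The policy body is elided in the snippet above. The following is a minimal complete version of `aws_appautoscaling_policy.cpu`. It reuses the scalable target's identifiers so the policy and target stay in sync. The 70% average CPU target and the cooldown values are assumptions to tune.

```hcl
resource "aws_appautoscaling_policy" "cpu" {
  name               = "ecs-wordpress-cpu-target"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    # Keep average service CPU around 70% (assumed target).
    target_value       = 70
    scale_in_cooldown  = 300 # seconds before scaling in again
    scale_out_cooldown = 60  # seconds before scaling out again

    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
  }
}
```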
- Trigger and Document CloudWatch Alarm
  - Temporarily lower the alarm threshold (e.g., alarm when CPU > 5%) so that normal load triggers it.
  - Confirm that the SNS alert arrives via email/SMS.
  - Take a screenshot of the received notification.
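
The alarm under test could look like the following sketch, assuming an email subscription. The address is a placeholder. AWS sends a confirmation mail that must be accepted before any alert is delivered. With `threshold = 5` and `GreaterThanThreshold`, ordinary background load is likely to push the alarm into `ALARM`. Restore a realistic threshold after taking the screenshot.

```hcl
resource "aws_sns_topic" "alerts" {
  name = "m300-alerts"
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "student@example.com" # placeholder; confirm the subscription mail
}

# Deliberately low threshold for the test; raise it again afterwards.
resource "aws_cloudwatch_metric_alarm" "ecs_cpu_test" {
  alarm_name          = "m300-ecs-cpu-test"
  alarm_description   = "Test alarm: ECS service CPU above 5%"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 1
  threshold           = 5
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.wordpress.name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
  ok_actions    = [aws_sns_topic.alerts.arn]
}
```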
- Restore from Snapshot
  - Create a new DB instance from an RDS snapshot via the console or Terraform.
  - Validate the login with a MySQL client.
  - Document the procedure in `updated_security_concept.md`.
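
A Terraform variant of the restore uses the `aws_db_snapshot` data source to look up the most recent snapshot of the source instance. The following names and values are assumptions:

- the source instance `aws_db_instance.main`
- the subnet group and security group names
- the instance class

The engine and master credentials come from the snapshot. The login can therefore be tested with the original credentials against the output address, e.g. `mysql -h <address> -u <user> -p`.

```hcl
# Look up the newest snapshot of the (hypothetically named) source instance.
data "aws_db_snapshot" "latest" {
  db_instance_identifier = aws_db_instance.main.identifier
  most_recent            = true
}

# New instance restored from that snapshot; engine and credentials come from it.
resource "aws_db_instance" "restored" {
  identifier             = "m300-restore-test"
  instance_class         = "db.t3.micro" # assumed size
  snapshot_identifier    = data.aws_db_snapshot.latest.id
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  skip_final_snapshot    = true # throwaway test instance

  lifecycle {
    # Avoid replacing the instance whenever a newer snapshot appears.
    ignore_changes = [snapshot_identifier]
  }
}

output "restored_db_address" {
  value = aws_db_instance.restored.address
}
```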
🔷 F1 – Fehleranalyse und Protokollierung (Error Analysis & Logging)
🎯 Goals:
- Demonstrate the ability to detect, log, and resolve system-level failures, and document each case.
✅ Tasks:
- Structured Log Format for ECS
  - Modify the app/container to output JSON logs: `{"timestamp":"...", "level":"error", "message":"Database unreachable"}`
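
Once the container writes JSON, CloudWatch Logs can filter on individual fields. This sketch assumes that the `awslogs` driver sends container output to a log group named `/ecs/wordpress`; that name is hypothetical. The filter counts every line whose `level` field is `error`, and an alarm can then be built on the resulting metric.

```hcl
# Counts JSON log lines with level == "error" in the (assumed) ECS log group.
resource "aws_cloudwatch_log_metric_filter" "ecs_errors" {
  name           = "ecs-wordpress-error-lines"
  log_group_name = "/ecs/wordpress"
  pattern        = "{ $.level = \"error\" }"

  metric_transformation {
    name          = "ErrorLogCount" # hypothetical metric name
    namespace     = "M300/App"      # hypothetical namespace
    value         = "1"
    default_value = "0" # publish 0 when no error lines match
  }
}
```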
- Simulate Common Failures
  - ECS task crash: stop a task manually or deploy a deliberately broken configuration.
  - RDS CPU spike: run a CPU-intensive stress-test query.
  - ALB health check failure: temporarily block the health-check port.
- Observe Logs + Alerts
  - Confirm that each simulated failure triggers an alert.
  - Copy the relevant log output (timestamp, error, affected system).
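
For the ALB health-check failure, an alarm on the target group's `HealthyHostCount` metric fires when no task passes the check. The following are assumptions:

- the load balancer resource name
- the target group resource name
- the `alert_topic_arn` variable, e.g. set to the SNS topic used for the E2 alarm test

```hcl
variable "alert_topic_arn" {
  description = "SNS topic that receives failure alerts"
  type        = string
}

# Fires when the minimum healthy-target count stays below 1
# for two consecutive one-minute periods.
resource "aws_cloudwatch_metric_alarm" "alb_no_healthy_targets" {
  alarm_name          = "m300-alb-no-healthy-targets"
  namespace           = "AWS/ApplicationELB"
  metric_name         = "HealthyHostCount"
  statistic           = "Minimum"
  period              = 60
  evaluation_periods  = 2
  threshold           = 1
  comparison_operator = "LessThanThreshold"

  dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix
    TargetGroup  = aws_lb_target_group.wordpress.arn_suffix
  }

  alarm_actions = [var.alert_topic_arn]
}
```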
- Error Classification Table

Error Type | Cause | Detection | Resolution |
---|---|---|---|
ECS task crash loop | Bad image or environment config | ECS logs, alarm | Re-deploy image |
RDS CPU spike | Query overload | CloudWatch metric | Optimize queries / reboot instance |
ALB 5xx surge | No healthy ECS tasks | ALB logs, failed health checks | Restart ECS service |
✅ Summary Table – Final Verification
Category | Action Status | Notes |
---|---|---|
D1 | ⏳ In Progress | Add flow logs, connectivity tests |
E2 | ⏳ In Progress | Add dashboards, restore test |
F1 | ⏳ In Progress | Simulate & log error resolution |