Assessment ‐ Advanced Actionplan - Campus-Castolo/m300 GitHub Wiki

🚀 To Advanced Action Plan – M300 Competency Completion Guide

This action plan outlines the precise technical steps I need to take to upgrade all remaining intermediate competencies to the Advanced level in the M300 matrix.

🔷 D1 – Netzwerkverbindungen konfigurieren und testen (Network Configuration & Testing)

🎯 Goals:

- Demonstrate network connectivity and access control with verifiable evidence (flow logs and test output).
- Monitor and document VPC traffic and security posture.

✅ Tasks:

Enable VPC Flow Logs

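```hcl
# Log group that receives the flow log records.
resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
  name              = "/aws/vpc/flow-logs"
  retention_in_days = 14
}

# Capture accepted and rejected traffic for the whole VPC.
resource "aws_flow_log" "vpc" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination      = aws_cloudwatch_log_group.vpc_flow_logs.arn
  log_destination_type = "cloud-watch-logs"
  # Delivery to CloudWatch Logs needs an IAM role (defined elsewhere; name assumed).
  iam_role_arn         = aws_iam_role.vpc_flow_logs.arn
}
```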
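
Publishing flow logs to CloudWatch Logs needs an IAM role that the flow log resource references through `iam_role_arn`. The following is a minimal sketch of such a role, not the project's actual configuration: the resource name `vpc_flow_logs` and both `name` values are assumptions, and `Resource = "*"` should be narrowed to the flow log group once the setup works.

```hcl
# Hypothetical role; the VPC Flow Logs service assumes it to write log events.
resource "aws_iam_role" "vpc_flow_logs" {
  name = "vpc-flow-logs" # assumed name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "vpc-flow-logs.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

# Permissions the service needs to create streams and put events.
resource "aws_iam_role_policy" "vpc_flow_logs" {
  name = "vpc-flow-logs-to-cloudwatch" # assumed name
  role = aws_iam_role.vpc_flow_logs.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
      ]
      Resource = "*" # broad for the sketch; restrict to the log group ARN later
    }]
  })
}
```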

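ECS ↔ RDS Test
- Use ECS Exec to open a shell in a running ECS task and test the MySQL port:
```
nc -vz <rds-endpoint> 3306
```
- Screenshot the output and document the result.

Lambda ↔ RDS Snapshot Verification
- In CloudWatch, verify the snapshot logs:
  - Log group: /aws/lambda/rds-backup
  - Look for CreateDBSnapshot events.

(Optional) Add an ICMP allow rule and run a ping test between ECS and RDS, if allowed. RDS endpoints generally do not answer ICMP, so a failed ping does not prove a connectivity problem; treat the TCP check on port 3306 as the authoritative result.

Document all results in corrected_network_topology.md.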
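
The `nc` check on port 3306 only succeeds if the RDS security group accepts MySQL traffic from the ECS tasks. A sketch of that rule, assuming two security groups named `aws_security_group.ecs` and `aws_security_group.rds` (hypothetical names, not taken from the repository):

```hcl
# Allow MySQL (TCP 3306) into the RDS security group, but only from the ECS tasks' group.
resource "aws_security_group_rule" "ecs_to_rds_mysql" {
  type                     = "ingress"
  description              = "MySQL from ECS tasks"
  from_port                = 3306
  to_port                  = 3306
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id # assumed RDS security group
  source_security_group_id = aws_security_group.ecs.id # assumed ECS security group
}
```

Referencing the source security group instead of a CIDR range keeps the rule valid when task IPs change, and it is easy to point to in the topology document as proof of controlled access.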

🔷 E2 – Betrieb und Überwachung von Services (Service Operations & Observability)

🎯 Goals:

Validate observability, restore capability, and automated recovery behaviors.

✅ Tasks:

Create CloudWatch Dashboards
- ECS Metrics:
  - CPUUtilization, MemoryUtilization
- RDS Metrics:
  - FreeStorageSpace, DatabaseConnections
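
The dashboard can also be defined in Terraform so that it is reproducible. The sketch below covers one ECS and one RDS widget. The dashboard name, the `aws_region` variable with its default, and `aws_db_instance.main` are assumptions; the ECS cluster and service references follow the names used in the autoscaling configuration.

```hcl
variable "aws_region" {
  type    = string
  default = "eu-central-1" # assumption; set to the region actually used
}

resource "aws_cloudwatch_dashboard" "m300" {
  dashboard_name = "m300-overview" # assumed name

  dashboard_body = jsonencode({
    widgets = [
      {
        type   = "metric"
        x      = 0
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "ECS CPU / Memory"
          region = var.aws_region
          stat   = "Average"
          period = 300
          metrics = [
            ["AWS/ECS", "CPUUtilization", "ClusterName", aws_ecs_cluster.main.name, "ServiceName", aws_ecs_service.wordpress.name],
            ["AWS/ECS", "MemoryUtilization", "ClusterName", aws_ecs_cluster.main.name, "ServiceName", aws_ecs_service.wordpress.name],
          ]
        }
      },
      {
        type   = "metric"
        x      = 12
        y      = 0
        width  = 12
        height = 6
        properties = {
          title  = "RDS Storage / Connections"
          region = var.aws_region
          stat   = "Average"
          period = 300
          metrics = [
            # aws_db_instance.main is a hypothetical resource name.
            ["AWS/RDS", "FreeStorageSpace", "DBInstanceIdentifier", aws_db_instance.main.identifier],
            ["AWS/RDS", "DatabaseConnections", "DBInstanceIdentifier", aws_db_instance.main.identifier],
          ]
        }
      },
    ]
  })
}
```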

Configure Autoscaling for ECS

```hcl
resource "aws_appautoscaling_target" "ecs" {
  max_capacity       = 4
  min_capacity       = 1
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.wordpress.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "cpu" {
  policy_type = "TargetTrackingScaling"
  ...
}
```
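
For reference, a complete target tracking policy could look like the sketch below. The resource name `cpu_target_tracking`, the policy name, the 60% target, and the cooldown values are illustrative assumptions, not values taken from the project.

```hcl
resource "aws_appautoscaling_policy" "cpu_target_tracking" {
  name               = "ecs-cpu-target-tracking" # assumed name
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace

  target_tracking_scaling_policy_configuration {
    # Built-in metric: average CPU utilization across the service's tasks.
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value       = 60  # assumed target in percent
    scale_in_cooldown  = 300 # seconds to wait before removing tasks again
    scale_out_cooldown = 60  # seconds to wait before adding tasks again
  }
}
```

Reading `resource_id`, `scalable_dimension`, and `service_namespace` from the scaling target keeps both resources in sync and gives Terraform the dependency order implicitly.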

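Trigger and Document CloudWatch Alarm
- Temporarily lower the alarm threshold (e.g., to 5% CPU) so that normal load breaches it.
- Confirm that the SNS alert arrives via email/SMS.
- Take a screenshot of the received notification, then restore the original threshold.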
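
A minimal sketch of the alarm and its SNS notification path, assuming it does not exist in Terraform yet. The topic name, the alarm name, and the email address are placeholders; the threshold of 5 reflects the temporarily lowered test value. An email subscription has to be confirmed from the inbox before any alert is delivered.

```hcl
resource "aws_sns_topic" "alerts" {
  name = "m300-alerts" # assumed name
}

resource "aws_sns_topic_subscription" "email" {
  topic_arn = aws_sns_topic.alerts.arn
  protocol  = "email"
  endpoint  = "ops@example.com" # placeholder address
}

resource "aws_cloudwatch_metric_alarm" "ecs_cpu_high" {
  alarm_name          = "ecs-cpu-high" # assumed name
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 5 # lowered for the test; raise again afterwards
  alarm_actions       = [aws_sns_topic.alerts.arn]

  dimensions = {
    ClusterName = aws_ecs_cluster.main.name
    ServiceName = aws_ecs_service.wordpress.name
  }
}
```
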
Restore from Snapshot
- Create a new DB from an RDS snapshot via console or Terraform.
- Validate login via MySQL client.
- Document procedure in updated_security_concept.md.
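
If the restore is done with Terraform, it could be sketched as follows. All resource names, the instance class, and the identifier are assumptions; `aws_db_instance.main` stands for the existing production instance. Engine, storage, and credentials come from the snapshot, so the MySQL login test uses the original user and password.

```hcl
# Look up the newest snapshot of the source instance.
data "aws_db_snapshot" "latest" {
  db_instance_identifier = aws_db_instance.main.identifier # assumed source instance
  most_recent            = true
}

# Separate instance restored from that snapshot, used only for the restore test.
resource "aws_db_instance" "restore_test" {
  identifier             = "m300-restore-test" # assumed name
  instance_class         = "db.t3.micro"       # assumed size
  snapshot_identifier    = data.aws_db_snapshot.latest.id
  db_subnet_group_name   = aws_db_subnet_group.main.name # assumed subnet group
  vpc_security_group_ids = [aws_security_group.rds.id]   # assumed security group
  skip_final_snapshot    = true # test instance; no snapshot needed on destroy
}
```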

🔷 F1 – Fehleranalyse und Protokollierung (Error Analysis & Logging)

🎯 Goals:

Prove ability to detect, log, and resolve system-level failures with documentation.

✅ Tasks:

Structured Log Format for ECS

Modify app/container to output JSON logs:

```
{"timestamp":"...", "level":"error", "message":"Database unreachable"}
```
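
One benefit of JSON logs is that CloudWatch can filter on individual fields. The sketch below counts every log event whose `level` field equals `error` and exposes the count as a custom metric that an alarm can watch. The log group reference `aws_cloudwatch_log_group.ecs_app`, the metric name, and the namespace are assumptions.

```hcl
resource "aws_cloudwatch_log_metric_filter" "app_errors" {
  name           = "app-error-count"                    # assumed name
  log_group_name = aws_cloudwatch_log_group.ecs_app.name # assumed ECS log group

  # JSON filter pattern: matches events with "level":"error".
  pattern = "{ $.level = \"error\" }"

  metric_transformation {
    name      = "AppErrorCount" # assumed metric name
    namespace = "M300/App"      # assumed custom namespace
    value     = "1"             # each matching event adds 1
  }
}
```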

Simulate Common Failures
- ECS Task Crash: stop task manually or inject bad config.
- RDS CPU spike: run stress test query.
- ALB health check fail: block port temporarily.
Observe Logs + Alerts
- Ensure alert triggers from failure.
- Copy relevant log output (timestamp, error, affected system).
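
For the ALB health check scenario, an alarm has to exist before the failure is simulated, otherwise no alert can fire. A sketch under the assumption that the load balancer, target group, and SNS topic are named `aws_lb.main`, `aws_lb_target_group.wordpress`, and `aws_sns_topic.alerts` (hypothetical names):

```hcl
resource "aws_cloudwatch_metric_alarm" "alb_unhealthy_hosts" {
  alarm_name          = "alb-unhealthy-hosts" # assumed name
  namespace           = "AWS/ApplicationELB"
  metric_name         = "UnHealthyHostCount"
  statistic           = "Maximum"
  period              = 60
  evaluation_periods  = 2
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0 # alarm as soon as any target is unhealthy
  alarm_actions       = [aws_sns_topic.alerts.arn] # assumed SNS topic

  dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix                   # assumed ALB
    TargetGroup  = aws_lb_target_group.wordpress.arn_suffix # assumed target group
  }
}
```

Blocking the container port should then move the alarm into the ALARM state within about two minutes, which gives a timestamp to match against the log output.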

Error Classification Table

| Error Type | Cause | Detection | Resolution |
| --- | --- | --- | --- |
| ECS task crash loop (repeated task restarts) | Bad image or environment config | ECS logs, alarm | Re-deploy image |
| RDS CPU Spike | Query overload | CloudWatch metric | Optimize / restart |
| ALB 5xx Surge | No healthy ECS tasks | ALB logs, health check failure | Restart ECS service |

✅ Summary Table – Final Verification

| Category | Action Status | Notes |
| --- | --- | --- |
| D1 | ⏳ In Progress | Add flow logs, connectivity tests |
| E2 | ⏳ In Progress | Add dashboards, restore test |
| F1 | ⏳ In Progress | Simulate & log error resolution |