Using OpenShift Audit Logs for Threat Detection and Incident Response - ralvares/openshift-security-framework GitHub Wiki

Using OpenShift Audit Logs for Threat Detection and Incident Response

1 Overview

Audit logs in OpenShift are the authoritative record of the API requests processed by the control plane. Each entry captures who performed the action, what they did, when it occurred, where in the cluster it happened, and how the request was evaluated. Properly enabled, protected against tampering, and analysed, these logs form the backbone of threat detection, incident response, compliance evidence, and continuous-monitoring programmes in modern, cloud-native environments.

2 What Are Audit Logs?

2.1 Definition

Audit logs are structured JSON records generated by the OpenShift API server for selected requests initiated by users, service accounts, controllers, and system components.

2.2 Purpose

Discipline	Value Proposition
Security	Surface, triage, and investigate suspicious or unauthorised activity.
Compliance	Provide defensible proof that access controls and change-management processes exist.
Operational Insight	Reconstruct change history, troubleshoot misconfigurations, and measure platform usage.

2.3 Why Audit Logs Are Critical in OpenShift Security

Credential abuse – attackers “log in” with stolen tokens and run standard API calls.
Privilege escalation – misconfigured RoleBindings silently lift permissions.
Secret harvesting / lateral movement – cross-namespace secret reads are visible only in audit metadata.
Policy drift – excessive SCC or PodSecurity exemptions surface first as anomalous admissions.

Audit logs are therefore the single authoritative source for reconstructing the who-did-what-when-and-where narrative and validating control effectiveness.

3 Audit Log Structure

Audit events are emitted as JSON objects. Under the Default profile, only metadata is captured; request bodies are omitted.

Field	Description	Example
`stageTimestamp`	Time the request reached the audit stage recorded in the event.	`2025-07-02T10:15:00Z`
`user.username`	User / service-account identity.	`[email protected]`
`user.groups`	Groups attached to the identity.	`["devs","users"]`
`sourceIPs`	Source IP address(es).	`["203.0.113.25"]`
`verb`	API verb (`get`, `create`, `delete`, …).	`get`
`objectRef.resource`	Target resource type (plural API name).	`secrets`
`objectRef.namespace`	Namespace (project).	`production`
`objectRef.subresource`	Subresource acted on (e.g. `exec`).	`exec`
`requestObject`	Present only when profile ≥ WriteRequestBodies.	(omitted under Default)
`responseStatus.code`	Result of the request.	`200`
`annotations`	RBAC decision, PodSecurity/SCC match, etc.	`"authorization.k8s.io/decision":"allow"`
`requestReceivedTimestamp`	Server-side receipt timestamp.	`2025-07-02T10:15:00Z`

Sensitive-object safeguard: Request bodies for Secret, Route, and OAuthClient are never logged in any profile.
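
To make the field table concrete, the sketch below parses one metadata-level event and flattens the fields most detections rely on. The sample event is hypothetical (the username `alice` and the secret name `db-creds` are invented for illustration), and `summarise` is an illustrative helper, not part of any OpenShift tooling. Note that in `audit.k8s.io/v1` events the subresource is nested under `objectRef`.

```python
import json

# Hypothetical sample event; values loosely mirror the examples in the table.
SAMPLE_EVENT = """
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "verb": "get",
  "user": {"username": "alice", "groups": ["devs", "users"]},
  "sourceIPs": ["203.0.113.25"],
  "objectRef": {"resource": "secrets", "namespace": "production", "name": "db-creds"},
  "responseStatus": {"code": 200},
  "requestReceivedTimestamp": "2025-07-02T10:15:00.000000Z",
  "annotations": {"authorization.k8s.io/decision": "allow"}
}
"""


def summarise(event: dict) -> dict:
    """Flatten the nested fields most detections need; missing keys become None."""
    user = event.get("user", {})
    ref = event.get("objectRef", {})
    return {
        "when": event.get("requestReceivedTimestamp"),
        "who": user.get("username"),
        "groups": user.get("groups", []),
        "from": event.get("sourceIPs", []),
        "verb": event.get("verb"),
        "resource": ref.get("resource"),
        "subresource": ref.get("subresource"),
        "namespace": ref.get("namespace"),
        "status": event.get("responseStatus", {}).get("code"),
        "decision": event.get("annotations", {}).get("authorization.k8s.io/decision"),
    }


summary = summarise(json.loads(SAMPLE_EVENT))
print(summary["who"], summary["verb"], summary["resource"], summary["namespace"])
```

Using `.get()` throughout keeps the helper tolerant of events that lack optional fields such as `subresource` or `annotations`.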

4 OpenShift Audit Policy Configuration

Profile	Behaviour
Default	Metadata for all verbs (low noise; production default).
WriteRequestBodies	Metadata + bodies for write verbs (`create`, `update`, `patch`, `delete`, `deletecollection`).
AllRequestBodies	Metadata + bodies for all verbs (high volume; investigation clusters only).
None	Disables audit logging (not advised outside ephemeral test clusters).

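OAuth access-token requests are a special case: according to the OpenShift documentation, the Default profile still logs their request bodies, while the None profile logs nothing at all, including these requests. Verify the exact behaviour against the documentation for your cluster version.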
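
As a minimal sketch of how a profile could be selected, the helper below builds a JSON merge patch for the audit profile. It assumes the profile lives at `spec.audit.profile` on the cluster-scoped `APIServer` resource named `cluster`; confirm this against the documentation for your OpenShift version. The function name `audit_profile_patch` is hypothetical.

```python
import json

# Profile names as listed in the table above.
VALID_PROFILES = {"Default", "WriteRequestBodies", "AllRequestBodies", "None"}


def audit_profile_patch(profile: str) -> str:
    """Return a JSON merge patch that sets spec.audit.profile (assumed field path)."""
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown audit profile: {profile!r}")
    return json.dumps({"spec": {"audit": {"profile": profile}}})


patch = audit_profile_patch("WriteRequestBodies")
print(patch)
```

Under the same assumption, the resulting string could be applied with something like `oc patch apiserver cluster --type=merge -p "$PATCH"`. Validating the name first avoids sending a typo to the API server.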

5 Practical Use Cases

5.1 Threat Detection

Use Case	Example Log Condition (metadata only unless noted)	MITRE ATT&CK Technique
ClusterRoleBinding → cluster-admin	`verb="create"` and `resource="clusterrolebindings"` (request body needed to inspect `roleRef`)	T1098.006
Secret access by unknown principal	`verb="get"` and `resource="secrets"` and user ∉ allowlist	T1552.007
Pod exec by unauthorised user	`verb="create"` and `subresource="exec"` and user ∉ `allowed_exec_users`	T1059
Port-forward in sensitive namespace	`verb="create"` and `subresource="portforward"` and namespace ∈ (`prod`,`finance`)	T1572
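
The conditions in the table can be expressed as small predicates over a parsed event. The sketch below is one possible encoding, not a production rule engine: the allowlists, the service-account name, and the `Rule` and `evaluate` names are all hypothetical. The first rule matches any ClusterRoleBinding creation, because under a metadata-only profile `roleRef` is not available to narrow it to `cluster-admin`.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical allowlists; real values would come from your own baseline.
SECRET_READERS = {"system:serviceaccount:openshift-gitops:argocd"}
ALLOWED_EXEC_USERS = {"sre-oncall"}
SENSITIVE_NAMESPACES = {"prod", "finance"}


def _ref(event: dict, key: str):
    return event.get("objectRef", {}).get(key)


def _user(event: dict):
    return event.get("user", {}).get("username")


def crb_created(e: dict) -> bool:
    return e.get("verb") == "create" and _ref(e, "resource") == "clusterrolebindings"


def secret_read_unknown(e: dict) -> bool:
    return (e.get("verb") == "get" and _ref(e, "resource") == "secrets"
            and _user(e) not in SECRET_READERS)


def exec_unauthorised(e: dict) -> bool:
    return (e.get("verb") == "create" and _ref(e, "subresource") == "exec"
            and _user(e) not in ALLOWED_EXEC_USERS)


def portforward_sensitive(e: dict) -> bool:
    return (e.get("verb") == "create" and _ref(e, "subresource") == "portforward"
            and _ref(e, "namespace") in SENSITIVE_NAMESPACES)


@dataclass(frozen=True)
class Rule:
    name: str
    technique: str  # MITRE ATT&CK technique ID from the table
    matches: Callable[[dict], bool]


RULES = [
    Rule("ClusterRoleBinding created", "T1098.006", crb_created),
    Rule("Secret access by unknown principal", "T1552.007", secret_read_unknown),
    Rule("Pod exec by unauthorised user", "T1059", exec_unauthorised),
    Rule("Port-forward in sensitive namespace", "T1572", portforward_sensitive),
]


def evaluate(event: dict) -> List[str]:
    """Return 'technique: rule name' for every rule the event matches."""
    return [f"{r.technique}: {r.name}" for r in RULES if r.matches(event)]


exec_event = {
    "verb": "create",
    "user": {"username": "mallory"},
    "objectRef": {"resource": "pods", "subresource": "exec", "namespace": "prod"},
}
print(evaluate(exec_event))
```

Keeping the technique ID on each rule means every alert carries its ATT&CK mapping without a separate lookup.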

5.2 Incident Investigation

Timeline reconstruction – ordered list of suspect actions.
Scope analysis – namespaces/resources touched.
Impact assessment – confirmation of data exposure or alteration.
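
The three investigation steps above can be sketched over a list of parsed events. This is a simplified illustration under stated assumptions: the function names and sample events are hypothetical, each event is assumed to carry `requestReceivedTimestamp`, and "impact" is approximated as successful mutating calls, which says nothing about data read.

```python
from datetime import datetime

MUTATING_VERBS = {"create", "update", "patch", "delete", "deletecollection"}


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_timeline(events, username):
    """Timeline reconstruction: the suspect's events in chronological order."""
    mine = [e for e in events if e.get("user", {}).get("username") == username]
    return sorted(mine, key=lambda e: parse_ts(e["requestReceivedTimestamp"]))


def scope(events):
    """Scope analysis: the set of (namespace, resource) pairs touched."""
    return {
        (e.get("objectRef", {}).get("namespace"), e.get("objectRef", {}).get("resource"))
        for e in events
    }


def successful_writes(events):
    """Impact assessment (alteration only): mutating calls that did not fail."""
    return [
        e for e in events
        if e.get("verb") in MUTATING_VERBS
        and e.get("responseStatus", {}).get("code", 500) < 400
    ]


# Hypothetical events, deliberately out of order.
events = [
    {"requestReceivedTimestamp": "2025-07-02T10:17:00.000000Z", "verb": "delete",
     "user": {"username": "mallory"},
     "objectRef": {"resource": "pods", "namespace": "production"},
     "responseStatus": {"code": 200}},
    {"requestReceivedTimestamp": "2025-07-02T10:15:00.000000Z", "verb": "get",
     "user": {"username": "mallory"},
     "objectRef": {"resource": "secrets", "namespace": "production"},
     "responseStatus": {"code": 200}},
    {"requestReceivedTimestamp": "2025-07-02T10:16:00.000000Z", "verb": "get",
     "user": {"username": "alice"},
     "objectRef": {"resource": "configmaps", "namespace": "dev"},
     "responseStatus": {"code": 200}},
]

timeline = build_timeline(events, "mallory")
print([e["verb"] for e in timeline])
print(scope(timeline))
print(len(successful_writes(timeline)))
```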

5.3 Compliance & Monitoring

Audit logs, when retained in tamper-resistant storage, supply evidence for PCI DSS, GDPR, ISO 27001 and similar frameworks by demonstrating:

Controlled privileged activity.
Complete change-management history.
Continuous access-control enforcement.

6 Key Strategies for Using Audit Logs Effectively

a. Establish Baselines

Map normal API usage by identity, resource, time, and IP.
Catalogue legitimate secret-read patterns and SCC exceptions.
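
One simple way to represent such a baseline is a counter of behaviours seen during a known-good period. The sketch below assumes a behaviour is the tuple (user, verb, resource, namespace); real baselines would usually add time-of-day and source IP. All function names and sample events are hypothetical.

```python
from collections import Counter


def behaviour_key(event: dict) -> tuple:
    """Reduce an event to the (user, verb, resource, namespace) it represents."""
    ref = event.get("objectRef", {})
    return (
        event.get("user", {}).get("username"),
        event.get("verb"),
        ref.get("resource"),
        ref.get("namespace"),
    )


def build_baseline(history) -> Counter:
    """Count how often each behaviour appeared in a known-good period."""
    return Counter(behaviour_key(e) for e in history)


def novel_events(baseline: Counter, events, min_seen: int = 1):
    """Return events whose behaviour was seen fewer than min_seen times."""
    return [e for e in events if baseline[behaviour_key(e)] < min_seen]


def _event(user, verb, resource, namespace):
    return {"user": {"username": user}, "verb": verb,
            "objectRef": {"resource": resource, "namespace": namespace}}


# Hypothetical history: alice routinely reads secrets in "production".
history = [_event("alice", "get", "secrets", "production") for _ in range(3)]
baseline = build_baseline(history)

today = [
    _event("alice", "get", "secrets", "production"),  # matches the baseline
    _event("alice", "get", "secrets", "finance"),     # new namespace for alice
]
for e in novel_events(baseline, today):
    print("novel:", behaviour_key(e))
```

Raising `min_seen` trades sensitivity for noise: a higher value also flags behaviours that were seen only rarely during the baseline period.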

b. Tune Alerting & Detection

Tier 1 - Must Investigate

Cluster-admin or other high-privilege role changes.
Cross-namespace secret reads by non-infra accounts.
ClusterRoleBindings for service accounts outside ArgoCD scope.
API activity from unknown IP ranges or geolocations.

Tier 2 - Needs Triage

New ServiceAccounts in production namespaces.
RBAC changes by unauthorised actors.
Repeated 403 responses on sensitive resources.
Deployments initiated by unrecognised identities.
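
As an illustration of one Tier 2 signal, the sketch below flags users with repeated 403 responses inside a sliding time window. The threshold of 5 and the 5-minute window are arbitrary assumptions to be tuned against your baseline, and `repeated_denials` is a hypothetical helper.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat(), so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def repeated_denials(events, threshold=5, window=timedelta(minutes=5)):
    """Return users with at least `threshold` 403 responses within any `window`."""
    recent = defaultdict(deque)  # username -> timestamps of recent 403s
    flagged = set()
    ordered = sorted(events, key=lambda e: parse_ts(e["requestReceivedTimestamp"]))
    for event in ordered:
        if event.get("responseStatus", {}).get("code") != 403:
            continue
        user = event.get("user", {}).get("username")
        ts = parse_ts(event["requestReceivedTimestamp"])
        timestamps = recent[user]
        timestamps.append(ts)
        # Drop denials that have fallen out of the window.
        while ts - timestamps[0] > window:
            timestamps.popleft()
        if len(timestamps) >= threshold:
            flagged.add(user)
    return flagged


def _denied(user, second):
    return {"requestReceivedTimestamp": f"2025-07-02T10:15:{second:02d}Z",
            "user": {"username": user},
            "responseStatus": {"code": 403}}


# Hypothetical burst: five denials for mallory, one for bob.
sample = [_denied("mallory", s) for s in range(5)] + [_denied("bob", 30)]
print(repeated_denials(sample))
```

In practice this check would be scoped to sensitive resources only, since the text above limits the signal to those.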

c. Integrate with MITRE ATT&CK

Annotate every rule with its ATT&CK technique.

d. Automate & Enrich Investigations

Forward logs to SIEM/SOAR, enrich with asset inventory, identity context, and threat intel.
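
A rough sketch of the enrichment step follows. The three lookup tables stand in for an asset inventory, an identity directory, and a threat-intel feed; their contents and the `enrich` function are entirely hypothetical, and a SIEM or SOAR platform would normally perform this join itself.

```python
import copy

# Hypothetical lookup tables standing in for real data sources.
ASSET_INVENTORY = {"production": {"owner": "payments-team", "criticality": "high"}}
IDENTITY_DIRECTORY = {"alice": {"kind": "human", "team": "dev"}}
THREAT_INTEL_IPS = {"198.51.100.100"}


def enrich(event: dict) -> dict:
    """Return a copy of the event with an added 'enrichment' section."""
    enriched = copy.deepcopy(event)  # never mutate the original audit record
    namespace = event.get("objectRef", {}).get("namespace")
    username = event.get("user", {}).get("username")
    enriched["enrichment"] = {
        "asset": ASSET_INVENTORY.get(namespace),
        "identity": IDENTITY_DIRECTORY.get(username),
        "intel_hits": sorted(set(event.get("sourceIPs", [])) & THREAT_INTEL_IPS),
    }
    return enriched


raw = {
    "verb": "get",
    "user": {"username": "alice"},
    "sourceIPs": ["198.51.100.100"],
    "objectRef": {"resource": "secrets", "namespace": "production"},
}
print(enrich(raw)["enrichment"])
```

Copying the event before adding context keeps the original record intact for evidentiary purposes.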

7 Challenges and Best Practices

Challenge	Mitigation
Volume & Noise	Focused audit policies; metadata-only profile in production.
Retention & Integrity	WORM storage, integrity hashing, regulatory retention adherence.
Privacy	Avoid request-body profiles except when required; sensitive bodies never logged.
Continuous Tuning	Re-baseline thresholds after platform upgrades or organisational changes.
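
The "integrity hashing" mitigation can take several forms. One illustrative option, chosen here as an assumption rather than anything OpenShift provides, is a hash chain: each digest covers the previous digest plus the current log line, so altering or removing an earlier line changes every digest after it. The function names are hypothetical.

```python
import hashlib


def chain_digests(lines, seed: bytes = b""):
    """Return one hex digest per line, each covering the previous digest and the line."""
    previous = hashlib.sha256(seed).digest()
    digests = []
    for line in lines:
        previous = hashlib.sha256(previous + line.encode("utf-8")).digest()
        digests.append(previous.hex())
    return digests


def verify(lines, expected_digests, seed: bytes = b"") -> bool:
    """Recompute the chain and compare it with the stored digests."""
    return chain_digests(lines, seed) == expected_digests


# Hypothetical log lines; real input would be the raw JSON audit records.
log_lines = ['{"verb":"get"}', '{"verb":"create"}', '{"verb":"delete"}']
stored = chain_digests(log_lines)

tampered = list(log_lines)
tampered[1] = '{"verb":"list"}'

print(verify(log_lines, stored))
print(verify(tampered, stored))
```

The digests only help if they are stored separately from the logs, for example in the WORM storage mentioned above.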

8 Summary

With correct profile selection, well-maintained baselines and tiered detection logic, OpenShift audit logs enable SOC teams to detect threats quickly, investigate incidents thoroughly, and satisfy compliance mandates confidently.

Appendix – Leveraging `sourceIPs` for Threat Detection

Scenario	Indicator example
Expected – automation	`sourceIPs=["10.44.0.25"]` (known CI runner)
Suspicious – external	`sourceIPs=["198.51.100.100"]` (public IP)
Policy violation	Request from a human user's workstation IP in a GitOps-only production cluster
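
The scenarios above can be approximated with Python's standard `ipaddress` module. In this sketch the automation range `10.44.0.0/24` is a hypothetical generalisation of the CI runner address in the table, and the internal ranges are the RFC 1918 blocks. The sketch deliberately avoids `is_private`, because Python also treats documentation ranges such as `198.51.100.0/24` as private.

```python
import ipaddress

# Hypothetical range containing the known CI runner from the table.
KNOWN_AUTOMATION = [ipaddress.ip_network("10.44.0.0/24")]

# RFC 1918 internal ranges.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_ip(ip: str) -> str:
    """Classify one source IP as expected automation, internal, or external."""
    address = ipaddress.ip_address(ip)
    if any(address in network for network in KNOWN_AUTOMATION):
        return "expected-automation"
    if any(address in network for network in INTERNAL_NETWORKS):
        return "internal-unknown"
    return "external"


def classify_event(event: dict) -> list:
    """Classify every address listed in the event's sourceIPs field."""
    return [classify_ip(ip) for ip in event.get("sourceIPs", [])]


print(classify_ip("10.44.0.25"))
print(classify_ip("198.51.100.100"))
print(classify_event({"sourceIPs": ["10.1.2.3", "198.51.100.100"]}))
```

Because `sourceIPs` can list several addresses when a request passes through proxies, each one is classified rather than only the first.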

Best-Practice Checklist

Baseline identities, IPs, and workloads.
Continuously compare events against baselines.
Prioritise privilege escalation, secret access, interactive pod actions.
Correlate related events for lateral-movement insight.
Align detections with ATT&CK techniques.
Automate where feasible; retain manual review where needed.
Document findings and share across SOC / platform teams.
Re-baseline after significant environment changes.
Test detection logic via controlled attack simulations.