Using OpenShift Audit Logs for Threat Detection and Incident Response - ralvares/openshift-security-framework GitHub Wiki

Using OpenShift Audit Logs for Threat Detection and Incident Response


1 Overview

Audit logs in OpenShift are an authoritative, chronological record of the API requests processed by the control plane. Each entry captures who performed the action, what they did, when it occurred, where in the cluster it happened, and how the request was evaluated. Properly enabled, protected against tampering, and analysed, these logs form the backbone of threat detection, incident response, compliance evidence, and continuous-monitoring programmes in modern, cloud-native environments.


2 What Are Audit Logs?

2.1 Definition

Audit logs are structured JSON records generated by the OpenShift API server for selected requests initiated by users, service accounts, controllers, and system components.

2.2 Purpose

| Discipline | Value Proposition |
|---|---|
| Security | Surface, triage, and investigate suspicious or unauthorised activity. |
| Compliance | Provide defensible proof that access controls and change-management processes exist. |
| Operational Insight | Reconstruct change history, troubleshoot misconfigurations, and measure platform usage. |

2.3 Why Audit Logs Are Critical in OpenShift Security

  • Credential abuse – attackers “log in” with stolen tokens and run standard API calls.
  • Privilege escalation – misconfigured RoleBindings silently lift permissions.
  • Secret harvesting / lateral movement – cross-namespace secret reads are visible only in audit metadata.
  • Policy drift – excessive SCC or PodSecurity exemptions surface first as anomalous admissions.

Audit logs are therefore the single authoritative source for reconstructing the who-did-what-when-and-where narrative and validating control effectiveness.


3 Audit Log Structure

Audit events are emitted as JSON objects. Under the Default profile, only metadata is captured; request bodies are omitted.

| Field | Description | Example |
|---|---|---|
| stageTimestamp | Time the request reached the logged audit stage. | 2025-07-02T10:15:00Z |
| user.username | User / service-account identity. | [email protected] |
| user.groups | Groups attached to the identity. | ["devs","users"] |
| sourceIPs | Source IP address(es). | ["203.0.113.25"] |
| verb | API verb (get, create, delete, …). | get |
| objectRef.resource | Target resource type. | secrets |
| objectRef.namespace | Namespace (project). | production |
| objectRef.subresource | Subresource acted on (e.g. exec). | exec |
| requestObject | Request body; present only under WriteRequestBodies or AllRequestBodies. | (omitted under Default) |
| responseStatus.code | HTTP status code of the response. | 200 |
| annotations | RBAC decision, PodSecurity/SCC match, etc. | "authorization.k8s.io/decision":"allow" |
| requestReceivedTimestamp | Time the API server received the request. | 2025-07-02T10:15:00Z |

Sensitive-object safeguard: Request bodies for Secret, Route, and OAuthClient are never logged in any profile.
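As a rough illustration of how these fields answer the who/what/when/where questions, the sketch below parses one event and flattens it into a summary. The sample event is invented for illustration (the `alice` identity and the `db-credentials` name are hypothetical), and `summarise` is a hypothetical helper; the field names follow the Kubernetes `audit.k8s.io/v1` event schema.

```python
import json

# Hypothetical sample event; all values are invented for illustration.
SAMPLE_EVENT = """{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "verb": "get",
  "user": {"username": "alice", "groups": ["devs", "users"]},
  "sourceIPs": ["203.0.113.25"],
  "objectRef": {"resource": "secrets", "namespace": "production",
                "name": "db-credentials", "apiVersion": "v1"},
  "responseStatus": {"code": 200},
  "requestReceivedTimestamp": "2025-07-02T10:15:00.000000Z",
  "stageTimestamp": "2025-07-02T10:15:00.004000Z",
  "annotations": {"authorization.k8s.io/decision": "allow"}
}"""


def summarise(raw_line: str) -> dict:
    """Flatten one JSON audit event into who/what/when/where/outcome."""
    event = json.loads(raw_line)
    ref = event.get("objectRef") or {}
    target = ref.get("resource", "")
    if ref.get("subresource"):
        # e.g. "pods/exec" for an exec into a pod
        target += "/" + ref["subresource"]
    return {
        "who": (event.get("user") or {}).get("username"),
        "what": f"{event.get('verb')} {target}",
        "when": event.get("requestReceivedTimestamp"),
        "where": ref.get("namespace"),
        "from": event.get("sourceIPs", []),
        "outcome": (event.get("responseStatus") or {}).get("code"),
        "decision": (event.get("annotations") or {}).get(
            "authorization.k8s.io/decision"
        ),
    }


print(summarise(SAMPLE_EVENT))
```

Missing keys are tolerated on purpose: cluster-scoped requests carry no namespace, and non-resource requests carry no `objectRef` at all.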


4 OpenShift Audit Policy Configuration

| Profile | Behaviour |
|---|---|
| Default | Metadata for all verbs (low noise; production default). |
| WriteRequestBodies | Metadata + bodies for write verbs (create, update, patch). |
| AllRequestBodies | Metadata + bodies for all verbs (high volume; investigation clusters only). |
| None | Disables audit logging (not advised outside ephemeral test clusters). |

Bodies for OAuth access-token requests are always excluded.
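On OpenShift 4, the profile is selected through the `spec.audit.profile` field of the cluster-scoped `APIServer` resource named `cluster`. As a minimal sketch, the hypothetical helper below builds the JSON merge patch for a chosen profile. It only produces the patch text and never contacts a cluster.

```python
import json

# Profile names taken from the table above.
VALID_PROFILES = {"Default", "WriteRequestBodies", "AllRequestBodies", "None"}


def audit_profile_patch(profile: str) -> str:
    """Return a JSON merge patch that sets spec.audit.profile."""
    if profile not in VALID_PROFILES:
        raise ValueError(f"unknown audit profile: {profile!r}")
    return json.dumps({"spec": {"audit": {"profile": profile}}})


print(audit_profile_patch("WriteRequestBodies"))
```

A cluster administrator could then apply the printed patch with something like `oc patch apiserver cluster --type=merge -p '<patch>'`. Validating the name first avoids submitting a misspelled profile.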


5 Practical Use Cases

5.1 Threat Detection

| Use Case | Example Log Condition (metadata only unless noted) | MITRE ATT&CK Technique |
|---|---|---|
| ClusterRoleBinding → cluster-admin | verb="create" and resource="clusterrolebindings" (request body needed for roleRef) | T1098.006 |
| Secret access by unknown principal | verb="get" and resource="secrets" and user ∉ allowlist | T1552.007 |
| Pod exec by unauthorised user | verb="create" and subresource="exec" and user ∉ allowed_exec_users | T1059 |
| Port-forward in sensitive namespace | verb="create" and subresource="portforward" and namespace ∈ (prod,finance) | T1572 |
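The conditions in this table could be expressed as small predicates over parsed events, each tagged with its ATT&CK technique. The sketch below is one possible encoding, not a production rule engine; the allowlist contents and the `Rule`, `RULES`, and `evaluate` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical allowlists; in practice these would come from a baseline.
SECRET_READERS = {"system:serviceaccount:openshift-gitops:argocd"}
EXEC_USERS = {"sre-oncall"}
SENSITIVE_NAMESPACES = {"prod", "finance"}


def ref(event: dict, key: str):
    """Read a key from objectRef, tolerating events without one."""
    return (event.get("objectRef") or {}).get(key)


def username(event: dict):
    return (event.get("user") or {}).get("username")


@dataclass(frozen=True)
class Rule:
    name: str
    technique: str  # MITRE ATT&CK technique ID
    matches: Callable[[dict], bool]


RULES = [
    # Metadata alone cannot show the roleRef, so this flags every creation.
    Rule("ClusterRoleBinding created", "T1098.006",
         lambda e: e.get("verb") == "create"
         and ref(e, "resource") == "clusterrolebindings"),
    Rule("Secret access by unknown principal", "T1552.007",
         lambda e: e.get("verb") == "get"
         and ref(e, "resource") == "secrets"
         and username(e) not in SECRET_READERS),
    Rule("Pod exec by unauthorised user", "T1059",
         lambda e: e.get("verb") == "create"
         and ref(e, "subresource") == "exec"
         and username(e) not in EXEC_USERS),
    Rule("Port-forward in sensitive namespace", "T1572",
         lambda e: e.get("verb") == "create"
         and ref(e, "subresource") == "portforward"
         and ref(e, "namespace") in SENSITIVE_NAMESPACES),
]


def evaluate(event: dict) -> list:
    """Return (rule name, technique) for every rule the event matches."""
    return [(r.name, r.technique) for r in RULES if r.matches(event)]


exec_event = {
    "verb": "create",
    "user": {"username": "mallory"},
    "objectRef": {"resource": "pods", "subresource": "exec", "namespace": "prod"},
}
print(evaluate(exec_event))
```

Keeping the technique ID on the rule itself means every alert carries its ATT&CK mapping without a separate lookup.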

5.2 Incident Investigation

  • Timeline reconstruction – ordered list of suspect actions.
  • Scope analysis – namespaces/resources touched.
  • Impact assessment – confirmation of data exposure or alteration.
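A minimal sketch of these three steps, assuming the events have already been parsed into dictionaries: filter by the suspect identity, order by receipt time, collect the namespaces touched, and note whether any write verb appears. The `build_timeline` helper and the write-verb set are illustrative assumptions, and a write verb is only a coarse hint of alteration, not proof of impact.

```python
from datetime import datetime

# Assumed set of verbs that can alter cluster state.
WRITE_VERBS = {"create", "update", "patch", "delete", "deletecollection"}


def parse_ts(value: str) -> datetime:
    # Older Python versions of fromisoformat() reject a trailing "Z",
    # so convert it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_timeline(events: list, suspect: str):
    """Return (ordered actions, namespaces touched, any write seen)."""
    hits = [e for e in events
            if (e.get("user") or {}).get("username") == suspect]
    hits.sort(key=lambda e: parse_ts(e["requestReceivedTimestamp"]))

    timeline = []
    for e in hits:
        ref = e.get("objectRef") or {}
        timeline.append((e["requestReceivedTimestamp"], e.get("verb"),
                         ref.get("resource"), ref.get("namespace")))

    namespaces = sorted({entry[3] for entry in timeline if entry[3]})
    wrote = any(entry[1] in WRITE_VERBS for entry in timeline)
    return timeline, namespaces, wrote


# Hypothetical events, deliberately out of order.
events = [
    {"verb": "delete", "user": {"username": "mallory"},
     "objectRef": {"resource": "pods", "namespace": "finance"},
     "requestReceivedTimestamp": "2025-07-02T10:20:00Z"},
    {"verb": "get", "user": {"username": "alice"},
     "objectRef": {"resource": "pods", "namespace": "dev"},
     "requestReceivedTimestamp": "2025-07-02T10:16:00Z"},
    {"verb": "get", "user": {"username": "mallory"},
     "objectRef": {"resource": "secrets", "namespace": "production"},
     "requestReceivedTimestamp": "2025-07-02T10:15:00Z"},
]
timeline, namespaces, wrote = build_timeline(events, "mallory")
for entry in timeline:
    print(entry)
```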

5.3 Compliance & Monitoring

Audit logs, when retained in immutable storage, supply evidence for PCI DSS, GDPR, ISO 27001 and similar frameworks by demonstrating:

  • Controlled privileged activity.
  • Complete change-management history.
  • Continuous access-control enforcement.

6 Key Strategies for Using Audit Logs Effectively

a. Establish Baselines

  • Map normal API usage by identity, resource, time, and IP.
  • Catalogue legitimate secret-read patterns and SCC exceptions.
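One simple way to express a baseline is a count of (identity, verb, resource, namespace) combinations seen during a known-good period, with anything outside that set treated as a deviation. The sketch below assumes parsed events; the helper names are hypothetical. Time-of-day and source IP, which the list above also mentions, are left out for brevity.

```python
from collections import Counter


def fingerprint(event: dict) -> tuple:
    """Reduce an event to the dimensions used for baselining."""
    ref = event.get("objectRef") or {}
    return (
        (event.get("user") or {}).get("username"),
        event.get("verb"),
        ref.get("resource"),
        ref.get("namespace"),
    )


def build_baseline(events: list) -> Counter:
    """Count how often each fingerprint occurred in known-good history."""
    return Counter(fingerprint(e) for e in events)


def unseen(baseline: Counter, events: list) -> list:
    """Return events whose fingerprint never appeared in the baseline."""
    return [e for e in events if fingerprint(e) not in baseline]


# Hypothetical history: a CI bot that reads secrets in its own namespace.
history = [
    {"verb": "get", "user": {"username": "ci-bot"},
     "objectRef": {"resource": "secrets", "namespace": "build"}},
]
baseline = build_baseline(history)

new_events = [
    history[0],
    {"verb": "get", "user": {"username": "ci-bot"},
     "objectRef": {"resource": "secrets", "namespace": "production"}},
]
print(unseen(baseline, new_events))
```

Only the second event is reported, because the same bot reading secrets in a different namespace was never seen in the baseline period.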

b. Tune Alerting & Detection

Tier 1 - Must Investigate

  • Cluster-admin or other high-privilege role changes.
  • Cross-namespace secret reads by non-infra accounts.
  • ClusterRoleBindings for service accounts outside ArgoCD scope.
  • API activity from unknown IP ranges or geolocations.

Tier 2 - Needs Triage

  • New ServiceAccounts in production namespaces.
  • RBAC changes by unauthorised actors.
  • Repeated 403 responses on sensitive resources.
  • Deployments initiated by unrecognised identities.
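As an example of a Tier 2 rule, the sketch below counts 403 responses per identity and sensitive resource and reports the pairs that reach a threshold. The resource set, the threshold of 5, and the `repeated_denials` name are assumptions for illustration; a real rule would also bound the count to a time window.

```python
from collections import Counter

# Assumed set of resources considered sensitive.
SENSITIVE_RESOURCES = {"secrets", "rolebindings", "clusterrolebindings"}


def repeated_denials(events: list, threshold: int = 5) -> dict:
    """Map (username, resource) to its 403 count when it reaches threshold."""
    counts = Counter()
    for e in events:
        code = (e.get("responseStatus") or {}).get("code")
        resource = (e.get("objectRef") or {}).get("resource")
        if code == 403 and resource in SENSITIVE_RESOURCES:
            user = (e.get("user") or {}).get("username")
            counts[(user, resource)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical burst: one identity denied six times on secrets.
denied = {
    "verb": "get",
    "user": {"username": "mallory"},
    "objectRef": {"resource": "secrets", "namespace": "production"},
    "responseStatus": {"code": 403},
}
print(repeated_denials([denied] * 6))
```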

c. Integrate with MITRE ATT&CK

Annotate every rule with its ATT&CK technique.

d. Automate & Enrich Investigations

Forward logs to a SIEM/SOAR platform and enrich them with asset inventory, identity context, and threat intelligence.
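Enrichment can be pictured as joining each event against lookup tables before it reaches an analyst. In the sketch below the three lookup tables are hypothetical in-memory stand-ins for an identity provider, an asset inventory, and a threat-intelligence feed.

```python
# Hypothetical lookup data; real deployments would query external systems.
IDENTITY_CONTEXT = {"alice": {"team": "payments", "privileged": False}}
ASSET_INVENTORY = {"production": {"criticality": "high"}}
THREAT_INTEL_IPS = {"198.51.100.100"}


def enrich(event: dict) -> dict:
    """Return a copy of the event with an added 'enrichment' section."""
    user = (event.get("user") or {}).get("username")
    namespace = (event.get("objectRef") or {}).get("namespace")
    enriched = dict(event)  # shallow copy; the original event is untouched
    enriched["enrichment"] = {
        "identity": IDENTITY_CONTEXT.get(user, {}),
        "asset": ASSET_INVENTORY.get(namespace, {}),
        "threat_intel_hit": any(
            ip in THREAT_INTEL_IPS for ip in event.get("sourceIPs", [])
        ),
    }
    return enriched


event = {
    "verb": "get",
    "user": {"username": "alice"},
    "sourceIPs": ["198.51.100.100"],
    "objectRef": {"resource": "secrets", "namespace": "production"},
}
print(enrich(event)["enrichment"])
```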


7 Challenges and Best Practices

| Challenge | Mitigation |
|---|---|
| Volume & Noise | Focused audit policies; metadata-only profile in production. |
| Retention & Integrity | WORM storage, integrity hashing, regulatory retention adherence. |
| Privacy | Avoid request-body profiles except when required; sensitive bodies never logged. |
| Continuous Tuning | Re-baseline thresholds after platform upgrades or organisational changes. |
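Integrity hashing can take several forms. One common approach, shown here only as an illustrative sketch, is a hash chain: each digest covers the current log line plus the previous digest, so altering or removing any line changes every digest after it. The seed value and function names are assumptions, and the recorded digests would need to be stored separately from the logs, for example in WORM storage.

```python
import hashlib


def chain_digests(lines: list, seed: str = "audit-chain-v1") -> list:
    """Return one SHA-256 digest per line, each chained to the previous."""
    previous = hashlib.sha256(seed.encode("utf-8")).hexdigest()
    digests = []
    for line in lines:
        previous = hashlib.sha256(
            (previous + line).encode("utf-8")
        ).hexdigest()
        digests.append(previous)
    return digests


def first_tampered_index(lines: list, recorded: list):
    """Return the index of the first mismatch, or None if the chain holds."""
    for index, (actual, expected) in enumerate(
        zip(chain_digests(lines), recorded)
    ):
        if actual != expected:
            return index
    if len(lines) != len(recorded):
        # Lines were appended or truncated after the digests were recorded.
        return min(len(lines), len(recorded))
    return None


original = ['{"verb":"get"}', '{"verb":"create"}', '{"verb":"delete"}']
recorded = chain_digests(original)

tampered = [original[0], '{"verb":"list"}', original[2]]
print(first_tampered_index(original, recorded))
print(first_tampered_index(tampered, recorded))
```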

8 Summary

With correct profile selection, well-maintained baselines and tiered detection logic, OpenShift audit logs enable SOC teams to detect threats quickly, investigate incidents thoroughly, and satisfy compliance mandates confidently.


Appendix – Leveraging sourceIPs for Threat Detection

| Scenario | Indicator example |
|---|---|
| Expected – automation | sourceIPs=["10.44.0.25"] (known CI runner) |
| Suspicious – external | sourceIPs=["198.51.100.100"] (public IP) |
| Policy violation | Request originating from a human user's IP in a GitOps-only production cluster |
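The first two scenarios could be checked with Python's standard `ipaddress` module, as sketched below. The CIDR ranges are hypothetical and would come from your own network inventory. Internal ranges are listed explicitly because `ipaddress` treats documentation ranges such as 198.51.100.0/24 as private, which would misclassify the example above if `is_private` were used.

```python
import ipaddress

# Hypothetical ranges; replace with values from your network inventory.
KNOWN_AUTOMATION = [ipaddress.ip_network("10.44.0.0/24")]
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_source_ip(ip: str) -> str:
    """Classify one source IP as expected, internal-unknown, or external."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in KNOWN_AUTOMATION):
        return "expected"
    if any(addr in net for net in INTERNAL_NETWORKS):
        return "internal-unknown"
    return "external"


def classify_event(event: dict) -> list:
    """Classify every address listed in the event's sourceIPs field."""
    return [classify_source_ip(ip) for ip in event.get("sourceIPs", [])]


print(classify_source_ip("10.44.0.25"))
print(classify_source_ip("198.51.100.100"))
```

An event can carry more than one address when it passes through proxies, so `classify_event` returns one label per entry.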

Best-Practice Checklist

  1. Baseline identities, IPs, and workloads.
  2. Continuously compare events against baselines.
  3. Prioritise privilege escalation, secret access, interactive pod actions.
  4. Correlate related events for lateral-movement insight.
  5. Align detections with ATT&CK techniques.
  6. Automate where feasible; retain manual review where needed.
  7. Document findings and share across SOC / platform teams.
  8. Re-baseline after significant environment changes.
  9. Test detection logic via controlled attack simulations.