A Guide To Kubernetes Logs That Isn't A Vendor Pitch
A guide to logging at each cluster layer with a focus on AuditPolicy.
Published: June 1, 2024
Reading Time: 27 minutes
One of the frustrating aspects of researching topics in the Kubernetes/cloud-native world is having to trek through the vast sea of SEO-optimized articles that are nothing more than rehashed vendor marketing or the Kubernetes documentation thinly veiled as a technical guide. It sometimes reminds me of walking through the vendor floor at Black Hat. I get it, and I'm sure many of the products are great, but sometimes I just want to understand a concept, not pay someone to understand it for me.
Kubernetes logging is not entirely straightforward (are logs ever straightforward?). In this post I’ll discuss logging at each of the “layers” of a Kubernetes cluster and why you should probably spend some time looking into and tuning AuditPolicy if you’re attempting to collect logs from a Kubernetes cluster.
I work in offensive security, so the lens I see logging through is definitely biased. I don't want to look at logs all day, but part of being a good red teamer is understanding what a full attack can look like from a log perspective (and how to not show up in the logs). In a perfect world, we would just log everything that even thinks about touching our cluster. If you go down this route, you'll quickly realize that the amount of logs generated by each layer of a cluster is absurd.
Furthermore, your time could be better spent collecting logs elsewhere, such as netflow. Even if you could store all the logs a cluster generates, you still have to parse through them to make them useful, and at a certain point, having too many logs just gives an attacker more hay to hide their needle in.
What does logging mean in the Kubernetes world?
Despite there being some overlap, I feel there is a need to separate logs into two categories: logs that are helpful for debugging, which I will refer to as debug logs, and logs that are useful for security, which I will refer to as security logs.
- Debug Logs: Logs that are helpful for investigating why something isn't working properly during setup. These answer the "Why?" behind an issue. IE: Why is this server crashing? Oh, its CPU is at 100%.
- Security Logs: Logs that are helpful to investigate during a security incident. These answer: Who? What? When? Where? IE: Graham created a pod called `PWND` on the control node on Friday at 4:00 pm.
Introducing: The Beefy 4 Layer Kuburrito
Thinking about Kubernetes in layers greatly simplifies how to reason about a problem. The "4Cs of cloud-native security" is the model I use for this:
- Code Security: Is the code deployed into a pod secure? Is it vulnerable to SQL injection, command injection, or any other type of vulnerability in the OWASP Top 10?
- Container Security: Is the container hosting your application trusted? Where did you get the image from? Is the container running as root?
- Cluster Security: Is your cluster configured with the principles of least privilege in mind? Is RBAC in use? Are secrets being stored appropriately?
- Cloud Security: Is the infrastructure hosting the cluster secure? Have the nodes been patched? Are they running SSH with a default password? Is access to the API server restricted?
Using this model, we can separate the many different types of logs that can be gathered from a Kubernetes cluster into each of these buckets.
- Code Logging: Logging that is done at the application level (IE: Graham made a GET request to a web server).
- Container Logging: Logs that are produced about the container running an application. (IE: Container X is pulling Image Y)
- Cluster Logging: Logs at the Kubernetes cluster layer (and its components). (IE: Service A issued a `get` action for secret `super_secret`)
- Cloud Level Logging: Logging at the cloud provider level, or logs for a managed Kubernetes cluster. (IE: Graham logged into the management interface at 2am)
So what do logs look like at each of these layers? What generates them? Where do we collect them? Should we even collect them?
Code Level Logging
Getting logs from the applications running in a Kubernetes pod is a bit more difficult than collecting logs from an application running inside a VM, for a few reasons. The first issue we run into is that pods are ephemeral: when they are deleted, recreated, or removed by the cluster, the logs inside of them are deleted. If an attacker exploited a web server and the pod then crashed, we wouldn't have a way to see the logs.
Exec into a pod
Probably the most unhinged way you can inspect logs in a container is by execing into the pod and looking for the logs manually by running `kubectl exec -it <pod_name> -- bash`. This is probably not something you should ever do unless you're really in the weeds with troubleshooting or you're just tinkering. Doing this in a production cluster is almost always a terrible idea.
kubectl logs
The standard way of viewing logs from a Kubernetes Pod is to run `kubectl logs <pod_name>`. This will display `STDOUT` and `STDERR` for the application running in your pod (assuming it's configured to output text to these file descriptors).
This is great for debug logs, but not great for security logs, as they're not collected in a SIEM. Additionally, there is no way for us to inspect logs that are not written to `STDOUT` or `STDERR`. What if we want to view logs from a file such as `/var/log/syslog`? Utilizing `kubectl logs` doesn't allow us to do so.
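A few `kubectl logs` variations worth knowing (all standard flags; note that `--previous` only helps if the container restarted in place, not if the pod object itself was deleted):

kubectl logs <pod_name> -f                  # stream logs live
kubectl logs <pod_name> --previous          # logs from the previous (crashed) container instance
kubectl logs <pod_name> -c <container_name> # a specific container in a multi-container pod
kubectl logs <pod_name> --since=1h          # only the last hour of logs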
Sidecar Containers
Sidecar containers are another way we can grab logs from a container running inside a pod. Remember, each Pod can have one or more containers inside of it which we can define in the Pod manifest as follows:
# Modified from https://www.airplane.dev/blog/kubernetes-sidecar-container (rip)
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
  labels:
    app: webapp
spec:
  containers:
  # Define the main application, nginx
  - name: main-application
    image: nginx
    # Mount /var/log/nginx
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
  # Create a second container inside the pod
  - name: sidecar-container
    image: busybox
    # Read /var/log/nginx/access.log to STDOUT every 30 seconds
    command: ["sh","-c","while true; do cat /var/log/nginx/access.log; sleep 30; done"]
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/nginx
  volumes:
  - name: shared-logs
    emptyDir: {}
In this example, we are creating a Pod manifest that defines both an Nginx application and a sidecar that simply reads the `/var/log/nginx/access.log` file to `STDOUT`. Note that this is just a demonstration; if you wanted to collect the logs from this container, you would perform some other operation (such as collecting the logs and sending them to a SIEM).
You'll notice that the sidecar can view the `/var/log/nginx/access.log` file because we've set up `volumeMounts` in our Pod manifest that allow those resources to be accessed.
This is a much more robust solution for collecting logs from a pod, and it's actually what many vendor products do to collect logs from your Pods. The power of sidecar containers lies in the fact that you don't have to modify your application: as long as the application produces logs somewhere, you can collect them with a sidecar.
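To see this in action, you read the sidecar's stream the same way you'd read any other container's logs. A quick sketch, assuming the manifest above was saved as sidecar-pod.yaml (the filename is my own):

kubectl apply -f sidecar-pod.yaml
# Read the sidecar's STDOUT, which mirrors Nginx's access.log
kubectl logs simple-webapp -c sidecar-container -f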
Container Level Logging
Logs from the container layer of a cluster are generally debug logs. This means that while they can be used when investigating security incidents, they’re probably not the first place you should look unless you have a very specific reason.
Container Runtime
A container runtime is responsible for (among other things) running containers on the nodes of a Kubernetes cluster. Some popular container runtimes include Docker, containerd, and CRI-O. Depending on which container runtime you're working with, the logs may contain different information. You can identify which container runtime your nodes are using by running `kubectl get nodes -o wide`.
Collecting logs at the container level generally means collecting logs from the container runtime. These are typically stored in `/var/log/pods/*` and are often symlinked in `/var/log/containers/*`.
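If you want to poke around on a node, a rough sketch (the naming patterns below are what I've seen on containerd nodes; treat them as approximate):

# Directories under /var/log/pods are named <namespace>_<pod_name>_<pod_uid>
ls /var/log/pods/
# Symlinks under /var/log/containers follow <pod_name>_<namespace>_<container_name>-<container_id>.log
ls -l /var/log/containers/
tail -f /var/log/containers/simple-webapp_default_main-application-*.log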
Taking a look at the `kube-apiserver` container logs, they're not immediately helpful from a security perspective. The general rule of thumb I have for determining whether a log is a debug log or a security log is asking myself, "If I were a SOC analyst and I saw this log, would I know what is being communicated?" In this case, the answer is clear: "Absolutely not":
2024-05-31T19:05:51.584763355Z stderr F I0531 19:05:51.584688 1 trace.go:236] Trace[1866683030]: "Update" accept:application/vnd.kubernetes.protobuf, */*,audit-id:c033a34d-3b3a-4b2c-a871-ec998630828d,client:192.168.1.201,
<snip_for_brevity>
user-agent:kube-controller-manager/v1.30.1 (linux/amd64) kubernetes/6911225/leader-election,verb:PUT (31-May-2024 19:05:51.046) (total time: 537ms):
Cluster Level Logging
Logging at the cluster level is where things get a little freaky. Logging at the cluster level means collecting information about events from the orchestration components themselves. In this case, we’re talking about things like the Kubelet and API server.
Kubelet Logs
Logs from the Kubelet display information on what actions the Kubelet is taking. If you remember from Kubernetes 101, the kubelet is a process that runs on each node in a cluster and is responsible for taking requests from the scheduler and launching containers on the node. It also periodically reports the status of the Pods running on the node to the API Server.
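On kubeadm-style nodes the kubelet typically runs as a systemd service rather than a pod, so (assuming systemd) its logs live in the journal on the node itself:

# Run on the node, not through kubectl
journalctl -u kubelet --since "1 hour ago"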
Jun 02 00:39:08 kubecontrol kubelet[43672]: I0602 00:39:08.105506 43672 reconciler_common.go:247] "operationExecutor.VerifyControllerAttachedVolume started for volume \"etc-pki\" (UniqueName: \"kubernetes.io/host-path/ab9e569b3b5df9381f0b4449875b2fa0-etc-pki\") pod \"kube-apiserver-kubecontrol\" (UID: \"ab9e569b3b5df9381f0b4449875b2fa0\") " pod="kube-system/kube-apiserver-kubecontrol"
These can be useful for both debugging and security purposes; however, I wouldn't make them the first thing you collect from a security perspective, as they contain a lot of jargon that a SOC analyst probably won't understand without sitting down with a Kubernetes engineer. If we're curious about information pertaining to the API server, there is a far better way of gathering it: AuditPolicy.
AuditPolicy
Kubernetes AuditPolicy is probably what you’re looking for if you’re attempting to collect logs from a Kubernetes cluster to send to a SIEM, but be forewarned, it’s a little more complicated than simply turning it on and pointing it at your SIEM of choice.
AuditPolicy logs (which I’ll be referring to as audit logs) are generated by the API server when traffic traverses it. If you remember from Kubernetes 101, all requests must traverse the API server, making this a great place to collect security logs. According to the Kubernetes documentation:
Auditing allows cluster administrators to answer the following questions:
- what happened?
- when did it happen?
- who initiated it?
- on what did it happen?
- where was it observed?
- from where was it initiated?
- to where was it going?
That’s quite a bit of information. It’s probably TOO much information to collect. For context, if you decide to collect all of these logs in a production cluster, you’re looking at potentially hundreds or even thousands of gigabytes of logs per day.
There are four different levels of log data AuditPolicy allows us to collect.
- None: Doesn't log anything (obviously…)
- Metadata: Logs request metadata (requesting user, timestamp, resource, verb, etc.) but not the request or response bodies.
- Request: Logs event metadata and the body of the request sent to the API server, but does not record the body of the response from the API server.
- RequestResponse: Logs the event metadata, the request sent to the API server, AND the response from the API server.
Each level adds additional log data on top of the previous level. For example, requests logged at the RequestResponse level also include the log information from the Request and Metadata levels. Below I'll show some examples of each logging level. Beware, there are a lot of logs in the next section!
Metadata
Sample log when only collecting at the Metadata level:
//
// Metadata Information
//
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "18190867-edaa-48a4-95c5-6935576a9939",
  "stage": "RequestReceived",
  "requestURI": "/api/v1/namespaces/default/pods?fieldManager=kubectl-client-side-apply&fieldValidation=Strict",
  "verb": "create",
  "user": {
    "username": "kubernetes-admin",
    "groups": [
      "kubeadm:cluster-admins",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "192.168.1.167"
  ],
  // Want something fun to look into? What userAgent do other attack tools use?
  "userAgent": "kubectl/v1.28.9 (linux/amd64) kubernetes/587f5fe",
  "objectRef": {
    "resource": "pods",
    "namespace": "default",
    "apiVersion": "v1"
  },
  "requestReceivedTimestamp": "2024-05-31T20:30:27.956279Z",
  "stageTimestamp": "2024-05-31T20:30:27.956279Z"
}
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "18190867-edaa-48a4-95c5-6935576a9939",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/pods?fieldManager=kubectl-client-side-apply&fieldValidation=Strict",
  "verb": "create",
  "user": {
    "username": "kubernetes-admin",
    "groups": [
      "kubeadm:cluster-admins",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "192.168.1.167"
  ],
  "userAgent": "kubectl/v1.28.9 (linux/amd64) kubernetes/587f5fe",
  "objectRef": {
    "resource": "pods",
    "namespace": "default",
    "name": "priv-pod",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
  "requestReceivedTimestamp": "2024-05-31T20:30:27.956279Z",
  "stageTimestamp": "2024-05-31T20:30:27.982521Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"kubeadm:cluster-admins\" of ClusterRole \"cluster-admin\" to Group \"kubeadm:cluster-admins\"",
    "pod-security.kubernetes.io/enforce-policy": "privileged:latest"
  }
}
Request
Sample log when collecting at the Request level:
//
// Metadata Information
//
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Request",
  "auditID": "5fd8a404-29b3-4518-93d8-e77135a426fa",
  "stage": "RequestReceived",
  "requestURI": "/api/v1/namespaces/default/pods?fieldManager=kubectl-client-side-apply&fieldValidation=Strict",
  "verb": "create",
  "user": {
    "username": "kubernetes-admin",
    "groups": [
      "kubeadm:cluster-admins",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "192.168.1.167"
  ],
  "userAgent": "kubectl/v1.28.9 (linux/amd64) kubernetes/587f5fe",
  "objectRef": {
    "resource": "pods",
    "namespace": "default",
    "apiVersion": "v1"
  },
  "requestReceivedTimestamp": "2024-05-31T20:34:55.398698Z",
  "stageTimestamp": "2024-05-31T20:34:55.398698Z"
}
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Request",
  "auditID": "5fd8a404-29b3-4518-93d8-e77135a426fa",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/pods?fieldManager=kubectl-client-side-apply&fieldValidation=Strict",
  "verb": "create",
  "user": {
    "username": "kubernetes-admin",
    "groups": [
      "kubeadm:cluster-admins",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "192.168.1.167"
  ],
  "userAgent": "kubectl/v1.28.9 (linux/amd64) kubernetes/587f5fe",
  "objectRef": {
    "resource": "pods",
    "namespace": "default",
    "name": "priv-pod",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
//
// Request Information
//
  "requestObject": {
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
      "name": "priv-pod",
      "namespace": "default",
      "creationTimestamp": null,
      "annotations": {
        "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"metadata\":{\"annotations\":{},\"name\":\"priv-pod\",\"namespace\":\"default\"},\"spec\":{\"containers\":[{\"image\":\"nginx\",\"name\":\"priv-pod\",\"securityContext\":{\"privileged\":true}}],\"hostNetwork\":true}}\n"
      }
    },
    "spec": {
      "containers": [
        {
          "name": "priv-pod",
          "image": "nginx",
          "resources": {},
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File",
          "imagePullPolicy": "Always",
          "securityContext": {
            "privileged": true
          }
        }
      ],
      "restartPolicy": "Always",
      "terminationGracePeriodSeconds": 30,
      "dnsPolicy": "ClusterFirst",
      "hostNetwork": true,
      "securityContext": {},
      "schedulerName": "default-scheduler",
      "enableServiceLinks": true
    },
    "status": {}
  },
  "requestReceivedTimestamp": "2024-05-31T20:34:55.398698Z",
  "stageTimestamp": "2024-05-31T20:34:55.415504Z",
//
// END Request Information
//
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"kubeadm:cluster-admins\" of ClusterRole \"cluster-admin\" to Group \"kubeadm:cluster-admins\"",
    "pod-security.kubernetes.io/enforce-policy": "privileged:latest"
  }
}
RequestResponse
Sample log when collecting information at the RequestResponse level:
//
// Metadata Information
//
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "auditID": "1bc5391b-4896-4ac4-a919-34c7d869fbb7",
  "stage": "RequestReceived",
  "requestURI": "/api/v1/namespaces/default/pods?fieldManager=kubectl-client-side-apply&fieldValidation=Strict",
  "verb": "create",
  "user": {
    "username": "kubernetes-admin",
    "groups": [
      "kubeadm:cluster-admins",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "192.168.1.167"
  ],
  "userAgent": "kubectl/v1.28.9 (linux/amd64) kubernetes/587f5fe",
  "objectRef": {
    "resource": "pods",
    "namespace": "default",
    "apiVersion": "v1"
  },
  "requestReceivedTimestamp": "2024-05-31T20:49:52.847213Z",
  "stageTimestamp": "2024-05-31T20:49:52.847213Z"
}
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "RequestResponse",
  "auditID": "1bc5391b-4896-4ac4-a919-34c7d869fbb7",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/pods?fieldManager=kubectl-client-side-apply&fieldValidation=Strict",
  "verb": "create",
  "user": {
    "username": "kubernetes-admin",
    "groups": [
      "kubeadm:cluster-admins",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "192.168.1.167"
  ],
  "userAgent": "kubectl/v1.28.9 (linux/amd64) kubernetes/587f5fe",
  "objectRef": {
    "resource": "pods",
    "namespace": "default",
    "name": "priv-pod",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
//
// Request Information
//
  "requestObject": {
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
      "name": "priv-pod",
      "namespace": "default",
      "creationTimestamp": null,
      "annotations": {
        "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"metadata\":{\"annotations\":{},\"name\":\"priv-pod\",\"namespace\":\"default\"},\"spec\":{\"containers\":[{\"image\":\"nginx\",\"name\":\"priv-pod\",\"securityContext\":{\"privileged\":true}}],\"hostNetwork\":true}}\n"
      }
    },
    "spec": {
      "containers": [
        {
          "name": "priv-pod",
          "image": "nginx",
          "resources": {},
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File",
          "imagePullPolicy": "Always",
          "securityContext": {
            "privileged": true
          }
        }
      ],
      "restartPolicy": "Always",
      "terminationGracePeriodSeconds": 30,
      "dnsPolicy": "ClusterFirst",
      "hostNetwork": true,
      "securityContext": {},
      "schedulerName": "default-scheduler",
      "enableServiceLinks": true
    },
    "status": {}
  },
//
// RequestResponse Information
//
  "responseObject": {
    "kind": "Pod",
    "apiVersion": "v1",
    "metadata": {
      "name": "priv-pod",
      "namespace": "default",
      "uid": "34946cec-ef89-470b-9496-da357d082966",
      "resourceVersion": "358120",
      "creationTimestamp": "2024-05-31T20:49:52Z",
      "annotations": {
        "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Pod\",\"metadata\":{\"annotations\":{},\"name\":\"priv-pod\",\"namespace\":\"default\"},\"spec\":{\"containers\":[{\"image\":\"nginx\",\"name\":\"priv-pod\",\"securityContext\":{\"privileged\":true}}],\"hostNetwork\":true}}\n"
      },
      "managedFields": [
        {
          "manager": "kubectl-client-side-apply",
          "operation": "Update",
          "apiVersion": "v1",
          "time": "2024-05-31T20:49:52Z",
          "fieldsType": "FieldsV1",
          "fieldsV1": {
            "f:metadata": {
              "f:annotations": {
                ".": {},
                "f:kubectl.kubernetes.io/last-applied-configuration": {}
              }
            },
            "f:spec": {
              "f:containers": {
                "k:{\"name\":\"priv-pod\"}": {
                  ".": {},
                  "f:image": {},
                  "f:imagePullPolicy": {},
                  "f:name": {},
                  "f:resources": {},
                  "f:securityContext": {
                    ".": {},
                    "f:privileged": {}
                  },
                  "f:terminationMessagePath": {},
                  "f:terminationMessagePolicy": {}
                }
              },
              "f:dnsPolicy": {},
              "f:enableServiceLinks": {},
              "f:hostNetwork": {},
              "f:restartPolicy": {},
              "f:schedulerName": {},
              "f:securityContext": {},
              "f:terminationGracePeriodSeconds": {}
            }
          }
        }
      ]
    },
    "spec": {
      "volumes": [
        {
          "name": "kube-api-access-hnmnl",
          "projected": {
            "sources": [
              {
                "serviceAccountToken": {
                  "expirationSeconds": 3607,
                  "path": "token"
                }
              },
              {
                "configMap": {
                  "name": "kube-root-ca.crt",
                  "items": [
                    {
                      "key": "ca.crt",
                      "path": "ca.crt"
                    }
                  ]
                }
              },
              {
                "downwardAPI": {
                  "items": [
                    {
                      "path": "namespace",
                      "fieldRef": {
                        "apiVersion": "v1",
                        "fieldPath": "metadata.namespace"
                      }
                    }
                  ]
                }
              }
            ],
            "defaultMode": 420
          }
        }
      ],
      "containers": [
        {
          "name": "priv-pod",
          "image": "nginx",
          "resources": {},
          "volumeMounts": [
            {
              "name": "kube-api-access-hnmnl",
              "readOnly": true,
              "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
            }
          ],
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File",
          "imagePullPolicy": "Always",
          "securityContext": {
            "privileged": true
          }
        }
      ],
      "restartPolicy": "Always",
      "terminationGracePeriodSeconds": 30,
      "dnsPolicy": "ClusterFirst",
      "serviceAccountName": "default",
      "serviceAccount": "default",
      "hostNetwork": true,
      "securityContext": {},
      "schedulerName": "default-scheduler",
      "tolerations": [
        {
          "key": "node.kubernetes.io/not-ready",
          "operator": "Exists",
          "effect": "NoExecute",
          "tolerationSeconds": 300
        },
        {
          "key": "node.kubernetes.io/unreachable",
          "operator": "Exists",
          "effect": "NoExecute",
          "tolerationSeconds": 300
        }
      ],
      "priority": 0,
      "enableServiceLinks": true,
      "preemptionPolicy": "PreemptLowerPriority"
    },
    "status": {
      "phase": "Pending",
      "qosClass": "BestEffort"
    }
  },
  "requestReceivedTimestamp": "2024-05-31T20:49:52.847213Z",
  "stageTimestamp": "2024-05-31T20:49:52.866921Z",
//
// END RequestResponse Information
//
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"kubeadm:cluster-admins\" of ClusterRole \"cluster-admin\" to Group \"kubeadm:cluster-admins\"",
    "pod-security.kubernetes.io/enforce-policy": "privileged:latest"
  }
}
Wonderful. What's the big deal, can't we just log everything? Not quite. Due to the sheer amount of logs an AuditPolicy can generate in a real cluster, you can end up with WAY more logs than you'll ever be able to parse (and probably store…). The best way to configure an audit policy is to tune it for what you're looking for.
Are you very worried that someone is going to create a privileged pod, but for some reason you can't deploy an admission controller to stop them? AuditPolicy can at least give you the logs needed to alert on such behavior, as sketched below. It's only as powerful as you make it through configuring the AuditPolicy rules.
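One important caveat: AuditPolicy rules match on things like verbs, resources, users, and namespaces; they can't match on fields inside the request body (such as `privileged: true`). So for the privileged pod case, a hedged sketch of the approach is to log pod writes at the `Request` level and do the `securityContext` filtering downstream in your SIEM:

# Sketch: capture pod request bodies so a SIEM rule can flag
# requestObject.spec.containers[].securityContext.privileged == true
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
- level: Request
  verbs: ["create", "update", "patch"]
  resources:
  - group: ""
    resources: ["pods"]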
How to configure AuditPolicy
Unfortunately, configuring an AuditPolicy is a little more complicated to set up than simply running `kubectl apply -f auditpolicy.yaml` like you can with most Kubernetes resources.
At this point, you should choose your audit backend. There are two to choose from:
- Webhook: Logs are sent to an external server (a minimal config sketch follows this list)
- Log: Logs are written to a user-defined file on the node
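For reference, the webhook backend is configured with the `--audit-webhook-config-file` flag, which points at a kubeconfig-format file describing the remote collector. A minimal sketch (audit.example.com is a placeholder for your collector):

# webhook.yaml, passed via --audit-webhook-config-file=/etc/kubernetes/audit/webhook.yaml
apiVersion: v1
kind: Config
clusters:
- name: audit-webhook
  cluster:
    server: https://audit.example.com/k8s-audit # placeholder collector endpoint
contexts:
- name: audit-webhook
  context:
    cluster: audit-webhook
current-context: audit-webhook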
In this example, we'll simply be writing the logs to a file on the node. These can later be collected and sent to a SIEM with a logging agent.
The first thing we need to do is create a simple AuditPolicy manifest to get started with. The following policy is what I used in the above examples to demonstrate Metadata-level logging for pod creation. It's fairly simple: it looks for pod creation events and drops all other logs.
# policy.yaml located in /etc/kubernetes/audit/
apiVersion: audit.k8s.io/v1
kind: Policy
rules:

- level: Metadata
  # Look for creation events...
  verbs: ["create"]
  resources:
  - group: ""
    # For these resources
    resources: ["pods", "pods/status"]

# Do not log anything else
- level: None
This policy file (which I'll call `policy.yaml`, matching the `--audit-policy-file` flag below) needs to be placed on the node in a location accessible by the `kube-apiserver` pod. The tricky part is that for `policy.yaml` to be visible to the `kube-apiserver` Pod, we need to do a few things.
- Define a few new flags in our `/etc/kubernetes/manifests/kube-apiserver.yaml` manifest (this is what defines the parameters for our API server, which is launched as a static pod), specifically:
# Define the file the audit policy should be read from
- --audit-policy-file=/etc/kubernetes/audit/policy.yaml
# Define where to log the files to
- --audit-log-path=/etc/kubernetes/audit/audit.log
# Define the max size (in MB) of the audit log.
# For a production cluster you're gonna need way more than 500MB
- --audit-log-maxsize=500
# Define how many times the log will rotate before being overwritten
- --audit-log-maxbackup=3
- We need to create and mount a volume into our pod. Remember how I said that `policy.yaml` needs to be accessible to the `kube-apiserver`? Well, the `kube-apiserver` runs as a pod, which means (by default) it cannot access anything on the Node. This presents a problem. We need it to both read `/etc/kubernetes/audit/policy.yaml` and have the ability to write the log files to the node's file system; otherwise, if the `kube-apiserver` pod died or otherwise got recreated, we would lose our logs. Creating a `volumeMounts` entry and a `hostPath` volume is fairly straightforward. Add the following lines to `/etc/kubernetes/manifests/kube-apiserver.yaml`:
# Place under the volumeMounts section of /etc/kubernetes/manifests/kube-apiserver.yaml
- mountPath: /etc/kubernetes/audit
  name: audit

# Place under the volumes section of /etc/kubernetes/manifests/kube-apiserver.yaml
- hostPath:
    path: /etc/kubernetes/audit
    type: DirectoryOrCreate
  name: audit
Note: You should be as specific as possible with your mounts. You should NOT simply mount the entire filesystem or mount `/etc/kubernetes/`. For more info, see the Kubenomicon.
Your `/etc/kubernetes/manifests/kube-apiserver.yaml` should now look roughly akin to this:
# /etc/kubernetes/manifests/kube-apiserver.yaml with AuditPolicy configured
# to log to /etc/kubernetes/audit/audit.log
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.1.201:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.1.201
    - --allow-privileged=true
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction
    - --enable-bootstrap-token-auth=true
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=10.96.0.0/12
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    # Notice the new audit flags below
    - --audit-policy-file=/etc/kubernetes/audit/policy.yaml
    - --audit-log-path=/etc/kubernetes/audit/audit.log
    - --audit-log-maxsize=500
    - --audit-log-maxbackup=3
    image: registry.k8s.io/kube-apiserver:v1.30.1
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 192.168.1.201
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-apiserver
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 192.168.1.201
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 250m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 192.168.1.201
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    # Notice the new volumeMounts info we added
    - mountPath: /etc/kubernetes/audit
      name: audit
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/ca-certificates
      name: etc-ca-certificates
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /usr/local/share/ca-certificates
      name: usr-local-share-ca-certificates
      readOnly: true
    - mountPath: /usr/share/ca-certificates
      name: usr-share-ca-certificates
      readOnly: true
  hostNetwork: true
  priority: 2000001000
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  # Notice the new hostPath mounts
  - hostPath:
      path: /etc/kubernetes/audit
      type: DirectoryOrCreate
    name: audit
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/ca-certificates
      type: DirectoryOrCreate
    name: etc-ca-certificates
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /usr/local/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-local-share-ca-certificates
  - hostPath:
      path: /usr/share/ca-certificates
      type: DirectoryOrCreate
    name: usr-share-ca-certificates
status: {}
Now we need to configure the actual audit policy. The policy I was using in the above examples looks like this:
# policy.yaml located in /etc/kubernetes/audit/
apiVersion: audit.k8s.io/v1
kind: Policy
rules:

- level: Metadata
  verbs: ["create"]
  resources:
  - group: ""
    resources: ["pods", "pods/status"]

- level: None
The only rule this policy contains matches pod creation events (which is something you should be auditing on, as many privilege escalation techniques require an attacker to create a pod). Notice that the rules work similarly to a firewall: they're evaluated from the top down, and the final rule says "log all requests not matched above at the level `None`," which is to say, don't log them.
Ready for the dumb part? On the cluster I'm working with (created using kubeadm and running containerd), there is no great way to tell the API server to respect our new configuration parameters. It would be nice if there were a command to restart the API server. Unfortunately, deleting the pod and waiting for it to be recreated, or running `touch /etc/kubernetes/manifests/kube-apiserver.yaml`, doesn't always get the new configuration applied.
The only reliable way I've found to get the `kube-apiserver` pod recreated with the updated configuration options is to run `mv kube-apiserver.yaml /tmp` from `/etc/kubernetes/manifests/`, wait a few seconds, and move it back with `mv /tmp/kube-apiserver.yaml .` If you know of a better way of doing this, please let me know!
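For reference, the whole dance looks something like this (the sleep length is a guess, and `crictl` assumes a containerd node):

cd /etc/kubernetes/manifests
mv kube-apiserver.yaml /tmp/
sleep 15 # give the kubelet time to tear down the static pod
mv /tmp/kube-apiserver.yaml .
# Confirm the API server container came back
crictl ps --name kube-apiserver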
Anyway. At this point your `/etc/kubernetes` directory should contain an `audit/` subdirectory holding `policy.yaml` and, once the API server comes back up, the `audit.log` it writes.
Let's test out our audit policy to ensure we can catch a pod creation event by creating a new pod. To do so, we can tail our log file on the Node and pipe it to `jq` so the formatting is a little easier to read (`tail -f audit.log | jq`), and then run `kubectl apply -f <pod_manifest>.yaml` or `kubectl run policytest --image=nginx`. We immediately see that our log file has been populated with details about the pod creation.
If we take a closer look at this log, we can see that two different stages are being captured. We can verify this by running `cat audit.log | jq '.stage'`.
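Since each audit event is a standalone JSON object, `jq` works nicely for ad hoc triage. A couple of hedged one-liners (field names taken from the samples above):

# Only completed requests
jq 'select(.stage=="ResponseComplete")' audit.log
# One line per event: who did what to which resource
jq -r 'select(.stage=="ResponseComplete") | [.user.username, .verb, .objectRef.resource, (.objectRef.name // "-")] | @tsv' audit.log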
Four different stages can be recorded:
- RequestReceived: Generated as soon as the API server receives the request
- ResponseStarted: Generated once response headers are sent, for long-running responses like `watch`
- ResponseComplete: Generated when the response body has been sent
- Panic: Generated when something goes wrong
This is great, but if we're tuning our logs, we might only be interested in seeing that the API server responded with ResponseComplete, indicating something actually happened. Additionally, we also want to update our AuditPolicy to record when someone gets secrets from the cluster. Let's modify our `policy.yaml` file to reflect that:
🚨 Note that we'll also be introducing a gap in detection that we'll discuss later. See if you can spot it!
apiVersion: audit.k8s.io/v1
kind: Policy
# Here we're explicitly saying don't log when the API server receives the request
omitStages:
  - "RequestReceived"
rules:

- level: Metadata
  verbs: ["create"]
  resources:
  - group: ""
    resources: ["pods", "pods/status"]

# Here we're logging activity associated with running something like `kubectl get secret secret123`
- level: Metadata
  verbs: ["get"]
  resources:
  - group: ""
    resources: ["secrets"]

- level: None
Now, when we create a pod, we will only see the log for it at the ResponseComplete stage, not both stages, cutting the volume of these logs in half. (Note that there are still reasons you might want both, but if you're struggling with log volume, this may make things a bit more bearable.)
We can also see that our log captures get requests for secrets if someone runs `kubectl get secret <secret_name>`. Amazing, right?
Unfortunately, there is a large detection gap here that is VERY easy to overlook. Remember how in our AuditPolicy file we specified that we wanted to log any requests with the `get` verb for the `secrets` resource? In Kubernetes, there is a weird quirk: running the command `kubectl get secret <secret_name>` is indeed a `get` verb as you would expect; however, running the command `kubectl get secrets` is NOT technically a `get` action, it's a `list` action, because it's listing all the secrets. And you'll notice that our audit policy does not collect the `list` verb. This means that running `kubectl get secrets` will not be logged at all under our current AuditPolicy:
# Here we're logging activity associated with running something like kubectl get secret secret123
- level: Metadata
  # We are not collecting "list" actions
  verbs: ["get"]
  resources:
  - group: ""
    resources: ["secrets"]
Running the command `kubectl get secrets` doesn't seem like something that should be logged, because it only lists the secrets; it doesn't provide the actual sensitive data, right? Well… actually, no. You can very easily see the data if you run `kubectl get secrets -o yaml` (or `-o json`). Despite this, since we didn't specify the `list` verb in our AuditPolicy, this action will NOT be logged, even though we've just accessed every secret in the namespace.
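To make the gap concrete: the first command below is a single `list` call that returns every secret's data in the namespace, and the second shows how trivially the values decode (the secret and key names here are hypothetical):

# One "list" call, invisible under our current policy
kubectl get secrets -o yaml
# This one is a "get", but shows how easily the data decodes
kubectl get secret db-creds -o jsonpath='{.data.password}' | base64 -d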
Luckily, this is a very simple fix if you're aware of this odd quirk. All we need to do is update our AuditPolicy to reflect our desire to capture the `list` verb on secrets:
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"
rules:

- level: Metadata
  verbs: ["create"]
  resources:
  - group: ""
    resources: ["pods", "pods/status"]

- level: Metadata
  verbs: ["get", "list"] # Added "list"
  resources:
  - group: ""
    resources: ["secrets"]

- level: None
Now once we run `kubectl get secrets`, our log file contains that request. Depending on what you're running in your cluster, this may be very noisy, as lots of things list secrets in clusters (which… is a whole different topic).
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "36d2e2dd-3348-4ce1-ad9b-d364c871324a",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/default/secrets?limit=500",
  //
  // Notice the list verb has been logged
  //
  "verb": "list",
  "user": {
    "username": "kubernetes-admin",
    "groups": [
      "kubeadm:cluster-admins",
      "system:authenticated"
    ]
  },
  "sourceIPs": [
    "192.168.1.167"
  ],
  "userAgent": "kubectl/v1.28.9 (linux/amd64) kubernetes/587f5fe",
  "objectRef": {
    "resource": "secrets",
    "namespace": "default",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "code": 200
  },
  "requestReceivedTimestamp": "2024-06-02T00:39:37.042236Z",
  "stageTimestamp": "2024-06-02T00:39:37.123574Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"kubeadm:cluster-admins\" of ClusterRole \"cluster-admin\" to Group \"kubeadm:cluster-admins\""
  }
}
Audit logs are extremely powerful, but only if you tune them correctly. Keep an eye on the Kubenomicon for ideas on what you should be auditing to catch attackers.
Kubernetes Events
Kubernetes events can be viewed by running `kubectl get events`. These events are created whenever something changes at the cluster level. For example, if I create a Pod that is attempting to pull an `nginx` image, events will document that process. In this case, the pod was created successfully, so there were no issues.
If there was an error pulling the image, these events would show it. In this example, I've made a typo in the image name, and thus Kubernetes (or really, the container runtime) can't pull the image.
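A few `kubectl get events` invocations I find handy (all standard flags):

kubectl get events --sort-by=.metadata.creationTimestamp # oldest first
kubectl get events --field-selector type=Warning         # only warnings, e.g. failed image pulls
kubectl get events -w                                    # watch live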
By default, Kubernetes events are only retained for one hour (although this can be configured). Kubernetes events are great for troubleshooting a Kubernetes cluster, but if you're relying on your SOC to investigate incidents using `kubectl`, you should probably rethink your logging architecture.
Cloud Level Logging
Logging at the cloud level is anything "above" the cluster level; these log artifacts are mostly generated by your cloud provider. I'm not going to cover this in much detail because it's different for each provider, and I typically work with clusters that aren't managed by a cloud provider, so I don't have much to say on the topic that you can't just read in the documentation:
- GCP: Cloud logging
- Azure: Azure Monitor
- AWS: CloudTrail
Tail -f
Phew, that was a lot of logs we just waded through. As you can see, it's a little less straightforward to collect logs from all the layers of a Kubernetes cluster than from a normal virtual machine, but it's certainly possible.
I hope the main idea you take away from this is that collecting ALL the logs is not terribly useful in most instances; it's very important to understand what you want to collect from each layer and tune your logging to align with it.
Here are some things you should look into if you’re interested in learning more about Kubernetes logging: