HostPath volumes and their problems

Kubernetes HostPath volumes: a good way to nuke your Kubernetes nodes

Suraj Deshmukh

8 minute read

This post demonstrates how Kubernetes HostPath volumes can give you access to Kubernetes nodes. At the very least, you can manipulate the filesystem of the node your pod is scheduled on, which lets you reach other containers running on that host, the kubelet's certificates, and more.

I have a cluster with three masters and three worker nodes, set up using this script and running in a Vagrant environment.

All the nodes are in the Ready state:

$ kubectl get nodes -o wide
NAME       STATUS    ROLES     AGE       VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker-0   Ready     <none>    24m       v1.11.2   192.168.199.20   <none>        Ubuntu 18.04.1 LTS   4.15.0-33-generic   cri-o://1.11.2
worker-1   Ready     <none>    23m       v1.11.2   192.168.199.21   <none>        Ubuntu 18.04.1 LTS   4.15.0-33-generic   cri-o://1.11.2
worker-2   Ready     <none>    21m       v1.11.2   192.168.199.22   <none>        Ubuntu 18.04.1 LTS   4.15.0-33-generic   cri-o://1.11.2

The deployment looks like this:

$ cat deployment.yaml 
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    run: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      run: web
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - image: centos/httpd
        name: web
        volumeMounts:
        - mountPath: /web
          name: test-volume
      volumes:
      - name: test-volume
        hostPath:
          path: /

As you can see above, we are mounting / of the host inside the pod at /web. This is our gateway to the host's filesystem. Let's deploy this:

$ kubectl apply -f deployment.yaml
deployment.apps/web created

The pod has now started:

$ kubectl get pods -o wide
NAME                   READY     STATUS    RESTARTS   AGE       IP          NODE       NOMINATED NODE
web-66cdf67bbc-44zhj   1/1       Running   0          4m        10.38.0.1   worker-2   <none>

Get a shell inside the pod and look at the mounted directory:

$ kubectl exec -it web-66cdf67bbc-44zhj bash
[root@web-66cdf67bbc-44zhj /]# cd /web
[root@web-66cdf67bbc-44zhj web]# ls
bin  boot  dev  etc  home  initrd.img  initrd.img.old  lib  lib64  lost+found  media  mnt  opt  proc  root  run  sbin  snap  srv  sys  tmp  usr  vagrant  var  vmlinuz  vmlinuz.old

Now we can either chroot into this directory and look at the output of ps:

[root@web-66cdf67bbc-44zhj ~]# chroot /web
# ps aufx
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    05:15   0:00 [kthreadd]
root         4  0.0  0.0      0     0 ?        I<   05:15   0:00  \_ [kworker/0:0H]
root         6  0.0  0.0      0     0 ?        I<   05:15   0:00  \_ [mm_percpu_wq]
root         7  0.0  0.0      0     0 ?        S    05:15   0:00  \_ [ksoftirqd/0]
root         8  0.0  0.0      0     0 ?        I    05:15   0:00  \_ [rcu_sched]
root         9  0.0  0.0      0     0 ?        I    05:15   0:00  \_ [rcu_bh]
root        10  0.0  0.0      0     0 ?        S    05:15   0:00  \_ [migration/0]
root        11  0.0  0.0      0     0 ?        S    05:15   0:00  \_ [watchdog/0]
root        12  0.0  0.0      0     0 ?        S    05:15   0:00  \_ [cpuhp/0]

Or we can just delete the entire root filesystem, a.k.a. nuke the node:

# rm -rf --no-preserve-root /

Now if you look at the nodes:

$ kubectl get nodes -o wide
NAME       STATUS     ROLES     AGE       VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
worker-0   Ready      <none>    30m       v1.11.2   192.168.199.20   <none>        Ubuntu 18.04.1 LTS   4.15.0-33-generic   cri-o://1.11.2
worker-1   Ready      <none>    29m       v1.11.2   192.168.199.21   <none>        Ubuntu 18.04.1 LTS   4.15.0-33-generic   cri-o://1.11.2
worker-2   NotReady   <none>    27m       v1.11.2   192.168.199.22   <none>        Unknown              4.15.0-33-generic   cri-o://Unknown

The last node, worker-2, is in the NotReady state: we have successfully made one node unusable. Now that this node is gone, the pod is scheduled on another node, where you can do the same thing.

$ kubectl get pods -o wide
NAME                   READY     STATUS    RESTARTS   AGE       IP          NODE       NOMINATED NODE
web-66cdf67bbc-44zhj   0/1       Unknown   0          20m       10.38.0.1   worker-2   <none>
web-66cdf67bbc-b22xn   1/1       Running   0          8m        10.32.0.2   worker-0   <none>

As you can see above, the pod is re-scheduled on node worker-0, and we can repeat the same steps to make worker-0 unusable.

Delete the deployment to clean up:

$ kubectl delete deployment web
deployment.extensions "web" deleted
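
Since the whole attack hinges on one hostPath entry in the manifest, it can help to look for such entries before applying a manifest you did not write. Here is a minimal sketch; the list_hostpaths function is a hypothetical helper of mine, and it assumes the simple block-style YAML layout used in this post, where path: sits on the line right after hostPath:.

```shell
# Hypothetical helper: print every hostPath path declared in a manifest.
# Assumes "path:" appears on the line directly after "hostPath:",
# as in the deployment.yaml shown in this post. Not a real YAML parser.
list_hostpaths() {
  awk '
    /^[ \t]*hostPath:[ \t]*$/ { in_hp = 1; next }
    in_hp && /^[ \t]*path:/ {
      sub(/^[ \t]*path:[ \t]*/, "")   # keep only the value
      print
    }
    { in_hp = 0 }                     # only the very next line counts
  ' "$1"
}
```

Running list_hostpaths deployment.yaml against the manifest above should print /, which is a strong hint that the pod wants the entire host filesystem.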

Stopping this attack using PodSecurityPolicy

Now, as a cluster admin, how can you prevent this from happening? You can create a PodSecurityPolicy, which lets you define what kinds of pods can be created and what permissions a pod can request. The PodSecurityPolicy admission controller must be enabled for this; read about it here.
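
The admission controller is switched on through the kube-apiserver flag --enable-admission-plugins. How the API server is started depends on your cluster setup, so the following is only a rough sketch: psp_plugin_enabled is a hypothetical helper that inspects a command line you pass to it. Keep in mind that once the plugin is enabled, pods are rejected unless some policy authorizes them, so have your policies ready first.

```shell
# Hypothetical helper: succeed if the given kube-apiserver command line
# lists PodSecurityPolicy in --enable-admission-plugins.
psp_plugin_enabled() {
  # $1 is deliberately unquoted so the command line splits into arguments.
  for arg in $1; do
    case $arg in
      --enable-admission-plugins=*)
        # Wrap the list in commas so each plugin name can be matched exactly.
        case ",${arg#*=}," in
          *,PodSecurityPolicy,*) return 0 ;;
        esac
        ;;
    esac
  done
  return 1
}

# Possible usage on a master node (assumes the API server is visible to ps):
#   psp_plugin_enabled "$(ps -o args= -C kube-apiserver)" && echo "PSP enabled"
```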

Here is an example PodSecurityPolicy:

$ cat podsecuritypolicy.yaml 
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: example
spec:
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - '*'
  privileged: false  # Don't allow privileged pods!
  allowedHostPaths:
  - pathPrefix: /foo
    readOnly: true

In the above example, we restrict which hostPath volumes a pod can request. The only allowed path prefix is /foo, and it must be mounted read-only.

Create the PodSecurityPolicy from the above file:
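
A pathPrefix is matched on whole path components, not as a raw string prefix: according to the Kubernetes documentation, /foo allows /foo, /foo/ and /foo/bar, but not /fool or /etc/foo. The sketch below mimics that rule in plain shell so you can reason about a prefix before writing the policy. The hostpath_allowed function is a hypothetical illustration, not the code Kubernetes runs, and it does not normalize paths containing "..".

```shell
# Hypothetical illustration of allowedHostPaths prefix matching.
# Usage: hostpath_allowed <pathPrefix> <requested hostPath>
hostpath_allowed() {
  hp_prefix=${1%/}   # drop one trailing slash so /foo/ behaves like /foo
  hp_path=${2%/}
  case $hp_path in
    "$hp_prefix" | "$hp_prefix"/*) return 0 ;;   # the prefix itself or below it
    *) return 1 ;;
  esac
}

hostpath_allowed /foo /foo/bar && echo "/foo/bar is allowed"
hostpath_allowed /foo /        || echo "/ is rejected"
```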

$ kubectl apply -f podsecuritypolicy.yaml 
podsecuritypolicy.policy/example created

For pods to be able to use this policy, we need to create a few more objects: a Role and a RoleBinding.

Role:

$ cat role.yaml 
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: authorize-hostpath
rules:
- apiGroups: ['policy']
  resources: ['podsecuritypolicies']
  verbs:     ['use']
  resourceNames:
  - example

This Role allows the use of the policy we created above.

Create Role:

$ kubectl apply -f role.yaml 
role.rbac.authorization.k8s.io/authorize-hostpath created

RoleBinding:

$ cat rolebinding.yaml 
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: auth-hostpath
roleRef:
  kind: Role
  name: authorize-hostpath
  apiGroup: rbac.authorization.k8s.io
subjects:
# Authorize all service accounts in a namespace:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:serviceaccounts

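This RoleBinding binds the Role above to the system:serviceaccounts group, which contains every ServiceAccount. Because a RoleBinding is namespaced, the grant only applies to pods created in the current namespace.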

Create RoleBinding:

$ kubectl create -f rolebinding.yaml 
rolebinding.rbac.authorization.k8s.io/auth-hostpath created
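
If you want to check the binding before deploying anything, kubectl auth can-i can impersonate a ServiceAccount and ask whether it may use the policy. This is a small sketch that needs kubectl and a reachable cluster; sa_user and can_use_psp are hypothetical helper names, and default/default is only an assumed namespace and ServiceAccount. The groups are passed explicitly so the result does not depend on how impersonation fills them in.

```shell
# Build the user name a ServiceAccount authenticates as.
# Usage: sa_user <namespace> <serviceaccount>
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether that ServiceAccount may "use" a policy.
# Usage: can_use_psp <policy> <namespace> <serviceaccount>
can_use_psp() {
  kubectl auth can-i use "podsecuritypolicy/$1" \
    --namespace "$2" \
    --as="$(sa_user "$2" "$3")" \
    --as-group=system:serviceaccounts \
    --as-group=system:authenticated
}

# Example (assumed names): can_use_psp example default default
```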

Now that we have the required permissions in place, try to re-create the deployment:

$ kubectl apply -f deployment.yaml 
deployment.apps/web created

The pod is not created. In the events you can see that the ReplicaSet failed to create it because the hostPath / is not allowed by any PodSecurityPolicy:

$ kubectl get events 
LAST SEEN   FIRST SEEN   COUNT     NAME                              KIND         SUBOBJECT   TYPE      REASON                    SOURCE                  MESSAGE
...
6s          17s          12        web-66cdf67bbc.15530e83fd4f8592   ReplicaSet               Warning   FailedCreate              replicaset-controller   Error creating: pods "web-66cdf67bbc-" is forbidden: unable to validate against any pod security policy: [spec.volumes[0].hostPath.pathPrefix: Invalid value: "/": is not allowed to be used]

This error occurs because we only allow hostPath volumes under /foo, and the original manifest sets the path to /.

Now, in deployment.yaml, change spec.template.spec.volumes[0].hostPath.path from / to /foo and apply again:

$ kubectl apply -f deployment.yaml 
deployment.apps/web configured

This time the events show a different error, saying that the volume mount must be read-only:

$ kubectl get events 
LAST SEEN   FIRST SEEN   COUNT     NAME                              KIND         SUBOBJECT   TYPE      REASON                    SOURCE                  MESSAGE
...
5s          15s          12        web-85cb548b47.15530eb0c27c6122   ReplicaSet               Warning   FailedCreate              replicaset-controller   Error creating: pods "web-85cb548b47-" is forbidden: unable to validate against any pod security policy: [spec.containers[0].volumeMounts[0].readOnly: Invalid value: false: must be read-only]

This is because the container's volumeMount does not set readOnly, so it defaults to false, while the PodSecurityPolicy requires hostPath volumes under /foo to be mounted read-only.

So set spec.template.spec.containers[0].volumeMounts[0].readOnly to true. The manifest should now look like this:

$ cat deployment.yaml 
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    run: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      run: web
  template:
    metadata:
      labels:
        run: web
    spec:
      containers:
      - image: centos/httpd
        name: web
        volumeMounts:
        - mountPath: /web
          name: test-volume
          readOnly: true
      volumes:
      - name: test-volume
        hostPath:
          path: /foo

Now if you re-deploy the app:

$ kubectl apply -f deployment.yaml 
deployment.apps/web configured

The pod is scheduled and created:

$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
web-7b77459d6b-kbqm8   1/1       Running   0          7s

Now if you exec into this pod and try to write to the mounted directory, it fails because the filesystem is read-only:

$ kubectl exec -it web-7b77459d6b-kbqm8 bash
[root@web-7b77459d6b-kbqm8 /]# cd /web/
[root@web-7b77459d6b-kbqm8 web]# ls
[root@web-7b77459d6b-kbqm8 web]# touch file.txt
touch: cannot touch 'file.txt': Read-only file system

So this is a really good feature for stopping someone from nuking your cluster.

Stopping this attack using SELinux

The cluster above ran on Ubuntu-based machines. Now I have a Kubernetes cluster set up on Fedora, which supports SELinux.

You can set up this cluster by following the steps in this post.

Note: This is a simple cluster setup without PodSecurityPolicy.

Once you have the cluster running:

$ kubectl get nodes -o wide
NAME      STATUS    ROLES     AGE       VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                    KERNEL-VERSION           CONTAINER-RUNTIME
fedora    Ready     master    23m       v1.11.2   10.0.2.15     <none>        Fedora 28 (Cloud Edition)   4.16.3-301.fc28.x86_64   docker://1.13.1

Let's follow the same steps and create the deployment:

$ kubectl apply -f deployment.yaml 
deployment.apps/web created

The pod is created:

$ kubectl get pods
NAME                   READY     STATUS    RESTARTS   AGE
web-66cdf67bbc-t9n7m   1/1       Running   0          28s

Get a shell inside the pod:

$ kubectl exec -it web-66cdf67bbc-t9n7m bash
[root@web-66cdf67bbc-t9n7m /]#
[root@web-66cdf67bbc-t9n7m /]# cd /web/
[root@web-66cdf67bbc-t9n7m web]# ls
bin  boot  dev  etc  home  lib  lib64  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  vagrant  var
[root@web-66cdf67bbc-t9n7m web]# touch file.txt
touch: cannot touch 'file.txt': Permission denied

As you can see, if you go to the root of the host, which is mounted at /web, and try to create a new file, it fails. This event is logged in the SELinux audit log. If you run the following on the host, you will see the denial:

# ausearch -m avc
----
time->Tue Sep 11 06:41:54 2018
type=AVC msg=audit(1536648114.522:985): avc:  denied  { write } for  pid=15775 comm="touch" name="/" dev="sda1" ino=2 scontext=system_u:system_r:container_t:s0:c230,c784 tcontext=system_u:object_r:root_t:s0 tclass=dir permissive=0

Here plain SELinux has stopped the container from writing to places it shouldn't. Unlike PodSecurityPolicy (which is a beta feature in k8s 1.11), SELinux can help you right away if you are using an older Kubernetes cluster and are on CentOS or RHEL.
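
The AVC record above already tells you why the write was blocked: a process of type container_t tried to write to a directory labeled root_t. If you have many such records, pulling out just those fields makes them easier to read. Below is a minimal sketch using only tr, sed and cut; avc_field and avc_type are hypothetical helpers, and they assume the space-separated key=value layout shown in the record above.

```shell
# Print the value of one key=value field from an AVC record.
# Usage: avc_field <key> <record>
avc_field() {
  printf '%s\n' "$2" | tr ' ' '\n' | sed -n "s/^$1=//p"
}

# Print only the SELinux type (third colon-separated part) of a context field.
# Usage: avc_type <scontext|tcontext> <record>
avc_type() {
  avc_field "$1" "$2" | cut -d: -f3
}

# Example, fed with records from the audit log on the host:
#   ausearch -m avc | grep '^type=AVC' | while IFS= read -r rec; do
#     echo "$(avc_type scontext "$rec") -> $(avc_type tcontext "$rec") ($(avc_field tclass "$rec"))"
#   done
```

For the record in this post, that loop should print container_t -> root_t (dir).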
