Here are simple steps that you can follow to prove that the root
user inside container is also root
on the host. And how to mitigate this.
Root in container, root on host
I have a host with docker daemon running on it. I start a normal container on it with sleep
process as PID1. See in the following output that the container clever_lalande
started with sleep
process.
$ docker run -d --rm alpine sleep 9999
6c541cf8f7b315783d2315eebc2f7dddd1f7b26f427e182f8597b10f2746ab0b
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6c541cf8f7b3 alpine "sleep 9999" 12 seconds ago Up 11 seconds clever_lalande
Now let’s find out the process sleep
on the host. Here in the following output you can see that the process sleep is running as user root
.
$ ps aufx | grep sleep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 4826 0.3 0.0 1552 4 ? Ss 07:34 0:00 \_ sleep 9999
core 4864 0.0 0.0 6864 964 pts/0 S+ 07:34 0:00 \_ grep --colour=auto sleep
Also the sleep process is root inside the container.
$ docker exec -it clever_lalande id
uid=0(root) gid=0(root)
Non-root inside the container, non-root on host
The user I am logged into this machine is called core
with user id 500.
$ whoami
core
$ id
uid=500(core) gid=500(core) groups=500(core),10(wheel),233(docker),248(systemd-journal),250(portage),251(rkt) context=system_u:system_r:kernel_t:s0
Let’s start the container in the same way but with additional flag --user
. Here I started a new container and forced it to run under the security context of the user core
. Here I specify the UID same as the UID on the host. The documentation of the flag says that:
-u, –user="".
Sets the username or UID used and optionally the groupname or GID for the specified command.
Here is the container that is started with sleep
process named wonderful_proskuriakova
.
$ docker run -d --rm --user ${UID}:${UID} alpine sleep 9999
1cdc11a449e4e62a9557a4d7b586aa320f5512f2746f4a8e1cac7b9e6d2e1225
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1cdc11a449e4 alpine "sleep 9999" 25 seconds ago Up 25 seconds wonderful_proskuriakova
If I try to find the same process on the host here you can clearly see that the process is not running as root
but as user core
.
$ ps aufx | grep sleep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
core 4607 0.0 0.0 1552 4 ? Ss 07:30 0:00 \_ sleep 9999
core 4648 0.0 0.0 6864 900 pts/0 S+ 07:30 0:00 \_ grep --colour=auto sleep
Also inside the container I am running as UID 500
.
$ docker exec -it wonderful_proskuriakova id
uid=500 gid=500
This is just one way to mitigate the user to be non-root. You should also drop all the capabilities and whitelist the capabilities that are absolutely needed. Also provide a seccomp profile that is locked down which only allows the syscalls that are needed by your application. There are many times where you want to run as root inside the container in such situations you should use user namespaces, which is what we are looking in the next section.
Root inside container, non-root on host
Now I have docker daemon which is started with docker user namespace enabled. Look at the flag --userns-remap=default
being used to start the docker daemon with user namespace. To know more about enabling user namespace, follow docs here.
$ systemctl status docker-userns
● docker-userns.service - Docker Application Container Engine
Loaded: loaded (/etc/systemd/system/docker-userns.service; disabled; vendor preset: disabled)
Active: active (running) since Tue 2019-06-25 07:07:37 UTC; 1h 11min ago
Docs: http://docs.docker.com
Main PID: 4011 (dockerd)
Tasks: 10
Memory: 61.8M
CGroup: /system.slice/docker-userns.service
└─4011 /run/torcx/bin/dockerd --host=fd:// --host=tcp://127.0.0.1:2376 --containerd=/var/run/docker/libcontainerd/docker-containerd.sock --userns-remap=default --pidfile /var/run/docker-userns.pid --selinux-enabled=true
...
Now start the container like we did for the first time without any --user
flag. In the following output you can see that the container started with name sad_pasteur
.
$ docker run --rm -d alpine sleep 9999
05290a7088b3e7c0e4e80cbb3a63c0d63a49627b8d31ec9f75f44b9a57b717f4
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
05290a7088b3 alpine "sleep 9999" 2 seconds ago Up 1 second sad_pasteur
Now if we see the sleep
process on host, the process has started with different user 100000
on host.
$ ps aufx | grep sleep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
100000 5467 0.2 0.0 1552 4 ? Ss 08:16 0:00 \_ sleep 9999
core 5511 0.0 0.0 6864 988 pts/0 S+ 08:16 0:00 \_ grep --colour=auto sleep
But inside the container the user is still root
.
$ docker exec -it sad_pasteur id
uid=0(root) gid=0(root)
This is because of the user namespace enabled on the docker daemon that we see user 100000
on host. This mapping of the user id on host and inside the container can be found in the following files:
$ cat /etc/subuid
dockremap:100000:65536
$ cat /etc/subgid
dockremap:100000:65536