CVE-2019-5736, an entry in the Common Vulnerabilities and Exposures (CVE) list, was published on February 11, 2019. It concerns runC, the container runtime used by Docker, the well-known containerization platform. This vulnerability allows an attacker to escape from a container and gain root access on its host machine by overwriting the host's runC binary from inside one of its containers. The flaw also affected privileged Linux Containers (LXC), a “userspace interface for the Linux kernel containment features”.
The goal of this article is to explain in detail how an attacker could exploit the flaw and how it was fixed to prevent such an attack. The following topics are covered:
1. What is runC?
2. What is the proc filesystem?
3. How can the proc filesystem of a Docker container be used to overwrite the runC program?
4. How can a malicious image be built? (shared libraries)
5. How was the vulnerability fixed?
According to the Open Container Initiative (OCI), runC is a lightweight universal container runtime. containerd uses it as a CLI tool for spawning and running containers. On Linux systems, the runC binary can be found at /usr/sbin/runc.
containerd is a container runtime available as a daemon for Linux and Windows. Docker uses it to manage the complete container lifecycle of its host system, including image transfer and storage, container execution and supervision, and low-level storage and network attachments.
The CVE-2019-5736 bulletin revealed that, due to a flaw in runC, it was possible to overwrite the host machine’s runC binary (/usr/sbin/runc) through the proc filesystem of one of its containers.
On Linux, the proc filesystem is a pseudo-filesystem, mounted at /proc, that exposes a lot of data about the kernel and the running processes. Every process is represented by a directory named /proc/[pid_of_the_process], and the files in that directory hold information about the process. For example, /proc/[pid]/exe is a symbolic link to the binary the process is executing, and /proc/[pid]/fd/ is a subdirectory containing one symbolic link per file descriptor opened by the process.
Following the same structure, /proc/self refers to the process that is accessing the /proc filesystem: it is a symbolic link to the /proc/[pid] directory of that very process. This means that the exe file, the fd/ subdirectory, and every other file found in /proc/[pid] directories are also available under /proc/self.
When someone runs docker run or docker exec, containerd calls the runC binary to spawn and run a container. For code running inside the runC process during this operation, /proc/self therefore refers to runC itself. In other words, at that moment /proc/self resolves to /proc/[runC_pid], and /proc/self/exe is a symbolic link to the /usr/sbin/runc file on the host machine.
However, while the runC process is running, its binary cannot be overwritten: Linux refuses to open an executable that is currently running for writing (the open() call fails with ETXTBSY, “Text file busy”). Moreover, if the attacker waits for the process to end before overwriting the binary, /proc/self/exe will no longer point to it. Nevertheless, the runC binary can be overwritten with a trick that relies on runC’s shared libraries.
When it is executed, the runC binary dynamically loads several shared libraries. Unlike static libraries, which are copied into the executable when it is built, shared libraries are loaded at runtime by the executable (or by other shared libraries). runC’s shared libraries can be listed with the ldd command:
$ ldd /usr/sbin/runc
	linux-vdso.so.1 (0x00007ffc9d526000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fed38ddf000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fed38bdb000)
	libseccomp.so.2 => /lib/x86_64-linux-gnu/libseccomp.so.2 (0x00007fed38995000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fed385f6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fed39de0000)
Overwriting one of those libraries would make runC execute our code when it loads the library. Our code would then run as part of the runC process and could perform the trick described below to “overwrite a binary used to launch a running process”.
The scenario considered here is to craft a malicious Docker image and deliver it to a victim by any means (through a Docker registry, for example). To craft it, a Dockerfile performing the actions below needs to be created:
Now, let’s describe what the reader.c file does.
To be overwritten, the runC executable must no longer be used by any process. However, the file descriptor opened on it in read-only mode has to remain open, so that the binary can still be reached afterwards through /proc/self/fd/[fd_for_reading].
According to the man page of execve(), this function “executes the program pointed to by filename. This causes the program that is currently being run by the calling process to be replaced with a new program, with newly initialized stack, heap, and (initialized and uninitialized) data segments.” Another point of the man page is important: “By default, file descriptors remain open across an execve().”
execve() is therefore ideal to stop runC’s use of its binary while launching the executable (overwriter) that will overwrite the runC binary. Note that the path /proc/self/fd/[fd_for_reading] is passed to overwriter.
Finally, the overwriter binary will follow the steps below:
After the overwriter process ends, the (now overwritten) runC binary is called a second time to end the container, which executes the maliciously crafted runC binary.
Let’s see the exploitation in real life.
To prevent the runC binary from being overwritten, runC’s maintainers made some changes to its mechanism:
However, according to an issue opened on the runC GitHub repository, this fix seems to increase the memory used to launch containers. A maintainer is currently implementing a solution to this side effect (see runC’s GitHub repository for more details).
This article explained CVE-2019-5736, a vulnerability found in runC, the container runtime used by Docker. An introduction to runC and to the proc filesystem provided the background needed to understand why the vulnerability existed.
An exploitation of the vulnerability using one of runC’s shared libraries was then described, to show in depth how attackers could gain root access on a system running a Docker version older than 18.09.2 by using a malicious image.
Finally, the vulnerability fix was explained, to show how runC’s maintainers prevented attackers from overwriting the runC binary.
I hope you found this explanation interesting. Note that the fix is available in the new Docker Engine 18.09.2 release. Feel free to contact me at firstname.lastname@example.org if you have any questions about this article.