bpf check_alu_op vulnerability analysis (CVE-2017–16995)
Around December 2017, CVE-2017–16995 was published:
The check_alu_op function in kernel/bpf/verifier.c in the Linux kernel through 4.14.8 allows local users to cause a denial of service (memory corruption) or possibly have unspecified other impact by leveraging incorrect sign extension.
I came across this kernel exploit while rooting a machine on HackTheBox and used it to gain root privileges. That is what led me to try to understand it.
First, we’ll want to understand: what is BPF, exactly?
Berkeley Packet Filter (BPF) mostly allows you to run filters against packets in kernel space. Filters are a set of instructions running inside the BPF virtual machine. Since Linux version 3.18, eBPF (extended BPF) was introduced; eBPF allows attaching those filters to sockets, tracepoints (for debugging purposes), and more. For example, tcpdump uses BPF bytecode to filter packets; you can view the compiled BPF instructions by adding the -d flag to tcpdump.
The BPF Verifier
The bpf syscall essentially allows running code in kernel space from user space. That’s a security risk, hence we have the BPF verifier. Its purpose is to make sure the instructions are safe to run. Here are some of the checks it performs:
- Verifies the program isn’t too big
- Forbids loops (DAG check)
- Forbids unreachable instructions from existing in the code
- Forbids reading from uninitialized registers
- Forbids exiting the filter without setting a return value first
- Verifies the program isn’t accessing invalid memory
The vulnerable function
check_alu_op is responsible for checking the validity of arithmetic instructions. When the static code analyzer encounters a MOV of an immediate value into a register, the verifier records the known value by calling __mark_reg_known(regs + insn->dst_reg, insn->imm).
Let’s look at the bpf_insn struct and the signature of __mark_reg_known. The struct shows that the immediate value (imm) is a signed 32-bit integer, while the imm parameter of __mark_reg_known is an unsigned 64-bit integer. Therefore, an implicit type conversion occurs when the immediate is passed in.
But what happens when we convert a signed integer into an unsigned one? According to the C standard’s conversion rules, the signed value is converted by repeatedly adding MAX_UNSIGNED_INT + 1 (one more than the maximum value of the unsigned type) until the result fits in the unsigned type’s range.
Let’s say we have this BPF instruction: mov32 r2, 0xFFFFFFFF. The immediate is a signed 32-bit int, so it’s actually -1 in decimal. We perform the above calculation with a 64-bit MAX_UNSIGNED_INT: -1 + (0xFFFFFFFFFFFFFFFF + 1) = 0xFFFFFFFFFFFFFFFF.
This is essentially sign extension, the most significant bit was extended throughout the 64-bit “register”. What should’ve happened is zero-padding of the upper 32-bits, since this is a MOV32 operation.
This resulted in 0xFFFFFFFFFFFFFFFF being saved to the hypothetical register state in the static code analyzer. But in reality, when the instructions run, the register is zero-padded: 0x00000000FFFFFFFF. Now let’s try to understand why the verifier differs from the actual code execution.
Let’s take a look at the BPF core code, specifically the interpreter in kernel/bpf/core.c. What we’re interested in is the BPF_ALU | BPF_MOV | BPF_K case. For reference: BPF_K means the source operand is an immediate value, and BPF_X means the source operand is a register. Let’s look at the operation:
What differs here from the simulated MOV operation in the verifier is that the immediate value is cast to an unsigned 32-bit integer before being moved. This prevents the incorrect sign extension from occurring, and the result would be 0xFFFFFFFF. Casting to unsigned before moving is also the fix that was deployed for this issue.
We learned that the static code analyzer thinks we have a different immediate value in the register than what would’ve been there during “real” code execution. The documentation of bpf_check() in kernel/bpf/verifier.c describes how the analyzer works:
bpf_check() is a static code analyzer that walks eBPF program instruction by instruction and updates register/stack state. All paths of conditional branches are analyzed until 'bpf_exit' insn.
The common way to exploit this vulnerability is to trick the analyzer into thinking the code always exits, using a conditional jump:
mov32 r2, 0xFFFFFFFF    // r2 gets sign-extended in the verifier
jne r2, 0xFFFFFFFF, +2  // if r2 != 0xFFFFFFFF, jump 2 instructions ahead
mov64 r0, 0x0           // set return code to 0
bpf_exit                // stop execution - verifier stops here
Let’s take a look at a part of the code that handles conditional jumps:
tnum_equals_const is responsible for comparing the register state to the immediate value. We already know that insn->imm is a 32-bit signed integer, but inside the tnum struct, tnum.value is an unsigned 64-bit integer.
So what happens when we compare a signed and an unsigned integer? Simply put, the compiler converts the signed integer to an unsigned one, repeating the same sign extension we’ve looked at previously. Can you guess the output of this code?
The output would be true. After understanding this, it’s clear that the aforementioned code would continue into the fall-through branch, reaching the bpf_exit call and leaving any code that comes after the exit call unverified.
Leveraging this issue, an attacker can insert malicious code after the exit call and gain arbitrary read/write access to kernel memory.