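To mitigate kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel [1] will be compiled with a new option, -mindirect-branch=thunk-extern, introduced to gcc to perform indirect calls through a so-called retpoline.
This appears to be a newly invented term, as a Google search turns up only very recent use (generally all in 2018).
What is a retpoline, and how does it prevent the recent kernel information disclosure attacks?
[1] It's not Linux specific, however: similar or identical constructs seem to be used as part of the mitigation strategies on other OSes.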
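Current answer
The article by Google's Paul Turner that sgbj mentioned in the comments explains the following in much more detail, but I'll give it a shot:
As far as I can piece this together from the limited information available at the moment, a retpoline is a return trampoline that uses an infinite loop that is never executed to prevent the CPU from speculating on the target of an indirect jump.
The basic approach can be seen in Andi Kleen's kernel branch addressing this issue:
It introduces the new __x86.indirect_thunk call, which loads the call target whose memory address (which I'll call ADDR) is stored on top of the stack and executes the jump using a RET instruction. The thunk itself is then called using the NOSPEC_JMP/CALL macro, which was used to replace many (if not all) indirect calls and jumps. The macro simply places the call target on the stack and sets the return address correctly, if necessary (note the non-linear control flow):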
.macro NOSPEC_CALL target
    jmp     1221f                   /* jumps to the end of the macro */
1222:
    push    \target                 /* pushes ADDR to the stack */
    jmp     __x86.indirect_thunk    /* executes the indirect jump */
1221:
    call    1222b                   /* pushes the return address to the stack */
.endm
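Placing the call at the end is necessary so that, when the indirect call is finished, control flow continues after the use of the NOSPEC_CALL macro. That way the macro can be used in place of a regular call.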
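The stack effect of the macro can be traced with a small Python model. This is only an illustrative sketch: the function name nospec_call_stack and the symbolic addresses "ADDR" and "after_macro" are made up for this example and do not appear in the kernel source.

```python
def nospec_call_stack(target, next_insn):
    """Return the stack (top is the last element) on entry to the thunk."""
    stack = []
    # jmp 1221f: skips forward to the end of the macro, no stack effect.
    # 1221: call 1222b pushes the address of the instruction after the macro.
    stack.append(next_insn)
    # 1222: push \target places ADDR on top of that return address.
    stack.append(target)
    # jmp __x86.indirect_thunk: a plain jump, no stack effect.
    return stack


stack = nospec_call_stack(target="ADDR", next_insn="after_macro")
print(stack)  # ['after_macro', 'ADDR']
```

The return address sits below ADDR, so once the thunk has consumed ADDR, the callee's own ret finds the address of the instruction following the macro.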
The thunk itself looks as follows:
    call    retpoline_call_target
2:
    lfence                          /* stop speculation */
    jmp     2b
retpoline_call_target:
    lea     8(%rsp), %rsp
    ret
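The control flow can get a bit confusing here, so let me clarify:
call pushes the current instruction pointer (label 2) to the stack.
lea adds 8 to the stack pointer, effectively discarding the most recently pushed quadword, which is the last return address (to label 2). After this, the top of the stack points at the real return address ADDR again.
ret jumps to *ADDR and resets the stack pointer to the beginning of the call stack.
In the end, this whole behaviour is practically equivalent to jumping directly to *ADDR. The one benefit we get is that the branch predictor used for return statements (the Return Stack Buffer, RSB), when executing the call instruction, assumes that the corresponding ret statement will jump to label 2.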
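The difference between where the thunk really goes and where the return predictor expects it to go can be sketched in Python. This is a toy model under stated assumptions: the real stack and the RSB are both plain lists, run_thunk is a hypothetical name, and the RSB is assumed to already hold the entry pushed by the macro's own call.

```python
def run_thunk(stack, rsb):
    """Model the thunk; return (architectural_target, predicted_target)."""
    # call retpoline_call_target: pushes label 2 on the stack and on the RSB.
    stack.append("label_2")
    rsb.append("label_2")
    # lea 8(%rsp), %rsp: drops label 2 from the real stack only.
    stack.pop()
    # ret: the predictor consults the RSB, the architecture pops the stack.
    predicted = rsb.pop()
    actual = stack.pop()
    return actual, predicted


stack = ["after_macro", "ADDR"]   # state left behind by NOSPEC_CALL
rsb = ["after_macro"]             # assumed entry from the macro's call
actual, predicted = run_thunk(stack, rsb)
print(actual, predicted)          # ADDR label_2
```

In this model speculation is steered to label 2 (the lfence/jmp loop) while execution architecturally continues at ADDR, and both the stack and the RSB are left holding the address after the macro for the callee's eventual ret.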
The part after label 2 never actually gets executed; it is simply an infinite loop that would in theory fill the instruction pipeline with JMP instructions. Using LFENCE, PAUSE, or more generally any instruction that causes the instruction pipeline to stall, stops the CPU from wasting power and time on this speculative execution. This works because, if the call to retpoline_call_target were to return normally, the LFENCE would be the next instruction to be executed. This is also what the branch predictor will predict based on the original return address (label 2).
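To quote from Intel's architecture manual:
"Instructions following an LFENCE may be fetched from memory before the LFENCE, but they will not execute until the LFENCE completes."
Note, however, that the specification never mentions that LFENCE and PAUSE cause the pipeline to stall, so I'm reading a bit between the lines here.
Now back to your original question: the kernel memory information disclosure is possible because of the combination of two ideas: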
1. Even though speculative execution should be side-effect free when the speculation was wrong, speculative execution still affects the cache hierarchy. This means that when a memory load is executed speculatively, it may still have caused a cache line to be evicted. This change in the cache hierarchy can be identified by carefully measuring the access time to memory that is mapped onto the same cache set. You can even leak some bits of arbitrary memory when the source address of the memory read was itself read from kernel memory.
2. The indirect branch predictor of Intel CPUs only uses the lowermost 12 bits of the source instruction, thus it is easy to poison all 2^12 possible prediction histories with user-controlled memory addresses. These can then, when the indirect jump is predicted within the kernel, be speculatively executed with kernel privileges. Using the cache-timing side-channel, you can thus leak arbitrary kernel memory.
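The first idea, that a discarded speculative load still leaves a measurable trace in the cache, can be illustrated with a toy Python model. Everything here is hypothetical: the ToyCache class, the latency constants, and the function names are invented for the sketch. For simplicity it uses a flush-and-reload style probe (checking which line became fast) rather than the eviction-based measurement described above.

```python
CACHE_HIT_NS = 10     # hypothetical latency of a cached access
CACHE_MISS_NS = 100   # hypothetical latency of an uncached access


class ToyCache:
    """A cache modelled as the set of lines currently present."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        latency = CACHE_HIT_NS if line in self.lines else CACHE_MISS_NS
        self.lines.add(line)  # any access brings the line into the cache
        return latency


def speculative_gadget(cache, secret_byte):
    # The loaded value is thrown away architecturally,
    # but the touched cache line stays resident.
    cache.access(("probe", secret_byte))


def recover_byte(cache):
    # Time one probe line per possible byte value; the fast one is the secret.
    timings = {value: cache.access(("probe", value)) for value in range(256)}
    return min(timings, key=timings.get)


cache = ToyCache()
cache.flush()
speculative_gadget(cache, secret_byte=0x2A)
leaked = recover_byte(cache)
print(hex(leaked))  # 0x2a
```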
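Update: On the kernel mailing list there is an ongoing discussion that leads me to believe retpolines don't fully mitigate the branch prediction issues: when the Return Stack Buffer (RSB) runs empty, more recent Intel architectures (Skylake+) fall back to the vulnerable Branch Target Buffer (BTB):
"Retpoline as a mitigation strategy swaps indirect branches for returns, to avoid using predictions which come from the BTB, as they can be poisoned by an attacker. The problem with Skylake+ is that an RSB underflow falls back to using a BTB prediction, which allows the attacker to take control of speculation."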
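The underflow problem described in that quote can be sketched with a toy predictor in Python. The RSB depth of 16 and the function name return_prediction_sources are assumptions chosen for illustration, not documented values for any specific CPU.

```python
from collections import deque

RSB_DEPTH = 16  # hypothetical depth, chosen only for illustration


def return_prediction_sources(call_depth):
    """List which predictor serves each return after call_depth nested calls."""
    rsb = deque(maxlen=RSB_DEPTH)
    for level in range(call_depth):
        # Each call pushes a return address; the oldest entry is
        # silently dropped once the buffer is full.
        rsb.append(f"ret_{level}")
    sources = []
    for _ in range(call_depth):
        if rsb:
            rsb.pop()
            sources.append("RSB")
        else:
            # Underflow: in this model the CPU falls back to the BTB,
            # which an attacker may have poisoned.
            sources.append("BTB")
    return sources


deep = return_prediction_sources(20)
print(deep.count("RSB"), deep.count("BTB"))  # 16 4
```

In this model a call chain deeper than the RSB leaves the outermost returns to be predicted from the BTB, which is exactly the prediction source a retpoline is meant to avoid.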
Other answers
A retpoline is designed to protect against the branch target injection (CVE-2017-5715) exploit. This is an attack where an indirect branch instruction in the kernel is used to force the speculative execution of an arbitrary chunk of code. The code chosen is a "gadget" that is somehow useful to the attacker. For example, code can be chosen so that it leaks kernel data through how it affects the cache. The retpoline prevents this exploit by simply replacing all indirect branch instructions with a return instruction.
I think what's key about the retpoline is just the "ret" part: it replaces the indirect branch with a return instruction so that the CPU uses the return stack predictor instead of the exploitable branch predictor. If a simple push and a return instruction were used instead, then the code that would be speculatively executed would be the code the function will eventually return to anyway, not some gadget useful to the attacker. The main benefit of the trampoline part seems to be to maintain the return stack, so that when the function actually does return to its caller, this is predicted correctly.
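The basic idea behind branch target injection is simple. It takes advantage of the fact that the CPU does not record the full address of the source and destination of branches in its branch target buffers. So the attacker can fill the buffer using jumps in its own address space that will result in prediction hits when a particular indirect jump is executed in the kernel address space.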
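The aliasing this relies on can be sketched with a toy branch target buffer in Python. The sketch assumes, purely for illustration, that the buffer is indexed by the low 12 bits of the branch's source address; the ToyBTB class and all addresses are hypothetical.

```python
BTB_INDEX_BITS = 12  # assumption for this sketch, not a documented value


class ToyBTB:
    """A branch target buffer indexed by a truncated source address."""

    def __init__(self):
        self.entries = {}

    @staticmethod
    def _index(source_address):
        return source_address & ((1 << BTB_INDEX_BITS) - 1)

    def train(self, source_address, target_address):
        self.entries[self._index(source_address)] = target_address

    def predict(self, source_address):
        return self.entries.get(self._index(source_address))


btb = ToyBTB()
gadget = 0xFFFFFFFF81234567           # hypothetical gadget address in the kernel
attacker_branch = 0x0000555500001ABC  # branch in the attacker's address space
kernel_branch = 0xFFFFFFFF81001ABC    # shares the same low 12 bits (0xABC)

# The attacker trains the predictor entirely from user space.
btb.train(attacker_branch, gadget)

# The kernel's indirect branch now hits the attacker's entry.
print(hex(btb.predict(kernel_branch)))  # 0xffffffff81234567
```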
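Note that a retpoline doesn't prevent kernel information disclosure directly; it only prevents indirect branch instructions from being used to speculatively execute a gadget that would disclose information. If the attacker can find some other means to speculatively execute a gadget, the retpoline doesn't prevent the attack.
The paper "Spectre Attacks: Exploiting Speculative Execution" by Paul Kocher, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom gives the following overview of how indirect branches can be exploited: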
Exploiting Indirect Branches. Drawing from return oriented programming (ROP), in this method the attacker chooses a gadget from the address space of the victim and influences the victim to execute the gadget speculatively. Unlike ROP, the attacker does not rely on a vulnerability in the victim code. Instead, the attacker trains the Branch Target Buffer (BTB) to mispredict a branch from an indirect branch instruction to the address of the gadget, resulting in a speculative execution of the gadget. While the speculatively executed instructions are abandoned, their effects on the cache are not reverted. These effects can be used by the gadget to leak sensitive information. We show how, with a careful selection of a gadget, this method can be used to read arbitrary memory from the victim. To mistrain the BTB, the attacker finds the virtual address of the gadget in the victim’s address space, then performs indirect branches to this address. This training is done from the attacker’s address space, and it does not matter what resides at the gadget address in the attacker’s address space; all that is required is that the branch used for training branches to use the same destination virtual address. (In fact, as long as the attacker handles exceptions, the attack can work even if there is no code mapped at the virtual address of the gadget in the attacker’s address space.) There is also no need for a complete match of the source address of the branch used for training and the address of the targetted branch. Thus, the attacker has significant flexibility in setting up the training.
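A blog post titled "Reading privileged memory with a side-channel" by the Project Zero team at Google provides another example of how branch target injection can be used to create a working exploit.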
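This question was asked a while ago, and deserves a newer answer.
Executive summary:
"Retpoline" sequences are a software construct that allows indirect branches to be isolated from speculative execution. This may be applied to protect sensitive binaries (such as operating system or hypervisor implementations) from branch target injection attacks against their indirect branches.
The word "retpoline" is a portmanteau of the words "return" and "trampoline", much like "relpoline" was coined from "relative call" and "trampoline". It is a trampoline construct built using return operations, which also figuratively ensures that any associated speculative execution will "bounce" endlessly.
"To mitigate kernel or cross-process memory disclosure (the Spectre attack), the Linux kernel [1] will be compiled with a new option, -mindirect-branch=thunk-extern, introduced to gcc to perform indirect calls through a so-called retpoline. [1] It's not Linux specific, however: similar or identical constructs seem to be used as part of the mitigation strategies on other OSes."
Use of this compiler option only guards against Spectre V2 on affected processors that have the microcode update required for CVE-2017-5715. It will "work" on any code (not just a kernel), but only code containing "secrets" is worth attacking.
"This appears to be a newly invented term, as a Google search turns up only very recent use (generally all in 2018)."
The LLVM compiler has had a -mretpoline switch since before January 4, 2018, the date on which the vulnerability was first publicly reported. GCC made its patches available on January 7, 2018.
The CVE date suggests that the vulnerability was "discovered" in 2017, but it affects some of the processors manufactured in the last two decades (thus it was likely discovered long ago).
"What is a retpoline, and how does it prevent the recent kernel information disclosure attacks?"
First, a few definitions: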
Trampoline - Sometimes referred to as indirect jump vectors, trampolines are memory locations holding addresses pointing to interrupt service routines, I/O routines, etc. Execution jumps into the trampoline and then immediately jumps out, or bounces, hence the term trampoline. GCC has traditionally supported nested functions by creating an executable trampoline at run time when the address of a nested function is taken. This is a small piece of code which normally resides on the stack, in the stack frame of the containing function. The trampoline loads the static chain register and then jumps to the real address of the nested function.
Thunk - A thunk is a subroutine used to inject an additional calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine.
Memoization - A memoized function "remembers" the results corresponding to some set of specific inputs. Subsequent calls with remembered inputs return the remembered result rather than recalculating it, thus eliminating the primary cost of a call with given parameters from all but the first call made to the function with those parameters.
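The thunk and memoization definitions can be made concrete with a short Python sketch. The names slow_square and make_thunk are invented for this illustration, and the example shows the general programming concepts only, not anything specific to the CPU's branch predictor.

```python
from functools import lru_cache

calls = []  # records every time the real computation runs


@lru_cache(maxsize=None)
def slow_square(n):
    """Memoized: the body runs only once per distinct argument."""
    calls.append(n)
    return n * n


def make_thunk(n):
    """Thunk: wraps the computation so it is delayed until needed."""
    return lambda: slow_square(n)


thunk = make_thunk(7)
# Nothing has been computed yet; the thunk only captures the work to do.
first = thunk()    # runs slow_square(7) and remembers the result
second = thunk()   # served from the remembered result
print(first, second, calls)  # 49 49 [7]
```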
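Very roughly, a retpoline is a trampoline with a return as a thunk, used to "spoil" memoization in the indirect branch predictor.
A retpoline would include a PAUSE instruction for Intel, but an LFENCE instruction is necessary for AMD, since on that processor the PAUSE instruction is not a serializing instruction; the PAUSE/jmp loop would therefore use excess power as it is speculated over while waiting for the return to mispredict to the correct target.
Arstechnica has a simple explanation of the problem: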
"Each processor has an architectural behavior (the documented behavior that describes how the instructions work and that programmers depend on to write their programs) and a microarchitectural behavior (the way an actual implementation of the architecture behaves). These can diverge in subtle ways. For example, architecturally, a program that loads a value from a particular address in memory will wait until the address is known before trying to perform the load. Microarchitecturally, however, the processor might try to speculatively guess at the address so that it can start loading the value from memory (which is slow) even before it's absolutely certain of which address it should use. If the processor guesses wrong, it will ignore the guessed-at value and perform the load again, this time with the correct address. The architecturally defined behavior is thus preserved. But that faulty guess will disturb other parts of the processor—in particular the contents of the cache. These microarchitectural disturbances can be detected and measured by timing how long it takes to access data that should (or shouldn't) be in the cache, allowing a malicious program to make inferences about the values stored in memory.".
From Intel's paper "Retpoline: A Branch Target Injection Mitigation" (.PDF):
"A retpoline sequence prevents the processor’s speculative execution from using the "indirect branch predictor" (one way of predicting program flow) to speculate to an address controlled by an exploit (satisfying element 4 of the five elements of branch target injection (Spectre variant 2) exploit composition listed above).". Note, element 4 is: "The exploit must successfully influence this indirect branch to speculatively mispredict and execute a gadget. This gadget, chosen by the exploit, leaks the secret data via a side channel, typically by cache-timing.".