Meltdown(熔断漏洞)- Reading Kernel Memory from User Space/KASLR | 原文+中文翻译

原文资源:《Meltdown: Reading Kernel Memory from User Space》

或者本文文末看原文截图。

目录

Meltdown

Abstract

1 Introduction

2 Background

2.1 Out-of-order execution

2.2 Address Spaces

2.3 Cache Attacks

3 A Toy Example

4 Building Blocks of the Attack

4.1 Executing Transient Instructions

Exception handling

Exception suppression

4.2 Building a Covert Channel

5 Meltdown

Attack setting.

5.1 Attack Description

Step 1: Reading the secret

Step 2: Transmitting the secret

Step 3: Receiving the secret

Dumping the entire physical memory

5.2 Optimizations and Limitations

Single-bit transmission

Exception Suppression using Intel TSX

Dealing with KASLR

6 Evaluation

6.1 Information Leakage and Environments

6.1.1 Linux

6.1.2 Linux with KAISER Patch

6.1.3 Microsoft Windows

6.1.4 Containers

6.2 Meltdown Performance

6.2.1 Exception Handling

6.2.2 Exception Suppression

6.3 Meltdown in Practice

6.4 Limitations on ARM and AMD

7 Countermeasures

7.1 Hardware

7.2 KAISER

8 Discussion

9 Conclusion

Acknowledgment

References

英文原文


Meltdown


熔断

Moritz Lipp1, Michael Schwarz1, Daniel Gruss1, Thomas Prescher2, Werner Haas2,

Stefan Mangard1, Paul Kocher3, Daniel Genkin4, Yuval Yarom5, Mike Hamburg6

  • 1 Graz University of Technology
  • 2 Cyberus Technology GmbH
  • 3 Independent
  • 4 University of Pennsylvania and University of Maryland
  • 5 University of Adelaide and Data61
  • 6 Rambus, Cryptography Research Division

 

Abstract


摘要

The security of computer systems fundamentally relies on memory isolation, e.g., kernel address ranges are marked as non-accessible and are protected from user access. In this paper, we present Meltdown. Meltdown exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations including personal data and passwords. Out-of-order execution is an indispensable performance feature and present in a wide range of modern processors. The attack is independent of the operating system, and it does not rely on any software vulnerabilities. Meltdown breaks all security assumptions given by address space isolation as well as paravirtualized environments and, thus, every security mechanism building upon this foundation. On affected systems, Meltdown enables an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges, affecting millions of customers and virtually every user of a personal computer. We show that the KAISER defense mechanism for KASLR [8] has the important (but inadvertent) side effect of impeding Meltdown. We stress that KAISER must be deployed immediately to prevent large-scale exploitation of this severe information leakage.

计算机系统的安全性根本上依赖于内存隔离,例如,内核地址范围被标记为不可访问,以防止用户访问。在本篇论文中,我们将介绍“熔断”漏洞。熔断利用现代处理器上乱序执行的副作用,读取任意内核内存地址,包括隐私数据和密码。乱序执行是不可或缺的性能特性,广泛存在于绝大多数现代处理器中。这种攻击独立于操作系统,也不依赖于任何软件漏洞。熔断破坏了所有基于地址空间隔离和半虚拟化环境的安全假设,以及所有以此为基础构建的安全机制。在受影响的系统上,熔断使攻击者无需任何许可或特权,即可读取云端其他进程或虚拟机的内存,波及数百万客户和几乎每一位个人电脑用户。我们证明,针对KASLR的KAISER防御机制[8]具有阻碍熔断的重要(但非有意设计的)副作用。我们强调,必须立即部署KAISER,以防止这一严重信息泄漏被大规模利用。

 

1 Introduction


1 绪论

One of the central security features of today’s operating systems is memory isolation. Operating systems ensure that user applications cannot access each others memories and prevent user applications from reading or writing kernel memory. This isolation is a cornerstone of our computing environments and allows running multiple applications on personal devices or executing processes of multiple users on a single machine in the cloud.

对于今天的操作系统而言,最重要的安全特征之一就是内存隔离。操作系统保证用户程序彼此之间无法访问对方的内存,并且阻止用户系统对内核内存进行读写操作。这样的隔离机制是个人机器上允许并行多个应用程序、在云上的单机器能够支持多用户同时执行进程,以及整个计算环境的基石。

On modern processors, the isolation between the kernel and user processes is typically realized by a supervisor bit of the processor that defines whether a memory page of the kernel can be accessed or not. The basic idea is that this bit can only be set when entering kernel code and it is cleared when switching to user processes. This hardware feature allows operating systems to map the kernel into the address space of every process and to have very efficient transitions from the user process to the kernel, e.g., for interrupt handling. Consequently, in practice, there is no change of the memory mapping when switching from a user process to the kernel.

在现代处理器上,内核和用户进程之间的隔离通常由处理器的一个标记位实现,该标记位定义了内核的内存页面是否可以访问。其基本思想是只有当进入内核代码时才能设置该位,而切换到用户进程时该位被清除。这样的硬件特性允许操作系统将内核映射到每个进程的地址空间,并且从用户进程到内核的转换非常高效,例如中断处理。因此,在实际操作中,从用户进程切换到内核时不会改变内存映射。

In this work, we present Meltdown. Meltdown is a novel attack that allows overcoming memory isolation completely by providing a simple way for any user process to read the entire kernel memory of the machine it executes on, including all physical memory mapped in the kernel region. Meltdown does not exploit any software vulnerability, i.e., it works on all major operating systems. Instead, Meltdown exploits side-channel information available on most modern processors, e.g., modern Intel micro-architectures since 2010 and potentially on other CPUs of other vendors.

在这项工作中,我们将介绍熔断。熔断是一种新颖的攻击方式,它为任何用户进程提供了一种简单的方法,来读取其所在机器的整个内核内存,包括映射到内核区域的所有物理内存,从而完全攻克内存隔离。熔断不利用任何软件漏洞,因此能够在所有主流操作系统上起作用。相反,它利用了大多数现代处理器(例如,自2010年以来的英特尔微体系结构,可能还包括其他供应商的CPU)上可获得的旁路信息。

While side-channel attacks typically require very specific knowledge about the target application and are tailored to only leak information about its secrets, Meltdown allows an adversary who can run code on the vulnerable processor to obtain a dump of the entire kernel address space, including any mapped physical memory. The root cause of the simplicity and strength of Meltdown are side effects caused by out-of-order execution.

虽然旁路通道攻击需要目标应用程序非常具体的相关信息,并且还要针对泄露的隐私数据进行特化处理,但熔断能让那些能够在有漏洞的处理器上执行代码的黑客,获取整个内核地址空间的转储,包括任何映射的物理内存。熔断简易且强悍的根本原因正是乱序执行所带来的副作用。

Out-of-order execution is an important performance feature of today’s processors in order to overcome latencies of busy execution units, e.g., a memory fetch unit needs to wait for data arrival from memory. Instead of stalling the execution, modern processors run operations out-of-order i.e., they look ahead and schedule subsequent operations to idle execution units of the processor. However, such operations often have unwanted side-effects, e.g., timing differences [28, 35, 11] can leak information from both sequential and out-of-order execution.

乱序执行是现代处理器一个重要的性能特性,用以克服过于繁忙的执行单元在等待上的延迟,比如一个内存读取单元需要等待内存数据的到达。现代处理器将不会拖延整个程序的执行,而是不按顺序执行操作,即向前看,并将后续操作安排到处理器的空闲执行单元。可是,有些操作通常会带来非预期的副作用,例如时序差异[28][35][11],会从顺序执行和乱序执行之中泄露信息。

From a security perspective, one observation is particularly significant: vulnerable out-of-order CPUs allow an unprivileged process to load data from a privileged (kernel or physical) address into a temporary CPU register. Moreover, the CPU even performs further computations based on this register value, e.g., access to an array based on the register value. The processor ensures correct program execution, by simply discarding the results of the memory look-ups (e.g., the modified register states), if it turns out that an instruction should not have been executed. Hence, on the architectural level (e.g., the abstract definition of how the processor should perform computations), no security problem arises.

从安全角度来看,有一个观察特别重要:存在漏洞的乱序执行CPU允许非特权进程将数据从特权(内核或物理)地址加载到临时CPU寄存器中。此外,CPU甚至能根据该寄存器值进行更多的计算,例如,基于寄存器值访问数组。如果事后证明某条指令本不应被执行,处理器会简单地丢弃内存查找的结果(例如被修改的寄存器状态),以确保程序正确执行。因此,在架构级别(例如,处理器如何执行计算的抽象定义),不会出现安全问题。

However, we observed that out-of-order memory look-ups influence the cache, which in turn can be detected through the cache side channel. As a result, an attacker can dump the entire kernel memory by reading privileged memory in an out-of-order execution stream, and transmit the data from this elusive state via a micro-architectural covert channel (e.g., Flush+Reload) to the outside world. On the receiving end of the covert channel, the register value is reconstructed. Hence, on the micro-architectural level (e.g., the actual hardware implementation), there is an exploitable security problem.

但是,我们观察到乱序的内存查找会影响缓存,而缓存又可以从缓存侧通道被窥视。最终,通过读取乱序执行流中的特权存储器,攻击者可以转储出整个内核内存,并且通过微体系结构隐蔽信道(例如Flush + Reload),将这些原本处于难以捉摸状态下的数据传输到外界。在隐蔽通道的接收端,寄存器值被重构。 因此,在微体系结构层面(例如,实际的硬件实现层面),存在可利用的安全问题。

Meltdown breaks all security assumptions given by the CPU’s memory isolation capabilities. We evaluated the attack on modern desktop machines and laptops, as well as servers in the cloud. Meltdown allows an unprivileged process to read data mapped in the kernel address space, including the entire physical memory on Linux and OS X, and a large fraction of the physical memory on Windows. This may include physical memory of other processes, the kernel, and in case of kernel-sharing sandbox solutions (e.g., Docker, LXC) or Xen in paravirtualization mode, memory of the kernel (or hypervisor), and other co-located instances. While the performance heavily depends on the specific machine, e.g., processor speed, TLB and cache sizes, and DRAM speed, we can dump kernel and physical memory with up to 503KB/s. Hence, an enormous number of systems are affected. The countermeasure KAISER [8], originally developed to prevent side-channel attacks targeting KASLR, inadvertently protects against Meltdown as well. Our evaluation shows that KAISER prevents Meltdown to a large extent. Consequently, we stress that it is of utmost importance to deploy KAISER on all operating systems immediately. Fortunately, during a responsible disclosure window, the three major operating systems (Windows, Linux, and OS X) implemented variants of KAISER and will roll out these patches in the near future.

熔断破坏了所有由CPU内存隔离能力提供的安全假设。我们评估了此攻击对现代台式机、笔记本以及云服务器的影响。熔断允许非特权进程读取映射在内核地址空间内的数据,包括Linux和OS X上的整个物理内存,以及Windows上的大部分物理内存。这可能包括其他进程的物理内存、内核内存,以及在内核共享沙盒解决方案(例如Docker、LXC)或半虚拟化模式的Xen的场景下,内核(或虚拟机监视器)及其他同宿主实例的内存。尽管转储性能在很大程度上取决于具体机器,例如处理器速度、TLB和缓存大小以及DRAM速度,我们仍能以高达503KB/s的速度转储内核和物理内存。因此,大量系统将受到影响。KAISER[8]对策最初是为了防御针对KASLR的旁路攻击而开发的,但它也无意中抵御了熔断。我们的评估表明,KAISER在很大程度上阻止了熔断。因此,我们强调,立即在所有操作系统上部署KAISER至关重要。幸运的是,在负责任的漏洞披露期间,三大操作系统(Windows、Linux和OS X)都实现了KAISER的变体,并将在不久的将来推出这些补丁。

Meltdown is distinct from the Spectre Attacks [19] in several ways, notably that Spectre requires tailoring to the victim process’s software environment, but applies more broadly to CPUs and is not mitigated by KAISER.

熔断在几个方面与“幽灵”攻击[19]截然不同,值得注意的是,幽灵需要针对受害者进程的软件环境进行定制,但它适用于更广泛的CPU,且KAISER无法缓解幽灵。

 

Contributions. The contributions of this work are:

贡献。这篇论文的贡献在于:

  • 1. We describe out-of-order execution as a new, extremely powerful, software-based side channel.
  • 1. 我们将乱序执行定义为一种新的,非常强大的,基于软件的旁路通道;
  • 2. We show how out-of-order execution can be combined with a micro-architectural covert channel to transfer the data from this elusive state to a receiver on the outside.
  • 2. 我们展示了如何将乱序执行与微架构隐蔽通道结合起来,将数据从难以捉摸的状态转移到外部的接收器上;
  • 3. We present an end-to-end attack combining out-of-order execution with exception handlers or TSX, to read arbitrary physical memory without any permissions or privileges, on laptops, desktop machines, and on public cloud machines.
  • 3. 在笔记本电脑,台式计算机以及云服务器上,我们提出了一种端到端的攻击,将乱序执行与异常处理或TSX组合在一起,无需任何权限或特权即可读取任意物理内存;
  • 4. We evaluate the performance of Meltdown and the effects of KAISER on it.
  • 4. 我们评估了熔断的性能以及KAISER对它的影响。

 

Outline. The remainder of this paper is structured as follows:

大纲。 本文的其余部分结构如下:

In Section 2, we describe the fundamental problem which is introduced with out-of-order execution. In Section 3, we provide a toy example illustrating the side channel Meltdown exploits. In Section 4, we describe the building blocks of the full Meltdown attack. In Section 5, we present the Meltdown attack. In Section 6, we evaluate the performance of the Meltdown attack on several different systems. In Section 7, we discuss the effects of the software-based KAISER countermeasure and propose solutions in hardware. In Section 8, we discuss related work and conclude our work in Section 9.

在第2节中,我们描述了乱序执行引入的基本问题。在第3节中,我们提供了一个玩具示例,说明熔断所利用的旁路通道。在第4节中,我们描述了完整熔断攻击的组成模块。在第5节中,我们将介绍熔断攻击。在第6节中,我们评估了几种不同系统上熔断攻击的性能。在第7节中,我们讨论基于软件的KAISER对策的效果,并提出硬件解决方案。在第8节中,我们讨论相关工作,并在第9节总结全文。

 

2 Background


2 背景

In this section, we provide background on out-of-order execution, address translation, and cache attacks.

在本节中,我们将讨论有关乱序执行、地址转换和缓存攻击的背景。

 

2.1 Out-of-order execution


2.1 乱序执行

Out-of-order execution is an optimization technique that allows to maximize the utilization of all execution units of a CPU core as exhaustive as possible. Instead of processing instructions strictly in the sequential program order, the CPU executes them as soon as all required resources are available. While the execution unit of the current operation is occupied, other execution units can run ahead. Hence, instructions can be run in parallel as long as their results follow the architectural definition.

乱序执行是一种优化技术,它尽可能最大限度地利用CPU内核的所有执行单元。CPU并非严格按照程序顺序处理指令,而是在所有必需资源就绪后立即执行指令。当前操作的执行单元被占用时,其他执行单元可以提前运行。因此,只要指令的结果符合架构定义,指令就可以并行执行。

In practice, CPUs supporting out-of-order execution support running operations speculatively to the extent that the processor’s out-of-order logic processes instructions before the CPU is certain whether the instruction will be needed and committed. In this paper, we refer to speculative execution in a more restricted meaning, where it refers to an instruction sequence following a branch, and use the term out-of-order execution to refer to any way of getting an operation executed before the processor has committed the results of all prior instructions.

在实际情况中,支持乱序执行的CPU允许进行推测性的执行操作,某种程度下,在CPU明确知道是否需要和提交指令之前,处理器的乱序逻辑就会处理该指令。在此篇论文中,我们将推测执行解释成更加具有限制性的含义,它专指某一分支之后的指令序列,并且使用乱序执行这一术语去指代任何在处理器提交前一指令的执行结果之前执行的操作。

In 1967, Tomasulo [33] developed an algorithm [33] that enabled dynamic scheduling of instructions to allow out-of-order execution. Tomasulo [33] introduced a unified reservation station that allows a CPU to use a data value as it has been computed instead of storing it to a register and re-reading it. The reservation station renames registers to allow instructions that operate on the same physical registers to use the last logical one to solve read-after-write (RAW), write-after-read (WAR) and write-after-write (WAW) hazards. Furthermore, the reservation unit connects all execution units via a common data bus (CDB). If an operand is not available, the reservation unit can listen on the CDB until it is available and then directly begin the execution of the instruction.

1967年,Tomasulo[33]发明了一种能够动态调度指令以实现乱序执行的算法[33]。他引入了统一保留站,允许CPU直接使用刚计算出的数值,而不必先存入寄存器再重新读取。保留站对寄存器进行重命名,使操作同一物理寄存器的指令使用最后一个逻辑寄存器,以解决写后读(RAW)、读后写(WAR)和写后写(WAW)冒险。此外,保留单元通过公共数据总线(CDB)连接所有执行单元。如果某个操作数不可用,保留单元可以在CDB上监听,直到该操作数可用,然后直接开始执行该指令。

On the Intel architecture, the pipeline consists of the front-end, the execution engine (back-end) and the memory subsystem [14]. x86 instructions are fetched by the front-end from the memory and decoded to micro-operations (μOPs) which are continuously sent to the execution engine. Out-of-order execution is implemented within the execution engine as illustrated in Figure 1. The Reorder Buffer is responsible for register allocation, register renaming and retiring. Additionally, other optimizations like move elimination or the recognition of zeroing idioms are directly handled by the reorder buffer. The μOPs are forwarded to the Unified Reservation Station that queues the operations on exit ports that are connected to Execution Units. Each execution unit can perform different tasks like ALU operations, AES operations, address generation units (AGU) or memory loads and stores. AGUs as well as load and store execution units are directly connected to the memory subsystem to process its requests.

在Intel架构中,流水线由前端、执行引擎(后端)和内存子系统[14]组成。x86指令由前端从内存中取出,并解码为持续发送给执行引擎的微操作(μOPs)。乱序执行在执行引擎中实现,如图1所示。重排序缓冲器负责寄存器分配、寄存器重命名和指令退休(retire)。另外,其他优化措施如移动消除(move elimination)或清零惯用法(zeroing idioms)的识别也直接由重排序缓冲器处理。μOPs被转发到统一保留站,在连接到执行单元的出端口上排队。每个执行单元可以执行不同的任务,如ALU运算、AES运算、地址生成(AGU)或内存加载和存储。AGU以及加载、存储执行单元直接连接到内存子系统以处理其请求。

Figure 1: Simplified illustration of a single core of the Intel’s Skylake microarchitecture. Instructions are decoded into μOPs and executed out-of-order in the execution engine by individual execution units.
图1:英特尔Skylake微体系结构的单核简图。 指令被解码为微操作并由执行引擎通过单独的执行单元乱序执行。

Since CPUs usually do not run linear instruction streams, they have branch prediction units that are used to obtain an educated guess of which instruction will be executed next. Branch predictors try to determine which direction of a branch will be taken before its condition is actually evaluated. Instructions that lie on that path and do not have any dependencies can be executed in advance and their results immediately used if the prediction was correct. If the prediction was incorrect, the reorder buffer allows to rollback by clearing the reorder buffer and re-initializing the unified reservation station.

由于CPU通常不运行线性指令流,因此它们具有分支预测单元,用于对接下来将执行哪条指令做出有根据的猜测。分支预测器会尝试在条件被实际求值之前确定分支的走向。如果预测正确,则位于该路径上且没有任何依赖关系的指令可以被预先执行,其结果可被立即使用。如果预测不正确,重排序缓冲区允许通过清空重排序缓冲并重新初始化统一保留站来回滚。

Various approaches to predict the branch exist: With static branch prediction [12], the outcome of the branch is solely based on the instruction itself. Dynamic branch prediction [2] gathers statistics at run-time to predict the outcome. One-level branch prediction uses a 1-bit or 2-bit counter to record the last outcome of the branch [21]. Modern processors often use two-level adaptive predictors [36] that remember the history of the last n outcomes allow to predict regularly recurring patterns. More recently, ideas to use neural branch prediction [34, 18, 32] have been picked up and integrated into CPU architectures [3].

存在多种预测分支的方法:静态分支预测[12],分支的结果完全基于指令本身;动态分支预测[2],在运行时收集统计数据来预测结果;一级分支预测,使用1位或2位计数器记录分支的最后结果[21]。现代处理器通常使用两级自适应预测器[36],通过记忆最近n个历史结果来预测规律性重复出现的模式。最近,神经分支预测[34][18][32]的理念已被采纳并集成到CPU架构中[3]。

 

2.2 Address Spaces


2.2 地址空间

To isolate processes from each other, CPUs support virtual address spaces where virtual addresses are translated to physical addresses. A virtual address space is divided into a set of pages that can be individually mapped to physical memory through a multi-level page translation table. The translation tables define the actual virtual to physical mapping and also protection properties that are used to enforce privilege checks, such as readable, writable, executable and user-accessible. The currently used translation table is held in a special CPU register. On each context switch, the operating system updates this register with the next process’s translation table address in order to implement per-process virtual address spaces. Because of that, each process can only reference data that belongs to its own virtual address space. Each virtual address space itself is split into a user and a kernel part. While the user address space can be accessed by the running application, the kernel address space can only be accessed if the CPU is running in privileged mode. This is enforced by the operating system disabling the user-accessible property of the corresponding translation tables. The kernel address space does not only have memory mapped for the kernel’s own usage, but it also needs to perform operations on user pages, e.g., filling them with data. Consequently, the entire physical memory is typically mapped in the kernel. On Linux and OS X, this is done via a direct-physical map, i.e., the entire physical memory is directly mapped to a pre-defined virtual address (cf. Figure 2).

为了将进程彼此隔离,CPU支持将虚拟地址转换为物理地址的虚拟地址空间。虚拟地址空间被分成一组页面,可以通过多级页面转换表单独映射到物理内存。转换表定义了实际的虚拟到物理映射,以及用于强制权限校验的保护属性,如可读、可写、可执行和用户可访问。当前使用的转换表保存在特殊的CPU寄存器中。在每次上下文切换时,操作系统用下一个进程的转换表地址更新该寄存器,以便实现每个进程独立的虚拟地址空间。因此,每个进程只能引用属于自己虚拟地址空间的数据。每个虚拟地址空间本身被分割成用户部分和内核部分。虽然用户地址空间可以被正在运行的应用程序访问,但只有当CPU以特权模式运行时才能访问内核地址空间。这通过操作系统禁用相应转换表的用户可访问属性来强制执行。内核地址空间不仅具有为内核自身使用而映射的内存,还需要在用户页面上执行操作,例如填充数据。因此,整个物理内存通常映射到内核中。在Linux和OS X上,这是通过直接物理映射完成的,即整个物理内存直接映射到预定义的虚拟地址(参见图2)。

Figure 2: The physical memory is directly mapped in the kernel at a certain offset. A physical address (blue) which is mapped accessible to the user space is also mapped in the kernel space through the direct mapping.
图2:物理内存以一定的偏移量直接映射到内核中。用户空间可访问的物理地址(蓝色)也通过直接映射映射到内核空间。
 

Instead of a direct-physical map, Windows maintains multiple so-called paged pools, non-paged pools, and the system cache. These pools are virtual memory regions in the kernel address space mapping physical pages to virtual addresses which are either required to remain in the memory (non-paged pool) or can be removed from the memory because a copy is already stored on the disk (paged pool). The system cache further contains mappings of all file-backed pages. Combined, these memory pools will typically map a large fraction of the physical memory into the kernel address space of every process.

Windows不使用直接物理映射,而是维护多个所谓的分页池,非分页池和系统高速缓存。这些“池”是内核地址空间中的虚拟内存区域,将物理页面映射到那些要求保留在内存中的(非分页池),或者因为副本已存储在磁盘上,而可以从内存中删除的(分页池)虚拟地址上。系统缓存还包含所有文件支持页面的映射。综合来看,这些内存池通常会将大部分物理内存映射到每个进程的内核地址空间。

The exploitation of memory corruption bugs often requires the knowledge of addresses of specific data. In order to impede such attacks, address space layout randomization (ASLR) has been introduced as well as non-executable stacks and stack canaries. In order to protect the kernel, KASLR randomizes the offsets where drivers are located on every boot, making attacks harder as they now require to guess the location of kernel data structures. However, side-channel attacks allow to detect the exact location of kernel data structures [9, 13, 17] or derandomize ASLR in JavaScript [6]. A combination of a software bug and the knowledge of these addresses can lead to privileged code execution.

内存损坏漏洞的利用一般需要特定数据的地址信息。为了阻止这种攻击,人们已经引入了地址空间布局随机化(ASLR)以及非可执行堆栈和堆栈金丝雀(类比于煤矿中用于探测有毒气体的金丝雀,用于在恶意代码执行之前检测堆栈缓冲区溢出,译者注)。为了保护内核,KASLR随机化每次启动时驱动程序所在的偏移量,使得攻击者需要猜测内核数据结构的位置,从而让攻击变得困难。然而,旁路攻击允许检测内核数据结构的确切位置[9][13][17],或者在JavaScript中使ASLR去随机化[6]。软件bug和这些地址信息相结合,就可以导致特权代码的执行。

 

2.3 Cache Attacks


2.3 缓存攻击

In order to speed-up memory accesses and address translation, the CPU contains small memory buffers, called caches, that store frequently used data. CPU caches hide slow memory access latencies by buffering frequently used data in smaller and faster internal memory. Modern CPUs have multiple levels of caches that are either private to its cores or shared among them. Address space translation tables are also stored in memory and are also cached in the regular caches.

为了加速内存访问和地址转换,CPU包含小内存缓冲区,称为缓存,用于存储经常使用的数据。 CPU缓存通过在较小和较快的内部存储器中缓存常用数据来规避低速内存访问延迟。 现代CPU具有多个级别的缓存,这些缓存对于其内核是私有的,或者是在内核之间共享的。地址空间转换表存储在内存中,也被缓存在常规缓存中。

Cache side-channel attacks exploit timing differences that are introduced by the caches. Different cache attack techniques have been proposed and demonstrated in the past, including Evict+Time [28], Prime+Probe [28, 29], and Flush+Reload [35]. Flush+Reload attacks work on a single cache line granularity. These attacks exploit the shared, inclusive last-level cache. An attacker frequently flushes a targeted memory location using the clflush instruction. By measuring the time it takes to reload the data, the attacker determines whether data was loaded into the cache by another process in the meantime. The Flush+Reload attack has been used for attacks on various computations, e.g., cryptographic algorithms [35, 16, 1], web server function calls [37], user input [11, 23, 31], and kernel addressing information [9].

缓存旁路攻击利用了缓存引入的时序差异。过去几年,人们已经提出并演示了多种不同的缓存攻击技术,包括Evict+Time[28]、Prime+Probe[28][29]和Flush+Reload[35]。Flush+Reload攻击以单个缓存行为粒度,利用共享且具有包容性(inclusive)的末级缓存。攻击者使用clflush指令频繁刷新目标内存位置,再通过测量重新加载数据所需的时间,确定在此期间数据是否被另一个进程加载到了缓存中。Flush+Reload攻击已被用于攻击各种计算过程,例如加密算法[35][16][1]、Web服务器函数调用[37]、用户输入[11][23][31]和内核寻址信息[9]。

A special use case are covert channels. Here the attacker controls both, the part that induces the side effect, and the part that measures the side effect. This can be used to leak information from one security domain to another, while bypassing any boundaries existing on the architectural level or above. Both Prime+Probe and Flush+ Reload have been used in high-performance covert channels [24, 26, 10].

一种特殊的使用场景是隐蔽信道:攻击者同时控制引发副作用的部分和测量副作用的部分。这可用于将信息从一个安全域泄漏到另一个安全域,同时绕过架构级或更高级别上存在的任何边界。Prime+Probe和Flush+Reload都已被用于构建高性能隐蔽信道[24][26][10]。

 

3 A Toy Example


3 一个玩具示例

In this section, we start with a toy example, a simple code snippet, to illustrate that out-of-order execution can change the microarchitectural state in a way that leaks information. However, despite its simplicity, it is used as a basis for Section 4 and Section 5, where we show how this change in state can be exploited for an attack.

在本节中,我们从一个玩具示例,一个简单的代码片断开始,来说明乱序执行可能会以某种方式改变微架构状态,从而泄露信息。然而,尽管它简单,但它是第4节和第5节的基础,在其中我们展示了这种状态变化是如何被利用来进行攻击的。

Listing 1 shows a simple code snippet first raising an (unhandled) exception and then accessing an array. The property of an exception is that the control flow does not continue with the code after the exception, but jumps to an exception handler in the operating system. Regardless of whether this exception is raised due to a memory access, e.g., by accessing an invalid address, or due to any other CPU exception, e.g., a division by zero, the control flow continues in the kernel and not with the next user space instruction.

清单1列出了一个简单的代码片段,首先引发一个(未处理的)异常,然后访问一个数组。异常的特性是,控制流不会在异常之后继续执行代码,而是跳转到操作系统中的异常处理程序。无论是由于内存访问(例如,通过访问无效地址)还是由于任何其他CPU异常(例如除以零)而引起此异常,控制流都继续在内核中并且不会处理下一个用户空间中的指令。

raise_exception();
// the line below is never reached
access(probe_array[data * 4096]);

Listing 1: A toy example to illustrate side-effects of out-of-order execution.

清单1:一个玩具示例,用于说明无序执行的副作用。

Thus, our toy example cannot access the array in theory, as the exception immediately traps to the kernel and terminates the application. However, due to the out-of-order execution, the CPU might have already executed the following instructions as there is no dependency on the exception. This is illustrated in Figure 3. Due to the exception, the instructions executed out of order are not retired and, thus, never have architectural effects.

因此,我们的玩具示例在理论上不能访问数组,因为异常将立即陷入内核并终止应用程序。但是,由于乱序执行,CPU可能已经执行了后续的指令,因为它们不存在对异常的依赖。这在图3中进行了说明。由于异常,乱序执行的指令不会退休(retire),因此不会产生任何架构层面的效果。

Figure 3: If an executed instruction causes an exception, diverting the control flow to an exception handler, the subsequent instruction must not be executed. Due to out-of-order execution, the subsequent instructions may already have been partially executed, but not retired. However, architectural effects of the execution are discarded.
图3:如果执行的指令导致异常,使得控制流转移到异常处理程序,那么就不能再执行后续的指令。但是由于乱序执行,后续的指令可能已经部分执行但没有退休,其执行的架构效应将被丢弃。

Although the instructions executed out of order do not have any visible architectural effects on registers or memory, they have microarchitectural side effects. During the out-of-order execution, the referenced memory is fetched into a register and is also stored in the cache. If the out-of-order execution has to be discarded, the register and memory contents are never committed. Nevertheless, the cached memory contents are kept in the cache. We can leverage a microarchitectural side-channel attack such as Flush+Reload [35], which detects whether a specific memory location is cached, to make this microarchitectural state visible. There are other side channels as well which also detect whether a specific memory location is cached, including Prime+Probe [28, 24, 26], Evict+Reload [23], or Flush+Flush [10]. However, as Flush+Reload is the most accurate known cache side channel and is simple to implement, we do not consider any other side channel for this example.

尽管乱序执行的指令对寄存器或存储器没有任何可见的架构效应,但它们具有微观架构的副作用。在乱序执行期间,引用的内存被提取到一个寄存器中,并且也存储在缓存中。如果乱序执行必须被丢弃,寄存器和存储器内容不会被提交。尽管如此,缓存的内容仍保存在缓存中。我们可以利用微观架构的旁路攻击,比如Flush + Reload [35]来检测特定的内存位置是否被缓存,以使这个微观架构状态可见。还有其他的旁路信道也可以检测是否缓存了特定的内存位置,包括Prime + Probe [28,24,26],Evict + Reload [23]或Flush + Flush [10]。但是,由于Flush + Reload是最准确的已知缓存旁路信道,并且易于实现,所以本例中我们不考虑任何其他旁路。

Based on the value of data in this toy example, a different part of the cache is accessed when executing the memory access out of order. As data is multiplied by 4096, data accesses to probe array are scattered over the array with a distance of 4 KB (assuming a 1 B data type for probe array). Thus, there is an injective mapping from the value of data to a memory page, i.e., there are no two different values of data which result in an access to the same page. Consequently, if a cache line of a page is cached, we know the value of data. The spreading over different pages eliminates false positives due to the prefetcher, as the prefetcher cannot access data across page boundaries [14].

根据这个玩具示例中data的值,乱序执行该内存访问时会访问缓存的不同部分。由于data乘以4096,对探针数组的访问以4KB的间距分散在数组上(假设探针数组是1字节的数据类型)。因此,存在一个从data的值到内存页的单射映射,即不存在导致访问相同页面的两个不同data值。所以如果某页的缓存行被缓存,我们将由此得知data的具体值。由于预取器不能跨越页面边界访问数据,因此将数据分布在不同的页面上能消除由预取器造成的误报[14]。

Figure 4 shows the result of a Flush+Reload measurement iterating over all pages, after executing the out-of-order snippet with data = 84. Although the array access should not have happened due to the exception, we can clearly see that the index which would have been accessed is cached. Iterating over all pages (e.g., in the exception handler) shows only a cache hit for page 84. This shows that even instructions which are never actually executed, change the microarchitectural state of the CPU. Section 4 modifies this toy example to not read a value, but to leak an inaccessible secret.

图4显示了在以data=84执行乱序代码片段之后,Flush+Reload遍历所有页面的测量结果。尽管由于异常而不应该发生数组访问,但是我们可以清楚地看到本应被访问的索引所对应的页面已被缓存。遍历所有页面(例如,在异常处理程序中)可以看到仅有第84页的缓存命中。这表明即使是从未实际执行的指令也会改变CPU的微架构状态。第4节将修改这个玩具示例,使其不是读取某个值,而是泄漏一个本来无法访问的机密值。

Figure 4: Even if a memory location is only accessed during out-of-order execution, it remains cached. Iterating over the 256 pages of probe array shows one cache hit, exactly on the page that was accessed during the out-of-order execution.
图4:即使一个内存位置只在乱序执行期间被访问,它仍然被缓存。对256个探针数组进行迭代显示一个缓存命中,正好在乱序执行期间访问的页面上。

 

4 Building Blocks of the Attack


4 构建攻击模块

The toy example in Section 3 illustrated that side-effects of out-of-order execution can modify the microarchitectural state to leak information. While the code snippet reveals the data value passed to a cache-side channel, we want to show how this technique can be leveraged to leak otherwise inaccessible secrets. In this section, we want to generalize and discuss the necessary building blocks to exploit out-of-order execution for an attack.

第3节中的玩具示例说明了乱序执行的副作用可以修改微架构状态从而泄漏信息。虽然该代码片段只是把data的值传递给了缓存旁路通道,但我们想要展示如何利用这种技术来泄露原本无法访问的机密信息。在本节中,我们将概括并讨论利用乱序执行进行攻击所必需的组成模块。

The adversary targets a secret value that is kept somewhere in physical memory. Note that register contents are also stored in memory upon context switches, i.e., they are also stored in physical memory. As described in Section 2.2, the address space of every process typically includes the entire user space, as well as the entire kernel space, which typically also has all physical memory (in-use) mapped. However, these memory regions are only accessible in privileged mode (cf. Section 2.2).

对手的目标是保存在物理内存某处的保密值。注意,寄存器内容也在上下文切换时被存储在存储器中,即,它们也被存储在物理内存中。如第2.2节所述,每个进程的地址空间通常包含整个用户空间以及整个内核空间,这些空间通常也映射了所有(正在使用中的)物理内存。但是,这些内存区域只能在特权模式下访问(参见第2.2节)。

In this work, we demonstrate leaking secrets by bypassing the privileged-mode isolation, giving an attacker full read access to the entire kernel space including any physical memory mapped, including the physical memory of any other process and the kernel. Note that Kocher et al. [19] pursue an orthogonal approach, called Spectre Attacks, which trick speculative executed instructions into leaking information that the victim process is authorized to access. As a result, Spectre Attacks lack the privilege escalation aspect of Meltdown and require tailoring to the victim process’s software environment, but apply more broadly to CPUs that support speculative execution and are not stopped by KAISER.

在这项工作中,我们通过绕过特权模式隔离来演示机密信息的泄露,使攻击者获得对整个内核空间的完整读取权限,包括所有映射的物理内存,即任何其他进程和内核的物理内存。值得关注的是,Kocher等人[19]研究了一种正交的攻击方法,称为“幽灵”攻击,它诱骗推测执行的指令泄露受害者进程有权访问的信息。因此,幽灵攻击缺乏熔断的权限提升特性,并且需要针对受害者进程的软件环境进行定制,但它可更广泛地应用于支持推测执行的CPU,并且不会被KAISER阻止。

The full Meltdown attack consists of two building blocks, as illustrated in Figure 5. The first building block of Meltdown is to make the CPU execute one or more instructions that would never occur in the executed path. In the toy example (cf. Section 3), this is an access to an array, which would normally never be executed, as the previous instruction always raises an exception. We call such an instruction, which is executed out of order, leaving measurable side effects, a transient instruction. Furthermore, we call any sequence of instructions containing at least one transient instruction a transient instruction sequence.

完整的熔断攻击由两个模块组成，如图5所示。熔断的第一个模块是让CPU执行一条或多条在正常执行路径中永远不会发生的指令。在玩具示例中（参见第3节），这是对一个数组的访问，通常永远不会被执行，因为前面的指令总是引发一个异常。我们把这种被乱序执行、并留下可测量副作用的指令称为瞬态指令。此外，我们把任何包含至少一个瞬态指令的指令序列称为瞬态指令序列。

Figure 5: The Meltdown attack uses exception handling or suppression, e.g., TSX, to run a series of transient instructions. These transient instructions obtain a (persistent) secret value and change the microarchitectural state of the processor based on this secret value. This forms the sending part of a microarchitectural covert channel. The receiving side reads the microarchitectural state, making it architectural, and recovers the secret value.
图5：熔断攻击使用异常处理或抑制（例如TSX）来运行一系列瞬态指令。这些瞬态指令获得（持久存储的）保密值，并基于该保密值改变处理器的微架构状态。这构成了微架构隐蔽通道的发送部分。接收端读取微架构状态，使其成为架构状态，并复原保密值。

In order to leverage transient instructions for an attack, the transient instruction sequence must utilize a secret value that an attacker wants to leak. Section 4.1 describes building blocks to run a transient instruction sequence with a dependency on a secret value.

为了利用瞬态指令进行攻击,瞬态指令序列必须利用攻击者想要使之泄漏的保密值。4.1节中介绍了用以运行一个依赖于保密值的瞬态指令序列的模块。

The second building block of Meltdown is to transfer the microarchitectural side effect of the transient instruction sequence to an architectural state to further process the leaked secret. Thus, Section 4.2 describes building blocks to transfer a microarchitectural side effect to an architectural state using a covert channel.

熔断的第二个模块是将瞬态指令序列的微架构副作用转移到架构状态,以进一步处理泄露的保密值。 4.2节中介绍了使用隐蔽通道将微架构副作用转移到架构状态的第二个模块。

 

4.1 Executing Transient Instructions


4.1 执行瞬态指令

The first building block of Meltdown is the execution of transient instructions. Transient instructions basically occur all the time, as the CPU continuously runs ahead of the current instruction to minimize the experienced latency and thus maximize the performance (cf. Section 2.1). Transient instructions introduce an exploitable side channel if their operation depends on a secret value. We focus on addresses that are mapped within the attacker’s process, i.e., the user-accessible user space addresses as well as the user-inaccessible kernel space addresses. Note that attacks targeting code that is executed within the context (i.e., address space) of another process are possible [19], but out of scope in this work, since all physical memory (including the memory of other processes) can be read through the kernel address space anyway.

熔断的第一个模块就是执行瞬态指令。瞬态指令基本上每时每刻都在发生，因为CPU会抢先于当前指令连续运行，以尽量减少等待执行的延迟，从而最大限度地提高性能（参见第2.1节）。如果瞬态指令的操作依赖于保密值，则会引入可利用的旁路。我们主要关注那些映射到攻击者进程内的地址，即用户可访问的用户空间地址以及用户不可访问的内核空间地址。值得注意的是，针对在另一个进程的上下文（即地址空间）内执行的代码进行攻击是有可能的[19]，但这超出了本篇论文的范围，因为所有物理内存（包括其他进程的内存）无论如何都可以通过内核地址空间读取。

Accessing user-inaccessible pages, such as kernel pages, triggers an exception which generally terminates the application. If the attacker targets a secret at a user-inaccessible address, the attacker has to cope with this exception. We propose two approaches: With exception handling, we catch the exception effectively occurring after executing the transient instruction sequence, and with exception suppression, we prevent the exception from occurring at all and instead redirect the control flow after executing the transient instruction sequence. We discuss these approaches in detail in the following.

访问用户不可访问的页面（例如内核页面）会触发一个异常，该异常通常会终止应用程序。如果攻击者以用户不可访问地址处的保密值为目标，就必须应对这个异常。我们提出了两种方法：通过异常处理，我们捕获执行完瞬态指令序列之后实际发生的异常；通过异常抑制，我们完全阻止异常发生，并在执行完瞬态指令序列之后重定向控制流。我们接下来将详细讨论这些方法。

 

Exception handling


异常处理

A trivial approach is to fork the attacking application before accessing the invalid memory location that terminates the process, and only access the invalid memory location in the child process. The CPU executes the transient instruction sequence in the child process before crashing. The parent process can then recover the secret by observing the microarchitectural state, e.g., through a side-channel.

一个简单的方法是在访问会终止进程的无效内存位置之前，先对攻击程序进行fork（克隆），只在子进程中访问无效内存位置。CPU在子进程崩溃之前执行其中的瞬态指令序列。然后父进程可以通过观察微架构状态（例如通过旁路）来恢复保密值。

It is also possible to install a signal handler that will be executed if a certain exception occurs, in this specific case a segmentation fault. This allows the attacker to issue the instruction sequence and prevent the application from crashing, reducing the overhead as no new process has to be created.

另一种方法是安装一个信号处理程序，当特定异常（在本例中是段错误）发生时执行该处理程序。这允许攻击者发出指令序列并防止应用程序崩溃；由于不需要创建新的进程，开销也随之减少。

 

Exception suppression


异常抑制

A different approach to deal with exceptions is to prevent them from being raised in the first place. Transactional memory allows to group memory accesses into one seemingly atomic operation, giving the option to roll-back to a previous state if an error occurs. If an exception occurs within the transaction, the architectural state is reset, and the program execution continues without disruption.

处理异常的另一种方法是在第一时间阻止它们抛出。事务性内存允许将内存访问划分成为一个伪原子操作,如果发生错误,可以选择回滚到以前的状态。如果在事务中发生异常,架构状态将被重置,程序继续运行而不会中断。

Furthermore, speculative execution issues instructions that might not occur on the executed code path due to a branch misprediction. Such instructions depending on a preceding conditional branch can be speculatively executed. Thus, the invalid memory access is put within a speculative instruction sequence that is only executed if a prior branch condition evaluates to true. By making sure that the condition never evaluates to true in the executed code path, we can suppress the occurring exception as the memory access is only executed speculatively. This technique may require a sophisticated training of the branch predictor. Kocher et al. [19] pursue this approach in orthogonal work, since this construct can frequently be found in code of other processes.

此外,由于分支预测错误,推测执行可能会在执行的代码路径上执行原本不会发生的指令。这类依赖于之前的条件分支的指令可以被推测性地执行。因此,无效的内存访问被放置在一个仅当之前的分支条件评估为真时才执行的推测指令序列内。通过保证分支条件在执行的代码路径中从不计算为真,我们就可以抑制将要发生的异常,因为内存访问只是推测性地执行。这种技术可能需要对分支预测器进行复杂的训练。Kocher等人[19]在正交性工作中研究这个方法,因为这个构造经常可以在其他进程的代码中找到。

 

4.2 Building a Covert Channel


4.2建立一个隐蔽通道

The second building block of Meltdown is the transfer of the microarchitectural state, which was changed by the transient instruction sequence, into an architectural state (cf. Figure 5). The transient instruction sequence can be seen as the sending end of a microarchitectural covert channel. The receiving end of the covert channel receives the microarchitectural state change and deduces the secret from the state. Note that the receiver is not part of the transient instruction sequence and can be a different thread or even a different process, e.g., the parent process in the fork-and-crash approach.

熔断的第二个模块是通过瞬态指令序列将微架构状态转换为架构状态(参见图5)。瞬态指令序列可以看作微架构隐蔽通道的发送端。隐蔽通道的接收端接收微架构状态变化,并从此状态推导保密值。值得注意的是,接收器不是瞬态指令序列的一部分,它可以是不同的线程,甚至是不同的进程,例如fork-and-crash方法中的父进程。

We leverage techniques from cache attacks, as the cache state is a microarchitectural state which can be reliably transferred into an architectural state using various techniques [28, 35, 10]. Specifically, we use Flush+Reload [35], as it allows to build a fast and low-noise covert channel. Thus, depending on the secret value, the transient instruction sequence (cf. Section 4.1) performs a regular memory access, e.g., as it does in the toy example (cf. Section 3).

我们利用来自缓存攻击的技术，因为缓存状态是一种微架构状态，可以使用各种技术将其可靠地转换为架构状态[28][35][10]。具体而言，我们使用Flush+Reload[35]方法，因为它允许建立一个快速、低噪声的隐蔽通道。因此，根据保密值，瞬态指令序列（参见4.1节）执行一次常规的内存访问，就像在玩具示例中一样（参见第3节）。

After the transient instruction sequence accessed an accessible address, i.e., this is the sender of the covert channel, the address is cached for subsequent accesses. The receiver can then monitor whether the address has been loaded into the cache by measuring the access time to the address. Thus, the sender can transmit a ‘1’-bit by accessing an address which is loaded into the monitored cache, and a ‘0’-bit by not accessing such an address.

在瞬态指令序列访问了一个可访问的地址（即隐蔽通道的发送端）之后，该地址被缓存以用于后续访问。之后接收端通过测量该地址的访问时间来监视它是否已被加载到缓存中。因此，发送端可以通过访问一个会被加载到所监视缓存中的地址来发送“1”位，并通过不访问这样的地址来发送“0”位。

Using multiple different cache lines, as in our toy example in Section 3, allows to transmit multiple bits at once. For each of the 256 different byte values, the sender accesses a different cache line. By performing a Flush+Reload attack on all of the 256 possible cache lines, the receiver can recover a full byte instead of just one bit. However, since the Flush+Reload attack takes much longer (typically several hundred cycles) than the transient instruction sequence, transmitting only a single bit at once is more efficient. The attacker can simply do that by shifting and masking the secret value accordingly.

使用多个不同的缓存行，就像我们在第3节中的玩具示例一样，我们可以一次传输多个比特。对于256个不同字节值中的每一个，发送端都访问不同的缓存行。通过对所有256个可能的缓存行执行Flush+Reload攻击，接收端可以恢复一个完整的字节，而不仅仅是一个比特。然而，由于Flush+Reload攻击比瞬态指令序列花费的时间长得多（通常为几百个周期），所以一次仅发送一个比特更有效率。攻击者只需对保密值进行相应的移位和掩码操作即可做到这一点。

Note that the covert channel is not limited to microarchitectural states which rely on the cache. Any microarchitectural state which can be influenced by an instruction (sequence) and is observable through a side channel can be used to build the sending end of a covert channel. The sender could, for example, issue an instruction (sequence) which occupies a certain execution port such as the ALU to send a ‘1’-bit. The receiver measures the latency when executing an instruction (sequence) on the same execution port. A high latency implies that the sender sends a ‘1’-bit, whereas a low latency implies that the sender sends a ‘0’-bit. The advantage of the Flush+Reload cache covert channel is the noise resistance and the high transmission rate [10]. Furthermore, the leakage can be observed from any CPU core [35], i.e., rescheduling events do not significantly affect the covert channel.

请注意，隐蔽通道并不仅限于依赖缓存的微架构状态。任何可以被指令（序列）影响并且可以通过旁路观察到的微架构状态都可以用来构建隐蔽通道的发送端。例如，发送端可以发出占用某个执行端口（如ALU）的指令（序列）来发送“1”位。接收端测量在同一执行端口上执行指令（序列）时的延迟。高延迟意味着发送端发送“1”位，而低延迟意味着发送端发送“0”位。Flush+Reload缓存隐蔽通道的优点是抗噪声和高传输速率[10]。此外，泄漏可以从任何CPU内核观察到[35]，即重调度事件不会显著影响隐蔽通道。

 

5 Meltdown


5 熔断

In this section, we present Meltdown, a powerful attack allowing to read arbitrary physical memory from an unprivileged user program, comprised of the building blocks presented in Section 4. First, we discuss the attack setting to emphasize the wide applicability of this attack. Second, we present an attack overview, showing how Meltdown can be mounted on both Windows and Linux on personal computers as well as in the cloud. Finally, we discuss a concrete implementation of Meltdown allowing to dump kernel memory with up to 503 KB/s.

在本节中，我们介绍熔断，一种强大的攻击方式，允许从一个无特权的用户程序中读取任意物理内存，它由第4节介绍的模块组成。首先，我们讨论攻击场景以强调这种攻击的广泛适用性。其次，我们给出攻击概述，展示如何在Windows和Linux个人电脑以及云端对系统发动熔断攻击。最后，我们讨论熔断的具体实现，它允许以高达503KB/s的速度转储内核内存。

 

Attack setting.


攻击配置

In our attack, we consider personal computers and virtual machines in the cloud. In the attack scenario, the attacker has arbitrary unprivileged code execution on the attacked system, i.e., the attacker can run any code with the privileges of a normal user. However, the attacker has no physical access to the machine. Further, we assume that the system is fully protected with state-of-the-art software-based defenses such as ASLR and KASLR as well as CPU features like SMAP, SMEP, NX, and PXN. Most importantly, we assume a completely bug-free operating system, thus, no software vulnerability exists that can be exploited to gain kernel privileges or leak information. The attacker targets secret user data, e.g., passwords and private keys, or any other valuable information.

在此次攻击中,我们安排的受害者是个人电脑和云虚拟机。在此次攻击场景中,攻击者能够在被攻击的系统上执行任意非特权的代码,即攻击者可以以普通用户的权限运行任何代码,但攻击者没有实际的物理访问权限。此外,我们假设系统拥有诸如ASLR和KASLR等先进软件防御手段以及SMAP,SMEP,NX和PXN等CPU特性的充分保护。最重要的是,我们假设这是一个完全没有bug的操作系统,因此没有任何软件漏洞可以被利用来获得内核权限或泄露信息。攻击者针对保密用户数据,例如密码和私钥或任何其他有价值的信息。

 

5.1 Attack Description


5.1 攻击描述

Meltdown combines the two building blocks discussed in Section 4. First, an attacker makes the CPU execute a transient instruction sequence which uses an inaccessible secret value stored somewhere in physical memory (cf. Section 4.1). The transient instruction sequence acts as the transmitter of a covert channel (cf. Section 4.2), ultimately leaking the secret value to the attacker.

熔断包括在第4节中已讨论过的两个模块。首先,攻击者使CPU执行一个瞬态指令序列,使用物理内存中某处存储的不可访问的保密值(参见4.1节)。瞬态指令序列充当隐蔽信道的发送器(参见4.2节),最终将保密值泄露给攻击者。

Meltdown consists of 3 steps:

熔断包括了三个步骤:

  • Step 1 The content of an attacker-chosen memory location, which is inaccessible to the attacker, is loaded into a register.
  • 步骤1：攻击者选择的、其本无权访问的内存地址的内容被加载到寄存器中。
  • Step 2 A transient instruction accesses a cache line based on the secret content of the register.
  • 步骤2：瞬态指令根据寄存器中的保密内容访问某一缓存行。
  • Step 3 The attacker uses Flush+Reload to determine the accessed cache line and hence the secret stored at the chosen memory location.
  • 步骤3：攻击者使用Flush+Reload确定被访问的缓存行，从而确定存储在所选内存位置的保密值。

By repeating these steps for different memory locations, the attacker can dump the kernel memory, including the entire physical memory.

通过针对不同的内存位置重复这些步骤,攻击者可以转储内核内存,包括整个物理内存。

Listing 2 shows the basic implementation of the transient instruction sequence and the sending part of the covert channel, using x86 assembly instructions. Note that this part of the attack could also be implemented entirely in higher level languages like C. In the following, we will discuss each step of Meltdown and the corresponding code line in Listing 2.

清单2列出了使用x86汇编指令的瞬态指令序列和隐蔽通道发送端的基本实现。请注意,这部分攻击也可以完全在C这样的高级语言中实现。下面我们将讨论熔断的每一步以及清单2中相应的代码行。

Listing 2: The core of Meltdown. An inaccessible kernel address is moved to a register, raising an exception. Subsequent instructions are executed out of order before the exception is raised, leaking the data from the kernel address through the indirect memory access.
清单2:熔断的核心指令序列。一个无法访问的内核地址被移动到一个寄存器,引发一个异常中断。后续指令在引发异常之前已经被乱序执行,通过间接内存访问泄漏内核地址的内容。
; rcx = kernel address, rbx = probe array
xor rax, rax
retry:
mov al, byte [rcx]
shl rax, 0xc
jz retry
mov rbx, qword [rbx + rax]

 

Step 1: Reading the secret


步骤1:读取保密值

To load data from the main memory into a register, the data in the main memory is referenced using a virtual address. In parallel to translating a virtual address into a physical address, the CPU also checks the permission bits of the virtual address, i.e., whether this virtual address is user accessible or only accessible by the kernel. As already discussed in Section 2.2, this hardware-based isolation through a permission bit is considered secure and recommended by the hardware vendors. Hence, modern operating systems always map the entire kernel into the virtual address space of every user process.

为了将数据从主内存加载到寄存器中,主内存中的数据通过一个虚拟地址被引用。在将虚拟地址转换成物理地址的同时,CPU还将检查虚拟地址的许可位,即该虚拟地址是用户可访问的还是只能由内核访问的。正如2.2节中所讨论的,通过权限位进行的基于硬件的隔离被认为是安全的,并被硬件厂商推荐。所以现代操作系统总是将整个内核映射到每个用户进程的虚拟地址空间。

As a consequence, all kernel addresses lead to a valid physical address when translating them, and the CPU can access the content of such addresses. The only difference to accessing a user space address is that the CPU raises an exception as the current permission level does not allow to access such an address. Hence, the user space cannot simply read the contents of such an address. However, Meltdown exploits the out-of-order execution of modern CPUs, which still executes instructions in the small time window between the illegal memory access and the raising of the exception.

因此，所有内核地址在被翻译时都会得到一个有效的物理地址，并且CPU可以访问这些地址的内容。与访问用户空间地址的唯一区别是，CPU会抛出异常，因为当前的权限级别不允许访问这样的地址。所以用户空间不能简单地读取这样的地址的内容。然而，熔断利用了现代CPU的乱序执行特性：在非法内存访问与异常引发之间的短暂时间窗口内，CPU仍然会执行指令。

In line 4 of Listing 2, we load the byte value located at the target kernel address, stored in the RCX register, into the least significant byte of the RAX register represented by AL. As explained in more detail in Section 2.1, the MOV instruction is fetched by the core, decoded into μOPs, allocated, and sent to the reorder buffer. There, architectural registers (e.g., RAX and RCX in Listing 2) are mapped to underlying physical registers enabling out-of-order execution. Trying to utilize the pipeline as much as possible, subsequent instructions (lines 5-7) are already decoded and allocated as μOPs as well. The μOPs are further sent to the reservation station holding the μOPs while they wait to be executed by the corresponding execution unit. The execution of a μOP can be delayed if execution units are already used to their corresponding capacity or operand values have not been calculated yet.

在清单2的第4行中，我们将存储在RCX寄存器中的目标内核地址处的字节值加载到由AL表示的RAX寄存器的最低有效字节中。正如第2.1节中更详细的解释，MOV指令由内核提取，解码成微指令，分配并发送到重排序缓冲器。在那里，架构寄存器（例如清单2中的RAX和RCX）被映射到底层物理寄存器，从而实现乱序执行。为了尽可能充分地利用流水线，随后的指令（第5-7行）也已经被解码并分配成微指令。微指令随后被发送到保留站，在那里等待相应执行单元的执行。如果执行单元已满负荷，或者操作数的值尚未计算完毕，微指令的执行就可能被延迟。

When the kernel address is loaded in line 4, it is likely that the CPU already issued the subsequent instructions as part of the out-of-order execution, and that their corresponding μOPs wait in the reservation station for the content of the kernel address to arrive. As soon as the fetched data is observed on the common data bus, the μOPs can begin their execution.

当内核地址在第4行被加载时,作为乱序执行的一部分,CPU很可能已经发出后续指令,并且它们相应的微指令在保留站中等待内核地址的内容到达。只要在公共数据总线上观察到读取的数据,微指令就可以开始执行。

When the μOPs finish their execution, they retire in-order, and, thus, their results are committed to the architectural state. During the retirement, any interrupts and exceptions that occurred during the execution of the instruction are handled. Thus, if the MOV instruction that loads the kernel address is retired, the exception is registered, and the pipeline is flushed to eliminate all results of subsequent instructions which were executed out of order. However, there is a race condition between raising this exception and our attack step 2 which we describe below.

当微指令执行完毕时，就会按顺序退出，因此其结果会提交到架构状态。在退出期间，任何在指令执行期间发生的中断和异常都将被处理。因此，如果加载内核地址的MOV指令退出，该异常就会被登记，并且流水线会被刷新，以消除所有被乱序执行的后续指令的结果。然而，在引发这个异常与我们下面描述的攻击步骤2之间，存在一个竞争条件。

As reported by Gruss et al. [9], prefetching kernel addresses sometimes succeeds. We found that prefetching the kernel address can slightly improve the performance of the attack on some systems.

正如Gruss等人[9]所提出的那样,预取内核地址有时会成功。我们发现预取内核地址可以稍微提高在某些系统上的攻击性能。

 

Step 2: Transmitting the secret


步骤2:传递保密值

The instruction sequence from step 1 which is executed out of order has to be chosen in a way that it becomes a transient instruction sequence. If this transient instruction sequence is executed before the MOV instruction is retired (i.e., raises the exception), and the transient instruction sequence performed computations based on the secret, it can be utilized to transmit the secret to the attacker.

必须以使其成为瞬态指令序列的方式来选择步骤1中被乱序执行的指令序列。如果该瞬态指令序列在MOV指令退出（即引发异常）之前被执行，并且基于保密值执行了计算，就可以利用它将保密值发送给攻击者。

As already discussed, we utilize cache attacks that allow to build fast and low-noise covert channels using the CPU’s cache. Thus, the transient instruction sequence has to encode the secret into the microarchitectural cache state, similarly to the toy example in Section 3.

正如我们已经讨论过的,我们利用缓存攻击,从而可以利用CPU的缓存构建快速,低噪声的隐蔽通道。因此,与第3节中的玩具示例类似,瞬态指令序列必须将保密值编码为微架构缓存状态。

We allocate a probe array in memory and ensure that no part of this array is cached. To transmit the secret, the transient instruction sequence contains an indirect memory access to an address which is calculated based on the secret (inaccessible) value. In line 5 of Listing 2 the secret value from step 1 is multiplied by the page size, i.e., 4 KB. The multiplication of the secret ensures that accesses to the array have a large spatial distance to each other. This prevents the hardware prefetcher from loading adjacent memory locations into the cache as well. Here, we read a single byte at once, hence our probe array is 256×4096 bytes, assuming 4KB pages.

我们在内存中分配一个探针数组,并确保这个数组的任何部分都不被缓存。为了传输保密值,瞬态指令序列包含对基于(不可访问的)保密值计算的地址的间接内存访问。在清单2的第5行中,来自步骤1的保密值乘以页面大小,即4KB。对保密值的乘法操作保证对数组各个数据的访问之间有很大的空间距离,这可以防止硬件预取器将相邻的内存位置加载到缓存中。在这里,我们一次读取一个字节,假设页面大小为4KB,那么我们的探针数组是256×4096字节。

Note that in the out-of-order execution we have a noise-bias towards register value ‘0’. We discuss the reasons for this in Section 5.2. However, for this reason, we introduce a retry-logic into the transient instruction sequence. In case we read a ‘0’, we try to read the secret again (step 1). In line 7, the multiplied secret is added to the base address of the probe array, forming the target address of the covert channel. This address is read to cache the corresponding cache line. Consequently, our transient instruction sequence affects the cache state based on the secret value that was read in step 1.

值得注意的是，在乱序执行中，我们对寄存器值“0”有一个噪声偏差。我们将在5.2节讨论原因。出于这个原因，我们在瞬态指令序列中引入了一个重试逻辑：如果读取到“0”，我们就尝试再次读取保密值（步骤1）。在第7行中，保密值的乘积被加到探针数组的基地址上，构成隐蔽通道的目标地址。读取这个地址以缓存相应的缓存行。因此，我们的瞬态指令序列会基于步骤1中读取的保密值来影响缓存状态。

Since the transient instruction sequence in step 2 races against raising the exception, reducing the runtime of step 2 can significantly improve the performance of the attack. For instance, taking care that the address translation for the probe array is cached in the TLB increases the attack performance on some systems.

由于步骤2中的瞬态指令序列与异常的引发相互竞争，减少步骤2的运行时间可以显著提高攻击的性能。例如，确保探针数组的地址转换已被缓存在TLB中，可以提高某些系统上的攻击性能。

 

Step 3: Receiving the secret


步骤3:接收保密值

In step 3, the attacker recovers the secret value (step 1) by leveraging a microarchitectural side-channel attack (i.e., the receiving end of a microarchitectural covert channel) that transfers the cache state (step 2) back into an architectural state. As discussed in Section 4.2, Meltdown relies on Flush+Reload to transfer the cache state into an architectural state.

在步骤3中,攻击者通过利用将高速缓存状态(步骤2)转移回架构状态的微体系结构旁路信道攻击(即,微体系结构隐蔽通道的接收端)来恢复保密值(步骤1)。正如在第4.2节中讨论的那样,熔断依靠Flush + Reload将缓存状态转换为体系结构状态。

When the transient instruction sequence of step 2 is executed, exactly one cache line of the probe array is cached. The position of the cached cache line within the probe array depends only on the secret which is read in step 1. Thus, the attacker iterates over all 256 pages of the probe array and measures the access time for every first cache line (i.e., offset) on the page. The number of the page containing the cached cache line corresponds directly to the secret value.

当步骤2的瞬态指令序列被执行时，探针数组中恰好有一个缓存行被缓存。被缓存的缓存行在探针数组中的位置仅取决于步骤1中读取的保密值。因此，攻击者遍历探针数组的所有256个页面，并测量每个页面上第一条缓存行（即偏移量）的访问时间。包含已缓存缓存行的页面编号直接对应于保密值。

 

Dumping the entire physical memory


转储整个物理内存

By repeating all 3 steps of Meltdown, the attacker can dump the entire memory by iterating over all different addresses. However, as the memory access to the kernel address raises an exception that terminates the program, we use one of the methods described in Section 4.1 to handle or suppress the exception.

通过重复熔断的所有3个步骤,攻击者可以遍历所有不同的地址来转储整个内存。但是,由于对内核地址的内存访问引发了终止程序的异常,我们使用4.1节中描述的方法之一来处理或抑制异常。

As all major operating systems also typically map the entire physical memory into the kernel address space (cf. Section 2.2) in every user process, Meltdown is not only limited to reading kernel memory but it is capable of reading the entire physical memory of the target machine.

由于所有主要的操作系统通常也将整个物理内存映射到每个用户进程的内核地址空间(参见2.2节),所以熔断不仅限于读取内核内存,还能够读取目标机器的整个物理内存。

 

5.2 Optimizations and Limitations


5.2 优化和限制

The case of 0

关于0的情况

If the exception is triggered while trying to read from an inaccessible kernel address, the register where the data should be stored appears to be zeroed out. This is reasonable because if the exception is unhandled, the user space application is terminated, and the value from the inaccessible kernel address could be observed in the register contents stored in the core dump of the crashed process. The direct solution to fix this problem is to zero out the corresponding registers. If the zeroing out of the register is faster than the execution of the subsequent instruction (line 5 in Listing 2), the attacker may read a false value in the third step. To prevent the transient instruction sequence from continuing with a wrong value, i.e., ‘0’, Meltdown retries reading the address until it encounters a value different from ‘0’ (line 6). As the transient instruction sequence terminates after the exception is raised, there is no cache access if the secret value is 0. Thus, Meltdown assumes that the secret value is indeed ‘0’ if there is no cache hit at all.

如果在尝试读取不可访问的内核地址时触发了异常，本应存储该数据的寄存器看起来会被清零。这是合理的，因为如果异常未被处理，用户空间应用程序会被终止，而来自不可访问内核地址的值就可能在崩溃进程的核心转储所保存的寄存器内容中被观察到。修复该问题的直接办法就是将相应的寄存器清零。如果寄存器清零快于后续指令（清单2中的第5行）的执行，攻击者可能会在第三步中读取到一个错误的值。为了防止瞬态指令序列带着错误的值（即“0”）继续执行，熔断会重试读取该地址，直到遇到一个不为“0”的值（第6行）。由于瞬态指令序列在异常引发之后即告终止，如果保密值为0，就不会有缓存访问发生。因此，如果完全没有缓存命中，熔断就假定保密值确实为“0”。

The loop is terminated by either the read value not being ‘0’ or by the raised exception of the invalid memory access. Note that this loop does not slow down the attack measurably, since, in either case, the processor runs ahead of the illegal memory access, regardless of whether ahead is a loop or ahead is a linear control flow. In either case, the time until the control flow returned from exception handling or exception suppression remains the same with and without this loop. Thus, capturing read ‘0’s beforehand and recovering early from a lost race condition vastly increases the reading speed.

循环的终止条件是：读取到的值不为“0”，或者无效内存访问引发了异常。请注意，这个循环不会显著减慢攻击，因为无论前方是一个循环还是线性控制流，处理器都会抢先于非法内存访问执行。在任何一种情况下，无论有无此循环，从异常处理或异常抑制返回控制流的时间都保持不变。因此，提前捕获读到的“0”并从失败的竞争条件中尽早恢复，大大提高了读取速度。

 

Single-bit transmission


单比特传输

In the attack description in Section 5.1, the attacker transmitted 8 bits through the covert channel at once and performed 2^8 = 256 Flush+Reload measurements to recover the secret. However, there is a clear trade-off between running more transient instruction sequences and performing more Flush+Reload measurements. The attacker could transmit an arbitrary number of bits in a single transmission through the covert channel, e.g., by reading more bits using a MOV instruction for a larger data value. Furthermore, the attacker could mask bits using additional instructions in the transient instruction sequence. We found the number of additional instructions in the transient instruction sequence to have a negligible influence on the performance of the attack.

在5.1节的攻击描述中，攻击者一次通过隐蔽通道发送8位数据，并执行2^8 = 256次Flush+Reload测量来恢复保密值。然而，在运行更多的瞬态指令序列和执行更多的Flush+Reload测量之间存在明显的权衡。攻击者可以通过隐蔽通道在一次传输中传送任意数量的比特，例如通过使用MOV指令读取更大的数据值来获取更多的比特。此外，攻击者可以使用瞬态指令序列中的附加指令来对比特进行掩码。我们发现，瞬态指令序列中附加指令的数量对攻击性能的影响微乎其微。

The performance bottleneck in the generic attack description above is indeed the time spent on Flush+Reload measurements. In fact, with this implementation, almost the entire time will be spent on Flush+Reload measurements. By transmitting only a single bit, we can omit all but one Flush+Reload measurement, i.e., the measurement on cache line 1. If the transmitted bit was a ‘1’, then we observe a cache hit on cache line 1. Otherwise, we observe no cache hit on cache line 1.

上述通用攻击描述中的性能瓶颈确实是花费在Flush + Reload测量上的时间。实际上,在这个实现中,几乎全部时间都将花在Flush + Reload测量上。通过只发送单比特数据,我们可以省略除了一次Flush + Reload测量之外的所有测量,即在高速缓存行1上的测量。如果发送的位是“1”,那么我们可以观测到缓存行1上的缓存命中。否则就不会观测到缓存行1上的缓存命中。

Transmitting only a single bit at once also has drawbacks. As described above, our side channel has a bias towards a secret value of ‘0’. If we read and transmit multiple bits at once, the likelihood that all bits are ‘0’ may be quite small for actual user data. The likelihood that a single bit is ‘0’ is typically close to 50 %. Hence, the number of bits read and transmitted at once is a trade-off between some implicit error-reduction and the overall transmission rate of the covert channel.

一次只传送单个比特也有缺点。如上所述，我们的旁路通道对保密值“0”存在偏向。如果我们一次读取和传输多个比特，对于实际的用户数据而言，所有比特全为“0”的可能性可能相当小。而单个比特为“0”的可能性通常接近50%。因此，一次读取和传输的比特数，是隐式纠错与隐蔽通道总体传输速率之间的权衡。

However, since the error rates are quite small in either case, our evaluation (cf. Section 6) is based on the single-bit transmission mechanics.

但是,由于这两种情况下的误码率都很小,所以我们的评估(参见第6节)是基于单比特传输机制。

 

Exception Suppression using Intel TSX


使用Intel TSX进行异常抑制

In Section 4.1, we discussed the option to prevent an exception from being raised due to an invalid memory access in the first place. Using Intel TSX, a hardware transactional memory implementation, we can completely suppress the exception [17].

在4.1节中，我们讨论了从一开始就防止因无效内存访问而引发异常的选项。使用英特尔TSX（一种硬件事务内存实现），我们可以完全抑制该异常[17]。

With Intel TSX, multiple instructions can be grouped to a transaction, which appears to be an atomic operation, i.e., either all or no instruction is executed. If one instruction within the transaction fails, already executed instructions are reverted, but no exception is raised.

使用英特尔TSX时,可以将多条指令组合到一个事务中,这看起来是一个原子操作,即全部或全不执行指令。如果事务中的一条指令失败,已经执行的指令将被恢复,但不会产生异常。

If we wrap the code from Listing 2 with such a TSX instruction, any exception is suppressed. However, the microarchitectural effects are still visible, i.e., the cache state is persistently manipulated from within the hardware transaction [7]. This results in a higher channel capacity, as suppressing the exception is significantly faster than trapping into the kernel for handling the exception, and continuing afterwards.

如果我们用这样的TSX事务封装清单2中的代码，任何异常都会被抑制。然而，微架构上的效果仍然可见，即缓存状态在硬件事务内部被持久地改变[7]。其结果是更高的信道容量，因为抑制异常明显快于陷入内核处理异常然后再继续执行。

 

Dealing with KASLR


应对KASLR

In 2013, kernel address space layout randomization (KASLR) had been introduced to the Linux kernel (starting from version 3.14 [4]) allowing to randomize the location of the kernel code at boot time. However, only as recently as May 2017, KASLR had been enabled by default in version 4.12 [27]. With KASLR also the direct-physical map is randomized and, thus, not fixed at a certain address such that the attacker is required to obtain the randomized offset before mounting the Meltdown attack. However, the randomization is limited to 40 bit.

在2013年，内核地址空间布局随机化（KASLR）被引入Linux内核（从版本3.14开始[4]），允许在启动时随机化内核代码的位置。但是，直到2017年5月，KASLR才在4.12版本中被默认启用[27]。启用KASLR时，直接物理映射也是随机化的，因此不固定在某个地址，这样攻击者在发动熔断攻击之前需要先获得随机偏移量。但随机化被限制在40位以内。

Thus, if we assume a setup of the target machine with 8GB of RAM, it is sufficient to test the address space for addresses in 8GB steps. This allows to cover the search space of 40 bit with only 128 tests in the worst case. If the attacker can successfully obtain a value from a tested address, the attacker can proceed dumping the entire memory from that location. This allows to mount Meltdown on a system despite being protected by KASLR within seconds.

因此，如果我们假设目标机器配置了8GB内存，以8GB为步长测试地址空间就足够了。这允许在最坏的情况下仅用128次测试就覆盖40位的搜索空间。如果攻击者能够成功地从某个被测地址读取到值，就可以从该位置开始转储整个内存。这样，即使系统受到KASLR的保护，也可以在几秒钟内对其发动熔断攻击。

 

6 Evaluation


6 评估

In this section, we evaluate Meltdown and the performance of our proof-of-concept implementation. Section 6.1 discusses the information which Meltdown can leak, and Section 6.2 evaluates the performance of Meltdown, including countermeasures. Finally, we discuss limitations for AMD and ARM in Section 6.4.

在本节中，我们将评估熔断以及我们的概念验证实现的性能。6.1节讨论熔断可能泄漏的信息，6.2节评估熔断的性能，包括各种对策。最后，我们在6.4节讨论其在AMD和ARM上的局限性。

Table 1 shows a list of configurations on which we successfully reproduced Meltdown. For the evaluation of Meltdown, we used both laptops as well as desktop PCs with Intel Core CPUs. For the cloud setup, we tested Meltdown in virtual machines running on Intel Xeon CPUs hosted in the Amazon Elastic Compute Cloud as well as on DigitalOcean. Note that for ethical reasons we did not use Meltdown on addresses referring to physical memory of other tenants.

表1展示了我们成功重现熔断的配置列表。对于熔断的评估，我们使用了笔记本电脑以及配有Intel Core CPU的台式电脑。对于云端配置，我们在Amazon Elastic Compute Cloud（EC2）和DigitalOcean上托管的Intel Xeon CPU虚拟机中测试了熔断。请注意，出于道德方面的原因，我们并没有在指向其他租户物理内存的地址上使用熔断。

Table 1: Experimental setups.
表1:实验配置

6.1 Information Leakage and Environments


6.1信息泄漏和环境

We evaluated Meltdown on both Linux (cf. Section 6.1.1) and Windows 10 (cf. Section 6.1.3). On both operating systems, Meltdown can successfully leak kernel memory. Furthermore, we also evaluated the effect of the KAISER patches on Meltdown on Linux, to show that KAISER prevents the leakage of kernel memory (cf. Section 6.1.2). Finally, we discuss the information leakage when running inside containers such as Docker (cf. Section 6.1.4).

我们在Linux(参见6.1.1节)和Windows 10(参见6.1.3节)上评估了熔断攻击。在这两个操作系统上,熔断都可以成功泄漏内核内存。此外,我们还评估了KAISER补丁对Linux上的熔断漏洞的影响,表明KAISER可以防止内核内存的泄漏(参见6.1.2节)。最后,我们讨论在Docker等容器中运行时的信息泄漏(参见6.1.4节)。

 

6.1.1 Linux


6.1.1 Linux系统

We successfully evaluated Meltdown on multiple versions of the Linux kernel, from 2.6.32 to 4.13.0. On all these versions of the Linux kernel, the kernel address space is also mapped into the user address space. Thus, all kernel addresses are also mapped into the address space of user space applications, but any access is prevented due to the permission settings for these addresses. As Meltdown bypasses these permission settings, an attacker can leak the complete kernel memory if the virtual address of the kernel base is known. Since all major operating systems also map the entire physical memory into the kernel address space (cf. Section 2.2), all physical memory can also be read.

我们在多个版本的Linux内核(从2.6.32到4.13.0)上成功评估了熔断攻击。在所有这些版本的Linux内核中,内核地址空间都被映射到用户地址空间。因此,所有内核地址也被映射到用户空间应用程序的地址空间中,但由于这些地址的权限设置,任何访问都会被阻止。由于熔断绕过了这些权限设置,如果内核基址的虚拟地址已知,攻击者就可以泄漏整个内核内存。由于所有主要的操作系统还会将整个物理内存映射到内核地址空间(参见2.2节),所以所有物理内存也都可以被读取。

Before kernel 4.12, kernel address space layout randomization (KASLR) was not active by default [30]. If KASLR is active, Meltdown can still be used to find the kernel by searching through the address space (cf. Section 5.2). An attacker can also simply de-randomize the direct-physical map by iterating through the virtual address space. Without KASLR, the direct-physical map starts at address 0xffff 8800 0000 0000 and linearly maps the entire physical memory. On such systems, an attacker can use Meltdown to dump the entire physical memory, simply by reading from virtual addresses starting at 0xffff 8800 0000 0000.

在内核4.12之前,内核地址空间布局随机化(KASLR)在默认情况下是不启用的[30]。如果KASLR处于激活状态,则仍然可以通过搜索地址空间来使用熔断查找内核地址(参见5.2节)。攻击者也可以简单地通过遍历虚拟地址空间来将直接物理映射去随机化。没有KASLR时,直接物理映射从地址0xffff 8800 0000 0000开始,并线性映射整个物理内存。在这样的系统上,攻击者可以使用熔断来转储整个物理内存,只需简单地读取从0xffff 8800 0000 0000开始的虚拟地址即可。
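The non-randomized direct-physical map described above makes physical-to-virtual translation a single addition; a minimal sketch (the constant is the pre-KASLR Linux base quoted in the text):

```c
#include <stdint.h>

/* Without KASLR, Linux's direct-physical map linearly maps all physical
 * memory starting at 0xffff880000000000, so a physical address maps to a
 * kernel virtual address by a simple offset addition. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL

uint64_t phys_to_direct_map(uint64_t phys) {
    return DIRECT_MAP_BASE + phys;
}
```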

On newer systems, where KASLR is active by default, the randomization of the direct-physical map is limited to 40 bit. It is even further limited due to the linearity of the mapping. Assuming that the target system has at least 8GB of physical memory, the attacker can test addresses in steps of 8 GB, resulting in a maximum of 128 memory locations to test. Starting from one discovered location, the attacker can again dump the entire physical memory.

在较新的系统中,KASLR默认处于激活状态,直接物理映射的随机化被限制在40位以内。由于映射的线性特性,它实际受到的限制甚至更多。假设目标系统至少有8GB物理内存,攻击者可以以8GB为步长测试地址,最多只需测试128个内存位置。从一个被发现的位置开始,攻击者又可以转储整个物理内存。

Hence, for the evaluation, we can assume that the randomization is either disabled, or the offset was already retrieved in a pre-computation step.

因此,在评估中,我们可以假设随机化被禁用,或者偏移量已经在预计算步骤中获取。

 

6.1.2 Linux with KAISER Patch


6.1.2 Linux与KAISER补丁

The KAISER patch by Gruss et al. [8] implements a stronger isolation between kernel and user space. KAISER does not map any kernel memory in the user space, except for some parts required by the x86 architecture (e.g., interrupt handlers). Thus, there is no valid mapping to either kernel memory or physical memory (via the direct-physical map) in the user space, and such addresses can therefore not be resolved. Consequently, Meltdown cannot leak any kernel or physical memory except for the few memory locations which have to be mapped in user space.

Gruss等人的KAISER补丁[8]实现了内核和用户空间之间更强的隔离。除x86架构所需的某些部分(例如中断处理程序)外,KAISER不在用户空间中映射任何内核内存。因此,用户空间中既不存在指向内核内存的有效映射,也不存在(通过直接物理映射)指向物理内存的有效映射,这些地址也就无法被解析。因此,除了必须映射到用户空间的少数内存位置外,熔断不能泄漏任何内核或物理内存。

We verified that KAISER indeed prevents Meltdown, and there is no leakage of any kernel or physical memory.

我们证实KAISER确实可以防止熔毁,避免任何内核或物理内存的信息泄漏。

Furthermore, if KASLR is active, and the few remaining memory locations are randomized, finding these memory locations is not trivial due to their small size of several kilobytes. Section 7.2 discusses the implications of these mapped memory locations from a security perspective.

而且,如果KASLR处于激活状态,并且剩余的少量内存位置被随机化,那么由于这些内存位置只有几千字节大小,找到它们并非易事。第7.2节从安全角度讨论了这些被映射的内存位置的影响。

 

6.1.3 Microsoft Windows


6.1.3 微软Windows系统

We successfully evaluated Meltdown on an up-to-date Microsoft Windows 10 operating system. In line with the results on Linux (cf. Section 6.1.1), Meltdown also can leak arbitrary kernel memory on Windows. This is not surprising, since Meltdown does not exploit any software issues, but is caused by a hardware issue.

我们在最新的Microsoft Windows 10操作系统上成功评估了熔断。与Linux上的结果一致(参见6.1.1节),熔断同样可以在Windows上泄漏任意内核内存。这并不奇怪,因为熔断并不利用任何软件问题,而是由硬件问题引起的。

In contrast to Linux, Windows does not have the concept of an identity mapping, which linearly maps the physical memory into the virtual address space. Instead, a large fraction of the physical memory is mapped in the paged pools, non-paged pools, and the system cache. Furthermore, Windows maps the kernel into the address space of every application too. Thus, Meltdown can read kernel memory which is mapped in the kernel address space, i.e., any part of the kernel which is not swapped out, and any page mapped in the paged and non-paged pool, and the system cache.

与Linux不同,Windows没有将物理内存线性映射到虚拟地址空间的恒等映射(identity mapping)的概念。相反,很大一部分物理内存被映射到分页池、非分页池和系统缓存中。此外,Windows也会将内核映射到每个应用程序的地址空间。因此,熔断可以读取映射在内核地址空间中的内核内存,即内核中未被换出的任何部分、映射在分页池和非分页池中的任何页面,以及系统缓存。

 

6.1.4 Containers


6.1.4 容器

We evaluated Meltdown running in containers sharing a kernel, including Docker, LXC, and OpenVZ, and found that the attack can be mounted without any restrictions. Running Meltdown inside a container allows leaking information not only from the underlying kernel, but also from all other containers running on the same physical host.

我们评估了在共享内核容器(包括Docker,LXC和OpenVZ)中运行的熔断漏洞,发现其可以毫无限制地发动攻击。在容器中运行熔断不仅会泄漏来自底层内核的信息,还会泄漏运行在同一物理主机上的所有其他容器的信息。

The commonality of most container solutions is that every container uses the same kernel, i.e., the kernel is shared among all containers. Thus, every container has a valid mapping of the entire physical memory through the direct-physical map of the shared kernel. Furthermore, Meltdown cannot be blocked in containers, as it uses only memory accesses. Especially with Intel TSX, only unprivileged instructions are executed without even trapping into the kernel.

大多数容器解决方案的共同之处在于每个容器使用相同的内核,即内核在所有容器之间共享。因此,通过共享内核的直接物理映射,每个容器都拥有指向整个物理内存的有效映射。此外,熔断在容器中无法被阻止,因为它只使用内存访问。特别是在使用Intel TSX时,只执行非特权指令,甚至不会陷入内核。

Thus, the isolation of containers sharing a kernel can be fully broken using Meltdown. This is especially critical for cheaper hosting providers where users are not separated through fully virtualized machines, but only through containers. We verified that our attack works in such a setup, by successfully leaking memory contents from a container of a different user under our control.

因此,共享内核的容器之间的隔离可以被熔断完全打破。这对较廉价的托管服务提供商尤为严重,因为在那里用户之间不是通过完全虚拟化的机器隔离,而只是通过容器隔离。我们成功地从一个受我们控制的、属于另一个用户的容器中泄漏了内存内容,验证了攻击在这种环境下有效。

 

6.2 Meltdown Performance


6.2 熔断的性能

To evaluate the performance of Meltdown, we leaked known values from kernel memory. This allows us to not only determine how fast an attacker can leak memory, but also the error rate, i.e., how many byte errors to expect. We achieved average reading rates of up to 503KB/s with an error rate as low as 0.02% when using exception suppression. For the performance evaluation, we focused on the Intel Core i7-6700K as it supports Intel TSX, to get a fair performance comparison between exception handling and exception suppression.

为了评估熔断的性能,我们从内核内存中泄漏已知的值。这使我们不仅可以确定攻击者泄漏内存的速度,还可以确定错误率,即预期有多少字节出错。当使用异常抑制时,我们实现了高达503KB/s的平均读取速率,错误率低至0.02%。在性能评估中,我们专注于Intel Core i7-6700K,因为它支持Intel TSX,从而可以在异常处理和异常抑制之间进行公平的性能比较。

 

6.2.1 Exception Handling


6.2.1 异常处理

Exception handling is the more universal implementation, as it does not depend on any CPU extension and can thus be used without any restrictions. The only requirement for exception handling is operating system support to catch segmentation faults and continue operation afterwards. This is the case for all modern operating systems, even though the specific implementation differs between the operating systems. On Linux, we used signals, whereas, on Windows, we relied on the Structured Exception Handler.

异常处理是更通用的实现,因为它不依赖于任何CPU扩展,因此可以不受限制地使用。异常处理的唯一要求是操作系统支持捕获段错误并在之后继续运行。所有现代操作系统都满足这一点,尽管具体实现各不相同。在Linux上,我们使用信号来实现,而在Windows上,我们依靠结构化异常处理程序(Structured Exception Handler)来实现。

With exception handling, we achieved average reading speeds of 123KB/s when leaking 12MB of kernel memory. Out of the 12MB kernel data, only 0.03% were read incorrectly. Thus, with an error rate of 0.03%, the channel capacity is 122KB/s.

通过异常处理,当内核内存的泄露总量为12MB时,平均读取速度达到了123KB / s。在12MB内核数据中,读取错误率仅为0.03%。因此,当错误率为0.03%,信道容量为122KB / s。

 

6.2.2 Exception Suppression


6.2.2 异常抑制

Exception suppression can either be achieved using conditional branches or using Intel TSX. Conditional branches are covered in detail in Kocher et al. [19], hence we only evaluate Intel TSX for exception suppression. In contrast to exception handling, Intel TSX does not require operating system support, as it is an instruction-set extension. However, Intel TSX is a rather new extension and is thus only available on recent Intel CPUs, i.e., since the Broadwell microarchitecture.

异常抑制可以使用条件分支或Intel TSX来实现。条件分支的方案在Kocher等人的研究[19]中已有详细介绍,因此我们只评估用Intel TSX实现的异常抑制。与异常处理相比,Intel TSX不需要操作系统支持,因为它是一个指令集扩展。但是,Intel TSX是一个相当新的扩展,因此只能在较新的Intel CPU(即Broadwell微架构之后)上使用。

Again, we leaked 12MB of kernel memory to measure the performance. With exception suppression, we achieved average reading speeds of 503KB/s. Moreover, the error rate of 0.02% with exception suppression is even lower than with exception handling. Thus, the channel capacity we achieve with exception suppression is 502KB/s.

再次,我们泄漏了共12MB的内核内存来衡量性能。通过异常抑制,平均读取速度达503KB / s。此外,异常抑制的读取错误率为0.02%,比异常处理的错误率更低。因此,我们通过异常抑制实现的信道容量是502KB / s。

 

6.3 Meltdown in Practice


6.3 熔断的实践

Listing 3 shows a memory dump using Meltdown on an Intel Core i7-6700K running Ubuntu 16.10 with the Linux kernel 4.8.0. In this example, we can identify HTTP headers of a request to a web server running on the machine. The XX cases represent bytes where the side channel did not yield any results, i.e., no Flush+Reload hit. Additional repetitions of the attack may still be able to read these bytes.

清单3显示了在Ubuntu 16.10和Linux内核4.8.0的Intel Core i7-6700K上利用熔断漏洞进行内存转储。在这个例子中,我们可以识别运行在此机器上的Web服务器的HTTP请求头。XX情况代表那些旁路信道没有产生任何结果的字节,即未被Flush+Reload命中。但额外重复的攻击可能依旧能够读取这些字节。

79cbb80: 6c 4c 48 32 5a 78 66 56 44 73 4b 57 39 34 68 6d |lLH2ZxfVDsKW94hm|
79cbb90: 33 64 2f 41 4d 41 45 44 41 41 41 41 41 51 45 42 |3d/AMAEDAAAAAQEB|
79cbba0: 41 41 41 41 41 41 3d 3d XX XX XX XX XX XX XX XX |AAAAAA==........|
79cbbb0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbbc0: XX XX XX 65 2d 68 65 61 64 XX XX XX XX XX XX XX |...e-head.......|
79cbbd0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbbe0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbbf0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbc00: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbc10: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbc20: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbc30: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbc40: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
79cbc50: XX XX XX XX 0d 0a XX 6f 72 69 67 69 6e 61 6c 2d |.......original-|
79cbc60: 72 65 73 70 6f 6e 73 65 2d 68 65 61 64 65 72 73 |response-headers|
79cbc70: XX 44 61 74 65 3a 20 53 61 74 2c 20 30 39 20 44 |.Date: Sat, 09 D|
79cbc80: 65 63 20 32 30 31 37 20 32 32 3a 32 39 3a 32 35 |ec 2017 22:29:25|
79cbc90: 20 47 4d 54 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 | GMT..Content-Le|
79cbca0: 6e 67 74 68 3a 20 31 0d 0a 43 6f 6e 74 65 6e 74 |ngth: 1..Content|
79cbcb0: 2d 54 79 70 65 3a 20 74 65 78 74 2f 68 74 6d 6c |-Type: text/html|
79cbcc0: 3b 20 63 68 61 72 73 65 74 3d 75 74 66 2d 38 0d |; charset=utf-8.|

Listing 3: Memory dump showing HTTP headers on Ubuntu 16.10 on an Intel Core i7-6700K

清单3:在Intel Core i7-6700K的Ubuntu 16.10上的HTTP 头部的内存转储
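The XX bytes above are positions where the covert channel gave no Flush+Reload hit; as noted, repeating the attack may recover them. A sketch of our own (not from the paper) of how repeated runs could be merged, marking unknown bytes as -1:

```c
#include <stddef.h>

/* Merge a later dump attempt into an accumulated dump: wherever the
 * accumulator still has an unknown byte (-1) and the new run produced a
 * value (>= 0), fill in the gap. */
void merge_dump(int *acc, const int *run, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (acc[i] < 0 && run[i] >= 0)
            acc[i] = run[i]; /* fill a previously unknown byte */
}
```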

Listing 4 shows a memory dump of Firefox 56 using Meltdown on the same machine. We can clearly identify some of the passwords that are stored in the internal password manager shown in Figure 6, i.e., Dolphin18, insta_0203, and secretpwd0. The attack also recovered a URL which appears to be related to a Firefox addon.

清单4显示了在同一台机器上利用熔断对Firefox 56进行的内存转储。我们可以清楚地识别存储在图6所示的内部密码管理器中的一些密码,即Dolphin18、insta_0203和secretpwd0。该攻击还恢复了一个似乎与某个Firefox插件相关的URL。

f94b76f0: 12 XX e0 81 19 XX e0 81 44 6f 6c 70 68 69 6e 31 |........Dolphin1|
f94b7700: 38 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 |8...............|
f94b7710: 70 52 b8 6b 96 7f XX XX XX XX XX XX XX XX XX XX |pR.k............|
f94b7720: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
f94b7730: XX XX XX XX 4a XX XX XX XX XX XX XX XX XX XX XX |....J...........|
f94b7740: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
f94b7750: XX XX XX XX XX XX XX XX XX XX e0 81 69 6e 73 74 |............inst|
f94b7760: 61 5f 30 32 30 33 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 |a_0203..........|
f94b7770: 70 52 18 7d 28 7f XX XX XX XX XX XX XX XX XX XX |pR.}(...........|
f94b7780: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
f94b7790: XX XX XX XX 54 XX XX XX XX XX XX XX XX XX XX XX |....T...........|
f94b77a0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
f94b77b0: XX XX XX XX XX XX XX XX XX XX XX XX 73 65 63 72 |............secr|
f94b77c0: 65 74 70 77 64 30 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 |etpwd0..........|
f94b77d0: 30 b4 18 7d 28 7f XX XX XX XX XX XX XX XX XX XX |0..}(...........|
f94b77e0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
f94b77f0: XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX XX |................|
f94b7800: e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 e5 |................|
f94b7810: 68 74 74 70 73 3a 2f 2f 61 64 64 6f 6e 73 2e 63 |https://addons.c|
f94b7820: 64 6e 2e 6d 6f 7a 69 6c 6c 61 2e 6e 65 74 2f 75 |dn.mozilla.net/u|
f94b7830: 73 65 72 2d 6d 65 64 69 61 2f 61 64 64 6f 6e 5f |ser-media/addon_|

Listing 4: Memory dump of Firefox 56 on Ubuntu 16.10 on an Intel Core i7-6700K disclosing saved passwords (cf. Figure 6).

清单4:在Intel Core i7-6700K的Ubuntu 16.10上的Firefox 56的内存转储泄露了保存的密码(参见图6)。

Figure 6: Firefox 56 password manager showing the stored passwords that are leaked using Meltdown in Listing 4.
图6:Firefox 56密码管理器,上图的存储密码在清单4中被熔断攻击泄露。

6.4 Limitations on ARM and AMD


6.4 ARM和AMD的限制

We also tried to reproduce the Meltdown bug on several ARM and AMD CPUs. However, we did not manage to successfully leak kernel memory with the attack described in Section 5, neither on ARM nor on AMD. The reasons for this can be manifold. First of all, our implementation might simply be too slow and a more optimized version might succeed. For instance, a more shallow out-of-order execution pipeline could tip the race condition against the data leakage. Similarly, if the processor lacks certain features, e.g., no re-order buffer, our current implementation might not be able to leak data. However, for both ARM and AMD, the toy example as described in Section 3 works reliably, indicating that out-of-order execution generally occurs and instructions past illegal memory accesses are also performed.

我们也尝试在几个ARM和AMD CPU上重现熔断bug。但是,无论在ARM还是AMD上,我们都未能利用第5节中描述的攻击成功泄漏内核内存。原因可能是多方面的。首先,可能只是因为我们的实现太慢,更优化的版本或许会成功。例如,更浅的乱序执行流水线可能使竞争条件不利于数据泄漏。类似地,如果处理器缺乏某些特性,例如没有重排序缓冲区,我们当前的实现可能无法泄漏数据。然而,对于ARM和AMD,第3节中描述的玩具示例都能可靠地工作,这表明乱序执行通常会发生,并且越过非法内存访问之后的指令也会被执行。

 

7 Countermeasures


7 对策

In this section, we discuss countermeasures against the Meltdown attack. At first, as the issue is rooted in the hardware itself, we want to discuss possible microcode updates and general changes in the hardware design.

在本节中,我们将讨论抵御熔断攻击的对策。首先,由于问题根源于硬件本身,所以我们想讨论可能的微码更新和硬件设计中的一般变化。

 

7.1 Hardware


7.1 硬件

Meltdown bypasses the hardware-enforced isolation of security domains. There is no software vulnerability involved in Meltdown. Hence any software patch (e.g., KAISER [8]) will leave small amounts of memory exposed (cf. Section 7.2). There is no documentation on whether such a fix requires the development of completely new hardware, or can be fixed using a microcode update.

熔断绕过了硬件强制的安全域隔离,并且不涉及任何软件漏洞。因此,任何软件补丁(例如KAISER[8])都会留下少量内存暴露在外(参见7.2节)。目前没有文档说明这样的修复是需要开发全新的硬件,还是可以通过微码更新来解决。

As Meltdown exploits out-of-order execution, a trivial countermeasure would be to completely disable out-of-order execution. However, the performance impacts would be devastating, as the parallelism of modern CPUs could not be leveraged anymore. Thus, this is not a viable solution.

由于熔断利用乱序执行,一个简单直接的对策是完全禁用乱序执行。然而,这样带来的性能影响将是毁灭性的,因为现代CPU的并行性将无法再被利用。因此,这不是一个可行的解决方案。

Meltdown is some form of race condition between the fetch of a memory address and the corresponding permission check for this address. Serializing the permission check and the register fetch can prevent Meltdown, as the memory address is never fetched if the permission check fails. However, this involves a significant overhead to every memory fetch, as the memory fetch has to stall until the permission check is completed.

熔断是取指定内存地址的数据与对该地址进行相应权限检查之间的某种竞争条件。将权限检查与寄存器取数串行化可以防止熔断,因为一旦权限检查失败,该内存地址的数据就永远不会被取出。但是,这会给每次内存读取带来显著开销,因为内存读取必须停顿,直到权限检查完成。

A more realistic solution would be to introduce a hard split of user space and kernel space. This could be enabled optionally by modern kernels using a new hard-split bit in a CPU control register, e.g., CR4. If the hard-split bit is set, the kernel has to reside in the upper half of the address space, and the user space has to reside in the lower half of the address space. With this hard split, a memory fetch can immediately identify whether such a fetch of the destination would violate a security boundary, as the privilege level can be directly derived from the virtual address without any further lookups. We expect the performance impacts of such a solution to be minimal. Furthermore, the backwards compatibility is ensured, since the hard-split bit is not set by default and the kernel only sets it if it supports the hard-split feature.

更现实的解决方案是引入用户空间和内核空间的硬分割。现代内核可以通过CPU控制寄存器(例如CR4)中新增的一个硬分割位来选择性地启用它。如果硬分割位被设置,内核必须位于地址空间的上半部分,用户空间必须位于地址空间的下半部分。通过这种硬分割,一次内存读取可以立即识别其目标是否会违反安全边界,因为特权级别可以直接从虚拟地址推导出来,而无需进一步查找。我们预计这种解决方案对性能的影响是最小的。此外,向后兼容性也得到保证,因为硬分割位默认不被设置,内核只有在支持硬分割特性时才会设置它。
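The privilege check implied by the hard-split proposal can be sketched in one line; this is our illustration of the idea, not a documented mechanism. With the kernel confined to the upper half of the canonical address space and user space to the lower half, the privilege domain follows directly from the top bit of the virtual address, with no table lookup:

```c
#include <stdint.h>

/* Under the proposed hard split, bit 63 of a virtual address alone
 * identifies the privilege domain: 1 = upper (kernel) half,
 * 0 = lower (user) half. */
int hard_split_is_kernel(uint64_t vaddr) {
    return (int)((vaddr >> 63) & 1);
}
```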

Note that these countermeasures only prevent Meltdown, and not the class of Spectre attacks described by Kocher et al. [19]. Likewise, several countermeasures presented by Kocher et al. [19] have no effect on Meltdown. We stress that it is important to deploy countermeasures against both attacks.

请注意,这些对策只能阻止熔断,而不能抵御Kocher等人[19]描述的一类幽灵攻击。同样,Kocher等人[19]提出的一些对策对熔断无效。我们强调,针对这两种攻击都部署对策是非常重要的。

 

7.2 KAISER


7.2 KAISER

As hardware is not as easy to patch, there is a need for software workarounds until new hardware can be deployed. Gruss et al. [8] proposed KAISER, a kernel modification to not have the kernel mapped in the user space. This modification was intended to prevent side-channel attacks breaking KASLR [13, 9, 17]. However, it also prevents Meltdown, as it ensures that there is no valid mapping to kernel space or physical memory available in user space. KAISER will be available in the upcoming releases of the Linux kernel under the name kernel page-table isolation (KPTI) [25]. The patch will also be backported to older Linux kernel versions. A similar patch was also introduced in Microsoft Windows 10 Build 17035 [15]. Also, Mac OS X and iOS have similar features [22].

由于硬件不易修补,因此在新硬件部署之前需要软件层面的变通方案。Gruss等人[8]提出了KAISER,这是一种使内核不再映射到用户空间的内核修改。这种修改原本是为了防止破解KASLR的旁路攻击[13, 9, 17]。但它也可以防止熔断,因为它确保用户空间中不存在指向内核空间或物理内存的有效映射。KAISER将以内核页表隔离(KPTI)[25]之名出现在即将发布的Linux内核版本中。该补丁也将被移植到较旧的Linux内核版本。微软Windows 10 Build 17035[15]中也引入了类似的补丁。另外,Mac OS X和iOS也有类似的特性[22]。

Although KAISER provides basic protection against Meltdown, it still has some limitations. Due to the design of the x86 architecture, several privileged memory locations are required to be mapped in user space [8]. This leaves a residual attack surface for Meltdown, i.e., these memory locations can still be read from user space. Even though these memory locations do not contain any secrets, such as credentials, they might still contain pointers. Leaking one pointer can be enough to again break KASLR, as the randomization can be calculated from the pointer value.

尽管KAISER提供了针对熔断的基本保护,但它仍有一些限制。由于x86架构的设计,数个特权内存位置必须被映射到用户空间[8]。这为熔断留下了残余攻击面,即这些内存位置仍然可以从用户空间读取。即使这些内存位置不包含任何机密(如凭据),它们仍可能包含指针。泄漏一个指针就足以再次破解KASLR,因为随机化偏移可以从指针值计算出来。
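Why a single leaked pointer defeats KASLR can be made concrete: if the non-randomized address of the pointed-to object is known, the randomization offset is just the difference, and every other kernel address can then be rebased. The addresses below are illustrative values of our own:

```c
#include <stdint.h>

/* Recover the KASLR slide from one leaked kernel pointer, given the
 * known non-randomized (default) address of the same object. */
uint64_t kaslr_offset(uint64_t leaked_ptr, uint64_t default_addr) {
    return leaked_ptr - default_addr;
}
```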

Still, KAISER is the best short-time solution currently available and should therefore be deployed on all systems immediately. Even with Meltdown, KAISER can avoid having any kernel pointers on memory locations that are mapped in the user space which would leak information about the randomized offsets. This would require trampoline locations for every kernel pointer, i.e., the interrupt handler would not call into kernel code directly, but through a trampoline function. The trampoline function must only be mapped in the kernel. It must be randomized with a different offset than the remaining kernel. Consequently, an attacker can only leak pointers to the trampoline code, but not the randomized offsets of the remaining kernel. Such trampoline code is required for every kernel memory that still has to be mapped in user space and contains kernel addresses. This approach is a trade-off between performance and security which has to be assessed in future work.

不过,KAISER仍是目前可用的最佳短期解决方案,因此应立即部署在所有系统上。即使面对熔断,KAISER也可以避免在映射到用户空间的内存位置上存放任何会泄露随机化偏移信息的内核指针。这需要为每个内核指针设置跳板位置,即中断处理程序不直接调用内核代码,而是通过跳板函数调用。跳板函数必须只映射在内核中,并且必须使用与内核其余部分不同的偏移进行随机化。因此,攻击者只能泄漏指向跳板代码的指针,而不能泄漏内核其余部分的随机化偏移。任何仍需映射到用户空间且包含内核地址的内核内存都需要这样的跳板代码。这种方法是性能和安全性之间的折衷,需要在未来的工作中加以评估。

 

8 Discussion


8 讨论

Meltdown fundamentally changes our perspective on the security of hardware optimizations that manipulate the state of microarchitectural elements. The fact that hardware optimizations can change the state of microarchitectural elements, and thereby imperil secure software implementations, has been known for more than 20 years [20]. Both industry and the scientific community so far accepted this as a necessary evil for efficient computing. Today it is considered a bug when a cryptographic algorithm is not protected against the microarchitectural leakage introduced by the hardware optimizations. Meltdown changes the situation entirely. Meltdown shifts the granularity from a comparably low spatial and temporal granularity, e.g., 64 bytes every few hundred cycles for cache attacks, to an arbitrary granularity, allowing an attacker to read every single bit. This is nothing any (cryptographic) algorithm can protect itself against. KAISER is a short-term software fix, but the problem we uncovered is much more significant.

熔断从根本上改变了我们对操纵微架构元件状态的硬件优化之安全性的看法。硬件优化可以改变微架构元件的状态,从而危及安全的软件实现,这一事实早在20多年前就已为人所知[20]。迄今为止,工业界和科学界都将其视为高效计算所必需的代价。如今,如果一个密码算法没有针对硬件优化引入的微架构泄漏进行防护,就会被视为缺陷。熔断彻底改变了这种状况:它将泄漏粒度从相对较低的空间和时间粒度(例如,缓存攻击每几百个周期泄漏64字节)提升为任意粒度,允许攻击者读取每一个比特。这是任何(密码)算法都无法自我防护的。KAISER是一个短期的软件修复,但我们揭示的问题要深远得多。

We expect several more performance optimizations in modern CPUs which affect the microarchitectural state in some way, not even necessarily through the cache. Thus, hardware which is designed to provide certain security guarantees, e.g., CPUs running untrusted code, require a redesign to avoid Meltdown- and Spectre-like attacks. Meltdown also shows that even error-free software, which is explicitly written to thwart side-channel attacks, is not secure if the design of the underlying hardware is not taken into account.

我们预计现代CPU中还会有更多以某种方式影响微架构状态的性能优化,甚至不一定通过缓存。因此,旨在提供某些安全保证的硬件(例如运行不可信代码的CPU)需要重新设计,以避免类似熔断和幽灵的攻击。熔断还表明,即使是专门为抵御旁路攻击而编写的无错误软件,如果不考虑底层硬件的设计,也并不安全。

With the integration of KAISER into all major operating systems, an important step has already been done to prevent Meltdown. KAISER is also the first step of a paradigm change in operating systems. Instead of always mapping everything into the address space, mapping only the minimally required memory locations appears to be a first step in reducing the attack surface. However, it might not be enough, and an even stronger isolation may be required. In this case, we can trade flexibility for performance and security, by e.g., forcing a certain virtual memory layout for every operating system. As most modern operating system already use basically the same memory layout, this might be a promising approach.

通过将KAISER集成到所有主要操作系统中,防止熔断已经迈出了重要的一步。KAISER也是操作系统范式转变的第一步。不再总是将所有内容映射到地址空间,而是只映射最少的必需内存位置,这似乎是减少攻击面的第一步。但这可能还不够,也许需要更强的隔离。在这种情况下,我们可以用灵活性换取性能和安全性,例如为每个操作系统强制规定某种虚拟内存布局。由于大多数现代操作系统已经使用基本相同的内存布局,这可能是一个很有前途的方法。

Meltdown also heavily affects cloud providers, especially if the guests are not fully virtualized. For performance reasons, many hosting or cloud providers do not have an abstraction layer for virtual memory. In such environments, which typically use containers, such as Docker or OpenVZ, the kernel is shared among all guests. Thus, the isolation between guests can simply be circumvented with Meltdown, fully exposing the data of all other guests on the same host. For these providers, changing their infrastructure to full virtualization or using software workarounds such as KAISER would both increase the costs significantly.

熔断也严重影响云服务提供商,特别是在客户机没有完全虚拟化的情况下。出于性能原因,许多托管服务或云提供商没有虚拟内存的抽象层。在这种通常使用容器(如Docker或OpenVZ)的环境中,内核在所有客户机之间共享。因此,客户机之间的隔离可以简单地通过熔断来绕过,从而完全暴露同一主机上所有其他客户机的数据。对于这些提供商而言,无论是将其基础架构改为完全虚拟化,还是使用KAISER等软件变通方案,都会显著增加成本。

Even if Meltdown is fixed, Spectre [19] will remain an issue. Spectre [19] and Meltdown need different defenses. Specifically mitigating only one of them will leave the security of the entire system at risk. We expect that Meltdown and Spectre open a new field of research to investigate in what extent performance optimizations change the microarchitectural state, how this state can be translated into an architectural state, and how such attacks can be prevented.

即使熔断被修复,幽灵[19]仍将是一个问题。幽灵[19]和熔断需要不同的防御措施。只缓解其中一种攻击,仍会使整个系统的安全性面临风险。我们预计熔断和幽灵将开辟一个新的研究领域:研究性能优化在多大程度上改变微架构状态,这种状态如何被转化为体系结构状态,以及如何防止此类攻击。

 

9 Conclusion


9 总结

In this paper, we presented Meltdown, a novel software-based side-channel attack exploiting out-of-order execution on modern processors to read arbitrary kernel- and physical-memory locations from an unprivileged user space program. Without requiring any software vulnerability and independent of the operating system, Meltdown enables an adversary to read sensitive data of other processes or virtual machines in the cloud with up to 503KB/s, affecting millions of devices. We showed that the countermeasure KAISER [8], originally proposed to protect from side-channel attacks against KASLR, inadvertently impedes Meltdown as well. We stress that KAISER needs to be deployed on every operating system as a short-term workaround, until Meltdown is fixed in hardware, to prevent large-scale exploitation of Meltdown.

在本文中,我们介绍了熔断,这是一种新型的基于软件的旁路攻击,它利用现代处理器上的乱序执行,从非特权用户空间程序中读取任意内核和物理内存位置。熔断不需要任何软件漏洞,且独立于操作系统,使攻击者能够以高达503KB/s的速度读取其他进程或云虚拟机的敏感数据,影响数百万设备。我们表明,最初为防御针对KASLR的旁路攻击而提出的对策KAISER[8],也在无意中阻碍了熔断。我们强调,在熔断于硬件层面被修复之前,需要在每个操作系统上部署KAISER作为短期变通方案,以防止熔断被大规模利用。

 

Acknowledgment


致谢

We would like to thank Anders Fogh for fruitful discussions at BlackHat USA 2016 and BlackHat Europe 2016, which ultimately led to the discovery of Meltdown. Fogh [5] already suspected that it might be possible to abuse speculative execution in order to read kernel memory in user mode but his experiments were not successful. We would also like to thank Jann Horn for comments on an early draft. Jann disclosed the issue to Intel in June. The subsequent activity around the KAISER patch was the reason we started investigating this issue. Furthermore, we would like to thank Intel, ARM, Qualcomm, and Microsoft for feedback on an early draft.

我们要感谢Anders Fogh在BlackHat USA 2016和BlackHat Europe 2016上富有成效的讨论,这最终促成了熔断的发现。Fogh[5]此前已经猜想,可能可以滥用推测执行以在用户模式下读取内核内存,但他的实验没有成功。我们还要感谢Jann Horn对早期草稿的评论。Jann在六月向英特尔披露了这个问题,随后围绕KAISER补丁的活动促使我们开始调查这个问题。此外,我们还要感谢英特尔、ARM、高通和微软对早期草稿的反馈。

We would also like to thank Intel for awarding us with a bug bounty for the responsible disclosure process, and their professional handling of this issue through communicating a clear timeline and connecting all involved researchers. Furthermore, we would also thank ARM for their fast response upon disclosing the issue.

我们还要感谢英特尔为负责任的披露流程授予我们漏洞赏金,以及他们在处理此问题上的专业态度:沟通明确的时间表并联系所有相关研究人员。此外,我们也要感谢ARM在问题披露后的快速响应。

This work was supported in part by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 681402).

这项工作得到了欧洲研究理事会(ERC)在欧盟“地平线2020年”研究和创新计划(赠款协议No 681402)下的部分支持。

 

References


引用

  • [1] BENGER, N., VAN DE POL, J., SMART, N. P., AND YAROM, Y. “Ooh Aah... Just a Little Bit”: A small amount of side channel can go a long way. In CHES’14 (2014).
  • [2] CHENG, C.-C. The schemes and performances of dynamic branch predictors. Berkeley Wireless Research Center, Tech. Rep (2000).
  • [3] DEVIES, A. M. AMD Takes Computing to a New Horizon with Ryzen™ Processors, 2016.
  • [4] EDGE, J. Kernel address space layout randomization, 2013.
  • [5] FOGH, A. Negative Result: Reading Kernel Memory From User Mode, 2017.
  • [6] GRAS, B., RAZAVI, K., BOSMAN, E., BOS, H., AND GIUFFRIDA, C. ASLR on the Line: Practical Cache Attacks on the MMU. In NDSS (2017).
  • [7] GRUSS, D., LETTNER, J., SCHUSTER, F., OHRIMENKO, O., HALLER, I., AND COSTA, M. Strong and Efficient Cache Side- Channel Protection using Hardware Transactional Memory. In USENIX Security Symposium (2017).
  • [8] GRUSS, D., LIPP, M., SCHWARZ, M., FELLNER, R., MAURICE, C., AND MANGARD, S. KASLR is Dead: Long Live KASLR. In International Symposium on Engineering Secure Software and Systems (2017), Springer, pp. 161–176.
  • [9] GRUSS, D., MAURICE, C., FOGH, A., LIPP, M., AND MANGARD, S. Prefetch Side-Channel Attacks: Bypassing SMAP and Kernel ASLR. In CCS (2016).
  • [10] GRUSS, D., MAURICE, C., WAGNER, K., AND MANGARD, S. Flush+Flush: A Fast and Stealthy Cache Attack. In DIMVA (2016).
  • [11] GRUSS, D., SPREITZER, R., AND MANGARD, S. Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches. In USENIX Security Symposium (2015).
  • [12] HENNESSY, J. L., AND PATTERSON, D. A. Computer architecture: a quantitative approach. Elsevier, 2011.
  • [13] HUND, R., WILLEMS, C., AND HOLZ, T. Practical Timing Side Channel Attacks against Kernel Space ASLR. In S&P (2013).
  • [14] INTEL. Intel® 64 and IA-32 Architectures Optimization Reference Manual, 2014.
  • [15] IONESCU, A. Windows 17035 Kernel ASLR/VA Isolation In Practice (like Linux KAISER)., 2017.
  • [16] IRAZOQUI, G., INCI, M. S., EISENBARTH, T., AND SUNAR, B. Wait a minute! A fast, Cross-VM attack on AES. In RAID’14 (2014).
  • [17] JANG, Y., LEE, S., AND KIM, T. Breaking Kernel Address Space Layout Randomization with Intel TSX. In CCS (2016).
  • [18] JIMÉNEZ, D. A., AND LIN, C. Dynamic branch prediction with perceptrons. In High-Performance Computer Architecture, 2001. HPCA. The Seventh International Symposium on (2001), IEEE, pp. 197–206.
  • [19] KOCHER, P., GENKIN, D., GRUSS, D., HAAS, W., HAMBURG, M., LIPP, M., MANGARD, S., PRESCHER, T., SCHWARZ, M., AND YAROM, Y. Spectre Attacks: Exploiting Speculative Execution.
  • [20] KOCHER, P. C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In CRYPTO (1996).
  • [21] LEE, B., MALISHEVSKY, A., BECK, D., SCHMID, A., AND LANDRY, E. Dynamic branch prediction. Oregon State University.
  • [22] LEVIN, J. Mac OS X and IOS Internals: To the Apple’s Core. John Wiley & Sons, 2012.
  • [23] LIPP, M., GRUSS, D., SPREITZER, R., MAURICE, C., AND MANGARD, S. ARMageddon: Cache Attacks on Mobile Devices. In USENIX Security Symposium (2016).
  • [24] LIU, F., YAROM, Y., GE, Q., HEISER, G., AND LEE, R. B. Last-Level Cache Side-Channel Attacks are Practical. In IEEE Symposium on Security and Privacy – SP (2015), IEEE Computer Society, pp. 605–622.
  • [25] LWN. The current state of kernel page-table isolation, Dec. 2017.
  • [26] MAURICE, C., WEBER, M., SCHWARZ, M., GINER, L., GRUSS, D., ALBERTO BOANO, C., MANGARD, S., AND RÖMER, K. Hello from the Other Side: SSH over Robust Cache Covert Channels in the Cloud. In NDSS (2017).
  • [27] MOLNAR, I. x86: Enable KASLR by default, 2017.
  • [28] OSVIK, D. A., SHAMIR, A., AND TROMER, E. Cache Attacks and Countermeasures: the Case of AES. In CT-RSA (2006).
  • [29] PERCIVAL, C. Cache missing for fun and profit. In Proceedings of BSDCan (2005).
  • [30] PHORONIX. Linux 4.12 To Enable KASLR By Default, 2017.
  • [31] SCHWARZ, M., LIPP, M., GRUSS, D., WEISER, S., MAURICE, C., SPREITZER, R., AND MANGARD, S. KeyDrown: Eliminating Software-Based Keystroke Timing Side-Channel Attacks. In NDSS’18 (2018).
  • [32] TERAN, E., WANG, Z., AND JIMÉNEZ, D. A. Perceptron learning for reuse prediction. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on (2016), IEEE, pp. 1–12.
  • [33] TOMASULO, R. M. An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development 11, 1 (1967), 25–33.
  • [34] VINTAN, L. N., AND IRIDON, M. Towards a high performance neural branch predictor. In Neural Networks, 1999. IJCNN’99. International Joint Conference on (1999), vol. 2, IEEE, pp. 868–873.
  • [35] YAROM, Y., AND FALKNER, K. Flush+Reload: a High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security Symposium (2014).
  • [36] YEH, T.-Y., AND PATT, Y. N. Two-level adaptive training branch prediction. In Proceedings of the 24th annual international symposium on Microarchitecture (1991), ACM, pp. 51–61.
  • [37] ZHANG, Y., JUELS, A., REITER, M. K., AND RISTENPART, T. Cross-Tenant Side-Channel Attacks in PaaS Clouds. In CCS’14 (2014).

 

英文原文

（原文截图）
