MTE - Heap Memory Detection Principles

Youth cowboy

Last modified: 2024-07-22 09:09:22


Series: stability; tags: android, linux

First published: 2024-01-18 14:07:04

Original link: https://blog.csdn.net/youthcowboy/article/details/135672700


0. Basic Idea

1. How Scudo Allocates Memory

2. MTE in the Primary Allocator

2.1 Generating Tags

2.2 How UAF Is Detected

2.3 The Odd/Even Mask

3. MTE in the Secondary Allocator

4. Allocation Sizes That Are Not a Multiple of 16 Bytes

5. Summary

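MTE heap memory detection rests on the same hardware (instruction-level) foundation as MTE stack memory detection.

The difference is that for stack detection, the user-space side is implemented by the compiler (LLVM).

For heap detection, the user-space side is implemented by the memory allocator (Scudo, one of LLVM's runtime libraries).

(Of course, an application that uses a different allocator could implement heap checking itself by issuing MTE instructions directly or by calling wrapped MTE APIs, but that is cumbersome and rarely done.)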

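0. Basic Idea

After Scudo allocates memory, it uses MTE instructions to give the newly allocated memory a random tag, and returns a pointer carrying that tag to the program.

After Scudo frees memory, it again uses MTE instructions to give the freed memory a new random tag.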
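The basic idea can be illustrated with a small software model. This is a sketch under stated assumptions, not Scudo code: the 4-bit pointer tag sits in bits 56 to 59, memory tags are kept per 16-byte granule in a `std::map`, and `ToyTagMemory`, `toyAllocate` and `toyFree` are hypothetical names. Allocation tags the memory and returns a matching tagged pointer; free retags the memory so the old pointer stops matching.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>

// Toy software model of MTE, for illustration only (no real MTE instructions).
// Assumptions: the pointer tag lives in bits 56..59, and every 16-byte granule
// of "memory" carries a 4-bit memory tag, stored in a map (absent = tag 0).
constexpr uint64_t kGranule = 16;
constexpr int kTagShift = 56;

uint64_t tagOf(uint64_t p) { return (p >> kTagShift) & 0xF; }
uint64_t untag(uint64_t p) { return p & ~(0xFULL << kTagShift); }
uint64_t withTag(uint64_t p, uint64_t tag) { return untag(p) | (tag << kTagShift); }

struct ToyTagMemory {
  std::map<uint64_t, uint64_t> tags;  // granule index -> memory tag

  // Give every granule overlapping [addr, addr + size) the same tag.
  void setTags(uint64_t addr, uint64_t size, uint64_t tag) {
    assert(addr % kGranule == 0 && tag < 16);
    for (uint64_t a = addr; a < addr + size; a += kGranule) tags[a / kGranule] = tag;
  }

  // An access passes only if the pointer tag equals the granule's memory tag.
  bool accessOk(uint64_t p) const {
    auto it = tags.find(untag(p) / kGranule);
    uint64_t memTag = (it == tags.end()) ? 0 : it->second;
    return memTag == tagOf(p);
  }
};

// Random tag in 1..15 (0 is never used), different from `avoid`.
uint64_t randomTag(std::mt19937 &rng, uint64_t avoid) {
  std::uniform_int_distribution<uint64_t> dist(1, 15);
  uint64_t t;
  do { t = dist(rng); } while (t == avoid);
  return t;
}

// Allocate: tag the memory and return a pointer carrying the same tag.
uint64_t toyAllocate(ToyTagMemory &mem, std::mt19937 &rng, uint64_t addr, uint64_t size) {
  uint64_t tag = randomTag(rng, 0);
  mem.setTags(addr, size, tag);
  return withTag(addr, tag);
}

// Free: retag with a fresh tag so the old pointer no longer matches.
void toyFree(ToyTagMemory &mem, std::mt19937 &rng, uint64_t p, uint64_t size) {
  mem.setTags(untag(p), size, randomTag(rng, tagOf(p)));
}
```

After `toyFree`, any access through the old pointer fails `accessOk`; on real MTE hardware that mismatch is reported as SIGSEGV.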

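1. How Scudo Allocates Memory

Scudo has two ways of allocating memory:

Primary allocation: used for small allocations; this is the path taken most often
Secondary allocation: used for large allocations (>256K); taken comparatively rarely

Primary allocation carves a fixed-size memory chunk, called a Block, out of an existing VMA (Block here is a Scudo concept, not a Linux storage block). A Block consists of a header area and a content area; only the content area is usable by the application.

Secondary allocation maps a new VMA that contains two header areas and a content area, plus one guard page at each end.

Every heap allocation in Scudo eventually goes through the Allocator::allocate method.

Once the memory has been obtained, the following code tags it (MTE):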

// Excerpt from Allocator::allocate (MTE enabled); "..." marks omitted code
...
if (ClassId) {
  // Primary Allocator (non-zero ClassId): tag the Content granule by granule
  const uptr OddEvenMask =
      computeOddEvenMaskForPointerMaybe(Options, BlockUptr, ClassId);
  TaggedPtr = prepareTaggedChunk(Ptr, Size, OddEvenMask, BlockEnd);
  storePrimaryAllocationStackMaybe(Options, Ptr);
} else {
  // Secondary Allocator: tag only the header area [Block, Ptr)
  storeTags(reinterpret_cast<uptr>(Block), reinterpret_cast<uptr>(Ptr));
  storeSecondaryAllocationStackMaybe(Options, Ptr, Size);
}

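2. MTE in the Primary Allocator

A Block allocated by the Primary Allocator is laid out as follows; note these points:

Blocks of one size class are equal in size and contiguous; each consists of a Header and a Content part
The Header stores metadata about the block, which is checked when the block is freed
The Content is the memory the program actually requested and can use; Ptr is the pointer returned to the user
Both Block and Content are 16-byte aligned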
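The layout rules above can be expressed as address arithmetic. A minimal sketch, assuming blocks of one size class are packed back to back from a region base, a 16-byte chunk-header granule directly precedes Content, and the default 16-byte alignment; `layoutOf` and `ToyBlockLayout` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of Primary Block addressing under the assumptions stated above.
constexpr uint64_t kHeaderSize = 16;  // assumed chunk-header granule size
constexpr uint64_t kAlign = 16;       // assumed default alignment

struct ToyBlockLayout {
  uint64_t block;     // Block start (header granule)
  uint64_t ptr;       // Content start, the pointer handed to the user
  uint64_t blockEnd;  // Block end; the next Block starts here
};

ToyBlockLayout layoutOf(uint64_t regionBase, uint64_t blockSize, uint64_t index) {
  assert(regionBase % kAlign == 0 && blockSize % kAlign == 0);
  uint64_t block = regionBase + index * blockSize;  // blocks are contiguous
  uint64_t ptr = (block + kHeaderSize + kAlign - 1) / kAlign * kAlign;  // roundUp
  return {block, ptr, block + blockSize};
}
```

For index 3 in a 112-byte size class starting at 0x100000, the Block spans 0x100150 to 0x1001C0 and Content starts 16 bytes in; the next Block begins exactly at this BlockEnd.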

The Primary Allocator tags the Block and the returned address (Ptr) with the following code:

const uptr OddEvenMask =
    computeOddEvenMaskForPointerMaybe(Options, BlockUptr, ClassId);
TaggedPtr = prepareTaggedChunk(Ptr, Size, OddEvenMask, BlockEnd);

2.1 Generating Tags

OddEvenMask restricts the range from which the random tag is chosen.

prepareTaggedChunk tags the Block and the returned address (Ptr):

inline void *prepareTaggedChunk(void *Ptr, uptr Size, uptr ExcludeMask,
                                uptr BlockEnd) {
  // Prepare the granule before the chunk to store the chunk header by setting
  // its tag to 0. Normally its tag will already be 0, but in the case where a
  // chunk holding a low alignment allocation is reused for a higher alignment
  // allocation, the chunk may already have a non-zero tag from the previous
  // allocation.
  __asm__ __volatile__(".arch_extension memtag; stg %0, [%0, #-16]"
                       :
                       : "r"(Ptr)
                       : "memory");

  uptr TaggedBegin, TaggedEnd;
  setRandomTag(Ptr, Size, ExcludeMask, &TaggedBegin, &TaggedEnd);

  // Finally, set the tag of the granule past the end of the allocation to 0,
  // to catch linear overflows even if a previous larger allocation used the
  // same block and tag. Only do this if the granule past the end is in our
  // block, because this would otherwise lead to a SEGV if the allocation
  // covers the entire block and our block is at the end of a mapping. The tag
  // of the next block's header granule will be set to 0, so it will serve the
  // purpose of catching linear overflows in this case.
  uptr UntaggedEnd = untagPointer(TaggedEnd);
  if (UntaggedEnd != BlockEnd)
    __asm__ __volatile__(".arch_extension memtag; stg %0, [%0]"
                         :
                         : "r"(UntaggedEnd)
                         : "memory");
  return reinterpret_cast<void *>(TaggedBegin);
}

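Because it must issue instructions such as stg directly, prepareTaggedChunk embeds inline assembly (so does setRandomTag, which it calls).

The four parameters of prepareTaggedChunk are:

Ptr: the start address of the chunk (the Content). It has no tag yet, so its bits 56 to 59 (the 4-bit tag field in the top byte) are all 0.
Size: the size requested by the program.
ExcludeMask: a mask that restricts which random tags may be chosen. It is a 16-bit binary value, and a tag is allowed only if (ExcludeMask >> tag) & 0x1 is 0, i.e. bit number tag of ExcludeMask must be 0. The idea: an MTE tag is 4 bits, so a random tag has 16 possible values, 0 to 15. Android does not use 0 by default, which leaves 15; ExcludeMask removes further values from those 15. For example, if ExcludeMask is 0x6 (0b0110), bits 1 and 2 are set, so the random tag will be neither 1 nor 2. The caller passes OddEvenMask for this parameter.
BlockEnd: the end address of the block. Because Scudo stores blocks of fixed size classes, the block can be considerably larger than the requested size.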
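The ExcludeMask rule can be checked with a short sketch. `chooseTag` is a hypothetical helper that models the rule in software rather than executing IRG; as in the text, tag 0 is always excluded, and the function returns 0 if nothing is left (a toy fallback, not a claim about hardware behavior).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// A tag is allowed only if bit `tag` of ExcludeMask is 0; tag 0 is always excluded.
uint32_t chooseTag(uint32_t excludeMask, std::mt19937 &rng) {
  assert(excludeMask <= 0xFFFF);  // one bit per possible 4-bit tag
  excludeMask |= 0x1;             // never hand out tag 0
  std::vector<uint32_t> allowed;
  for (uint32_t tag = 0; tag < 16; ++tag)
    if (((excludeMask >> tag) & 0x1) == 0) allowed.push_back(tag);
  if (allowed.empty()) return 0;  // toy fallback when every tag is excluded
  std::uniform_int_distribution<std::size_t> pick(0, allowed.size() - 1);
  return allowed[pick(rng)];
}
```

With ExcludeMask = 0x6 the result is always in 3..15; with 0xaaaa it is a non-zero even tag, and with 0x5555 it is odd.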

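setRandomTag, called from prepareTaggedChunk, walks the Content in 16-byte granules and tags each one in turn. The end result: the header granule before Ptr has tag 0, every Content granule has the random tag, and the first granule after the Content has tag 0 (if it is still inside the block).

Once the tags are in place, an out-of-bounds access causes a tag mismatch and raises SIGSEGV.

Note:

Only the first 16 bytes of the Unused memory (between the end of the Content and BlockEnd) are tagged. A linear overflow is therefore detected 100% of the time, but a non-linear overflow that jumps past that granule may not be.
The rest of the Unused memory is not retagged to 0 for performance reasons; the price is that some jumping overflows can be missed.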
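The tagging pattern and both notes above can be reproduced with a toy model of prepareTaggedChunk. This is a sketch, assuming untagged integer addresses, a per-granule `std::map` of memory tags, and a tag already chosen by the caller; `toyPrepareTaggedChunk` and `accessOk` are hypothetical. It tags the header granule 0, the Content granules with the chosen tag, and only the first Unused granule 0.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

constexpr uint64_t kGranule = 16;
using ToyTagMap = std::map<uint64_t, uint8_t>;  // granule start -> memory tag (absent = 0)

uint64_t roundUp(uint64_t x, uint64_t a) { return (x + a - 1) / a * a; }

void toyPrepareTaggedChunk(ToyTagMap &mem, uint64_t ptr, uint64_t size,
                           uint8_t tag, uint64_t blockEnd) {
  assert(ptr % kGranule == 0 && blockEnd % kGranule == 0);
  mem[ptr - kGranule] = 0;                        // header granule: tag 0
  uint64_t taggedEnd = ptr + roundUp(size, kGranule);
  for (uint64_t g = ptr; g < taggedEnd; g += kGranule)
    mem[g] = tag;                                 // Content, one granule at a time
  if (taggedEnd != blockEnd) mem[taggedEnd] = 0;  // first Unused granule only
}

// Would an access at `addr` through a pointer tagged `ptrTag` pass the check?
bool accessOk(const ToyTagMap &mem, uint64_t addr, uint8_t ptrTag) {
  auto it = mem.find(addr / kGranule * kGranule);
  uint8_t memTag = (it == mem.end()) ? 0 : it->second;
  return memTag == ptrTag;
}
```

For example, with Ptr = 0x1010, Size = 32, tag 5 and BlockEnd = 0x1080, an access at 0x1030 fails (the zeroed granule), while an access at 0x1040 passes if an earlier, larger allocation left tag 5 there: that is the missed jumping overflow.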

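2.2 How UAF Is Detected

The above covers out-of-bounds (OOB) detection. Use-after-free (UAF) detection works as follows:

When a chunk is freed, Scudo's quarantineOrDeallocateChunk method is called.

The freed memory receives a new tag that differs from its previous tag, so an immediate UAF is detected 100% of the time.

A UAF that happens long after the free can be missed, however: by then the memory may have gone through several allocate/free cycles, and because tags are random, the original tag may have come back.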
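The retag-on-free step can be sketched as follows, assuming (consistent with the statement that the new tag differs from the old one) that the previous tag is OR-ed into the exclude mask alongside OddEvenMask. `pickTag` and `retagOnFree` are hypothetical helpers, not Scudo's API.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Random tag whose bit is clear in `exclude`; tag 0 is always excluded.
uint32_t pickTag(uint32_t exclude, std::mt19937 &rng) {
  exclude |= 0x1;
  assert((exclude & 0xFFFF) != 0xFFFF);  // at least one tag must remain
  std::uniform_int_distribution<uint32_t> dist(0, 15);
  uint32_t t;
  do { t = dist(rng); } while ((exclude >> t) & 0x1);  // rejection sampling
  return t;
}

// On free: exclude the previous tag so the freed memory never keeps it.
uint32_t retagOnFree(uint32_t prevTag, uint32_t oddEvenMask, std::mt19937 &rng) {
  return pickTag(oddEvenMask | (1u << prevTag), rng);
}
```

Every call to `retagOnFree` returns a tag different from `prevTag`, so an immediate UAF always mismatches; after a few more free/allocate cycles, though, the original tag can be picked again, which is how a long-delayed UAF slips through.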

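2.3 The Odd/Even Mask

As mentioned above, OddEvenMask restricts which values the random tag can take.

For Blocks that are contiguous in virtual address space, Scudo alternates their OddEvenMask between 0x5555 and 0xaaaa (0x5555 shifted left by 1 is 0xaaaa; 0x5 = 0b0101, 0xa = 0b1010).

The two masks exclude complementary, non-overlapping sets of tags. 0xaaaa has the odd bits set, so it excludes odd tags and the tag must be a non-zero even number; 0x5555 has the even bits set (including bit 0), so it excludes even tags and the tag must be odd.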

uptr computeOddEvenMaskForPointerMaybe(Options Options, uptr Ptr,
                                       uptr ClassId) {
  if (!Options.get(OptionBit::UseOddEvenTags))
    return 0;

  // If a chunk's tag is odd, we want the tags of the surrounding blocks to be
  // even, and vice versa. Blocks are laid out Size bytes apart, and adding
  // Size to Ptr will flip the least significant set bit of Size in Ptr, so
  // that bit will have the pattern 010101... for consecutive blocks, which we
  // can use to determine which tag mask to use.
  return 0x5555U << ((Ptr >> SizeClassMap::getSizeLSBByClassId(ClassId)) & 1);
}

This line works as follows:

0x5555U << ((Ptr >> SizeClassMap::getSizeLSBByClassId(ClassId)) & 1);

First, SizeClassMap::getSizeLSBByClassId(ClassId) returns the index of the least significant set bit of this size class's block size. In plain terms: for a 2K block, 2K = 2^11, so it returns 11.
Then the right shift (Ptr >> SizeClassMap::getSizeLSBByClassId(ClassId)) moves that bit of Ptr to bit 0, and & 1 extracts its value (0 or 1).
Finally, 0x5555U << ((Ptr >> SizeClassMap::getSizeLSBByClassId(ClassId)) & 1) shifts 0x5555U left by 0 or 1 bit, producing the mask 0x5555 or 0xaaaa, which is returned.
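The alternation can be verified with a sketch of the same expression. `sizeLSB` is a hypothetical stand-in for SizeClassMap::getSizeLSBByClassId that takes the block size directly instead of a ClassId, and the UseOddEvenTags option check is dropped.

```cpp
#include <cassert>
#include <cstdint>

// Index of the least significant set bit of a block size.
unsigned sizeLSB(uint64_t blockSize) {
  assert(blockSize != 0);
  unsigned idx = 0;
  while (((blockSize >> idx) & 1) == 0) ++idx;
  return idx;
}

// Same expression as computeOddEvenMaskForPointerMaybe, minus the option check.
uint32_t oddEvenMask(uint64_t blockAddr, uint64_t blockSize) {
  return 0x5555u << ((blockAddr >> sizeLSB(blockSize)) & 1);
}
```

The alternation does not require a power-of-two block size: writing the size as 2^L times an odd number, each step to the next block adds an odd number to Ptr >> L, which flips its lowest bit. For a 48-byte class (L = 4) starting at 0x70000000, consecutive blocks get 0x5555, 0xaaaa, 0x5555, and so on.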

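As a result, two adjacent Blocks never get the same tag, so an overflow into an adjacent block is detected 100% of the time.

However, because each block can choose from only half of the tags, the UAF miss rate (false negatives) actually goes up.

This behavior is selected by calling mallopt with the M_MEMTAG_TUNING option:

mallopt(M_MEMTAG_TUNING, level);
// where level is one of:
//   M_MEMTAG_TUNING_BUFFER_OVERFLOW   (OddEvenMask on; the default)
//   M_MEMTAG_TUNING_UAF               (OddEvenMask off)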
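A guarded call might look like the sketch below. `M_MEMTAG_TUNING` and `M_MEMTAG_TUNING_UAF` are the Android bionic `<malloc.h>` constants named above; the `#if` guard keeps the file compiling on C libraries that do not define them, where `preferUafDetection` (a hypothetical name) simply returns false.

```cpp
#include <cassert>
#include <malloc.h>

// Ask the allocator to favor UAF detection (OddEvenMask off).
// Returns true only if the option exists and mallopt reports success (non-zero).
bool preferUafDetection() {
#if defined(M_MEMTAG_TUNING) && defined(M_MEMTAG_TUNING_UAF)
  return mallopt(M_MEMTAG_TUNING, M_MEMTAG_TUNING_UAF) != 0;
#else
  return false;  // not bionic: the tuning option is unavailable
#endif
}
```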

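3. MTE in the Secondary Allocator

The Secondary Allocator maps a new VMA region with mmap.

As with the Primary Allocator, the Content is where the user's data lives; its end address is page-aligned.

The differences from the Primary Allocator are:

Two headers sit in front of the Content start address Ptr. One is the Chunk Header, which serves the same purpose as in the Primary Allocator; the other is the LargeBlock Header, specific to the Secondary, which mainly stores pointers to the previous and next VMAs (a linked list).
The new VMA has an inaccessible guard page at each end. Guard pages are page-aligned, so there is usually padding between the front guard page and the LargeBlock Header.

The Secondary Allocator also uses a different tagging strategy from the Primary Allocator:

With MTE enabled, the allocator does not tag the Content, so its tag stays at the default value 0.

The Chunk Header is tagged with the fixed value 2.

The LargeBlock Header and the Padding are tagged with the fixed value 1.

This way, overflows in both directions are detected:

A linear overflow runs straight into the guard page at the end of the VMA, which is inaccessible, so SIGSEGV is raised immediately.
A linear underflow that reaches the Chunk Header, LargeBlock Header or Padding hits a non-zero memory tag while the pointer tag is 0, so SIGSEGV is raised; one that reaches the front guard page also raises SIGSEGV.

This strategy is cheaper than tagging the Content granule by granule in a loop.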
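The detection rules above can be captured in a toy model of one Secondary mapping. A sketch, assuming a 4096-byte page, a single 16-byte Chunk Header granule directly before Ptr, and everything between the front guard page and that granule tagged 1; `ToySecondary` and `Access` are hypothetical names and the addresses are made up.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, low to high addresses:
//   [front guard page][padding + LargeBlock Header: tag 1]
//   [16-byte Chunk Header granule: tag 2][Content: tag 0, page-aligned end]
//   [back guard page]
constexpr uint64_t kPage = 4096;

enum class Access { Ok, TagMismatch, GuardPage };

struct ToySecondary {
  uint64_t mapBase;       // start of the front guard page
  uint64_t contentBegin;  // Ptr returned to the user (pointer tag 0)
  uint64_t contentEnd;    // page-aligned; the back guard page starts here

  uint8_t memTag(uint64_t addr) const {
    if (addr >= contentBegin) return 0;       // Content keeps the default tag
    if (addr >= contentBegin - 16) return 2;  // Chunk Header
    return 1;                                 // LargeBlock Header and Padding
  }

  // Access through the user's pointer, whose tag is 0.
  Access access(uint64_t addr) const {
    assert(contentEnd % kPage == 0);
    if (addr < mapBase + kPage || addr >= contentEnd) return Access::GuardPage;
    return memTag(addr) == 0 ? Access::Ok : Access::TagMismatch;
  }
};
```

Accesses from Ptr up to the page-aligned end succeed, the first byte past the end lands in the back guard page, and any byte below Ptr either mismatches (tag 2 or 1) or falls into the front guard page.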

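4. Allocation Sizes That Are Not a Multiple of 16 Bytes

Everything above assumes the allocation size is a multiple of 16 bytes. If the program requests a size that is not, some overflows can go undetected.

For example:

char *p = (char *)malloc(88);
*(p + 89) = 'n';  // out of bounds: valid offsets are 0..87

Scudo MTE cannot detect this error: the 88 bytes are tagged as six 16-byte granules (96 bytes), so offset 89 still lies in the last granule carrying the allocation's tag. Fortunately, Blocks allocated by Scudo are 16-byte aligned, so an overflow like this does not corrupt other valid data.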
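The arithmetic behind the malloc(88) example can be sketched as follows, assuming the allocation is tagged in whole 16-byte granules starting at Ptr, as prepareTaggedChunk does; `taggedSize` and `undetectableOverflow` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kGranule = 16;

// Number of bytes from Ptr that carry the allocation's tag.
uint64_t taggedSize(uint64_t requested) {
  assert(requested > 0);
  return (requested + kGranule - 1) / kGranule * kGranule;
}

// True if p[offset] is out of bounds for malloc(requested) but still inside the
// last tagged granule, so the tags match and MTE cannot report it.
bool undetectableOverflow(uint64_t requested, uint64_t offset) {
  return offset >= requested && offset < taggedSize(requested);
}
```

For malloc(88), offsets 88 to 95 are out of bounds but share the allocation's tag; offset 96 is the first one a linear overflow check catches. A request that is already a multiple of 16 has no such blind spot.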

5. Summary

MTE support in Scudo is used to detect native heap memory errors, mainly OOB (out-of-bounds, covering both underflow and overflow) and UAF (use-after-free).

Scudo also detects double frees, but that detection does not rely on MTE.

Related article: MTE - Stack Memory Detection Principles (CSDN blog)
