1. Background
This document is based on an analysis of Android 12 and Android 14.
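2. Introduction to GWP-ASan
2.1 What is GWP-ASan
Basic purpose: GWP-ASan is a native memory allocator feature that records extra information at allocation time in order to detect five kinds of memory errors: DOUBLE_FREE, USE_AFTER_FREE, INVALID_FREE, BUFFER_OVERFLOW, and BUFFER_UNDERFLOW.
GWP-ASan comes in two variants: Recoverable GWP-ASan (enabled by default for all apps on Android 14) and basic GWP-ASan.
Recoverable GWP-ASan differs from basic GWP-ASan in the following ways:
1) Recoverable GWP-ASan is enabled on only about 1% of app launches, rather than on every app launch.
2) When a heap-use-after-free or heap-buffer-overflow bug is detected, the bug appears in the crash report (tombstone). This crash report is available through the ActivityManager#getHistoricalProcessExitReasons API, the same as with the original GWP-ASan.
3) Rather than exiting after dumping the crash report, Recoverable GWP-ASan allows the memory corruption to occur and the app keeps running. The process may continue as usual, but the app's behavior is no longer specified. Because of the memory corruption, the app may crash at an arbitrary point in the future, or it may continue without any user-visible impact.
4) Recoverable GWP-ASan is disabled after the crash report is dumped. An app can therefore get only one Recoverable GWP-ASan report per launch.
5) If the app installs a custom signal handler, that handler is never called for a SIGSEGV signal that indicates a Recoverable GWP-ASan fault.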
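2.2 Comparison of GWP-ASan with other tools
Similarities: GWP-ASan (informally a recursive acronym for "GWP-ASan Will Provide Allocation SANity") and malloc_debug are both tools for detecting memory errors in a user-specified process. They and the other tools differ as follows:
Tool | Detected error types | How to enable | Implementation principle | Drawbacks
GWP-ASan | use-after-free, double-free, invalid-free (the freed address differs from the allocated address), heap-buffer-overflow, heap-buffer-underflow | App process: see 3.1. Native process: see 3.2. | Uses guard pages: guard pages and freed pages are made inaccessible; see Section 4. | /
malloc_debug | mem-leak, out-of-bounds, use-after-free | Set a property for the process: adb shell setprop libc.debug.malloc.program xxx_process | Uses extra memory to record allocation and deallocation information, then uses that information to detect and catch memory errors. | 1) Uses more memory than GWP-ASan 2) Shipped in a Google APEX module, so it cannot be customized
HWASAN | heap-buffer-overflow, heap-use-after-free, stack-buffer-overflow, global-buffer-overflow, double-free, use-after-return, alloc-dealloc-mismatch, use-after-poison | / | / | 1) Requires recompilation 2) High memory overhead and performance impact, though lower than ASan 3) Depends on ARMv8 support 4) Increases app package size 5) Cannot be deployed at scale, either in production or offline
ASan | heap-buffer-overflow, heap-use-after-free, stack-buffer-overflow, global-buffer-overflow, double-free, use-after-return, alloc-dealloc-mismatch, use-after-poison | / | / | 1) Requires recompilation 2) High memory overhead and performance impact 3) Increases app package size 4) Cannot be deployed at scale, either in production or offline
coredump | heap-buffer-overflow, heap-use-after-free, stack-buffer-overflow, global-buffer-overflow, double-free | / | / | 1) For memory corruption, the crash stack is not the point where the corruption first happened 2) Requires background knowledge of memory management 3) Not available to apps; needs vendor customization and is off by default
Valgrind | heap-buffer-overflow, heap-use-after-free, double-free, out-of-memory | / | / | 1) Heavy performance overhead; cannot be used in production 2) Detection crashes the app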
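2.3 Compatibility of GWP-ASan with other memory allocation tools
GWP-ASan is compatible with heapprofd, malloc_debug, and malloc_hooks only if GWP-ASan is installed first. If one of these other libraries is already installed, GWP-ASan is not enabled.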
3. How to use GWP-ASan
3.1 App processes
To enable GWP-ASan for an app process, set gwpAsanMode in AndroidManifest.xml, as follows:
<!-- AndroidManifest.xml: enable GWP-ASan for the whole application -->
<application android:gwpAsanMode="always">
</application>
<!-- Alternatively, enable or disable GWP-ASan per process: -->
<application>
<processes>
<!-- Create the (empty) application process -->
<process />
<!-- Create subprocesses with GWP-ASan both explicitly enabled and disabled. -->
<process android:process=":gwp_asan_enabled"
android:gwpAsanMode="always" />
<process android:process=":gwp_asan_disabled"
android:gwpAsanMode="never" />
</processes>
<!-- Target services and activities to be run on either the GWP-ASan enabled or disabled processes. -->
<activity android:name="android.gwpasan.GwpAsanEnabledActivity"
android:process=":gwp_asan_enabled" />
<activity android:name="android.gwpasan.GwpAsanDisabledActivity"
android:process=":gwp_asan_disabled" />
<service android:name="android.gwpasan.GwpAsanEnabledService"
android:process=":gwp_asan_enabled" />
<service android:name="android.gwpasan.GwpAsanDisabledService"
android:process=":gwp_asan_disabled" />
</application>
When Zygote forks and specializes the app process, SpecializeCommon in com_android_internal_os_Zygote.cpp calls android_mallopt to enable GWP-ASan (the same mechanism that native processes use):
// com_android_internal_os_Zygote.cpp
static void SpecializeCommon(JNIEnv* env, uid_t uid, gid_t gid, jintArray gids, jint runtime_flags,
jobjectArray rlimits, jlong permitted_capabilities,
jlong effective_capabilities, jint mount_external,
jstring managed_se_info, jstring managed_nice_name,
bool is_system_server, bool is_child_zygote,
jstring managed_instruction_set, jstring managed_app_data_dir,
bool is_top_app, jobjectArray pkg_data_info_list,
jobjectArray allowlisted_data_info_list, bool mount_data_dirs,
bool mount_storage_dirs) {
......
bool forceEnableGwpAsan = false;
switch (runtime_flags & RuntimeFlags::GWP_ASAN_LEVEL_MASK) {
default:
case RuntimeFlags::GWP_ASAN_LEVEL_NEVER:
break;
case RuntimeFlags::GWP_ASAN_LEVEL_ALWAYS:
forceEnableGwpAsan = true;
[[fallthrough]];
case RuntimeFlags::GWP_ASAN_LEVEL_LOTTERY:
android_mallopt(M_INITIALIZE_GWP_ASAN, &forceEnableGwpAsan, sizeof(forceEnableGwpAsan));
}
......
}
3.2 Native processes
There are two ways to enable GWP-ASan for a native process:
1) Turn GWP-ASan on in the main function of the native process
cc_binary {
name: "gwp-asan-test",
srcs: [
"gwpasantest.cpp",
],
shared_libs: [
"liblog",
"libbase",
"libutils",
],
header_libs: [
"bionic_libc_platform_headers",
],
}
#include <stdlib.h>
#include <stdio.h>
#include <bionic/malloc.h>
#include <android/log.h>
#include <log/log.h>
#define LOG_TAG "gwp-asan-test"
void heapUseAfterFree() {
int *ptr1 = (int*)malloc(4);
if(ptr1 != NULL) {
*ptr1 = 0xcccc;
free(ptr1);
*ptr1 = 0xabcd;
}
}
void doubleFree() {
int *ptr = (int*)malloc(4);
free(ptr);
free(ptr);
}
void heapBufferOverflow() {
int *ptr = (int*)malloc(sizeof(int)*100);
*(ptr+101) = 101;
*(ptr+102) = 102;
*(ptr+103) = 103;
*(ptr+104) = 104;
free(ptr);
}
int main(int argc, char **argv) {
// Two alternative ways to enable GWP-ASan are shown; use the one that matches the platform version.
// (a) Releases newer than Android R: initialize the GWP-ASan options
android_mallopt_gwp_asan_options_t gwp_asan_options;
// Call android_mallopt to enable GWP-ASan
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
// (b) Android R: pass a bool instead
bool enableGwpAsan = true;
int opcode = M_INITIALIZE_GWP_ASAN;
void* arg = &enableGwpAsan;
size_t arg_size = sizeof(bool);
bool result = android_mallopt(opcode, arg, arg_size);
// Log whether GWP-ASan was enabled
ALOG(LOG_DEBUG, LOG_TAG, "result: %d", result);
heapUseAfterFree();
doubleFree();
heapBufferOverflow();
// ...
return 0;
}
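2) In MaybeInitGwpAsan, which runs during libc initialization, add a check on a system property (default: true) so that GWP-ASan can be turned on or off dynamically for a specified process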
bool MaybeInitGwpAsan(libc_globals* globals, const android_mallopt_gwp_asan_options_t& mallopt_options) {
......
// Check the property configured for the specified process; continue only if it is true
......
if (!ShouldGwpAsanSampleProcess(process_sample_rate)) {
return false;
}
......
}
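For a native process demo, see Section 5.2.
4. How GWP-ASan works
Basic principle: once GWP-ASan is enabled for a process, sampled malloc allocations are placed in page-aligned slots, and GWP-ASan controls the read/write permissions of those pages to check whether each memory access is legal. For example, once a page is made non-readable and non-writable, every subsequent read or write to it raises SIGSEGV (delivered by the kernel to the offending process). Memory accesses are checked this way, and when an invalid access occurs, the tombstone collects the memory error information from the faulting process.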
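The guard-page idea above can be tried outside Android with plain POSIX calls. The following is a minimal sketch, not GWP-ASan code: the names GuardedPage, MapGuardedPage, and WriteFaults are hypothetical, and the sketch assumes a Linux-like system where mmap, mprotect, fork, and waitpid are available.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of GWP-ASan): one writable page surrounded by
// two inaccessible guard pages.
struct GuardedPage {
  char* base;        // start of the 3-page mapping (left guard page)
  char* data;        // start of the writable middle page
  std::size_t page;  // page size in bytes
};

inline GuardedPage MapGuardedPage() {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  // Reserve three pages with no access rights at all.
  void* mem = mmap(nullptr, 3 * page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(mem != MAP_FAILED);
  char* base = static_cast<char*>(mem);
  // Open up only the middle page for reading and writing.
  int rc = mprotect(base + page, page, PROT_READ | PROT_WRITE);
  assert(rc == 0);
  (void)rc;
  return GuardedPage{base, base + page, page};
}

// Returns true if writing one byte to addr kills a forked child with a
// memory-access signal (SIGSEGV; some platforms report SIGBUS instead).
inline bool WriteFaults(char* addr) {
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    *reinterpret_cast<volatile char*>(addr) = 'A';
    _exit(0);  // reached only if the write succeeded
  }
  int status = 0;
  waitpid(pid, &status, 0);
  return WIFSIGNALED(status) &&
         (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}
```

Writes inside the middle page succeed, while a write one byte before it or one byte past its end lands in a PROT_NONE page and terminates the child. The fork is used only so that the fault can be observed without killing the calling process.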
GWP-ASan design framework:
4.1 Enabling GWP-ASan for a process
GWP-ASan is enabled by calling android_mallopt, as follows:
// bionic/libc/platform/bionic/malloc.h
extern "C" bool android_mallopt(int opcode, void* arg, size_t arg_size) {
if (opcode == M_SET_ALLOCATION_LIMIT_BYTES) {
return LimitEnable(arg, arg_size);
}
if (opcode == M_INITIALIZE_GWP_ASAN) {
if (arg == nullptr || arg_size != sizeof(bool)) {
errno = EINVAL;
return false;
}
__libc_globals.mutate([&](libc_globals* globals) {
return MaybeInitGwpAsan(globals, *reinterpret_cast<bool*>(arg));
});
}
errno = ENOTSUP;
return false;
}
// bionic/libc/bionic/gwp_asan_wrappers.cpp
// Maybe initializes GWP-ASan. Called by android_mallopt() and libc's
// initialisation. This should always be called in a single-threaded context.
// Called by android_mallopt() and whenever a process loads libc
bool MaybeInitGwpAsan(libc_globals* globals, bool force_init) {
// GWP-ASan is already initialized in this process; return immediately
if (GwpAsanInitialized) {
error_log("GWP-ASan was already initialized for this process.");
return false;
}
// If the caller hasn't forced GWP-ASan on, check whether we should sample
// this process.
// If this process did not force GWP-ASan on, ShouldGwpAsanSampleProcess decides whether to enable it;
// it returns true when the generated random number is a multiple of 128.
if (!force_init && !ShouldGwpAsanSampleProcess()) {
return false;
}
// GWP-ASan is compatible with heapprofd/malloc_debug/malloc_hooks iff
// GWP-ASan was installed first. If one of these other libraries was already
// installed, we don't enable GWP-ASan. These libraries are normally enabled
// in libc_init after GWP-ASan, but if the new process is a zygote child and
// trying to initialize GWP-ASan through mallopt(), one of these libraries may
// be installed. It may be possible to change this in future by modifying the
// internal dispatch pointers of these libraries at this point in time, but
// given that they're all debug-only, we don't really mind for now.
if (GetDefaultDispatchTable() != nullptr) {
// Something else is installed.
return false;
}
// GWP-ASan's initialization is always called in a single-threaded context, so
// we can initialize lock-free.
// Set GWP-ASan as the malloc dispatch table.
// The native malloc/free check the dispatch table and, when it is set, call the GWP-ASan malloc/free
globals->malloc_dispatch_table = gwp_asan_dispatch;
atomic_store(&globals->default_dispatch_table, &gwp_asan_dispatch);
// If malloc_limit isn't installed, we can skip the default_dispatch_table
// lookup.
if (GetDispatchTable() == nullptr) {
atomic_store(&globals->current_dispatch_table, &gwp_asan_dispatch);
}
#ifndef LIBC_STATIC
SetGlobalFunctions(gwp_asan_gfunctions);
#endif // LIBC_STATIC
GwpAsanInitialized = true;
gwp_asan_initialize(NativeAllocatorDispatch(), nullptr, nullptr);
return true;
}
// This function handles initialisation as asked for by MallocInitImpl. This
// should always be called in a single-threaded context.
bool gwp_asan_initialize(const MallocDispatch* dispatch, bool*, const char*) {
prev_dispatch = dispatch;
// Default options
Options Opts;
Opts.Enabled = true;
Opts.MaxSimultaneousAllocations = 32;
Opts.SampleRate = 2500; // sample roughly 1 in every 2500 allocations (a count, not milliseconds)
Opts.InstallSignalHandlers = false;
Opts.InstallForkHandlers = true;
Opts.Backtrace = android_unsafe_frame_pointer_chase;
// Initialize GWP-ASan; see Section 4.2
GuardedAlloc.init(Opts);
// TODO(b/149790891): The log line below causes ART tests to fail as they're
// not expecting any output. Disable the output for now.
// info_log("GWP-ASan has been enabled.");
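// Store two of the allocator's pointers in globals so that they are easy to retrieve when a tombstone is generated;
// they lead to the key debugging information (the metadata records the alloc/free call stacks)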
__libc_shared_globals()->gwp_asan_state = GuardedAlloc.getAllocatorState();
__libc_shared_globals()->gwp_asan_metadata = GuardedAlloc.getMetadataRegion();
return true;
}
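MaxSimultaneousAllocations: the number of slots (pages) in the Guarded Pool Memory that are available for allocation. With MaxSimultaneousAllocations = 32, the Guarded Pool Memory needs 65 pages (32 slot pages plus 33 guard pages), Metadata needs 5 pages, and FreeSlots needs 1 page, for a total of 71 pages, or 284 KiB with 4 KiB pages (1 KiB = 1024 bytes). Versions whose init() also reserves one extra page for internally detected errors, as in the listing in Section 4.2, need 66 pool pages. All of these regions are allocated with mmap. The difference is that every page of the Guarded Pool Memory is inaccessible after initialization and its protection is changed only when a slot is actually allocated, whereas Metadata and FreeSlots are readable and writable as soon as they are initialized.
SampleRate: the sampling rate. On average, one in every 2500 allocations is sampled and served from the Guarded Pool Memory.
Backtrace: the function used to capture call stacks. Here it is a frame-pointer (FP) based unwinder. Generally it is only suitable for 64-bit processes, because only there is the FP stored on the stack by default.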
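The page counts quoted for MaxSimultaneousAllocations can be checked with a little arithmetic. The sketch below is illustrative only: GuardedPoolPages and TotalKiB are hypothetical helper names, each slot is assumed to be exactly one page, and the Metadata and FreeSlots page counts are taken from the text as inputs rather than derived from sizeof(AllocationMetadata).

```cpp
#include <cassert>
#include <cstddef>

// Pages in the guarded pool: N slot pages plus N + 1 guard pages, plus an
// optional extra page used to report internally detected errors.
constexpr std::size_t GuardedPoolPages(std::size_t slots, bool internalErrorPage) {
  return slots + (slots + 1) + (internalErrorPage ? 1 : 0);
}

// Total footprint in KiB for the pool, Metadata, and FreeSlots regions.
constexpr std::size_t TotalKiB(std::size_t poolPages, std::size_t metadataPages,
                               std::size_t freeSlotPages, std::size_t pageBytes) {
  return (poolPages + metadataPages + freeSlotPages) * pageBytes / 1024;
}

// 32 slots + 33 guard pages = 65 pages; 65 + 5 + 1 = 71 pages = 284 KiB.
static_assert(GuardedPoolPages(32, false) == 65, "32 slots + 33 guard pages");
static_assert(TotalKiB(65, 5, 1, 4096) == 284, "71 pages of 4 KiB");
```

With the extra internal-error page the pool grows to 66 pages, and the same formula gives 288 KiB.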
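4.2 Initialization
GWP-ASan initialization mainly does the following:
1) Computes the number of pages in the GWP-ASan memory pool: N slot pages, N+1 guard pages, 1 page for internally detected errors, the Metadata pages (N * sizeof(AllocationMetadata), rounded up to whole pages), and 1 free-slot array page
2) Reserves the guarded pool and sets all of its pages to PROT_NONE (inaccessible)
3) Reads property values such as ProcessSample and SampleRate
4) Computes getThreadLocals()->NextSampleCounter from SampleRate (used later in malloc)
5) Computes the start and end addresses of the GWP-ASan memory pool
slot page: holds the user data;
guard page: detects out-of-bounds accesses;
Metadata: stores GWP-ASan alloc/free information, including memory state (allocation address, size, slot state, and thread ID at allocation; slot state and thread ID at deallocation) and call stacks;
free slot: records which slot pages are free;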
void GuardedPoolAllocator::init(const options::Options &Opts) {
// Note: We return from the constructor here if GWP-ASan is not available.
// This will stop heap-allocation of class members, as well as mmap() of the
// guarded slots.
// 1. Check whether GWP-ASan is enabled
if (!Opts.Enabled || Opts.SampleRate == 0 ||
Opts.MaxSimultaneousAllocations == 0)
return;
Check(Opts.SampleRate >= 0, "GWP-ASan Error: SampleRate is < 0.");
Check(Opts.SampleRate < (1 << 30), "GWP-ASan Error: SampleRate is >= 2^30.");
Check(Opts.MaxSimultaneousAllocations >= 0,
"GWP-ASan Error: MaxSimultaneousAllocations is < 0.");
SingletonPtr = this;
Backtrace = Opts.Backtrace;
State.VersionMagic = {{AllocatorVersionMagic::kAllocatorVersionMagic[0],
AllocatorVersionMagic::kAllocatorVersionMagic[1],
AllocatorVersionMagic::kAllocatorVersionMagic[2],
AllocatorVersionMagic::kAllocatorVersionMagic[3]},
AllocatorVersionMagic::kAllocatorVersion,
0};
// Number of slots (32 by default)
State.MaxSimultaneousAllocations = Opts.MaxSimultaneousAllocations;
// Platform page size (4 KiB on a typical Linux kernel)
const size_t PageSize = getPlatformPageSize();
// getPageAddr() and roundUpTo() assume the page size to be a power of 2.
assert((PageSize & (PageSize - 1)) == 0);
State.PageSize = PageSize;
// Number of pages required =
// + MaxSimultaneousAllocations * maximumAllocationSize (N pages per slot)
// + MaxSimultaneousAllocations (one guard on the left side of each slot)
// + 1 (an extra guard page at the end of the pool, on the right side)
// + 1 (an extra page that's used for reporting internally-detected crashes,
// like double free and invalid free, to the signal handler; see
// raiseInternallyDetectedError() for more info)
// Guarded pool bytes = 4 KiB * (2 + 32) + 32 * 4 KiB, i.e. 66 pages: 32 slot pages, 33 guard pages,
// and 1 extra page for reporting internally detected errors such as double free or invalid free
size_t PoolBytesRequired =
PageSize * (2 + State.MaxSimultaneousAllocations) +
State.MaxSimultaneousAllocations * State.maximumAllocationSize();
assert(PoolBytesRequired % PageSize == 0);
// Reserve the guarded pool, set every page to PROT_NONE, and return its start address
void *GuardedPoolMemory = reserveGuardedPool(PoolBytesRequired);
// Allocate the Metadata region (5 pages here) and return its start address
// static constexpr const char *kGwpAsanMetadataName = "GWP-ASan Metadata";
size_t BytesRequired =
roundUpTo(State.MaxSimultaneousAllocations * sizeof(*Metadata), PageSize);
Metadata = reinterpret_cast<AllocationMetadata *>(
map(BytesRequired, kGwpAsanMetadataName));
// Allocate memory and set up the free pages queue.
// Allocate the free-slot array (1 page here) and return its start address
// static constexpr const char *kGwpAsanFreeSlotsName = "GWP-ASan Metadata";
BytesRequired = roundUpTo(
State.MaxSimultaneousAllocations * sizeof(*FreeSlots), PageSize);
FreeSlots =
reinterpret_cast<size_t *>(map(BytesRequired, kGwpAsanFreeSlotsName));
// Multiply the sample rate by 2 to give a good, fast approximation for (1 /
// SampleRate) chance of sampling.
// Compute AdjustedSampleRatePlusOne, which shouldSample() uses on the GWP-ASan allocation path
// If the sample rate is not 1, AdjustedSampleRatePlusOne = Opts.SampleRate * 2 + 1
// If the sample rate is 1 (libc reads it from the sample-rate property), AdjustedSampleRatePlusOne = 2, so every malloc is sampled
if (Opts.SampleRate != 1)
AdjustedSampleRatePlusOne = static_cast<uint32_t>(Opts.SampleRate) * 2 + 1;
else
AdjustedSampleRatePlusOne = 2;
initPRNG();
// GWP-ASan malloc uses getThreadLocals()->NextSampleCounter;
// if AdjustedSampleRatePlusOne is 2, NextSampleCounter is 1, so the next malloc is served by GWP-ASan
getThreadLocals()->NextSampleCounter =
((getRandomUnsigned32() % (AdjustedSampleRatePlusOne - 1)) + 1) &
ThreadLocalPackedVariables::NextSampleCounterMask;
// Start and end addresses of the memory pool
State.GuardedPagePool = reinterpret_cast<uintptr_t>(GuardedPoolMemory);
State.GuardedPagePoolEnd =
reinterpret_cast<uintptr_t>(GuardedPoolMemory) + PoolBytesRequired;
if (Opts.InstallForkHandlers)
installAtFork();
}
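The pool that init() reserves alternates guard pages and slots, which is what makes address lookups cheap. The following sketch models that layout under simplifying assumptions: PoolLayout is a hypothetical stand-in for the allocator state, each slot is exactly one page, and the extra page for internally detected errors is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the allocator state; assumes one page per slot.
struct PoolLayout {
  std::uintptr_t poolStart;  // address of the first page, which is a guard page
  std::size_t pageSize;      // page size in bytes
  std::size_t slots;         // number of slots (MaxSimultaneousAllocations)

  // Slot n sits after n earlier slots and n + 1 guard pages.
  std::uintptr_t SlotToAddr(std::size_t n) const {
    return poolStart + pageSize * (1 + n) + pageSize * n;
  }

  // With one page per slot, pages alternate guard, slot, guard, slot, ...
  // so every even page index is a guard page.
  bool IsGuardPage(std::uintptr_t p) const {
    return ((p - poolStart) / pageSize) % 2 == 0;
  }

  // One past the last byte of the final guard page.
  std::uintptr_t PoolEnd() const {
    return poolStart + pageSize * (2 * slots + 1);
  }
};
```

For a pool starting at 0x10000 with 4 KiB pages, slot 0 begins at 0x11000 and slot 1 at 0x13000, with the guard page 0x12000 to 0x12FFF between them.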
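4.3 Memory allocation
Allocation mainly does the following:
1) Finds a free slot and chooses left or right alignment.
2) Uses mprotect to change the slot's protection to PROT_READ | PROT_WRITE.
3) Records the call stack of this allocation in the corresponding Metadata struct.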
// bionic/libc/bionic/malloc_common.cpp
extern "C" void* malloc(size_t bytes) {
auto dispatch_table = GetDispatchTable();
void *result;
if (__predict_false(dispatch_table != nullptr)) {
result = dispatch_table->malloc(bytes);
} else {
result = Malloc(malloc)(bytes);
}
if (__predict_false(result == nullptr)) {
warning_log("malloc(%zu) failed: returning null pointer", bytes);
return nullptr;
}
return MaybeTagPointer(result);
}
// bionic/libc/bionic/gwp_asan_wrappers.cpp
void* gwp_asan_malloc(size_t bytes) {
// Check whether this malloc call has been selected for sampling
if (__predict_false(GuardedAlloc.shouldSample())) {
// Allocate from GWP-ASan; on success, return the address
if (void* result = GuardedAlloc.allocate(bytes)) {
return result;
}
}
return prev_dispatch->malloc(bytes);
}
// external/gwp_asan/gwp_asan/guarded_pool_allocator.h
// Return whether the allocation should be randomly chosen for sampling.
// Returns true when NextSampleCounter is 1 on entry
GWP_ASAN_ALWAYS_INLINE bool shouldSample() {
// NextSampleCounter == 0 means we "should regenerate the counter".
// == 1 means we "should sample this allocation".
// AdjustedSampleRatePlusOne is designed to intentionally underflow. This
// class must be valid when zero-initialised, and we wish to sample as
// infrequently as possible when this is the case, hence we underflow to
// UINT32_MAX.
// If NextSampleCounter is 0, draw a new random value for it
if (GWP_ASAN_UNLIKELY(getThreadLocals()->NextSampleCounter == 0))
getThreadLocals()->NextSampleCounter =
((getRandomUnsigned32() % (AdjustedSampleRatePlusOne - 1)) + 1) &
ThreadLocalPackedVariables::NextSampleCounterMask; // 0x80000000 - 1, i.e. 0x7FFFFFFF
// Return true if NextSampleCounter was 1, whether carried over or freshly generated
return GWP_ASAN_UNLIKELY(--getThreadLocals()->NextSampleCounter == 0);
}
AdjustedSampleRatePlusOne = static_cast<uint32_t>(Opts.SampleRate) * 2 + 1;
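shouldSample: each thread gets a randomly generated NextSampleCounter, which is decremented on every malloc call. When NextSampleCounter is 1 (0 after the decrement), shouldSample returns true and the allocation is served from the Guarded Pool Memory; in every other case it returns false. When NextSampleCounter is 0, a new random value is generated. The default sample rate on Android is 1/2500, so AdjustedSampleRatePlusOne is 5001. The factor of 2 is there because the counter is drawn uniformly from [1, 2 * SampleRate], and the mean of that range is about SampleRate, so on average one in every SampleRate allocations is sampled.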
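The effect of the doubled range can be checked with a small simulation. This is a stand-alone model, not the real allocator: SamplingModel and CountSampled are hypothetical names, the seed is an arbitrary non-zero constant, and the NextSampleCounterMask is omitted because the values involved are far below 2^31.

```cpp
#include <cassert>
#include <cstdint>

// Stand-alone model of the per-thread sampling counter.
class SamplingModel {
 public:
  SamplingModel(std::uint32_t sampleRate, std::uint32_t seed)
      : adjustedPlusOne_(sampleRate != 1 ? sampleRate * 2 + 1 : 2), state_(seed) {
    assert(sampleRate > 0 && seed != 0);
  }

  // Same logic as shouldSample(): regenerate the counter when it is 0, then
  // sample when the decrement brings it to 0.
  bool ShouldSample() {
    if (counter_ == 0) counter_ = (NextRandom() % (adjustedPlusOne_ - 1)) + 1;
    return --counter_ == 0;
  }

 private:
  // xorshift32, the same recurrence as getRandomUnsigned32().
  std::uint32_t NextRandom() {
    state_ ^= state_ << 13;
    state_ ^= state_ >> 17;
    state_ ^= state_ << 5;
    return state_;
  }

  std::uint32_t adjustedPlusOne_;
  std::uint32_t counter_ = 0;
  std::uint32_t state_;
};

// Counts how many of `allocations` consecutive calls would be sampled.
inline std::uint64_t CountSampled(std::uint32_t sampleRate, std::uint64_t allocations) {
  SamplingModel model(sampleRate, 0x9E3779B9u);
  std::uint64_t sampled = 0;
  for (std::uint64_t i = 0; i < allocations; ++i) {
    if (model.ShouldSample()) ++sampled;
  }
  return sampled;
}
```

With a sample rate of 1 every call is sampled. With a sample rate of 2500, the gaps between samples are spread over 1 to 5000 calls, so over ten million calls the count should land near 4000, that is, roughly one in 2500.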
// external/gwp_asan/gwp_asan/guarded_pool_allocator.cpp
void *GuardedPoolAllocator::allocate(size_t Size, size_t Alignment) {
// GuardedPagePoolEnd == 0 when GWP-ASan is disabled. If we are disabled, fall
// back to the supporting allocator.
if (State.GuardedPagePoolEnd == 0) {
getThreadLocals()->NextSampleCounter =
(AdjustedSampleRatePlusOne - 1) &
ThreadLocalPackedVariables::NextSampleCounterMask;
return nullptr;
}
if (Size == 0)
Size = 1;
if (Alignment == 0)
Alignment = alignof(max_align_t);
// 1. Reject requests whose size or alignment exceeds State.maximumAllocationSize (4 KiB by default)
if (!isPowerOfTwo(Alignment) || Alignment > State.maximumAllocationSize() ||
Size > State.maximumAllocationSize())
return nullptr;
size_t BackingSize = getRequiredBackingSize(Size, Alignment, State.PageSize);
if (BackingSize > State.maximumAllocationSize())
return nullptr;
// Protect against recursivity.
if (getThreadLocals()->RecursiveGuard)
return nullptr;
ScopedRecursiveGuard SRG;
size_t Index;
{
ScopedLock L(PoolMutex);
Index = reserveSlot();
}
if (Index == kInvalidSlotID)
return nullptr;
uintptr_t SlotStart = State.slotToAddr(Index);
AllocationMetadata *Meta = addrToMetadata(SlotStart);
uintptr_t SlotEnd = State.slotToAddr(Index) + State.maximumAllocationSize();
uintptr_t UserPtr;
// Randomly choose whether to left-align or right-align the allocation, and
// then apply the necessary adjustments to get an aligned pointer.
// Align the user pointer to Alignment (16 bytes by default)
if (getRandomUnsigned32() % 2 == 0)
UserPtr = alignUp(SlotStart, Alignment);
else
UserPtr = alignDown(SlotEnd - Size, Alignment);
assert(UserPtr >= SlotStart);
assert(UserPtr + Size <= SlotEnd);
// If a slot is multiple pages in size, and the allocation takes up a single
// page, we can improve overflow detection by leaving the unused pages as
// unmapped.
const size_t PageSize = State.PageSize;
allocateInGuardedPool(
reinterpret_cast<void *>(getPageAddr(UserPtr, PageSize)),
roundUpTo(Size, PageSize));
Meta->RecordAllocation(UserPtr, Size);
{
ScopedLock UL(BacktraceMutex);
// Record the call stack of the allocation via android_unsafe_frame_pointer_chase
Meta->AllocationTrace.RecordBacktrace(Backtrace);
}
return reinterpret_cast<void *>(UserPtr);
}
// Round Ptr down to the nearest multiple of the page size (4096), i.e. the start of the page that contains Ptr
uintptr_t getPageAddr(uintptr_t Ptr, uintptr_t PageSize) {
return Ptr & ~(PageSize - 1);
}
// Round Size up to the nearest multiple of Boundary (4096 here).
// For Size in 1..4096 the result is 4096; for Size = 4097 the result is 8192.
size_t roundUpTo(size_t Size, size_t Boundary) {
return (Size + Boundary - 1) & ~(Boundary - 1);
}
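4.4 Memory alignment
1) Why is the address returned by malloc randomly aligned either up (left) or down (right)?
Randomly choosing between align up and align down lets GWP-ASan catch, probabilistically, out-of-bounds accesses on either side of the slot page. The advantage is a lower performance cost for the detector; the drawback is that some errors can be missed.
2) Why does align up need no 16-byte adjustment, while align down does?
For align up the input is SlotStart, which is page-aligned and therefore already a multiple of 16, so no adjustment is needed. For align down the input is SlotEnd - Size, which is not necessarily a multiple of 16, so it must be rounded down to a 16-byte boundary. Aligned addresses also let the CPU access the data more efficiently.
Core code:
void *allocate(size_t Size, size_t Alignment = alignof(max_align_t)); // 16 bytes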
void *GuardedPoolAllocator::allocate(size_t Size, size_t Alignment) {
...
if (Alignment == 0)
Alignment = alignof(max_align_t); // 16
...
// Start address of the slot page
uintptr_t SlotStart = State.slotToAddr(Index);
// End address of the slot page
uintptr_t SlotEnd = State.slotToAddr(Index) + State.maximumAllocationSize();
// Start address of the user data after alignment
uintptr_t UserPtr;
// Randomly choose whether to left-align or right-align the allocation, and
// then apply the necessary adjustments to get an aligned pointer.
if (getRandomUnsigned32() % 2 == 0)
UserPtr = alignUp(SlotStart, Alignment);
else
UserPtr = alignDown(SlotEnd - Size, Alignment);
...
}
uintptr_t GuardedPoolAllocator::alignUp(uintptr_t Ptr, size_t Alignment) {
assert(isPowerOfTwo(Alignment) && "Alignment must be a power of two!");
assert(Alignment != 0 && "Alignment should be non-zero");
// If Ptr is already a multiple of Alignment (16), Ptr & (Alignment - 1) is 0; return it unchanged
if ((Ptr & (Alignment - 1)) == 0)
return Ptr;
Ptr += Alignment - (Ptr & (Alignment - 1));
return Ptr;
}
uintptr_t GuardedPoolAllocator::alignDown(uintptr_t Ptr, size_t Alignment) {
assert(isPowerOfTwo(Alignment) && "Alignment must be a power of two!");
assert(Alignment != 0 && "Alignment should be non-zero");
if ((Ptr & (Alignment - 1)) == 0)
return Ptr;
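// Ptr & (Alignment - 1) is non-zero: move Ptr back by that remainder. For a 20-byte allocation this is 12 bytes,
// so the user pointer sits at a multiple of 16 bytes from both the start and the end of the slot page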
Ptr -= Ptr & (Alignment - 1);
return Ptr;
}
The start address of the slot page is denoted SlotStart
The end address of the slot page is denoted SlotEnd
The address returned by malloc is denoted UserPtr
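Using the notation above, the two placements can be worked out numerically. The sketch below is illustrative: Placement and Place are hypothetical names, and the alignment helpers use the usual bit-mask form rather than the branchy form in the listing, which gives the same results for power-of-two alignments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round p up / down to a multiple of a, where a is a power of two.
constexpr std::uintptr_t AlignUp(std::uintptr_t p, std::size_t a) {
  return (p + a - 1) & ~static_cast<std::uintptr_t>(a - 1);
}
constexpr std::uintptr_t AlignDown(std::uintptr_t p, std::size_t a) {
  return p & ~static_cast<std::uintptr_t>(a - 1);
}

struct Placement {
  std::uintptr_t userPtr;  // address handed back to the caller
  std::size_t leftSlack;   // accessible bytes between SlotStart and UserPtr
  std::size_t rightSlack;  // accessible bytes between UserPtr + Size and SlotEnd
};

// Computes where an allocation lands in its slot for either alignment choice.
inline Placement Place(std::uintptr_t slotStart, std::size_t slotSize, std::size_t size,
                       std::size_t alignment, bool leftAlign) {
  const std::uintptr_t slotEnd = slotStart + slotSize;
  const std::uintptr_t userPtr = leftAlign ? AlignUp(slotStart, alignment)
                                           : AlignDown(slotEnd - size, alignment);
  return Placement{userPtr, static_cast<std::size_t>(userPtr - slotStart),
                   static_cast<std::size_t>(slotEnd - (userPtr + size))};
}
```

For a 20-byte allocation in a 4096-byte slot at 0x1000 with 16-byte alignment, left alignment yields UserPtr = 0x1000 with 4076 undetected bytes on the right, and right alignment yields UserPtr = 0x1FE0 with 4064 undetected bytes on the left and 12 on the right. Accesses inside the slack stay within the accessible page, so they do not fault.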
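4.4.1 How 16-byte align up works
Consider char* ptr = (char*) malloc(20); as an example of align up. It allocates 20 bytes; the page start address is SlotStart and the page end address is SlotEnd = SlotStart + 4096.
The user pointer (the start of the actual data) is the start of the slot page, i.e. UserPtr = SlotStart.
With align up, the following demo checks for out-of-bounds access to the left of the 20-byte region in the slot page: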
void alignUpLeftCheck() {
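char* ptr = (char*)malloc(20); // a char occupies 1 byte
std::cout << "ptr = " << static_cast<void*>(ptr) << std::endl; // print the address, not the buffer contents
// Out-of-bounds write into the left guard page: detected, the process crashes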
*(ptr-1) = 'A';
std::cout << "heapCorruptionLeftAdjacent *(ptr-1)" << std::endl;
}
/*void alignUpLeftCheck_2() {
int* ptr = new int[3]; // an int occupies 4 bytes
std::cout << "ptr = " << ptr << std::endl;
// Out-of-bounds write into the left guard page: detected, the process crashes
ptr[-1] = 'A';
std::cout << "heapCorruption left ptr[-1]" << std::endl;
}*/
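With align up, the following demo checks for out-of-bounds access to the right of the 20-byte region in the slot page:
void alignUpRightCheck() {
char* ptr = (char*)malloc(20); // a char occupies 1 byte
std::cout << "ptr = " << static_cast<void*>(ptr) << std::endl; // print the address, not the buffer contents
// Out-of-bounds write to the first byte of the 4076-byte region right of the allocation, still inside the slot page: not detected, no crash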
int addr = 20;
*(ptr+addr) = 'A';
std::cout << "heapCorruptionRight *(ptr+20)" << std::endl;
// Out-of-bounds write to the last byte of that 4076-byte region: not detected, no crash
addr = 4095;
*(ptr+addr) = 'A';
std::cout << "alignUpRightCheck *(ptr+4095)" << std::endl;
// Out-of-bounds write into the right guard page: detected, the process crashes
addr = 4096;
*(ptr+addr) = 'A';
std::cout << "alignUpRightCheck *(ptr+4096)" << std::endl;
}
/*void alignUpRightCheck_2() {
int* ptr = new int[3]; // an int occupies 4 bytes, so this allocates 12 bytes
std::cout << "ptr = " << ptr << std::endl;
// Out-of-bounds write to the first 4 bytes of the region right of the allocation (4084 bytes): not detected, no crash
ptr[3] = 'A';
std::cout << "heapCorruption right ptr[3]" << std::endl;
// Out-of-bounds write to the last 4 bytes of the slot page: not detected, no crash
ptr[1023] = 'A';
std::cout << "heapCorruption right ptr[1023]" << std::endl;
// Out-of-bounds write into the right guard page: detected, the process crashes
ptr[1024] = 'A';
std::cout << "heapCorruption right ptr[1024]" << std::endl;
}*/
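4.4.2 How 16-byte align down works
Consider char* ptr = (char*) malloc(20); as an example of align down. It allocates 20 bytes; the slot page start address is SlotStart and the page end address is SlotEnd = SlotStart + 4096.
The user pointer (the start of the actual data) is the slot page end address minus size, moved back a further 12 bytes, i.e. UserPtr = SlotEnd - size - n, where n is the smallest shift that makes the address a multiple of 16 (for a 20-byte allocation, n = 12).
With align down and the 12-byte shift, the following demo checks for out-of-bounds access to the left of the 20-byte region in the slot page: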
void alignDown16_LeftCheck() {
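char* ptr = (char*)malloc(20); // a char occupies 1 byte
std::cout << "alignDown16_LeftCheck ptr = " << static_cast<void*>(ptr) << std::endl; // print the heap address, not the address of the local variable
//malloc_test2(&ptr);
int addr = 0;
// Out-of-bounds write to the first byte left of the allocation (the left region is 4064 bytes): not detected, no crash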
addr = 1;
*(ptr-addr) = 'A';
std::cout << "alignDown16_LeftCheck *(ptr-1)" << std::endl;
addr = 4064;
*(ptr-addr) = 'A';
std::cout << "alignDown16_LeftCheck *(ptr-4064)" << std::endl;
// crash occur if 32 bit
addr = 4065;
*(ptr-addr) = 'A';
std::cout << "alignDown16_LeftCheck *(ptr-4065)" << std::endl;
}
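The following demo checks for out-of-bounds access to the right of the 20-byte region in the slot page:
void alignDown16_RightCheck() {
char* ptr = (char*)malloc(20); // a char occupies 1 byte
std::cout << "ptr = " << static_cast<void*>(ptr) << std::endl; // print the address, not the buffer contents
int addr = 0;
// Out-of-bounds write to the first byte of the 12-byte region right of the allocation: not detected, no crash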
addr = 20;
*(ptr+addr) = 'A';
std::cout << "heapCorruptionUpAdjacent *(ptr+20)" << std::endl;
// Out-of-bounds write to the 12th (last) byte of the 12-byte region right of the allocation: not detected, no crash
addr = 31;
*(ptr+addr) = 'A';
std::cout << "heapCorruptionLeftAdjacent *(ptr+31)" << std::endl;
// Out-of-bounds write into the right guard page: detected, the process crashes
// crash if 32 bit
addr = 32;
*(ptr+addr) = 'A';
std::cout << "alignDown8_RightCheck *(ptr+32)" << std::endl;
}
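4.5 Memory deallocation
Deallocation mainly does the following:
1) On INVALID_FREE or DOUBLE_FREE, deliberately raises a SIGSEGV
2) Memory protection: GWP-ASan sets the freed pages to PROT_NONE to catch later invalid accesses
3) Pool management: the freed slot is returned to the pool so that it can be reused by a later allocation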
// bionic/libc/bionic/malloc_common.cpp
extern "C" void free(void* mem) {
auto dispatch_table = GetDispatchTable();
mem = MaybeUntagAndCheckPointer(mem);
if (__predict_false(dispatch_table != nullptr)) {
dispatch_table->free(mem);
} else {
Malloc(free)(mem);
}
}
// bionic/libc/bionic/gwp_asan_wrappers.cpp
void gwp_asan_free(void* mem) {
// Check whether the memory was allocated by GWP-ASan
if (__predict_false(GuardedAlloc.pointerIsMine(mem))) {
GuardedAlloc.deallocate(mem);
return;
}
prev_dispatch->free(mem);
}
// external/gwp_asan/gwp_asan/guarded_pool_allocator.h
// Returns whether the provided pointer is a current sampled allocation that
// is owned by this pool.
GWP_ASAN_ALWAYS_INLINE bool pointerIsMine(const void *Ptr) const {
return State.pointerIsMine(Ptr);
}
// external/gwp_asan/gwp_asan/common.h
// This holds the state that's shared between the GWP-ASan allocator and the
// crash handler. This, in conjunction with the Metadata array, forms the entire
// set of information required for understanding a GWP-ASan crash.
struct AllocatorState {
constexpr AllocatorState() {}
// Returns whether the provided pointer is a current sampled allocation that
// is owned by this pool.
GWP_ASAN_ALWAYS_INLINE bool pointerIsMine(const void *Ptr) const {
uintptr_t P = reinterpret_cast<uintptr_t>(Ptr);
return P < GuardedPagePoolEnd && GuardedPagePool <= P;
}
// Returns the address of the N-th guarded slot.
uintptr_t slotToAddr(size_t N) const;
// Returns the largest allocation that is supported by this pool.
size_t maximumAllocationSize() const;
// Gets the nearest slot to the provided address.
size_t getNearestSlot(uintptr_t Ptr) const;
// Returns whether the provided pointer is a guard page or not. The pointer
// must be within memory owned by this pool, else the result is undefined.
bool isGuardPage(uintptr_t Ptr) const;
// The number of guarded slots that this pool holds.
size_t MaxSimultaneousAllocations = 0;
// Pointer to the pool of guarded slots. Note that this points to the start of
// the pool (which is a guard page), not a pointer to the first guarded page.
uintptr_t GuardedPagePool = 0;
uintptr_t GuardedPagePoolEnd = 0;
// Cached page size for this system in bytes.
size_t PageSize = 0;
// The type and address of an internally-detected failure. For INVALID_FREE
// and DOUBLE_FREE, these errors are detected in GWP-ASan, which will set
// these values and terminate the process.
Error FailureType = Error::UNKNOWN;
uintptr_t FailureAddress = 0;
};
// external/gwp_asan/gwp_asan/guarded_pool_allocator.cpp
void GuardedPoolAllocator::deallocate(void *Ptr) {
assert(pointerIsMine(Ptr) && "Pointer is not mine!");
uintptr_t UPtr = reinterpret_cast<uintptr_t>(Ptr);
size_t Slot = State.getNearestSlot(UPtr);
uintptr_t SlotStart = State.slotToAddr(Slot);
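// Look up the Metadata entry for this address
AllocationMetadata *Meta = addrToMetadata(UPtr);
// The freed address differs from the allocated address: treat it as an invalid free and raise a SIGSEGV
// Meta->Addr is recorded at allocation time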
if (Meta->Addr != UPtr) {
// If multiple errors occur at the same time, use the first one.
ScopedLock L(PoolMutex);
trapOnAddress(UPtr, Error::INVALID_FREE);
}
// Intentionally scope the mutex here, so that other threads can access the
// pool during the expensive markInaccessible() call.
// Double free: raise a SIGSEGV
{
ScopedLock L(PoolMutex);
if (Meta->IsDeallocated) {
trapOnAddress(UPtr, Error::DOUBLE_FREE);
}
// Ensure that the deallocation is recorded before marking the page as
// inaccessible. Otherwise, a racy use-after-free will have inconsistent
// metadata.
// Record the deallocation in the Metadata (marks the slot as deallocated)
Meta->RecordDeallocation();
// Ensure that the unwinder is not called if the recursive flag is set,
// otherwise non-reentrant unwinders may deadlock.
if (!getThreadLocals()->RecursiveGuard) {
ScopedRecursiveGuard SRG;
ScopedLock UL(BacktraceMutex);
// Record the call stack of the free via android_unsafe_frame_pointer_chase
Meta->DeallocationTrace.RecordBacktrace(Backtrace);
}
}
// Hand the slot's pages back to the guarded pool, which makes them inaccessible again
deallocateInGuardedPool(reinterpret_cast<void *>(SlotStart),
State.maximumAllocationSize());
// And finally, lock again to release the slot back into the pool.
ScopedLock L(PoolMutex);
// Store the index of the freed slot in the free-slot array so that a later allocation can reuse it
freeSlot(Slot);
}
When GWP-ASan detects INVALID_FREE or DOUBLE_FREE during free, it raises a SIGSEGV so that the debuggerd process can catch the signal and tombstoned can collect the Metadata.
void GuardedPoolAllocator::trapOnAddress(uintptr_t Address, Error E) {
State.FailureType = E;
State.FailureAddress = Address;
// Raise a SEGV by touching first guard page.
volatile char *p = reinterpret_cast<char *>(State.GuardedPagePool);
*p = 0;
__builtin_unreachable();
}
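The two checks that deallocate() performs before releasing a slot can be isolated in a small model. This is a simplified sketch with hypothetical names (FreeResult, SlotRecord, CheckedFree); unlike trapOnAddress(), it returns an error code instead of raising SIGSEGV, so that the outcome can be inspected.

```cpp
#include <cassert>
#include <cstdint>

enum class FreeResult { kOk, kInvalidFree, kDoubleFree };

// Simplified per-slot record; the real AllocationMetadata also stores the
// size, thread IDs, and compressed backtraces.
struct SlotRecord {
  std::uintptr_t addr = 0;     // pointer handed out by allocate()
  bool isDeallocated = false;  // set by the first successful free
};

// Applies the same two checks, in the same order, as deallocate().
inline FreeResult CheckedFree(SlotRecord& slot, std::uintptr_t ptr) {
  if (slot.addr != ptr) return FreeResult::kInvalidFree;   // not the allocated address
  if (slot.isDeallocated) return FreeResult::kDoubleFree;  // already freed once
  slot.isDeallocated = true;
  return FreeResult::kOk;
}
```

Freeing an address 4 bytes into an allocation is reported as an invalid free, the first free of the exact address succeeds, and a second free of the same address is reported as a double free.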
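4.6 Random sampling
To limit its performance cost, GWP-ASan samples at random on two levels: per process and per malloc call. Only when both are selected does an allocation take the GWP-ASan allocation path.
4.6.1 Per-process sampling
Per-process sampling: during libc initialization, when the process starts and loads libc, the process is randomly sampled; if it is selected, the GWP-ASan memory pool is initialized.
Core code: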
// bionic/libc/bionic/gwp_asan_wrappers.cpp
bool MaybeInitGwpAsan(libc_globals* globals,
const android_mallopt_gwp_asan_options_t& mallopt_options) {
......
Options options;
unsigned process_sample_rate = kDefaultProcessSampling; // 128
// Read the property configuration
if (!GetGwpAsanOptions(&options, &process_sample_rate, mallopt_options) &&
mallopt_options.desire == Action::DONT_TURN_ON_UNLESS_OVERRIDDEN) {
return false;
}
if (options.SampleRate == 0 || process_sample_rate == 0 ||
options.MaxSimultaneousAllocations == 0) {
return false;
}
// Perform per-process sampling
if (!ShouldGwpAsanSampleProcess(process_sample_rate)) {
return false;
}
......
}
bool ShouldGwpAsanSampleProcess(unsigned sample_rate) {
if (!isPowerOfTwo(sample_rate)) {
warning_log(
"GWP-ASan process sampling rate of %u is not a power-of-two, and so modulo bias occurs.",
sample_rate);
}
uint8_t random_number;
__libc_safe_arc4random_buf(&random_number, sizeof(random_number));
return random_number % sample_rate == 0;
}
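Because random_number is a single uint8_t, the probability that a process is selected can be enumerated exactly. The helper below is a hypothetical illustration (MatchingByteValues is not a bionic function); it counts how many of the 256 possible byte values satisfy the modulo test.

```cpp
#include <cassert>

// Number of the 256 possible uint8_t values v with v % sampleRate == 0.
inline int MatchingByteValues(unsigned sampleRate) {
  assert(sampleRate != 0);  // MaybeInitGwpAsan returns early when the rate is 0
  int matches = 0;
  for (unsigned v = 0; v < 256; ++v) {
    if (v % sampleRate == 0) ++matches;
  }
  return matches;
}
```

For the default rate of 128 the matching values are 0 and 128, so the probability is 2/256 = 1/128. For a rate that is not a power of two, such as 100, three values match (0, 100, 200), giving 3/256 rather than 1/100; this is the modulo bias that the warning mentions. Any rate above 255 matches only the value 0, so the probability cannot drop below 1/256.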
Random number generation:
// bionic/libc/bionic/bionic_arc4random.cpp
void __libc_safe_arc4random_buf(void* buf, size_t n) {
// Only call arc4random_buf once we have `/dev/urandom` because getentropy(3)
// will fall back to using `/dev/urandom` if getrandom(2) fails, and abort if
// if can't use `/dev/urandom`.
static bool have_urandom = access("/dev/urandom", R_OK) == 0;
if (have_urandom) {
arc4random_buf(buf, n);
return;
}
static size_t at_random_bytes_consumed = 0;
if (at_random_bytes_consumed + n > 16) {
async_safe_fatal("ran out of AT_RANDOM bytes, have %zu, requested %zu",
16 - at_random_bytes_consumed, n);
}
memcpy(buf, reinterpret_cast<char*>(getauxval(AT_RANDOM)) + at_random_bytes_consumed, n);
at_random_bytes_consumed += n;
return;
}
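4.6.2 Per-malloc sampling
Per-malloc sampling: provided the process has been selected, each malloc call is sampled; if it is selected, the allocation is served by GWP-ASan malloc.
(Every GWP-ASan malloc/free unwinds the call stack, which has some performance cost; this is why malloc calls are sampled at random.)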
// bionic/libc/bionic/gwp_asan_wrappers.cpp
void* gwp_asan_malloc(size_t bytes) {
if (__predict_false(GuardedAlloc.shouldSample())) { // per-malloc sampling
if (void* result = GuardedAlloc.allocate(bytes)) { // allocate() checks whether the process was selected
return result;
}
}
return prev_dispatch->malloc(bytes);
}
// external/gwp_asan/gwp_asan/guarded_pool_allocator.cpp
void *GuardedPoolAllocator::allocate(size_t Size, size_t Alignment) {
// GuardedPagePoolEnd == 0 when GWP-ASan is disabled. If we are disabled, fall
// back to the supporting allocator.
// The process was not selected, so the GWP-ASan memory pool was never initialized
if (State.GuardedPagePoolEnd == 0) {
getThreadLocals()->NextSampleCounter =
(AdjustedSampleRatePlusOne - 1) &
ThreadLocalPackedVariables::NextSampleCounterMask;
return nullptr;
}
......
}
// external/gwp_asan/gwp_asan/guarded_pool_allocator.h
// Perform per-malloc sampling
bool shouldSample() {
// NextSampleCounter == 0 means we "should regenerate the counter".
// == 1 means we "should sample this allocation".
// AdjustedSampleRatePlusOne is designed to intentionally underflow. This
// class must be valid when zero-initialised, and we wish to sample as
// infrequently as possible when this is the case, hence we underflow to
// UINT32_MAX.
// If the malloc sample-rate property is set to 1, init() computes NextSampleCounter as 1
if (GWP_ASAN_UNLIKELY(getThreadLocals()->NextSampleCounter == 0))
getThreadLocals()->NextSampleCounter =
((getRandomUnsigned32() % (AdjustedSampleRatePlusOne - 1)) + 1) &
ThreadLocalPackedVariables::NextSampleCounterMask;
return GWP_ASAN_UNLIKELY(--getThreadLocals()->NextSampleCounter == 0);
}
// Generates a pseudo-random number (xorshift32)
uint32_t GuardedPoolAllocator::getRandomUnsigned32() {
uint32_t RandomState = getThreadLocals()->RandomState;
RandomState ^= RandomState << 13;
RandomState ^= RandomState >> 17;
RandomState ^= RandomState << 5;
getThreadLocals()->RandomState = RandomState;
return RandomState;
}
// Seeds the PRNG: seconds elapsed since 1970-01-01 plus the thread ID
void GuardedPoolAllocator::initPRNG() {
getThreadLocals()->RandomState =
static_cast<uint32_t>(time(nullptr) + getThreadID());
}
init() {
......
// If the sample rate is not 1, AdjustedSampleRatePlusOne = Opts.SampleRate * 2 + 1 = 5001
// If the sample rate is 1, AdjustedSampleRatePlusOne = 2. libc reads the SampleRate property;
// a value of 1 means every malloc is sampled
if (Opts.SampleRate != 1)
AdjustedSampleRatePlusOne = static_cast<uint32_t>(Opts.SampleRate) * 2 + 1; // SampleRate = 2500
else
AdjustedSampleRatePlusOne = 2;
// If AdjustedSampleRatePlusOne is 2, NextSampleCounter is 1, so the malloc is served by GWP-ASan
getThreadLocals()->NextSampleCounter =
((getRandomUnsigned32() % (AdjustedSampleRatePlusOne - 1)) + 1) &
ThreadLocalPackedVariables::NextSampleCounterMask; // (1U << 31) - 1
......
}
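The countdown logic in shouldSample() and init() can be modelled in isolation. The sketch below is a simplified, single-threaded model under stated assumptions: the class name `MallocSampler`, the helper `CountSampled`, and the fixed seed are hypothetical, and the NextSampleCounterMask step is omitted. It only illustrates that a sample rate of 1 samples every malloc, while a sample rate of 2500 draws gaps uniformly from 1..5000, which averages out to about 1 in 2500.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model of the per-thread sampling state (hypothetical class).
class MallocSampler {
 public:
  MallocSampler(uint32_t sample_rate, uint32_t seed)
      : adjusted_rate_plus_one_(sample_rate != 1 ? sample_rate * 2 + 1 : 2),
        random_state_(seed) {
    assert(sample_rate != 0 && seed != 0);  // xorshift32 must not start at 0
  }

  // Same shape as shouldSample(): regenerate the counter when it is 0, then
  // count down; the allocation that brings it to 0 is the sampled one.
  bool ShouldSample() {
    if (next_sample_counter_ == 0) {
      next_sample_counter_ = (NextRandom() % (adjusted_rate_plus_one_ - 1)) + 1;
    }
    return --next_sample_counter_ == 0;
  }

 private:
  uint32_t NextRandom() {  // xorshift32 with shifts 13, 17, 5
    random_state_ ^= random_state_ << 13;
    random_state_ ^= random_state_ >> 17;
    random_state_ ^= random_state_ << 5;
    return random_state_;
  }

  uint32_t adjusted_rate_plus_one_;
  uint32_t random_state_;
  uint32_t next_sample_counter_ = 0;
};

// Runs `num_mallocs` simulated allocations and counts the sampled ones.
size_t CountSampled(uint32_t sample_rate, size_t num_mallocs) {
  MallocSampler sampler(sample_rate, /*seed=*/12345);
  size_t sampled = 0;
  for (size_t i = 0; i < num_mallocs; ++i) {
    if (sampler.ShouldSample()) ++sampled;
  }
  return sampled;
}
```

Keeping a countdown per thread means the common path is a single decrement and compare, which is why the check can sit in front of every malloc.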
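4.7 Metadata
AllocationMetadata records information about each alloc and free: state information (on allocation, the address, size, slot state, and thread ID; on deallocation, the slot state and thread ID) and the call stacks. When a memory error occurs, the matching Metadata entry is located from the faulting address.
During initialisation, GWP-ASan is given the function used to collect alloc/free stack frames, android_unsafe_frame_pointer_chase, as follows: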
bool gwp_asan_initialize(const MallocDispatch* dispatch, bool*, const char*) {
prev_dispatch = dispatch;
// Default option configuration
Options Opts;
Opts.Enabled = true;
Opts.MaxSimultaneousAllocations = 32;
Opts.SampleRate = 2500; // sample rate: on average 1 in 2500 mallocs is sampled
Opts.InstallSignalHandlers = false;
Opts.InstallForkHandlers = true;
Opts.Backtrace = android_unsafe_frame_pointer_chase; // function that collects alloc/free stack frames
GuardedAlloc.init(Opts);
......
return true;
}
During alloc/free, GWP-ASan records the state information and the stack frames, as follows:
void *GuardedPoolAllocator::allocate(size_t Size, size_t Alignment) {
.....
// Record the allocation's address, size, slot state, and thread ID
Meta->RecordAllocation(UserPtr, Size);
{
ScopedLock UL(BacktraceMutex);
// Record the alloc call stack
Meta->AllocationTrace.RecordBacktrace(Backtrace);
}
return reinterpret_cast<void *>(UserPtr);
}
void GuardedPoolAllocator::deallocate(void *Ptr) {
......
// Record the slot state and thread ID at deallocation
Meta->RecordDeallocation();
if (!getThreadLocals()->RecursiveGuard) {
ScopedRecursiveGuard SRG;
ScopedLock UL(BacktraceMutex);
// Record the free call stack
Meta->DeallocationTrace.RecordBacktrace(Backtrace);
}
......
}
The alloc/free state information is recorded as follows:
// external/gwp_asan/gwp_asan/common.cpp
void AllocationMetadata::RecordAllocation(uintptr_t AllocAddr,
size_t AllocSize) {
Addr = AllocAddr;
RequestedSize = AllocSize;
IsDeallocated = false;
AllocationTrace.ThreadID = getThreadID();
DeallocationTrace.TraceSize = 0;
DeallocationTrace.ThreadID = kInvalidThreadID;
}
void AllocationMetadata::RecordDeallocation() {
IsDeallocated = true;
DeallocationTrace.ThreadID = getThreadID();
}
The alloc/free stack frames are recorded as follows:
// external/gwp_asan/gwp_asan/common.cpp
void AllocationMetadata::CallSiteInfo::RecordBacktrace(
options::Backtrace_t Backtrace) {
TraceSize = 0;
if (!Backtrace)
return;
// static constexpr size_t kMaxTraceLengthToCollect = 128; at most 128 frames are collected by default
uintptr_t UncompressedBuffer[kMaxTraceLengthToCollect];
// This calls android_unsafe_frame_pointer_chase to collect the stack frames
size_t BacktraceLength =
Backtrace(UncompressedBuffer, kMaxTraceLengthToCollect);
// Backtrace() returns the number of available frames, which may be greater
// than the number of frames in the buffer. In this case, we need to only pack
// the number of frames that are in the buffer.
if (BacktraceLength > kMaxTraceLengthToCollect)
BacktraceLength = kMaxTraceLengthToCollect;
// Compress the frames and store the result in the CompressedTrace byte array
TraceSize =
compression::pack(UncompressedBuffer, BacktraceLength, CompressedTrace,
AllocationMetadata::kStackFrameStorageBytes);
}
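Storing up to 128 frames in a 256-byte CompressedTrace buffer only works if the frames are compressed. The sketch below illustrates one way to do this, delta encoding followed by zigzag and varint encoding. This scheme is an assumption for illustration; the real compression::pack may differ in detail, and `Pack`, `Unpack`, and the other names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Zigzag: maps small negative and positive deltas to small unsigned values.
uint64_t ZigzagEncode(uint64_t v) {
  const uint64_t sign = (v >> 63) ? ~uint64_t{0} : 0;
  return (v << 1) ^ sign;
}
uint64_t ZigzagDecode(uint64_t v) { return (v >> 1) ^ (uint64_t{0} - (v & 1)); }

// Writes `value` as a base-128 varint. Returns the number of bytes written,
// or 0 if the value does not fit in `out_size` bytes.
size_t VarIntEncode(uint64_t value, uint8_t* out, size_t out_size) {
  for (size_t i = 0; i < out_size; ++i) {
    out[i] = value & 0x7f;
    value >>= 7;
    if (value == 0) return i + 1;
    out[i] |= 0x80;  // continuation bit: more bytes follow
  }
  return 0;
}

// Packs the difference between consecutive frames. Frames that no longer fit
// are dropped. Returns the number of bytes used.
size_t Pack(const uint64_t* frames, size_t n, uint8_t* packed, size_t packed_size) {
  assert(frames != nullptr && packed != nullptr);
  size_t index = 0;
  for (size_t i = 0; i < n; ++i) {
    const uint64_t diff = frames[i] - (i == 0 ? 0 : frames[i - 1]);
    const size_t len = VarIntEncode(ZigzagEncode(diff), packed + index, packed_size - index);
    if (len == 0) break;
    index += len;
  }
  return index;
}

// Reverses Pack(). Returns the number of frames recovered.
size_t Unpack(const uint8_t* packed, size_t packed_size, uint64_t* frames, size_t max_frames) {
  size_t count = 0, index = 0;
  uint64_t prev = 0;
  while (index < packed_size && count < max_frames) {
    uint64_t value = 0;
    unsigned shift = 0;
    bool complete = false;
    while (index < packed_size && shift < 64) {
      const uint8_t byte = packed[index++];
      value |= static_cast<uint64_t>(byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0) { complete = true; break; }
    }
    if (!complete) break;
    prev += ZigzagDecode(value);
    frames[count++] = prev;
  }
  return count;
}
```

Return addresses in one backtrace tend to be close together, so most deltas need only a few bytes instead of a full 8-byte pointer.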
// bionic/libc/bionic/android_unsafe_frame_pointer_chase.cpp
__attribute__((no_sanitize("address", "hwaddress"))) size_t android_unsafe_frame_pointer_chase(
uintptr_t* buf, size_t num_entries) {
// Disable MTE checks for the duration of this function, since we can't be sure that following
// next_frame pointers won't cause us to read from tagged memory. ASAN/HWASAN are disabled here
// for the same reason.
ScopedDisableMTE x;
struct frame_record {
uintptr_t next_frame, return_addr;
};
// Start address of the current stack frame
auto begin = reinterpret_cast<uintptr_t>(__builtin_frame_address(0));
// End address (top) of the thread's stack
auto end = __get_thread_stack_top();
stack_t ss;
if (sigaltstack(nullptr, &ss) == 0 && (ss.ss_flags & SS_ONSTACK)) {
end = reinterpret_cast<uintptr_t>(ss.ss_sp) + ss.ss_size;
}
size_t num_frames = 0;
while (1) {
auto* frame = reinterpret_cast<frame_record*>(begin);
if (num_frames < num_entries) {
// Store the frame's return address in buf
buf[num_frames] = __bionic_clear_pac_bits(frame->return_addr);
}
++num_frames;
if (frame->next_frame < begin + sizeof(frame_record) || frame->next_frame >= end ||
frame->next_frame % sizeof(void*) != 0) {
break;
}
begin = frame->next_frame;
}
return num_frames;
}
extern "C" __LIBC_HIDDEN__ uintptr_t __get_thread_stack_top() {
return __get_thread()->stack_top;
}
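The frame-pointer walk can be exercised on a synthetic stack without touching real registers. The sketch below is an illustration only: `FrameRecord` and `ChaseFrames` are hypothetical names, the MTE and PAC handling of the real function is left out, and the stack is an ordinary array whose records are linked by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout of one frame record: saved frame pointer, then return address.
struct FrameRecord {
  uintptr_t next_frame;
  uintptr_t return_addr;
};

// Walks the chain starting at `begin` and stores up to `num_entries` return
// addresses in `buf`. Returns the total number of frames seen, which may be
// larger than `num_entries`.
size_t ChaseFrames(uintptr_t begin, uintptr_t end, uintptr_t* buf, size_t num_entries) {
  assert(begin != 0 && begin < end);
  size_t num_frames = 0;
  while (true) {
    const auto* frame = reinterpret_cast<const FrameRecord*>(begin);
    if (num_frames < num_entries) {
      buf[num_frames] = frame->return_addr;
    }
    ++num_frames;
    // Stop when the next frame does not move up the stack, lies outside the
    // stack, or is misaligned.
    if (frame->next_frame < begin + sizeof(FrameRecord) || frame->next_frame >= end ||
        frame->next_frame % sizeof(void*) != 0) {
      break;
    }
    begin = frame->next_frame;
  }
  return num_frames;
}
```

Because the return value counts all frames, the caller has to clamp it to the buffer size, which is what RecordBacktrace does with kMaxTraceLengthToCollect.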
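4.8 Triggering tombstone log collection
4.8.1 How tombstone log collection is triggered
When a process hits a use-after-free, a heap buffer overflow, or a similar error, the kernel sends it a fault signal and the process runs the corresponding signal handler in user space. If the tombstoned side finds that the fault address falls inside the Guarded Pool Memory, it writes the faulting process's Metadata to the tombstone file.
4.8.2 How the tombstone collects logs
GWP-ASan-related log information in a tombstone:
// 1. The malloc tool name, memory error type, diff, allocation size, and address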
Cause: [GWP-ASan]: Use After Free, 0 bytes into a 32-byte allocation at 0x702de79fe0
......
// 2. The malloc and free call stacks, taken from the GWP-ASan Metadata
deallocated by thread 21588:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d204 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::deallocate(void*)+412) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 000000000004d08c /vendor/lib64/libmivhalclient.so (android::frameworks::automotive::vhal::HidlVhalClient::~HidlVhalClient()+292) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
......
allocated by thread 21405:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d03c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+600) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000050c1c /apex/com.android.runtime/lib64/bionic/libc.so ((anonymous namespace)::gwp_asan_malloc(unsigned long)+172) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
......
// 3. "memory near" dumps for GWP-ASan pages; the anon names are set by the GWP-ASan module
memory near x0 ([anon:GWP-ASan Guard Page]):
0000007275060000 0000000000000000 0000000000000000 ................
0000007275060010 0000007275060020 0000000000000001 ..ur...........
......
memory near x5 ([anon:GWP-ASan Alive Slot]):
000000727502bbf0 0000000000000000 0000000000000000 ................
000000727502bc00 7830203d20727470 3237303030303462 ptr = 0xb4000072
......
memory near x25 ([anon:GWP-ASan Metadata]):
00000072753dcf60 0000000000000000 0000000000000000 ................
00000072753dcf70 ffffffffffffffff 0000000000000000 ................
......
// 4. Memory map entries for GWP-ASan pages
memory map (135 entries): (fault address prefixed with --->)
......
00000072'7501e000-00000072'7501efff --- 0 1000 [anon:GWP-ASan Guard Page]
00000072'7501f000-00000072'7501ffff rw- 0 1000 [anon:GWP-ASan Alive Slot]
00000072'753dc000-00000072'753e0fff rw- 0 5000 [anon:GWP-ASan Metadata]
......
1) malloc tool name, memory error type, diff, allocation size, and address
void engrave_tombstone(unique_fd output_fd, unique_fd proto_fd,
unwindstack::AndroidUnwinder* unwinder,
const std::map<pid_t, ThreadInfo>& threads, pid_t target_thread,
const ProcessInfo& process_info, OpenFilesList* open_files,
std::string* amfd_data) {
// Don't copy log messages to tombstone unless this is a development device.
Tombstone tombstone;
engrave_tombstone_proto(&tombstone, unwinder, threads, target_thread, process_info, open_files);
if (proto_fd != -1) {
if (!tombstone.SerializeToFileDescriptor(proto_fd.get())) {
async_safe_format_log(ANDROID_LOG_ERROR, LOG_TAG, "failed to write proto tombstone: %s",
strerror(errno));
}
}
log_t log;
log.current_tid = target_thread;
log.crashed_tid = target_thread;
log.tfd = output_fd.get();
log.amfd_data = amfd_data;
tombstone_proto_to_text(tombstone, [&log](const std::string& line, bool should_log) {
_LOG(&log, should_log ? logtype::HEADER : logtype::LOGS, "%s\n", line.c_str());
});
}
void engrave_tombstone_proto(Tombstone* tombstone, unwindstack::AndroidUnwinder* unwinder,
const std::map<pid_t, ThreadInfo>& threads, pid_t target_thread,
const ProcessInfo& process_info, const OpenFilesList* open_files) {
Tombstone result;
......
dump_probable_cause(&result, unwinder, process_info, main_thread);
......
}
static void dump_probable_cause(Tombstone* tombstone, unwindstack::AndroidUnwinder* unwinder,
const ProcessInfo& process_info, const ThreadInfo& main_thread) {
......
GwpAsanCrashData gwp_asan_crash_data(unwinder->GetProcessMemory().get(), process_info,
main_thread);
if (gwp_asan_crash_data.CrashIsMine()) {
gwp_asan_crash_data.AddCauseProtos(tombstone, unwinder);
return;
}
......
}
// Get the crash address
GwpAsanCrashData::GwpAsanCrashData(unwindstack::Memory* process_memory,
const ProcessInfo& process_info, const ThreadInfo& thread_info) {
......
// Get the external crash address from the thread info.
crash_address_ = 0u;
if (process_info.has_fault_address) {
crash_address_ = process_info.untagged_fault_address;
}
// Ensure the error belongs to GWP-ASan.
if (!__gwp_asan_error_is_mine(&state_, crash_address_)) return;
is_gwp_asan_responsible_ = true;
thread_id_ = thread_info.tid;
// Grab the internal error address, if it exists.
uintptr_t internal_crash_address = __gwp_asan_get_internal_crash_address(&state_, crash_address_);
if (internal_crash_address) {
crash_address_ = internal_crash_address;
}
// Get other information from the internal state.
error_ = __gwp_asan_diagnose_error(&state_, metadata_.get(), crash_address_);
error_string_ = gwp_asan::ErrorToString(error_);
responsible_allocation_ = __gwp_asan_get_metadata(&state_, metadata_.get(), crash_address_);
}
Get a cause object from the tombstone, then pass the cause object and the crash address on:
void GwpAsanCrashData::AddCauseProtos(Tombstone* tombstone,
unwindstack::AndroidUnwinder* unwinder) const {
Cause* cause = tombstone->add_causes();
MemoryError* memory_error = cause->mutable_memory_error();
HeapObject* heap_object = memory_error->mutable_heap();
......
heap_object->set_address(__gwp_asan_get_allocation_address(responsible_allocation_));
heap_object->set_size(__gwp_asan_get_allocation_size(responsible_allocation_));
......
set_human_readable_cause(cause, crash_address_);
}
Get and print the malloc tool name, memory error type, diff, allocation size, and address:
void set_human_readable_cause(Cause* cause, uint64_t fault_addr) {
if (!cause->has_memory_error() || !cause->memory_error().has_heap()) {
return;
}
const MemoryError& memory_error = cause->memory_error();
const HeapObject& heap_object = memory_error.heap();
// Get the malloc tool name
const char *tool_str;
switch (memory_error.tool()) {
case MemoryError_Tool_GWP_ASAN:
tool_str = "GWP-ASan";
break;
case MemoryError_Tool_SCUDO:
tool_str = "MTE";
break;
default:
tool_str = "Unknown";
break;
}
// Get the memory error type
const char *error_type_str;
switch (memory_error.type()) {
case MemoryError_Type_USE_AFTER_FREE:
error_type_str = "Use After Free";
break;
case MemoryError_Type_DOUBLE_FREE:
error_type_str = "Double Free";
break;
case MemoryError_Type_INVALID_FREE:
error_type_str = "Invalid (Wild) Free";
break;
case MemoryError_Type_BUFFER_OVERFLOW:
error_type_str = "Buffer Overflow";
break;
case MemoryError_Type_BUFFER_UNDERFLOW:
error_type_str = "Buffer Underflow";
break;
default:
cause->set_human_readable(
StringPrintf("[%s]: Unknown error occurred at 0x%" PRIx64 ".", tool_str, fault_addr));
return;
}
// Distance (offset) between the faulting address and the allocation
uint64_t diff;
const char* location_str;
if (fault_addr < heap_object.address()) {
// Buffer Underflow, 6 bytes left of a 41-byte allocation at 0xdeadbeef.
location_str = "left of";
diff = heap_object.address() - fault_addr;
} else if (fault_addr - heap_object.address() < heap_object.size()) {
// Use After Free, 40 bytes into a 41-byte allocation at 0xdeadbeef.
location_str = "into";
diff = fault_addr - heap_object.address();
} else {
// Buffer Overflow, 6 bytes right of a 41-byte allocation at 0xdeadbeef.
location_str = "right of";
diff = fault_addr - heap_object.address() - heap_object.size();
}
// Suffix of 'bytes', i.e. 4 bytes' vs. '1 byte'.
const char* byte_suffix = "s";
if (diff == 1) {
byte_suffix = "";
}
// Print the malloc tool name, memory error type, diff, allocation size, and address
cause->set_human_readable(StringPrintf(
"[%s]: %s, %" PRIu64 " byte%s %s a %" PRIu64 "-byte allocation at 0x%" PRIx64, tool_str,
error_type_str, diff, byte_suffix, location_str, heap_object.size(), heap_object.address()));
}
Code flow (data type conversion): tombstone ---> cause ---> char*
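The location logic in set_human_readable_cause can be tried out on its own. The following sketch uses a hypothetical function `DescribeFault` that takes plain integers instead of the protobuf objects; the format string matches the one shown above.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-alone version of the "Cause:" text: classifies the fault
// address as left of, into, or right of the allocation and formats the line.
std::string DescribeFault(const char* tool, const char* error, uint64_t fault_addr,
                          uint64_t alloc_addr, uint64_t alloc_size) {
  assert(tool != nullptr && error != nullptr);
  const char* location;
  uint64_t diff;
  if (fault_addr < alloc_addr) {
    location = "left of";  // underflow
    diff = alloc_addr - fault_addr;
  } else if (fault_addr - alloc_addr < alloc_size) {
    location = "into";  // inside the allocation, e.g. use-after-free
    diff = fault_addr - alloc_addr;
  } else {
    location = "right of";  // overflow
    diff = fault_addr - alloc_addr - alloc_size;
  }
  char buf[192];
  snprintf(buf, sizeof(buf),
           "[%s]: %s, %" PRIu64 " byte%s %s a %" PRIu64 "-byte allocation at 0x%" PRIx64,
           tool, error, diff, diff == 1 ? "" : "s", location, alloc_size, alloc_addr);
  return buf;
}
```

Calling it with the values from the sample tombstone (fault address and allocation address both 0x702de79fe0, size 32) reproduces the "Cause:" line shown in 4.8.2.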
2) Printing the malloc and free call stacks
static void print_main_thread(CallbackType callback, const Tombstone& tombstone,
const Thread& thread) {
......
for (const Cause& cause : tombstone.causes()) {
if (tombstone.causes_size() > 1) {
CBS("");
CBL("Cause: %s", cause.human_readable().c_str());
}
if (cause.has_memory_error() && cause.memory_error().has_heap()) {
const HeapObject& heap_object = cause.memory_error().heap();
if (heap_object.deallocation_backtrace_size() != 0) {
CBS("");
CBL("deallocated by thread %" PRIu64 ":", heap_object.deallocation_tid());
print_backtrace(callback, tombstone, heap_object.deallocation_backtrace(), true);
}
if (heap_object.allocation_backtrace_size() != 0) {
CBS("");
CBL("allocated by thread %" PRIu64 ":", heap_object.allocation_tid());
print_backtrace(callback, tombstone, heap_object.allocation_backtrace(), true);
}
}
}
......
}
static void print_backtrace(CallbackType callback, const Tombstone& tombstone,
const google::protobuf::RepeatedPtrField<BacktraceFrame>& backtrace,
bool should_log) {
int index = 0;
for (const auto& frame : backtrace) {
std::string function;
// Collect the function name and offset
if (!frame.function_name().empty()) {
function =
StringPrintf(" (%s+%" PRId64 ")", frame.function_name().c_str(), frame.function_offset());
}
// Collect the build ID
std::string build_id;
if (!frame.build_id().empty()) {
build_id = StringPrintf(" (BuildId: %s)", frame.build_id().c_str());
}
// Collect the PC address and file name
std::string line =
StringPrintf(" #%02d pc %0*" PRIx64 " %s", index++, pointer_width(tombstone) * 2,
frame.rel_pc(), frame.file_name().c_str());
if (frame.file_map_offset() != 0) {
line += StringPrintf(" (offset 0x%" PRIx64 ")", frame.file_map_offset());
}
line += function + build_id;
CB(should_log, "%s", line.c_str());
}
}
3) "memory near" dumps for GWP-ASan pages
static void print_thread_memory_dump(CallbackType callback, const Tombstone& tombstone,
const Thread& thread) {
static constexpr size_t bytes_per_line = 16;
static_assert(bytes_per_line == kTagGranuleSize);
int word_size = pointer_width(tombstone);
for (const auto& mem : thread.memory_dump()) {
CBS("");
if (mem.mapping_name().empty()) {
CBS("memory near %s:", mem.register_name().c_str());
} else {
CBS("memory near %s (%s):", mem.register_name().c_str(), mem.mapping_name().c_str());
}
uint64_t addr = mem.begin_address();
for (size_t offset = 0; offset < mem.memory().size(); offset += bytes_per_line) {
uint64_t tagged_addr = addr;
if (mem.has_arm_mte_metadata() &&
mem.arm_mte_metadata().memory_tags().size() > offset / kTagGranuleSize) {
tagged_addr |=
static_cast<uint64_t>(mem.arm_mte_metadata().memory_tags()[offset / kTagGranuleSize])
<< 56;
}
std::string line = StringPrintf(" %0*" PRIx64, word_size * 2, tagged_addr + offset);
size_t bytes = std::min(bytes_per_line, mem.memory().size() - offset);
for (size_t i = 0; i < bytes; i += word_size) {
uint64_t word = 0;
// Assumes little-endian, but what doesn't?
memcpy(&word, mem.memory().data() + offset + i, word_size);
StringAppendF(&line, " %0*" PRIx64, word_size * 2, word);
}
char ascii[bytes_per_line + 1];
memset(ascii, '.', sizeof(ascii));
ascii[bytes_per_line] = '\0';
for (size_t i = 0; i < bytes; ++i) {
uint8_t byte = mem.memory()[offset + i];
if (byte >= 0x20 && byte < 0x7f) {
ascii[i] = byte;
}
}
CBS("%s %s", line.c_str(), ascii);
}
}
}
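One line of the "memory near" dump can be reproduced with a few lines of code. The sketch below is a simplified illustration: `FormatDumpLine` is a hypothetical name, it assumes 8-byte words and no MTE tags, and the exact column spacing is chosen for this example rather than copied from debuggerd. Words are assembled byte by byte in little-endian order so the result does not depend on the host.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats up to 16 bytes as: address, 64-bit little-endian words, ASCII view.
std::string FormatDumpLine(uint64_t addr, const uint8_t* data, size_t size) {
  constexpr size_t kBytesPerLine = 16;
  constexpr size_t kWordSize = 8;
  assert(data != nullptr && size <= kBytesPerLine && size % kWordSize == 0);

  char tmp[32];
  snprintf(tmp, sizeof(tmp), "%016" PRIx64, addr);
  std::string line = tmp;

  for (size_t i = 0; i < size; i += kWordSize) {
    uint64_t word = 0;
    for (size_t b = 0; b < kWordSize; ++b) {
      word |= static_cast<uint64_t>(data[i + b]) << (8 * b);  // little-endian
    }
    snprintf(tmp, sizeof(tmp), " %016" PRIx64, word);
    line += tmp;
  }

  line += "  ";
  for (size_t i = 0; i < size; ++i) {
    // Printable ASCII is shown as-is, everything else as '.'.
    line += (data[i] >= 0x20 && data[i] < 0x7f) ? static_cast<char>(data[i]) : '.';
  }
  return line;
}
```

Feeding it the 16 bytes of the string "ptr = 0xb4000072" at address 0x727502bc00 yields the same two words as the "GWP-ASan Alive Slot" line in the sample tombstone, which shows why the hex columns read backwards relative to the ASCII column.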
4) Memory map entries for GWP-ASan pages
static void print_memory_maps(CallbackType callback, const Tombstone& tombstone) {
int word_size = pointer_width(tombstone);
const auto format_pointer = [word_size](uint64_t ptr) -> std::string {
if (word_size == 8) {
uint64_t top = ptr >> 32;
uint64_t bottom = ptr & 0xFFFFFFFF;
return StringPrintf("%08" PRIx64 "'%08" PRIx64, top, bottom);
}
return StringPrintf("%0*" PRIx64, word_size * 2, ptr);
};
std::string memory_map_header =
StringPrintf("memory map (%d %s):", tombstone.memory_mappings().size(),
tombstone.memory_mappings().size() == 1 ? "entry" : "entries");
const Signal& signal_info = tombstone.signal_info();
bool has_fault_address = signal_info.has_fault_address();
uint64_t fault_address = untag_address(signal_info.fault_address());
bool preamble_printed = false;
bool printed_fault_address_marker = false;
for (const auto& map : tombstone.memory_mappings()) {
if (!preamble_printed) {
preamble_printed = true;
if (has_fault_address) {
if (fault_address < map.begin_address()) {
memory_map_header +=
StringPrintf("\n--->Fault address falls at %s before any mapped regions",
format_pointer(fault_address).c_str());
printed_fault_address_marker = true;
} else {
memory_map_header += " (fault address prefixed with --->)";
}
}
CBS("%s", memory_map_header.c_str());
}
std::string line = " ";
if (has_fault_address && !printed_fault_address_marker) {
if (fault_address < map.begin_address()) {
printed_fault_address_marker = true;
CBS("--->Fault address falls at %s between mapped regions",
format_pointer(fault_address).c_str());
} else if (fault_address >= map.begin_address() && fault_address < map.end_address()) {
printed_fault_address_marker = true;
line = "--->";
}
}
// Start address to end address of the mapping
StringAppendF(&line, "%s-%s", format_pointer(map.begin_address()).c_str(),
format_pointer(map.end_address() - 1).c_str());
// r/w/x permissions of this mapping
StringAppendF(&line, " %s%s%s", map.read() ? "r" : "-", map.write() ? "w" : "-",
map.execute() ? "x" : "-");
// Offset and mapping size
StringAppendF(&line, " %8" PRIx64 " %8" PRIx64, map.offset(),
map.end_address() - map.begin_address());
// Name of this mapping
if (!map.mapping_name().empty()) {
StringAppendF(&line, " %s", map.mapping_name().c_str());
if (!map.build_id().empty()) {
StringAppendF(&line, " (BuildId: %s)", map.build_id().c_str());
}
if (map.load_bias() != 0) {
StringAppendF(&line, " (load bias 0x%" PRIx64 ")", map.load_bias());
}
}
CBS("%s", line.c_str());
}
if (has_fault_address && !printed_fault_address_marker) {
CBS("--->Fault address falls at %s after any mapped regions",
format_pointer(fault_address).c_str());
}
}
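The address column of the memory map uses the format_pointer lambda above, which splits 64-bit addresses into two halves separated by an apostrophe. A small stand-alone sketch, with the hypothetical names `FormatPointer` and `FormatRange`:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// 64-bit pointers are printed as hhhhhhhh'llllllll, 32-bit ones as 8 hex digits.
std::string FormatPointer(uint64_t ptr, int word_size) {
  assert(word_size == 4 || word_size == 8);
  char buf[32];
  if (word_size == 8) {
    snprintf(buf, sizeof(buf), "%08" PRIx64 "'%08" PRIx64, ptr >> 32, ptr & 0xFFFFFFFF);
  } else {
    snprintf(buf, sizeof(buf), "%0*" PRIx64, word_size * 2, ptr);
  }
  return buf;
}

// A mapping [begin, end) is printed with an inclusive end address.
std::string FormatRange(uint64_t begin, uint64_t end, int word_size) {
  assert(begin < end);
  return FormatPointer(begin, word_size) + "-" + FormatPointer(end - 1, word_size);
}
```

For the guard page in the sample tombstone, the half-open range 0x727501e000..0x727501f000 is printed as 00000072'7501e000-00000072'7501efff.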
4.9 Core data structures
// external/gwp_asan/gwp_asan/common.h
// Defines the memory error types
enum class Error {
UNKNOWN,
USE_AFTER_FREE,
DOUBLE_FREE,
INVALID_FREE,
BUFFER_OVERFLOW,
BUFFER_UNDERFLOW
};
// Records the alloc/free call stack information for a slot page
struct AllocationMetadata {
static constexpr size_t kStackFrameStorageBytes = 256;
static constexpr size_t kMaxTraceLengthToCollect = 128;
void RecordAllocation(uintptr_t Addr, size_t RequestedSize);
void RecordDeallocation();
struct CallSiteInfo {
// Record the current backtrace to this callsite.
// Backtrace is the function that collects the stack frames
void RecordBacktrace(options::Backtrace_t Backtrace);
// The compressed backtrace to the allocation/deallocation.
// Holds the compressed stack frame data
uint8_t CompressedTrace[kStackFrameStorageBytes];
// The thread ID for this trace, or kInvalidThreadID if not available.
uint64_t ThreadID = kInvalidThreadID;
// The size of the compressed trace (in bytes). Zero indicates that no
// trace was collected.
size_t TraceSize = 0;
};
uintptr_t Addr = 0;
// Represents the actual size of the allocation.
size_t RequestedSize = 0;
CallSiteInfo AllocationTrace;
CallSiteInfo DeallocationTrace;
// Whether this allocation has been deallocated yet.
bool IsDeallocated = false;
};
// Records the state of the slot pages
struct AllocatorState {
constexpr AllocatorState() {}
// Returns whether the provided pointer is a current sampled allocation that
// is owned by this pool.
GWP_ASAN_ALWAYS_INLINE bool pointerIsMine(const void *Ptr) const {
uintptr_t P = reinterpret_cast<uintptr_t>(Ptr);
return P < GuardedPagePoolEnd && GuardedPagePool <= P;
}
// Returns the address of the N-th guarded slot.
uintptr_t slotToAddr(size_t N) const;
// Returns the largest allocation that is supported by this pool.
size_t maximumAllocationSize() const;
// Gets the nearest slot to the provided address.
size_t getNearestSlot(uintptr_t Ptr) const;
// Returns whether the provided pointer is a guard page or not. The pointer
// must be within memory owned by this pool, else the result is undefined.
bool isGuardPage(uintptr_t Ptr) const;
// The number of guarded slots that this pool holds.
size_t MaxSimultaneousAllocations = 0;
// Pointer to the pool of guarded slots. Note that this points to the start of
// the pool (which is a guard page), not a pointer to the first guarded page.
uintptr_t GuardedPagePool = 0;
uintptr_t GuardedPagePoolEnd = 0;
// Cached page size for this system in bytes.
// Records the platform's page size, i.e. the size of one kernel page (4 KB here)
size_t PageSize = 0;
// The type and address of an internally-detected failure. For INVALID_FREE
// and DOUBLE_FREE, these errors are detected in GWP-ASan, which will set
// these values and terminate the process.
Error FailureType = Error::UNKNOWN;
uintptr_t FailureAddress = 0;
};
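The helper methods of AllocatorState are easier to follow with a concrete layout in mind. The sketch below is a simplified model, not the GWP-ASan source: it assumes every slot is exactly one page and that the pool strictly alternates guard page, slot, guard page, and so on, and it ignores any extra pages the real pool reserves. `PoolLayout` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified pool: [guard][slot 0][guard][slot 1] ... [slot N-1][guard]
struct PoolLayout {
  uintptr_t pool_begin;  // address of the first guard page
  size_t page_size;
  size_t num_slots;

  uintptr_t PoolEnd() const { return pool_begin + page_size * (2 * num_slots + 1); }

  // True if the pointer lies anywhere inside the pool, guard pages included.
  bool PointerIsMine(uintptr_t p) const { return pool_begin <= p && p < PoolEnd(); }

  // Start address of the N-th slot: skip N slots and N + 1 guard pages.
  uintptr_t SlotToAddr(size_t n) const {
    assert(n < num_slots);
    return pool_begin + page_size * (2 * n + 1);
  }

  // Even page indices are guard pages, odd ones are slots.
  bool IsGuardPage(uintptr_t p) const {
    assert(PointerIsMine(p));
    return ((p - pool_begin) / page_size) % 2 == 0;
  }
};
```

With a pool at 0x10000 and 4 KB pages, slot 0 starts at 0x11000 and slot 1 at 0x13000, and any address in the page between them is a guard page.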
4.10 UML
TBD
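4.11 Code sequence diagrams
4.11.1 Initialisation
GWP-ASan process sampling is enabled by default:
When a process starts and loads libc.so, it runs the __libc_init_malloc initialisation flow, which by default samples 1 in 128 processes for GWP-ASan. The code is as follows: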
static void MallocInitImpl(libc_globals* globals) {
......
bool gwp_asan_enabled = MaybeInitGwpAsanFromLibc(globals);
......
}
bool MaybeInitGwpAsanFromLibc(libc_globals* globals) {
// The zygote process does not take part in GWP-ASan detection
static const char kAppProcessNamePrefix[] = "app_process";
const char* progname = getprogname();
if (strncmp(progname, kAppProcessNamePrefix, sizeof(kAppProcessNamePrefix) - 1) == 0)
return false;
android_mallopt_gwp_asan_options_t mallopt_options;
mallopt_options.program_name = progname;
// By default, sample 1 in 128 processes for GWP-ASan
mallopt_options.desire = Action::TURN_ON_WITH_SAMPLING;
return MaybeInitGwpAsan(globals, mallopt_options);
}
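When an app process starts, Recoverable GWP-ASan is first enabled through zygote, and only afterwards is libc.so loaded; by then GwpAsanInitialized has already been set to true. See 6.1 for details.
4.11.2 Memory allocation and deallocation
4.11.3 Fault triggering and tombstone log collection
a. Setting the global variables
While the process loads and initialises libc, the global variables are set so that the tombstone module can use them: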
bool MaybeInitGwpAsan(libc_globals* globals,
const android_mallopt_gwp_asan_options_t& mallopt_options) {
......
__libc_shared_globals()->gwp_asan_state = GuardedAlloc.getAllocatorState();
__libc_shared_globals()->gwp_asan_metadata = GuardedAlloc.getMetadataRegion();
__libc_shared_globals()->debuggerd_needs_gwp_asan_recovery = NeedsGwpAsanRecovery;
__libc_shared_globals()->debuggerd_gwp_asan_pre_crash_report = GwpAsanPreCrashHandler;
__libc_shared_globals()->debuggerd_gwp_asan_post_crash_report = GwpAsanPostCrashHandler;
......
}
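b. A memory error is detected and a signal is raised
INVALID_FREE and DOUBLE_FREE errors are detected in the free path, which deliberately triggers a memory fault so that the MMU and the kernel catch it (and send a fault signal to the offending process). The code is as follows: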
void GuardedPoolAllocator::deallocate(void *Ptr) {
......
if (Meta->Addr != UPtr) {
raiseInternallyDetectedError(UPtr, Error::INVALID_FREE);
return;
}
if (Meta->IsDeallocated) {
raiseInternallyDetectedError(UPtr, Error::DOUBLE_FREE);
return;
}
......
}
void GuardedPoolAllocator::raiseInternallyDetectedError(uintptr_t Address,
Error E) {
......
// Access the last page of the pool (the Internal-Detector page, which is inaccessible) to trigger a page fault
volatile char *p =
reinterpret_cast<char *>(State.internallyDetectedErrorFaultAddress());
*p = 0;
......
}
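USE_AFTER_FREE, BUFFER_OVERFLOW, and BUFFER_UNDERFLOW are detected directly by the MMU and the kernel, which send a fault signal to the offending process; when the process returns to user space it runs the signal handler debuggerd_signal_handler.
c. Signal handling and log collection
After some preparation, the handler creates a child process that runs the crash_dump program to collect the logs.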
static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
......
gwp_asan_callbacks_t gwp_asan_callbacks = {};
if (g_callbacks.get_gwp_asan_callbacks != nullptr) {
gwp_asan_callbacks = g_callbacks.get_gwp_asan_callbacks();
if (signal_number == SIGSEGV && signal_has_si_addr(info) &&
gwp_asan_callbacks.debuggerd_needs_gwp_asan_recovery &&
gwp_asan_callbacks.debuggerd_gwp_asan_pre_crash_report &&
gwp_asan_callbacks.debuggerd_gwp_asan_post_crash_report &&
gwp_asan_callbacks.debuggerd_needs_gwp_asan_recovery(info->si_addr)) {
gwp_asan_callbacks.debuggerd_gwp_asan_pre_crash_report(info->si_addr);
process_info.recoverable_gwp_asan_crash = true;
}
}
if (no_new_privs && process_info.recoverable_gwp_asan_crash) {
gwp_asan_callbacks.debuggerd_gwp_asan_post_crash_report(info->si_addr);
return;
}
......
// Essentially pthread_create without CLONE_FILES, so we still work during file descriptor
// exhaustion.
pid_t child_pid =
clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
&thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
......
}
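4.12 How memory errors are detected
4.12.1 BUFFER_OVERFLOW and BUFFER_UNDERFLOW detection
Allocation is page-granular: when a malloc request is smaller than a page, a full page is reserved, the allocation is randomly left-aligned or right-aligned within it, and the aligned start address is returned. Left alignment detects underflow, because an underflowing access lands in the Guard Page on the left; conversely, right alignment detects overflow.
One limitation: with right alignment, accesses to the gap between the left Guard Page and the start of the allocation go undetected, because GWP-ASan relies solely on page read/write permissions to decide whether an access is legal. Left-aligned allocations have the same problem on their right side.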
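The blind spots described above can be quantified with a small calculation. The sketch below is an illustration under assumptions: `PlaceInSlot` and `Placement` are hypothetical names, the slot is one page starting at a page-aligned address, and the right-aligned pointer is simply rounded down to the requested alignment, which may differ from what GWP-ASan actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Placement {
  uintptr_t user_ptr;   // address returned to the caller
  size_t slack_before;  // bytes between slot start and user_ptr (underflow not caught)
  size_t slack_after;   // bytes between allocation end and slot end (overflow not caught)
};

// Places an allocation of `size` bytes inside a one-page slot, either at the
// start of the page (left-aligned) or as close to the end as `alignment` allows.
Placement PlaceInSlot(uintptr_t slot_start, size_t page_size, size_t size,
                      size_t alignment, bool right_align) {
  assert(size > 0 && size <= page_size);
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);  // power of two
  uintptr_t user_ptr = slot_start;
  if (right_align) {
    user_ptr = (slot_start + page_size - size) & ~static_cast<uintptr_t>(alignment - 1);
  }
  Placement p;
  p.user_ptr = user_ptr;
  p.slack_before = user_ptr - slot_start;
  p.slack_after = slot_start + page_size - (user_ptr + size);
  return p;
}
```

For a 100-byte allocation in a 4 KB slot with 16-byte alignment, the left-aligned placement leaves 3996 bytes after the allocation in which an overflow stays inside the page, while the right-aligned placement leaves 3984 bytes before it and still 12 bytes after it because of the alignment rounding.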
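4.12.2 USE_AFTER_FREE detection
When an allocated block is freed, the free path marks its page as neither readable nor writable (by mapping it PROT_NONE, see 4.13), so any later access to that memory is caught and classified as USE_AFTER_FREE.
4.12.3 DOUBLE_FREE detection
When a block allocated by GWP-ASan is freed, the system marks its page as neither readable nor writable. If the block is freed again, the free path accesses the last, inaccessible page of the pool, so the MMU and the kernel raise a fault signal for the offending process.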
void GuardedPoolAllocator::deallocate(void *Ptr) {
......
if (Meta->Addr != UPtr) {
raiseInternallyDetectedError(UPtr, Error::INVALID_FREE);
return;
}
if (Meta->IsDeallocated) {
raiseInternallyDetectedError(UPtr, Error::DOUBLE_FREE);
return;
}
......
}
// Raises the fault on the pool's last page, then makes that page inaccessible again so later errors can still be detected
void GuardedPoolAllocator::raiseInternallyDetectedError(uintptr_t Address,
Error E) {
disable();
State.FailureType = E;
State.FailureAddress = Address;
// Raise a SEGV
volatile char *p =
reinterpret_cast<char *>(State.internallyDetectedErrorFaultAddress());
*p = 0;
assert(State.FailureType == Error::UNKNOWN);
assert(State.FailureAddress == 0u);
deallocateInGuardedPool(
reinterpret_cast<void *>(getPageAddr(
State.internallyDetectedErrorFaultAddress(), State.PageSize)),
State.PageSize);
// And now we're done with patching ourselves back up, enable the allocator.
enable();
}
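4.12.4 INVALID_FREE detection
Raised when the address passed to free differs from the address returned at allocation; the mechanism is the same as above.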
void GuardedPoolAllocator::deallocate(void *Ptr) {
......
if (Meta->Addr != UPtr) {
raiseInternallyDetectedError(UPtr, Error::INVALID_FREE);
return;
}
if (Meta->IsDeallocated) {
raiseInternallyDetectedError(UPtr, Error::DOUBLE_FREE);
return;
}
......
}
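The order of the two checks in deallocate() matters: the address is compared first, then the deallocated flag. The sketch below models just that decision without raising a fault. `SlotMetadata` and `CheckFree` are hypothetical names, and returning Error::UNKNOWN to mean "no error" is a convention of this sketch only.

```cpp
#include <cassert>
#include <cstdint>

enum class Error { UNKNOWN, USE_AFTER_FREE, DOUBLE_FREE, INVALID_FREE,
                   BUFFER_OVERFLOW, BUFFER_UNDERFLOW };

// Minimal stand-in for the fields of AllocationMetadata used by the checks.
struct SlotMetadata {
  uintptr_t addr = 0;           // address handed out at allocation time
  bool is_deallocated = false;  // set on the first valid free
};

// Returns the error deallocate() would report for free(ptr), or
// Error::UNKNOWN if the free is valid (in which case the slot is marked freed).
Error CheckFree(SlotMetadata& meta, uintptr_t ptr) {
  assert(meta.addr != 0);
  if (meta.addr != ptr) return Error::INVALID_FREE;
  if (meta.is_deallocated) return Error::DOUBLE_FREE;
  meta.is_deallocated = true;
  return Error::UNKNOWN;
}
```

Freeing an interior pointer reports INVALID_FREE even if the slot was already freed, because the address comparison comes first.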
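4.13 How the kernel sets page access permissions
GWP-ASan detects illegal accesses by controlling page access permissions. On allocation it calls the mprotect system call, trapping into the kernel to make the page accessible; on free it calls mmap to remap the page as inaccessible (PROT_NONE).
// On GWP-ASan allocation, the page is made accessible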
void GuardedPoolAllocator::allocateInGuardedPool(void *Ptr, size_t Size) const {
assert((reinterpret_cast<uintptr_t>(Ptr) % State.PageSize) == 0);
assert((Size % State.PageSize) == 0);
// Make the memory accessible: PROT_READ | PROT_WRITE
Check(mprotect(Ptr, Size, PROT_READ | PROT_WRITE) == 0,
"Failed to allocate in guarded pool allocator memory");
MaybeSetMappingName(Ptr, Size, kGwpAsanAliveSlotName);
}
// On GWP-ASan free, the page is made inaccessible
void GuardedPoolAllocator::deallocateInGuardedPool(void *Ptr,
size_t Size) const {
assert((reinterpret_cast<uintptr_t>(Ptr) % State.PageSize) == 0);
assert((Size % State.PageSize) == 0);
// mmap() a PROT_NONE page over the address to release it to the system, if
// we used mprotect() here the system would count pages in the quarantine
// against the RSS.
// Make the memory inaccessible: PROT_NONE
Check(mmap(Ptr, Size, PROT_NONE, MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE, -1,
0) != MAP_FAILED,
"Failed to deallocate in guarded pool allocator memory");
MaybeSetMappingName(Ptr, Size, kGwpAsanGuardPageName);
}
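The same two system calls can be tried in a small user-space program. The sketch below is an illustration for Linux/POSIX, not GWP-ASan code: `SlotLifecycleDemo` is a hypothetical name, it handles a single page, and it never touches the page while it is PROT_NONE, so it does not fault.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Walks one page through the states reserved -> allocated -> freed.
// Returns true if every step behaved as expected.
bool SlotLifecycleDemo() {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  assert(page > 0);

  // Reserve: an unused slot starts out inaccessible.
  void* slot = mmap(nullptr, page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (slot == MAP_FAILED) return false;

  // Allocate: mprotect() makes the page readable and writable.
  if (mprotect(slot, page, PROT_READ | PROT_WRITE) != 0) {
    munmap(slot, page);
    return false;
  }
  volatile char* p = static_cast<volatile char*>(slot);
  p[0] = 42;
  const bool wrote = (p[0] == 42);

  // Free: map a fresh PROT_NONE page over the same address. Unlike
  // mprotect(), this also drops the old contents, so the page no longer
  // counts towards RSS.
  void* remapped = mmap(slot, page, PROT_NONE,
                        MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  const bool same_address = (remapped == slot);

  munmap(slot, page);
  return wrote && same_address;
}
```

After the final mmap, any read or write through the old pointer would raise SIGSEGV, which is the behaviour use-after-free detection relies on.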
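When a user process allocates through GWP-ASan, mprotect is called and traps into the kernel, which looks up the system call table and runs the SYSCALL_DEFINE3 mprotect function; the page access attributes are then set through the VMA.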
// kernel_platform/msm-kernel/mm/mprotect.c
SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
unsigned long, prot)
{
return do_mprotect_pkey(start, len, prot, -1);
}
static int do_mprotect_pkey(unsigned long start, size_t len,
unsigned long prot, int pkey)
{
unsigned long nstart, end, tmp, reqprot;
struct vm_area_struct *vma, *prev;
int error;
const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP);
const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
(prot & PROT_READ);
struct mmu_gather tlb;
MA_STATE(mas, &current->mm->mm_mt, 0, 0);
start = untagged_addr(start);
prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP);
if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
return -EINVAL;
if (start & ~PAGE_MASK)
return -EINVAL;
if (!len)
return 0;
len = PAGE_ALIGN(len);
end = start + len;
if (end <= start)
return -ENOMEM;
if (!arch_validate_prot(prot, start))
return -EINVAL;
reqprot = prot;
if (mmap_write_lock_killable(current->mm))
return -EINTR;
/*
* If userspace did not allocate the pkey, do not let
* them use it here.
*/
error = -EINVAL;
if ((pkey != -1) && !mm_pkey_is_allocated(current->mm, pkey))
goto out;
mas_set(&mas, start);
// Find the VMA
vma = mas_find(&mas, ULONG_MAX);
error = -ENOMEM;
if (!vma)
goto out;
if (unlikely(grows & PROT_GROWSDOWN)) {
if (vma->vm_start >= end)
goto out;
start = vma->vm_start;
error = -EINVAL;
if (!(vma->vm_flags & VM_GROWSDOWN))
goto out;
} else {
if (vma->vm_start > start)
goto out;
if (unlikely(grows & PROT_GROWSUP)) {
end = vma->vm_end;
error = -EINVAL;
if (!(vma->vm_flags & VM_GROWSUP))
goto out;
}
}
if (start > vma->vm_start)
prev = vma;
else
prev = mas_prev(&mas, 0);
tlb_gather_mmu(&tlb, current->mm);
for (nstart = start ; ; ) {
unsigned long mask_off_old_flags;
unsigned long newflags;
int new_vma_pkey;
/* Here we know that vma->vm_start <= nstart < vma->vm_end. */
/* Does the application expect PROT_READ to imply PROT_EXEC */
if (rier && (vma->vm_flags & VM_MAYEXEC))
prot |= PROT_EXEC;
/*
* Each mprotect() call explicitly passes r/w/x permissions.
* If a permission is not passed to mprotect(), it must be
* cleared from the VMA.
*/
mask_off_old_flags = VM_READ | VM_WRITE | VM_EXEC |
VM_FLAGS_CLEAR;
new_vma_pkey = arch_override_mprotect_pkey(vma, prot, pkey);
newflags = calc_vm_prot_bits(prot, new_vma_pkey);
newflags |= (vma->vm_flags & ~mask_off_old_flags);
/* newflags >> 4 shift VM_MAY% in place of VM_% */
if ((newflags & ~(newflags >> 4)) & VM_ACCESS_FLAGS) {
error = -EACCES;
break;
}
/* Allow architectures to sanity-check the new flags */
if (!arch_validate_flags(newflags)) {
error = -EINVAL;
break;
}
error = security_file_mprotect(vma, reqprot, prot);
if (error)
break;
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
if (vma->vm_ops && vma->vm_ops->mprotect) {
// Set the memory access attributes (the VMA's own hook runs first, then mprotect_fixup applies the new flags)
error = vma->vm_ops->mprotect(vma, nstart, tmp, newflags);
if (error)
break;
}
error = mprotect_fixup(&tlb, vma, &prev, nstart, tmp, newflags);
if (error)
break;
nstart = tmp;
if (nstart < prev->vm_end)
nstart = prev->vm_end;
if (nstart >= end)
break;
vma = find_vma(current->mm, prev->vm_end);
if (!vma || vma->vm_start != nstart) {
error = -ENOMEM;
break;
}
prot = reqprot;
}
tlb_finish_mmu(&tlb);
out:
mmap_write_unlock(current->mm);
return error;
}
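When a user process frees memory through GWP-ASan, it calls mmap, traps into the kernel, looks up the system call table, and runs SYSCALL_DEFINE6(mmap), which sets the access permissions of the memory pages.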
// msm-kernel/arch/arm64/kernel/sys.c
SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
unsigned long, prot, unsigned long, flags,
unsigned long, fd, unsigned long, off)
{
if (offset_in_page(off) != 0)
return -EINVAL;
return ksys_mmap_pgoff(addr, len, prot, flags, fd, off >> PAGE_SHIFT);
}
// Finally, do_mmap is called to set the access permissions of the memory pages
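The user-space side of this call can be sketched as follows. This is a minimal POSIX sketch that assumes Linux; `DecommitAndGuard` and `IsReadable` are hypothetical helper names, not GWP-ASan functions. It remaps a page in place with PROT_NONE and MAP_FIXED, which releases the old page and makes any later access fault, and then probes the page through a pipe write so that the kernel reports EFAULT instead of delivering SIGSEGV.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: replace [addr, addr + size) with a fresh PROT_NONE
// anonymous mapping. The old pages are released and later accesses fault.
bool DecommitAndGuard(void* addr, size_t size) {
  void* r = mmap(addr, size, PROT_NONE,
                 MAP_FIXED | MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  return r != MAP_FAILED;
}

// Hypothetical helper: ask the kernel to copy one byte from addr into a pipe.
// If addr is not readable, write() fails with EFAULT rather than crashing us.
bool IsReadable(const void* addr) {
  int fds[2];
  if (pipe(fds) != 0) return false;
  ssize_t n = write(fds[1], addr, 1);
  close(fds[0]);
  close(fds[1]);
  return n == 1;
}
```

A caller would map a page read/write, use it, and then call `DecommitAndGuard` on it at free time; after that `IsReadable` returns false for any address in the page.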
4.14 How the kernel detects invalid accesses
All faults are detected and raised by the kernel, which then notifies user space.
Take use-after-free as an example:
int* ptr = (int*) malloc(100);
free(ptr);
*ptr= 10;
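When the CPU executes *ptr = 10, it tries to write the value 10 to the address that ptr points to. A normal memory write goes through the following steps:
1) Virtual-to-physical address translation
The CPU first translates the virtual address held in ptr into a physical address. The translation relies on the page tables and the TLB (Translation Lookaside Buffer).
On a TLB miss, the MMU (Memory Management Unit) walks the page tables to find the physical address that corresponds to the virtual address.
2) Permission check
While translating, the MMU also checks whether the current process may access the address. The check is based on the protection bits in the page-table entry (read, write, execute), which GWP-ASan sets during malloc/free.
3) Page-fault handling
If a page fault occurs, the CPU traps into the kernel and runs the page-fault handler.
The handler acts on the cause of the fault (insufficient permission, page not resident, and so on).
If the page is simply not resident, the kernel loads it from disk (or another backing store) into memory and updates the page table.
4) Memory access
Once the page table is updated and the physical page is in memory, the handler returns and the faulting instruction (*ptr = 10) is re-executed.
This time the MMU translates the address successfully, the permission check passes, and the CPU writes 10 to physical memory.
5) Return to user space
After the write completes, the CPU continues with the next user-space instruction.
For the use-after-free above, the flow stops at step 2: the MMU finds that the access is not permitted and raises a hardware exception, the CPU runs the corresponding exception handler, the kernel generates a signal (SIGSEGV) for the process, and the process runs its signal handler when it returns to user space.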
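The permission-fault path can be observed from user space with a small experiment. The sketch below is not GWP-ASan code: it assumes Linux, and `ProbeGuardedPage`, `FaultInfo` and `OnSegv` are hypothetical names. It marks a page PROT_NONE, writes to it, and records the `si_code` and fault address that the kernel passes to the SIGSEGV handler, the same fields that appear in the tombstone header (`code 2 (SEGV_ACCERR), fault addr ...`).

```cpp
#include <cassert>
#include <cstddef>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

namespace {
sigjmp_buf g_jump;
volatile sig_atomic_t g_si_code = 0;
void* volatile g_fault_addr = nullptr;

// Record what the kernel reported, then jump back past the faulting store.
void OnSegv(int, siginfo_t* info, void*) {
  g_si_code = info->si_code;
  g_fault_addr = info->si_addr;
  siglongjmp(g_jump, 1);
}
}  // namespace

struct FaultInfo {
  bool faulted;
  int si_code;
  void* fault_addr;
  void* page_addr;
};

FaultInfo ProbeGuardedPage() {
  FaultInfo result{false, 0, nullptr, nullptr};
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  char* p = static_cast<char*>(mmap(nullptr, page, PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
  if (p == MAP_FAILED) return result;
  result.page_addr = p;
  p[0] = 1;                       // allowed: page is read/write
  mprotect(p, page, PROT_NONE);   // now every access must fault

  struct sigaction sa = {}, old = {};
  sa.sa_sigaction = OnSegv;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGSEGV, &sa, &old);

  if (sigsetjmp(g_jump, 1) == 0) {
    *reinterpret_cast<volatile char*>(p) = 10;  // permission fault here
  } else {
    result.faulted = true;
    result.si_code = g_si_code;
    result.fault_addr = g_fault_addr;
  }
  sigaction(SIGSEGV, &old, nullptr);
  munmap(p, page);
  return result;
}
```

On Linux the recorded code is expected to be SEGV_ACCERR, because the mapping exists but its permissions forbid the access; an unmapped address would report SEGV_MAPERR instead.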
5. Debug Testing
5.1 App demo test
The build output consists of one APK plus one .so library.
Debug steps:
1) Push native-lib-test.so to the /system/lib64 directory on the device.
2) Install the APK: adb install gwp-asan-test-app.apk
3) Launch the app: adb shell am start -n com.example.gwpasan/.GwpAsanActivity
4) Check the files under data/tombstones.
5.2 Native process demo test
Debug steps:
1) Remove the random sampling from libc (to make debugging easier), rebuild libc (see 4.5), and push it to the /system/lib64 directory.
2) Push the native demo to the /system/bin directory on the device and run it:
i) /system/bin/gwp-asan-test double-free --- tests the double-free error type
ii) /system/bin/gwp-asan-test use-after-free --- tests the use-after-free error type
iii) /system/bin/gwp-asan-test out-of-bounds --- tests the out-of-bounds error type
3) Check the files under data/tombstones.
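The demo source is not included in this document. The sketch below is a hypothetical reconstruction of what such a test binary could look like; the names `ParseMode`, `RunMode` and the three bug functions are assumptions chosen to mirror the command-line arguments above. Each bug function is intentionally incorrect and should only be run on a test device. The pointers are declared volatile so that the compiler does not optimize the faulty accesses away.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

enum class Mode { kDoubleFree, kUseAfterFree, kOutOfBounds, kUnknown };

// Map a command-line argument to a test mode.
Mode ParseMode(const char* arg) {
  if (arg == nullptr) return Mode::kUnknown;
  if (std::strcmp(arg, "double-free") == 0) return Mode::kDoubleFree;
  if (std::strcmp(arg, "use-after-free") == 0) return Mode::kUseAfterFree;
  if (std::strcmp(arg, "out-of-bounds") == 0) return Mode::kOutOfBounds;
  return Mode::kUnknown;
}

// Intentionally buggy: frees the same pointer twice.
void DoubleFree() {
  char* volatile p = static_cast<char*>(std::malloc(4));
  std::free(p);
  std::free(p);
}

// Intentionally buggy: writes to memory after it was freed.
void UseAfterFree() {
  char* volatile p = static_cast<char*>(std::malloc(20));
  std::free(p);
  p[0] = 10;
}

// Intentionally buggy: writes one byte before the allocation (underflow).
void OutOfBounds() {
  char* volatile p = static_cast<char*>(std::malloc(20));
  p[-1] = 10;
  std::free(p);
}

// Runs the selected bug. Returns false for an unknown mode.
bool RunMode(Mode mode) {
  switch (mode) {
    case Mode::kDoubleFree:   DoubleFree();   return true;
    case Mode::kUseAfterFree: UseAfterFree(); return true;
    case Mode::kOutOfBounds:  OutOfBounds();  return true;
    case Mode::kUnknown:      break;
  }
  return false;
}
```

A `main` would simply call `RunMode(ParseMode(argv[1]))`. Because GWP-ASan guards only sampled allocations, a single malloc is caught reliably only when sampling has been removed as in step 1; otherwise the buggy operation would have to be repeated many times.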
GWP-ASan detects five types of memory error: DOUBLE_FREE, USE_AFTER_FREE, INVALID_FREE, BUFFER_OVERFLOW, and BUFFER_UNDERFLOW. The error logs produced by the demo for each type are shown below:
1) DOUBLE_FREE detection and error log
Cmdline: ./system/bin/gwp-asan-test-bin double-free
pid: 24481, tid: 24481, name: gwp-asan-test-b >>> ./system/bin/gwp-asan-test-bin <<<
uid: 0
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x0000007d87d49ff0
Cause: [GWP-ASan]: Double Free, 0 bytes into a 4-byte allocation at 0x7d87d17ff0
x0 0000007d87d49ff0 x1 0000000000000001 x2 0000007d843a2b18 x3 0000007d84192cf8
x4 0000000000000008 x5 b400007d87d15013 x6 000000000000000a x7 b400007d87d19ff2
x8 0000007d87d4a000 x9 0000007d87d09000 x10 000000000000fff0 x11 000000000000000f
x12 000000000000006e x13 0000007ff5081954 x14 0000000000000000 x15 0000007d840a7048
x16 0000000000000001 x17 0000007d8417d988 x18 0000007d8cf76000 x19 0000007d843a2ab8
x20 0000007d843a2af0 x21 0000007d843a2b18 x22 0000007d87d17000 x23 0000007d87d17ff0
x24 0000000000000007 x25 0000007d8bcb0f88 x26 0000007d8bcb0000 x27 0000000000000001
x28 0000000000000000 x29 0000007ff5082260
lr 0000007d8410c134 sp 0000007ff5082260 pc 0000007d8410c134 pst 0000000060001000
4 total frames
backtrace:
#00 pc 000000000008d134 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::deallocate(void*)+204) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 0000000000002760 /system/bin/gwp-asan-test-bin (doubleFree()+452) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#02 pc 00000000000029f4 /system/bin/gwp-asan-test-bin (main+480) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#03 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
deallocated by thread 24481:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d204 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::deallocate(void*)+412) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 00000000000026dc /system/bin/gwp-asan-test-bin (doubleFree()+320) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#03 pc 00000000000029f4 /system/bin/gwp-asan-test-bin (main+480) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#04 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#05 pc 0000000000002054 /system/bin/gwp-asan-test-bin (_start_main+64) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#06 pc 0000000000000000 <unknown>
allocated by thread 24481:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d03c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+600) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000050c1c /apex/com.android.runtime/lib64/bionic/libc.so ((anonymous namespace)::gwp_asan_malloc(unsigned long)+172) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#03 pc 0000000000051608 /apex/com.android.runtime/lib64/bionic/libc.so (malloc+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#04 pc 00000000000025cc /system/bin/gwp-asan-test-bin (doubleFree()+48) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#05 pc 00000000000029f4 /system/bin/gwp-asan-test-bin (main+480) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#06 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#07 pc 0000000000002054 /system/bin/gwp-asan-test-bin (_start_main+64) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#08 pc 0000000000000000 <unknown>
......
2) USE_AFTER_FREE detection and error log
Cmdline: ./system/bin/gwp-asan-test-bin use-after-free
pid: 24532, tid: 24532, name: gwp-asan-test-b >>> ./system/bin/gwp-asan-test-bin <<<
uid: 0
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xb4000071064cf000
Cause: [GWP-ASan]: Use After Free, 0 bytes into a 20-byte allocation at 0x71064cf000
x0 00000071065fdd00 x1 00000056952c3c7e x2 000000000000000e x3 0000000000001000
x4 000000710303ccdb x5 0000007fdaaa6848 x6 3938356663666538 x7 3762393835666366
x8 0000000000000041 x9 0000000000000002 x10 0000007109e57000 x11 0000000000000002
x12 0000000000000179 x13 0000007fdaaa5fd4 x14 0000007103046193 x15 0000000000001000
x16 0000000000000001 x17 0000007103118f18 x18 0000007109902000 x19 00000071065fdd00
x20 b4000071064cf000 x21 00000071065fdd00 x22 00000071065fe4e0 x23 000000000000000a
x24 0000007108cd7000 x25 0000000000000000 x26 0000000000000000 x27 0000000000000000
x28 0000000000000000 x29 0000007fdaaa6950
lr 00000056952c527c sp 0000007fdaaa6940 pc 00000056952c5290 pst 0000000020001000
3 total frames
backtrace:
#00 pc 0000000000002290 /system/bin/gwp-asan-test-bin (useAfterFree()+228) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#01 pc 0000000000002a6c /system/bin/gwp-asan-test-bin (main+600) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#02 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
deallocated by thread 24532:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d204 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::deallocate(void*)+412) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000002278 /system/bin/gwp-asan-test-bin (useAfterFree()+204) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#03 pc 0000000000002a6c /system/bin/gwp-asan-test-bin (main+600) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#04 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#05 pc 0000000000002054 /system/bin/gwp-asan-test-bin (_start_main+64) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#06 pc 0000000000000000 <unknown>
allocated by thread 24532:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d03c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+600) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000050c1c /apex/com.android.runtime/lib64/bionic/libc.so ((anonymous namespace)::gwp_asan_malloc(unsigned long)+172) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#03 pc 0000000000051608 /apex/com.android.runtime/lib64/bionic/libc.so (malloc+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#04 pc 000000000005302c /system/lib64/libc++.so (operator new(unsigned long)+28) (BuildId: d2cabb0864b772e7c0c8aee489aec88d)
#05 pc 00000000000021d8 /system/bin/gwp-asan-test-bin (useAfterFree()+44) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#06 pc 0000000000002a6c /system/bin/gwp-asan-test-bin (main+600) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#07 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#08 pc 0000000000002054 /system/bin/gwp-asan-test-bin (_start_main+64) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#09 pc 0000000000000000 <unknown>
......
3) INVALID_FREE detection and error log
Cmdline: ./system/bin/gwp-asan-test-bin invalid-free
pid: 24835, tid: 24835, name: gwp-asan-test-b >>> ./system/bin/gwp-asan-test-bin <<<
uid: 0
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x000000727505fff0
Cause: [GWP-ASan]: Invalid (Wild) Free, 0 bytes right of a 4-byte allocation at 0x727502dff0
x0 000000727505fff0 x1 0000000000000001 x2 0000007273dd2b18 x3 0000007273bc2cf8
x4 0000000000000008 x5 b40000727502bc19 x6 000000000000000a x7 00656572662d6469
x8 0000007275060000 x9 000000727501f000 x10 000000000000fff4 x11 000000000000000f
x12 0000007fc3c6d650 x13 0000007fc3c6d620 x14 0000000000000000 x15 0000000000001000
x16 0000000000000001 x17 0000007273bad988 x18 0000007276236000 x19 0000007273dd2ab8
x20 0000007273dd2b18 x21 0000007273dd2af0 x22 000000727502d000 x23 000000727502dff4
x24 0000007275567000 x25 00000072753dcf88 x26 00000072753dc000 x27 0000000000000000
x28 0000000000000000 x29 0000007fc3c6d6d0
lr 0000007273b3c190 sp 0000007fc3c6d6d0 pc 0000007273b3c190 pst 0000000060001000
4 total frames
backtrace:
#00 pc 000000000008d190 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::deallocate(void*)+296) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 0000000000002110 /system/bin/gwp-asan-test-bin (invalidFree()+184) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#02 pc 0000000000002b5c /system/bin/gwp-asan-test-bin (main+840) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#03 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
allocated by thread 24835:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d03c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+600) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000050c1c /apex/com.android.runtime/lib64/bionic/libc.so ((anonymous namespace)::gwp_asan_malloc(unsigned long)+172) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#03 pc 0000000000051608 /apex/com.android.runtime/lib64/bionic/libc.so (malloc+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#04 pc 0000000000002084 /system/bin/gwp-asan-test-bin (invalidFree()+44) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#05 pc 0000000000002b5c /system/bin/gwp-asan-test-bin (main+840) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#06 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#07 pc 0000000000002054 /system/bin/gwp-asan-test-bin (_start_main+64) (BuildId: a10339d018cd5874320f7cf79ae055cf)
#08 pc 0000000000000000 <unknown>
......
4) BUFFER_UNDERFLOW detection and error log
Cmdline: ./system/bin/gwp-asan-test-bin out-of-bounds
pid: 31250, tid: 31250, name: gwp-asan-test-b >>> ./system/bin/gwp-asan-test-bin <<<
uid: 0
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xb40000757ba12fff
Cause: [GWP-ASan]: Buffer Underflow, 1 byte left of a 20-byte allocation at 0x757ba13000
x0 00000075774fcd00 x1 00000060766ccd2b x2 0000000000000010 x3 000000757ad7ccf8
x4 0000000000000008 x5 b40000757ba1100f x6 000000000000000a x7 7e20776f6c467265
x8 8c9fbb943553fe85 x9 8c9fbb943553fe85 x10 0000000000007a12 x11 00000000000000e0
x12 0000000000000018 x13 0000000000000002 x14 0000000000000000 x15 0000000000001000
x16 00000075774f9c70 x17 00000075774aad00 x18 000000757c146000 x19 00000075774fcd00
x20 b40000757ba13000 x21 00000075774fd4e0 x22 00000075774fcd00 x23 000000000000000a
x24 000000757be23000 x25 0000000000000041 x26 0000000000000000 x27 0000000000000000
x28 0000000000000000 x29 0000007feec70d10
lr 00000060766cd458 sp 0000007feec70d00 pc 00000060766cd468 pst 0000000060001000
3 total frames
backtrace:
#00 pc 0000000000001468 /system/bin/gwp-asan-test-bin (heapOverOrUnderFlow2()+332) (BuildId: f2344430b6f625e3d3503bce3de5a05b)
#01 pc 0000000000001a40 /system/bin/gwp-asan-test-bin (main+720) (BuildId: f2344430b6f625e3d3503bce3de5a05b)
#02 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
allocated by thread 31250:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d03c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+600) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000050c1c /apex/com.android.runtime/lib64/bionic/libc.so ((anonymous namespace)::gwp_asan_malloc(unsigned long)+172) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#03 pc 0000000000051608 /apex/com.android.runtime/lib64/bionic/libc.so (malloc+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#04 pc 000000000005302c /system/lib64/libc++.so (operator new(unsigned long)+28) (BuildId: d2cabb0864b772e7c0c8aee489aec88d)
#05 pc 000000000000134c /system/bin/gwp-asan-test-bin (heapOverOrUnderFlow2()+48) (BuildId: f2344430b6f625e3d3503bce3de5a05b)
#06 pc 0000000000001a40 /system/bin/gwp-asan-test-bin (main+720) (BuildId: f2344430b6f625e3d3503bce3de5a05b)
#07 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#08 pc 0000000000001054 /system/bin/gwp-asan-test-bin (_start_main+64) (BuildId: f2344430b6f625e3d3503bce3de5a05b)
#09 pc 0000000000000000 <unknown>
......
5) BUFFER_OVERFLOW detection and error log
Cmdline: ./system/bin/gwp-asan-test-bin out-of-bounds
pid: 743, tid: 743, name: gwp-asan-test-b >>> ./system/bin/gwp-asan-test-bin <<<
uid: 0
tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xb4000075e11763e0
Cause: [GWP-ASan]: Buffer Overflow, 1004 bytes right of a 20-byte allocation at 0x75e1175fe0
x0 00000075e113dd00 x1 000000620e93fd1c x2 000000000000000e x3 00000075dd3b6cf8
x4 0000000000000008 x5 b4000075e1173007 x6 000000000000000a x7 7f7f7f7f7f7f7f7f
x8 b23e63fa5f472c6b x9 b23e63fa5f472c6b x10 00000000000002e7 x11 00000000000000e0
x12 0000000000000018 x13 0000000000000003 x14 0000000000000000 x15 0000000000001000
x16 00000075e113ac70 x17 00000075e10ebd00 x18 00000075e167a000 x19 00000075e113dd00
x20 b4000075e1175fe0 x21 00000075e113e4e0 x22 00000075e113dd00 x23 000000000000000a
x24 00000075e13b8000 x25 0000000000000041 x26 0000000000000000 x27 0000000000000000
x28 0000000000000000 x29 0000007fd2a7d260
lr 000000620e9403e8 sp 0000007fd2a7d250 pc 000000620e9403fc pst 0000000060001000
3 total frames
backtrace:
#00 pc 00000000000013fc /system/bin/gwp-asan-test-bin (heapOverOrUnderFlow2()+224) (BuildId: d8fb1cebce88df5a173e6dfc8b7e994b)
#01 pc 0000000000001a40 /system/bin/gwp-asan-test-bin (main+720) (BuildId: d8fb1cebce88df5a173e6dfc8b7e994b)
#02 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
allocated by thread 743:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d03c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+600) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000050c1c /apex/com.android.runtime/lib64/bionic/libc.so ((anonymous namespace)::gwp_asan_malloc(unsigned long)+172) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#03 pc 0000000000051608 /apex/com.android.runtime/lib64/bionic/libc.so (malloc+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#04 pc 000000000005302c /system/lib64/libc++.so (operator new(unsigned long)+28) (BuildId: d2cabb0864b772e7c0c8aee489aec88d)
#05 pc 000000000000134c /system/bin/gwp-asan-test-bin (heapOverOrUnderFlow2()+48) (BuildId: d8fb1cebce88df5a173e6dfc8b7e994b)
#06 pc 0000000000001a40 /system/bin/gwp-asan-test-bin (main+720) (BuildId: d8fb1cebce88df5a173e6dfc8b7e994b)
#07 pc 000000000008d9cc /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+108) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#08 pc 0000000000001054 /system/bin/gwp-asan-test-bin (_start_main+64) (BuildId: d8fb1cebce88df5a173e6dfc8b7e994b)
#09 pc 0000000000000000 <unknown>
......
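6. Differences between Android 12 and Android 14
6.1 Recoverable GWP-ASan, new in Android 14
Recoverable GWP-ASan: compared with basic GWP-ASan, it has the following characteristics:
1) When a process hits a GWP-ASan memory error, a tombstone with the error log is collected, but the process does not crash.
2) A tombstone is collected only for the first GWP-ASan memory error in a process.
On Android 14, Recoverable GWP-ASan is enabled by default for apps, and the app processes that get it are chosen by random sampling; native processes do not get it by default.
6.1.1 Recoverable GWP-ASan initialization
During initialization, libc registers global callbacks for the debuggerd crash handler to call back: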
bool MaybeInitGwpAsan(libc_globals* globals,
const android_mallopt_gwp_asan_options_t& mallopt_options) {
......
__libc_shared_globals()->gwp_asan_state = GuardedAlloc.getAllocatorState();
__libc_shared_globals()->gwp_asan_metadata = GuardedAlloc.getMetadataRegion();
__libc_shared_globals()->debuggerd_needs_gwp_asan_recovery = NeedsGwpAsanRecovery;
__libc_shared_globals()->debuggerd_gwp_asan_pre_crash_report = GwpAsanPreCrashHandler;
__libc_shared_globals()->debuggerd_gwp_asan_post_crash_report = GwpAsanPostCrashHandler;
return true;
}
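How a crash handler might use these three callbacks can be modeled as below. This is a simplified sketch, not the debuggerd source: `GwpAsanCallbacks`, `HandleSegv`, `Outcome` and the `dump` parameter are hypothetical names, and the ordering (check, pre-report, dump, post-report, then return instead of dying) is an assumption based on the callback names and on the recoverable behavior described in 6.1.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the three callbacks published by libc.
struct GwpAsanCallbacks {
  bool (*needs_recovery)(void* fault_addr);
  void (*pre_crash_report)(void* fault_addr);
  void (*post_crash_report)(void* fault_addr);
};

enum class Outcome { kProcessDies, kProcessContinues };

// Simplified model of a SIGSEGV crash handler. `dump` stands for writing
// the tombstone.
Outcome HandleSegv(const GwpAsanCallbacks& cb, void* fault_addr,
                   void (*dump)()) {
  const bool recoverable =
      cb.needs_recovery != nullptr && cb.needs_recovery(fault_addr);
  if (recoverable) cb.pre_crash_report(fault_addr);
  dump();
  if (recoverable) {
    // Let the allocator undo its crash state so the faulting access can
    // be retried, then return to the interrupted code.
    cb.post_crash_report(fault_addr);
    return Outcome::kProcessContinues;
  }
  return Outcome::kProcessDies;
}
```

With all three callbacks set and `needs_recovery` returning true, the model calls them in the order check, pre-report, dump, post-report; with no callbacks registered it falls back to the ordinary fatal path.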
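6.1.2 Recoverable GWP-ASan is enabled for apps by default
When an app starts, the system sets the app's GWP-ASan action to TURN_ON_FOR_APP_SAMPLED_NON_CRASHING by default, which enables Recoverable GWP-ASan.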
const char* kGwpAsanAppRecoverableSysprop =
"persist.device_config.memory_safety_native.gwp_asan_recoverable_apps";
// com_android_internal_os_Zygote.cpp
static void SpecializeCommon(...) {
......
switch (runtime_flags & RuntimeFlags::GWP_ASAN_LEVEL_MASK) {
// By default, Recoverable GWP-ASan is enabled for all apps
default:
case RuntimeFlags::GWP_ASAN_LEVEL_DEFAULT:
// If the property does not exist, the default value true is returned
gwp_asan_options.desire = GetBoolProperty(kGwpAsanAppRecoverableSysprop, true)
? Action::TURN_ON_FOR_APP_SAMPLED_NON_CRASHING
: Action::DONT_TURN_ON_UNLESS_OVERRIDDEN; // Setting the property to false turns the feature off
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
// Disable GWP-ASan
case RuntimeFlags::GWP_ASAN_LEVEL_NEVER:
gwp_asan_options.desire = Action::DONT_TURN_ON_UNLESS_OVERRIDDEN;
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
// Enable GWP-ASan for the whole app, including all processes, activities, services, etc.
case RuntimeFlags::GWP_ASAN_LEVEL_ALWAYS:
gwp_asan_options.desire = Action::TURN_ON_FOR_APP;
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
// Enable GWP-ASan by sampling
case RuntimeFlags::GWP_ASAN_LEVEL_LOTTERY:
gwp_asan_options.desire = Action::TURN_ON_WITH_SAMPLING;
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
}
......
}
// bionic/libc/bionic/malloc_common_dynamic.cpp
extern "C" bool android_mallopt(int opcode, void* arg, size_t arg_size) {
......
if (opcode == M_INITIALIZE_GWP_ASAN) {
if (arg == nullptr || arg_size != sizeof(android_mallopt_gwp_asan_options_t)) {
errno = EINVAL;
return false;
}
return EnableGwpAsan(*reinterpret_cast<android_mallopt_gwp_asan_options_t*>(arg));
}
......
}
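The Zygote switch above boils down to a mapping from the runtime-flag level and one boolean system property to an action. The sketch below restates it as a pure function so that the four cases can be checked side by side; `Level` and `DesiredAction` are hypothetical names, and the `Action` enumerators are copied from the code above.

```cpp
#include <cassert>

// Hypothetical mirror of RuntimeFlags::GWP_ASAN_LEVEL_*.
enum class Level { kDefault, kNever, kAlways, kLottery };

enum class Action {
  DONT_TURN_ON_UNLESS_OVERRIDDEN,
  TURN_ON_FOR_APP,
  TURN_ON_WITH_SAMPLING,
  TURN_ON_FOR_APP_SAMPLED_NON_CRASHING,
};

// recoverable_apps_sysprop models the value of
// persist.device_config.memory_safety_native.gwp_asan_recoverable_apps,
// which reads as true when the property is absent.
Action DesiredAction(Level level, bool recoverable_apps_sysprop) {
  switch (level) {
    case Level::kNever:
      return Action::DONT_TURN_ON_UNLESS_OVERRIDDEN;
    case Level::kAlways:
      return Action::TURN_ON_FOR_APP;
    case Level::kLottery:
      return Action::TURN_ON_WITH_SAMPLING;
    case Level::kDefault:
    default:
      return recoverable_apps_sysprop
                 ? Action::TURN_ON_FOR_APP_SAMPLED_NON_CRASHING
                 : Action::DONT_TURN_ON_UNLESS_OVERRIDDEN;
  }
}
```

Only the default level consults the property, so setting it to false turns off the recoverable default without affecting apps that explicitly request always, never, or lottery.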
MaybeInitGwpAsan mainly does two things:
1) Fills in the options (GetGwpAsanOptions)
2) Decides from process_sample_rate whether this process is selected (ShouldGwpAsanSampleProcess)
// bionic/libc/bionic/gwp_asan_wrappers.cpp
bool EnableGwpAsan(const android_mallopt_gwp_asan_options_t& options) {
if (GwpAsanInitialized) {
return true;
}
bool ret_value;
__libc_globals.mutate(
[&](libc_globals* globals) { ret_value = MaybeInitGwpAsan(globals, options); });
return ret_value;
}
// Read the options, then decide from process_sample_rate whether this process is selected
bool MaybeInitGwpAsan(libc_globals* globals,
const android_mallopt_gwp_asan_options_t& mallopt_options) {
......
Options options;
unsigned process_sample_rate = kDefaultProcessSampling;
if (!GetGwpAsanOptions(&options, &process_sample_rate, mallopt_options) &&
mallopt_options.desire == Action::DONT_TURN_ON_UNLESS_OVERRIDDEN) {
return false;
}
// If the malloc sample rate, the process sample rate, or the maximum number of simultaneous allocations is 0, return false: GWP-ASan is not enabled
if (options.SampleRate == 0 || process_sample_rate == 0 ||
options.MaxSimultaneousAllocations == 0) {
return false;
}
// Returns true if process_sample_rate is 1. Otherwise returns true only when the random byte is a multiple of process_sample_rate (128 by default)
if (!ShouldGwpAsanSampleProcess(process_sample_rate)) {
return false;
}
......
return true;
}
bool ShouldGwpAsanSampleProcess(unsigned sample_rate) {
// The process sample rate should be a power of two; otherwise the modulo below is biased
if (!isPowerOfTwo(sample_rate)) {
warning_log(
"GWP-ASan process sampling rate of %u is not a power-of-two, and so modulo bias occurs.",
sample_rate);
}
uint8_t random_number;
__libc_safe_arc4random_buf(&random_number, sizeof(random_number));
return random_number % sample_rate == 0;
}
// Check whether x is a power of two
bool isPowerOfTwo(uint64_t x) {
assert(x != 0);
return (x & (x - 1)) == 0;
}
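Because the random number is a single byte, the effect of `random_number % sample_rate == 0` can be checked exhaustively over all 256 values. The helper below, `SelectedByteValues`, is a hypothetical name used only for this illustration; it counts how many byte values would select the process for a given rate.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many of the 256 possible byte values r satisfy
// r % sample_rate == 0, i.e. would cause the process to be sampled.
unsigned SelectedByteValues(unsigned sample_rate) {
  assert(sample_rate != 0);
  unsigned hits = 0;
  for (unsigned r = 0; r < 256; ++r) {
    if (static_cast<uint8_t>(r) % sample_rate == 0) ++hits;
  }
  return hits;
}
```

For the default rate of 128 the hits are 0 and 128, so the probability is 2/256 = 1/128 as intended. For a rate of 100 the hits are 0, 100 and 200, giving 3/256 instead of 1/100, which is the modulo bias the warning refers to. Any rate above 255 is hit only by 0, so the probability cannot drop below 1/256.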
GetGwpAsanOptions mainly does two things:
1) Sets default values (SetDefaultGwpAsanOptions)
2) Overrides them from system property values
// Set default values, then override them from system properties
bool GetGwpAsanOptions(Options* options, unsigned* process_sample_rate,
const android_mallopt_gwp_asan_options_t& mallopt_options) {
SetDefaultGwpAsanOptions(options, process_sample_rate, mallopt_options);
bool had_overrides = false;
/*static const char* kSampleRateSystemSysprop = "libc.debug.gwp_asan.sample_rate.system_default";
static const char* kSampleRateAppSysprop = "libc.debug.gwp_asan.sample_rate.app_default";
static const char* kSampleRateTargetedSyspropPrefix = "libc.debug.gwp_asan.sample_rate.";
static const char* kSampleRateEnvVar = "GWP_ASAN_SAMPLE_RATE"; */
//
unsigned long long buf;
if (GetGwpAsanIntegerOption(&buf, mallopt_options, kSampleRateSystemSysprop,
kSampleRateAppSysprop, kSampleRateTargetedSyspropPrefix,
kSampleRateEnvVar, "sample rate")) {
// malloc sample rate
options->SampleRate = buf;
had_overrides = true;
}
/*static const char* kProcessSamplingSystemSysprop =
"libc.debug.gwp_asan.process_sampling.system_default";
static const char* kProcessSamplingAppSysprop = "libc.debug.gwp_asan.process_sampling.app_default";
static const char* kProcessSamplingTargetedSyspropPrefix = "libc.debug.gwp_asan.process_sampling.";
static const char* kProcessSamplingEnvVar = "GWP_ASAN_PROCESS_SAMPLING";*/
if (GetGwpAsanIntegerOption(&buf, mallopt_options, kProcessSamplingSystemSysprop,
kProcessSamplingAppSysprop, kProcessSamplingTargetedSyspropPrefix,
kProcessSamplingEnvVar, "process sampling rate")) {
// Sample rate for enabling GWP-ASan per process
*process_sample_rate = buf;
had_overrides = true;
}
/* static const char* kMaxAllocsSystemSysprop = "libc.debug.gwp_asan.max_allocs.system_default";
static const char* kMaxAllocsAppSysprop = "libc.debug.gwp_asan.max_allocs.app_default";
static const char* kMaxAllocsTargetedSyspropPrefix = "libc.debug.gwp_asan.max_allocs.";
static const char* kMaxAllocsEnvVar = "GWP_ASAN_MAX_ALLOCS"; */
// Maximum number of simultaneous guarded allocations
if (GetGwpAsanIntegerOption(&buf, mallopt_options, kMaxAllocsSystemSysprop, kMaxAllocsAppSysprop,
kMaxAllocsTargetedSyspropPrefix, kMaxAllocsEnvVar,
"maximum simultaneous allocations")) {
options->MaxSimultaneousAllocations = buf;
had_overrides = true;
} else if (had_overrides) {
// Multiply the number of slots available, such that the ratio between
// sampling rate and slots is kept the same as the default. For example, a
// sampling rate of 1000 is 2.5x more frequent than default, and so
// requires 80 slots (32 * 2.5).
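// If SampleRate is N times the default value (sampling is N times less frequent), the maximum
// number of simultaneous allocations is set to 1/N of the default, unless it is set explicitly.
// kDefaultSampleRate defaults to 2500 and kDefaultMaxAllocs defaults to 32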
float frequency_multiplier = static_cast<float>(options->SampleRate) / kDefaultSampleRate;
options->MaxSimultaneousAllocations =
/* default */ kDefaultMaxAllocs / frequency_multiplier;
}
/* static const char* kRecoverableSystemSysprop = "libc.debug.gwp_asan.recoverable.system_default";
static const char* kRecoverableAppSysprop = "libc.debug.gwp_asan.recoverable.app_default";
static const char* kRecoverableTargetedSyspropPrefix = "libc.debug.gwp_asan.recoverable.";
static const char* kRecoverableEnvVar = "GWP_ASAN_RECOVERABLE"; */
// Recoverable setting for GWP-ASan
bool recoverable = false;
if (GetGwpAsanBoolOption(&recoverable, mallopt_options, kRecoverableSystemSysprop,
kRecoverableAppSysprop, kRecoverableTargetedSyspropPrefix,
kRecoverableEnvVar, "recoverable")) {
options->Recoverable = recoverable;
GwpAsanRecoverable = recoverable;
had_overrides = true;
}
return had_overrides;
}
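The slot scaling in the `else if (had_overrides)` branch is easiest to follow with numbers. The sketch below reproduces only that arithmetic; `ScaledMaxAllocs` is a hypothetical name, and the two constants use the default values quoted in the comment above (2500 and 32).

```cpp
#include <cassert>
#include <cstddef>

constexpr unsigned kDefaultSampleRate = 2500;  // default quoted above
constexpr size_t kDefaultMaxAllocs = 32;       // default quoted above

// Same arithmetic as the had_overrides branch: keep the ratio between
// sampling frequency and slot count equal to the default ratio.
// sample_rate must be nonzero.
size_t ScaledMaxAllocs(unsigned long long sample_rate) {
  float frequency_multiplier =
      static_cast<float>(sample_rate) / kDefaultSampleRate;
  return static_cast<size_t>(kDefaultMaxAllocs / frequency_multiplier);
}
```

A sample rate of 5000 halves the sampling frequency and yields 16 slots, the default rate of 2500 keeps 32 slots, and a sample rate of 1000 is 2.5 times more frequent and yields 80 slots, matching the example in the source comment.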
The code of SetDefaultGwpAsanOptions is shown below.
For apps, the default action is TURN_ON_FOR_APP_SAMPLED_NON_CRASHING: process_sample_rate is set to 128 and Recoverable is set to true.
// App default: TURN_ON_FOR_APP_SAMPLED_NON_CRASHING, sampled at 1/128, with Recoverable set to true
void SetDefaultGwpAsanOptions(Options* options, unsigned* process_sample_rate,
const android_mallopt_gwp_asan_options_t& mallopt_options) {
options->Enabled = true;
options->InstallSignalHandlers = false;
options->InstallForkHandlers = true;
options->Backtrace = android_unsafe_frame_pointer_chase;
options->SampleRate = kDefaultSampleRate;
options->MaxSimultaneousAllocations = kDefaultMaxAllocs;
// A process_sample_rate of 1 enables GWP-ASan directly, without the 1/128 sampling
*process_sample_rate = 1;
// With TURN_ON_WITH_SAMPLING or TURN_ON_FOR_APP_SAMPLED_NON_CRASHING, the 1/128 sampling applies
if (mallopt_options.desire == Action::TURN_ON_WITH_SAMPLING) {
*process_sample_rate = kDefaultProcessSampling; // 128
} else if (mallopt_options.desire == Action::TURN_ON_FOR_APP_SAMPLED_NON_CRASHING) {
*process_sample_rate = kDefaultProcessSampling; // 128
options->Recoverable = true;
GwpAsanRecoverable = true;
}
}
The code of GetGwpAsanIntegerOption is shown below:
bool GetGwpAsanIntegerOption(unsigned long long* result,
const android_mallopt_gwp_asan_options_t& mallopt_options,
const char* system_sysprop, const char* app_sysprop,
const char* targeted_sysprop_prefix, const char* env_var,
const char* descriptive_name) {
char buffer[PROP_VALUE_MAX];
if (!GetGwpAsanOptionImpl(buffer, mallopt_options, system_sysprop, app_sysprop,
targeted_sysprop_prefix, env_var)) {
return false;
}
char* end;
unsigned long long value = strtoull(buffer, &end, 10);
if (value == ULLONG_MAX || *end != '\0') {
warning_log("Invalid GWP-ASan %s: \"%s\". Using default value instead.", descriptive_name,
buffer);
return false;
}
*result = value;
return true;
}
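The validation in GetGwpAsanIntegerOption can be exercised in isolation. The sketch below copies only the `strtoull` check into a standalone helper; `ParseUnsignedOption` is a hypothetical name, and the property lookup and the warning log are left out.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Same check as above: reject overflow (ULLONG_MAX) and trailing garbage.
bool ParseUnsignedOption(const char* text, unsigned long long* result) {
  char* end;
  unsigned long long value = std::strtoull(text, &end, 10);
  if (value == ULLONG_MAX || *end != '\0') return false;
  *result = value;
  return true;
}
```

"2500" is accepted, "12abc" is rejected because of the trailing characters, and "-1" is rejected because `strtoull` negates in unsigned arithmetic and returns ULLONG_MAX. An empty string passes this particular check and parses as 0, since no characters are consumed and `*end` is already the terminator.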
bool GetGwpAsanOptionImpl(char* value_out,
const android_mallopt_gwp_asan_options_t& mallopt_options,
const char* system_sysprop, const char* app_sysprop,
const char* targeted_sysprop_prefix, const char* env_var) {
// Get the program name
const char* basename = "";
if (mallopt_options.program_name) basename = __gnu_basename(mallopt_options.program_name);
constexpr size_t kSyspropMaxLen = 512;
char program_specific_sysprop[kSyspropMaxLen] = {};
char persist_program_specific_sysprop[kSyspropMaxLen] = {};
char persist_default_sysprop[kSyspropMaxLen] = {};
const char* sysprop_names[4] = {};
// Tests use a blank program name to specify that system properties should not
// be used. Tests still continue to use the environment variable though.
if (*basename != '\0') {
const char* default_sysprop = system_sysprop;
// If the action is TURN_ON_FOR_APP, the default sysprop is the app one
if (mallopt_options.desire == Action::TURN_ON_FOR_APP) {
default_sysprop = app_sysprop;
}
async_safe_format_buffer(&program_specific_sysprop[0], kSyspropMaxLen, "%s%s",
targeted_sysprop_prefix, basename);
async_safe_format_buffer(&persist_program_specific_sysprop[0], kSyspropMaxLen, "%s%s",
kPersistPrefix, program_specific_sysprop);
async_safe_format_buffer(&persist_default_sysprop[0], kSyspropMaxLen, "%s%s", kPersistPrefix,
default_sysprop);
// In order of precedence, always take the program-specific sysprop (e.g.
// '[persist.]libc.debug.gwp_asan.sample_rate.cameraserver') over the
// generic sysprop (e.g.
// '[persist.]libc.debug.gwp_asan.(system_default|app_default)'). In
// addition, always take the non-persistent option over the persistent
// option.
// Precedence: program-specific non-persistent sysprop, program-specific persistent sysprop,
// default non-persistent sysprop, default persistent sysprop
sysprop_names[0] = program_specific_sysprop;
sysprop_names[1] = persist_program_specific_sysprop;
sysprop_names[2] = default_sysprop;
sysprop_names[3] = persist_default_sysprop;
}
return get_config_from_env_or_sysprops(env_var, sysprop_names, arraysize(sysprop_names),
value_out, PROP_VALUE_MAX);
}
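The lookup order built by GetGwpAsanOptionImpl can be reproduced with plain strings. In the sketch below, `SyspropLookupOrder` is a hypothetical name, and the persistent prefix is assumed to be "persist." as suggested by the `[persist.]` notation in the source comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the property names in the order they would be consulted.
// An empty basename models the test case where no sysprops are used.
std::vector<std::string> SyspropLookupOrder(const std::string& targeted_prefix,
                                            const std::string& default_sysprop,
                                            const std::string& basename) {
  if (basename.empty()) return {};
  const std::string persist = "persist.";  // assumed value of kPersistPrefix
  const std::string program_specific = targeted_prefix + basename;
  return {program_specific, persist + program_specific, default_sysprop,
          persist + default_sysprop};
}
```

For the sample-rate option of a process named cameraserver, the first name consulted is `libc.debug.gwp_asan.sample_rate.cameraserver` and the last is `persist.libc.debug.gwp_asan.sample_rate.system_default`.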
6.1.3 Recoverable GWP-ASan memory deallocation
void GuardedPoolAllocator::deallocate(void *Ptr) {
assert(pointerIsMine(Ptr) && "Pointer is not mine!");
uintptr_t UPtr = reinterpret_cast<uintptr_t>(Ptr);
size_t Slot = State.getNearestSlot(UPtr);
uintptr_t SlotStart = State.slotToAddr(Slot);
AllocationMetadata *Meta = addrToMetadata(UPtr);
// If this allocation is responsible for crash, never recycle it. Turn the
// deallocate() call into a no-op.
if (Meta->HasCrashed)
return;
// The address being freed differs from the address returned at allocation: treat it as an invalid free
if (Meta->Addr != UPtr) {
raiseInternallyDetectedError(UPtr, Error::INVALID_FREE);
return;
}
// double-free
if (Meta->IsDeallocated) {
raiseInternallyDetectedError(UPtr, Error::DOUBLE_FREE);
return;
}
......
}
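The order of the checks in deallocate() matters, and a toy model makes it easy to see which error each sequence of calls produces. `SlotMeta`, `FreeResult` and `ClassifyFree` are hypothetical names; the three fields mirror the `Addr`, `IsDeallocated` and `HasCrashed` members used above, and the part elided by `......` is reduced to marking the slot as deallocated.

```cpp
#include <cassert>
#include <cstdint>

enum class FreeResult { kIgnored, kInvalidFree, kDoubleFree, kOk };

// Hypothetical subset of the per-slot allocation metadata.
struct SlotMeta {
  uintptr_t Addr = 0;
  bool IsDeallocated = false;
  bool HasCrashed = false;
};

// Applies the checks in the same order as deallocate().
FreeResult ClassifyFree(SlotMeta& meta, uintptr_t ptr) {
  if (meta.HasCrashed) return FreeResult::kIgnored;       // never recycle
  if (meta.Addr != ptr) return FreeResult::kInvalidFree;  // wrong address
  if (meta.IsDeallocated) return FreeResult::kDoubleFree; // freed before
  meta.IsDeallocated = true;
  return FreeResult::kOk;
}
```

Freeing an address four bytes past the start of an allocation, as in the INVALID_FREE log in section 5.2 (allocation at ...ff0, register x23 holding ...ff4), is classified as an invalid free before the double-free check is ever reached.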
void GuardedPoolAllocator::raiseInternallyDetectedError(uintptr_t Address,
Error E) {
// Disable the allocator before setting the internal failure state. In
// non-recoverable mode, the allocator will be permanently disabled, and so
// things will be accessed without locks.
// Disable the GWP-ASan allocator while the memory error is being handled
disable();
// Races between internally- and externally-raised faults can happen. Right
// now, in this thread we've locked the allocator in order to raise an
// internally-detected fault, and another thread could SIGSEGV to raise an
// externally-detected fault. What will happen is that the other thread will
// wait in the signal handler, as we hold the allocator's locks from the
// disable() above. We'll trigger the signal handler by touching the
// internal-signal-raising address below, and the signal handler from our
// thread will get to run first as we will continue to hold the allocator
// locks until the enable() at the end of this function. Be careful though, if
// this thread receives another SIGSEGV after the disable() above, but before
// touching the internal-signal-raising address below, then this thread will
// get an "externally-raised" SIGSEGV while *also* holding the allocator
// locks, which means this thread's signal handler will deadlock. This could
// be resolved with a re-entrant lock, but asking platforms to implement this
// seems unnecessary given the only way to get a SIGSEGV in this critical
// section is either a memory safety bug in the couple lines of code below (be
// careful!), or someone outside uses `kill(this_thread, SIGSEGV)`, which
// really shouldn't happen.
State.FailureType = E;
State.FailureAddress = Address;
// Raise a SEGV by touching a specific address that identifies to the crash
// handler that this is an internally-raised fault. Changing this address?
// Don't forget to update __gwp_asan_get_internal_crash_address.
volatile char *p =
reinterpret_cast<char *>(State.internallyDetectedErrorFaultAddress());
*p = 0;
// This should never be reached in non-recoverable mode. Ensure that the
// signal handler called handleRecoverablePostCrashReport(), which was
// responsible for re-setting these fields.
assert(State.FailureType == Error::UNKNOWN);
assert(State.FailureAddress == 0u);
// In recoverable mode, the signal handler (after dumping the crash) marked
// the page containing the InternalFaultSegvAddress as read/writeable, to
// allow the second touch to succeed after returning from the signal handler.
// Now, we need to mark the page as non-read/write-able again, so future
// internal faults can be raised.
deallocateInGuardedPool(
reinterpret_cast<void *>(getPageAddr(
State.internallyDetectedErrorFaultAddress(), State.PageSize)),
State.PageSize);
// And now we're done with patching ourselves back up, enable the allocator.
enable();
}
6.1.4 Tombstone
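When a recoverable crash raised by Recoverable GWP-ASan is detected, debuggerd performs the necessary handling: it generates a debug report (first crash only) and patches up the allocator so that the app can keep running.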
// When debuggerd's signal handler is the first handler called, it's great at
// handling the recoverable GWP-ASan mode. For apps, sigchain (from libart) is
// always the first signal handler, and so the following function is what
// sigchain must call before processing the signal. This allows for processing
// of a potentially recoverable GWP-ASan crash. If the signal requires GWP-ASan
// recovery, then dump a report (via the regular debuggerd hanndler), and patch
// up the allocator, and allow the process to continue (indicated by returning
// 'true'). If the crash has nothing to do with GWP-ASan, or recovery isn't
// possible, return 'false'.
bool debuggerd_handle_signal(int signal_number, siginfo_t* info, void* context) {
if (signal_number != SIGSEGV || !signal_has_si_addr(info)) return false;
if (g_callbacks.get_gwp_asan_callbacks == nullptr) return false;
gwp_asan_callbacks_t gwp_asan_callbacks = g_callbacks.get_gwp_asan_callbacks();
if (gwp_asan_callbacks.debuggerd_needs_gwp_asan_recovery == nullptr ||
gwp_asan_callbacks.debuggerd_gwp_asan_pre_crash_report == nullptr ||
gwp_asan_callbacks.debuggerd_gwp_asan_post_crash_report == nullptr ||
!gwp_asan_callbacks.debuggerd_needs_gwp_asan_recovery(info->si_addr)) {
return false;
}
// Only dump a crash report for the first GWP-ASan crash. ActivityManager
// doesn't like it when an app crashes multiple times, and is even more strict
// about an app crashing multiple times in a short time period. While the app
// won't crash fully when we do GWP-ASan recovery, ActivityManager still gets
// the information about the crash through the DropBoxManager service. If an
// app has multiple back-to-back GWP-ASan crashes, this would lead to the app
// being killed, which defeats the purpose of having the recoverable mode. To
// mitigate against this, only generate a debuggerd crash report for the first
// GWP-ASan crash encountered. We still need to do the patching up of the
// allocator though, so do that.
static pthread_mutex_t first_crash_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&first_crash_mutex);
static bool first_crash = true;
// Only the first crash generates a report
if (first_crash) {
// `debuggerd_signal_handler` will call
// `debuggerd_gwp_asan_(pre|post)_crash_report`, so no need to manually call
// them here.
debuggerd_signal_handler(signal_number, info, context);
first_crash = false;
} else {
gwp_asan_callbacks.debuggerd_gwp_asan_pre_crash_report(info->si_addr);
gwp_asan_callbacks.debuggerd_gwp_asan_post_crash_report(info->si_addr);
}
pthread_mutex_unlock(&first_crash_mutex);
return true;
}
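The "report once, recover every time" policy of debuggerd_handle_signal can be modeled in isolation. This is only a sketch: `RecoverableCrashPolicy` is a hypothetical class, and `std::mutex` is used for brevity even though it is not something to call from a real signal handler.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model of the policy: every crash is recovered from, but only
// the first one produces a full crash report.
class RecoverableCrashPolicy {
 public:
  // Returns true if this crash should produce a full report.
  bool on_crash() {
    std::lock_guard<std::mutex> lock(mutex_);
    ++recoveries_;
    const bool report = first_crash_;
    first_crash_ = false;
    return report;
  }

  // Number of crashes that were recovered from so far.
  int recoveries() {
    std::lock_guard<std::mutex> lock(mutex_);
    return recoveries_;
  }

 private:
  std::mutex mutex_;
  bool first_crash_ = true;
  int recoveries_ = 0;
};
```

Three consecutive calls to `on_crash()` return true, false, false, while `recoveries()` reports 3: the allocator is patched up each time, but ActivityManager only sees one report.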
debuggerd_gwp_asan_pre_crash_report:
void GwpAsanPreCrashHandler(void* fault_ptr) {
fault_ptr = untag_address(fault_ptr);
if (!NeedsGwpAsanRecovery(fault_ptr)) return;
GuardedAlloc.preCrashReport(fault_ptr);
}
void GuardedPoolAllocator::preCrashReport(void *Ptr) {
assert(pointerIsMine(Ptr) && "Pointer is not mine!");
uintptr_t InternalCrashAddr = __gwp_asan_get_internal_crash_address(
&State, reinterpret_cast<uintptr_t>(Ptr));
if (!InternalCrashAddr)
disable();
// If something in the signal handler calls malloc() while dumping the
// GWP-ASan report (e.g. backtrace_symbols()), make sure that GWP-ASan doesn't
// service that allocation. `PreviousRecursiveGuard` is protected by the
// allocator locks taken in disable(), either explicitly above for
// externally-raised errors, or implicitly in raiseInternallyDetectedError()
// for internally-detected errors.
PreviousRecursiveGuard = getThreadLocals()->RecursiveGuard;
getThreadLocals()->RecursiveGuard = true;
}
debuggerd_gwp_asan_post_crash_report:
void GwpAsanPostCrashHandler(void* fault_ptr) {
fault_ptr = untag_address(fault_ptr);
if (!NeedsGwpAsanRecovery(fault_ptr)) return;
GuardedAlloc.postCrashReportRecoverableOnly(fault_ptr);
}
void GuardedPoolAllocator::postCrashReportRecoverableOnly(void *SignalPtr) {
uintptr_t SignalUPtr = reinterpret_cast<uintptr_t>(SignalPtr);
uintptr_t InternalCrashAddr =
__gwp_asan_get_internal_crash_address(&State, SignalUPtr);
uintptr_t ErrorUptr = InternalCrashAddr ?: SignalUPtr;
AllocationMetadata *Metadata = addrToMetadata(ErrorUptr);
Metadata->HasCrashed = true;
allocateInGuardedPool(
reinterpret_cast<void *>(getPageAddr(SignalUPtr, State.PageSize)),
State.PageSize);
// Clear the internal state in order to not confuse the crash handler if a
// use-after-free or buffer-overflow comes from a different allocation in the
// future.
if (InternalCrashAddr) {
State.FailureType = Error::UNKNOWN;
State.FailureAddress = 0;
}
size_t Slot = State.getNearestSlot(ErrorUptr);
// If the slot is available, remove it permanently.
for (size_t i = 0; i < FreeSlotsLength; ++i) {
if (FreeSlots[i] == Slot) {
FreeSlots[i] = FreeSlots[FreeSlotsLength - 1];
FreeSlotsLength -= 1;
break;
}
}
getThreadLocals()->RecursiveGuard = PreviousRecursiveGuard;
if (!InternalCrashAddr)
enable();
}
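The loop over FreeSlots in postCrashReportRecoverableOnly uses the swap-with-last idiom to drop an element from an unordered array without shifting the rest. A standalone sketch of the same idiom, using `std::vector` instead of the allocator's raw array (`remove_free_slot` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Removes `slot` from an unordered free list: overwrite the match with the
// last element, then shrink the list by one. Order is not preserved.
// Returns true if the slot was present.
bool remove_free_slot(std::vector<std::size_t>& free_slots, std::size_t slot) {
  for (std::size_t i = 0; i < free_slots.size(); ++i) {
    if (free_slots[i] == slot) {
      free_slots[i] = free_slots.back();
      free_slots.pop_back();
      return true;
    }
  }
  return false;
}
```

Because the crashed slot never returns to the free list, it is never handed out again, which is what keeps the faulting page from being recycled.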
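Summary:
Compared with basic GWP-ASan, Recoverable GWP-ASan has the following advantages:
a. Recoverability: when Recoverable GWP-ASan detects a memory error it dumps a report and then tries to recover, whereas basic GWP-ASan only reports the error and the process exits.
b. User experience: Recoverable GWP-ASan aims to reduce app crashes and improve the user experience, whereas basic GWP-ASan does not take the app's state after the error into account.
c. Suitability for production: Recoverable GWP-ASan is better suited to production because it reduces the impact on users, whereas basic GWP-ASan is used more during development and testing.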
6.2 Recoverable GWP-ASan is enabled for apps by default on Android 14 and later
Android 12 code:
// com_android_internal_os_Zygote.cpp
static void SpecializeCommon(...) {
......
bool forceEnableGwpAsan = false;
switch (runtime_flags & RuntimeFlags::GWP_ASAN_LEVEL_MASK) {
// GWP-ASan is not enabled by default
default:
case RuntimeFlags::GWP_ASAN_LEVEL_NEVER:
break;
case RuntimeFlags::GWP_ASAN_LEVEL_ALWAYS:
forceEnableGwpAsan = true;
[[fallthrough]];
case RuntimeFlags::GWP_ASAN_LEVEL_LOTTERY:
android_mallopt(M_INITIALIZE_GWP_ASAN, &forceEnableGwpAsan, sizeof(forceEnableGwpAsan));
}
......
}
Android 14 code:
Recoverable GWP-ASan is enabled for all apps by default, i.e. gwp_asan_options.desire is set to TURN_ON_FOR_APP_SAMPLED_NON_CRASHING.
Setting the "persist.device_config.memory_safety_native.gwp_asan_recoverable_apps" property to false turns this feature off.
const char* kGwpAsanAppRecoverableSysprop =
"persist.device_config.memory_safety_native.gwp_asan_recoverable_apps";
// com_android_internal_os_Zygote.cpp
static void SpecializeCommon(...) {
......
switch (runtime_flags & RuntimeFlags::GWP_ASAN_LEVEL_MASK) {
// Default: enable Recoverable GWP-ASan for all apps
default:
case RuntimeFlags::GWP_ASAN_LEVEL_DEFAULT:
// If the property does not exist, the default value true is returned
gwp_asan_options.desire = GetBoolProperty(kGwpAsanAppRecoverableSysprop, true)
? Action::TURN_ON_FOR_APP_SAMPLED_NON_CRASHING
: Action::DONT_TURN_ON_UNLESS_OVERRIDDEN; // property set to false: feature is off
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
// Turn GWP-ASan off
case RuntimeFlags::GWP_ASAN_LEVEL_NEVER:
gwp_asan_options.desire = Action::DONT_TURN_ON_UNLESS_OVERRIDDEN;
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
// Enable GWP-ASan for the whole app, including all of its processes, activities, services, etc.
case RuntimeFlags::GWP_ASAN_LEVEL_ALWAYS:
gwp_asan_options.desire = Action::TURN_ON_FOR_APP;
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
// Enable GWP-ASan by sampling
case RuntimeFlags::GWP_ASAN_LEVEL_LOTTERY:
gwp_asan_options.desire = Action::TURN_ON_WITH_SAMPLING;
android_mallopt(M_INITIALIZE_GWP_ASAN, &gwp_asan_options, sizeof(gwp_asan_options));
break;
}
......
}
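6.3 Differences in how GWP-ASan is enabled
1) android_mallopt is called with different arguments: a bool on Android 12, an android_mallopt_gwp_asan_options_t on Android 14.
2) The conditions that decide whether GWP-ASan is enabled differ considerably.
Android 12 code: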
bool enableGwpAsan = true;
int opcode = M_INITIALIZE_GWP_ASAN;
bool result = android_mallopt(opcode, &enableGwpAsan, sizeof(enableGwpAsan));
// bionic/libc/bionic/malloc_common_dynamic.cpp
extern "C" bool android_mallopt(int opcode, void* arg, size_t arg_size) {
......
if (opcode == M_INITIALIZE_GWP_ASAN) {
if (arg == nullptr || arg_size != sizeof(bool)) {
errno = EINVAL;
return false;
}
__libc_globals.mutate([&](libc_globals* globals) {
return MaybeInitGwpAsan(globals, *reinterpret_cast<bool*>(arg));
});
}
......
}
bool MaybeInitGwpAsan(libc_globals* globals, bool force_init) {
......
// If the caller hasn't forced GWP-ASan on, check whether we should sample
// this process.
if (!force_init && !ShouldGwpAsanSampleProcess()) {
return false;
}
......
return true;
}
Android 14 code:
android_mallopt_gwp_asan_options_t gwp_asan_options;
int opcode = M_INITIALIZE_GWP_ASAN;
android_mallopt(opcode, &gwp_asan_options, sizeof(gwp_asan_options));
// bionic/libc/bionic/malloc_common_dynamic.cpp
extern "C" bool android_mallopt(int opcode, void* arg, size_t arg_size) {
......
if (opcode == M_INITIALIZE_GWP_ASAN) {
if (arg == nullptr || arg_size != sizeof(android_mallopt_gwp_asan_options_t)) {
errno = EINVAL;
return false;
}
return EnableGwpAsan(*reinterpret_cast<android_mallopt_gwp_asan_options_t*>(arg));
}
......
}
// bionic/libc/bionic/gwp_asan_wrappers.cpp
bool EnableGwpAsan(const android_mallopt_gwp_asan_options_t& options) {
if (GwpAsanInitialized) {
return true;
}
bool ret_value;
__libc_globals.mutate(
[&](libc_globals* globals) { ret_value = MaybeInitGwpAsan(globals, options); });
return ret_value;
}
bool MaybeInitGwpAsan(libc_globals* globals,
const android_mallopt_gwp_asan_options_t& mallopt_options) {
......
Options options;
unsigned process_sample_rate = kDefaultProcessSampling; // 128
if (!GetGwpAsanOptions(&options, &process_sample_rate, mallopt_options) &&
mallopt_options.desire == Action::DONT_TURN_ON_UNLESS_OVERRIDDEN) {
return false;
}
// If the malloc sample rate, the process sample rate, or the maximum number of
// simultaneous allocations is 0, return false: GWP-ASan is not enabled
if (options.SampleRate == 0 || process_sample_rate == 0 ||
options.MaxSimultaneousAllocations == 0) {
return false;
}
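// Returns true if process_sample_rate is 1. Otherwise returns true only when the
// generated random number is a multiple of process_sample_rate (128 by default)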
if (!ShouldGwpAsanSampleProcess(process_sample_rate)) {
return false;
}
......
return true;
}
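The per-process lottery described in the comments above can be written as a small pure function. This is a sketch under the assumption that the decision is "random value modulo rate equals zero"; `should_sample_process` is a hypothetical name, the random value is passed in so the behavior is deterministic, and the rate-0 rejection that MaybeInitGwpAsan performs earlier is folded into the same function.

```cpp
#include <cassert>
#include <cstdint>

// Rate 0 never enables GWP-ASan, rate 1 always does; otherwise the process
// is selected when the random value is a multiple of the rate, i.e. roughly
// one process in `rate`.
bool should_sample_process(std::uint32_t random_value, std::uint32_t rate) {
  if (rate == 0) return false;
  if (rate == 1) return true;
  return random_value % rate == 0;
}
```

With the default rate of 128, a random value of 256 selects the process and 257 does not, which is why setting process_sampling to 1 is the way to force GWP-ASan on for testing.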
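6.4 Configurable options on Android 14
In the property names below, xxx stands for a process name, system_default (all native processes) or app_default (all apps), as used in the setprop examples later in this section.
1) malloc sample rate
"libc.debug.gwp_asan.sample_rate.xxx" --- malloc sample rate for system processes
"libc.debug.gwp_asan.sample_rate.xxx" --- malloc sample rate for apps
2) Process sampling rate for enabling GWP-ASan
"libc.debug.gwp_asan.process_sampling.xxx" --- process sampling rate for system processes
"libc.debug.gwp_asan.process_sampling.app_default" --- process sampling rate for apps
3) Maximum number of simultaneous allocations
"libc.debug.gwp_asan.max_allocs.xxx" --- maximum number of simultaneous allocations for system processes
"libc.debug.gwp_asan.max_allocs.xxx" --- maximum number of simultaneous allocations for apps
If max_allocs is not set manually, it is derived from sample_rate: when sample_rate is set to N times its default value, max_allocs becomes 1/N of its default value. In practice it therefore usually needs to be set manually.
4) Enabling or disabling Recoverable GWP-ASan
"libc.debug.gwp_asan.recoverable.xxx" --- enables Recoverable GWP-ASan for system processes (the tombstone only records the error; the process does not crash)
"libc.debug.gwp_asan.recoverable.xxx" --- enables Recoverable GWP-ASan for apps
5) Property controlling the default Recoverable GWP-ASan for apps
Setting persist.device_config.memory_safety_native.gwp_asan_recoverable_apps to false turns off the default Recoverable GWP-ASan for apps; true, or leaving the property unset, keeps it on. If any of the properties above are configured, they override persist.device_config.memory_safety_native.gwp_asan_recoverable_apps.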
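The scaling rule in item 3) is simple arithmetic. The sketch below assumes a default sample rate of 2500 and a default max_allocs of 32; both constants and the function name `scaled_max_allocs` are assumptions of this example, and integer division is used for simplicity, so the exact rounding in bionic may differ.

```cpp
#include <cassert>

// Assumed defaults for this sketch only.
constexpr unsigned kAssumedDefaultSampleRate = 2500;
constexpr unsigned kAssumedDefaultMaxAllocs = 32;

// If sample_rate is N times the default, the derived max_allocs is 1/N of
// the default, so the ratio between the two stays constant.
unsigned scaled_max_allocs(unsigned sample_rate) {
  assert(sample_rate > 0);
  return static_cast<unsigned>(
      static_cast<unsigned long long>(kAssumedDefaultMaxAllocs) *
      kAssumedDefaultSampleRate / sample_rate);
}
```

Under these assumed defaults, a sample_rate of 5000 yields 16 slots, while a sample_rate of 1 yields 80000 slots, which illustrates why max_allocs is normally set explicitly whenever sample_rate is lowered for testing.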
Enabling it for the single process gwp-asan-test-bin:
Set the properties:
:/ # setprop libc.debug.gwp_asan.process_sampling.gwp-asan-test-bin 1
:/ # setprop libc.debug.gwp_asan.sample_rate.gwp-asan-test-bin 1
:/ # setprop libc.debug.gwp_asan.max_allocs.gwp-asan-test-bin 64
:/ # setprop libc.debug.gwp_asan.recoverable.gwp-asan-test-bin 1
Run the process:
./system/bin/gwp-asan-test double-free
To enable it for all native processes:
setprop libc.debug.gwp_asan.process_sampling.system_default 1
setprop libc.debug.gwp_asan.sample_rate.system_default 1
setprop libc.debug.gwp_asan.max_allocs.system_default 128
setprop libc.debug.gwp_asan.recoverable.system_default 1
To enable it for all app processes:
setprop libc.debug.gwp_asan.process_sampling.app_default 1
setprop libc.debug.gwp_asan.sample_rate.app_default 1
To enable GWP-ASan for a single process (xxx is the process name):
setprop libc.debug.gwp_asan.process_sampling.xxx 1
setprop libc.debug.gwp_asan.sample_rate.xxx 1
Notes:
1) The properties above are set without the [persist.] prefix, so they are lost when the device reboots and must be set again. To keep a value across reboots, set the property with the [persist.] prefix, for example:
setprop persist.libc.debug.gwp_asan.process_sampling.system_default 1
The value is saved to the /data/property/persistent_properties file and is still present after the next reboot.
2) Setting [persist.]libc.debug.gwp_asan.process_sampling.app_default overrides the persist.device_config.memory_safety_native.gwp_asan_recoverable_apps property.
Precedence order: program-specific non-persistent sysprop ---> program-specific persistent sysprop ---> default non-persistent sysprop ---> default persistent sysprop
3) A per-process property overrides the global setting.
4) The process sampling rate should be a power of two; otherwise the modulo operation may introduce modulo bias.
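Note 4) can be checked by counting. Assuming the random source is uniform over 2^bits values (an assumption of this sketch; the names `selected_count` and `is_unbiased` are hypothetical), the selection probability is exactly 1/rate only when rate divides 2^bits, which holds for powers of two.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many of the 2^bits equally likely random values are multiples
// of `rate`, i.e. how many values would select the process.
std::uint64_t selected_count(unsigned bits, std::uint64_t rate) {
  assert(bits < 64 && rate > 0);
  const std::uint64_t total = std::uint64_t{1} << bits;
  return (total - 1) / rate + 1;  // multiples of rate in [0, total)
}

// True when the selection probability is exactly 1/rate.
bool is_unbiased(unsigned bits, std::uint64_t rate) {
  const std::uint64_t total = std::uint64_t{1} << bits;
  return selected_count(bits, rate) * rate == total;
}
```

For an 8-bit source, a rate of 128 selects 2 of 256 values (exactly 1/128), whereas a rate of 3 selects 86 of 256 values, slightly more than the intended 1/3. With a 32-bit source the bias for non-power-of-two rates is tiny but still nonzero.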
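6.5 How recovery works
Why does the process not crash immediately when a memory error occurs?
1) BUFFER_OVERFLOW, BUFFER_UNDERFLOW and USE_AFTER_FREE errors
Once the tombstone has been collected, the crash is marked as handled and the process does not exit.
2) DOUBLE_FREE and INVALID_FREE errors
These errors are detected and handled inside deallocate(); once handling completes, deallocate() returns early.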
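7. Performance and stability impact
7.1 Performance evaluation
1) Memory footprint
With GWP-ASan enabled with default settings, a single process's GWP-ASan pool occupies 67 pages, 268 KB in total: 32 slot pages, 33 guard pages, 1 internal-mem-detect page and 1 freeslot page.
If the system has 1000 processes and all of them have GWP-ASan enabled, the total is 1000 * 67 * 4 KB = 268000 KB ≈ 261.7 MB.
2) Allocation and free efficiency
Because GWP-ASan collects stack traces in malloc/free, some overhead is expected in theory.
A comparison of Scudo malloc/free (the normal allocation path) with GWP-ASan malloc/free shows no significant difference in elapsed time at the millisecond resolution of the log timestamps; the overhead is essentially negligible.
Part of the log follows: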
Scudo malloc/free
06-05 12:00:17.615 16241 16241 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 12:00:17.615 16241 16241 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 12:00:17.615 16241 16241 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 12:00:17.615 16241 16241 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 12:00:20.013 16255 16255 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 12:00:20.014 16255 16255 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 12:00:20.014 16255 16255 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 12:00:20.014 16255 16255 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 12:00:22.197 16269 16269 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 12:00:22.197 16269 16269 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 12:00:22.197 16269 16269 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 12:00:22.197 16269 16269 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 12:00:23.639 16272 16272 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 12:00:23.639 16272 16272 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 12:00:23.639 16272 16272 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 12:00:23.639 16272 16272 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 12:00:24.940 16284 16284 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 12:00:24.941 16284 16284 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 12:00:24.941 16284 16284 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 12:00:24.941 16284 16284 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
------------------------------------------------------------------------------------------
GWP-Asan malloc/free
06-05 11:57:03.326 15393 15393 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 11:57:03.326 15393 15393 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 11:57:03.326 15393 15393 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 11:57:03.326 15393 15393 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 11:58:11.613 15692 15692 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 11:58:11.613 15692 15692 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 11:58:11.613 15692 15692 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 11:58:11.613 15692 15692 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 11:58:15.299 15717 15717 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 11:58:15.299 15717 15717 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 11:58:15.299 15717 15717 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 11:58:15.299 15717 15717 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 11:58:17.366 15721 15721 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 11:58:17.366 15721 15721 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 11:58:17.366 15721 15721 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 11:58:17.366 15721 15721 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
06-05 11:58:42.906 15824 15824 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc start time
06-05 11:58:42.907 15824 15824 E gwp-asan-test-bin2: gwp-asan-test-bin : malloc end time
06-05 11:58:42.907 15824 15824 E gwp-asan-test-bin2: gwp-asan-test-bin : free start time
06-05 11:58:42.907 15824 15824 E gwp-asan-test-bin2: gwp-asan-test-bin : free end time
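8. GWP-ASan limitations
1) On Android U, native processes start with normal GWP-ASan enabled by default and app processes start with Recoverable GWP-ASan enabled by default, but process sampling is random. A process that is not selected never gets normal or Recoverable GWP-ASan; it is only re-sampled after the next system reboot or process restart.
2) On Android U, native processes start with normal GWP-ASan by default, so when a GWP-ASan memory error is triggered the process crashes, which affects the user experience.
3) Memory waste and limited detection coverage
When a GWP-ASan malloc requests less than one page, a full page is still allocated, which is wasteful. In addition, once the default 32 pages have all been allocated and not yet freed, GWP-ASan cannot serve further allocations (i.e. GWP-ASan detection no longer covers the process), and allocations fall back to the native Scudo malloc path.
4) Accesses to the stretch of memory between the allocation and the guard page (the part of the page left over when the allocation is not flush against the guard page) cannot be detected, because GWP-ASan decides whether an access is legal from the read/write permissions of the guard pages.
In the code below, memory allocated through GWP-ASan is left- (or right-) aligned within one page (4 KB); assume the memory is aligned in the align-up manner: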
#include <iostream>

void heapOverOrUnderFlow() {
    char* ptr = new char[20];
    // Out-of-bounds write that GWP-ASan does not detect
    ptr[20] = 'A';
    // Out-of-bounds write that GWP-ASan does not detect
    ptr[4095] = 'A';
    std::cout << "heapOverFlow !" << std::endl;
    // Out-of-bounds write that GWP-ASan does detect (it touches the guard page on the right)
    ptr[4096] = 'A';
    std::cout << "heapOverFlow !!" << std::endl;
}
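Testing confirms that out-of-bounds accesses to the green region (the unused part of the slot page between the allocation and the guard page) are not detected by GWP-ASan; this needs to be improved.
Proposed solution: write a known value into the green region during GWP-ASan malloc, then check during GWP-ASan free whether the contents of the region have changed. If they have, the memory was corrupted and a GWP-ASan memory error is raised.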
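The proposed fill-and-check scheme can be sketched on a plain buffer. This is an illustration of the idea only, not a patch to GWP-ASan: it assumes the allocation starts at the beginning of the slot page as in the example above, and `kCanaryByte`, `fill_slack` and `first_corrupted_offset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kPageSize = 4096;     // page size used in the text
constexpr std::uint8_t kCanaryByte = 0xA5;  // arbitrary pattern for this sketch

// Allocation time: fill the unused tail of the slot page, [size, kPageSize),
// with the canary pattern.
void fill_slack(std::uint8_t* slot, std::size_t size) {
  assert(size <= kPageSize);
  std::memset(slot + size, kCanaryByte, kPageSize - size);
}

// Free time: return the offset of the first corrupted slack byte, or
// kPageSize if the slack region is intact.
std::size_t first_corrupted_offset(const std::uint8_t* slot, std::size_t size) {
  for (std::size_t i = size; i < kPageSize; ++i) {
    if (slot[i] != kCanaryByte) return i;
  }
  return kPageSize;
}
```

Such a check only catches out-of-bounds writes, not reads, and only reports them at free time rather than at the faulting instruction, so the resulting report would lack the stack of the offending access.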
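5) Accesses to an adjacent slot page are not detected
For example, char* ptr = (char*)malloc(20); requests 20 bytes, which is less than one page. GWP-ASan allocates one page (one slot page) and aligns the allocation in the AlignUp manner. If the slot pages to the left and right of this one are already allocated, they are marked accessible, so an out-of-bounds access that lands in either neighboring slot page is not detected by GWP-ASan.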
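#include <cstdlib>
#include <iostream>

void heapCorruptionAlignUpAdjacent() {
    char* ptr = (char*)malloc(20);
    // Cast to void* so the address is printed rather than the buffer being read as a C string
    std::cout << "ptr = " << static_cast<void*>(ptr) << std::endl;
    // Touches the guard page on the right: GWP-ASan detects it
    *(ptr+4096) = 'A';
    // Touches the guard page on the left: GWP-ASan detects it
    *(ptr-1) = 'A';
    // Touches the next slot page (on the right): GWP-ASan does not detect it
    *(ptr+4096+4096) = 'A';
    // Touches the previous slot page (on the left): GWP-ASan does not detect it
    *(ptr-4096-1) = 'A';
}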
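Why the last two writes in the example escape detection becomes clearer when each address is mapped to its page kind. The sketch below assumes the pool alternates guard page, slot 0, guard page, slot 1, and so on up to slot 31 and a final guard page, which matches the 32 slot pages and 33 guard pages counted in section 7.1; `classify_offset` and the types around it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSize = 4096;
constexpr std::size_t kNumSlots = 32;

enum class PageKind { kGuard, kSlot, kOutsidePool };

struct PageInfo {
  PageKind kind;
  std::size_t slot;  // meaningful only when kind == PageKind::kSlot
};

// Classifies a byte offset from the start of the pool, assuming the pages
// alternate guard, slot 0, guard, slot 1, ..., slot 31, guard.
PageInfo classify_offset(std::size_t offset) {
  const std::size_t page = offset / kPageSize;
  if (page >= 2 * kNumSlots + 1) return {PageKind::kOutsidePool, 0};
  if (page % 2 == 0) return {PageKind::kGuard, 0};
  return {PageKind::kSlot, (page - 1) / 2};
}
```

For a pointer at the start of slot 1, `ptr + 4096` and `ptr - 1` both fall in guard pages, while `ptr + 8192` falls in slot 2 and `ptr - 4097` in slot 0. If those neighboring slots are currently allocated, their pages are accessible and the stray write raises no fault.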
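6) GWP-ASan collects stack frames during malloc/free, which affects performance
GWP-ASan collects stack frames in malloc/free. In some scenarios the number of frames is large (up to 128), so in theory this costs time and affects performance. This information has limited impact on problem analysis, so collection could be switched on or off per phase: for example, on during development and internal testing and off in production, which would make GWP-ASan more lightweight.
Stack frames collected by GWP-ASan during malloc/free, as they appear in a tombstone: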
pid: 21398, tid: 21588, name: HwBinder:21398_ >>> /vendor/bin/vde_shadow
.....
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xb40000702de79fe0
Cause: [GWP-ASan]: Use After Free, 0 bytes into a 32-byte allocation at 0x702de79fe0
......
backtrace:
#00 pc 000000000004e938 /vendor/lib64/libmivhalclient.so (android::frameworks::automotive::vhal::HidlVhalClient::onBinderDied()+68) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
#01 pc 000000000004b7b4 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::hidl_binder_death_recipient::binderDied(android::wp<android::hardware::IBinder> const&)+116) (BuildId: 053c4c807cfcb65b623a130634848051)
#02 pc 0000000000096a64 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::BpHwBinder::reportOneDeath(android::hardware::BpHwBinder::Obituary const&)+140) (BuildId: 053c4c807cfcb65b623a130634848051)
#03 pc 0000000000096984 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::BpHwBinder::sendObituary()+248) (BuildId: 053c4c807cfcb65b623a130634848051)
#04 pc 0000000000099dc4 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::IPCThreadState::executeCommand(int)+1004) (BuildId: 053c4c807cfcb65b623a130634848051)
#05 pc 000000000009987c /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::IPCThreadState::getAndExecuteCommand()+228) (BuildId: 053c4c807cfcb65b623a130634848051)
#06 pc 000000000009aa64 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::IPCThreadState::joinThreadPool(bool)+176) (BuildId: 053c4c807cfcb65b623a130634848051)
#07 pc 00000000000a5694 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::PoolThread::threadLoop()+28) (BuildId: 053c4c807cfcb65b623a130634848051)
#08 pc 00000000000145cc /apex/com.android.vndk.v34/lib64/libutils.so (android::Thread::_threadLoop(void*)+288) (BuildId: bae357fd422729c97fcc6f22c007b49c)
#09 pc 00000000000fd988 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#10 pc 0000000000096b24 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
deallocated by thread 21588:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d204 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::deallocate(void*)+412) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 000000000004d08c /vendor/lib64/libmivhalclient.so (android::frameworks::automotive::vhal::HidlVhalClient::~HidlVhalClient()+292) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
#03 pc 000000000003930c /vendor/lib64/libmivhalclient.so (vendor::micar::hardware::vehicle::interface::MiVhalClient::~MiVhalClient()+116) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
#04 pc 0000000000318a5c /vendor/bin/vde_shadow (std::__ndk1::__shared_count::__release_shared[abi:v170000]()+60) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#05 pc 00000000003189fc /vendor/bin/vde_shadow (std::__ndk1::__shared_weak_count::__release_shared[abi:v170000]()+24) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#06 pc 0000000000316bc4 /vendor/bin/vde_shadow (std::__ndk1::shared_ptr<vendor::micar::hardware::vehicle::interface::IMiVhalClient>::~shared_ptr[abi:v170000]()+52) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#07 pc 0000000000318748 /vendor/bin/vde_shadow (std::__ndk1::shared_ptr<vendor::micar::hardware::vehicle::interface::IMiVhalClient>::reset[abi:v170000]()+68) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#08 pc 0000000000318670 /vendor/bin/vde_shadow (VHalStub::onBinderDied()+436) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#09 pc 000000000031d6e0 /vendor/bin/vde_shadow (VHalStub::init()::$_0::operator()() const+24) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#10 pc 000000000031d6b8 /vendor/bin/vde_shadow (decltype(std::declval<VHalStub::init()::$_0&>()()) std::__ndk1::__invoke[abi:v170000]<VHalStub::init()::$_0&>(VHalStub::init()::$_0&)+20) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#11 pc 000000000031d670 /vendor/bin/vde_shadow (void std::__ndk1::__invoke_void_return_wrapper<void, true>::__call<VHalStub::init()::$_0&>(VHalStub::init()::$_0&)+20) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#12 pc 000000000031d64c /vendor/bin/vde_shadow (std::__ndk1::__function::__alloc_func<VHalStub::init()::$_0, std::__ndk1::allocator<VHalStub::init()::$_0>, void ()>::operator()[abi:v170000]()+24) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#13 pc 000000000031c6e4 /vendor/bin/vde_shadow (std::__ndk1::__function::__func<VHalStub::init()::$_0, std::__ndk1::allocator<VHalStub::init()::$_0>, void ()>::operator()()+24) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#14 pc 000000000004e964 /vendor/lib64/libmivhalclient.so (android::frameworks::automotive::vhal::HidlVhalClient::onBinderDied()+112) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
#15 pc 000000000004b7b4 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::hidl_binder_death_recipient::binderDied(android::wp<android::hardware::IBinder> const&)+116) (BuildId: 053c4c807cfcb65b623a130634848051)
#16 pc 0000000000096a64 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::BpHwBinder::reportOneDeath(android::hardware::BpHwBinder::Obituary const&)+140) (BuildId: 053c4c807cfcb65b623a130634848051)
#17 pc 0000000000096984 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::BpHwBinder::sendObituary()+248) (BuildId: 053c4c807cfcb65b623a130634848051)
#18 pc 0000000000099dc4 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::IPCThreadState::executeCommand(int)+1004) (BuildId: 053c4c807cfcb65b623a130634848051)
#19 pc 000000000009987c /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::IPCThreadState::getAndExecuteCommand()+228) (BuildId: 053c4c807cfcb65b623a130634848051)
#20 pc 000000000009aa64 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::IPCThreadState::joinThreadPool(bool)+176) (BuildId: 053c4c807cfcb65b623a130634848051)
#21 pc 00000000000a5694 /apex/com.android.vndk.v34/lib64/libhidlbase.so (android::hardware::PoolThread::threadLoop()+28) (BuildId: 053c4c807cfcb65b623a130634848051)
#22 pc 00000000000145cc /apex/com.android.vndk.v34/lib64/libutils.so (android::Thread::_threadLoop(void*)+288) (BuildId: bae357fd422729c97fcc6f22c007b49c)
#23 pc 00000000000fd988 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#24 pc 0000000000096b24 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#25 pc 0000000000000000 <unknown>
allocated by thread 21405:
#00 pc 000000000008c9c8 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#01 pc 000000000008d03c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+600) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#02 pc 0000000000050c1c /apex/com.android.runtime/lib64/bionic/libc.so ((anonymous namespace)::gwp_asan_malloc(unsigned long)+172) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#03 pc 0000000000051608 /apex/com.android.runtime/lib64/bionic/libc.so (malloc+84) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#04 pc 00000000006495c8 /vendor/bin/vde_shadow (operator new(unsigned long)+28) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#05 pc 00000000000461d0 /vendor/lib64/libmivhalclient.so (std::__1::pair<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::shared_ptr<std::__1::function<void ()> >, void*>*>, bool> std::__1::__hash_table<std::__1::shared_ptr<std::__1::function<void ()> >, std::__1::hash<std::__1::shared_ptr<std::__1::function<void ()> > >, std::__1::equal_to<std::__1::shared_ptr<std::__1::function<void ()> > >, std::__1::allocator<std::__1::shared_ptr<std::__1::function<void ()> > > >::__emplace_unique_key_args<std::__1::shared_ptr<std::__1::function<void ()> >, std::__1::shared_ptr<std::__1::function<void ()> > const&>(std::__1::shared_ptr<std::__1::function<void ()> > const&, std::__1::shared_ptr<std::__1::function<void ()> > const&)+300) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
#06 pc 000000000004dbac /vendor/lib64/libmivhalclient.so (android::frameworks::automotive::vhal::HidlVhalClient::addOnBinderDiedCallback(std::__1::shared_ptr<std::__1::function<void ()> >)+56) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
#07 pc 0000000000038d3c /vendor/lib64/libmivhalclient.so (vendor::micar::hardware::vehicle::interface::MiVhalClient::addOnBinderDiedCallback(std::__1::shared_ptr<std::__1::function<void ()> >)+252) (BuildId: 10a0c1874f9c8a88fcfd3eed8e238e76)
#08 pc 0000000000316d58 /vendor/bin/vde_shadow (VHalStub::init()+140) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#09 pc 0000000000316c00 /vendor/bin/vde_shadow (VHalStub::start()+40) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#10 pc 0000000000327374 /vendor/bin/vde_shadow (VHalStub::qt_static_metacall(QObject*, QMetaObject::Call, int, void**)+148) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#11 pc 0000000000405c88 /vendor/bin/vde_shadow (QObject::event(QEvent*)+484) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#12 pc 00000000003c6510 /vendor/bin/vde_shadow (QCoreApplication::notifyInternal(QObject*, QEvent*)+120) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#13 pc 00000000003c7100 /vendor/bin/vde_shadow (QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*)+644) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#14 pc 0000000000418710 /vendor/bin/vde_shadow (QEventDispatcherUNIX::processEvents(QFlags<QEventLoop::ProcessEventsFlag>)+56) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#15 pc 00000000003c3d20 /vendor/bin/vde_shadow (QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>)+524) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#16 pc 000000000032a57c /vendor/bin/vde_shadow (QThread::exec()+188) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#17 pc 000000000032c500 /vendor/bin/vde_shadow (QThreadPrivate::start(void*)+332) (BuildId: 438556fdf22bbaf8fe163f7cea70089e6aae66bf)
#18 pc 00000000000fd988 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#19 pc 0000000000096b24 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 4bdefc10cfc9decb2a98e2727eadcc4a)
#20 pc 0000000000000000 <unknown>
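7) use-after-free can go undetected
use-after-free detection relies on the access permissions of the memory page. If, some time later, the memory is reallocated to another module, the page becomes accessible again, and when the original module accesses it the use-after-free error is not detected.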
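9. Summary
1) On Android 12, a native demo process can enable GWP-ASan by calling android_mallopt.
Memory error types that can be detected:
a) use-after-free
b) double-free
c) out-of-bounds (BUFFER_OVERFLOW + BUFFER_UNDERFLOW)
d) invalid-free
2) On Android 12, an app can enable GWP-ASan through AndroidManifest.xml; zygote ultimately enables it by calling android_mallopt. On Android 14 and later, Recoverable GWP-ASan is enabled for all apps by default.
3) Even when a native demo process or app has GWP-ASan enabled, GWP-ASan samples allocations randomly. For example, it may be only the process's 1000th malloc that is sampled, i.e. served by GWP-ASan's alloc/free.
4) A process that enables GWP-ASan reserves 67 pages of virtual memory (268 KB) by default during initialization.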
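The 67-page figure follows from the page counts given in section 7.1 and can be reproduced with a few lines of arithmetic. In this sketch the generalization to N slots (N + 1 guard pages) is an assumption, as are the function names `pool_pages` and `pool_kb`.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kPageSizeKb = 4;  // 4 KB pages, as in the text

// Pages reserved per process: N slot pages, N + 1 guard pages, plus one page
// each for internal fault detection and for the free-slot list.
constexpr std::size_t pool_pages(std::size_t slots) {
  return slots + (slots + 1) + 1 + 1;
}

// Total reservation in KB for `processes` processes with `slots` slots each.
constexpr std::size_t pool_kb(std::size_t slots, std::size_t processes) {
  return pool_pages(slots) * kPageSizeKb * processes;
}

static_assert(pool_pages(32) == 67, "32 slots -> 67 pages");
static_assert(pool_kb(32, 1) == 268, "67 pages * 4 KB = 268 KB");
```

`pool_kb(32, 1000)` gives 268000 KB, the system-wide figure quoted in section 7.1 for 1000 processes.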