Finding Native Memory Leaks in One Step

程序员一东

Category: Android    Tags: android, Native

Article link: https://blog.csdn.net/Eqiqi/article/details/132456385

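Virtual memory can be exhausted too

As Android developers, most of us have been through migrating an app from 32-bit to 64-bit. The domestic (Chinese) market still has some 32-bit requirements and has not banned 32-bit builds outright. One drawback of 32-bit is that the virtual address space available to user space is small (part of the 4 GB address space is reserved for the kernel; the split is configurable), so running out of virtual memory and hitting an OOM is common. On ARM64 with 4 KB pages, the virtual address space a process gets by default is 2^39 bytes (512 GB). A 64-bit process cannot use the full 64 bits of address space, but 2^39 is still far more than anything 32-bit offers, so moving to 64-bit usually relieves the shortage. The interesting part is that it only relieves it: a long-running app that leaks large amounts of virtual memory can still hit an OOM caused by virtual memory exhaustion, however large the address space is.

A typical symptom is an mmap failure, for example when creating a native thread (which needs mmap to set up its stack) or when any other code allocates memory through mmap: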

java.lang.OutOfMemoryError: Could not allocate JNI Env: Failed anonymous mmap(0x0, 8192, 0x34, 0x220, -1, 0): Out of memory. See process maps in the log.
        at java.lang.Thread.nativeCreate(Thread.java)
        at java.lang.Thread.start(Thread.java:733)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:975)
        at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1043)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1185)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
        at java.lang.Thread.run(Thread.java:764)
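
One way to notice this kind of exhaustion before it becomes an OOM is to sample the process's virtual size from /proc/self/status. The sketch below is only an illustration: the helper names ParseVmSizeKb and ReadSelfVmSizeKb are made up for this example, and it assumes the usual "VmSize:  <n> kB" line format of the Linux status file.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Extracts the "VmSize:" value (in kB) from the text of /proc/<pid>/status.
// Returns false if the field is missing or malformed.
// (Hypothetical helper, not part of any Android API.)
bool ParseVmSizeKb(const std::string& status_text, uint64_t* out_kb) {
    const std::string key = "VmSize:";
    std::istringstream lines(status_text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, key.size(), key) != 0) {
            continue;  // not the VmSize line
        }
        std::istringstream fields(line.substr(key.size()));
        uint64_t kb = 0;
        if (!(fields >> kb)) {
            return false;
        }
        *out_kb = kb;
        return true;
    }
    return false;
}

// Reads the virtual size of the current process.
bool ReadSelfVmSizeKb(uint64_t* out_kb) {
    std::ifstream status("/proc/self/status");
    if (!status) {
        return false;
    }
    std::stringstream buffer;
    buffer << status.rdbuf();
    return ParseVmSizeKb(buffer.str(), out_kb);
}
```

Sampling this value periodically and logging it when it keeps growing gives an early hint that virtual memory is leaking, well before mmap starts to fail.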

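So having a way to track down the culprit matters a lot. In fact, the ART runtime itself has caused its share of native memory leaks, which is why the Android team added the libmemunreachable module to the system starting with Android N to detect native memory leaks.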

libmemunreachable

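MemUnreachable.cpp provides a function for collecting unreachable memory, GetUnreachableMemory:

GetUnreachableMemory(UnreachableMemoryInfo& info, size_t limit)

After the call, the info out-parameter holds the details of the unreachable allocations. Printed as text, they look like this: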

384 bytes in 9 allocations unreachable out of 20003960 bytes in 40784 allocations
  384 bytes in 9 unreachable allocations
  ABI: 'arm64'
  320 bytes unreachable at 7e879d09c0
  8 bytes unreachable at 7e8d891160
  8 bytes unreachable at 7e8d9fec78
  ....

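As you can see, the report is fairly detailed: both the size and the address of each leak are there. This library is mostly used by ART to check itself, though. What if we want to use it too, for example to monitor our own app for native memory leaks? No problem, there is a way. Before getting to usage we will go through how it works; if you are not interested, skip ahead to the usage section.

Think about it: how would we normally go about detecting memory leaks?

The first step is to decide whether a piece of memory is reachable, and this is the important one. For the Java heap, reachability is determined by starting from a set of GC roots: if a reference chain from a GC root to an object exists, the object is in use; otherwise it is unreachable and will be reclaimed by the GC.

The native layer works the same way. Reachability is checked starting from Root memory (memory known to be in use, for example global data mappings or the stacks of live threads), and anything that cannot be reached is judged to be leaked. The native check is actually simpler than the Java one: native memory with no reference chain from a Root can be treated as leaked, because the native layer has no GC like Java does. If the memory is not freed explicitly, it stays around forever. So finding leaked memory comes down to finding unreachable memory.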
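
The idea can be shown with a toy model before looking at the real source. In the sketch below, allocations are just indices and "pointers" are explicit edges between them; FindUnreachable is a name invented for this example and is not how libmemunreachable stores its data.

```cpp
#include <cstddef>
#include <vector>

// Toy model: allocation i "points to" the allocations listed in edges[i].
// `roots` lists the allocations referenced directly from root memory.
// Returns the indices of allocations that no root can reach.
std::vector<size_t> FindUnreachable(const std::vector<std::vector<size_t> >& edges,
                                    const std::vector<size_t>& roots) {
    std::vector<bool> reached(edges.size(), false);
    std::vector<size_t> to_do;

    // Seed the work list with everything the roots reference directly.
    for (size_t i = 0; i < roots.size(); ++i) {
        size_t r = roots[i];
        if (r < edges.size() && !reached[r]) {
            reached[r] = true;
            to_do.push_back(r);
        }
    }

    // Follow references until no new allocation is discovered.
    while (!to_do.empty()) {
        size_t current = to_do.back();
        to_do.pop_back();
        for (size_t i = 0; i < edges[current].size(); ++i) {
            size_t next = edges[current][i];
            if (next < edges.size() && !reached[next]) {
                reached[next] = true;
                to_do.push_back(next);
            }
        }
    }

    // Whatever was never reached is reported as leaked.
    std::vector<size_t> leaked;
    for (size_t i = 0; i < edges.size(); ++i) {
        if (!reached[i]) {
            leaked.push_back(i);
        }
    }
    return leaked;
}
```

Note that two allocations that only point at each other are still reported: a cycle does not make memory reachable unless some root leads into it.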

Now let's walk through the source code.

GetUnreachableMemory

bool GetUnreachableMemory(UnreachableMemoryInfo& info, size_t limit) {
    if (info.version > 0) {
        MEM_ALOGE("unsupported UnreachableMemoryInfo.version %zu in GetUnreachableMemory",
                  info.version);
        return false;
    }

    int parent_pid = getpid();
    int parent_tid = gettid();

    Heap heap;

    AtomicState<State> state(STARTING);
    LeakPipe pipe;

    PtracerThread thread{[&]() -> int {
        /////////////////////////////////////////////
        // Collection thread
        /////////////////////////////////////////////
        MEM_ALOGI("collecting thread info for process %d...", parent_pid);

        if (!state.transition_or(STARTING, PAUSING, [&] {
            MEM_ALOGI("collecting thread expected state STARTING, aborting");
            return ABORT;
        })) {
            return 1;
        }

        ThreadCapture thread_capture(parent_pid, heap);
        allocator::vector<ThreadInfo> thread_info(heap);
        allocator::vector<Mapping> mappings(heap);
        allocator::vector<uintptr_t> refs(heap);
        // Each capture step below is checked; any failure aborts the collection
        // ptrace all the threads
        if (!thread_capture.CaptureThreads()) {
            state.set(ABORT);
            return 1;
        }

        // collect register contents and stacks
        if (!thread_capture.CapturedThreadInfo(thread_info)) {
            state.set(ABORT);
            return 1;
        }

        // snapshot /proc/pid/maps
        if (!ProcessMappings(parent_pid, mappings)) {
            state.set(ABORT);
            return 1;
        }

        if (!BinderReferences(refs)) {
            state.set(ABORT);
            return 1;
        }

        // Atomically update the state from PAUSING to COLLECTING.
        // The main thread may have given up waiting for this thread to finish
        // pausing, in which case it will have changed the state to ABORT.
        if (!state.transition_or(PAUSING, COLLECTING, [&] {
            MEM_ALOGI("collecting thread aborting");
            return ABORT;
        })) {
            return 1;
        }

        // malloc must be enabled to call fork, at_fork handlers take the same
        // locks as ScopedDisableMalloc.  All threads are paused in ptrace, so
        // memory state is still consistent.  Unfreeze the original thread so it
        // can drop the malloc locks, it will block until the collection thread
        // exits.
        thread_capture.ReleaseThread(parent_tid);
       
        // The analysis takes time, so a child process is forked to do it
        // fork a process to do the heap walking
        int ret = fork();
        if (ret < 0) {
            return 1;
        } else if (ret == 0) {
            /////////////////////////////////////////////
            // Heap walker process
            /////////////////////////////////////////////
            // Examine memory state in the child using the data collected above and
            // the CoW snapshot of the process memory contents.

            if (!pipe.OpenSender()) {
                _exit(1);
            }

            MemUnreachable unreachable{parent_pid, heap};
            // Key step: the analysis starts here. Note the arguments, they are
            // where the Roots come from
            if (!unreachable.CollectAllocations(thread_info, mappings, refs)) {
                _exit(2);
            }
            size_t num_allocations = unreachable.Allocations();
            size_t allocation_bytes = unreachable.AllocationBytes();

            allocator::vector<Leak> leaks{heap};

            size_t num_leaks = 0;
            size_t leak_bytes = 0;
            // With the Roots configured, start the search via GetUnreachableMemory
            bool ok = unreachable.GetUnreachableMemory(leaks, limit, &num_leaks, &leak_bytes);
            // When detection is done, send the results to the parent through the pipe
            ok = ok && pipe.Sender().Send(num_allocations);
            ok = ok && pipe.Sender().Send(allocation_bytes);
            ok = ok && pipe.Sender().Send(num_leaks);
            ok = ok && pipe.Sender().Send(leak_bytes);
            ok = ok && pipe.Sender().SendVector(leaks);

            if (!ok) {
                _exit(3);
            }

            _exit(0);
        } else {
            // Nothing left to do in the collection thread, return immediately,
            // releasing all the captured threads.
            MEM_ALOGI("collection thread done");
            return 0;
        }
    }};

    /////////////////////////////////////////////
    // Original thread
    /////////////////////////////////////////////

    {
        // Disable malloc to get a consistent view of memory
        ScopedDisableMalloc disable_malloc;

        // Start the collection thread
        thread.Start();
        // If the wait below times out, the state is moved to ABORT
        // Wait for the collection thread to signal that it is ready to fork the
        // heap walker process.
        if (!state.wait_for_either_of(COLLECTING, ABORT, 30s)) {
            // The pausing didn't finish within 30 seconds, attempt to atomically
            // update the state from PAUSING to ABORT.  The collecting thread
            // may have raced with the timeout and already updated the state to
            // COLLECTING, in which case aborting is not necessary.
            if (state.transition(PAUSING, ABORT)) {
                MEM_ALOGI("main thread timed out waiting for collecting thread");
            }
        }

        // Re-enable malloc so the collection thread can fork.
    }

    // Wait for the collection thread to exit
    int ret = thread.Join();
    if (ret != 0) {
        return false;
    }

    // Get a pipe from the heap walker process.  Transferring a new pipe fd
    // ensures no other forked processes can have it open, so when the heap
    // walker process dies the remote side of the pipe will close.
    if (!pipe.OpenReceiver()) {
        return false;
    }
    // Receive the data produced by the child through the pipe, then return
    bool ok = true;
    ok = ok && pipe.Receiver().Receive(&info.num_allocations);
    ok = ok && pipe.Receiver().Receive(&info.allocation_bytes);
    ok = ok && pipe.Receiver().Receive(&info.num_leaks);
    ok = ok && pipe.Receiver().Receive(&info.leak_bytes);
    ok = ok && pipe.Receiver().ReceiveVector(info.leaks);
    if (!ok) {
        return false;
    }

    MEM_ALOGI("unreachable memory detection done");
    MEM_ALOGE("%zu bytes in %zu allocation%s unreachable out of %zu bytes in %zu allocation%s",
              info.leak_bytes, info.num_leaks, plural(info.num_leaks), info.allocation_bytes,
              info.num_allocations, plural(info.num_allocations));
    return true;
}

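GetUnreachableMemory is really just the entry point. It forks a child process to do the leak detection: the current process keeps allocating memory, and analysing it in place would keep the process blocked, since the analysis involves suspending threads and similar operations. The child therefore does the analysis on its own copy of the memory and writes the results back through a pipe. The method to keep an eye on here is CollectAllocations.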
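
The fork-and-pipe pattern on its own can be sketched in a few lines of POSIX code. This is a simplified illustration, not the library's implementation: it uses a plain pipe() instead of the LeakPipe class, and CountNonZero and AnalyzeInChild are hypothetical names standing in for the expensive heap walk.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical stand-in for the expensive analysis done in the child.
static size_t CountNonZero(const int* data, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (data[i] != 0) ++count;
    }
    return count;
}

// Forks a child that analyses its copy-on-write snapshot of `data` and sends
// one size_t result back through a pipe. Returns false on any failure.
bool AnalyzeInChild(const int* data, size_t n, size_t* result) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: sees the parent's memory as it was at fork time.
        close(fds[0]);
        size_t count = CountNonZero(data, n);
        ssize_t written = write(fds[1], &count, sizeof(count));
        _exit(written == static_cast<ssize_t>(sizeof(count)) ? 0 : 1);
    }

    // Parent: free to keep running; here it simply waits for the result.
    close(fds[1]);
    size_t value = 0;
    ssize_t got = read(fds[0], &value, sizeof(value));
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    if (got != static_cast<ssize_t>(sizeof(value))) return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return false;

    *result = value;
    return true;
}
```

The child uses _exit rather than exit so that it does not run the parent's atexit handlers or flush its stdio buffers, which is also what the heap walker process in the listing above does.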

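Root objects

Before getting into CollectAllocations, we need to know how Root objects get added. As mentioned above, memory is leaked only if it cannot be reached along a reference chain starting from a Root, so the choice of Roots is critical. Roots are added with the following methods: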

void HeapWalker::Root(uintptr_t begin, uintptr_t end) {
    roots_.push_back(Range{begin, end});
}

void HeapWalker::Root(const allocator::vector<uintptr_t>& vals) {
    root_vals_.insert(root_vals_.end(), vals.begin(), vals.end());
}

So the next step, CollectAllocations, is where the Root objects get added before detection is triggered.

CollectAllocations

bool MemUnreachable::CollectAllocations(const allocator::vector<ThreadInfo>& threads,
                                        const allocator::vector<Mapping>& mappings,
                                        const allocator::vector<uintptr_t>& refs) {
                                       
    MEM_ALOGI("searching process %d for allocations", pid_);

    for (auto it = mappings.begin(); it != mappings.end(); it++) {
        heap_walker_.Mapping(it->begin, it->end);
    }
    // Classify the mappings; bail out if classification fails
    allocator::vector<Mapping> heap_mappings{mappings};
    allocator::vector<Mapping> anon_mappings{mappings};
    allocator::vector<Mapping> globals_mappings{mappings};
    allocator::vector<Mapping> stack_mappings{mappings};
    if (!ClassifyMappings(mappings, heap_mappings, anon_mappings, globals_mappings, stack_mappings)) {
        return false;
    }
 
    for (auto it = heap_mappings.begin(); it != heap_mappings.end(); it++) {
        MEM_ALOGV("Heap mapping %" PRIxPTR "-%" PRIxPTR " %s", it->begin, it->end, it->name);
       
        HeapIterate(*it,
                    [&](uintptr_t base, size_t size) { heap_walker_.Allocation(base, base + size); });
    }

    for (auto it = anon_mappings.begin(); it != anon_mappings.end(); it++) {
        MEM_ALOGV("Anon mapping %" PRIxPTR "-%" PRIxPTR " %s", it->begin, it->end, it->name);
        // Record the whole anonymous mapping as one allocation
        heap_walker_.Allocation(it->begin, it->end);
    }

    for (auto it = globals_mappings.begin(); it != globals_mappings.end(); it++) {
        MEM_ALOGV("Globals mapping %" PRIxPTR "-%" PRIxPTR " %s", it->begin, it->end, it->name);
        // The address range of each globals mapping is a Root
        heap_walker_.Root(it->begin, it->end);
    }

    for (auto thread_it = threads.begin(); thread_it != threads.end(); thread_it++) {
        for (auto it = stack_mappings.begin(); it != stack_mappings.end(); it++) {
            if (thread_it->stack.first >= it->begin && thread_it->stack.first <= it->end) {
                MEM_ALOGV("Stack %" PRIxPTR "-%" PRIxPTR " %s", thread_it->stack.first, it->end, it->name);
                // The live part of each thread's stack is a Root
                heap_walker_.Root(thread_it->stack.first, it->end);
            }
        }
        
        heap_walker_.Root(thread_it->regs);
    }
    // The Binder references collected earlier (refs) are added as root values
    heap_walker_.Root(refs);

    MEM_ALOGI("searching done");

    return true;
}
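
The mappings that CollectAllocations iterates over come from the snapshot of /proc/pid/maps taken earlier. As a rough sketch of what such a snapshot involves, the code below parses a single maps line into begin, end, permissions and name. MapEntry and ParseMapsLine are names made up for this example; the library's own Mapping type and parser may differ.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Illustrative record for one line of /proc/<pid>/maps.
struct MapEntry {
    uint64_t begin;
    uint64_t end;
    bool read;
    bool write;
    bool execute;
    std::string name;
};

// Parses a line such as
//   "7e879d0000-7e879d1000 rw-p 00000000 00:00 0    [anon:libc_malloc]"
// Returns false if the line does not have the expected layout.
bool ParseMapsLine(const char* line, MapEntry* out) {
    uint64_t begin = 0;
    uint64_t end = 0;
    char perms[5] = {0};
    int name_pos = 0;
    // Offset, device and inode are skipped; %n records where the name starts.
    int matched = sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %*x %*x:%*x %*d %n",
                         &begin, &end, perms, &name_pos);
    if (matched != 3 || name_pos <= 0) {
        return false;
    }

    std::string name(line + name_pos);
    while (!name.empty() && (name.back() == '\n' || name.back() == ' ')) {
        name.pop_back();  // drop trailing newline or padding
    }

    out->begin = begin;
    out->end = end;
    out->read = perms[0] == 'r';
    out->write = perms[1] == 'w';
    out->execute = perms[2] == 'x';
    out->name = name;
    return true;
}
```

With entries like these, deciding which mappings are heap, stack, globals or plain anonymous memory is a matter of looking at the name and permission fields, which is the job ClassifyMappings does in the listing above.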

DetectLeaks

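With the Roots configured, we go back to the child-process logic in GetUnreachableMemory, which contains this line: bool ok = unreachable.GetUnreachableMemory(leaks, limit, &num_leaks, &leak_bytes);

This is the point where, with the Roots in place, leak detection is triggered. MemUnreachable::GetUnreachableMemory in turn calls HeapWalker::DetectLeaks: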

bool HeapWalker::DetectLeaks() {
    // Recursively walk pointers from roots to mark referenced allocations
    for (auto it = roots_.begin(); it != roots_.end(); it++) {
        // Mark every allocation that has a reference chain from this Root
        RecurseRoot(*it);
    }

    Range vals;
    vals.begin = reinterpret_cast<uintptr_t>(root_vals_.data());
    vals.end = vals.begin + root_vals_.size() * sizeof(uintptr_t);

    RecurseRoot(vals);

    if (segv_page_count_ > 0) {
        MEM_ALOGE("%zu pages skipped due to segfaults", segv_page_count_);
    }

    return true;
}

Finding the addresses linked to a Root

void HeapWalker::RecurseRoot(const Range& root) {
  allocator::vector<Range> to_do(1, root, allocator_);
  while (!to_do.empty()) {
    Range range = to_do.back();
    to_do.pop_back();

    walking_range_ = range;
    ForEachPtrInRange(range, [&](Range& ref_range, AllocationInfo* ref_info) {
      if (!ref_info->referenced_from_root) {
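        // A pointer found in a root-reachable range lands in this allocation, so
        // it is validly referenced: mark it and queue its contents for scanning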
        ref_info->referenced_from_root = true;
        to_do.push_back(ref_range);
      }
    });
    walking_range_ = Range{0, 0};
  }
}
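
What ForEachPtrInRange has to do for each range can be approximated as follows: read the range one word at a time, treat every word as a possible pointer, and look up which known allocation (if any) contains that address. The sketch below is a simplified model with invented names (Block, BlockMap, FindBlock, MarkFrom). It scans the live process directly, whereas the real walker works in the forked child and also tolerates unreadable pages, which is what the segv_page_count_ counter in DetectLeaks tracks.

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

struct Block {
    uintptr_t end;              // one past the last byte of the allocation
    bool referenced_from_root;  // set once a root-reachable pointer hits it
};

// Known allocations, keyed by their begin address.
typedef std::map<uintptr_t, Block> BlockMap;

// Returns the block containing `addr` (interior pointers count), or nullptr.
Block* FindBlock(BlockMap& blocks, uintptr_t addr, uintptr_t* begin) {
    BlockMap::iterator it = blocks.upper_bound(addr);
    if (it == blocks.begin()) return nullptr;
    --it;
    if (addr >= it->second.end) return nullptr;
    *begin = it->first;
    return &it->second;
}

// Conservative scan: every word in [begin, end) is treated as a potential
// pointer, and each newly reached block is scanned in turn.
void MarkFrom(BlockMap& blocks, uintptr_t begin, uintptr_t end) {
    std::vector<std::pair<uintptr_t, uintptr_t> > to_do;
    to_do.push_back(std::make_pair(begin, end));
    while (!to_do.empty()) {
        std::pair<uintptr_t, uintptr_t> range = to_do.back();
        to_do.pop_back();
        for (uintptr_t p = range.first; p + sizeof(uintptr_t) <= range.second;
             p += sizeof(uintptr_t)) {
            uintptr_t word = 0;
            std::memcpy(&word, reinterpret_cast<const void*>(p), sizeof(word));
            uintptr_t target_begin = 0;
            Block* block = FindBlock(blocks, word, &target_begin);
            if (block != nullptr && !block->referenced_from_root) {
                block->referenced_from_root = true;
                to_do.push_back(std::make_pair(target_begin, block->end));
            }
        }
    }
}
```

Because any word that happens to look like an address keeps a block alive, this kind of scan can miss real leaks, but it will not report memory that is still pointed to from a Root.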

What follows is writing out the leaked addresses, which the source above already covers, so we won't repeat it.

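Using libmemunreachable

It is a system library, but that doesn't stop us from using it to get the leaked memory. With dlsym and the right symbol we can call GetUnreachableMemoryString, which wraps GetUnreachableMemory and returns the report as a string.

The symbol differs slightly between Android versions.

Above API 26 the symbol is

_ZN7android26GetUnreachableMemoryStringEbm

From API 24 up to and including API 26 the symbol is

_Z26GetUnreachableMemoryStringbm

So we can call the function directly through its symbol. Because dlopen has been restricted since Android 7, we open the library with shadowhook_dlopen here (other techniques exist as well; we covered them in an earlier article and won't go into them here).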

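// Signature of GetUnreachableMemoryString(bool log_contents, size_t limit)
using GetUnreachableMemoryStringFn = std::string (*)(bool, size_t);

std::string result;
void *handle = shadowhook_dlopen("libmemunreachable.so");
if (handle == nullptr) {
    return result;  // library could not be opened
}
// The trailing 'm' in both names is size_t mangled as unsigned long,
// so these are the symbols of the 64-bit library.
const char *symbol = android_get_device_api_level() > __ANDROID_API_O__
                         ? "_ZN7android26GetUnreachableMemoryStringEbm"
                         : "_Z26GetUnreachableMemoryStringbm";
void *func = shadowhook_dlsym(handle, symbol);
if (func != nullptr) {
    result = reinterpret_cast<GetUnreachableMemoryStringFn>(func)(false, 1024);
    __android_log_print(ANDROID_LOG_ERROR, "hello", "%s", result.c_str());
}
shadowhook_dlclose(handle);
return result;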

Before calling this function we also need to set DUMPABLE to 1 through prctl. The analysis relies on ptrace, so this flag is required:

if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) == -1) {
    return unreachable_mem;
}
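
Since the flag is only needed while the detection runs, one option is to set it for the duration of the call and put the old value back afterwards. The RAII helper below is a sketch of that idea; ScopedDumpable is a name chosen for this example, and whether restoring the flag is desirable depends on the app.

```cpp
#include <sys/prctl.h>

// Sets the dumpable flag to 1 for the lifetime of the object and restores a
// previous value of 0 on destruction. (Illustrative helper, not a system API.)
class ScopedDumpable {
public:
    ScopedDumpable()
        : previous_(prctl(PR_GET_DUMPABLE, 0UL, 0UL, 0UL, 0UL)), ok_(false) {
        if (previous_ >= 0) {
            ok_ = prctl(PR_SET_DUMPABLE, 1UL, 0UL, 0UL, 0UL) == 0;
        }
    }

    ~ScopedDumpable() {
        // Only 0 and 1 can be set through PR_SET_DUMPABLE, so only a previous
        // value of 0 needs to be put back.
        if (ok_ && previous_ == 0) {
            prctl(PR_SET_DUMPABLE, 0UL, 0UL, 0UL, 0UL);
        }
    }

    // True if the flag is now 1 and ptrace-based analysis can proceed.
    bool ok() const { return ok_; }

    ScopedDumpable(const ScopedDumpable&) = delete;
    ScopedDumpable& operator=(const ScopedDumpable&) = delete;

private:
    int previous_;
    bool ok_;
};
```

A caller would create a ScopedDumpable on the stack, check ok(), and invoke the detection function inside the same scope.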

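What we get back is a single string, so if we only want the sizes and addresses in it, we still need a regular expression to pull out the useful parts. The content looks like this: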

384 bytes in 9 allocations unreachable out of 20003960 bytes in 40784 allocations
  384 bytes in 9 unreachable allocations
  ABI: 'arm64'
  320 bytes unreachable at 7e879d09c0
  8 bytes unreachable at 7e8d891160
  8 bytes unreachable at 7e8d9fec78
  ....

Say the data we want is the 320 and the 7e879d09c0 in the line "320 bytes unreachable at 7e879d09c0". The following code matches it:

regex_t reg;
regmatch_t match[1];
// Matches the useful lines, e.g. "320 bytes unreachable at 7e879d09c0"
const char *pattern = "[0-9]+ bytes unreachable at [A-Za-z0-9]+";

if (regcomp(&reg, pattern, REG_EXTENDED) != 0) {
    printf("regcomp error\n");
    return 1;
}

// unreachable_memory is a const char * that is advanced past each match
while (regexec(&reg, unreachable_memory, 1, match, 0) == 0) {
    int length = (int) (match[0].rm_eo - match[0].rm_so);
    __android_log_print(ANDROID_LOG_ERROR, "hello",
                        "Match found at position %d, length %d: %.*s\n", (int) match[0].rm_so,
                        length, length, unreachable_memory + match[0].rm_so);
    char result[100] = {""};
    // Copy at most sizeof(result) - 1 bytes so the buffer stays NUL-terminated
    size_t copy_len = (size_t) length < sizeof(result) - 1 ? (size_t) length : sizeof(result) - 1;
    memcpy(result, unreachable_memory + match[0].rm_so, copy_len);
    __android_log_print(ANDROID_LOG_ERROR, "hello", "matched substring: %s", result);
    // Only the numbers matter: the size leads the line, the hex address follows the last space
    unsigned long addr = strtoul(strrchr(result, ' ') + 1, NULL, 16);
    unsigned long size = strtoul(result, NULL, 10);
    __android_log_print(ANDROID_LOG_ERROR, "hello", "size %lu, addr 0x%lx", size, addr);
    unreachable_memory += match[0].rm_eo;

    // addr + size is the end address (exclusive) of the leaked block
    uint64_t leak = (uint64_t) addr + size;
    __android_log_print(ANDROID_LOG_ERROR, "hello", "leak is %llu", (unsigned long long) leak);
}
regfree(&reg);
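
If the surrounding code is already C++, the same extraction can be written with std::regex and returned as structured records instead of log lines. This is an alternative sketch rather than the approach used above; LeakRecord and ParseLeaks are names invented for the example, and the pattern assumes the report format shown earlier.

```cpp
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// One "N bytes unreachable at ADDR" line of the report.
struct LeakRecord {
    uint64_t address;
    uint64_t size;
};

// Collects every leak line from the report string. Summary lines such as
// "384 bytes in 9 unreachable allocations" do not match the pattern.
std::vector<LeakRecord> ParseLeaks(const std::string& report) {
    static const std::regex kLeakLine("([0-9]+) bytes unreachable at ([0-9A-Fa-f]+)");
    std::vector<LeakRecord> leaks;
    std::sregex_iterator end;
    for (std::sregex_iterator it(report.begin(), report.end(), kLeakLine); it != end; ++it) {
        LeakRecord record;
        record.size = std::stoull((*it)[1].str(), nullptr, 10);     // decimal size
        record.address = std::stoull((*it)[2].str(), nullptr, 16);  // hex address
        leaks.push_back(record);
    }
    return leaks;
}
```

Capture groups remove the need for the manual strrchr and strtoul slicing, and the fixed-size result buffer disappears along with it.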

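Summary

At this point we can use libmemunreachable to find the address and size of leaked memory. That may still not be enough information. If we also want, say, the stack trace of the allocation that leaked, we need to hook allocation functions such as malloc and mmap. I won't go into that here; I'll fill in that gap when I get the chance.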
