前言
这个问题在来小米之前就遇到并解决过,当时的解决方案与朴老师的初步解决方案一样,本文在之前的初步分析结果之上进一步进行了深入分析,最终得出了当前看起来相对合理并符合原来架构设计的最终方案。
文中引用了朴老师抓的backtrace,同时在进一步分析的过程中朴老师也提出的大量有建设性的问题,感谢朴老师!
一、问题现象
1、systemui高频崩溃
2、system server崩溃导致重启
二、解决方案
通过初步分析、深入分析(具体分析过程和关键代码及log在下面)我们知道了问题的原因:
1、_CompressedAsset::getBuffer不是线程安全的
2、多个AssetManager一起share同一个_CompressedAsset对象,而不同的AssetManager在对_CompressedAsset进行get和set时存在多线程并发竞争同一资源的场景
3、多线程对_CompressedAsset进行get和set时没有用同一个lock同步
4、两个线程同时走进_CompressedAsset::getBuffer并发生double free
针对问题的根本原因,给出以下解决方案:
对临界资源的get和set用同一个lock同步,有序访问。
patch已同步提交给AOSP review并merged,patch详情请猛击:https://android-review.googlesource.com/#/c/271434/
三、初步分析
1、查看崩溃时的backtrace,发现是对象析构时free memory时出问题了,为什么?
backtrace:
#00 pc 0003d546 /system/lib/libc.so (arena_run_dalloc+97)
#01 pc 0003e5e9 /system/lib/libc.so (je_arena_dalloc_large+24)
#02 pc 000463d9 /system/lib/libc.so (ifree+700)
#03 pc 0000fa13 /system/lib/libc.so (free+10)
#04 pc 00018f15 /system/lib/libandroidfw.so (android::StreamingZipInflater::~StreamingZipInflater()+44)
#05 pc 0000dc3f /system/lib/libandroidfw.so (android::_CompressedAsset::getBuffer(bool)+62)
#06 pc 000177c3 /system/lib/libandroidfw.so (android::ResTable::add(android::Asset*, android::Asset*, int, bool)+22)
#07 pc 00010a33 /system/lib/libandroidfw.so (android::AssetManager::appendPathToResTable(android::AssetManager::asset_path const&, unsigned int*) const+270)
#08 pc 00010e5f /system/lib/libandroidfw.so (android::AssetManager::getResTable(bool) const+102)
#09 pc 00080935 /system/lib/libandroid_runtime.so
#10 pc 0001a963 /data/dalvik-cache/arm/system@framework@boot.oat
之前抓的system server double free crash时的coredump中的thread trace:
2、查看backtrace对应的源码发现是在_CompressedAsset::getBuffer函数中的delete mZipInflater;时挂掉的,为什么?
/*
* Get a pointer to a read-only buffer of data.
*
* The first time this is called, we expand the compressed data into a
* buffer.
*/
const void* _CompressedAsset::getBuffer(bool)
{
unsigned char* buf = NULL;
if (mBuf != NULL)
return mBuf;
/*
* Allocate a buffer and read the file into it.
*/
buf = new unsigned char[mUncompressedLen];
if (buf == NULL) {
ALOGW("alloc %ld bytes failed\n", (long) mUncompressedLen);
goto bail;
}
if (mMap != NULL) {
if (!ZipUtils::inflateToBuffer(mMap->getDataPtr(), buf,
mUncompressedLen, mCompressedLen))
goto bail;
} else {
assert(mFd >= 0);
/*
* Seek to the start of the compressed data.
*/
if (lseek(mFd, mStart, SEEK_SET) != mStart)
goto bail;
/*
* Expand the data into it.
*/
if (!ZipUtils::inflateToBuffer(mFd, buf, mUncompressedLen,
mCompressedLen))
goto bail;
}
/*
* Success - now that we have the full asset in RAM we
* no longer need the streaming inflater
*/
delete mZipInflater;
mZipInflater = NULL;
mBuf = buf;
buf = NULL;
bail:
delete[] buf;
return mBuf;
}
3、继续查看源码,mZipInflater在_CompressedAsset构造时会被赋值为NULL,delete NULL不会有问题,也就是说mZipInflater一定被创建了,继续找代码发现是在openChunk函数中被创建的
/*
* Constructor.
*/
_CompressedAsset::_CompressedAsset(void)
: mStart(0), mCompressedLen(0), mUncompressedLen(0), mOffset(0),
mMap(NULL), mFd(-1), mZipInflater(NULL), mBuf(NULL)
{
}
/*
* Open a chunk of compressed data in a mapped region.
*
* Nothing is expanded until the first read call.
*/
status_t _CompressedAsset::openChunk(FileMap* dataMap, size_t uncompressedLen)
{
assert(mFd < 0); // no re-open
assert(mMap == NULL);
assert(dataMap != NULL);
mMap = dataMap;
mStart = -1; // not used
mCompressedLen = dataMap->getDataLength();
mUncompressedLen = uncompressedLen;
a