我们先看一下这个ANR的主线程状态:
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x7569ab08 self=0x77afe14c00
| sysTid=3188 nice=0 cgrp=default sched=0/0 handle=0x78359a6550
| state=S schedstat=( 4907584566 6966621388 17409 ) utm=341 stm=149 core=0 HZ=100
| stack=0x7fd97db000-0x7fd97dd000 stackSize=8MB
| held mutexes=
kernel: (couldn't read /proc/self/task/3188/stack)
native: #00 pc 000000000001f06c /system/lib64/libc.so (syscall+28)
native: #01 pc 00000000000221d0 /system/lib64/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+140)
native: #02 pc 0000000000080f4c /system/lib64/libc.so (pthread_cond_wait+60)
native: #03 pc 000000000047f1fc /system/lib64/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::postAndWait()+260)
native: #04 pc 000000000047f0c8 /system/lib64/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::drawFrame()+44)
at android.view.ThreadedRenderer.nSyncAndDrawFrame(Native method)
at android.view.ThreadedRenderer.draw(ThreadedRenderer.java:823)
at android.view.ViewRootImpl.draw(ViewRootImpl.java:3312)
at android.view.ViewRootImpl.performDraw(ViewRootImpl.java:3116)
at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:2481)
at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1457)
at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:7202)
at android.view.Choreographer$CallbackRecord.run(Choreographer.java:949)
at android.view.Choreographer.doCallbacks(Choreographer.java:761)
at android.view.Choreographer.doFrame(Choreographer.java:696)
at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:935)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:193)
at android.app.ActivityThread.main(ActivityThread.java:6734)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:506)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
从主线程的堆栈我们可以看出,应用block在了renderthread的postAndWait方法,我们接着看一下这个方法的实现:
void DrawFrameTask::postAndWait() {
AutoMutex _lock(mLock);
mRenderThread->queue().post([this]() { run(); });
mSignal.wait(mLock);
}
从这里我们可以看出是mSignal这个在等待锁释放,进一步搜索发现mSignal是在unblockUiThread方法中释放的
void DrawFrameTask::unblockUiThread() {
AutoMutex _lock(mLock);
mSignal.signal();
}
那么unblockUiThread方法是什么时候调用的呢?搜索代码发现该方法在DrawFrameTask的run方法中调用
void DrawFrameTask::run() {
ATRACE_NAME("DrawFrame");
bool canUnblockUiThread;
bool canDrawThisFrame;
{
TreeInfo info(TreeInfo::MODE_FULL, *mContext);
canUnblockUiThread = syncFrameState(info);
canDrawThisFrame = info.out.canDrawThisFrame;
if (mFrameCompleteCallback) {
mContext->addFrameCompleteListener(std::move(mFrameCompleteCallback));
mFrameCompleteCallback = nullptr;
}
}
// Grab a copy of everything we need
CanvasContext* context = mContext;
std::function<void(int64_t)> callback = std::move(mFrameCallback);
mFrameCallback = nullptr;
// From this point on anything in "this" is *UNSAFE TO ACCESS*
if (canUnblockUiThread) {
unblockUiThread();
}
// Even if we aren't drawing this vsync pulse the next frame number will still be accurate
if (CC_UNLIKELY(callback)) {
context->enqueueFrameWork([callback, frameNr = context->getFrameNumber()]() {
callback(frameNr);
});
}
if (CC_LIKELY(canDrawThisFrame)) {
context->draw();
} else {
// wait on fences so tasks don't overlap next frame
context->waitOnFences();
}
if (!canUnblockUiThread) {
unblockUiThread();
}
}
根据postAndWait方法我们可以看到run方法实际是在renderthread中的消息队列执行的,这里的消息队列可以理解为和Android Java层中的Handler那个消息队列差不多。因此我们可以得出结论主线程被DrawFrameTask::postAndWait方法block,是因为上一次postAndWait放入到renderthread消息队列的run方法并未执行,所以下一步我们需要找出renderthread这个线程的堆栈信息,看看这个线程在干什么
"RenderThread" daemon prio=7 tid=19 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x13280ed8 self=0x778f662c00
| sysTid=3558 nice=0 cgrp=default sched=0/0 handle=0x778e87f4f0
| state=R schedstat=( 36485428771 60660326144 33388 ) utm=1153 stm=2495 core=2 HZ=100
| stack=0x778e784000-0x778e786000 stackSize=1009KB
| held mutexes=
kernel: (couldn't read /proc/self/task/3558/stack)
native: #00 pc 000000000006f29c /system/lib64/libc.so (write+8)
native: #01 pc 000000000006c80c /system/lib64/libc.so (__dwrite+16)
native: #02 pc 000000000007d94c /system/lib64/libc.so (fflush+180)
native: #03 pc 000000000006c7c4 /system/lib64/libc.so (vdprintf+176)
native: #04 pc 000000000007b7bc /system/lib64/libc.so (dprintf+112)
native: #05 pc 000000000012fc64 /system/lib64/libhwui.so (android::uirenderer::JankTracker::dumpFrames(int)+224)
native: #06 pc 0000000000112f90 /system/lib64/libhwui.so (_ZNSt3__120__packaged_task_funcIZN7android10uirenderer12renderthread11RenderProxy15dumpProfileInfoEiiE4$_27NS_9allocatorIS5_EEFvvEEclEv$d1b19a60f4b1bb5f6a5927e3e6deb93d+84)
native: #07 pc 000000000048290c /system/lib64/libhwui.so (std::__1::packaged_task<void ()>::operator()()+88)
native: #08 pc 00000000004366c4 /system/lib64/libhwui.so (android::uirenderer::WorkQueue::process()+168)
native: #09 pc 0000000000114ee8 /system/lib64/libhwui.so (android::uirenderer::renderthread::RenderThread::threadLoop()+240)
native: #10 pc 000000000000fa68 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+280)
native: #11 pc 0000000000081a08 /system/lib64/libc.so (__pthread_start(void*)+36)
native: #12 pc 00000000000234cc /system/lib64/libc.so (__start_thread+68)
(no managed stack frames)
从堆栈里面,我们可以很容易看到renderthread线程正在执行一个dumpFrames方法,而且block在了dprintf这个函数上。
void JankTracker::dumpFrames(int fd) {
dprintf(fd, "\n\n---PROFILEDATA---\n");
for (size_t i = 0; i < static_cast<size_t>(FrameInfoIndex::NumIndexes); i++) {
dprintf(fd, "%s", FrameInfoNames[i].c_str());
dprintf(fd, ",");
}
for (size_t i = 0; i < mFrames.size(); i++) {
FrameInfo& frame = mFrames[i];
if (frame[FrameInfoIndex::SyncStart] == 0) {
continue;
}
dprintf(fd, "\n");
for (int i = 0; i < static_cast<int>(FrameInfoIndex::NumIndexes); i++) {
dprintf(fd, "%" PRId64 ",", frame[i]);
}
}
dprintf(fd, "\n---PROFILEDATA---\n\n");
}
查阅资料得知,dprintf是一个往管道写入数据的方法。而写管道卡住可能有如下原因:
1、管道的read被关闭
2、管道的write写入太多数据,read太慢
3、整个系统的I/O出现了重大问题
因此我们需要看一下这个dumpFrames方法的管道是谁创建的,最终我们查阅源码的调用栈发现,这个管道是在AMS里面创建的
final void dumpGraphicsHardwareUsage(FileDescriptor fd,
PrintWriter pw, String[] args) {
...
for (int i = procs.size() - 1 ; i >= 0 ; i--) {
ProcessRecord r = procs.get(i);
if (r.thread != null) {
pw.println("\n** Graphics info for pid " + r.pid + " [" + r.processName + "] **");
pw.flush();
try {
TransferPipe tp = new TransferPipe();
try {
r.thread.dumpGfxInfo(tp.getWriteFd(), args);
tp.go(fd);
} finally {
tp.kill();
}
} catch (IOException e) {
pw.println("Failure while dumping the app: " + r);
pw.flush();
} catch (RemoteException e) {
pw.println("Got a RemoteException while dumping the app " + r);
pw.flush();
}
}
}
}
可以看到这里实际是用户执行了dumpsys命令以后,AMS为每个Android进程创建了一个TransferPipe线程以及对应的管道,然后去获取对应的显示信息。我们再来看看管道的创建:
protected TransferPipe(String bufferPrefix, String threadName) throws IOException {
mThread = new Thread(this, threadName);
mFds = ParcelFileDescriptor.createPipe();
mBufferPrefix = bufferPrefix;
}
public static ParcelFileDescriptor[] createPipe() throws IOException {
try {
final FileDescriptor[] fds = Os.pipe();
return new ParcelFileDescriptor[] {
new ParcelFileDescriptor(fds[0]),
new ParcelFileDescriptor(fds[1]) };
} catch (ErrnoException e) {
throw e.rethrowAsIOException();
}
}
从上面创建管道的代码中,我们可以看到这是一个阻塞的管道,也就是说如果write写满了,会直接阻塞write。然后我们再结合AMS的代码以及测试提供的dumpsys文件,我们可以看出其他进程的显示信息均正常打印,而我们这个ANR 应用的显示信息却没有,说明是ANR应用的显示信息太多了,导致占满了管道,进而阻塞了renderthread,进而阻塞了主线程,最终导致了ANR的发生。