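Android System Service Deadlock and ANR Detection Mechanisms
Once Android is up and running, system_server may host hundreds or even thousands of threads. The services call into one another frequently and in complicated ways, so deadlocks and long periods of unresponsiveness are hard to rule out. For the system this is a very serious problem: once it happens it sets off a chain of side effects that ends in a frozen UI, sharply increased power drain, and a hot device. The first line of defense is of course to keep it from happening at all, which takes a great deal of testing and practice, and Android already does quite well here. But the system also has to decide what to do when it happens anyway. This article reviews the framework-layer Watchdog along with ANR detection and handling.
How Watchdog Detection Works
Watchdog checks the important system services and acts when they get stuck. Let's look at how it is implemented, starting from the source. Watchdog is itself a thread: it extends Thread and is started while system_server initializes.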
private Watchdog() {
    super("watchdog");
    // The shared foreground thread is the main checker. It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread. We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));
}
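First, during construction it adds several important threads to mHandlerCheckers. These are all event-driven threads built on a Looper and a Handler (the shared foreground, UI, I/O, and display threads are HandlerThread subclasses; the main thread is reached through Looper.getMainLooper()), and HandlerChecker itself is a Runnable. The foreground thread's checker is also the main one: when other services register a monitor check, it is added to mMonitorChecker.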
public void addMonitor(Monitor monitor) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Monitors can't be added once the Watchdog is running");
        }
        mMonitorChecker.addMonitor(monitor);
    }
}
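To make the monitor-and-checker idea above more concrete, here is a minimal sketch in plain Java. It is not the AOSP implementation: CheckerSketch, Checker, scheduleCheck, completionState, and demo are hypothetical names, a single-thread ExecutorService stands in for the Handler of a watched thread, and the 100 ms timeout is only for illustration. It assumes a monitor is a callback that briefly takes a service lock, so a stuck lock keeps the check from completing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CheckerSketch {
    /** Hypothetical stand-in for a monitor callback: blocks while the service lock is held. */
    interface Monitor {
        void monitor();
    }

    static final int COMPLETED = 0, WAITING = 1, WAITED_HALF = 2, OVERDUE = 3;

    /** Simplified checker: runs every monitor on the watched thread and records completion. */
    static class Checker implements Runnable {
        private final ExecutorService thread; // assumption: stands in for the thread's Handler
        private final long waitMaxMillis;
        private final List<Monitor> monitors = new ArrayList<>();
        private boolean completed = true;
        private long startNanos;

        Checker(ExecutorService thread, long waitMaxMillis) {
            this.thread = thread;
            this.waitMaxMillis = waitMaxMillis;
        }

        synchronized void addMonitor(Monitor m) {
            monitors.add(m);
        }

        /** Posts this checker to the watched thread unless a previous check is still pending. */
        synchronized void scheduleCheck() {
            if (!completed) {
                return;
            }
            completed = false;
            startNanos = System.nanoTime();
            thread.execute(this);
        }

        @Override
        public void run() {
            List<Monitor> snapshot;
            synchronized (this) {
                snapshot = new ArrayList<>(monitors);
            }
            for (Monitor m : snapshot) {
                m.monitor(); // blocks here for as long as the monitored lock is stuck
            }
            synchronized (this) {
                completed = true;
            }
        }

        /** Classifies the pending check by how long it has been running. */
        synchronized int completionState() {
            if (completed) {
                return COMPLETED;
            }
            long latencyMillis = (System.nanoTime() - startNanos) / 1_000_000;
            if (latencyMillis < waitMaxMillis / 2) {
                return WAITING;
            }
            return latencyMillis < waitMaxMillis ? WAITED_HALF : OVERDUE;
        }
    }

    /** Returns the states observed with the lock free, held, and released again. */
    static int[] demo() throws Exception {
        ExecutorService watched = Executors.newSingleThreadExecutor();
        Object serviceLock = new Object();
        Checker checker = new Checker(watched, 100);
        checker.addMonitor(() -> { synchronized (serviceLock) { } });

        checker.scheduleCheck();
        watched.submit(() -> { }).get(); // drain the queue so the check has run
        int free = checker.completionState();

        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (serviceLock) { // simulate a service stuck while holding its lock
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // fall through and release the lock
                }
            }
        });
        holder.start();
        held.await();
        checker.scheduleCheck();
        Thread.sleep(300); // well past the 100 ms timeout
        int blocked = checker.completionState();

        release.countDown();
        holder.join();
        watched.submit(() -> { }).get(); // the blocked check can now finish
        int recovered = checker.completionState();
        watched.shutdown();
        return new int[] {free, blocked, recovered};
    }

    public static void main(String[] args) throws Exception {
        int[] states = demo();
        System.out.println("free=" + states[0] + " blocked=" + states[1] + " recovered=" + states[2]);
    }
}
```

In this sketch the monitor never does real work; it only proves that the lock can still be acquired. Running the monitors on the watched thread itself means one check covers both cases: a stuck lock and a thread whose message queue no longer makes progress.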
Next, let's see what Watchdog does once it is running:
@Override
public void run() {
    boolean waitedHalf = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart; // can be changed at runtime: whether the system should restart on a deadlock
        int debuggerWasConnected = 0;
        synchronized (this) {
            long timeout = CHECK_INTERVAL; // 30s
            // Call scheduleCheckLocked() on the HandlerChecker of every watched thread.
            // Each HandlerChecker holds its thread's Handler, which in turn gives access to the Looper.
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }
            // Record the start time
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout); // wait up to 30s
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }
            // Analyzed later; waitState is the combined result of the HandlerChecker checks
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                // No deadlock occurred; start over
                // The monitors have returned; reset
                waitedHalf = false;
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                continue;
            } else if (waitState == WAITED_HALF) {
                // A HandlerChecker has not finished within 30s: dump the stack traces of
                // this process and of the native processes of interest
                if (!waitedHalf) {
                    // We've waited half the deadlock-detection interval. Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST);
                    waitedHalf = true;
                }
                continue;
            }
            // Still not finished after 1 minute: find out which HandlerCheckers are blocked
            blockedCheckers = getBlockedCheckersLocked();
            // Build a detailed description of the blocked checkers
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }
        // Record it in the EventLog
EventLog.writeEvent(EventLogTags