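The Watchdog source code is simple and serves two main purposes:
1 Monitoring several key locks in system_server. It does this by trying to acquire each lock on the android.fg thread.
2 Monitoring the execution time of several commonly used threads. It does this by posting a task to each of those threads.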
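The main monitored threads are:
1 FgThread: foreground operations. It has a fairly high priority, but should be given as few tasks as possible, because each task blocks the ones queued behind it.
2 SystemServer main thread: the thread for UI operations
3 UiThread: used by AMS to create UI
4 IoThread: I/O operations
5 DisplayThread: display-related operations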
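The monitors (used mainly to acquire locks) are implemented as registered monitor callbacks: within each check interval, every monitor callback is run once on the fg thread. The main monitors are:
1 BinderThreadMonitor: checks whether a binder thread is available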
public void monitor() {
    Binder.blockUntilThreadAvailable();
}
2 MountService: two daemon processes
private final Object mDaemonLock = new Object();
public void monitor() {
    if (mConnector != null) {
        mConnector.monitor();
    }
    if (mCryptConnector != null) {
        mCryptConnector.monitor();
    }
}
3 NetworkManagementService: one daemon process
public void monitor() {
    if (mConnector != null) {
        mConnector.monitor();
    }
}
4 ActivityManagerService: the `this` lock
public void monitor() {
    synchronized (this) { }
}
5 PowerManagerService
public void monitor() {
    // Grab and release lock for watchdog monitor to detect deadlocks.
    synchronized (mLock) {
    }
}
6 TvRemoteService
public void monitor() {
    synchronized (mLock) { /* check for deadlock */ }
}
7 WindowManagerService
public void monitor() {
    synchronized (mWindowMap) { }
}
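All of the lock monitors above share one idea: take the lock and release it at once, so the call only blocks while some other thread is holding the lock. The following is a minimal plain-Java sketch of that idea, not the Android implementation. The names Monitor, LockedService and LockProbe are hypothetical, and a helper thread with a timeout stands in for the fg thread and the watchdog timer.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a watchdog monitor: one callback that should return quickly.
interface Monitor {
    void monitor();
}

// Hypothetical service guarded by a single lock, in the style of the monitors above.
class LockedService implements Monitor {
    final Object mLock = new Object();

    @Override
    public void monitor() {
        // Acquire and immediately release the lock; this blocks only while
        // another thread is holding mLock.
        synchronized (mLock) { }
    }
}

class LockProbe {
    // Runs every monitor on a helper thread and reports whether all of them
    // returned within timeoutMs.
    static boolean probe(List<Monitor> monitors, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            for (Monitor m : monitors) {
                m.monitor();
            }
            done.countDown();
        }, "probe");
        t.setDaemon(true); // do not keep the JVM alive if a monitor never returns
        t.start();
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

If probe returns false, at least one monitored lock was still held when the timeout expired, which is the condition the watchdog treats as a possible deadlock.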
Now that we know what is monitored, let's look at how the monitoring works.
public class Watchdog extends Thread
Watchdog extends Thread, so it runs on its own thread. Let's look at its run method:
 1  synchronized (this) {
 2      long timeout = CHECK_INTERVAL;
 3      // Make sure we (re)spin the checkers that have become idle within
 4      // this wait-and-check interval
 5      for (int i=0; i<mHandlerCheckers.size(); i++) {
 6          HandlerChecker hc = mHandlerCheckers.get(i);
 7          hc.scheduleCheckLocked();
 8      }
 9
10      if (debuggerWasConnected > 0) {
11          debuggerWasConnected--;
12      }
13
14      // NOTE: We use uptimeMillis() here because we do not want to increment the time we
15      // wait while asleep. If the device is asleep then the thing that we are waiting
16      // to timeout on is asleep as well and won't have a chance to run, causing a false
17      // positive on when to kill things.
18      long start = SystemClock.uptimeMillis();
19      while (timeout > 0) {
20          if (Debug.isDebuggerConnected()) {
21              debuggerWasConnected = 2;
22          }
23          try {
24              wait(timeout);
25          } catch (InterruptedException e) {
26              Log.wtf(TAG, e);
27          }
28          if (Debug.isDebuggerConnected()) {
29              debuggerWasConnected = 2;
30          }
31          timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
32      }
33
34      final int waitState = evaluateCheckerCompletionLocked();
35      if (waitState == COMPLETED) {
36          // The monitors have returned; reset
37          waitedHalf = false;
38          continue;
39      } else if (waitState == WAITING) {
40          // still waiting but within their configured intervals; back off and recheck
41          continue;
42      } else if (waitState == WAITED_HALF) {
43          if (!waitedHalf) {
44              // We've waited half the deadlock-detection interval. Pull a stack
45              // trace and wait another half.
46              ArrayList<Integer> pids = new ArrayList<Integer>();
47              pids.add(Process.myPid());
48              ActivityManagerService.dumpStackTraces(true, pids, null, null,
                        NATIVE_STACKS_OF_INTEREST);
49              waitedHalf = true;
50          }
51          continue;
52      }
        // something is overdue!
53      blockedCheckers = getBlockedCheckersLocked();
54      subject = describeCheckersLocked(blockedCheckers);
55      allowRestart = mAllowRestart;
56  }
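The main logic is all here. Each monitored thread mentioned earlier is wrapped in a HandlerChecker, and the checkers are kept in the collection held by the mHandlerCheckers member. Lines 5 to 7 of the listing take each checker out and call its scheduleCheckLocked method: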
public void scheduleCheckLocked() {
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        // If the target looper has recently been polling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked. This avoid having
        // to do a context switch to check the thread. Note that we
        // only do this if mCheckReboot is false and we have no
        // monitors, since those would need to be executed at this point.
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
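This method is simple: it posts a task (HandlerChecker implements Runnable) to the front of the monitored thread's message queue. Note the isPolling() check: when the checker has no monitors and the target looper is idle (polling), no task is posted and the check is marked as completed right away. The posted task's run method is: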
 1  public void run() {
 2      final int size = mMonitors.size();
 3      for (int i = 0 ; i < size ; i++) {
 4          synchronized (Watchdog.this) {
 5              mCurrentMonitor = mMonitors.get(i);
 6          }
 7          mCurrentMonitor.monitor();
 8      }
 9
10      synchronized (Watchdog.this) {
11          mCompleted = true;
12          mCurrentMonitor = null;
13      }
14  }
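If the monitored thread has no long-running task ahead of it, run is reached almost immediately. Once reached, it walks through the monitors and invokes each one (only the FgThread checker has monitors, namely the ones listed at the beginning):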
public void addMonitor(Monitor monitor) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Monitors can't be added once the Watchdog is running");
        }
        mMonitorChecker.addMonitor(monitor);
    }
}
As shown above, mMonitorChecker is the HandlerChecker that wraps FgThread.
After all monitor methods have returned, line 11 sets mCompleted = true.
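The heartbeat mechanism just described can be tried outside Android with a rough analogue. This is only a sketch under stated assumptions: HeartbeatChecker is a hypothetical name, and a single-thread ExecutorService stands in for the monitored Handler thread. Unlike postAtFrontOfQueue, the executor appends the heartbeat to the back of its queue, so here the latency also includes all the work that was already queued.

```java
import java.util.concurrent.ExecutorService;

// Hypothetical, simplified analogue of HandlerChecker for a plain Java worker thread.
class HeartbeatChecker implements Runnable {
    private final ExecutorService worker; // stands in for the monitored Handler thread
    private boolean completed = true;     // guarded by "this"
    private long startNanos;              // guarded by "this"

    HeartbeatChecker(ExecutorService worker) {
        this.worker = worker;
    }

    // Enqueue one heartbeat unless a previous one is still in flight.
    synchronized void scheduleCheck() {
        if (!completed) {
            return; // keep the original start time so the latency keeps growing
        }
        completed = false;
        startNanos = System.nanoTime();
        worker.execute(this);
    }

    @Override
    public void run() {
        // Reached only once the worker has finished everything queued before us.
        synchronized (this) {
            completed = true;
        }
    }

    synchronized boolean isCompleted() {
        return completed;
    }

    // Milliseconds the current heartbeat has been pending, or 0 if none is pending.
    synchronized long pendingMillis() {
        return completed ? 0 : (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

A supervising thread would call scheduleCheck() once per round and later compare pendingMillis() against its timeout, which is the role mStartTime and mWaitMax play in the original code.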
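Back in Watchdog's run method: after posting a task to each thread, the watchdog thread goes to sleep and waits for the 30s timeout. After 30s it checks the state of every thread and monitor.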
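The wait loop in run re-enters wait() until the whole interval has really elapsed, because wait() can return early on a notify, an interrupt or a spurious wakeup. Below is a minimal sketch of that pattern in plain Java. IntervalWaiter is a hypothetical name, and System.nanoTime() is used as a monotonic stand-in for SystemClock.uptimeMillis(), which is an assumption of this sketch.

```java
class IntervalWaiter {
    // Blocks the caller for at least intervalMs of monotonic time, re-entering
    // wait() after any early return, and reports the time actually waited.
    static long waitFullInterval(Object lock, long intervalMs) {
        final long start = System.nanoTime();
        long timeout = intervalMs;
        synchronized (lock) {
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    // Like the original loop, note the interrupt and keep waiting.
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                timeout = intervalMs - elapsedMs;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Recomputing the remaining timeout from the start time, instead of subtracting a fixed amount per iteration, keeps the total wait close to the interval no matter how many times the thread is woken early.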
The evaluateCheckerCompletionLocked function returns the combined state of the HandlerCheckers:
private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}
public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}
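The function is simple. To summarize, assuming an mWaitMax of 60s: it returns COMPLETED once the task has been processed; WAITING if the task has not finished but less than 30s have passed (which should be rare); WAITED_HALF if the task has been pending for at least 30s but less than 60s; and OVERDUE if the task has still not finished after 60s.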
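The thresholds are easier to check as a pure function. The sketch below restates the two functions above without any Android dependency. CheckerState is a hypothetical name, and the numeric values of the constants are an assumption: the only property the logic relies on is that a larger value means a worse state, so that Math.max picks the worst one.

```java
class CheckerState {
    // Assumed ordering: a larger value is a worse state.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // State of one checker, given how long its heartbeat has been pending.
    static int stateOf(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMs < waitMaxMs / 2) {
            return WAITING;
        }
        if (latencyMs < waitMaxMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Combined state of all checkers: the worst individual state wins.
    static int worst(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }
}
```

For example, with waitMaxMs = 60000, a heartbeat pending for 10000 ms is WAITING, one pending for 30000 ms is WAITED_HALF, and one pending for 60000 ms is OVERDUE.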
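Back in Watchdog's run function once more: everything from line 35 to the end handles these states. The worst (largest) state across all checkers is taken. If the state is
COMPLETED, no thread or lock is seriously blocked.
WAITING, 30s have not yet passed, so nothing is done and the next check round begins (remember that one round has a 30s timeout).
WAITED_HALF, the stack traces are dumped to traces.txt and the next check round begins.
OVERDUE, there has been no response for more than 60s. At this point the system is effectively hung, so some cleanup is done and system_server is restarted.
One case is special: when running monkey without the --kill-process-after-error option, system_server is not killed.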
Keywords in the log:
Blocked in handler on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on display thread (android.display), Blocked in handler on ActivityManager (ActivityManager)
"Blocked in monitor " + mCurrentMonitor.getClass().getName() + " on " + mName + " (" + getThread().getName() + ")"
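The two message shapes above differ only in whether a monitor was running when the checker got stuck. A small sketch of how such a message could be assembled follows. BlockedMessage and describe are hypothetical names, and passing the monitor class name as a nullable string is a simplification made for this example.

```java
class BlockedMessage {
    // Builds the log line for a blocked checker. monitorClassName is null when
    // the thread was stuck before any monitor callback started running.
    static String describe(String monitorClassName, String name, String threadName) {
        if (monitorClassName == null) {
            return "Blocked in handler on " + name + " (" + threadName + ")";
        }
        return "Blocked in monitor " + monitorClassName
                + " on " + name + " (" + threadName + ")";
    }
}
```

When reading a log, "Blocked in handler" therefore points at a thread whose message queue is stalled, while "Blocked in monitor" names the class whose lock could not be acquired.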
Log file: traces_SystemServer_WDT_${time}.txt
When running Monkey, the monkey program also captures a bugreport at that moment and saves it under anr_watchdog.