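Android Watchdog is a mechanism that monitors whether other system services are working properly.
If a critical system service is stuck in an abnormal state such as a deadlock, the system as a whole is no longer working correctly, and restarting it to recover Android becomes necessary.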
1. Starting the Watchdog
The Watchdog is started in SystemServer:
Slog.i(TAG, "Init Watchdog");
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Watchdog uses the singleton pattern:
public static Watchdog getInstance() {
    if (sWatchdog == null) {
        sWatchdog = new Watchdog();
    }
    return sWatchdog;
}
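The getInstance() shown above is not synchronized, which is safe only as long as the first call happens on a single thread. As a minimal sketch, assuming the singleton could instead be first touched from several threads at once, the initialization-on-demand holder idiom gives lazy, thread-safe construction without explicit locking. LazyWatchdog is a hypothetical stand-in, not the framework's Watchdog class.

```java
// Hypothetical stand-in for a singleton watchdog; not the framework's Watchdog class.
final class LazyWatchdog {
    private LazyWatchdog() { }

    // The JVM initializes Holder (and therefore INSTANCE) on first access and
    // guarantees that this happens exactly once, even under concurrent calls.
    private static final class Holder {
        static final LazyWatchdog INSTANCE = new LazyWatchdog();
    }

    static LazyWatchdog getInstance() {
        return Holder.INSTANCE;
    }
}
```

Every call to LazyWatchdog.getInstance() returns the same object, and nothing is constructed until the first call.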
Next, let's look at the init function:
public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;
    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}
It registers a broadcast receiver for ACTION_REBOOT:
final class RebootRequestReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context c, Intent intent) {
        if (intent.getIntExtra("nowait", 0) != 0) {
            rebootSystem("Received ACTION_REBOOT broadcast");
            return;
        }
        Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
    }
}
The rebootSystem function presumably reboots the system:
void rebootSystem(String reason) {
    Slog.i(TAG, "Rebooting system because: " + reason);
    IPowerManager pms = (IPowerManager)ServiceManager.getService(Context.POWER_SERVICE);
    try {
        pms.reboot(false, reason, false);
    } catch (RemoteException ex) {
    }
}
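It does. So one important job of the Watchdog is to receive the ACTION_REBOOT broadcast and, when it carries a nonzero "nowait" extra, reboot the system.
2. How the Watchdog Works
To use the Watchdog, first pass it the handler of the monitored object's thread: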
Watchdog.getInstance().addThread(mHandler);
Then pass the monitored object itself to the Watchdog:
Watchdog.getInstance().addMonitor(this);
addThread receives the thread's handler and wraps it in a new HandlerChecker object:
public void addThread(Handler thread) {
    addThread(thread, DEFAULT_TIMEOUT);
}
public void addThread(Handler thread, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        final String name = thread.getLooper().getThread().getName();
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}
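HandlerChecker is a Runnable:
public final class HandlerChecker implements Runnable
This shows that a HandlerChecker runs on the monitored object's thread, because it is posted through that thread's handler.
The addMonitor function:
public void addMonitor(Monitor monitor) {
    mMonitors.add(monitor);
}
It simply stores the monitor for later use.
Next, let's see how the Watchdog uses these two objects at runtime to do its monitoring.
The Watchdog's run() function: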
@Override
public void run() {
    boolean waitedHalf = false;
    boolean mSFHang = false;
    while (true) { // loop forever
        final ArrayList<HandlerChecker> blockedCheckers;
        String subject;
        mSFHang = false;
        if (exceptionHWT != null && waitedHalf == false ) {
            exceptionHWT.WDTMatterJava(300);
        }
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());
        synchronized (this) {
            Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());
            long timeout = CHECK_INTERVAL;
            long SFHangTime;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked(); // schedule a check on each handler in turn
            }
            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }
            // NOTE: We use uptimeMillis() here because we do not want to increment the time we
            // wait while asleep. If the device is asleep then the thing that we are waiting
            // to timeout on is asleep as well and won't have a chance to run, causing a false
            // positive on when to kill things.
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout); // wait up to CHECK_INTERVAL (30 seconds), or until notified
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                // recompute the remaining wait in case we woke up early
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }
            final int waitState = evaluateCheckerCompletionLocked(); // evaluate the checkers' state
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                //CputimeEnable(new String("0"));
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                // CputimeEnable(new String("0"));
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    ...
                    waitedHalf = true;
                }
                continue;
            }
            // something is overdue!
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }
        ...
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
        waitedHalf = false;
    }
}
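The loop above branches on COMPLETED, WAITING, WAITED_HALF and an overdue state, but the article does not show how evaluateCheckerCompletionLocked() arrives at them. The following is a minimal sketch of one plausible classification, assuming each checker compares its elapsed time against half of its timeout and the full timeout, and that the overall state is the worst state among all checkers. The class name, method names and thresholds are assumptions for illustration, not code quoted from the framework.

```java
// Sketch only: classifies a checker by how long its pending check has been outstanding.
// The half-timeout and full-timeout thresholds are assumptions for illustration.
final class CompletionState {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    static int stateOf(boolean completed, long startMs, long nowMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;                 // the posted check has already run
        }
        long latency = nowMs - startMs;
        if (latency < waitMaxMs / 2) {
            return WAITING;                   // still within the first half of the timeout
        }
        if (latency < waitMaxMs) {
            return WAITED_HALF;               // past half of the timeout, not yet overdue
        }
        return OVERDUE;                       // the full timeout has elapsed
    }

    // The overall state is the worst (largest) state among all checkers.
    static int worstOf(int... states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }
}
```

With a 60-second timeout, a check that has been pending for 10 seconds is WAITING, one pending for 30 seconds is WAITED_HALF, and one pending for 60 seconds or more is OVERDUE.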
Let's look at several important functions used in this code:
public void scheduleCheckLocked() {
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
As noted earlier, HandlerChecker uses the handler of the monitored object's thread, so mHandler.postAtFrontOfQueue here actually sends a message to the monitored object.
Let's look at postAtFrontOfQueue and the related Handler code:
public final boolean postAtFrontOfQueue(Runnable r) {
    return this.sendMessageAtFrontOfQueue(getPostMessage(r));
}
private static Message getPostMessage(Runnable r) {
    Message m = Message.obtain();
    m.callback = r;
    return m;
}
public void dispatchMessage(Message msg) {
    if(msg.callback != null) {
        handleCallback(msg);
    } else {
        if(this.mCallback != null && this.mCallback.handleMessage(msg)) {
            return;
        }
        this.handleMessage(msg);
    }
}
private static void handleCallback(Message message) {
    message.callback.run();
}
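So postAtFrontOfQueue(r) eventually calls r.run().
It follows that if the monitored object's message queue is blocked, the message posted by postAtFrontOfQueue is never processed. The monitored object's monitor() function is then never called, and the check eventually times out. In effect, the Watchdog also detects whether a message queue is blocked.
In short, every 30 seconds the Watchdog checks the state of each monitored object by posting a message to it.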
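The queue-blocking case can be reproduced outside Android. The following is a minimal sketch in plain Java, assuming a hypothetical MiniLooper that drains a deque of Runnables on one thread; it is not Android's Handler or Looper. A message that never returns stands in for a hung handler, and the probe posted at the front of the queue only runs once that message finishes.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical miniature of a looper thread; not Android's Handler/Looper.
final class MiniLooper {
    private final BlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();
    private final Thread thread;

    MiniLooper(String name) {
        thread = new Thread(this::loop, name);
        thread.setDaemon(true);
        thread.start();
    }

    private void loop() {
        try {
            while (true) {
                queue.takeFirst().run(); // handle one message at a time
            }
        } catch (InterruptedException e) {
            // quit() was called: leave the loop
        }
    }

    void post(Runnable r) { queue.addLast(r); }
    void postAtFrontOfQueue(Runnable r) { queue.addFirst(r); }
    void quit() { thread.interrupt(); }
}

final class QueueProbeDemo {
    // Returns {probe ran while the looper was stuck, probe ran after it was released}.
    static boolean[] run() {
        MiniLooper looper = new MiniLooper("monitored");
        CountDownLatch stuckStarted = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch probeDone = new CountDownLatch(1);
        try {
            // A message that blocks the looper thread, standing in for a hung handler.
            looper.post(() -> {
                stuckStarted.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            stuckStarted.await();

            // The watchdog-style probe: it can only run once the looper is free again.
            looper.postAtFrontOfQueue(probeDone::countDown);
            boolean ranWhileStuck = probeDone.await(200, TimeUnit.MILLISECONDS);

            release.countDown();
            boolean ranAfterRelease = probeDone.await(5, TimeUnit.SECONDS);
            return new boolean[] { ranWhileStuck, ranAfterRelease };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            looper.quit();
        }
    }
}
```

QueueProbeDemo.run() should yield {false, true}: the probe stays unprocessed while the looper thread is stuck, which is exactly the condition a watchdog would report as a timeout.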
Next, let's look at where the message posted by postAtFrontOfQueue is finally handled, namely HandlerChecker's run function:
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
The important call here is mCurrentMonitor.monitor(). Let's look at how a real caller implements this interface:
public void monitor() {
    synchronized (this) { }
}
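This is ActivityManagerService's implementation. It does nothing except enter synchronized (this). If a deadlock has occurred, monitor() waits forever for the lock, the statement mCompleted = true; is never reached, and mCompleted stays false.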
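The effect of an empty synchronized block can be demonstrated in plain Java. The following is a minimal sketch, assuming a hypothetical LockedService whose monitor() mirrors the one above; a "holder" thread keeps the service's lock, standing in for a deadlocked service, while a "checker" thread calls monitor().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical service whose monitor() only tries to take the service's own lock.
final class LockedService {
    void monitor() {
        synchronized (this) { }
    }
}

final class MonitorDemo {
    // Returns {check completed while the lock was held, check completed after release}.
    static boolean[] run() {
        LockedService service = new LockedService();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch completed = new CountDownLatch(1);

        // Holds the service lock until released, standing in for a stuck service.
        Thread holder = new Thread(() -> {
            synchronized (service) {
                lockHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "holder");

        // Plays the role of the checker: call monitor(), then mark the check completed.
        Thread checker = new Thread(() -> {
            service.monitor();
            completed.countDown();
        }, "checker");

        holder.setDaemon(true);
        checker.setDaemon(true);
        try {
            holder.start();
            lockHeld.await();
            checker.start();
            boolean whileHeld = completed.await(200, TimeUnit.MILLISECONDS);
            release.countDown();
            boolean afterRelease = completed.await(5, TimeUnit.SECONDS);
            return new boolean[] { whileHeld, afterRelease };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

MonitorDemo.run() should yield {false, true}: as long as another thread holds the lock, monitor() cannot return and the completion flag is never set, so the check looks unfinished to an observer.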
To be continued.