Android Watchdog Analysis

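The Android Watchdog is a mechanism that monitors whether other system services are working normally.
When a critical system service is stuck in an abnormal state such as a deadlock, the system as a whole is no longer working properly, and rebooting is the necessary way to bring Android back to a usable state.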

1. Starting the Watchdog
The Watchdog is started in SystemServer:
Slog.i(TAG, "Init Watchdog");
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Watchdog is a singleton:
public static Watchdog getInstance() {
    if (sWatchdog == null) {
        sWatchdog = new Watchdog();
    }

    return sWatchdog;
}

Next, let's look at the init function:
public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;

    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}
It registers a broadcast receiver for ACTION_REBOOT:
final class RebootRequestReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context c, Intent intent) {
        if (intent.getIntExtra("nowait", 0) != 0) {
            rebootSystem("Received ACTION_REBOOT broadcast");
            return;
        }
        Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
    }
}

The rebootSystem function presumably restarts the system:
void rebootSystem(String reason) {
    Slog.i(TAG, "Rebooting system because: " + reason);
    IPowerManager pms = (IPowerManager) ServiceManager.getService(Context.POWER_SERVICE);
    try {
        pms.reboot(false, reason, false);
    } catch (RemoteException ex) {
    }
}
It does. So one important job of the Watchdog is to receive the ACTION_REBOOT broadcast and, when the "nowait" extra is non-zero, reboot the system.

2. How the Watchdog works
To use the Watchdog, a service first passes the Handler of the thread to be monitored to the Watchdog:
Watchdog.getInstance().addThread(mHandler);
It then passes the monitored object itself:
Watchdog.getInstance().addMonitor(this);

addThread takes the Handler of the caller's thread and creates a new HandlerChecker for it:
public void addThread(Handler thread) {
    addThread(thread, DEFAULT_TIMEOUT);
}

public void addThread(Handler thread, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        final String name = thread.getLooper().getThread().getName();
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}

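HandlerChecker is a Runnable:
public final class HandlerChecker implements Runnable
Because it is built around the monitored object's Handler, a HandlerChecker runs on the monitored object's thread.

The addMonitor function:
public void addMonitor(Monitor monitor) {
    mMonitors.add(monitor);
}
It only stores the monitor; it is used later.

Next, let's see how the running Watchdog uses these two objects to do its monitoring.

The Watchdog's run() function: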

@Override
public void run() {
    boolean waitedHalf = false;
    boolean mSFHang = false;
    while (true) { // loop forever
        final ArrayList<HandlerChecker> blockedCheckers;
        String subject;
        mSFHang = false;
        if (exceptionHWT != null && waitedHalf == false ) {
            exceptionHWT.WDTMatterJava(300);
        }
        final boolean allowRestart;
        int debuggerWasConnected = 0;

        Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());

        synchronized (this) {

            Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());

            long timeout = CHECK_INTERVAL;
            long SFHangTime;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked(); // schedule a check on each handler in turn
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // NOTE: We use uptimeMillis() here because we do not want to increment the time we
            // wait while asleep. If the device is asleep then the thing that we are waiting
            // to timeout on is asleep as well and won't have a chance to run, causing a false
            // positive on when to kill things.
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout); // wait up to 30 seconds, or until woken by a notify
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start); // recompute how much of the interval is still left
            }

            final int waitState = evaluateCheckerCompletionLocked(); // evaluate the checkers' state
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                //CputimeEnable(new String("0"));
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                // CputimeEnable(new String("0"));
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    ...
                    waitedHalf = true;
                }
                continue;
            }
            // something is overdue!
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }

        ...

            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }
}
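
The post calls evaluateCheckerCompletionLocked() without showing its body. The following is a minimal plain-Java sketch of how such a state evaluation could work, not the actual framework code. The Checker class, the numeric values of the state constants, the OVERDUE name (taken only from the "something is overdue!" comment), and the half-of-timeout threshold are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class CompletionStateSketch {
    // Assumed ordering: a larger value means a worse state, so the
    // overall state is simply the maximum over all checkers.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Hypothetical stand-in for the bookkeeping of one HandlerChecker. */
    static final class Checker {
        final boolean completed;   // plays the role of mCompleted
        final long startTimeMs;    // plays the role of mStartTime
        final long waitMaxMs;      // the timeout passed to addThread

        Checker(boolean completed, long startTimeMs, long waitMaxMs) {
            this.completed = completed;
            this.startTimeMs = startTimeMs;
            this.waitMaxMs = waitMaxMs;
        }

        int stateAt(long nowMs) {
            if (completed) {
                return COMPLETED;
            }
            long latency = nowMs - startTimeMs;
            if (latency < waitMaxMs / 2) {
                return WAITING;
            }
            if (latency < waitMaxMs) {
                return WAITED_HALF;
            }
            return OVERDUE;
        }
    }

    /** Returns the worst state among all checkers at the given time. */
    static int evaluate(List<Checker> checkers, long nowMs) {
        int state = COMPLETED;
        for (Checker c : checkers) {
            state = Math.max(state, c.stateAt(nowMs));
        }
        return state;
    }

    public static void main(String[] args) {
        List<Checker> checkers = Arrays.asList(
                new Checker(true, 0, 60_000),    // already finished
                new Checker(false, 0, 60_000));  // still in flight
        System.out.println(evaluate(checkers, 10_000)); // 1 (WAITING)
        System.out.println(evaluate(checkers, 40_000)); // 2 (WAITED_HALF)
        System.out.println(evaluate(checkers, 60_000)); // 3 (OVERDUE)
    }
}
```

Under these assumptions, one slow checker is enough to move the whole Watchdog from WAITING to WAITED_HALF and then past the last branch of the if/else chain in run().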

Now let's look at the important functions used in this code:
public void scheduleCheckLocked() {
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        mCompleted = true;
        return;
    }

    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }

    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}

As noted earlier, a HandlerChecker holds the Handler of the monitored object's thread, so mHandler.postAtFrontOfQueue here actually posts a message to the monitored object.
Let's look at postAtFrontOfQueue:

public final boolean postAtFrontOfQueue(Runnable r) {
    return this.sendMessageAtFrontOfQueue(getPostMessage(r));
}

private static Message getPostMessage(Runnable r) {
    Message m = Message.obtain();
    m.callback = r;
    return m;
}

public void dispatchMessage(Message msg) {
    if (msg.callback != null) {
        handleCallback(msg);
    } else {
        if (this.mCallback != null && this.mCallback.handleMessage(msg)) {
            return;
        }
        this.handleMessage(msg);
    }
}

private static void handleCallback(Message message) {
    message.callback.run();
}
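So postAtFrontOfQueue(r) eventually calls r.run() on the thread that owns the Handler.
It follows that if the monitored object's message queue is blocked, the message posted by postAtFrontOfQueue never gets handled. The monitored object's monitor() function is then never called, and the check eventually times out as unresponsive. So the Watchdog in effect also monitors whether message queues are blocked.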
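
The effect of a blocked queue can be reproduced outside Android. The sketch below is a plain-Java analogy, not Android code: MiniLooper is a hypothetical single-threaded loop standing in for a Looper and its Handler, and the latches and timeouts are illustration choices. A checker posted at the front of the queue runs promptly on an idle loop, but never runs while the loop thread is stuck inside an earlier task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

final class FrontOfQueueSketch {
    /** Hypothetical single-threaded message loop standing in for Looper + Handler. */
    static final class MiniLooper {
        private final LinkedBlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();

        MiniLooper() {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        queue.takeFirst().run(); // analogous to dispatchMessage -> callback.run()
                    }
                } catch (InterruptedException e) {
                    // Interrupted: leave the loop.
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        void post(Runnable r) { queue.addLast(r); }

        void postAtFrontOfQueue(Runnable r) { queue.addFirst(r); }
    }

    /** Waits on a latch without propagating InterruptedException. */
    static boolean awaitQuietly(CountDownLatch latch, long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Posts a checker at the front of the queue; true if it ran within timeoutMs. */
    static boolean checkerRanWithin(MiniLooper looper, long timeoutMs) {
        CountDownLatch ran = new CountDownLatch(1);
        looper.postAtFrontOfQueue(ran::countDown);
        return awaitQuietly(ran, timeoutMs);
    }

    /** Occupies the loop with a task that blocks until the returned latch is released. */
    static CountDownLatch blockLoop(MiniLooper looper) {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        looper.post(() -> {
            started.countDown();
            awaitQuietly(release, 60_000);
        });
        awaitQuietly(started, 5_000); // make sure the loop is already inside the stuck task
        return release;
    }

    public static void main(String[] args) {
        MiniLooper looper = new MiniLooper();
        System.out.println("idle loop:  " + checkerRanWithin(looper, 2_000)); // true

        CountDownLatch release = blockLoop(looper);
        System.out.println("stuck loop: " + checkerRanWithin(looper, 200));   // false
        release.countDown();
    }
}
```

Posting at the front only skips messages that are still waiting in the queue. It cannot preempt the message that the loop thread is currently executing, which is why a stuck handler shows up as a timeout.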

In short, every 30 seconds the Watchdog posts a message to each monitored object to check its state.
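
The fixed cadence depends on the inner loop of run(), which recomputes the remaining time after every wake-up so that an early notify or an interrupt cannot shorten the interval. Here is a minimal plain-Java sketch of that pattern. It is an approximation: System.nanoTime() stands in for SystemClock.uptimeMillis(), which does not exist on a plain JVM, and the class and method names are hypothetical.

```java
final class IntervalWaitSketch {
    /**
     * Blocks on the given lock until at least intervalMs has elapsed,
     * re-waiting after early wake-ups. Returns the elapsed time in ms.
     */
    static long waitFullInterval(Object lock, long intervalMs) {
        final long start = System.nanoTime();
        long remaining = intervalMs;
        synchronized (lock) {
            while (remaining > 0) {
                try {
                    lock.wait(remaining);
                } catch (InterruptedException e) {
                    // Like the loop in run(), note the interrupt and keep waiting.
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                remaining = intervalMs - elapsedMs;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        final Object lock = new Object();

        // A second thread wakes the waiter early; the loop should go back to waiting.
        Thread notifier = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                return;
            }
            synchronized (lock) {
                lock.notifyAll();
            }
        });
        notifier.setDaemon(true);
        notifier.start();

        long elapsed = waitFullInterval(lock, 300);
        System.out.println("waited the full interval: " + (elapsed >= 300)); // true
    }
}
```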

Next, let's look at where that message is finally handled after postAtFrontOfQueue, namely the run() function of HandlerChecker:
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}

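One important call here is mCurrentMonitor.monitor(). Let's look at how one real caller implements this interface:
public void monitor() {
    synchronized (this) { }
}
This is the ActivityManagerService implementation. It does nothing except enter synchronized (this). If a deadlock has occurred, monitor() waits forever for the lock, the statement mCompleted = true; is never reached, and mCompleted stays false.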
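
The same lock-probe idea can be tried in plain Java. The sketch below is an illustration under assumptions, not the framework's code: Service is a hypothetical class with the same empty synchronized block, a latch plays the role of mCompleted, and the timeouts are arbitrary. While another thread holds the service's lock, monitor() cannot return and the "completed" signal never arrives in time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MonitorProbeSketch {
    /** Hypothetical service whose monitor() only acquires and releases its own lock. */
    static final class Service {
        void monitor() {
            synchronized (this) { }
        }
    }

    /** Runs service.monitor() on a helper thread; true if it returned within timeoutMs. */
    static boolean monitorReturnsWithin(Service service, long timeoutMs) {
        CountDownLatch completed = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            service.monitor();
            completed.countDown(); // plays the role of mCompleted = true
        });
        checker.setDaemon(true);
        checker.start();
        try {
            return completed.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Service service = new Service();
        System.out.println("lock free: " + monitorReturnsWithin(service, 2_000)); // true

        // Holding the lock here simulates a thread that never releases it.
        synchronized (service) {
            System.out.println("lock held: " + monitorReturnsWithin(service, 200)); // false
        }
    }
}
```

The probe needs no cooperation from the thread that holds the lock, which is what makes an empty synchronized block a cheap health check.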
To be continued.
