Android WatchDog Analysis

Fifi_0617

Article link: https://blog.csdn.net/zyfzhangyafei/article/details/58079621

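Android Watchdog is a mechanism for monitoring whether other system services are working normally.
If an important system service ends up in an abnormal state such as a deadlock, the system as a whole is no longer working properly, and restarting it is a necessary step to recover Android.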

1. Starting the Watchdog
The Watchdog is started in SystemServer:
Slog.i(TAG, "Init Watchdog");
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Watchdog uses the singleton pattern:
public static Watchdog getInstance() {
    if (sWatchdog == null) {
        sWatchdog = new Watchdog();
    }

    return sWatchdog;
}
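As a minimal sketch of the same lazy-initialization pattern in plain Java (no Android classes), the hypothetical `LazySingleton` below mirrors `getInstance()` and adds a construction counter purely for demonstration. Like the original, the null check is not synchronized, so this shape is only safe if the first call cannot race with another thread.

```java
// Hypothetical stand-in for the Watchdog singleton pattern (plain Java, no Android APIs).
final class LazySingleton {
    private static LazySingleton sInstance; // created on first use, like sWatchdog
    static int sConstructed = 0;            // counts constructor calls, for demonstration only

    private LazySingleton() {               // callers cannot construct it directly
        sConstructed++;
    }

    // Same shape as Watchdog.getInstance(): unsynchronized lazy initialization,
    // correct only if the first call cannot race with another thread.
    static LazySingleton getInstance() {
        if (sInstance == null) {
            sInstance = new LazySingleton();
        }
        return sInstance;
    }
}
```

Every later call returns the object created by the first call, which is why services can simply call `Watchdog.getInstance()` wherever they need it.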

Next, let's look at the init function:
public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;

    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}
It registers a broadcast receiver for ACTION_REBOOT, protected by the REBOOT permission:
final class RebootRequestReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context c, Intent intent) {
        if (intent.getIntExtra("nowait", 0) != 0) {
            rebootSystem("Received ACTION_REBOOT broadcast");
            return;
        }
        Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
    }
}

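The rebootSystem function presumably reboots the system:
void rebootSystem(String reason) {
    Slog.i(TAG, "Rebooting system because: " + reason);
    IPowerManager pms = (IPowerManager) ServiceManager.getService(Context.POWER_SERVICE);
    try {
        pms.reboot(false, reason, false);
    } catch (RemoteException ex) {
        // ignored: the reboot request could not be delivered
    }
}
As expected. So one of the Watchdog's jobs is to listen for the ACTION_REBOOT broadcast and, when the broadcast carries a non-zero "nowait" extra, reboot the system; any other ACTION_REBOOT broadcast is only logged as unsupported.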
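The receiver's decision logic can be modeled without Android as a sketch. Here a `Map` stands in for the Intent extras and a `Consumer<String>` stands in for `IPowerManager.reboot()`; `RebootRequestModel` is a hypothetical name, not part of the framework.

```java
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java model of RebootRequestReceiver + rebootSystem().
// The Map (Intent extras) and the Consumer (power manager) are assumptions for illustration.
final class RebootRequestModel {
    private final Consumer<String> rebooter;

    RebootRequestModel(Consumer<String> rebooter) {
        this.rebooter = rebooter;
    }

    // Mirrors onReceive(): only a non-zero "nowait" extra triggers a reboot.
    boolean onReceive(Map<String, Integer> extras) {
        int nowait = extras.getOrDefault("nowait", 0); // like intent.getIntExtra("nowait", 0)
        if (nowait != 0) {
            rebootSystem("Received ACTION_REBOOT broadcast");
            return true;
        }
        System.out.println("Unsupported ACTION_REBOOT broadcast: " + extras);
        return false;
    }

    // Mirrors rebootSystem(): log the reason, then hand it to the "power manager".
    private void rebootSystem(String reason) {
        System.out.println("Rebooting system because: " + reason);
        rebooter.accept(reason);
    }
}
```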

2. How the Watchdog Works
When a service wants to be monitored, it first passes the Handler of its own thread to the Watchdog:
Watchdog.getInstance().addThread(mHandler);
and then registers itself as a monitored object:
Watchdog.getInstance().addMonitor(this);

addThread takes the Handler of the monitored thread and creates a new HandlerChecker for it:
public void addThread(Handler thread) {
    addThread(thread, DEFAULT_TIMEOUT);
}

public void addThread(Handler thread, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        final String name = thread.getLooper().getThread().getName();
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}

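HandlerChecker is a Runnable:
public final class HandlerChecker implements Runnable
Because each HandlerChecker holds the Handler of the monitored thread, its run() method is executed on that monitored thread, not on the Watchdog thread.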

The addMonitor function:
public void addMonitor(Monitor monitor) {
    mMonitors.add(monitor);
}
It simply stores the monitor in a list, which is used later.
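A plain-Java sketch of the two registration calls, under the assumption that the thread name is passed in directly instead of being read from a Handler's Looper. `WatchdogRegistry` and `Checker` are hypothetical stand-ins for Watchdog and HandlerChecker, and the 60-second default is an assumed value chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of Watchdog's registration APIs.
final class WatchdogRegistry extends Thread {
    interface Monitor { void monitor(); }

    // Stand-in for HandlerChecker: remembers which thread to check and its timeout.
    static final class Checker {
        final String name;
        final long timeoutMillis;

        Checker(String name, long timeoutMillis) {
            this.name = name;
            this.timeoutMillis = timeoutMillis;
        }
    }

    static final long DEFAULT_TIMEOUT = 60_000; // assumed value for illustration

    final List<Checker> checkers = new ArrayList<>();
    final List<Monitor> monitors = new ArrayList<>();

    void addThread(String threadName) {
        addThread(threadName, DEFAULT_TIMEOUT);
    }

    void addThread(String threadName, long timeoutMillis) {
        synchronized (this) {
            // Same guard as the original: the checker list is fixed once run() has started.
            if (isAlive()) {
                throw new RuntimeException("Threads can't be added once the Watchdog is running");
            }
            checkers.add(new Checker(threadName, timeoutMillis));
        }
    }

    // Like addMonitor(): only records the monitor for later use.
    void addMonitor(Monitor monitor) {
        synchronized (this) {
            monitors.add(monitor);
        }
    }
}
```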

Next, let's see how the Watchdog uses these two kinds of objects at runtime to do its monitoring.

The Watchdog's run() function:

@Override
public void run() {
    boolean waitedHalf = false;
    boolean mSFHang = false;
    while (true) { // infinite loop
        final ArrayList<HandlerChecker> blockedCheckers;
        String subject;
        mSFHang = false;
        if (exceptionHWT != null && waitedHalf == false ) {
            exceptionHWT.WDTMatterJava(300);
        }
        final boolean allowRestart;
        int debuggerWasConnected = 0;

        Slog.w(TAG, "SWT Watchdog before synchronized:" + SystemClock.uptimeMillis());

        synchronized (this) {

            Slog.w(TAG, "SWT Watchdog after synchronized:" + SystemClock.uptimeMillis());

            long timeout = CHECK_INTERVAL;
            long SFHangTime;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked(); // schedule a check on each checker in turn
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // NOTE: We use uptimeMillis() here because we do not want to increment the time we
            // wait while asleep. If the device is asleep then the thing that we are waiting
            // to timeout on is asleep as well and won't have a chance to run, causing a false
            // positive on when to kill things.
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout); // wait up to 30 seconds, or until woken by notify()
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start); // recompute the remaining time in case wait() returned early
            }

            final int waitState = evaluateCheckerCompletionLocked(); // evaluate the state of all checkers
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                //CputimeEnable(new String("0"));
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                // CputimeEnable(new String("0"));
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    ...
                    waitedHalf = true;
                }
                continue;
            }
            // something is overdue!
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }

        ...

            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }
}
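The body of evaluateCheckerCompletionLocked() is not shown above. As a sketch, assuming it behaves like the AOSP version (each checker reports a state from how long its current check has been pending, and the worst state wins), the classification can be written as follows. `CheckerStates`, `stateOf` and `evaluate` are hypothetical names.

```java
// Hypothetical sketch of the COMPLETED / WAITING / WAITED_HALF / OVERDUE classification.
final class CheckerStates {
    // Ordered from best to worst so that Math.max picks the worst state.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // State of one checker: done, or how far into its timeout the pending check is.
    static int stateOf(boolean completed, long startTime, long now, long timeoutMillis) {
        if (completed) {
            return COMPLETED;
        }
        long latency = now - startTime;
        if (latency < timeoutMillis / 2) {
            return WAITING;
        } else if (latency < timeoutMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // Overall result: the worst state among all checkers.
    static int evaluate(boolean[] completed, long[] startTimes, long now, long timeoutMillis) {
        int state = COMPLETED;
        for (int i = 0; i < completed.length; i++) {
            state = Math.max(state, stateOf(completed[i], startTimes[i], now, timeoutMillis));
        }
        return state;
    }
}
```

Only OVERDUE falls through the `continue` branches in run(), which is why the code after them collects the blocked checkers.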

Here are several important functions used in this code:
public void scheduleCheckLocked() {
    // An idle (polling) looper with no monitors to call cannot be blocked, so count it as done.
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        mCompleted = true;
        return;
    }

    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }

    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
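To make the "one check in flight" rule concrete, here is a single-threaded model of scheduleCheckLocked(). An ArrayDeque stands in for the target thread's MessageQueue and a boolean `polling` stands in for MessageQueue.isPolling(); `CheckerModel` and its fields are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of HandlerChecker.scheduleCheckLocked().
final class CheckerModel implements Runnable {
    final Deque<Runnable> queue = new ArrayDeque<>(); // stand-in for the target MessageQueue
    int monitorCount = 0;       // number of registered monitors
    boolean polling = false;    // stand-in for "the looper is idle, waiting for messages"
    boolean completed = true;   // no check in flight initially
    long startTime;

    void scheduleCheck(long now) {
        // An idle looper with no monitors to call cannot be stuck: count it as done.
        if (monitorCount == 0 && polling) {
            completed = true;
            return;
        }
        // A previous check is still queued or running: don't post another one.
        if (!completed) {
            return;
        }
        completed = false;
        startTime = now;
        queue.addFirst(this); // like mHandler.postAtFrontOfQueue(this)
    }

    @Override
    public void run() {
        // The real HandlerChecker calls each monitor here; this model only marks completion.
        completed = true;
    }
}
```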

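As mentioned above, a HandlerChecker holds the Handler of the monitored thread, so mHandler.postAtFrontOfQueue(this) here actually posts a message to the monitored object's thread.
Let's look at postAtFrontOfQueue: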

public final boolean postAtFrontOfQueue(Runnable r) {
    return this.sendMessageAtFrontOfQueue(getPostMessage(r));
}

private static Message getPostMessage(Runnable r) {
    Message m = Message.obtain();
    m.callback = r;
    return m;
}

public void dispatchMessage(Message msg) {
    if (msg.callback != null) {
        handleCallback(msg);
    } else {
        if (this.mCallback != null && this.mCallback.handleMessage(msg)) {
            return;
        }
        this.handleMessage(msg);
    }
}

private static void handleCallback(Message message) {
    message.callback.run();
}
So postAtFrontOfQueue(r) ends up calling r.run() on the Handler's thread.
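The dispatch order can be checked with a small plain-Java model. `SimpleMessage` and `SimpleHandler` are hypothetical names that only mirror the fields and branches shown above; they are not the Android classes.

```java
// Hypothetical plain-Java model of Message + Handler.dispatchMessage().
final class SimpleMessage {
    Runnable callback; // set when a Runnable is posted
    int what;          // used by ordinary sendMessage() calls
}

class SimpleHandler {
    interface Callback { boolean handleMessage(SimpleMessage msg); }

    private final Callback mCallback;
    int handledByHandleMessage = 0; // counts fallbacks to handleMessage(), for demonstration

    SimpleHandler(Callback callback) {
        mCallback = callback;
    }

    // Same priority order as above: message callback first,
    // then the Handler's Callback, then handleMessage().
    void dispatchMessage(SimpleMessage msg) {
        if (msg.callback != null) {
            msg.callback.run(); // handleCallback(msg)
        } else {
            if (mCallback != null && mCallback.handleMessage(msg)) {
                return;
            }
            handleMessage(msg);
        }
    }

    void handleMessage(SimpleMessage msg) {
        handledByHandleMessage++;
    }

    // Like getPostMessage(r): wrap a Runnable in a message.
    static SimpleMessage wrap(Runnable r) {
        SimpleMessage m = new SimpleMessage();
        m.callback = r;
        return m;
    }
}
```

A posted HandlerChecker therefore never reaches handleMessage(); its own run() is invoked directly.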
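Therefore, if the monitored thread is blocked, the message posted by postAtFrontOfQueue can never be processed. In other words, the monitored object's monitor() function is never called, and the check eventually times out. Because the message is placed at the front of the queue, a backlog of ordinary messages does not delay it; what makes it fail is a thread stuck inside the message it is currently handling. So besides detecting deadlocks through monitor(), the Watchdog also effectively checks whether the thread's message queue is blocked.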

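In short, every CHECK_INTERVAL (30 seconds, per the comment in run()), the Watchdog checks the state of each monitored object by posting a message to its thread.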
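The following sketch illustrates this with real threads, assuming a simplified looper: a single thread that takes Runnables from the front of a LinkedBlockingDeque, where addFirst() plays the role of postAtFrontOfQueue(). `MiniLooper`, `checkResponds` and `postStuckMessage` are hypothetical names; this is a model, not Android's Looper.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical single-thread "looper" that processes Runnables one at a time.
final class MiniLooper {
    private final LinkedBlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();
    private final Thread thread = new Thread(() -> {
        try {
            while (true) {
                queue.takeFirst().run(); // handle one message at a time
            }
        } catch (InterruptedException e) {
            // interrupted: stop looping
        }
    }, "mini-looper");

    MiniLooper() {
        thread.setDaemon(true);
        thread.start();
    }

    void post(Runnable r) { queue.addLast(r); }
    void postAtFrontOfQueue(Runnable r) { queue.addFirst(r); }

    // Posts a check at the front of the queue and reports whether it ran within the timeout.
    boolean checkResponds(long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        postAtFrontOfQueue(done::countDown);
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Posts a message that blocks until "release" opens, and returns only once
    // the looper thread is actually inside that message.
    void postStuckMessage(CountDownLatch release) {
        CountDownLatch entered = new CountDownLatch(1);
        post(() -> {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            entered.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the stuck message is already running, putting the check at the front of the queue does not help; the check only runs once the current message returns.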

Next, let's look at where the posted message is finally handled, namely HandlerChecker's run() function:
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}

One key line is mCurrentMonitor.monitor(). Let's look at a real implementation of this interface:
public void monitor() {
    synchronized (this) { }
}
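This is ActivityManagerService's implementation. It does nothing except enter synchronized(this). If a deadlock has occurred and the AMS lock is never released, monitor() blocks forever, the statement mCompleted = true; is never reached, and mCompleted stays false.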
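A plain-Java sketch of why an empty synchronized block is enough: if another thread holds the service lock and never releases it (a stand-in for a deadlock), monitor() cannot return. `FakeService`, `holdLock` and `monitorCompletes` are hypothetical names, and the latch timeout plays the role of the Watchdog's timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a monitored service such as ActivityManagerService.
final class FakeService {
    // Same idea as AMS.monitor(): just acquire and release the service lock.
    void monitor() {
        synchronized (this) { }
    }

    // Simulates a stuck code path: another thread takes the service lock and keeps it
    // until "release" opens. Returns once the lock is actually held.
    void holdLock(CountDownLatch release) {
        CountDownLatch locked = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (this) {
                locked.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.setDaemon(true);
        holder.start();
        try {
            locked.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs monitor() on a separate thread and reports whether it returned in time,
    // i.e. whether "mCompleted = true" would have been reached.
    boolean monitorCompletes(long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread checker = new Thread(() -> {
            monitor();
            done.countDown();
        });
        checker.setDaemon(true);
        checker.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```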
To be continued.
