Introduction to the Android WatchDog

拿节

Article link: https://blog.csdn.net/zhejiang9/article/details/104275843

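The watchdog was designed for early embedded systems as a last-resort recovery measure for software that has run out of control: it reboots the device. The approach is blunt, but a reboot usually clears many intermittent bugs, at least temporarily.

A watchdog design generally needs to include the following three functions:

Feeding mechanism
Dumping diagnostic logs
Fault recovery

The feeding mechanism comes in two forms:

Passive - the watchdog waits for the system to feed it "food"
Active - the watchdog checks on its own whether there is "food"

Whether feeding is active or passive, when no "food" reaches the watchdog it raises a fault, dumps diagnostic logs, and then attempts recovery.

In early embedded systems the watchdog was usually a hardware device, so it was fed by the software system.

A watchdog implemented for a software system is more flexible, so the feeding mechanism can be implemented as needed.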
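
The passive feeding idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class SimpleWatchdog and its methods are hypothetical names, and timestamps are passed in explicitly so the behavior is easy to follow.

```java
public class SimpleWatchdog {
    private final long timeoutMs; // how long the watchdog tolerates going unfed
    private long lastFedMs;       // timestamp of the most recent feed

    public SimpleWatchdog(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastFedMs = nowMs;
    }

    /** Passive feeding: the monitored code calls this to report that it is alive. */
    public synchronized void feed(long nowMs) {
        lastFedMs = nowMs;
    }

    /** Returns true when no "food" has arrived within the timeout. */
    public synchronized boolean isStarved(long nowMs) {
        return nowMs - lastFedMs >= timeoutMs;
    }

    public static void main(String[] args) {
        SimpleWatchdog dog = new SimpleWatchdog(1000, 0);
        dog.feed(500);
        System.out.println(dog.isStarved(1200)); // false: only 700 ms since the last feed
        System.out.println(dog.isStarved(1500)); // true: 1000 ms since the last feed
    }
}
```

A real watchdog would poll isStarved from its own thread and, on starvation, dump logs and trigger recovery.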

Android WatchDog

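Android also has a WatchDog. It mainly monitors the service threads running inside system_server. While system_server initializes and starts its services, it also initializes, configures, and starts the WatchDog.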

private void startOtherServices() {
    ...
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);
    ...
    mActivityManagerService.systemReady(new Runnable() {
        @Override
        public void run() {
            ...
            // Start the watchdog thread once the system is ready
            Watchdog.getInstance().start();
        }
    });
}

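init is called first, and the WatchDog is then started from the callback passed to AMS.systemReady. So how are monitored threads or callbacks registered with the WatchDog? Take the registration code in AMS as an example: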

public ActivityManagerService(Context systemContext) {
   ...
   Watchdog.getInstance().addMonitor(this);
   Watchdog.getInstance().addThread(mHandler);
}

Before the constructor returns, it registers a monitor callback and the handler bound to the thread to be monitored.
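
What a monitor callback does is not shown in the registration code. A minimal sketch, assuming the callback simply acquires and releases the service's own lock so that it blocks whenever that lock is stuck: the Monitor interface and Service class below are hypothetical stand-ins written in plain Java, not the Android classes.

```java
import java.util.concurrent.CountDownLatch;

public class LockProbeDemo {
    /** Hypothetical stand-in for the watchdog's monitor callback interface. */
    interface Monitor { void monitor(); }

    /** Hypothetical service whose monitor() just takes and releases its own lock. */
    static class Service implements Monitor {
        final Object lock = new Object();
        @Override public void monitor() {
            synchronized (lock) { /* returns only once the lock can be acquired */ }
        }
    }

    /** True if monitor() stays blocked while the lock is held and returns after release. */
    static boolean probeBlocksWhileLocked() {
        Service service = new Service();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (service.lock) {
                held.countDown(); // signal that the lock is now held
                try { release.await(); } catch (InterruptedException ignored) { }
            }
        });
        Thread probe = new Thread(service::monitor);
        try {
            holder.start();
            held.await();              // wait until the holder owns the lock
            probe.start();
            probe.join(200);           // the probe cannot finish while the lock is held
            boolean blocked = probe.isAlive();
            release.countDown();       // let the holder release the lock
            probe.join(5000);
            return blocked && !probe.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(probeBlocksWhileLocked());
    }
}
```

With a callback of this shape, a watchdog that times how long monitor() takes can notice a lock that is never released.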

WatchDog initialization

Next, let's walk through the code, starting with the WatchDog constructor:

public class Watchdog extends Thread {
    private Watchdog() {
        super("watchdog");
        // Initialize handler checkers for each common thread we want to check.  Note
        // that we are not currently checking the background thread, since it can
        // potentially hold longer running operations with no guarantees about the timeliness
        // of operations there.

        // The shared foreground thread is the main checker.  It is where we
        // will also dispatch monitor checks and do other work.
        mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
                "foreground thread", DEFAULT_TIMEOUT);
        mHandlerCheckers.add(mMonitorChecker);
        // Add checker for main thread.  We only do a quick check since there
        // can be UI running on the thread.
        mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
                "main thread", DEFAULT_TIMEOUT));
        // Add checker for shared UI thread.
        mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
                "ui thread", DEFAULT_TIMEOUT));
        // And also check IO thread.
        mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
                "i/o thread", DEFAULT_TIMEOUT));
        // And the display thread.
        mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
                "display thread", DEFAULT_TIMEOUT));
    }
    ...
}

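WatchDog extends Thread. Its constructor mainly initializes:

mMonitorChecker - the HandlerChecker bound to the thread on which monitor callbacks run
mHandlerCheckers - the list of preset HandlerCheckers

A HandlerChecker monitors whether the thread bound to a Handler takes too long to run the work posted to it; the timeout can be configured at construction time. This is the default behavior, and it is built on Android's Handler/Looper mechanism.

Beyond the default behavior, custom checks can be added by registering monitor callbacks on a HandlerChecker.

All of the WatchDog's monitor callbacks are stored in mMonitorChecker.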

Introduction to HandlerChecker

The core of the HandlerChecker implementation:

Post a message to the front of the message queue

        public void scheduleCheckLocked() {
            // No monitor callbacks and the looper is idle: mark the check as completed and return
            if (mMonitors.size() == 0 && mHandler.getLooper().isIdling()) {
                // If the target looper is or just recently was idling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
            }

            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            // Insert the message at the front of the queue
            mHandler.postAtFrontOfQueue(this);
        }

The runnable associated with the message is executed

        public void run() {
            final int size = mMonitors.size();
            // Run the monitor callbacks
            for (int i = 0 ; i < size ; i++) {
                synchronized (Watchdog.this) {
                    mCurrentMonitor = mMonitors.get(i);
                }
                mCurrentMonitor.monitor();
            }
            // Mark the check as completed
            synchronized (Watchdog.this) {
                mCompleted = true;
                mCurrentMonitor = null;
            }
        }
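
The schedule-then-run pattern above can be imitated outside Android. The following is a rough sketch, assuming a single-threaded ExecutorService as a stand-in for the Handler and its Looper; HeartbeatChecker is a hypothetical name, and unlike postAtFrontOfQueue the executor appends the heartbeat to the tail of its queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HeartbeatDemo {
    /** Hypothetical checker: posts a heartbeat task to the monitored thread. */
    static class HeartbeatChecker implements Runnable {
        private final ExecutorService target;
        private volatile boolean completed = true;

        HeartbeatChecker(ExecutorService target) { this.target = target; }

        void scheduleCheck() {
            if (!completed) return; // a check is already in flight
            completed = false;
            target.execute(this);   // runs only when the monitored thread gets to it
        }

        @Override public void run() { completed = true; }

        boolean isCompleted() { return completed; }
    }

    /** True if the check stays pending while the worker is blocked and completes afterwards. */
    static boolean demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch unblock = new CountDownLatch(1);
            // Simulate a stuck thread: the worker waits until it is unblocked
            worker.execute(() -> {
                try { unblock.await(); } catch (InterruptedException ignored) { }
            });
            HeartbeatChecker checker = new HeartbeatChecker(worker);
            checker.scheduleCheck();
            boolean stuck = !checker.isCompleted();
            unblock.countDown();
            worker.submit(() -> { }).get(); // FIFO queue: the heartbeat has run by now
            return stuck && checker.isCompleted();
        } catch (Exception e) {
            return false;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The checker never inspects the monitored thread directly; it only observes whether its own task got a turn, which is the same idea the HandlerChecker relies on.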

Get the execution state

        public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

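As the code above shows, once scheduleCheckLocked() has been called, only two things can keep a HandlerChecker from reaching COMPLETED:

The thread associated with the HandlerChecker is blocked, so the runnable posted to it is not executed within the timeout
The runnable does run, but monitor callbacks are registered and one of them does not return within the timeout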
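
To make the thresholds in getCompletionStateLocked() concrete, here is a small worked example on plain values. The numeric values of the state constants and the 60-second timeout are assumptions (the timeout is chosen to be consistent with the 30-second check interval mentioned in the code comments); the class name is hypothetical.

```java
public class CompletionStateDemo {
    // Assumed ordering of the state constants: a larger value means a worse state
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Mirrors the threshold logic of getCompletionStateLocked() on plain values. */
    static int completionState(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) return COMPLETED;
        if (latencyMs < waitMaxMs / 2) return WAITING;   // less than half the timeout
        if (latencyMs < waitMaxMs) return WAITED_HALF;   // between half and the full timeout
        return OVERDUE;                                  // full timeout exceeded
    }

    public static void main(String[] args) {
        long waitMax = 60_000; // assumed 60 s timeout
        System.out.println(completionState(false, 10_000, waitMax)); // 1 (WAITING)
        System.out.println(completionState(false, 30_000, waitMax)); // 2 (WAITED_HALF)
        System.out.println(completionState(false, 60_000, waitMax)); // 3 (OVERDUE)
        System.out.println(completionState(true, 90_000, waitMax));  // 0 (COMPLETED)
    }
}
```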

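WatchDog detection logic

As mentioned above, the WatchDog is itself a thread, and detection begins once that thread starts. Let's go straight to the code: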

    @Override
    public void run() {
        boolean waitedHalf = false;
        while (true) {
            final ArrayList<HandlerChecker> blockedCheckers;
            final String subject;
            final boolean allowRestart;
            int debuggerWasConnected = 0;
            synchronized (this) {
                // Check interval, 30 seconds by default
                long timeout = CHECK_INTERVAL;
                // Make sure we (re)spin the checkers that have become idle within
                // this wait-and-check interval
                // Iterate over the HandlerCheckers and schedule a check on each
                for (int i=0; i<mHandlerCheckers.size(); i++) {
                    HandlerChecker hc = mHandlerCheckers.get(i);
                    hc.scheduleCheckLocked();
                }

                if (debuggerWasConnected > 0) {
                    debuggerWasConnected--;
                }

                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
                // wait while asleep. If the device is asleep then the thing that we are waiting
                // to timeout on is asleep as well and won't have a chance to run, causing a false
                // positive on when to kill things.
                long start = SystemClock.uptimeMillis();
                while (timeout > 0) {
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    try {
                        // Suspend the watchdog thread for up to timeout ms
                        wait(timeout);
                    } catch (InterruptedException e) {
                        Log.wtf(TAG, e);
                    }
                    if (Debug.isDebuggerConnected()) {
                        debuggerWasConnected = 2;
                    }
                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
                }

                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    // The monitors have returned; reset
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // still waiting but within their configured intervals; back off and recheck
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        // We've waited half the deadlock-detection interval.  Pull a stack
                        // trace and wait another half.
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                NATIVE_STACKS_OF_INTEREST);
                        waitedHalf = true;
                    }
                    continue;
                }

                // Timed out
                blockedCheckers = getBlockedCheckersLocked();
                subject = describeCheckersLocked(blockedCheckers);
                allowRestart = mAllowRestart;
            }

            // If we got here, that means that the system is most likely hung.
            // First collect stack traces from all threads of the system process.
            // Then kill this process so that the system will restart.
            EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

            ArrayList<Integer> pids = new ArrayList<Integer>();
            pids.add(Process.myPid());
            if (mPhonePid > 0) pids.add(mPhonePid);
            // Pass !waitedHalf so that just in case we somehow wind up here without having
            // dumped the halfway stacks, we properly re-initialize the trace file.
            final File stack = ActivityManagerService.dumpStackTraces(
                    !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

            // Give some extra time to make sure the stack traces get written.
            // The system's been hanging for a minute, another second or two won't hurt much.
            SystemClock.sleep(2000);

            // Pull our own kernel thread stacks as well if we're configured for that
            if (RECORD_KERNEL_THREADS) {
                dumpKernelStackTraces();
            }

            // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
            doSysRq('w');
            doSysRq('l');

            // Try to add the error to the dropbox, but assuming that the ActivityManager
            // itself may be deadlocked.  (which has happened, causing this statement to
            // deadlock and the watchdog as a whole to be ineffective)
            Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                    public void run() {
                        mActivity.addErrorToDropBox(
                                "watchdog", null, "system_server", null, null,
                                subject, null, stack, null);
                    }
                };
            dropboxThread.start();
            try {
                dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
            } catch (InterruptedException ignored) {}

            IActivityController controller;
            synchronized (this) {
                controller = mController;
            }
            if (controller != null) {
                Slog.i(TAG, "Reporting stuck state to activity controller");
                try {
                    Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                    // 1 = keep waiting, -1 = kill system
                    int res = controller.systemNotResponding(subject);
                    if (res >= 0) {
                        Slog.i(TAG, "Activity controller requested to coninue to wait");
                        waitedHalf = false;
                        continue;
                    }
                } catch (RemoteException e) {
                }
            }

            // Only kill the process if the debugger is not attached.
            if (Debug.isDebuggerConnected()) {
                debuggerWasConnected = 2;
            }
            if (debuggerWasConnected >= 2) {
                Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
            } else if (debuggerWasConnected > 0) {
                Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
            } else if (!allowRestart) {
                Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
            } else {
                Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
                for (int i=0; i<blockedCheckers.size(); i++) {
                    Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                    StackTraceElement[] stackTrace
                            = blockedCheckers.get(i).getThread().getStackTrace();
                    for (StackTraceElement element: stackTrace) {
                        Slog.w(TAG, "    at " + element);
                    }
                }
                Slog.w(TAG, "*** GOODBYE!");
                Process.killProcess(Process.myPid());
                System.exit(10);
            }

            waitedHalf = false;
        }
    }

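The overall logic is clear from the code:

An infinite loop repeats the check
Before each check, it iterates over all HandlerCheckers and calls scheduleCheckLocked
It calls wait with a timeout to suspend the thread for a while
When the wait times out, the thread resumes and calls evaluateCheckerCompletionLocked to get the overall state of the HandlerCheckers; a result of OVERDUE means at least one check did not complete in time
It calls ActivityManagerService.dumpStackTraces to save the stack traces
It calls mActivity.addErrorToDropBox to save the error log to dropbox
It calls Process.killProcess(Process.myPid()) and System.exit(10) to kill the system_server process, which triggers a soft restart of the Android device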
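
The decision made in each loop iteration can be summarized in a short sketch. It assumes that evaluateCheckerCompletionLocked, whose body is not shown in this article, returns the worst state among all checkers, and that the state constants are ordered from COMPLETED (0) to OVERDUE (3). The class name and the action strings are hypothetical and only describe what the loop does.

```java
public class WatchdogDecisionDemo {
    // Assumed ordering of the state constants: a larger value means a worse state
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Assumed aggregation: the overall state is the worst state among all checkers. */
    static int evaluate(int... checkerStates) {
        int state = COMPLETED;
        for (int s : checkerStates) {
            state = Math.max(state, s);
        }
        return state;
    }

    /** Hypothetical summary of what one loop iteration does for a given overall state. */
    static String action(int state, boolean waitedHalf) {
        switch (state) {
            case COMPLETED:
                return "reset and recheck";
            case WAITING:
                return "recheck";
            case WAITED_HALF:
                // Stacks are dumped only the first time the halfway point is reached
                return waitedHalf ? "recheck" : "dump stacks, then recheck";
            default:
                return "dump stacks, report to dropbox, kill process";
        }
    }

    public static void main(String[] args) {
        int state = evaluate(COMPLETED, WAITED_HALF, WAITING);
        System.out.println(action(state, false)); // dump stacks, then recheck
    }
}
```

The kill step in the real loop is further guarded by the activity controller, the debugger state, and allowRestart, which this summary leaves out.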
