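Android Watchdog Analysis

The Watchdog source is quite simple. It has two main functions:

1 Monitoring several key locks in system_server, by trying to acquire each lock on the android.fg thread

2 Monitoring the execution time of several commonly used threads, by running a task on each of those threads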

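The main threads being monitored are:

1 FgThread: foreground operations. Its priority is relatively high, but as few tasks as possible should be placed on it, because each one blocks the tasks queued behind it

2 SystemServer main thread: the UI operation thread

3 UiThread: used by AMS to create UI

4 IoThread: I/O operations

5 DisplayThread: display-related operations

The monitors (mainly used to acquire locks) are implemented by registering monitor callbacks; within a fixed period, every registered monitor callback is executed once on the fg thread. The main monitors are:

1 BinderThreadMonitor: checks whether a binder thread is available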

 public void monitor() {

         Binder.blockUntilThreadAvailable();

    }

2 MountService: the connections to its two daemon processes

private final Object mDaemonLock = new Object();
public void monitor() {
        if (mConnector != null) {
            mConnector.monitor();
        }
        if (mCryptConnector != null) {
            mCryptConnector.monitor();
        }
    }

3 NetworkManagementService: the connection to its one daemon process

    public void monitor() {
        if (mConnector != null) {
            mConnector.monitor();
        }
    }

4 ActivityManagerService: the lock on this

public void monitor() {
        synchronized (this) { }
    }

5 PowerManagerService

   public void monitor() {
        // Grab and release lock for watchdog monitor to detect deadlocks.
        synchronized (mLock) {
        }
    }

6 TvRemoteService

 public void monitor() {
        synchronized (mLock) { /* check for deadlock */ }
    }

7 WindowManagerService

 public void monitor() {
        synchronized (mWindowMap) { }
    }
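
Most of the monitors above do nothing except take and release a lock. The following plain-Java sketch illustrates why that alone is enough to notice a lock that is held for too long. It is not Watchdog code: `LockMonitorSketch`, its `Monitor` interface, and `completesWithin` are hypothetical names, and a `Future` timeout stands in for the watchdog's own timing.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class LockMonitorSketch {
    /** Hypothetical stand-in for the Watchdog monitor callback interface. */
    interface Monitor {
        void monitor();
    }

    /** Runs the callback on a separate checker thread; true if it returned within timeoutMs. */
    static boolean completesWithin(Monitor m, long timeoutMs) {
        ExecutorService checker = Executors.newSingleThreadExecutor();
        try {
            Runnable task = m::monitor;
            Future<?> result = checker.submit(task);
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // still blocked trying to enter the synchronized block
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            checker.shutdown(); // the worker exits once the probe finally gets the lock
        }
    }

    public static void main(String[] args) {
        final Object lock = new Object();
        Monitor probe = () -> { synchronized (lock) { /* acquire and release */ } };

        boolean whileHeld;
        synchronized (lock) { // simulate a service thread stuck while holding its lock
            whileHeld = completesWithin(probe, 200);
        }
        boolean whenFree = completesWithin(probe, 5000);
        System.out.println("finished while held: " + whileHeld + ", when free: " + whenFree);
    }
}
```

The probe never inspects the lock's state; it simply cannot return until the lock is released, so a missing return is the signal.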

Now that we know what is monitored, let's look at how the monitoring is done

public class Watchdog extends Thread

Watchdog extends Thread, so it runs on its own thread. Let's look at its run method:

1 synchronized (this) {
2                long timeout = CHECK_INTERVAL;
3                // Make sure we (re)spin the checkers that have become idle within
4                // this wait-and-check interval
5                for (int i=0; i<mHandlerCheckers.size(); i++) {
6                    HandlerChecker hc = mHandlerCheckers.get(i);
7                    hc.scheduleCheckLocked();
8                }
9
10                if (debuggerWasConnected > 0) {
11                    debuggerWasConnected--;
12                }
13
14                // NOTE: We use uptimeMillis() here because we do not want to increment the time we
15                // wait while asleep. If the device is asleep then the thing that we are waiting
16                // to timeout on is asleep as well and won't have a chance to run, causing a false
17                // positive on when to kill things.
18                long start = SystemClock.uptimeMillis();
19                while (timeout > 0) {
20                    if (Debug.isDebuggerConnected()) {
21                        debuggerWasConnected = 2;
22                    }
23                    try {
24                        wait(timeout);
25                    } catch (InterruptedException e) {
26                        Log.wtf(TAG, e);
27                    }
28                   if (Debug.isDebuggerConnected()) {
29                        debuggerWasConnected = 2;
30                    }
31                    timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
32               }
33
34                final int waitState = evaluateCheckerCompletionLocked();
35                if (waitState == COMPLETED) {
36                    // The monitors have returned; reset
37                    waitedHalf = false;
38                    continue;
39                } else if (waitState == WAITING) {
40                    // still waiting but within their configured intervals; back off and recheck
41                    continue;
42               } else if (waitState == WAITED_HALF) {
43                    if (!waitedHalf) {
44                        // We've waited half the deadlock-detection interval.  Pull a stack
45                        // trace and wait another half.
46                        ArrayList<Integer> pids = new ArrayList<Integer>();
47                        pids.add(Process.myPid());
48                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                NATIVE_STACKS_OF_INTEREST);
49                        waitedHalf = true;
50                    }
51                    continue;
52                }

                // something is overdue!
53                blockedCheckers = getBlockedCheckersLocked();
54                subject = describeCheckersLocked(blockedCheckers);
55                allowRestart = mAllowRestart;
56            }

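The main logic is all here. Each of the monitored threads mentioned earlier is wrapped in a HandlerChecker, and the checkers are kept in the collection held by the mHandlerCheckers member. Lines 5 to 7 take each checker out and call its scheduleCheckLocked method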

public void scheduleCheckLocked() {
            if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
                // If the target looper has recently been polling, then
                // there is no reason to enqueue our checker on it since that
                // is as good as it not being deadlocked.  This avoid having
                // to do a context switch to check the thread.  Note that we
                // only do this if mCheckReboot is false and we have no
                // monitors, since those would need to be executed at this point.
                mCompleted = true;
                return;
            }

            if (!mCompleted) {
                // we already have a check in flight, so no need
                return;
            }

            mCompleted = false;
            mCurrentMonitor = null;
            mStartTime = SystemClock.uptimeMillis();
            mHandler.postAtFrontOfQueue(this);
        }

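This method is simple: it posts a task (HandlerChecker implements the Runnable interface) to the very front of the monitored thread's message queue. Note the isPolling() check: when the checker has no monitors and the target looper is idle, no task is posted and the check is marked as completed right away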
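
To make the effect of posting at the front of the queue concrete, here is a small sketch with a hand-written queue. `MiniHandler` is a hypothetical stand-in, not android.os.Handler; it only shows that a checker posted at the front runs before tasks that were queued earlier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class FrontOfQueueSketch {
    /** Hypothetical single-threaded message queue; not android.os.Handler. */
    static class MiniHandler {
        private final Deque<Runnable> queue = new ArrayDeque<>();

        void post(Runnable r) { queue.addLast(r); }

        void postAtFrontOfQueue(Runnable r) { queue.addFirst(r); }

        /** Runs queued tasks in order until the queue is empty. */
        void drain() {
            Runnable next;
            while ((next = queue.pollFirst()) != null) {
                next.run();
            }
        }
    }

    /** Queues two ordinary tasks, then a checker at the front, and returns the run order. */
    static List<String> runOrder() {
        List<String> order = new ArrayList<>();
        MiniHandler handler = new MiniHandler();
        handler.post(() -> order.add("task-1"));
        handler.post(() -> order.add("task-2"));
        handler.postAtFrontOfQueue(() -> order.add("checker"));
        handler.drain();
        return order;
    }

    public static void main(String[] args) {
        System.out.println(runOrder()); // [checker, task-1, task-2]
    }
}
```

Because the checker jumps the backlog, its delay reflects only the message that is currently executing on the thread, not how many messages are waiting.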

1 public void run() {
2            final int size = mMonitors.size();
3            for (int i = 0 ; i < size ; i++) {
4                synchronized (Watchdog.this) {
5                   mCurrentMonitor = mMonitors.get(i);
6                }
7                mCurrentMonitor.monitor();
8            }
9
10            synchronized (Watchdog.this) {
11               mCompleted = true;
12                mCurrentMonitor = null;
13           }
14        }

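If the monitored thread is not busy with a long-running task, it reaches this run method almost immediately. Once run executes, it walks through the monitors and calls each one (only FgThread has monitors registered, namely the ones listed at the beginning)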

 public void addMonitor(Monitor monitor) {
        synchronized (this) {
            if (isAlive()) {
                throw new RuntimeException("Monitors can't be added once the Watchdog is running");
            }
            mMonitorChecker.addMonitor(monitor);
        }
    }

As shown above, mMonitorChecker is the HandlerChecker that wraps FgThread.
After all the monitor methods have returned, line 11 sets mCompleted = true.
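
The bookkeeping in run() (record the current monitor, call it, mark completion at the end) is what lets the watchdog name the monitor that is stuck. A minimal sketch under assumptions: `CheckerSketch`, its `Monitor` interface with a `name()` method, and the latch-based blocking are all hypothetical, chosen so the example is deterministic outside Android.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class CheckerSketch implements Runnable {
    /** Hypothetical named monitor callback. */
    interface Monitor {
        String name();
        void monitor();
    }

    private final List<Monitor> monitors = new ArrayList<>();
    private Monitor current;   // guarded by this
    private boolean completed; // guarded by this

    void addMonitor(Monitor m) { monitors.add(m); }

    @Override
    public void run() {
        for (Monitor m : monitors) {
            synchronized (this) { current = m; }
            m.monitor(); // may block if the monitored lock is stuck
        }
        synchronized (this) {
            completed = true;
            current = null;
        }
    }

    synchronized boolean isCompleted() { return completed; }

    synchronized String describe() {
        return current == null ? "no monitor running" : "Blocked in monitor " + current.name();
    }

    static Monitor named(String name, Runnable body) {
        return new Monitor() {
            public String name() { return name; }
            public void monitor() { body.run(); }
        };
    }

    /** Blocks the second monitor, samples the checker state, then lets it finish. */
    static String[] demo() {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CheckerSketch checker = new CheckerSketch();
        checker.addMonitor(named("A", () -> { }));
        checker.addMonitor(named("B", () -> {
            entered.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        Thread fg = new Thread(checker, "sketch.fg");
        fg.start();
        try {
            entered.await(); // monitor B is now running and blocked
            String during = checker.describe() + ", completed=" + checker.isCompleted();
            release.countDown();
            fg.join();
            String after = checker.describe() + ", completed=" + checker.isCompleted();
            return new String[] { during, after };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

While monitor B is blocked, the sampled state reads "Blocked in monitor B, completed=false"; after it is released, the state reads "no monitor running, completed=true".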

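Now back to Watchdog's run method. After posting the task to each thread, the watchdog thread goes to sleep and waits for the 30 s timeout (CHECK_INTERVAL). After 30 s it checks the state of each thread and monitor again,
using the evaluateCheckerCompletionLocked function to obtain the HandlerChecker state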

    private int evaluateCheckerCompletionLocked() {
        int state = COMPLETED;
        for (int i=0; i<mHandlerCheckers.size(); i++) {
            HandlerChecker hc = mHandlerCheckers.get(i);
            state = Math.max(state, hc.getCompletionStateLocked());
        }
        return state;
    }
public int getCompletionStateLocked() {
            if (mCompleted) {
                return COMPLETED;
            } else {
                long latency = SystemClock.uptimeMillis() - mStartTime;
                if (latency < mWaitMax/2) {
                    return WAITING;
                } else if (latency < mWaitMax) {
                    return WAITED_HALF;
                }
            }
            return OVERDUE;
        }

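The function is simple. To summarize: it returns COMPLETED if the task has been processed; WAITING if the task has not finished but less than 30 s have passed (this should be very rare); WAITED_HALF if more than 30 s but less than 60 s have passed without completion; and finally OVERDUE if the task still has not finished after 60 s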
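
The thresholds can be checked with a pure-function version of the two methods above. This is a sketch, not the original code: the numeric constant values and the 60 s limit are assumptions (the excerpt shows neither), and `stateOf` and `worstOf` are hypothetical names.

```java
class CompletionStateSketch {
    // Values assumed for illustration; Math.max only relies on their ordering.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Mirrors getCompletionStateLocked() as a pure function of its inputs. */
    static int stateOf(boolean completed, long latencyMs, long waitMaxMs) {
        if (completed) return COMPLETED;
        if (latencyMs < waitMaxMs / 2) return WAITING;
        if (latencyMs < waitMaxMs) return WAITED_HALF;
        return OVERDUE;
    }

    /** Mirrors evaluateCheckerCompletionLocked(): the worst checker wins. */
    static int worstOf(int... states) {
        int worst = COMPLETED;
        for (int s : states) {
            worst = Math.max(worst, s);
        }
        return worst;
    }

    public static void main(String[] args) {
        long waitMax = 60_000; // assumed 60 s limit, matching the figures in the text
        System.out.println(stateOf(true, 70_000, waitMax));   // 0 (COMPLETED)
        System.out.println(stateOf(false, 10_000, waitMax));  // 1 (WAITING)
        System.out.println(stateOf(false, 45_000, waitMax));  // 2 (WAITED_HALF)
        System.out.println(stateOf(false, 60_000, waitMax));  // 3 (OVERDUE)
        System.out.println(worstOf(COMPLETED, WAITED_HALF, WAITING)); // 2
    }
}
```

Taking the maximum means a single slow checker determines the overall result, even if every other thread responded immediately.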

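Back in Watchdog's run function once more, everything from line 35 to the end handles these states. The maximum state across all checkers is used:
COMPLETED means that no thread or lock is seriously blocked.
WAITING means 30 s have not yet passed; nothing is done and the next round of checking starts (remember, one round times out after 30 s).
WAITED_HALF means the stack traces are written to traces.txt, and then the next round of checking starts.
OVERDUE means there has been no response for more than 60 s. In this case the system is completely hung, so some cleanup is done and system_server is restarted.
There is one more special case: when running monkey without the --kill-process-after-error option, system_server is not killed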
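
The handling of the four states can be replayed as a small state machine. This is a simplified sketch: `StateHandlingSketch`, `replay`, and the action strings are hypothetical, one state is fed in per 30 s round, and the monkey special case is ignored.

```java
import java.util.ArrayList;
import java.util.List;

class StateHandlingSketch {
    enum State { COMPLETED, WAITING, WAITED_HALF, OVERDUE }

    /** Replays one state per round and returns the action taken in each round. */
    static List<String> replay(State... rounds) {
        List<String> actions = new ArrayList<>();
        boolean waitedHalf = false;
        for (State s : rounds) {
            switch (s) {
                case COMPLETED:
                    waitedHalf = false; // the monitors returned, so start over
                    actions.add("reset");
                    break;
                case WAITING:
                    actions.add("keep waiting");
                    break;
                case WAITED_HALF:
                    if (!waitedHalf) {
                        waitedHalf = true; // dump only once per stall
                        actions.add("dump stack traces");
                    } else {
                        actions.add("keep waiting");
                    }
                    break;
                case OVERDUE:
                    actions.add("report blocked checkers and restart");
                    return actions; // the loop body stops looping at this point
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(replay(State.WAITING, State.WAITED_HALF,
                State.WAITED_HALF, State.OVERDUE));
    }
}
```

The waitedHalf flag is the only memory carried between rounds; it keeps a stall that lasts several rounds from producing a stack dump every time.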

Keywords in the log:
Blocked in handler on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on display thread (android.display), Blocked in handler on ActivityManager (ActivityManager)

"Blocked in monitor " + mCurrentMonitor.getClass().getName() + " on " + mName + " (" + getThread().getName() + ")"
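
The two message shapes above can be reproduced with a small helper, which is handy when building search patterns for logs. This is a sketch assembled from the keywords listed here: `BlockedSubjectSketch.describe` and its parameters are hypothetical, and the monitor class name in the usage is only an illustrative value.

```java
class BlockedSubjectSketch {
    /**
     * Builds the subject text for one blocked checker.
     * monitorClassName is null when the handler itself is stuck rather than a monitor.
     */
    static String describe(String checkerName, String threadName, String monitorClassName) {
        if (monitorClassName == null) {
            return "Blocked in handler on " + checkerName + " (" + threadName + ")";
        }
        return "Blocked in monitor " + monitorClassName
                + " on " + checkerName + " (" + threadName + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("main thread", "main", null));
        System.out.println(describe("foreground thread", "android.fg",
                "com.android.server.am.ActivityManagerService"));
    }
}
```

A "Blocked in handler" line points at a slow message on that thread, while a "Blocked in monitor" line points at the lock or daemon connection that the named monitor was probing.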

Log file: traces_SystemServer_WDT_${time}.txt

When running Monkey, the monkey program also captures a bugreport at that moment and saves it under anr_watchdog
