Let's Talk About Android Basics: Watchdog


Tags: Android interview topics


Sources:
"WatchDog工作原理" (How WatchDog Works)
Related files:

/frameworks/base/services/core/java/com/android/server/Watchdog.java
/frameworks/base/services/java/com/android/server/SystemServer.java


[Figure: Watchdog sequence diagram]

Background

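In Android, a hardware watchdog (HW Watchdog) checks whether the hardware is working properly, while the System Server Watchdog (SWT) checks whether key system services are working properly.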

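The watchdog mechanism is widely used in Linux systems: the system must "feed the dog" within a specified time, otherwise the watchdog times out and takes action such as forcibly resetting the system.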
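The feed-the-dog idea can be sketched in a few lines of plain Java. This is a hypothetical, minimal model rather than Android or Linux code: `SimpleWatchdogTimer`, `feed()` and `isExpired()` are illustrative names, and the current time is passed in explicitly so the logic stays deterministic.

```java
/** Hypothetical minimal watchdog timer: it expires if not fed within timeoutMillis. */
final class SimpleWatchdogTimer {
    private final long timeoutMillis;
    private long lastFedMillis;

    SimpleWatchdogTimer(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastFedMillis = nowMillis; // starting the timer counts as the first feed
    }

    /** "Feed the dog": the watched component proves it is still making progress. */
    void feed(long nowMillis) {
        lastFedMillis = nowMillis;
    }

    /** True once more than timeoutMillis have passed since the last feed. */
    boolean isExpired(long nowMillis) {
        return nowMillis - lastFedMillis > timeoutMillis;
    }
}
```

With a 60 s timeout, a component fed at t = 50 s is still healthy at t = 100 s but counts as expired at t = 110.001 s; a real system would reset at that point.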

Watchdog initialization

In Android, Watchdog is initialized during boot, from SystemServer.
SystemServer.java

// SystemServer.java

private void startOtherServices() {
    // ...
    // 1. Get (and on the first call, create) the Watchdog singleton
    final Watchdog watchdog = Watchdog.getInstance();
    // 2. Initialize the watchdog
    watchdog.init(context, mActivityManagerService);
    // 3. Start the watchdog thread
    Watchdog.getInstance().start();
    // ...
}

Creating the Watchdog instance

Watchdog.java

// Watchdog.java

public static Watchdog getInstance() {
    if (sWatchdog == null) {
        sWatchdog = new Watchdog();
    }

    return sWatchdog;
}

final ArrayList<HandlerChecker> mHandlerCheckers = new ArrayList<>();

private Watchdog() {
    super("watchdog");
    // ...
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread.  We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());
}

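Watchdog is created as a singleton. Watchdog extends Thread, and the thread it creates is named "watchdog". The mHandlerCheckers list holds a HandlerChecker for each of the foreground (fg, also kept as mMonitorChecker), main, ui, io and display threads. The constructor also registers a BinderThreadMonitor through addMonitor().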
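The construction pattern (a singleton that is also a named Thread, plus a list of checked threads) can be modelled in plain Java. This is a hypothetical sketch: `ToyWatchdog` and `checkedThreads` are illustrative names, the checkers are reduced to thread names, and `synchronized` on `getInstance()` is an addition of this sketch (the AOSP method shown above is not synchronized, since system_server calls it during single-threaded startup).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plain-Java model of how Watchdog is constructed; not the AOSP class. */
final class ToyWatchdog extends Thread {
    private static ToyWatchdog sInstance;

    // Stand-ins for the HandlerCheckers: just the names of the watched threads.
    final List<String> checkedThreads = new ArrayList<>();

    // synchronized is this sketch's choice, to make lazy creation thread-safe
    static synchronized ToyWatchdog getInstance() {
        if (sInstance == null) {
            sInstance = new ToyWatchdog();
        }
        return sInstance;
    }

    private ToyWatchdog() {
        super("watchdog"); // thread name, as in the real constructor
        checkedThreads.add("foreground thread");
        checkedThreads.add("main thread");
        checkedThreads.add("ui thread");
        checkedThreads.add("i/o thread");
        checkedThreads.add("display thread");
    }
}
```

Every call to `getInstance()` returns the same object, whose thread name is "watchdog" and which tracks the same five threads as the constructor above.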

Initializing the watchdog

// Watchdog.java

public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;

    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}

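init() calls registerReceiver() to listen for the ACTION_REBOOT broadcast, which only senders holding the REBOOT permission can deliver. When such a broadcast arrives with a nonzero "nowait" extra, the receiver reboots the system; any other ACTION_REBOOT broadcast is only logged as unsupported. This reboot path is separate from a watchdog timeout, which kills system_server instead.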

final class RebootRequestReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context c, Intent intent) {
        if (intent.getIntExtra("nowait", 0) != 0) {
            rebootSystem("Received ACTION_REBOOT broadcast");
            return;
        }
        Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
    }
}

void rebootSystem(String reason) {
    Slog.i(TAG, "Rebooting system because: " + reason);
    IPowerManager pms = (IPowerManager)ServiceManager.getService(Context.POWER_SERVICE);
    try {
        pms.reboot(false, reason, false);
    } catch (RemoteException ex) {
    }
}

The reboot itself is performed by PowerManagerService.reboot(), reached through the IPowerManager binder interface.
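The receiver's decision boils down to one check on the "nowait" extra. Here is a hypothetical plain-Java restatement: `RebootDecision` is an illustrative name, and a `Map` stands in for the Intent's extras so that `getOrDefault("nowait", 0)` plays the role of `intent.getIntExtra("nowait", 0)`.

```java
import java.util.Map;

/** Hypothetical model of RebootRequestReceiver's decision; not Android code. */
final class RebootDecision {
    /** Reboot only when the "nowait" extra is present and nonzero (missing counts as 0). */
    static boolean shouldReboot(Map<String, Integer> extras) {
        int nowait = extras.getOrDefault("nowait", 0);
        return nowait != 0;
    }
}
```

An ACTION_REBOOT broadcast without the extra, or with `nowait=0`, therefore does nothing except produce the "Unsupported ACTION_REBOOT broadcast" log line.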

Starting the watchdog

// Watchdog.java

@Override
public void run() {
    // ...
}

Because Watchdog extends Thread, calling start() launches a new thread, which then executes the run() method.
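A small plain-Java check of that start()/run() relationship, using a hypothetical `StartDemo` class: run() is never called directly, yet it executes, and it executes on the new thread rather than on the caller's thread.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical demo: start() creates a new thread, and that thread calls run(). */
final class StartDemo extends Thread {
    final AtomicBoolean ranOnOwnThread = new AtomicBoolean(false);

    StartDemo() {
        super("watchdog-demo"); // illustrative thread name for this demo
    }

    @Override
    public void run() {
        // Executes on the newly started thread, not on the thread that called start().
        ranOnOwnThread.set(Thread.currentThread() == this);
    }
}
```

Calling `run()` directly instead of `start()` would execute the loop on the caller's thread, which for Watchdog would block SystemServer's startup forever.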

How Watchdog works

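All of Watchdog's work happens in its run() method. Its main job is to detect whether key threads and services in system_server have timed out, dump diagnostic information when they do, and restart the system when certain conditions are met.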

Let's analyze run() in detail:

// Watchdog.java

@Override
public void run() {
    boolean waitedHalf = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        synchronized (this) {
            // CHECK_INTERVAL = 30s
            long timeout = CHECK_INTERVAL;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            // 1. Schedule a check on every checker (this records its mStartTime)
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // 2. Wait 30s
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }

            // 3. Evaluate the state of the checkers
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                continue;
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    // We've waited half the deadlock-detection interval.  Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                        getInterestingNativePids());
                    waitedHalf = true;
                }
                continue;
            }

            // 4. A checker is overdue: collect the blocked checkers
            // something is overdue!
            blockedCheckers = getBlockedCheckersLocked();
            subject = describeCheckersLocked(blockedCheckers);
            allowRestart = mAllowRestart;
        }

1. scheduleCheckLocked()

The inner class HandlerChecker implements the Runnable interface. The source code describes it as follows:

/**
 * Used for checking status of handle threads and scheduling monitor callbacks.
 */

    public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
            mCompleted = true;
            return;
        }

        if (!mCompleted) {
            // we already have a check in flight, so no need
            return;
        }

        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis();
        mHandler.postAtFrontOfQueue(this);
    }

    @Override
    public void run() {
        final int size = mMonitors.size();
        for (int i = 0 ; i < size ; i++) {
            synchronized (Watchdog.this) {
                mCurrentMonitor = mMonitors.get(i);
            }
            mCurrentMonitor.monitor();
        }

        synchronized (Watchdog.this) {
            mCompleted = true;
            mCurrentMonitor = null;
        }
    }
}

So how does it check the state of the handler threads?
It computes the time elapsed since mStartTime and compares it with mWaitMax to decide the thread's state.

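The mHandler.postAtFrontOfQueue(this) call at the end of scheduleCheckLocked() inserts the HandlerChecker at the head of the monitored thread's MessageQueue, so the next time that thread's Looper takes a message, it calls the HandlerChecker's run() method. If a checker has no monitors and its thread's queue is idle (isPolling() returns true), scheduleCheckLocked() instead marks it completed immediately without posting anything.
run() iterates over all registered Monitor objects and calls monitor() on each. If the monitored thread is stuck for some reason, so that run() and therefore monitor() is not executed in time, the watchdog is triggered.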

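A watchdog can also fire in less typical ways: if other code keeps calling postAtFrontOfQueue(), the checker may never get a chance to run; or each monitor may take a little time, and together they exceed the 1-minute timeout. These are atypical watchdog cases.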
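To see why a blocked thread leaves its checker incomplete, here is a plain-Java model of the mechanism. It is a sketch under stated assumptions, not Android code: `ToyLooper` stands in for a Looper and MessageQueue using a `LinkedBlockingDeque`, `postAtFrontOfQueue()` is imitated with `offerFirst()`, and `ToyHandlerChecker` leaves out the monitors and the isPolling() fast path.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;

/** Hypothetical stand-in for a Looper: one thread running queued messages in order. */
final class ToyLooper extends Thread {
    final BlockingDeque<Runnable> queue = new LinkedBlockingDeque<>();

    ToyLooper() {
        super("toy-looper");
        setDaemon(true);
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.takeFirst().run(); // one message at a time, always from the head
            }
        } catch (InterruptedException e) {
            // interrupted: stop the loop
        }
    }

    void post(Runnable r) { queue.offerLast(r); }

    void postAtFrontOfQueue(Runnable r) { queue.offerFirst(r); } // jump the queue
}

/** Simplified HandlerChecker: it only completes once the looper thread actually runs it. */
final class ToyHandlerChecker implements Runnable {
    private final ToyLooper looper;
    private volatile boolean completed = true;
    private volatile long startTime;

    ToyHandlerChecker(ToyLooper looper) { this.looper = looper; }

    synchronized void scheduleCheck(long nowMillis) {
        if (!completed) return;          // a check is already in flight
        completed = false;
        startTime = nowMillis;           // later compared against the timeout
        looper.postAtFrontOfQueue(this);
    }

    @Override
    public void run() { completed = true; } // monitors omitted in this sketch

    boolean isCompleted() { return completed; }

    long latency(long nowMillis) { return nowMillis - startTime; }
}
```

Even though the checker is placed at the front of the queue, it cannot run while the looper thread is still inside a long-running message; it completes only after that message returns, which is exactly what a stuck system_server thread looks like to Watchdog.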

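2. Wait 30 s before continuing

The inner while loop recomputes the remaining time after every return from wait(), so an early wakeup does not shorten the 30 s interval.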
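The waiting pattern can be reproduced in plain Java. This is a hypothetical sketch: `IntervalWaiter`, `waitFullInterval()` and `wakeEarly()` are illustrative names, and `System.nanoTime()` stands in for `SystemClock.uptimeMillis()` as the monotonic clock.

```java
/** Hypothetical sketch of the wait step in Watchdog.run(); not AOSP code. */
final class IntervalWaiter {
    private final Object lock = new Object();

    /** Waits until at least intervalMillis have elapsed; returns the elapsed milliseconds. */
    long waitFullInterval(long intervalMillis) {
        final long startNanos = System.nanoTime(); // monotonic clock
        long timeout = intervalMillis;
        synchronized (lock) {
            while (timeout > 0) {
                try {
                    lock.wait(timeout); // may return early on notify() or a spurious wakeup
                } catch (InterruptedException e) {
                    // Watchdog.run() logs this with Log.wtf and keeps waiting; this sketch just keeps waiting
                }
                // Recompute what is left of the interval, like run() does with CHECK_INTERVAL
                timeout = intervalMillis - elapsedMillis(startNanos);
            }
        }
        return elapsedMillis(startNanos);
    }

    /** Wakes the waiter early; the loop above simply goes back to waiting. */
    void wakeEarly() {
        synchronized (lock) {
            lock.notifyAll();
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }
}
```

Even if `wakeEarly()` fires after 10 ms, `waitFullInterval(100)` returns only once at least 100 ms have passed.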

3. evaluateCheckerCompletionLocked()

static final int COMPLETED = 0;
static final int WAITING = 1;
static final int WAITED_HALF = 2;
static final int OVERDUE = 3;

private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}

public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}

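The rules for evaluating a checker's state are straightforward. There are four states: COMPLETED (finished), WAITING (waited less than mWaitMax/2), WAITED_HALF (waited at least mWaitMax/2 but less than mWaitMax), and OVERDUE (waited mWaitMax or longer, i.e. timed out). evaluateCheckerCompletionLocked() returns the worst state across all checkers.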

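When the overall state reaches WAITED_HALF for the first time in a round, run() calls ActivityManagerService.dumpStackTraces() to dump the stacks of system_server and the native processes of interest.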
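The state rules can be restated as a pure function, which makes the boundaries easy to check. This plain-Java sketch copies the constants and thresholds shown above; `CheckerStates`, `completionState()` and `worst()` are illustrative names. With the 60 s default timeout used in the examples, WAITED_HALF starts at 30 s and OVERDUE at 60 s.

```java
/** Plain-Java restatement of getCompletionStateLocked() and its aggregation; names are illustrative. */
final class CheckerStates {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Same thresholds as the source: below waitMax/2 is WAITING, below waitMax is WAITED_HALF. */
    static int completionState(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) return COMPLETED;
        if (latencyMillis < waitMaxMillis / 2) return WAITING;
        if (latencyMillis < waitMaxMillis) return WAITED_HALF;
        return OVERDUE;
    }

    /** Like evaluateCheckerCompletionLocked(): keep the worst (largest) state of all checkers. */
    static int worst(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }
}
```

Because the states are ordered by severity, one slow checker is enough to move the whole round to WAITED_HALF or OVERDUE.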

4. A checker has timed out

Continuing with the rest of run():

        ArrayList<Integer> pids = new ArrayList<>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Dump stacks again; if they were already dumped at WAITED_HALF, append to that file
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, getInterestingNativePids());


        SystemClock.sleep(2000);
        // Pull our own kernel thread stacks as well if we're configured for that
        if (RECORD_KERNEL_THREADS) {
            // Dump kernel stack traces
            dumpKernelStackTraces();
        }

        // Trigger the kernel to dump all blocked threads, and backtraces on all CPUs to the kernel log
        doSysRq('w');
        doSysRq('l');

        // Write the report to DropBox (/data/system/dropbox)
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                public void run() {
                    mActivity.addErrorToDropBox(
                            "watchdog", null, "system_server", null, null,
                            subject, null, stack, null);
                }
            };
        dropboxThread.start();
        try {
            dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
        } catch (InterruptedException ignored) {}

        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            // Log the stack trace of each blocked thread
            for (int i=0; i<blockedCheckers.size(); i++) {
                Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                StackTraceElement[] stackTrace
                        = blockedCheckers.get(i).getThread().getStackTrace();
                for (StackTraceElement element: stackTrace) {
                    Slog.w(TAG, "    at " + element);
                }
            }
            Slog.w(TAG, "*** GOODBYE!");
            // Kill the system_server process
            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }

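Killing the system_server process causes the zygote process to kill itself as well, which in turn makes init restart zygote. This is the "framework restart" users see on the phone.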
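The branching at the end of run() decides whether a timeout actually kills system_server. The following hypothetical plain-Java summary captures just that decision: `KillDecision`, `Action` and `decide()` are illustrative names, `controllerResult` stands for the return value of `controller.systemNotResponding()` (null when no IActivityController is set), and `debuggerWasConnected` follows the counter semantics in run() (2 = attached now, 1 = attached during the previous round).

```java
/** Hypothetical summary of the final branch in Watchdog.run(); not AOSP code. */
final class KillDecision {
    enum Action { KEEP_WAITING, SKIP_KILL, KILL_SYSTEM_SERVER }

    static Action decide(Integer controllerResult, int debuggerWasConnected, boolean allowRestart) {
        // An IActivityController (e.g. monkey) answering >= 0 asks the watchdog to keep waiting.
        if (controllerResult != null && controllerResult >= 0) {
            return Action.KEEP_WAITING;
        }
        // A debugger attached now or recently, or restart disallowed: only log, never kill.
        if (debuggerWasConnected > 0 || !allowRestart) {
            return Action.SKIP_KILL;
        }
        // Otherwise run() logs the blocked stacks, then killProcess(myPid()) and System.exit(10).
        return Action.KILL_SYSTEM_SERVER;
    }
}
```

This mirrors the cases listed in the summary where Watchdog fires without killing system_server: a controller such as monkey, restarts disallowed (the `am hang` case), or an attached debugger.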

Native processes and HALs whose stacks Watchdog dumps

// Which native processes to dump into dropbox's stack traces
public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
    "/system/bin/audioserver",
    "/system/bin/cameraserver",
    "/system/bin/drmserver",
    "/system/bin/mediadrmserver",
    "/system/bin/mediaserver",
    "/system/bin/sdcard",
    "/system/bin/surfaceflinger",
    "media.extractor", // system/bin/mediaextractor
    "media.codec", // vendor/bin/hw/android.hardware.media.omx@1.0-service
    "com.android.bluetooth",  // Bluetooth service
};

public static final List<String> HAL_INTERFACES_OF_INTEREST = Arrays.asList(
    "android.hardware.audio@2.0::IDevicesFactory",
    "android.hardware.bluetooth@1.0::IBluetoothHci",
    "android.hardware.camera.provider@2.4::ICameraProvider",
    "android.hardware.graphics.composer@2.1::IComposer",
    "android.hardware.vr@1.0::IVr",
    "android.hardware.media.omx@1.0::IOmx"
);

Monitoring locks

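System services that Watchdog can monitor implement the Watchdog.Monitor interface and its monitor() method; these monitor() calls run on the android.fg (foreground) thread. The main classes that implement this interface are: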

  • ActivityManagerService
  • WindowManagerService
  • InputManagerService
  • PowerManagerService
  • NetworkManagementService
  • MountService
  • NativeDaemonConnector
  • BinderThreadMonitor
  • MediaProjectionManagerService
  • MediaRouterService
  • MediaSessionService
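In AOSP, services such as ActivityManagerService implement monitor() by briefly acquiring their main lock (for example `synchronized (this) { }`), so monitor() only returns if that lock is obtainable. A plain-Java sketch of the pattern follows; `WatchdogMonitor` is a hypothetical stand-in for Watchdog.Monitor, and `ToyService` with its `mLock` field is illustrative.

```java
/** Hypothetical stand-in for the Watchdog.Monitor interface. */
interface WatchdogMonitor {
    void monitor();
}

/** Toy service whose monitor() just takes and releases its main lock. */
final class ToyService implements WatchdogMonitor {
    final Object mLock = new Object(); // the lock that guards this service's state

    @Override
    public void monitor() {
        // Empty critical section: returns at once if mLock is free, blocks while another
        // thread holds it (a long hold or a deadlock is what Watchdog wants to detect).
        synchronized (mLock) { }
    }
}
```

If some thread sits on `mLock` for more than the timeout, the foreground thread stays stuck inside monitor(), the foreground HandlerChecker never completes, and Watchdog reports the monitor as blocked.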

Summary

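- Watchdog is a thread named "watchdog" that runs in the system_server process;
- When something stays blocked for more than 1 minute, Watchdog fires: it kills system_server, which triggers a restart of the upper layers (the framework);
- mHandlerCheckers is the list of all HandlerChecker objects, covering the handlers of the foreground, main, ui, i/o and display threads;
- mMonitorChecker.mMonitors records every Monitor that Watchdog currently watches; all of these monitors run on the foreground thread.
There are two ways to join Watchdog monitoring:
- addThread(): watches a Handler thread, with a default timeout of 60 s. Such a timeout usually means the corresponding handler thread is processing messages slowly;
- addMonitor(): watches services that implement the Watchdog.Monitor interface. Such a timeout can mean the "android.fg" thread is processing messages slowly, or that a monitor cannot acquire its lock for a long time;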
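The difference between the two registration paths can be modelled with a toy registry. This is a hypothetical plain-Java sketch, not the AOSP API: the real methods take a Handler and a Watchdog.Monitor, while here threads are identified by name and monitors are plain `Runnable`s; `ToyRegistry` and `Checker` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the two ways to join Watchdog monitoring; not the AOSP API. */
final class ToyRegistry {
    static final long DEFAULT_TIMEOUT_MS = 60_000;

    /** One checker per watched thread; only the foreground checker carries monitors. */
    static final class Checker {
        final String threadName;
        final long waitMaxMs;
        final List<Runnable> monitors = new ArrayList<>();

        Checker(String threadName, long waitMaxMs) {
            this.threadName = threadName;
            this.waitMaxMs = waitMaxMs;
        }
    }

    final List<Checker> checkers = new ArrayList<>();
    final Checker monitorChecker = new Checker("foreground thread", DEFAULT_TIMEOUT_MS);

    ToyRegistry() {
        checkers.add(monitorChecker);
    }

    /** Like addThread(): a new checker for another handler thread, default 60 s timeout. */
    void addThread(String threadName) {
        checkers.add(new Checker(threadName, DEFAULT_TIMEOUT_MS));
    }

    /** Like addMonitor(): every monitor is appended to the single foreground checker. */
    void addMonitor(Runnable monitor) {
        monitorChecker.monitors.add(monitor);
    }
}
```

Adding a thread grows the list of checkers, while adding monitors only grows the foreground checker's workload; that is why many slow monitors can add up to a timeout on "android.fg".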

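In the following cases, Watchdog fires but does not kill the system_server process:

  • monkey: an IActivityController is set (monkey, for example) and intercepts the systemNotResponding event;
  • hang: after running the am hang command, there is no restart;
  • debugger: when a debugger is connected, there is no restart.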
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/zxc637841323/article/details/79971380