Android Freeze Issues: Watchdog Explained

夏夏_xx


Android Watchdog Explained

Reposted from http://www.eoeandroid.com/thread-74512-1-1.html.

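   Most CPUs today include a watchdog. A hardware watchdog can reboot the system after it hangs (a deadlock or runaway code) and bring it back to a working state. A watchdog cannot prevent a hang, but it can recover from one, which improves the system's availability.
   A hardware watchdog has its limits: it works only at whole-system scope, not per process, so when a single process hangs the watchdog may never notice. For a battle-tested operating system such as Linux, the chance of the whole system hanging is very low, so a hardware watchdog matters less there.
   Android implements a software watchdog to guard SystemServer. SystemServer is arguably the most important process on the platform, since it hosts the great majority of the platform's services. Nearly 50 threads run inside it, and any one of them getting stuck can bring the whole system down. SystemServer exiting is actually less of a problem, because the init process restarts it; a deadlock is worse, because the whole system stops responding.
   Among the services running in SystemServer, the most important are ActivityManager, WindowManager, and PowerManager. The software watchdog mainly ensures that when one of these services deadlocks, the SystemServer process exits so that init restarts it and the system becomes usable again.
   Every guarded service must implement the Watchdog.Monitor interface, which has a single method, monitor. The implementation is very simple; take ActivityManager as an example: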

Java code:
public void monitor() {
    synchronized (this) { }
}

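   It acquires the object's lock, does nothing, and returns. If the object is not deadlocked, this completes immediately. If the object is deadlocked, the call blocks here.
   Implementing Watchdog.Monitor is not enough, though. The object must also be registered with the Watchdog service, which takes one line during initialization:

Java code: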
Watchdog.getInstance().addMonitor(this);
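
To make the contract concrete, here is a minimal sketch in plain Java. It is not the AOSP code: the `Monitor` interface, `MonitorRegistry`, and `ToyService` are hypothetical stand-ins for `Watchdog.Monitor`, `Watchdog.addMonitor`, and a guarded service, and they only illustrate the "implement the probe, then register it" pattern described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for Watchdog.Monitor: one lock-probe method. */
interface Monitor {
    void monitor();
}

/** Hypothetical registry playing the role of Watchdog.addMonitor. */
final class MonitorRegistry {
    private final List<Monitor> monitors = new ArrayList<>();

    synchronized void addMonitor(Monitor monitor) {
        monitors.add(monitor);
    }

    /** Calls every probe in registration order and returns how many ran. */
    int checkAll() {
        List<Monitor> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(monitors);
        }
        for (Monitor monitor : snapshot) {
            monitor.monitor(); // blocks if that object's lock is never released
        }
        return snapshot.size();
    }
}

/** Toy service that guards its state with its own intrinsic lock. */
final class ToyService implements Monitor {
    private int handled;

    void init(MonitorRegistry registry) {
        registry.addMonitor(this); // register once, during initialization
    }

    synchronized int handleRequest() {
        return ++handled;
    }

    @Override
    public void monitor() {
        synchronized (this) { } // returns at once unless the lock is stuck
    }
}
```

Because `handleRequest` and `monitor` contend for the same intrinsic lock, a thread that never leaves `handleRequest` would make `checkAll` hang, which is exactly the symptom the watchdog looks for.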

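   Finally, let's look at the Watchdog service itself. It has two parts:
   1. Periodically call the monitor method of each guarded object. This is done on the main thread; if a guarded object is deadlocked, the call blocks here.

Java code:
final int size = mMonitors.size();
for (int i = 0 ; i < size ; i++) {
    mCurrentMonitor = mMonitors.get(i);
    mCurrentMonitor.monitor();
}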

   2. Detect whether a deadlock has occurred. This runs on the Watchdog thread. If a deadlock is detected and no debugger is attached, SystemServer exits and the init process restarts it.

Java code:
if (!Debug.isDebuggerConnected()) {
    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + name);
    Process.killProcess(Process.myPid());
    System.exit(10);
} else {
    Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
}
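
The split between "one thread runs the probes" and "another thread watches the clock" can be tried out in plain Java. The sketch below is an illustration under assumptions, not the Android implementation: `MiniWatchdog`, `probesReturnInTime`, `demo`, and the 300 ms timeout are all made up for the example, and a `CountDownLatch` stands in for the completion flag. Instead of killing the process, it simply reports whether the probes returned in time.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Minimal sketch, not the AOSP Watchdog: names and timeouts are illustrative. */
final class MiniWatchdog {
    /** Runs the probes on a helper thread; true if all of them returned in time. */
    static boolean probesReturnInTime(List<Runnable> probes, long timeoutMillis) {
        CountDownLatch completed = new CountDownLatch(1);
        Thread runner = new Thread(() -> {
            for (Runnable probe : probes) {
                probe.run();       // hangs here if the probed lock is never released
            }
            completed.countDown(); // plays the role of a "completed" flag
        }, "probe-runner");
        runner.setDaemon(true);    // a stuck probe must not keep the JVM alive
        runner.start();
        try {
            // The checking side: wait for completion, but only up to the timeout.
            return completed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Demo scenario: probe a lock that is free, or one held by another thread. */
    static boolean demo(boolean lockIsStuck) {
        Object lock = new Object();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        if (lockIsStuck) {
            Thread holder = new Thread(() -> {
                synchronized (lock) {
                    held.countDown();
                    try {
                        release.await(); // simulate a thread that never lets go
                    } catch (InterruptedException ignored) {
                        // fall through and release the lock
                    }
                }
            }, "lock-holder");
            holder.setDaemon(true);
            holder.start();
            try {
                held.await(); // make sure the lock is really taken before probing
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        Runnable probe = () -> { synchronized (lock) { } };
        boolean ok = probesReturnInTime(List.of(probe), 300);
        release.countDown(); // let the holder (if any) finish
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("free lock:  " + demo(false));
        System.out.println("stuck lock: " + demo(true));
    }
}
```

With a free lock the probe returns almost immediately, so `demo(false)` is expected to report success; with the lock held by another thread the probe never returns and `demo(true)` times out.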

=====================================================================================================

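Reposted from http://blog.csdn.net/hxdanya/article/details/6738135

1 Watchdog in Android

This article covers the watchdog in the Android framework layer, which is a software watchdog implementation.

The Watchdog's main roles are:

1). Receive reboot requests from inside the system and reboot the system.

2). Guard the SystemServer process against system deadlock.

2 Watchdog startup

The Watchdog is initialized and started inside the SystemServer process. When SystemServer starts, the Android services are registered and started, and the Watchdog is initialized and started along with them. The code is as follows: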

    Slog.i(TAG, "Init Watchdog");
    Watchdog.getInstance().init(context, battery, power, alarm, ActivityManagerService.self());
    // Watchdog extends Thread, so it is a thread class. This call initializes it.

In the second half of SystemServer's run function, the system checks whether it is ready to run third-party code and announces readiness through the systemReady interface. The Watchdog is started in the callback passed to ActivityManagerService's systemReady:

    Watchdog.getInstance().start();
    // The code above is in /Frameworks/base/services/java/com/android/server/SystemServer.java.

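3 Watchdog internals and main interfaces

The Watchdog's main internal components and interface functions are HeartbeatHandler, RebootReceiver, RebootRequestReceiver, checkReboot, rebootSystem, Monitor, and addMonitor.

HeartbeatHandler: the core of the Watchdog; it performs the actual monitoring of each guarded object.

RebootReceiver: receives the PendingIntent sent by AlarmManagerService and reboots the system. The PendingIntent is created inside the Watchdog with the action "com.android.service.Watchdog.REBOOT".

RebootRequestReceiver: receives reboot Intents sent from inside the system and reboots the system.

checkReboot: decides whether the system needs to be rebooted.

rebootSystem: calls PowerManager's reboot interface to reboot the system.

Monitor: the interface every guarded object must implement; the Watchdog calls it at run time to do the monitoring.

addMonitor: registers an object that implements the Monitor interface with the Watchdog service.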

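4 Watchdog workflow

4.1 Guarded objects

For an object to be guarded by the Watchdog, it must:

1) Implement the Watchdog.Monitor interface, which contains a single monitor function.

2) Register itself with the Watchdog service by doing the following during initialization: Watchdog.getInstance().addMonitor(this);

In Android the Watchdog runs in the SystemServer process and guards that process. The services it monitors are these three:

ActivityManagerService, WindowManagerService, PowerManagerService.

Taking ActivityManagerService as an example:

    /** In this method we try to acquire our lock to make sure that we have not deadlocked */
    public void monitor() {
        synchronized (this) { }
    }

This function does no real work: it just acquires the object's lock and returns. If the object is not deadlocked, this goes smoothly; if it is deadlocked, the call blocks here.

The monitor functions of the other two services are similar to ActivityManagerService's: they also just acquire a lock.

4.2 Monitoring flow

Once the Watchdog is started, its run function begins executing. The function is an infinite loop.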

    public void run() {
        boolean waitedHalf = false;
        while (true) {
            mCompleted = false;
            mHandler.sendEmptyMessage(MONITOR);
            ...
            while (timeout > 0 && !mForceKillSystem) {
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                }
                timeout = TIME_TO_WAIT - (SystemClock.uptimeMillis() - start);
                // TIME_TO_WAIT defaults to 30 s. This is the first wait; the Watchdog
                // takes at most 1 minute to decide that an object is deadlocked.
            }
            ...
    }

Each iteration starts by sending a MONITOR message, which HeartbeatHandler receives and handles. The loop then waits 30 seconds for HeartbeatHandler's result before taking the next step.
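
The inner loop recomputes the remaining time after every wake-up, so an interrupt or spurious wake-up cannot shorten the interval. Here is a small stand-alone sketch of that pattern in plain Java. `CompletionGate` and its methods are hypothetical names, `System.nanoTime()` replaces `SystemClock.uptimeMillis()`, and, unlike the loop shown above, which sleeps out the whole interval, this variant also returns early once the flag is set.

```java
/** Sketch of a deadline-based wait loop; names are illustrative, not from AOSP. */
final class CompletionGate {
    private boolean completed;

    /** Called by the thread that finished running the probes. */
    synchronized void markCompleted() {
        completed = true;
        notifyAll();
    }

    /** Returns true if markCompleted() was called before the interval ran out. */
    synchronized boolean awaitCompleted(long timeToWaitMillis) {
        final long start = System.nanoTime();
        long remaining = timeToWaitMillis;
        boolean interrupted = false;
        while (!completed && remaining > 0) {
            try {
                wait(remaining); // may wake early: interrupt, notify, or spuriously
            } catch (InterruptedException e) {
                interrupted = true; // keep waiting out the rest of the interval
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            remaining = timeToWaitMillis - elapsedMillis; // same idea as TIME_TO_WAIT - elapsed
        }
        if (interrupted) {
            Thread.currentThread().interrupt(); // restore the flag for the caller
        }
        return completed;
    }
}
```

Guarding the loop with `remaining > 0` matters in Java because `wait(0)` means "wait with no timeout", which would turn a deadline check into an unbounded block.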

HeartbeatHandler handles the message as follows:

    public void handleMessage(Message msg) {
        switch (msg.what) {
            case MONITOR: {
                ...
                final int size = mMonitors.size();
                for (int i = 0 ; i < size ; i++) {
                    mCurrentMonitor = mMonitors.get(i);
                    mCurrentMonitor.monitor();
                } // Call each guarded object's monitor function in turn; see 4.1 for what it does.

                synchronized (Watchdog.this) {
                    mCompleted = true;
                    mCurrentMonitor = null;
                } // If every guarded object is healthy, execution reaches this point quickly and sets
                  // mCompleted to true, meaning the objects returned normally. mCompleted starts out false.

Meanwhile, in the run function:

    if (mCompleted && !mForceKillSystem) {
        // The monitors have returned.
        waitedHalf = false;
        continue;
    } // If every object returns within 30 s, mCompleted is true, this round of monitoring ends,
      // and the loop moves on to the next round.

If the monitor objects do not return within 30 s, mCompleted is still false and execution reaches this block:

    if (!waitedHalf) {
        // We've waited half the deadlock-detection interval. Pull a stack
        // trace and wait another half.
        ArrayList<Integer> pids = new ArrayList<Integer>();
        pids.add(Process.myPid());
        ActivityManagerService.dumpStackTraces(true, pids, null, null);
        waitedHalf = true;
        continue;
    } // This calls the dumpStackTraces function in ActivityManagerService.java.

That function mainly dumps the stack trace information of the SystemServer process and measures the CPU usage of the apps currently running. The SystemServer process sends the SIGNAL_QUIT signal:

    public static File dumpStackTraces(...,...) {
        ...
        Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
        ...
        // Next measure CPU usage.
        if (processStats != null) {
            processStats.init();
            System.gc(); // Run the garbage collector.
            processStats.update();
            ...
            // We'll take the stack crawls of just the top apps using CPU.
            final int N = processStats.countWorkingStats();
            int numProcs = 0;
            for (int i=0; i<N && numProcs<5; i++) {
                ProcessStats.Stats stats = processStats.getWorkingStats(i);
                if (lastPids.indexOfKey(stats.pid) >= 0) {
                    numProcs++;
                    try {
                        // Signal the process picked by CPU usage; i indexes processStats, not firstPids.
                        Process.sendSignal(stats.pid, Process.SIGNAL_QUIT);
                    }
                    ...
    }
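
On Android the dump is produced by the runtime in response to the quit signal. For readers who want to experiment with the idea on a desktop JVM, the following sketch approximates it in-process with the standard `Thread.getAllStackTraces()` call. `StackDump` and `dumpAllThreads` are hypothetical names, and the output format is made up for the example; it does not match Android's traces file.

```java
import java.util.Map;

/** In-process approximation of a stack dump; not how Android writes its traces. */
final class StackDump {
    /** Renders the stack of every live thread in this JVM as text. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append("\" state=")
               .append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

In a dump like this, a deadlocked thread typically shows up in the BLOCKED state with the frame where it is waiting for a lock at the top of its stack, which is what makes the halfway dump useful for diagnosis.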

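This happens when the monitor objects fail to return during the first 30 s wait. After calling dumpStackTraces in ActivityManagerService.java, the loop sets waitedHalf to true and goes on to the next round of monitoring. If the monitor objects again fail to return within 30 s in that next round, then:

    if (mCompleted && !mForceKillSystem) {
        ...
    }

    if (!waitedHalf) {
        ...
    } // Neither block runs this time, so execution falls through to the code below. This means a
      // guarded object is deadlocked and the SystemServer process must be killed and restarted.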

    // Pass !waitedHalf so that just in case we somehow wind up here without having
    // dumped the halfway stacks, we properly re-initialize the trace file.
    final File stack = ActivityManagerService.dumpStackTraces(
            !waitedHalf, pids, null, null);

    // Give some extra time to make sure the stack traces get written.
    // The system's been hanging for a minute, another second or two won't hurt much.
    SystemClock.sleep(2000);
    ...
    // Only kill the process if the debugger is not attached.
    if (!Debug.isDebuggerConnected()) {
        Process.killProcess(Process.myPid());
        System.exit(10);
    }

In the remaining 30 s, some cleanup work is done, such as re-initializing the trace file. Finally the SystemServer process is killed and exits. The init process then restarts SystemServer and returns it to a usable state.
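
The two-stage decision walked through in this section (continue when healthy, dump stacks after the first missed interval, kill after the second) can be condensed into a tiny state machine. The sketch below is only a model of that control flow: `TwoPhaseCheck`, `Action`, and `onIntervalElapsed` are hypothetical names, and the actual dumping and killing are left to the caller.

```java
/** Model of the waitedHalf logic; illustrative names, no Android dependencies. */
final class TwoPhaseCheck {
    enum Action { CONTINUE, DUMP_AND_RETRY, KILL }

    private boolean waitedHalf;

    /** Call once per interval with whether the probes completed in that interval. */
    Action onIntervalElapsed(boolean completed) {
        if (completed) {
            waitedHalf = false;        // healthy: reset and start the next round
            return Action.CONTINUE;
        }
        if (!waitedHalf) {
            waitedHalf = true;         // first miss: collect stacks, wait once more
            return Action.DUMP_AND_RETRY;
        }
        return Action.KILL;            // second consecutive miss: treat as deadlock
    }

    public static void main(String[] args) {
        TwoPhaseCheck check = new TwoPhaseCheck();
        System.out.println(check.onIntervalElapsed(false)); // DUMP_AND_RETRY
        System.out.println(check.onIntervalElapsed(false)); // KILL
    }
}
```

With a 30 s interval, two consecutive misses add up to the one-minute worst case mentioned earlier, while a single slow round that recovers in the second interval resets the state without killing anything.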
