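1. Introduction to the framework watchdog
The Android platform implements a software WatchDog to guard SystemServer. SystemServer is without doubt the most important process on the Android platform: it hosts the vast majority of the platform's services. Nearly 50 threads run in this process, and the death of any one of them can bring down the whole system. SystemServer exiting is actually not a big problem, because the init process restarts it. A deadlock is far worse, because the whole system then stops responding.
Among the services running in SystemServer, the most important are ActivityManager, WindowManager and PowerManager. The main job of the software WatchDog is to make sure that, when one of these services deadlocks, the SystemServer process exits so that init restarts it and the system returns to a usable state.
2. First, the principle behind a watchdog. On every platform the idea is simple: an endless loop guards a timer and periodically sends a signal to the monitored thread (feeding the dog). If the monitored object does not answer before the timeout, the next round cannot complete normally, the watchdog bites the system, and the framework restarts.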
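The principle can be sketched in a few lines. This is only a minimal model, not the Android implementation: the class name SimpleWatchdog and its methods are made up for illustration, and the current time is passed in as a parameter so that the behaviour is deterministic.

```java
/** Minimal software watchdog model; all names here are hypothetical. */
final class SimpleWatchdog {
    private final long timeoutMillis;
    private long pingSentAt;
    private boolean acked = true;

    SimpleWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Watchdog side: start a new round by pinging the monitored thread. */
    void ping(long now) {
        if (!acked) {
            return; // previous ping still unanswered, keep its start time
        }
        acked = false;
        pingSentAt = now;
    }

    /** Monitored side: answer the ping. */
    void ack() {
        acked = true;
    }

    /** Watchdog side: true once a ping has gone unanswered for longer than the timeout. */
    boolean shouldBite(long now) {
        return !acked && now - pingSentAt > timeoutMillis;
    }

    public static void main(String[] args) {
        SimpleWatchdog dog = new SimpleWatchdog(60_000);
        dog.ping(0);
        dog.ack();
        System.out.println(dog.shouldBite(30_000));  // false: the ping was answered
        dog.ping(30_000);
        System.out.println(dog.shouldBite(100_000)); // true: no answer for 70s
    }
}
```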
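3. I drew an extremely ugly diagram. Ugly, but detailed. All the code below revolves around it, so study it carefully.
1> First, the watchdog is initialized and started by system server, in three small steps:
1.1. The first small step: startOtherServices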
private void startOtherServices() {
......
traceBeginAndSlog("InitWatchdog");
final Watchdog watchdog = Watchdog.getInstance();
watchdog.init(context, mActivityManagerService);
Trace.traceEnd(Trace.TRACE_TAG_SYSTEM_SERVER);
......
}
Correspondingly, this line shows up in the boot log:
01-26 16:42:25.984 1596 1596 I SystemServer: InitWatchdog
1.2. Entering the getInstance function of watchdog
public static Watchdog getInstance() {
if (sWatchdog == null) {
sWatchdog = new Watchdog();
}
return sWatchdog;
}
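1.3. Now the watchdog constructor, Watchdog()
In short, it registers several important threads as monitored objects (see the upper-right part of the figure above). The default timeout is 60s.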
private Watchdog() {
super("watchdog");
// Initialize handler checkers for each common thread we want to check. Note
// that we are not currently checking the background thread, since it can
// potentially hold longer running operations with no guarantees about the timeliness
// of operations there.
// The shared foreground thread is the main checker. It is where we
// will also dispatch monitor checks and do other work.
mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
"foreground thread", DEFAULT_TIMEOUT);
mHandlerCheckers.add(mMonitorChecker);
// Add checker for main thread. We only do a quick check since there
// can be UI running on the thread.
mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
"main thread", DEFAULT_TIMEOUT));
// Add checker for shared UI thread.
mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
"ui thread", DEFAULT_TIMEOUT));
// And also check IO thread.
mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
"i/o thread", DEFAULT_TIMEOUT));
// And the display thread.
mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
"display thread", DEFAULT_TIMEOUT));
// Initialize monitor for Binder threads.
addMonitor(new BinderThreadMonitor());
}
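Now we need to understand the mHandlerCheckers object: every thread to be monitored is stored in it, so it is an extremely important List.
The relationship between mHandlerCheckers and mMonitorChecker is as follows (see the upper-right part of the figure above):
① mHandlerCheckers is a list holding 5 HandlerChecker objects, one each for the fg, main, ui, io and display threads.
② mMonitorChecker is a single HandlerChecker, and it refers to the same object as the first element of mHandlerCheckers. In other words, mMonitorChecker and the fg thread entry of mHandlerCheckers are one and the same object.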
final ArrayList<HandlerChecker> mHandlerCheckers = new ArrayList<>();
final HandlerChecker mMonitorChecker;
The default timeout is 60s:
static final boolean DB = false;
static final long DEFAULT_TIMEOUT = DB ? 10*1000 : 60*1000;
/**
* Used for checking status of handle threads and scheduling monitor callbacks.
*/
public final class HandlerChecker implements Runnable {
private final Handler mHandler;
private final String mName;
private final long mWaitMax;
// Pay special attention to the mMonitors list, which holds the Monitor objects
private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
private boolean mCompleted;
private Monitor mCurrentMonitor;
private long mStartTime;
Constructor: mWaitMax is the timeout passed in for the thread. As mentioned above, the default is 60s.
HandlerChecker(Handler handler, String name, long waitMaxMillis) {
mHandler = handler;
mName = name;
mWaitMax = waitMaxMillis;
mCompleted = true;
}
HandlerChecker defines a list of Monitor objects, mMonitors. Every monitor that needs to be checked is added to this list.
public void addMonitor(Monitor monitor) {
mMonitors.add(monitor);
}
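1.5. Also note the interface function addMonitor provided by watchdog
After initializing the 5 threads to check, the constructor calls addMonitor to register the Binder thread monitor.
① addMonitor is the interface function watchdog offers to us. It calls mMonitorChecker's addMonitor and passes the monitor along.
So if we want our own service to be watched, we implement the monitor() function and call addMonitor to add ourselves to mMonitorChecker.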
public void addMonitor(Monitor monitor) {
synchronized (this) {
if (isAlive()) {
throw new RuntimeException("Monitors can't be added once the Watchdog is running");
}
mMonitorChecker.addMonitor(monitor);
}
}
② The addMonitor member function of HandlerChecker appends the monitor passed in to the mMonitors list. HandlerChecker only provides this interface.
public void addMonitor(Monitor monitor) {
mMonitors.add(monitor);
}
Services that implement the Monitor interface include:
ActivityManagerService
InputManagerService (used as the example here)
MountService
NativeDaemonConnector
NetworkManagementService
PowerManagerService
WindowManagerService
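③ Take InputManagerService as an example
It implements the Monitor interface. The body simply acquires its own lock, to see whether a deadlock or block has occurred.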
// Called by the heartbeat to ensure locks are not held indefinitely (for deadlock detection).
@Override
public void monitor() {
synchronized (mInputFilterLock) { }
nativeMonitor(mPtr);
}
......
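A word on the synchronized keyword: it can be applied to a block inside a method, in which case only that block runs under mutual exclusion.
Usage: private final Object mLock = new Object(); ........... synchronized (mLock) { /* block */ }
The lock belongs to the object given in the parentheses, which can be a class instance or a class object.
If the thread holding the lock is deadlocked or blocked, the lock cannot be acquired and monitor() cannot return.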
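To make the effect of an empty synchronized block concrete, here is a small sketch. DemoService and monitorReturns are hypothetical names made up for this example. The sketch runs monitor() on a helper thread and reports whether it came back within a given time, which is roughly what the watchdog observes from the outside.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class LockProbeDemo {
    /** Hypothetical service whose monitor() only acquires and releases its own lock. */
    static final class DemoService {
        final Object mLock = new Object();

        void monitor() {
            synchronized (mLock) { }
        }
    }

    /** Runs monitor() on a helper thread; returns true if it came back within waitMillis. */
    static boolean monitorReturns(DemoService service, long waitMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread probe = new Thread(() -> {
            service.monitor();
            done.countDown();
        }, "probe");
        probe.setDaemon(true);
        probe.start();
        try {
            return done.await(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        DemoService service = new DemoService();
        System.out.println("lock free: " + monitorReturns(service, 1000));
        synchronized (service.mLock) { // simulate a thread that keeps holding the lock
            System.out.println("lock held: " + monitorReturns(service, 200));
        }
    }
}
```

While the lock is held elsewhere, monitor() stays blocked at the synchronized statement, so the probe does not finish in time.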
In its start function it calls watchdog's addMonitor interface function to add itself to the check list:
public void start() {
Slog.i(TAG, "Starting input manager");
nativeStart(mPtr);
// Add ourself to the Watchdog monitors.
Watchdog.getInstance().addMonitor(this);
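Besides addMonitor, watchdog offers another interface function, addThread.
As the names suggest, addMonitor adds an object to mMonitorChecker, that is, to the fg member of mHandlerCheckers, while addThread adds a new entry to the mHandlerCheckers list itself.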
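The difference between the two entry points can be sketched with a simplified model. This is an assumption-laden illustration, not the framework code: WatchdogModel and Checker are hypothetical, and addThread takes a plain thread name here only because the real signature is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

final class WatchdogModel {
    interface Monitor {
        void monitor();
    }

    /** Simplified stand-in for HandlerChecker: a name plus a list of monitors. */
    static final class Checker {
        final String name;
        final List<Monitor> monitors = new ArrayList<>();

        Checker(String name) {
            this.name = name;
        }
    }

    final List<Checker> handlerCheckers = new ArrayList<>();
    final Checker monitorChecker;

    WatchdogModel() {
        monitorChecker = new Checker("foreground thread");
        handlerCheckers.add(monitorChecker); // the same object, stored as element 0
        handlerCheckers.add(new Checker("main thread"));
    }

    /** Lock-style check: the monitor is attached to the foreground checker. */
    void addMonitor(Monitor monitor) {
        monitorChecker.monitors.add(monitor);
    }

    /** Thread-style check: a new checker entry is appended to the list. */
    void addThread(String threadName) {
        handlerCheckers.add(new Checker(threadName));
    }
}
```

After one addMonitor call and one addThread call, the foreground checker holds one monitor and the list has grown by one entry, while the other checkers still have empty monitor lists.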
2> Now we can finally move on to the second small step, which is the simplest: watchdog.init
watchdog.init(context, mActivityManagerService);
It registers a broadcast receiver for internal reboot requests, which reboots the system.
public void init(Context context, ActivityManagerService activity) {
mResolver = context.getContentResolver();
mActivity = activity;
context.registerReceiver(new RebootRequestReceiver(),
new IntentFilter(Intent.ACTION_REBOOT),
android.Manifest.permission.REBOOT, null);
mUEventObserver.startObserving(LOG_STATE_MATCH);
}
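3> The third small step: Watchdog.getInstance().start()
Because Watchdog extends Thread, calling start() causes its run() function to execute on the new thread. run() is the functional core of the watchdog, and the previous two steps were only preparation.
Let's first look at the watchdog's detection mechanism.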
public void run() {
boolean waitedHalf = false;
while (true) {
final ArrayList<HandlerChecker> blockedCheckers;
final String subject;
final boolean allowRestart;
int debuggerWasConnected = 0;
synchronized (this) {
CHECK_INTERVAL = DEFAULT_TIMEOUT / 2, i.e. 30s
long timeout = CHECK_INTERVAL;
// Make sure we (re)spin the checkers that have become idle within
// this wait-and-check interval
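Take each member of mHandlerCheckers and call its scheduleCheckLocked function.
Every member monitored by the watchdog has to feed the dog periodically, and this is the feeding action.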
for (int i=0; i<mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
hc.scheduleCheckLocked();
}
3.1.1. Feeding the dog: scheduleCheckLocked (see the feeding part of the watchdog detection mechanism in the figure above)
public void scheduleCheckLocked() {
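mMonitors.size() == 0 selects the mHandlerCheckers entries that carry no monitors, that is, every entry except the fg thread.
If such a thread's looper is in polling mode, it is not blocked, so mCompleted is set to true and the function returns.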
if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
// If the target looper has recently been polling, then
// there is no reason to enqueue our checker on it since that
// is as good as it not being deadlocked. This avoid having
// to do a context switch to check the thread. Note that we
// only do this if mCheckReboot is false and we have no
// monitors, since those would need to be executed at this point.
mCompleted = true;
return;
}
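One concept to be clear about: mMonitors and mCompleted are both members of HandlerChecker,
so all objects in mMonitors share a single mCompleted flag.
If the previous check is still in progress and has not returned, mCompleted is still false, and in that case the function returns immediately.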
if (!mCompleted) {
// we already have a check in flight, so no need
return;
}
The actual feeding action:
mCompleted = false;
mCurrentMonitor = null;
mStartTime = SystemClock.uptimeMillis(); // record when this check started
mHandler.postAtFrontOfQueue(this); // post this checker to the front of mHandler's queue
}
3.1.2.postAtFrontOfQueue
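postAtFrontOfQueue(this) ==> run()
This method takes a Runnable. Through the message mechanism, the run method of HandlerChecker is eventually called back. It loops over all Monitor objects, and each service supplies its own implementation of monitor().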
public void run() {
Only the fg thread's checker has entries in mMonitors, so this loop effectively concerns the fg thread only.
final int size = mMonitors.size();
for (int i = 0 ; i < size ; i++) {
synchronized (Watchdog.this) {
mCurrentMonitor = mMonitors.get(i);
}
Call the monitor() interface function of every monitored service (those registered in the fg checker)
mCurrentMonitor.monitor();
}
If every monitor() call executes and returns normally, set mCompleted to true, meaning the feeding is complete
synchronized (Watchdog.this) {
mCompleted = true;
mCurrentMonitor = null;
}
}
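The interplay between scheduleCheckLocked and run can be tried out with a simplified model. This sketch makes several assumptions: CheckerModel is a hypothetical name, a plain Deque stands in for the thread's message queue, monitors are plain Runnables, the polling shortcut is left out, and the time is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CheckerModel implements Runnable {
    private final Deque<Runnable> queue; // stands in for the monitored thread's message queue
    private final List<Runnable> monitors = new ArrayList<>();
    private boolean completed = true;
    private long startTime;

    CheckerModel(Deque<Runnable> queue) {
        this.queue = queue;
    }

    void addMonitor(Runnable monitor) {
        monitors.add(monitor);
    }

    boolean isCompleted() {
        return completed;
    }

    long getStartTime() {
        return startTime;
    }

    /** Mirrors scheduleCheckLocked: at most one check is in flight per checker. */
    void scheduleCheck(long now) {
        if (!completed) {
            return; // a check is already queued or running
        }
        completed = false;
        startTime = now;
        queue.addFirst(this); // analogous to postAtFrontOfQueue(this)
    }

    /** Mirrors HandlerChecker.run: call every monitor, then mark the check as done. */
    @Override
    public void run() {
        for (Runnable monitor : monitors) {
            monitor.run();
        }
        completed = true;
    }
}
```

Calling scheduleCheck twice before the queue is drained enqueues the checker only once and keeps the first start time. That is why the measured latency keeps growing while a check is stuck.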
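Every 30 seconds the watchdog checks several important locks in system_server (including those of WindowManagerService, ActivityManagerService, PowerManagerService, NetworkManagementService, MountService and InputManagerService). It also checks whether the message queues of the 7 most important threads are idle (WindowManagerService, PowerManagerService, PackageManagerService, ActivityManagerService, UiThread, IoThread, MainThread). Finally it uses mCompleted and mStartTime to decide whether a block has exceeded the 60s timeout. If it has, the watchdog writes the trace log and the kernel trace log, then kills SystemServer so that it restarts.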
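3.1.3. evaluateCheckerCompletionLocked: finding the hungriest dog
This function is simple: it walks the members of mHandlerCheckers and returns the largest wait state. First, the definitions of all the state values: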
static final int COMPLETED = 0;
static final int WAITING = 1;
static final int WAITED_HALF = 2;
static final int OVERDUE = 3;
private int evaluateCheckerCompletionLocked() {
int state = COMPLETED;
for (int i=0; i<mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
state = Math.max(state, hc.getCompletionStateLocked());
}
return state;
}
public int getCompletionStateLocked() {
If this mHandlerCheckers member has already returned and set mCompleted to true, there is no deadlock and no block, so COMPLETED is returned
if (mCompleted) {
return COMPLETED;
} else {
mWaitMax is the timeout, i.e. 60s. If mCompleted is false and the wait is shorter than 30s, return WAITING (all is well). If the wait has reached 30s but not 60s, return WAITED_HALF
long latency = SystemClock.uptimeMillis() - mStartTime;
if (latency < mWaitMax/2) {
return WAITING;
} else if (latency < mWaitMax) {
return WAITED_HALF;
}
}
return OVERDUE; // otherwise the wait has reached 60s
}
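The thresholds are easy to check with a pure function. The sketch below restates the logic of getCompletionStateLocked and evaluateCheckerCompletionLocked with the latency passed in as a parameter. The class and method names are made up for this example.

```java
final class CompletionState {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of one checker, given whether it finished and how long it has been waiting. */
    static int stateOf(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** Worst (largest) state across all checkers. */
    static int worst(int... states) {
        int worst = COMPLETED;
        for (int state : states) {
            worst = Math.max(worst, state);
        }
        return worst;
    }

    public static void main(String[] args) {
        long waitMax = 60_000;
        System.out.println(stateOf(true, 70_000, waitMax));  // 0
        System.out.println(stateOf(false, 10_000, waitMax)); // 1
        System.out.println(stateOf(false, 30_000, waitMax)); // 2
        System.out.println(stateOf(false, 60_000, waitMax)); // 3
    }
}
```

Note that the boundaries are inclusive on the upper side: a latency of exactly 30s already counts as WAITED_HALF, and exactly 60s counts as OVERDUE.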
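If the largest wait state is 0, every monitored thread is fine: set waitedHalf to false and finish this round of the loop.
Pay attention to the waitedHalf variable. It is declared in the watchdog's run function before the endless while loop starts, and it records the state carried over between rounds.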
if (waitState == COMPLETED) {
// The monitors have returned; reset
waitedHalf = false;
continue;
If the wait state is WAITING, i.e. the wait is shorter than 30s, end this round and recheck. Note that waitedHalf is not cleared this time, so it still holds the previous state.
} else if (waitState == WAITING) {
// still waiting but within their configured intervals; back off and recheck
continue;
If the wait has exceeded 30s and waitedHalf is false, i.e. this is the first time the wait has exceeded 30s, create a pids list and dump the stack traces, then set waitedHalf to true.
} else if (waitState == WAITED_HALF) {
if (!waitedHalf) {
// We've waited half the deadlock-detection interval. Pull a stack
// trace and wait another half.
ArrayList<Integer> pids = new ArrayList<Integer>();
pids.add(Process.myPid());
ActivityManagerService.dumpStackTraces(true, pids, null, null,
NATIVE_STACKS_OF_INTEREST);
waitedHalf = true;
}
continue;
}
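The branch logic above, including the role of waitedHalf, can be condensed into a small state machine. This is an illustrative sketch only: LoopStep and the Action names are hypothetical, and the actual stack dump is reduced to a returned action value.

```java
final class LoopStep {
    enum Action { RESET_AND_CONTINUE, CONTINUE, DUMP_THEN_CONTINUE, HANDLE_TIMEOUT }

    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    private boolean waitedHalf = false;

    boolean hasWaitedHalf() {
        return waitedHalf;
    }

    /** Decides what one loop iteration does for the given worst wait state. */
    Action step(int waitState) {
        switch (waitState) {
            case COMPLETED:
                waitedHalf = false; // everything returned, forget the earlier dump
                return Action.RESET_AND_CONTINUE;
            case WAITING:
                return Action.CONTINUE; // waitedHalf deliberately left untouched
            case WAITED_HALF:
                if (!waitedHalf) {
                    waitedHalf = true; // dump stacks only the first time
                    return Action.DUMP_THEN_CONTINUE;
                }
                return Action.CONTINUE;
            default:
                return Action.HANDLE_TIMEOUT; // OVERDUE
        }
    }
}
```

Feeding the sequence WAITED_HALF, WAITED_HALF, OVERDUE into step yields one dump, one plain continue, and then the timeout path.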
evaluateCheckerCompletionLocked returned OVERDUE, meaning the timeout has been reached
// something is overdue!
blockedCheckers = getBlockedCheckersLocked(); // collect every timed-out member into the blockedCheckers list
subject = describeCheckersLocked(blockedCheckers); // describe the blocked threads in a string for the event log
allowRestart = mAllowRestart; // copy the mAllowRestart flag (true by default)
}
3.1.5. Timed out: getBlockedCheckersLocked
private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
for (int i=0; i<mHandlerCheckers.size(); i++) {
HandlerChecker hc = mHandlerCheckers.get(i);
Use isOverdueLocked to find every timed-out member and add it to the checkers list
if (hc.isOverdueLocked()) {
checkers.add(hc);
}
}
return checkers;
}
isOverdueLocked() is simple: it uses mCompleted and mStartTime, measured against mWaitMax, to decide whether the checker has timed out
public boolean isOverdueLocked() {
return (!mCompleted) && (SystemClock.uptimeMillis() > mStartTime + mWaitMax);
}
3.1.6.describeCheckersLocked
private String describeCheckersLocked(ArrayList<HandlerChecker> checkers) {
StringBuilder builder = new StringBuilder(128);
for (int i=0; i<checkers.size(); i++) {
if (builder.length() > 0) {
builder.append(", ");
}
builder.append(checkers.get(i).describeBlockedStateLocked());
}
return builder.toString();
}
public String describeBlockedStateLocked() {
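Note that mCurrentMonitor is used here to tell whether the problem lies in a monitor or in the handler. mCurrentMonitor is a field of the HandlerChecker class,
it is only set while the mMonitors check is running, and it is reset to null once the monitors have returned successfully.
So if mCurrentMonitor is null, no monitor call is stuck, and the problem must lie in the handler.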
if (mCurrentMonitor == null) {
return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
Otherwise mCurrentMonitor is not null, which means one of the mMonitors entries is stuck
} else {
return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
+ " on " + mName + " (" + getThread().getName() + ")";
}
}
3.2. Watchdog run(): handling the timeout
At this point the timeout has occurred.
// If we got here, that means that the system is most likely hung. First collect stack traces from all threads of the system process.
// Then kill this process so that the system will restart.
EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
ArrayList<Integer> pids = new ArrayList<Integer>();
pids.add(Process.myPid());
if (mPhonePid > 0) pids.add(mPhonePid);
// Pass !waitedHalf so that just in case we somehow wind up here without having
// dumped the halfway stacks, we properly re-initialize the trace file.
Dump the stacks of system server and of the native processes of interest
final File stack = ActivityManagerService.dumpStackTraces(!waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);
// Give some extra time to make sure the stack traces get written.
// The system's been hanging for a minute, another second or two won't hurt much.
SystemClock.sleep(2000);
NATIVE_STACKS_OF_INTEREST is a String array naming the native processes whose stacks should be written to the trace
public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
"/system/bin/audioserver",
"/system/bin/cameraserver",
"/system/bin/drmserver",
"/system/bin/mediadrmserver",
"/system/bin/gx_fpd",
"/system/bin/fingerprintd",
"/system/bin/mediaserver",
"/system/bin/sdcard",
"/system/bin/surfaceflinger",
"media.codec", // system/bin/mediacodec
"media.extractor", // system/bin/mediaextractor
"com.android.bluetooth", // Bluetooth service
};
3.2.1. dumpKernelStackTraces: dumping the kernel stack information
// Set this to true to have the watchdog record kernel thread stacks when it fires
static final boolean RECORD_KERNEL_THREADS = true;
// Pull our own kernel thread stacks as well if we're configured for that
if (RECORD_KERNEL_THREADS) {
dumpKernelStackTraces();
}
private File dumpKernelStackTraces() {
The value of this property is /data/anr/traces.txt
String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
if (tracesPath == null || tracesPath.length() == 0) {
return null;
}
native_dumpKernelStacks(tracesPath);
return new File(tracesPath);
}
private native void native_dumpKernelStacks(String tracesPath);
}
This calls through JNI into android_server_watchdog.cpp
namespace android {
static const JNINativeMethod g_methods[] = {
{ "native_dumpKernelStacks", "(Ljava/lang/String;)V", (void*)dumpKernelStacks },
};
int register_android_server_Watchdog(JNIEnv* env) {
return RegisterMethodsOrDie(env, "com/android/server/Watchdog", g_methods, NELEM(g_methods));
}
}
3.2.2.dumpKernelStacks(android_server_watchdog.cpp)
static void dumpKernelStacks(JNIEnv* env, jobject clazz, jstring pathStr) {
char buf[128];
DIR* taskdir;
ALOGI("dumpKernelStacks");
if (!pathStr) {
jniThrowException(env, "java/lang/IllegalArgumentException", "Null path");
return; }
const char *path = env->GetStringUTFChars(pathStr, NULL);
Open the trace file (/data/anr/traces.txt)
int outFd = open(path, O_WRONLY | O_APPEND | O_CREAT,
S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH);
if (outFd < 0) {
ALOGE("Unable to open stack dump file: %d (%s)", errno, strerror(errno));
goto done; }
Write this banner line to the trace file
snprintf(buf, sizeof(buf), "\n----- begin pid %d kernel stacks -----\n", getpid());
write(outFd, buf, strlen(buf));
Find all threads of the current process by reading the /proc/<pid>/task directory
// look up the list of all threads in this process
snprintf(buf, sizeof(buf), "/proc/%d/task", getpid());
taskdir = opendir(buf);
if (taskdir != NULL) {
struct dirent * ent;
Dump the stack of every thread
while ((ent = readdir(taskdir)) != NULL) {
int tid = atoi(ent->d_name);
if (tid > 0 && tid <= 65535) {
// dump each stack trace
dumpOneStack(tid, outFd);
}
}
closedir(taskdir);
}
snprintf(buf, sizeof(buf), "----- end pid %d kernel stacks -----\n", getpid());
write(outFd, buf, strlen(buf));
close(outFd);
done:
env->ReleaseStringUTFChars(pathStr, path);
}
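3.2.3. Add a timestamp to the generated trace file
The trace file generated so far is renamed with a timestamp so that it is not overwritten later. Note that traces.txt is generated first and renamed afterwards.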
String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
String traceFileNameAmendment = "_SystemServer_WDT" + mTraceDateFormat.format(new Date());
if (tracesPath != null && tracesPath.length() != 0) {
File traceRenameFile = new File(tracesPath);
String newTracesPath;
int lpos = tracesPath.lastIndexOf (".");
if (-1 != lpos)
newTracesPath = tracesPath.substring (0, lpos) + traceFileNameAmendment + tracesPath.substring (lpos);
else
newTracesPath = tracesPath + traceFileNameAmendment;
Slog.d(TAG, "Watchdog File:2 " + traceRenameFile + " rename to " + newTracesPath);
traceRenameFile.renameTo(new File(newTracesPath));
tracesPath = newTracesPath;
}
final File newFd = new File(tracesPath);
// Try to add the error to the dropbox, but assuming that the ActivityManager
// itself may be deadlocked. (which has happened, causing this statement to
// deadlock and the watchdog as a whole to be ineffective)
Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
public void run() {
mActivity.addErrorToDropBox(
"watchdog", null, "system_server", null, null,
subject, null, newFd, null);
}
};
dropboxThread.start();
try {
dropboxThread.join(2000); // wait up to 2 seconds for it to return.
} catch (InterruptedException ignored) {}
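The path manipulation used for the rename in this section can be isolated into a pure function, which makes its behaviour easy to check. TraceRename and amend are hypothetical names, and the timestamp in the example is a made-up fixed string because the pattern behind mTraceDateFormat is not shown here.

```java
final class TraceRename {
    /** Inserts the amendment before the last '.' of the path, or appends it if there is none. */
    static String amend(String tracesPath, String amendment) {
        int dot = tracesPath.lastIndexOf('.');
        if (dot != -1) {
            return tracesPath.substring(0, dot) + amendment + tracesPath.substring(dot);
        }
        return tracesPath + amendment;
    }

    public static void main(String[] args) {
        // prints /data/anr/traces_SystemServer_WDT01_Jan_00_00_00.txt
        System.out.println(amend("/data/anr/traces.txt", "_SystemServer_WDT01_Jan_00_00_00"));
        // prints /data/anr/traces_SystemServer_WDT01_Jan_00_00_00
        System.out.println(amend("/data/anr/traces", "_SystemServer_WDT01_Jan_00_00_00"));
    }
}
```

One caveat of this approach: lastIndexOf('.') looks at the whole path, so a dot in a directory name combined with an extension-less file name would place the amendment inside the directory part. With the usual /data/anr/traces.txt this does not arise.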
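3.2.4. Decide from a property, persist.sys.crashOnWatchdog, whether a triggered watchdog should lead to a ramdump
The value of the persist.sys.crashOnWatchdog property determines whether the device enters ramdump when the watchdog fires. This is done through the /proc/sysrq-trigger node.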
// At times, when user space watchdog traces don't give an indication on
// which component held a lock, because of which other threads are blocked,
// (thereby causing Watchdog), crash the device to analyze RAM dumps
boolean crashOnWatchdog = SystemProperties
.getBoolean("persist.sys.crashOnWatchdog", false);
if (crashOnWatchdog) {
// Trigger the kernel to dump all blocked threads, and backtraces
// on all CPUs to the kernel log
Slog.e(TAG, "Triggering SysRq for system_server watchdog");
doSysRq('w');
doSysRq('l');
// wait until the above blocked threads be dumped into kernel log
SystemClock.sleep(3000);
// now try to crash the target
doSysRq('c');
}
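The body of doSysRq is not shown above. A plausible sketch, assuming it simply writes the command character to /proc/sysrq-trigger, is given below. The class name SysRqSketch is made up, and the path is taken as a parameter so the helper can be exercised against an ordinary file. Writing to the real node needs root privileges and, with 'c', crashes the kernel on purpose.

```java
import java.io.FileWriter;
import java.io.IOException;

final class SysRqSketch {
    /** Writes one command character to the trigger file; returns false if the write fails. */
    static boolean doSysRq(String triggerPath, char command) {
        try (FileWriter trigger = new FileWriter(triggerPath)) {
            trigger.write(command);
            return true;
        } catch (IOException e) {
            System.err.println("Failed to write to " + triggerPath + ": " + e);
            return false;
        }
    }
}
```

On a device, a call such as doSysRq("/proc/sysrq-trigger", 'w') would correspond to the first step above. In the Linux SysRq interface 'w' dumps blocked tasks, 'l' prints backtraces of the active CPUs, and 'c' triggers a crash.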
Check the value of mController
IActivityController controller;
synchronized (this) {
controller = mController;
}
if (controller != null) {
Slog.i(TAG, "Reporting stuck state to activity controller");
try {
Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
// 1 = keep waiting, -1 = kill system
int res = controller.systemNotResponding(subject);
if (res >= 0) {
Slog.i(TAG, "Activity controller requested to coninue to wait");
waitedHalf = false;
continue;
}
} catch (RemoteException e) {
}
}
mController is assigned in the setActivityController function, from its parameter: a controller of type IActivityController
public void setActivityController(IActivityController controller) {
synchronized (this) {
mController = controller;
}
}
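3.2.5.1. How monkey's interception is implemented in detail
The external interface is the setActivityController function in system server (ActivityManagerService), which wraps the watchdog's setActivityController function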
public void setActivityController(IActivityController controller, boolean imAMonkey) {
enforceCallingPermission(android.Manifest.permission.SET_ACTIVITY_WATCHER,
"setActivityController()");
synchronized (this) {
mController = controller;
mControllerIsAMonkey = imAMonkey;
Watchdog.getInstance().setActivityController(controller);
}
}
<Monkey.java> (cmds\monkey\src\com\android\commands\monkey)
monkey calls the setActivityController interface and passes in its own IActivityController implementation
try {
mAm.setActivityController(new ActivityController(), true);
mNetworkMonitor.register(mAm);
} catch (RemoteException e) {
System.err.println("** Failed talking with activity manager!");
return false;
}
private class ActivityController extends IActivityController.Stub {
public boolean activityStarting(Intent intent, String pkg) {
boolean allow = MonkeyUtils.getPackageFilter().checkEnteringPackage(pkg)
|| (DEBUG_ALLOW_ANY_STARTS != 0);
if (mVerbose > 0) {
3.2.5.2. Monkey systemNotResponding: after the interception, it is naturally monkey's systemNotResponding function that gets called
public int systemNotResponding(String message) {
StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
System.err.println("// WATCHDOG: " + message);
StrictMode.setThreadPolicy(savedPolicy);
synchronized (Monkey.this) {
if (!mIgnoreCrashes) {
mAbort = true;
}
if (mRequestBugreport) {
mRequestWatchdogBugreport = true;
}
mWatchdogWaiting = true;
}
synchronized (Monkey.this) {
while (mWatchdogWaiting) {
try {
Monkey.this.wait();
} catch (InterruptedException e) {
}
}
}
return (mKillProcessAfterError) ? -1 : 1;
}
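The return value depends on mKillProcessAfterError. It defaults to false and is set to true only when monkey is started with the --kill-process-after-error option.
So by default systemNotResponding returns 1, and the watchdog keeps waiting: it continues with the next loop iteration instead of killing system server and restarting the framework.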
} else if (opt.equals("--kill-process-after-error")) {
mKillProcessAfterError = true;
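Watchdog and monkey communicate over Binder. If the Binder call fails, the current transaction is abandoned and the RemoteException is caught and ignored, so the watchdog goes on to kill system server and restart the framework.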
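The three possible outcomes of asking the controller (keep waiting, kill, or a failed call) can be summarised in one small function. This is a sketch under assumptions: Controller is a hypothetical stand-in for IActivityController with only the one call used here, and a generic Exception stands in for RemoteException.

```java
final class ControllerDecision {
    /** Hypothetical stand-in for IActivityController. */
    interface Controller {
        int systemNotResponding(String subject) throws Exception;
    }

    /** True if the watchdog should keep waiting instead of killing the process. */
    static boolean keepWaiting(Controller controller, String subject) {
        if (controller == null) {
            return false; // nobody to ask
        }
        try {
            // 1 = keep waiting, -1 = kill system
            return controller.systemNotResponding(subject) >= 0;
        } catch (Exception e) {
            return false; // a failed call falls through to the kill path
        }
    }
}
```

With a controller that returns 1, as monkey does by default, keepWaiting is true. With -1, with no controller, or when the call throws, it is false.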
Kill system server, and the system restarts:
// Only kill the process if the debugger is not attached.
if (Debug.isDebuggerConnected()) {
debuggerWasConnected = 2;
}
if (debuggerWasConnected >= 2) {
Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
} else if (debuggerWasConnected > 0) {
Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
} else if (!allowRestart) {
Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
} else {
Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
for (int i=0; i<blockedCheckers.size(); i++) {
Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
StackTraceElement[] stackTrace
= blockedCheckers.get(i).getThread().getStackTrace();
for (StackTraceElement element: stackTrace) {
Slog.w(TAG, " at " + element);
}
}
Slog.w(TAG, "*** GOODBYE!");
Process.killProcess(Process.myPid());
System.exit(10);
}
waitedHalf = false;