Framework Watchdog Source Code Analysis

1. Introduction to the framework watchdog

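Android implements a software WatchDog to guard SystemServer. SystemServer is without doubt the most important process on the Android platform: it hosts the vast majority of the platform's services and runs close to 50 threads, and a failure in any one of them can bring the whole system down. SystemServer exiting is actually not a big problem, because the init process will restart it. A deadlock is the real trouble, because then the whole system freezes.
       Among the services running in SystemServer, the most important are probably ActivityManager, WindowManager and PowerManager. The software WatchDog's main job is to make sure that when these services deadlock, the SystemServer process exits so that init restarts it and the system returns to a usable state.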


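2. First, the principle. A watchdog works the same simple way on every platform: an infinite loop guards a timer, and each round the monitored threads are signalled and must respond (this is "feeding the dog"). If a monitored target does not respond before the timeout, the loop cannot move on to the next round, the watchdog bites, and the framework restarts.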
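The loop-and-timeout principle can be illustrated outside Android with a minimal sketch. This is not framework code. `FeedDogSketch` and `feedAndCheck` are hypothetical names, and a single-thread `ExecutorService` stands in for a monitored Looper thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Framework-independent sketch of "feeding the dog": post a tiny task to the
// monitored thread and check that it runs before the timeout expires.
class FeedDogSketch {
    /** Returns true if the monitored executor ran the feed task within timeoutMillis. */
    static boolean feedAndCheck(ExecutorService monitored, long timeoutMillis) {
        CountDownLatch fed = new CountDownLatch(1);
        monitored.execute(fed::countDown);           // feed the dog
        try {
            // true: the thread processed our task in time; false: the watchdog would bite
            return fed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A busy or deadlocked thread never reaches the feed task, so `await` times out. This is the same signal the framework watchdog looks for, although the framework uses a Handler instead of an executor.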

3. I drew a diagram. It is very ugly, but it is detailed. All of the code below is organized around it, so study it carefully.


1> First, the watchdog is initialized and started by system server, in three small steps:

1.1. Step one: startOtherServices

private void startOtherServices() {
......
   traceBeginAndSlog("InitWatchdog");
   final Watchdog watchdog = Watchdog.getInstance();
   watchdog.init(context, mActivityManagerService);
   Trace.traceEnd(Trace.TRACE_TAG_SYSTEM_SERVER);
......
}
Correspondingly, this line shows up in the boot log:
01-26 16:42:25.984  1596  1596 I SystemServer: InitWatchdog

1.2. Into Watchdog's getInstance() function

public static Watchdog getInstance() {
if (sWatchdog == null) {
   sWatchdog = new Watchdog();
}
return sWatchdog;
}

1.3. The Watchdog() constructor

In short, it registers several important threads as monitored targets (see the upper-right part of the diagram). The default timeout is 60 s.

private Watchdog() {
       super("watchdog");
       // Initialize handler checkers for each common thread we want to check.  Note
       // that we are not currently checking the background thread, since it can
       // potentially hold longer running operations with no guarantees about the timeliness
       // of operations there.
       // The shared foreground thread is the main checker.  It is where we
       // will also dispatch monitor checks and do other work.
       mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
               "foreground thread", DEFAULT_TIMEOUT);
       mHandlerCheckers.add(mMonitorChecker);
       // Add checker for main thread.  We only do a quick check since there
       // can be UI running on the thread.
       mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
               "main thread", DEFAULT_TIMEOUT));
       // Add checker for shared UI thread.
       mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
               "ui thread", DEFAULT_TIMEOUT));
       // And also check IO thread.
       mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
               "i/o thread", DEFAULT_TIMEOUT));
       // And the display thread.
       mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
               "display thread", DEFAULT_TIMEOUT));
       // Initialize monitor for Binder threads.
       addMonitor(new BinderThreadMonitor());
   }

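Next we need to understand mHandlerCheckers. Every thread that is monitored is stored in it, which makes it an extremely important List.

The relationship between mHandlerCheckers and mMonitorChecker is as follows (see the upper-right part of the diagram):

① mHandlerCheckers is a list of 5 HandlerChecker objects, one each for the fg, main, ui, io and display threads.
② mMonitorChecker is also a HandlerChecker, and it is the very same object as the first element of mHandlerCheckers. The constructor adds mMonitorChecker itself to the list, not a copy, so mMonitorChecker and the fg thread entry of mHandlerCheckers share one object.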

   final ArrayList<HandlerChecker> mHandlerCheckers = new ArrayList<>();
   final HandlerChecker mMonitorChecker;

Default 60s
static final boolean DB = false;
static final long DEFAULT_TIMEOUT = DB ? 10*1000 : 60*1000;
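The shared-object relationship between mMonitorChecker and mHandlerCheckers can be checked with a stripped-down model. The classes below are simplified stand-ins: a `Runnable` plays the role of `Watchdog.Monitor`, and no real Handlers are involved. Treat it as a sketch of the reference sharing only.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not the framework classes) of how the constructor wires
// mMonitorChecker into mHandlerCheckers as the same reference.
class CheckerSharingSketch {
    static final class HandlerChecker {
        final String name;
        final List<Runnable> monitors = new ArrayList<>(); // stand-in for ArrayList<Monitor>
        HandlerChecker(String name) { this.name = name; }
    }

    final List<HandlerChecker> mHandlerCheckers = new ArrayList<>();
    final HandlerChecker mMonitorChecker;

    CheckerSharingSketch() {
        mMonitorChecker = new HandlerChecker("foreground thread");
        mHandlerCheckers.add(mMonitorChecker);               // same object, not a copy
        for (String n : new String[] {"main thread", "ui thread", "i/o thread", "display thread"}) {
            mHandlerCheckers.add(new HandlerChecker(n));
        }
    }

    void addMonitor(Runnable monitor) {
        mMonitorChecker.monitors.add(monitor);               // also visible via mHandlerCheckers.get(0)
    }
}
```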


1.4. Feeling a bit dizzy? No problem. Looking at the HandlerChecker class will make everything clear. First, the fields it defines:

   /**
    * Used for checking status of handle threads and scheduling monitor callbacks.
    */
   public final class HandlerChecker implements Runnable {
       private final Handler mHandler;
       private final String mName;
       private final long mWaitMax;
       // Pay special attention to the mMonitors list: it holds the Monitor objects this checker runs
       private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();        
       private boolean mCompleted;
       private Monitor mCurrentMonitor;
       private long mStartTime;

The constructor. mWaitMax is the timeout passed in for the thread; as mentioned above, the default is 60 s.
       HandlerChecker(Handler handler, String name, long waitMaxMillis) {
           mHandler = handler;
           mName = name;
           mWaitMax = waitMaxMillis;
           mCompleted = true;
       }

HandlerChecker holds a list of Monitor objects, mMonitors, and every monitor that needs checking is added to it:
       public void addMonitor(Monitor monitor) {
           mMonitors.add(monitor);
       }

1.5. Also note the addMonitor interface function provided by Watchdog

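After setting up the 5 threads to check, the constructor calls addMonitor to register a monitor for the binder threads (BinderThreadMonitor).
① addMonitor is the interface Watchdog provides to us. It calls mMonitorChecker's addMonitor with the given monitor.
So if we want our own component to be monitored, we need to implement our own monitor() function and call addMonitor to add ourselves to mMonitorChecker. This must happen before the watchdog thread starts; otherwise addMonitor throws a RuntimeException.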

   public void addMonitor(Monitor monitor) {
       synchronized (this) {
           if (isAlive()) {
               throw new RuntimeException("Monitors can't be added once the Watchdog is running");
           }
           mMonitorChecker.addMonitor(monitor);
       }
   }
② The addMonitor member function of HandlerChecker
It simply appends the given monitor to the mMonitors list; HandlerChecker only provides this one entry point:
       public void addMonitor(Monitor monitor) {
           mMonitors.add(monitor);
       }
Classes that implement the Monitor interface include:
ActivityManagerService
InputManagerService (used as the example below)
MountService
NativeDaemonConnector
NetworkManagementService
PowerManagerService
WindowManagerService

③ Example: InputManagerService

Its monitor() implementation simply acquires its own lock, to see whether a deadlock or block has occurred:

// Called by the heartbeat to ensure locks are not held indefinitely (for deadlock detection).
   @Override
   public void monitor() {
       synchronized (mInputFilterLock) { }
       nativeMonitor(mPtr);
   }
......
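A quick word on the synchronized keyword. It can guard a block inside a method so that only one thread at a time can execute code holding the given object's lock.
Usage: private final Object mLock = new Object(); ... synchronized (mLock) { /* block */ }
The lock object can be any object instance (or a Class object).
If the thread that holds the lock is deadlocked or blocked, monitor() can never acquire the lock, so monitor() never returns. The empty block does no work: acquiring the lock is the whole test.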
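To see why an empty `synchronized` block works as a deadlock probe, the sketch below holds the lock on one thread and runs a `monitor()` shaped like InputManagerService's (without `nativeMonitor`) on another. `LockProbeSketch` and `probe` are hypothetical names. The helper thread is a daemon so a stuck probe cannot keep the JVM alive.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: monitor() returns only if the lock can be acquired.
class LockProbeSketch {
    final Object mInputFilterLock = new Object();

    // Same shape as InputManagerService.monitor(), minus the native call.
    void monitor() {
        synchronized (mInputFilterLock) { }
    }

    /** Runs monitor() on a helper thread and reports whether it returned within timeoutMillis. */
    boolean probe(long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread t = new Thread(() -> { monitor(); done.countDown(); });
        t.setDaemon(true);                    // a blocked probe must not keep the JVM alive
        t.start();
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```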

In its start() function it calls Watchdog's addMonitor to add itself to the check list:
   public void start() {
       Slog.i(TAG, "Starting input manager");
       nativeStart(mPtr);
       // Add ourself to the Watchdog monitors.
       Watchdog.getInstance().addMonitor(this);
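Besides addMonitor, Watchdog provides another interface function, addThread.
As the names suggest, addMonitor adds an object to mMonitorChecker, which is the fg entry of mHandlerCheckers. addThread, in contrast, adds a new checker for a thread's Handler to the mHandlerCheckers list itself.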
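The difference between the two registration calls can be modelled as below. This is a hypothetical sketch: checkers are identified by a plain name, a `running` flag replaces `Thread.isAlive()`, and the `addThread` shape is an assumption based on the description above, not a copy of the framework method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two registration paths: addMonitor() joins the fg checker,
// addThread() creates a new entry in the checker list. Both are refused once running.
class RegistrationSketch {
    static final class Checker {
        final String name;
        final long waitMaxMillis;
        final List<Runnable> monitors = new ArrayList<>();
        Checker(String name, long waitMaxMillis) {
            this.name = name;
            this.waitMaxMillis = waitMaxMillis;
        }
    }

    static final long DEFAULT_TIMEOUT = 60_000;
    final List<Checker> handlerCheckers = new ArrayList<>();
    final Checker monitorChecker = new Checker("foreground thread", DEFAULT_TIMEOUT);
    boolean running;                                   // stand-in for Thread.isAlive()

    RegistrationSketch() { handlerCheckers.add(monitorChecker); }

    synchronized void addMonitor(Runnable monitor) {
        if (running) throw new RuntimeException("Monitors can't be added once the Watchdog is running");
        monitorChecker.monitors.add(monitor);          // lands in the fg checker
    }

    synchronized void addThread(String threadName, long timeoutMillis) {
        if (running) throw new RuntimeException("Threads can't be added once the Watchdog is running");
        handlerCheckers.add(new Checker(threadName, timeoutMillis)); // new list entry
    }
}
```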

2> At last we reach step two, the simplest one: watchdog.init

watchdog.init(context, mActivityManagerService);

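It registers a broadcast receiver for internal reboot requests (Intent.ACTION_REBOOT, guarded by the REBOOT permission) so the system can be rebooted on request: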
   public void init(Context context, ActivityManagerService activity) {
       mResolver = context.getContentResolver();
       mActivity = activity;
       context.registerReceiver(new RebootRequestReceiver(),
               new IntentFilter(Intent.ACTION_REBOOT),
               android.Manifest.permission.REBOOT, null);
       mUEventObserver.startObserving(LOG_STATE_MATCH);
   }

3> Step three: Watchdog.getInstance().start()

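Since Watchdog extends Thread, start() makes run() execute on the new watchdog thread. run() is the functional core of the watchdog; the two previous steps are just preparation.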

First, the detection mechanism:

public void run() {
       boolean waitedHalf = false;
       while (true) {
           final ArrayList<HandlerChecker> blockedCheckers;
           final String subject;
           final boolean allowRestart;
           int debuggerWasConnected = 0;
           synchronized (this) {

CHECK_INTERVAL = DEFAULT_TIMEOUT / 2, i.e. 30 s

               long timeout = CHECK_INTERVAL;
               // Make sure we (re)spin the checkers that have become idle within
               // this wait-and-check interval

Take each member of mHandlerCheckers and call its scheduleCheckLocked().
Every target monitored by the watchdog has to feed the dog periodically, and this is the feeding action:
               for (int i=0; i<mHandlerCheckers.size(); i++) {
                   HandlerChecker hc = mHandlerCheckers.get(i);
                   hc.scheduleCheckLocked();
               }
3.1.1. Feeding the dog: scheduleCheckLocked (see the feeding part of the detection mechanism in the diagram)

       public void scheduleCheckLocked() {
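mMonitors.size() == 0 picks out the checkers that have no monitors to run, i.e. every member of mHandlerCheckers except the fg thread checker.
If such a checker's looper is currently polling (idle, waiting for messages), the thread is not blocked, so mCompleted is set to true and the method returns: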
           if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
               // If the target looper has recently been polling, then
               // there is no reason to enqueue our checker on it since that
               // is as good as it not being deadlocked.  This avoid having
               // to do a context switch to check the thread.  Note that we
               // only do this if mCheckReboot is false and we have no
               // monitors, since those would need to be executed at this point.
               mCompleted = true;
               return;
           }
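One concept to keep clear: mMonitors and mCompleted are both members of HandlerChecker,
so all the objects in mMonitors share a single mCompleted flag.
If the previous check is still in progress and has not returned, mCompleted is still false, and the method simply returns: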
           if (!mCompleted) {
               // we already have a check in flight, so no need
               return;
           }
The actual feeding:
           mCompleted = false;
           mCurrentMonitor = null;
           mStartTime = SystemClock.uptimeMillis(); // record when this round of feeding started
           mHandler.postAtFrontOfQueue(this); // post this checker (a Runnable) to the front of mHandler's queue
       }
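The three branches of scheduleCheckLocked() are easier to follow if the Looper is replaced by a plain deque and the clock is made injectable. In this sketch, `monitorCount`, `looperPolling` and the deque are stand-ins (assumptions) for `mMonitors.size()`, `MessageQueue.isPolling()` and the Handler's queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Sketch of HandlerChecker.scheduleCheckLocked() with Android types replaced by stand-ins.
class ScheduleCheckSketch implements Runnable {
    final Deque<Runnable> queue = new ArrayDeque<>(); // stands in for the target MessageQueue
    final LongSupplier clock;                          // stands in for SystemClock.uptimeMillis()
    int monitorCount;                                  // stands in for mMonitors.size()
    boolean looperPolling;                             // stands in for getQueue().isPolling()
    boolean completed = true;
    long startTime;

    ScheduleCheckSketch(LongSupplier clock) { this.clock = clock; }

    void scheduleCheck() {
        if (monitorCount == 0 && looperPolling) {   // idle thread, nothing to run: treat as fed
            completed = true;
            return;
        }
        if (!completed) {                           // previous check still in flight
            return;
        }
        completed = false;                          // the actual feeding
        startTime = clock.getAsLong();
        queue.addFirst(this);                       // like postAtFrontOfQueue(this)
    }

    @Override
    public void run() {                             // runs when the target thread reaches it
        completed = true;
    }
}
```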

3.1.2.postAtFrontOfQueue

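postAtFrontOfQueue(this) ==> run()
This method takes a Runnable. Through the Handler message mechanism, HandlerChecker's run() is eventually called back on the monitored thread. run() loops over all the Monitor objects and calls each one's monitor(), which the individual services implement.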

       public void run() {
Only the fg thread checker has entries in mMonitors, so in practice this loop runs on the foreground thread:
           final int size = mMonitors.size();
           for (int i = 0 ; i < size ; i++) {
               synchronized (Watchdog.this) {
                   mCurrentMonitor = mMonitors.get(i);
               }
Call monitor() on every monitored object registered with the fg checker:
               mCurrentMonitor.monitor();
           }
If every monitor() runs and returns normally, mCompleted is set to true, meaning the dog has been fed:

           synchronized (Watchdog.this) {
               mCompleted = true;
               mCurrentMonitor = null;
           }
       }

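To sum up: every 30 seconds the watchdog checks several important locks in system_server (those of WindowManagerService, ActivityManagerService, PowerManagerService, NetworkManagementService, MountService, InputManagerService and others). It also checks whether the message queues of the 7 most important threads are free (WindowManagerService, PowerManagerService, PackageManagerService, ActivityManagerService, UiThread, IOThread, MainThread). Based on mCompleted and mStartTime, it decides whether a check has been blocked for more than 60 s. If it has, the watchdog dumps the trace log and the kernel trace log, and finally kills SystemServer so that it restarts.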


3.1.3. evaluateCheckerCompletionLocked: finding the hungriest dog

This function is simple: it walks the members of mHandlerCheckers and returns the largest wait state. First, the definitions of all the state values:
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;
   private int evaluateCheckerCompletionLocked() {
       int state = COMPLETED;
       for (int i=0; i<mHandlerCheckers.size(); i++) {
           HandlerChecker hc = mHandlerCheckers.get(i);
           state = Math.max(state, hc.getCompletionStateLocked());
       }
       return state;
   }

   public int getCompletionStateLocked() {
If the checker has returned normally and set mCompleted to true, there is no deadlock and no block, so it can return COMPLETED:
           if (mCompleted) {
               return COMPLETED;
           } else {
mWaitMax is the timeout, i.e. 60 s. If mCompleted is false and the wait so far is under 30 s, it returns WAITING (all is well); if the wait has exceeded 30 s, it returns WAITED_HALF:
               long latency = SystemClock.uptimeMillis() - mStartTime;
               if (latency < mWaitMax/2) {
                   return WAITING;
               } else if (latency < mWaitMax) {
                   return WAITED_HALF;
               }
           }
           return OVERDUE; // otherwise the wait has reached 60 s: return OVERDUE
     }
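getCompletionStateLocked() depends only on mCompleted, the elapsed time and mWaitMax, so it can be restated as a pure function, which makes the 30 s / 60 s thresholds easy to verify. The class and method names are hypothetical; the constants match the values listed above.

```java
// Sketch of getCompletionStateLocked() and evaluateCheckerCompletionLocked() as pure functions.
class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    static int completionState(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) return COMPLETED;
        if (latencyMillis < waitMaxMillis / 2) return WAITING;     // under 30 s with the default
        if (latencyMillis < waitMaxMillis) return WAITED_HALF;     // 30 s up to 60 s
        return OVERDUE;                                            // 60 s or more
    }

    /** The watchdog acts on the worst ("hungriest") checker, i.e. the maximum state. */
    static int evaluate(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }
}
```

One detail the sketch makes visible: getCompletionStateLocked() reports OVERDUE once the latency reaches mWaitMax, while isOverdueLocked() in 3.1.5 uses a strict `>`, so at exactly mWaitMax the two disagree for one millisecond.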


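3.1.4. Watchdog.run(): acting on the result of the previous step

After waiting, run() stores the result of evaluateCheckerCompletionLocked() in waitState. If even the largest wait state is COMPLETED (0), every monitored thread is fine: waitedHalf is set to false and this round of the loop ends.
Pay attention to the waitedHalf variable here. It is declared in run() before the while(true) loop begins, so it carries state from one round to the next.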
               if (waitState == COMPLETED) {
                   // The monitors have returned; reset
                   waitedHalf = false;
                   continue;

If the wait state is WAITING, i.e. the wait is under 30 s, end this round and check again. Note that waitedHalf is not cleared this time, so it still holds the previous state:
               } else if (waitState == WAITING) {
                   // still waiting but within their configured intervals; back off and recheck
                   continue;

If the wait has exceeded 30 s and waitedHalf is false (the first time past the halfway mark), build a pids list, dump the stack traces, and then set waitedHalf to true:
               } else if (waitState == WAITED_HALF) {
                   if (!waitedHalf) {
                       // We've waited half the deadlock-detection interval.  Pull a stack
                       // trace and wait another half.
                       ArrayList<Integer> pids = new ArrayList<Integer>();
                       pids.add(Process.myPid());
                       ActivityManagerService.dumpStackTraces(true, pids, null, null,
                               NATIVE_STACKS_OF_INTEREST);
                       waitedHalf = true;
                   }
                   continue;
               }
evaluateCheckerCompletionLocked() returned OVERDUE, which means the timeout has been hit:
               // something is overdue!
               blockedCheckers = getBlockedCheckersLocked(); // collect every overdue checker into blockedCheckers
               subject = describeCheckersLocked(blockedCheckers); // describe the blocked threads as a string for the event log
               allowRestart = mAllowRestart; // copy mAllowRestart (true unless restarts have been disabled)
           }
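The waitedHalf bookkeeping across rounds can be traced by replacing the side effects (stack dumps, kill) with return values. This is a hypothetical sketch: the integer states use the COMPLETED..OVERDUE values defined in 3.1.3, and `Action` is an invented enum.

```java
// Sketch of one round of run()'s decision logic, with side effects reduced to an enum.
class RunDecisionSketch {
    enum Action { CONTINUE, DUMP_HALF_STACKS_THEN_CONTINUE, HANDLE_OVERDUE }

    boolean waitedHalf;   // declared outside the while loop in run(), so it persists across rounds

    Action onWaitState(int waitState) {   // 0=COMPLETED, 1=WAITING, 2=WAITED_HALF, 3=OVERDUE
        switch (waitState) {
            case 0:
                waitedHalf = false;       // everyone returned: reset
                return Action.CONTINUE;
            case 1:
                return Action.CONTINUE;   // still within budget; waitedHalf is left as-is
            case 2:
                if (!waitedHalf) {
                    waitedHalf = true;    // first time past 30 s: dump stacks once
                    return Action.DUMP_HALF_STACKS_THEN_CONTINUE;
                }
                return Action.CONTINUE;
            default:
                return Action.HANDLE_OVERDUE;
        }
    }
}
```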

3.1.5. Overdue: getBlockedCheckersLocked

   private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
       ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
       for (int i=0; i<mHandlerCheckers.size(); i++) {
           HandlerChecker hc = mHandlerCheckers.get(i);
Use isOverdueLocked() to find every overdue member and add it to the checkers list:
           if (hc.isOverdueLocked()) {
               checkers.add(hc);
           }
       }
       return checkers;
   }

isOverdueLocked() is simple: with mWaitMax as the limit, it uses mCompleted and mStartTime to decide whether the check has timed out.
       public boolean isOverdueLocked() {
           return (!mCompleted) && (SystemClock.uptimeMillis() > mStartTime + mWaitMax);
       }
3.1.6.describeCheckersLocked

   private String describeCheckersLocked(ArrayList<HandlerChecker> checkers) {
       StringBuilder builder = new StringBuilder(128);
       for (int i=0; i<checkers.size(); i++) {
           if (builder.length() > 0) {
               builder.append(", ");
           }
           builder.append(checkers.get(i).describeBlockedStateLocked());
       }
       return builder.toString();
   }


       public String describeBlockedStateLocked() {
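Note that mCurrentMonitor is used here to tell whether a monitor or the handler is at fault. mCurrentMonitor is a HandlerChecker member that is set only while the mMonitors are being checked, and it is reset to null once all of them have returned.
So if mCurrentMonitor is null, no monitor is stuck, and the problem must be in the handler (the thread never got around to running the posted checker):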
           if (mCurrentMonitor == null) {
               return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";

Otherwise mCurrentMonitor is not null, which means one of the mMonitors is stuck:
           } else {
               return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                       + " on " + mName + " (" + getThread().getName() + ")";
           }
       }
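The way mCurrentMonitor separates a stuck monitor from a stuck handler thread can be reproduced with a small model of HandlerChecker.run() and describeBlockedStateLocked(). This sketch rests on simplifying assumptions: `Monitor` is a local stand-in interface, and the thread-name part of the message is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: mCurrentMonitor is non-null only while a monitor() call is in progress.
class BlockedStateSketch implements Runnable {
    interface Monitor { void monitor(); }             // stand-in for Watchdog.Monitor

    final String name;
    final List<Monitor> monitors = new ArrayList<>();
    private Monitor currentMonitor;
    private boolean completed;

    BlockedStateSketch(String name) { this.name = name; }

    @Override
    public void run() {                               // mirrors HandlerChecker.run()
        for (Monitor m : monitors) {
            synchronized (this) { currentMonitor = m; }
            m.monitor();                              // blocks here if the monitored lock is stuck
        }
        synchronized (this) {
            completed = true;
            currentMonitor = null;
        }
    }

    synchronized String describeBlockedState() {
        if (currentMonitor == null) {
            return "Blocked in handler on " + name;  // run() never started, or had no monitors
        }
        return "Blocked in monitor " + currentMonitor.getClass().getName() + " on " + name;
    }

    synchronized boolean isCompleted() { return completed; }
}
```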

3.2. Watchdog.run(): the handling mechanism

We have timed out:

           // If we got here, that means that the system is most likely hung.
           // First collect stack traces from all threads of the system process.
           // Then kill this process so that the system will restart.
           EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
           ArrayList<Integer> pids = new ArrayList<Integer>();
           pids.add(Process.myPid());
           if (mPhonePid > 0) pids.add(mPhonePid);
           // Pass !waitedHalf so that just in case we somehow wind up here without having
           // dumped the halfway stacks, we properly re-initialize the trace file.

Dump the stacks of system server and of the native processes of interest:
             final File stack = ActivityManagerService.dumpStackTraces(!waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);
           // Give some extra time to make sure the stack traces get written.
           // The system's been hanging for a minute, another second or two won't hurt much.
           SystemClock.sleep(2000);
NATIVE_STACKS_OF_INTEREST is a String array naming the native processes whose stacks should appear in the trace:
   public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
       "/system/bin/audioserver",
       "/system/bin/cameraserver",
       "/system/bin/drmserver",
       "/system/bin/mediadrmserver",
       "/system/bin/gx_fpd",
       "/system/bin/fingerprintd",
       "/system/bin/mediaserver",
       "/system/bin/sdcard",
       "/system/bin/surfaceflinger",
       "media.codec",     // system/bin/mediacodec
       "media.extractor", // system/bin/mediaextractor
       "com.android.bluetooth",  // Bluetooth service
   };

3.2.1. dumpKernelStackTraces: dumping kernel stacks

// Set this to true to have the watchdog record kernel thread stacks when it fires
static final boolean RECORD_KERNEL_THREADS = true;

           // Pull our own kernel thread stacks as well if we're configured for that
           if (RECORD_KERNEL_THREADS) {
               dumpKernelStackTraces();
           }

   private File dumpKernelStackTraces() {
Here the value of this property is /data/anr/traces.txt:
       String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
       if (tracesPath == null || tracesPath.length() == 0) {
           return null;
       }
       native_dumpKernelStacks(tracesPath);
       return new File(tracesPath);
   }
   private native void native_dumpKernelStacks(String tracesPath);
}

This calls through JNI into android_server_watchdog.cpp:
namespace android {
static const JNINativeMethod g_methods[] = {
   { "native_dumpKernelStacks", "(Ljava/lang/String;)V", (void*)dumpKernelStacks },
};
int register_android_server_Watchdog(JNIEnv* env) {
   return RegisterMethodsOrDie(env, "com/android/server/Watchdog", g_methods, NELEM(g_methods));
}
}

3.2.2.dumpKernelStacks(android_server_watchdog.cpp)

static void dumpKernelStacks(JNIEnv* env, jobject clazz, jstring pathStr) {		
   char buf[128];
   DIR* taskdir;
   ALOGI("dumpKernelStacks");
   if (!pathStr) {
       jniThrowException(env, "java/lang/IllegalArgumentException", "Null path");
       return; }
   const char *path = env->GetStringUTFChars(pathStr, NULL);
Open the trace file (/data/anr/traces.txt) for appending:
   int outFd = open(path, O_WRONLY | O_APPEND | O_CREAT,
       S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH);
   if (outFd < 0) {
       ALOGE("Unable to open stack dump file: %d (%s)", errno, strerror(errno));
       goto done; }
Write a header line to the trace file:
   snprintf(buf, sizeof(buf), "\n----- begin pid %d kernel stacks -----\n", getpid());
   write(outFd, buf, strlen(buf));
Find all threads of the current process by reading the /proc/pid/task directory:
   // look up the list of all threads in this process
   snprintf(buf, sizeof(buf), "/proc/%d/task", getpid());
   taskdir = opendir(buf);
   if (taskdir != NULL) {
       struct dirent * ent;
Dump the stack of every thread:
       while ((ent = readdir(taskdir)) != NULL) {
           int tid = atoi(ent->d_name);
           if (tid > 0 && tid <= 65535) {
               // dump each stack trace
               dumpOneStack(tid, outFd);
           }
       }
       closedir(taskdir);
   }
   snprintf(buf, sizeof(buf), "----- end pid %d kernel stacks -----\n", getpid());
   write(outFd, buf, strlen(buf));
   close(outFd);
done:
   env->ReleaseStringUTFChars(pathStr, path);
}
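The thread-enumeration part of dumpKernelStacks() translates into Java as follows. This sketch only lists tids from a /proc/<pid>/task style directory, keeping names that parse to a value in (0, 65535] as the C code does. Reading each thread's kernel stack (dumpOneStack, whose body is not shown in the source) is left out, and `TaskDirSketch` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the readdir() loop in dumpKernelStacks(): collect numeric entries as tids.
class TaskDirSketch {
    static List<Integer> listTids(Path taskDir) throws IOException {
        List<Integer> tids = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(taskDir)) {
            for (Path entry : entries) {
                int tid;
                try {
                    tid = Integer.parseInt(entry.getFileName().toString());
                } catch (NumberFormatException e) {
                    continue;                           // skip non-numeric names
                }
                if (tid > 0 && tid <= 65535) {          // same range check as the C code
                    tids.add(tid);
                }
            }
        }
        Collections.sort(tids);                         // directory order is unspecified
        return tids;
    }
}
```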

3.2.3. Add a timestamp to the generated trace file

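Rename the trace file just generated by adding a timestamp to its name, so that it will not be overwritten later. Note the order: traces.txt is written first and then renamed.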
           String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
           String traceFileNameAmendment = "_SystemServer_WDT" + mTraceDateFormat.format(new Date());
           if (tracesPath != null && tracesPath.length() != 0) {
               File traceRenameFile = new File(tracesPath);
               String newTracesPath;
               int lpos = tracesPath.lastIndexOf (".");
               if (-1 != lpos)
                   newTracesPath = tracesPath.substring (0, lpos) + traceFileNameAmendment + tracesPath.substring (lpos);
               else
                   newTracesPath = tracesPath + traceFileNameAmendment;
Slog.d(TAG, "Watchdog File:2 " + traceRenameFile + " rename to " + newTracesPath);
               traceRenameFile.renameTo(new File(newTracesPath));
               tracesPath = newTracesPath;
           }
           final File newFd = new File(tracesPath);
           // Try to add the error to the dropbox, but assuming that the ActivityManager
           // itself may be deadlocked.  (which has happened, causing this statement to
           // deadlock and the watchdog as a whole to be ineffective)
           Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                   public void run() {
                       mActivity.addErrorToDropBox(
                               "watchdog", null, "system_server", null, null,
                               subject, null, newFd, null);
                   }
               };
           dropboxThread.start();
           try {
               dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
           } catch (InterruptedException ignored) {}
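Two defensive steps in the snippet above can be isolated: building the timestamped file name, and waiting at most a fixed time for a call that may itself be deadlocked. The sketch below assumes the caller supplies the suffix (the pattern of mTraceDateFormat is not shown in the source), and `TraceRenameSketch` is a hypothetical name.

```java
// Sketch of the trace-rename logic and the bounded join used for the dropbox thread.
class TraceRenameSketch {
    static String amendTracesPath(String tracesPath, String amendment) {
        int lpos = tracesPath.lastIndexOf('.');
        if (lpos != -1) {
            // insert before the extension: traces.txt -> traces<amendment>.txt
            return tracesPath.substring(0, lpos) + amendment + tracesPath.substring(lpos);
        }
        return tracesPath + amendment;
    }

    /** Runs work on its own thread and returns whether it finished within timeoutMillis. */
    static boolean runWithDeadline(Runnable work, long timeoutMillis) {
        Thread t = new Thread(work, "watchdogWriteToDropbox");
        t.setDaemon(true);                 // a stuck worker must not keep the process alive
        t.start();
        try {
            t.join(timeoutMillis);         // like dropboxThread.join(2000)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }
}
```

Unlike the original, which ignores InterruptedException, the sketch restores the interrupt flag. Also, because lastIndexOf is applied to the whole path, a dot in a directory name combined with an extensionless file name would put the suffix inside the directory name.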

3.2.4. Deciding from a property whether to enter ramdump after the watchdog fires: persist.sys.crashOnWatchdog

The value of the persist.sys.crashOnWatchdog property decides whether to enter ramdump when the watchdog fires. This is done through the /proc/sysrq-trigger node:

           // At times, when user space watchdog traces don't give an indication on
           // which component held a lock, because of which other threads are blocked,
           // (thereby causing Watchdog), crash the device to analyze RAM dumps
           boolean crashOnWatchdog = SystemProperties
                                       .getBoolean("persist.sys.crashOnWatchdog", false);
           if (crashOnWatchdog) {
               // Trigger the kernel to dump all blocked threads, and backtraces
               // on all CPUs to the kernel log
               Slog.e(TAG, "Triggering SysRq for system_server watchdog");
               doSysRq('w');
               doSysRq('l');
               // wait until the above blocked threads be dumped into kernel log
               SystemClock.sleep(3000);
               // now try to crash the target
               doSysRq('c');
      }
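A SysRq command is a single character written to /proc/sysrq-trigger: 'w' dumps blocked tasks, 'l' prints a backtrace of all active CPUs, and 'c' triggers a kernel crash. The body of doSysRq() is not shown in the source, so the helper below is a hypothetical sketch. It takes the trigger path as a parameter so that it can be tried on an ordinary file.

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical doSysRq()-style helper: write one command character to the trigger file.
class SysRqSketch {
    static final String SYSRQ_TRIGGER = "/proc/sysrq-trigger";

    static boolean doSysRq(String triggerPath, char c) {
        try (FileWriter w = new FileWriter(triggerPath)) {
            w.write(c);                 // one character per SysRq command
            return true;
        } catch (IOException e) {
            return false;               // e.g. insufficient permission; a real implementation would log
        }
    }
}
```

On a device, doSysRq(SysRqSketch.SYSRQ_TRIGGER, 'c') would crash the kernel, which is exactly why the framework gates it behind persist.sys.crashOnWatchdog.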


3.2.5. Finally, how monkey intercepts the watchdog

Check the value of mController:

                 IActivityController controller;
           synchronized (this) {
               controller = mController;
           }
           if (controller != null) {
               Slog.i(TAG, "Reporting stuck state to activity controller");
               try {
                   Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                   // 1 = keep waiting, -1 = kill system
                   int res = controller.systemNotResponding(subject);
                   if (res >= 0) {
                       Slog.i(TAG, "Activity controller requested to coninue to wait");
                       waitedHalf = false;
                       continue;
                   }
               } catch (RemoteException e) {
               }
           }

mController is assigned in setActivityController(), from its IActivityController parameter controller:
public void setActivityController(IActivityController controller) {
       synchronized (this) {
           mController = controller;
       }
   }

3.2.5.1. How monkey's interception is implemented

The setActivityController() function in system server (ActivityManagerService) provides the external interface and wraps Watchdog's setActivityController():
   public void setActivityController(IActivityController controller, boolean imAMonkey) {
       enforceCallingPermission(android.Manifest.permission.SET_ACTIVITY_WATCHER,
               "setActivityController()");
       synchronized (this) {
           mController = controller;
           mControllerIsAMonkey = imAMonkey;
           Watchdog.getInstance().setActivityController(controller);
       }
   }

<Monkey.java> (cmds\monkey\src\com\android\commands\monkey)
Monkey calls setActivityController, passing in its own IActivityController implementation:
       try {
           mAm.setActivityController(new ActivityController(), true);
           mNetworkMonitor.register(mAm);
       } catch (RemoteException e) {
           System.err.println("** Failed talking with activity manager!");
           return false;
       }

private class ActivityController extends IActivityController.Stub {
       public boolean activityStarting(Intent intent, String pkg) {
           boolean allow = MonkeyUtils.getPackageFilter().checkEnteringPackage(pkg)
                   || (DEBUG_ALLOW_ANY_STARTS != 0);
           if (mVerbose > 0) {

3.2.5.2. Monkey.systemNotResponding: once the watchdog is intercepted, it naturally calls monkey's systemNotResponding()

       public int systemNotResponding(String message) {
           StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
           System.err.println("// WATCHDOG: " + message);
           StrictMode.setThreadPolicy(savedPolicy);
           synchronized (Monkey.this) {
               if (!mIgnoreCrashes) {
                   mAbort = true;
               }
               if (mRequestBugreport) {
                   mRequestWatchdogBugreport = true;
               }
               mWatchdogWaiting = true;
           }
           synchronized (Monkey.this) {
               while (mWatchdogWaiting) {
                   try {
                       Monkey.this.wait();
                   } catch (InterruptedException e) {
                   }
               }
           }
           return (mKillProcessAfterError) ? -1 : 1;
       }
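The return value depends on mKillProcessAfterError. It defaults to false and is only set to true when monkey is run with the --kill-process-after-error option.
So by default systemNotResponding() returns 1 (after waiting until mWatchdogWaiting is cleared). The watchdog then keeps waiting and continues with the next round of the loop instead of killing system server and restarting the framework: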
               } else if (opt.equals("--kill-process-after-error")) {
                   mKillProcessAfterError = true;

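Watchdog and monkey communicate over binder. If the binder call fails (for example because the monkey process has died), the RemoteException is caught and ignored, execution falls through, and the watchdog goes on to kill system server and restart the framework.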
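The controller hand-off in run() reduces to a small decision: a non-negative answer keeps the watchdog waiting, while a negative answer or a failed binder call falls through to the kill path. In this sketch, `Controller` stands in for IActivityController and a RuntimeException stands in for RemoteException. Both substitutions are assumptions made to keep the example self-contained.

```java
// Sketch of the IActivityController hand-off in Watchdog.run().
class ControllerDecisionSketch {
    interface Controller { int systemNotResponding(String msg); }

    /** Returns true if the watchdog should skip killing and continue its loop. */
    static boolean keepWaiting(Controller controller, String subject) {
        if (controller == null) {
            return false;                           // no controller (e.g. no monkey) attached
        }
        try {
            int res = controller.systemNotResponding(subject);
            return res >= 0;                        // 1 = keep waiting, -1 = kill system
        } catch (RuntimeException e) {
            return false;                           // original: empty catch, then fall through
        }
    }
}
```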


3.2.6. Last of all: if monkey does not intercept, the framework restarts. Kill system_server and reboot the framework

Kill system server; the system restarts:
           // Only kill the process if the debugger is not attached.
           if (Debug.isDebuggerConnected()) {
               debuggerWasConnected = 2;
           }
           if (debuggerWasConnected >= 2) {
               Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
           } else if (debuggerWasConnected > 0) {
               Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
           } else if (!allowRestart) {
               Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
           } else {
               Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
               for (int i=0; i<blockedCheckers.size(); i++) {
                   Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                   StackTraceElement[] stackTrace
                           = blockedCheckers.get(i).getThread().getStackTrace();
                   for (StackTraceElement element: stackTrace) {
                       Slog.w(TAG, "    at " + element);
                   }
               }
               Slog.w(TAG, "*** GOODBYE!");
               Process.killProcess(Process.myPid());
               System.exit(10);
           }
           waitedHalf = false;
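The final branch can be summarized as a decision table. In this hypothetical sketch, `debuggerWasConnected` follows the values used in the snippet, where 2 means a debugger is attached now. The code that sets it is not shown, so reading 1 as "a debugger was attached at some point during the wait" is an assumption taken from the log message.

```java
// Sketch of who can veto the kill at the end of Watchdog.run().
class KillDecisionSketch {
    enum Outcome { SKIP_DEBUGGER_ATTACHED, SKIP_DEBUGGER_WAS_ATTACHED, SKIP_RESTART_NOT_ALLOWED, KILL }

    static Outcome decide(int debuggerWasConnected, boolean debuggerConnectedNow, boolean allowRestart) {
        if (debuggerConnectedNow) {
            debuggerWasConnected = 2;               // mirrors Debug.isDebuggerConnected() check
        }
        if (debuggerWasConnected >= 2) return Outcome.SKIP_DEBUGGER_ATTACHED;
        if (debuggerWasConnected > 0) return Outcome.SKIP_DEBUGGER_WAS_ATTACHED;
        if (!allowRestart) return Outcome.SKIP_RESTART_NOT_ALLOWED;
        return Outcome.KILL;                        // real code: Process.killProcess(...); System.exit(10);
    }
}
```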









