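Background
While analyzing ANR logs from monkey runs on our project, I noticed that as soon as a process hit an ANR, it was killed outright. We needed to capture memory information at the moment of the ANR. Because the process restarted, the memory data we collected no longer reflected the state when the problem happened. That prompted a closer look at how Android handles ANRs. The analysis below is based on the MTK platform, so some of the source may differ on other platforms.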
Android ANR trigger flow
I. The code that runs dumps and other actions after an ANR
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
The method is appNotResponding():
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
        ActivityRecord parent, boolean aboveSystem, final String annotation) {
    ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
    SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
    Slog.d("testview", "=================================== appNotResponding " + (mService.mController == null ? "controller is null" : "controller not null"));
    if (mService.mController != null) {
        try {
            // 0 == continue, -1 = kill process immediately
            int res = mService.mController.appEarlyNotResponding(
                    app.processName, app.pid, annotation);
            Slog.d("testview", "=================================== appNotResponding res is " + res);
            if (res < 0 && app.pid != MY_PID) {
                app.kill("anr", true);
                Slog.d("testview", "=================================== app.kill");
            }
        } catch (RemoteException e) {
            mService.mController = null;
            Watchdog.getInstance().setActivityController(null);
        }
    }
    ......
    Slog.d("testview", "=================================== startAnrDump");
    /// M: ANR Debug Mechanism
    if (mService.mAnrManager.startAnrDump(mService, app, activity, parent, aboveSystem,
            annotation, showBackground)) {
        Slog.d("testview", "=================================== startAnrDump finished");
        return;
    }
    Slog.d("testview", "=================================== next");
    if (mService.mController != null) {
        try {
            // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
            int res = mService.mController.appNotResponding(
                    app.processName, app.pid, info.toString());
            if (res != 0) {
                if (res < 0 && app.pid != MY_PID) {
                    app.kill("anr", true);
                    Slog.d("testview", "=================================== res < 0 && app.kill");
                } else {
                    synchronized (mService) {
                        mService.mServices.scheduleServiceTimeoutLocked(app);
                    }
                }
                return;
            }
        } catch (RemoteException e) {
            mService.mController = null;
            Watchdog.getInstance().setActivityController(null);
        }
    }
    synchronized (mService) {
        ......
        // Bring up the infamous App Not Responding dialog
        Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
        msg.obj = new AppNotRespondingDialog.Data(app, activity, aboveSystem);
        mService.mUiHandler.sendMessage(msg);
    }
}
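This block contains four key pieces of logic:
1. The return value of mService.mController.appEarlyNotResponding decides whether the process is killed right away. -1 kills it and 0 continues.
2. mService.mAnrManager.startAnrDump (MTK's ANR debug mechanism) dumps the relevant information. If it returns true, appNotResponding returns immediately.
3. The return value of mService.mController.appNotResponding decides what happens next. -1 kills the process (unless it is system_server itself), and any other non-zero value keeps waiting. In both cases the method returns without showing a dialog. Only 0 falls through to step 4.
4. mService.mUiHandler.sendMessage(msg) sends SHOW_NOT_RESPONDING_UI_MSG to the UI handler, which brings up the ANR dialog.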
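The four steps can be condensed into a small model of the control flow visible in the excerpt. This is a sketch, not framework code:

- AnrFlowModel, Action and the parameters are hypothetical names.
- A null Integer stands for "no controller installed".
- The parts elided as "......" are ignored, and they may return earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the visible control flow of AppErrors.appNotResponding().
// Not framework code; the "......" parts of the excerpt are not modeled.
class AnrFlowModel {

    enum Action { KILL, START_ANR_DUMP, RESCHEDULE_SERVICE_TIMEOUT, SHOW_DIALOG_MSG }

    /**
     * @param earlyResult result of appEarlyNotResponding(), or null if no controller is set
     * @param dumpHandled what MTK's startAnrDump() returned
     * @param lateResult  result of appNotResponding(), or null if no controller is set
     * @param isSystemPid true if the ANR process is system_server itself (app.pid == MY_PID)
     * @return the actions taken, in order
     */
    static List<Action> run(Integer earlyResult, boolean dumpHandled,
                            Integer lateResult, boolean isSystemPid) {
        List<Action> actions = new ArrayList<>();
        // Step 1: early hook. -1 kills the process, but the visible code keeps going.
        if (earlyResult != null && earlyResult < 0 && !isSystemPid) {
            actions.add(Action.KILL);
        }
        // Step 2: MTK ANR dump. Returning true ends appNotResponding() here.
        actions.add(Action.START_ANR_DUMP);
        if (dumpHandled) {
            return actions;
        }
        // Step 3: late hook. Any non-zero result returns before the dialog is requested.
        if (lateResult != null && lateResult != 0) {
            if (lateResult < 0 && !isSystemPid) {
                actions.add(Action.KILL);
            } else {
                actions.add(Action.RESCHEDULE_SERVICE_TIMEOUT);
            }
            return actions;
        }
        // Step 4: post SHOW_NOT_RESPONDING_UI_MSG so the UI handler shows the dialog.
        actions.add(Action.SHOW_DIALOG_MSG);
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(run(null, false, null, false)); // [START_ANR_DUMP, SHOW_DIALOG_MSG]
        System.out.println(run(0, false, -1, false));      // [START_ANR_DUMP, KILL]
        System.out.println(run(0, false, 1, false));       // [START_ANR_DUMP, RESCHEDULE_SERVICE_TIMEOUT]
    }
}
```

With no controller installed, the run ends in SHOW_DIALOG_MSG. A controller that returns -1 from appNotResponding ends it in KILL, and the dialog message is never sent.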
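II. Where calls to AppErrors.appNotResponding() come from
1. BroadcastQueue.java: broadcast timeout
2. ContentProviderClient.java: content provider timeout
3. ActiveServices.java: service timeout
4. ActivityManagerService.java (inputDispatchingTimedOut()): input dispatching timeout
If you're interested, you can follow these paths and dig into the source yourself.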
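Adding logs to find out why the process was killed
I. With the logs added to the code above, I found that during the monkey run the log for showing the ANR dialog was never printed.
Code path: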
frameworks/base/services/core/java/com/android/server/am/AppErrors.java
The method is handleShowAnrUi(). AMS calls it when it handles the ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG message:
void handleShowAnrUi(Message msg) {
    Slog.d("testview", "========================= appErrors handleShowAnrUi");
    Dialog dialogToShow = null;
    synchronized (mService) {
        AppNotRespondingDialog.Data data = (AppNotRespondingDialog.Data) msg.obj;
        final ProcessRecord proc = data.proc;
        if (proc == null) {
            Slog.e(TAG, "handleShowAnrUi: proc is null");
            return;
        }
        if (proc.anrDialog != null) {
            Slog.e(TAG, "App already has anr dialog: " + proc);
            MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                    AppNotRespondingDialog.ALREADY_SHOWING);
            return;
        }
        Intent intent = new Intent("android.intent.action.ANR");
        if (!mService.mProcessesReady) {
            intent.addFlags(Intent.FLAG_RECEIVER_REGISTERED_ONLY
                    | Intent.FLAG_RECEIVER_FOREGROUND);
        }
        mService.broadcastIntentLocked(null, null, intent,
                null, null, 0, null, null, null, AppOpsManager.OP_NONE,
                null, false, false, MY_PID, Process.SYSTEM_UID, 0 /* TODO: Verify */);
        boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
                Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
        if (mService.canShowErrorDialogs() || showBackground) {
            dialogToShow = new AppNotRespondingDialog(mService, mContext, data);
            proc.anrDialog = dialogToShow;
        } else {
            MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_ANR,
                    AppNotRespondingDialog.CANT_SHOW);
            // Just kill the app if there is no dialog to be shown.
            mService.killAppAtUsersRequest(proc, null);
        }
    }
    // If we've created a crash dialog, show it without the lock held
    if (dialogToShow != null) {
        dialogToShow.show();
    }
}
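handleShowAnrUi() has a kill path of its own. When error dialogs cannot be shown and Settings.Secure.ANR_SHOW_BACKGROUND is off, it calls killAppAtUsersRequest() instead of showing the dialog. Since the ANR dialog log never showed up during the monkey run, this path was not reached.

The minimal sketch below models only the branches visible in the excerpt. AnrUiModel, UiResult and the boolean parameters are hypothetical names, and the android.intent.action.ANR broadcast that the method sends first is left out.

```java
// Hypothetical model of the branches in AppErrors.handleShowAnrUi(); not framework code.
class AnrUiModel {

    enum UiResult { NO_PROCESS, ALREADY_SHOWING, SHOW_DIALOG, KILL_WITHOUT_DIALOG }

    static UiResult decide(boolean procIsNull, boolean hasAnrDialog,
                           boolean canShowErrorDialogs, boolean showBackground) {
        if (procIsNull) {
            return UiResult.NO_PROCESS;          // "handleShowAnrUi: proc is null"
        }
        if (hasAnrDialog) {
            return UiResult.ALREADY_SHOWING;     // "App already has anr dialog"
        }
        if (canShowErrorDialogs || showBackground) {
            return UiResult.SHOW_DIALOG;         // new AppNotRespondingDialog(...).show()
        }
        return UiResult.KILL_WITHOUT_DIALOG;     // killAppAtUsersRequest(proc, null)
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, true, false));  // SHOW_DIALOG
        System.out.println(decide(false, false, false, false)); // KILL_WITHOUT_DIALOG
    }
}
```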
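II. Since the dialog message was never handled, the only suspect left is mService.mController.appNotResponding. Any non-zero return value makes appNotResponding() return before SHOW_NOT_RESPONDING_UI_MSG is sent. If you want to know what mController is, you can trace it from the code paths above.
Checking whether Monkey defines its own controller
I. Monkey's source class Monkey.java does define its own controller: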
private class ActivityController extends IActivityController.Stub {
    public boolean activityStarting(Intent intent, String pkg) {
        boolean allow = checkEnteringPackage(pkg) || (DEBUG_ALLOW_ANY_STARTS != 0);
        if (mVerbose > 0) {
            // StrictMode's disk checks end up catching this on
            // userdebug/eng builds due to PrintStream going to a
            // FileOutputStream in the end (perhaps only when
            // redirected to a file?) So we allow disk writes
            // around this region for the monkey to minimize
            // harmless dropbox uploads from monkeys.
            StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
            System.out.println(" // " + (allow ? "Allowing" : "Rejecting") + " start of "
                    + intent + " in package " + pkg);
            StrictMode.setThreadPolicy(savedPolicy);
        }
        currentPackage = pkg;
        currentIntent = intent;
        return allow;
    }
    public boolean activityResuming(String pkg) {
        StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
        System.out.println(" // activityResuming(" + pkg + ")");
        boolean allow = checkEnteringPackage(pkg) || (DEBUG_ALLOW_ANY_RESTARTS != 0);
        if (!allow) {
            if (mVerbose > 0) {
                System.out.println(" // " + (allow ? "Allowing" : "Rejecting")
                        + " resume of package " + pkg);
            }
        }
        currentPackage = pkg;
        StrictMode.setThreadPolicy(savedPolicy);
        return allow;
    }
    public boolean appCrashed(String processName, int pid,
            String shortMsg, String longMsg,
            long timeMillis, String stackTrace) {
        StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
        System.err.println("// CRASH: " + processName + " (pid " + pid + ")");
        System.err.println("// Short Msg: " + shortMsg);
        System.err.println("// Long Msg: " + longMsg);
        System.err.println("// Build Label: " + Build.FINGERPRINT);
        System.err.println("// Build Changelist: " + Build.VERSION.INCREMENTAL);
        System.err.println("// Build Time: " + Build.TIME);
        System.err.println("// " + stackTrace.replace("\n", "\n// "));
        StrictMode.setThreadPolicy(savedPolicy);
        if (!mIgnoreCrashes || mRequestBugreport) {
            synchronized (Monkey.this) {
                if (!mIgnoreCrashes) {
                    mAbort = true;
                }
                if (mRequestBugreport){
                    mRequestAppCrashBugreport = true;
                    mReportProcessName = processName;
                }
            }
            return !mKillProcessAfterError;
        }
        return false;
    }
    public int appEarlyNotResponding(String processName, int pid, String annotation) {
        return 0;
    }
    public int appNotResponding(String processName, int pid, String processStats) {
        StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
        System.err.println("// NOT RESPONDING: " + processName + " (pid " + pid + ")");
        System.err.println(processStats);
        StrictMode.setThreadPolicy(savedPolicy);
        synchronized (Monkey.this) {
            mRequestAnrTraces = true;
            mRequestDumpsysMemInfo = true;
            mRequestProcRank = true;
            if (mRequestBugreport){
                mRequestAnrBugreport = true;
                mReportProcessName = processName;
            }
        }
        if (!mIgnoreTimeouts) {
            synchronized (Monkey.this) {
                mAbort = true;
            }
        }
        return (mKillProcessAfterError) ? -1 : 1;
    }
}
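The controller customizes several callbacks. The one that matters here is appNotResponding.
II. appNotResponding returns -1 when mKillProcessAfterError is true and 1 otherwise. In AppErrors.java, a return value of -1 from mService.mController.appNotResponding kills the process and skips the ANR dialog. A return value of 1 also skips the dialog but leaves the process alive. That is the cause of the problem.
III. How mKillProcessAfterError gets set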
private boolean processOptions() {
    // quick (throwaway) check for unadorned command
    if (mArgs.length < 1) {
        showUsage();
        return false;
    }
    try {
        String opt;
        while ((opt = nextOption()) != null) {
            if (opt.equals("-s")) {
                mSeed = nextOptionLong("Seed");
            } else if (opt.equals("--kill-process-after-error")) {
                mKillProcessAfterError = true;
            ......
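The --kill-process-after-error option decides whether the process that hit the error is killed. I later checked with the test team, and their monkey command did include this option.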
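To see how a single option flips the outcome, here is a trimmed-down imitation of processOptions() combined with the return value of Monkey's appNotResponding(). It is a sketch under stated assumptions:

- MonkeyOptionsModel and its methods are hypothetical names.
- Only --kill-process-after-error and monkey's --ignore-timeouts option (which sets mIgnoreTimeouts) are modeled.
- Every other argument is ignored. The real parser also consumes option values such as the -s seed.

```java
// Hypothetical imitation of Monkey.processOptions() plus the return value of its
// ActivityController.appNotResponding(); not the real Monkey class.
class MonkeyOptionsModel {
    boolean killProcessAfterError; // set by --kill-process-after-error
    boolean ignoreTimeouts;        // set by --ignore-timeouts

    static MonkeyOptionsModel parse(String... args) {
        MonkeyOptionsModel m = new MonkeyOptionsModel();
        for (String opt : args) {
            if (opt.equals("--kill-process-after-error")) {
                m.killProcessAfterError = true;
            } else if (opt.equals("--ignore-timeouts")) {
                m.ignoreTimeouts = true;
            }
            // all other arguments are ignored in this sketch
        }
        return m;
    }

    // Mirrors: return (mKillProcessAfterError) ? -1 : 1;
    int appNotRespondingResult() {
        return killProcessAfterError ? -1 : 1;
    }

    // Mirrors: if (!mIgnoreTimeouts) { mAbort = true; }
    boolean abortsOnAnr() {
        return !ignoreTimeouts;
    }

    public static void main(String[] args) {
        MonkeyOptionsModel withKill = parse("--kill-process-after-error");
        System.out.println(withKill.appNotRespondingResult()); // -1: AppErrors kills the process
        MonkeyOptionsModel plain = parse("--ignore-timeouts");
        System.out.println(plain.appNotRespondingResult());    // 1: no dialog, process stays alive
        System.out.println(plain.abortsOnAnr());               // false
    }
}
```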
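Conclusion
The root cause was an option the testers had added by mistake. --kill-process-after-error tells monkey to kill the process after an error, so the process was gone before its memory could be captured.
If the system itself needs custom ANR handling, for example suppressing the ANR dialog, implementing a custom controller is enough.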
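The sketch below shows what such a controller policy could look like, for the original goal of keeping the ANR'd process alive so its memory can be captured. It makes several assumptions:

- AnrController is a hypothetical stand-in for the two ANR callbacks of android.app.IActivityController. The real interface is a hidden AIDL interface with more callbacks, as Monkey's class above shows.
- Monkey registers its implementation through the hidden IActivityManager.setActivityController() API. The exact signature varies by Android version.
- KeepAliveAnrController and its pendingDumps list are illustrative only.

Per the AppErrors excerpt, returning 1 from appNotResponding avoids both the kill and the dialog.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ANR callbacks of android.app.IActivityController.
interface AnrController {
    // 0 == continue, -1 = kill process immediately
    int appEarlyNotResponding(String processName, int pid, String annotation);

    // 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
    int appNotResponding(String processName, int pid, String processStats);
}

// Hypothetical policy: never kill, never show the dialog, and remember which
// processes need a memory dump.
class KeepAliveAnrController implements AnrController {
    private final List<String> pendingDumps = new ArrayList<>();

    @Override
    public int appEarlyNotResponding(String processName, int pid, String annotation) {
        return 0; // let AppErrors continue to the dump step
    }

    @Override
    public synchronized int appNotResponding(String processName, int pid, String processStats) {
        // A real controller would trigger the memory capture for this pid here.
        pendingDumps.add(processName + ":" + pid);
        return 1; // keep waiting: AppErrors neither kills the process nor shows the dialog
    }

    synchronized List<String> pendingDumps() {
        return new ArrayList<>(pendingDumps);
    }

    public static void main(String[] args) {
        KeepAliveAnrController c = new KeepAliveAnrController();
        System.out.println(c.appEarlyNotResponding("com.example.app", 1234, "anr")); // 0
        System.out.println(c.appNotResponding("com.example.app", 1234, "stats"));    // 1
        System.out.println(c.pendingDumps()); // [com.example.app:1234]
    }
}
```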