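1. Overview
ANR (Application Not Responding) means that an application has stopped responding. The Android system expects certain events to complete within a fixed time window; if no valid response arrives within that window, or the response takes too long, an ANR is raised. Usually a dialog then appears telling the user that application xxx is not responding, and the user can choose to keep waiting or Force Close. Not every ANR shows a dialog, however; the reason is given later in this article.
So which scenarios cause an ANR?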
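- Service Timeout: for example, a foreground service does not finish executing within 20s, or a background service within 200s;
- BroadcastQueue Timeout: for example, a foreground broadcast is not handled within 10s, or a background broadcast within 60s;
- ContentProvider Timeout: a content provider takes more than 10s to publish;
- InputDispatching Timeout: dispatching an input event takes more than 5s; this covers both key and touch events.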
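For quick reference, the timeout families can be collected in one place. This is only an illustrative sketch: the class and enum names are made up, and the values are the AOSP defaults as understood here (200s for background services, which matches SERVICE_BACKGROUND_TIMEOUT in section 2.1, and 60s for background broadcasts). Vendors may change them.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative lookup table for the ANR timeouts; every name here is hypothetical. */
final class AnrTimeouts {
    enum Kind {
        SERVICE_FOREGROUND(20),
        SERVICE_BACKGROUND(200),
        BROADCAST_FOREGROUND(10),
        BROADCAST_BACKGROUND(60),
        CONTENT_PROVIDER_PUBLISH(10),
        INPUT_DISPATCHING(5);

        private final long seconds;

        Kind(long seconds) {
            this.seconds = seconds;
        }

        /** Timeout in milliseconds, the unit in which Handler delays are expressed. */
        long millis() {
            return TimeUnit.SECONDS.toMillis(seconds);
        }
    }

    public static void main(String[] args) {
        for (Kind kind : Kind.values()) {
            System.out.println(kind + " = " + kind.millis() + " ms");
        }
    }
}
```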
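2. Service Timeout
2.1
A Service Timeout is triggered when AMS.MainHandler, which runs on the "ActivityManager" thread, receives the SERVICE_TIMEOUT_MSG message.
There are two kinds of service:
- for a foreground service, the timeout is SERVICE_TIMEOUT = 20s;
- for a background service, the timeout is SERVICE_BACKGROUND_TIMEOUT = 200s.
The field ProcessRecord.execServicesFg determines whether the service is executing in the foreground.
2.2 startService
While the service's process attaches to the system_server process, realStartServiceLocked() is called: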
private final void realStartServiceLocked(ServiceRecord r,
ProcessRecord app, boolean execInFg) throws RemoteException {
if (app.thread == null) {
throw new RemoteException();
}
if (DEBUG_MU)
Slog.v(TAG_MU, "realStartServiceLocked, ServiceRecord.uid = " + r.appInfo.uid
+ ", ProcessRecord.uid = " + app.uid);
r.setProcess(app);
r.restartTime = r.lastActivity = SystemClock.uptimeMillis();
final boolean newService = app.services.add(r);
// Start the timeout clock: this eventually posts the delayed SERVICE_TIMEOUT_MSG.
bumpServiceExecutingLocked(r, execInFg, "create");
mAm.updateLruProcessLocked(app, false, null);
updateServiceForegroundLocked(r.app, /* oomAdj= */ false);
mAm.updateOomAdjLocked(OomAdjuster.OOM_ADJ_REASON_START_SERVICE);
boolean created = false;
try {
if (LOG_SERVICE_START_STOP) {
String nameTerm;
int lastPeriod = r.shortInstanceName.lastIndexOf('.');
nameTerm = lastPeriod >= 0 ? r.shortInstanceName.substring(lastPeriod)
: r.shortInstanceName;
EventLogTags.writeAmCreateService(
r.userId, System.identityHashCode(r), nameTerm, r.app.uid, r.app.pid);
}
StatsLog.write(StatsLog.SERVICE_LAUNCH_REPORTED, r.appInfo.uid, r.name.getPackageName(),
r.name.getClassName());
synchronized (r.stats.getBatteryStats()) {
r.stats.startLaunchedLocked();
}
mAm.notifyPackageUse(r.serviceInfo.packageName,
PackageManager.NOTIFY_PACKAGE_USE_SERVICE);
app.forceProcessStateUpTo(ActivityManager.PROCESS_STATE_SERVICE);
// Enter ActivityThread to create the service. ActivityThread is the entry point of every app process
// forked from zygote: a newly forked app process runs ActivityThread.main().
app.thread.scheduleCreateService(r, r.serviceInfo,
mAm.compatibilityInfoForPackage(r.serviceInfo.applicationInfo),
app.getReportedProcState());
r.postNotification();
created = true;
.......
}
Two methods in the code above matter most: bumpServiceExecutingLocked and scheduleCreateService. bumpServiceExecutingLocked eventually calls mAm.mHandler.sendMessageDelayed(msg, proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT) to post a delayed message, so the ANR clock starts in bumpServiceExecutingLocked. scheduleCreateService is a Binder call into the app process that asks it to create the service.
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, ">>> EXECUTING "
+ why + " of " + r + " in app " + r.app);
else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING, ">>> EXECUTING "
+ why + " of " + r.shortInstanceName);
// For b/34123235: Services within the system server won't start until SystemServer
// does Looper.loop(), so we shouldn't try to start/bind to them too early in the boot
// process. However, since there's a little point of showing the ANR dialog in that case,
// let's suppress the timeout until PHASE_THIRD_PARTY_APPS_CAN_START.
//
// (Note there are multiple services start at PHASE_THIRD_PARTY_APPS_CAN_START too,
// which technically could also trigger this timeout if there's a system server
// that takes a long time to handle PHASE_THIRD_PARTY_APPS_CAN_START, but that shouldn't
// happen.)
boolean timeoutNeeded = true;
if ((mAm.mBootPhase < SystemService.PHASE_THIRD_PARTY_APPS_CAN_START)
&& (r.app != null) && (r.app.pid == android.os.Process.myPid())) {
Slog.w(TAG, "Too early to start/bind service in system_server: Phase=" + mAm.mBootPhase
+ " " + r.getComponentName());
timeoutNeeded = false;
}
long now = SystemClock.uptimeMillis();
if (r.executeNesting == 0) {
r.executeFg = fg;
ServiceState stracker = r.getTracker();
if (stracker != null) {
stracker.setExecuting(true, mAm.mProcessStats.getMemFactorLocked(), now);
}
if (r.app != null) {
r.app.executingServices.add(r);
r.app.execServicesFg |= fg;
if (timeoutNeeded && r.app.executingServices.size() == 1) {
scheduleServiceTimeoutLocked(r.app);
}
}
} else if (r.app != null && fg && !r.app.execServicesFg) {
r.app.execServicesFg = true;
if (timeoutNeeded) {
scheduleServiceTimeoutLocked(r.app);
}
}
r.executeFg |= fg;// executeFg records whether this service is executing in the foreground
r.executeNesting++;
r.executingStart = now;// record when the service started executing
}
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
// Post the delayed SERVICE_TIMEOUT_MSG message
mAm.mHandler.sendMessageDelayed(msg,
proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
/// M: ANR Debug Mechanism
mAm.mAnrManager.sendServiceMonitorMessage();
}
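The arming rules in bumpServiceExecutingLocked are easy to lose among the logging code, so here is a toy model of them. It is a sketch under stated assumptions, not the AOSP implementation: the class, the string service keys and the counters are hypothetical, and the Handler is replaced by a field that records the delay of the last posted message.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the arming logic in bumpServiceExecutingLocked; all names are hypothetical. */
final class ServiceTimeoutArming {
    static final long SERVICE_TIMEOUT_MS = 20_000L;
    static final long SERVICE_BACKGROUND_TIMEOUT_MS = SERVICE_TIMEOUT_MS * 10;

    private final Map<String, Integer> executeNesting = new HashMap<>();
    private final Set<String> executingServices = new HashSet<>();
    private boolean execServicesFg;

    /** Delay of the most recently posted timeout message, or -1 if none was posted. */
    long lastPostedDelayMs = -1;
    /** How many timeout messages were posted in total. */
    int postedCount;

    void bump(String service, boolean fg) {
        int nesting = executeNesting.getOrDefault(service, 0);
        if (nesting == 0) {
            executingServices.add(service);
            execServicesFg |= fg;
            // Only the first executing service of the process arms the timer.
            if (executingServices.size() == 1) {
                postTimeout();
            }
        } else if (fg && !execServicesFg) {
            // A nested foreground operation switches the process to the shorter timeout.
            execServicesFg = true;
            postTimeout();
        }
        executeNesting.put(service, nesting + 1);
    }

    private void postTimeout() {
        lastPostedDelayMs = execServicesFg ? SERVICE_TIMEOUT_MS : SERVICE_BACKGROUND_TIMEOUT_MS;
        postedCount++;
    }

    public static void main(String[] args) {
        ServiceTimeoutArming proc = new ServiceTimeoutArming();
        proc.bump("A", false);
        System.out.println("after background start: " + proc.lastPostedDelayMs + " ms");
        proc.bump("A", true);
        System.out.println("after foreground re-entry: " + proc.lastPostedDelayMs + " ms");
    }
}
```

Note that a second service starting in the same process does not post a new message; it is covered by the message already pending, which serviceTimeout later re-evaluates.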
2.3 Removing SERVICE_TIMEOUT_MSG
Where is SERVICE_TIMEOUT_MSG removed? One would expect this to happen once the service has run onCreate(), and that is indeed the case:
private void handleCreateService(CreateServiceData data) {
// If we are getting ready to gc after going to the background, well
// we are back active so skip it.
unscheduleGcIdler();
LoadedApk packageInfo = getPackageInfoNoCheck(
data.info.applicationInfo, data.compatInfo);
Service service = null;
try {
java.lang.ClassLoader cl = packageInfo.getClassLoader();
service = packageInfo.getAppFactory()
.instantiateService(cl, data.info.name, data.intent);
} catch (Exception e) {
if (!mInstrumentation.onException(service, e)) {
throw new RuntimeException(
"Unable to instantiate service " + data.info.name
+ ": " + e.toString(), e);
}
}
try {
if (localLOGV) Slog.v(TAG, "Creating service " + data.info.name);
ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
context.setOuterContext(service);
// Create the Application object
Application app = packageInfo.makeApplication(false, mInstrumentation);
service.attach(context, this, data.info.name, data.token, app,
ActivityManager.getService());
service.onCreate(); // call the service's onCreate()
mServices.put(data.token, service);
try {
// Report completion to AMS, which removes SERVICE_TIMEOUT_MSG
ActivityManager.getService().serviceDoneExecuting(
data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
} catch (RemoteException e) {
throw e.rethrowFromSystemServer();
}
} catch (Exception e) {
if (!mInstrumentation.onException(service, e)) {
throw new RuntimeException(
"Unable to create service " + data.info.name
+ ": " + e.toString(), e);
}
}
}
public void serviceDoneExecuting(IBinder token, int type, int startId, int res) {
synchronized(this) {
if (!(token instanceof ServiceRecord)) {
Slog.e(TAG, "serviceDoneExecuting: Invalid service token=" + token);
throw new IllegalArgumentException("Invalid service token");
}
mServices.serviceDoneExecutingLocked((ServiceRecord)token, type, startId, res);
}
}
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
boolean finishing) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "<<< DONE EXECUTING " + r
+ ": nesting=" + r.executeNesting
+ ", inDestroying=" + inDestroying + ", app=" + r.app);
else if (DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
"<<< DONE EXECUTING " + r.shortInstanceName);
r.executeNesting--;
if (r.executeNesting <= 0) {
if (r.app != null) {
if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
"Nesting at 0 of " + r.shortInstanceName);
r.app.execServicesFg = false;
r.app.executingServices.remove(r);// the service has finished executing, so remove it; it was added in bumpServiceExecutingLocked
if (r.app.executingServices.size() == 0) {
if (DEBUG_SERVICE || DEBUG_SERVICE_EXECUTING) Slog.v(TAG_SERVICE_EXECUTING,
"No more executingServices of " + r.shortInstanceName);
// Remove SERVICE_TIMEOUT_MSG
mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
/// M: ANR Debug Mechanism
mAm.mAnrManager.removeServiceMonitorMessage();
} else if (r.executeFg) {
// Need to re-evaluate whether the app still needs to be in the foreground.
for (int i=r.app.executingServices.size()-1; i>=0; i--) {
if (r.app.executingServices.valueAt(i).executeFg) {
r.app.execServicesFg = true;
break;
}
}
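The mechanism that triggers a service ANR is simple. So under what circumstances does a service ANR actually occur?
Answer: 1. The code being executed times out, that is, the interval from startService to the end of Service.onCreate() exceeds the limit. The most common causes are time-consuming work inside onCreate(), or a deadlock. 2. Device performance can also be the cause: when CPU usage is high, the process may get no CPU time slices, or only very short ones, so execution times out. Diagnosing this case requires the trace output or the cpuinfo data.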
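The arm-then-cancel cycle from sections 2.2 and 2.3 can be imitated in plain Java to see when the timeout fires. This is a minimal sketch with assumptions: a ScheduledExecutorService stands in for the AMS handler, Thread.sleep stands in for the work done in onCreate(), the class and method names are hypothetical, and the timeouts are scaled down to milliseconds so the demo finishes quickly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Plain-Java imitation of the service timeout cycle; all names are hypothetical. */
final class ServiceWatchdogDemo {
    /** Returns true if the watchdog fired before the simulated onCreate() finished. */
    static boolean watchdogFires(long timeoutMs, long workMs) {
        ScheduledExecutorService handler = Executors.newSingleThreadScheduledExecutor();
        AtomicBoolean fired = new AtomicBoolean(false);
        try {
            // Arm: plays the role of sendMessageDelayed(SERVICE_TIMEOUT_MSG, timeout).
            ScheduledFuture<?> pending =
                    handler.schedule(() -> fired.set(true), timeoutMs, TimeUnit.MILLISECONDS);
            try {
                // Stand-in for Service.onCreate() running on the app's main thread.
                Thread.sleep(workMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Disarm: plays the role of removeMessages(SERVICE_TIMEOUT_MSG, proc).
            pending.cancel(false);
            return fired.get();
        } finally {
            handler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("fast onCreate, watchdog fired: " + watchdogFires(2000, 10));
        System.out.println("slow onCreate, watchdog fired: " + watchdogFires(50, 500));
    }
}
```

The same picture explains the second cause in the answer above: if the main thread is starved of CPU, the work takes longer in wall-clock time even though the code itself is cheap, and the watchdog fires all the same.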
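2.4 What happens when an ANR occurs
Consider two questions: 1. When an ANR occurs, how does the system react and what actions does it take? 2. Which logs does the system emit, and which of them tell us that an ANR has happened?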
final class MainHandler extends Handler {
public MainHandler(Looper looper) {
super(looper, null, true);
}
@Override
public void handleMessage(Message msg) {
switch (msg.what) {
case GC_BACKGROUND_PROCESSES_MSG: {
synchronized (ActivityManagerService.this) {
performAppGcsIfAppropriateLocked();
}
} break;
case SERVICE_TIMEOUT_MSG: {
/// M: ANR Debug Mechanism @{
if (mAnrManager.delayMessage(mHandler, msg, SERVICE_TIMEOUT_MSG,
ActiveServices.SERVICE_TIMEOUT))
return; /// @}
mServices.serviceTimeout((ProcessRecord)msg.obj);
} break;
void serviceTimeout(ProcessRecord proc) {
String anrMessage = null;
synchronized(mAm) {
if (proc.isDebugging()) {
// The app's being debugged, ignore timeout.
return;
}
if (proc.executingServices.size() == 0 || proc.thread == null) {
return;
}
final long now = SystemClock.uptimeMillis();
// Distinguish foreground from background service execution
final long maxTime = now -
(proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
ServiceRecord timeout = null;
long nextTime = 0;
// sr.executingStart is the time the service started executing; it is assigned in bumpServiceExecutingLocked (r.executingStart = now).
// Services are added to proc.executingServices in bumpServiceExecutingLocked (r.app.executingServices.add(r))
// and removed in serviceDoneExecutingLocked (r.app.executingServices.remove(r)).
for (int i=proc.executingServices.size()-1; i>=0; i--) {
ServiceRecord sr = proc.executingServices.valueAt(i);
if (sr.executingStart < maxTime) {
timeout = sr; // found the service that timed out
break;
}
if (sr.executingStart > nextTime) {
nextTime = sr.executingStart;
}
}
// If a service has timed out, build anrMessage; it is passed to appNotResponding at the end
if (timeout != null && mAm.mProcessList.mLruProcesses.contains(proc)) {
Slog.w(TAG, "Timeout executing service: " + timeout);
StringWriter sw = new StringWriter();
PrintWriter pw = new FastPrintWriter(sw, false, 1024);
pw.println(timeout);
timeout.dump(pw, " ");
pw.close();
mLastAnrDump = sw.toString();
mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
anrMessage = "executing service " + timeout.shortInstanceName;
} else {
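// No service has timed out yet: post SERVICE_TIMEOUT_MSG again so that it fires at nextTime + SERVICE_TIMEOUT
// (or nextTime + SERVICE_BACKGROUND_TIMEOUT for background execution).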
Message msg = mAm.mHandler.obtainMessage(
ActivityManagerService.SERVICE_TIMEOUT_MSG);
msg.obj = proc;
mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
}
}
if (anrMessage != null) {// a timed-out service was found: hand the anrMessage built above to proc.appNotResponding
proc.appNotResponding(null, null, null, null, false, anrMessage);
}
}
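The scan inside serviceTimeout can be isolated as a small pure function, which makes the two outcomes easier to see: either one executing service started before now minus the timeout, or the check has to be repeated later. This is a sketch only; the class, the Result type and the array representation of executingStart values are hypothetical.

```java
/** Stand-alone version of the scan in serviceTimeout; all names are hypothetical. */
final class ServiceTimeoutScan {
    /** Outcome of one scan: a timed-out entry, or the time at which to check again. */
    static final class Result {
        final int timedOutIndex;  // -1 when nothing has timed out yet
        final long recheckAtMs;   // only meaningful when timedOutIndex == -1

        Result(int timedOutIndex, long recheckAtMs) {
            this.timedOutIndex = timedOutIndex;
            this.recheckAtMs = recheckAtMs;
        }
    }

    static Result scan(long[] executingStartMs, long nowMs, long timeoutMs) {
        // Anything that started before maxTime has been executing for longer than the timeout.
        long maxTime = nowMs - timeoutMs;
        long nextTime = 0;
        for (int i = executingStartMs.length - 1; i >= 0; i--) {
            if (executingStartMs[i] < maxTime) {
                return new Result(i, 0);
            }
            if (executingStartMs[i] > nextTime) {
                nextTime = executingStartMs[i];
            }
        }
        // Nothing timed out: re-arm relative to the latest start time seen, as the original loop does.
        return new Result(-1, nextTime + timeoutMs);
    }

    public static void main(String[] args) {
        Result late = scan(new long[] {1_000, 15_000}, 25_000, 20_000);
        System.out.println("timed-out index: " + late.timedOutIndex);
        Result ok = scan(new long[] {10_000, 15_000}, 25_000, 20_000);
        System.out.println("re-check at: " + ok.recheckAtMs);
    }
}
```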
void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
String parentShortComponentName, WindowProcessController parentProcess,
boolean aboveSystem, String annotation) {
ArrayList<Integer> firstPids = new ArrayList<>(5);
SparseArray<Boolean> lastPids = new SparseArray<>(20);
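// If a controller was registered through ActivityManager.getService().setActivityController(), it is consulted first.
// When the controller returns a negative value the process is killed, and the "killed" check below then skips normal ANR handling.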
mWindowProcessController.appEarlyNotResponding(annotation, () -> kill("anr", true));
long anrTime = SystemClock.uptimeMillis();
if (isMonitorCpuUsage()) {
mService.updateCpuStatsNow();// refresh the CPU statistics
}
synchronized (mService) {
// PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
if (mService.mAtmInternal.isShuttingDown()) {// the device is shutting down
Slog.i(TAG, "During shutdown skipping ANR: " + this + " " + annotation);
return;
} else if (isNotResponding()) {// an ANR is already being handled for this process
Slog.i(TAG, "Skipping duplicate ANR: " + this + " " + annotation);
return;
} else if (isCrashing()) {// the app is already crashing
Slog.i(TAG, "Crashing app skipping ANR: " + this + " " + annotation);
return;
} else if (killedByAm) {// already killed by AMS, for example when memory is low: AMS picks victims by adj value, background processes first
Slog.i(TAG, "App already killed by AM skipping ANR: " + this + " " + annotation);
return;
} else if (killed) {// the process is already dead, for example killed by the appEarlyNotResponding callback above
Slog.i(TAG, "Skipping died app ANR: " + this + " " + annotation);
return;
}
// In case we come through here for the same app before completing
// this one, mark as anring now so we will bail out.
setNotResponding(true);
// Log the ANR to the event log.
// Write an event log entry; the keyword is am_anr
EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags,
annotation);
// Dump thread traces as quickly as we can, starting with "interesting" processes.
firstPids.add(pid);// collect PIDs for the stack dump that follows
// Don't dump other PIDs if it's a background ANR
// For a silent (background) ANR, only the ANR process itself is dumped.
if (!isSilentAnr()) {
int parentPid = pid;
if (parentProcess != null && parentProcess.getPid() > 0) {
parentPid = parentProcess.getPid();
}
if (parentPid != pid) firstPids.add(parentPid);
if (MY_PID != pid && MY_PID != parentPid) firstPids.add(MY_PID);
for (int i = getLruProcessList().size() - 1; i >= 0; i--) {
ProcessRecord r = getLruProcessList().get(i);
if (r != null && r.thread != null) {
int myPid = r.pid;
if (myPid > 0 && myPid != pid && myPid != parentPid && myPid != MY_PID) {
if (r.isPersistent()) {
firstPids.add(myPid);
if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
} else if (r.treatLikeActivity) {
firstPids.add(myPid);
if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
} else {
lastPids.put(myPid, Boolean.TRUE);
if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
}
}
}
}
}
}
final ProcessRecord parentPr = parentProcess != null
? (ProcessRecord) parentProcess.mOwner : null;
/// M: ANR Debug Mechanism
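// If AnrManager handles the ANR, the rest of this method is skipped. The stock AnrManager.startAnrDump does nothing,
// so it is up to each chip vendor to supply an implementation.
// MTK uses the property persist.vendor.anr.enhancement to decide whether to handle the ANR itself. It is normally off, so the standard Google ANR flow runs.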
if (mService.mAnrManager.startAnrDump(mService, this, activityShortComponentName, aInfo,
parentShortComponentName, parentPr, aboveSystem, annotation, getShowBackground(),
anrTime))
return;
// Log the ANR to the main log.
// Build the ANR text that is written to the main log.
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(processName);
if (activityShortComponentName != null) {
info.append(" (").append(activityShortComponentName).append(")");
}
info.append("\n");
info.append("PID: ").append(pid).append("\n");
if (annotation != null) {
info.append("Reason: ").append(annotation).append("\n");
}
if (parentShortComponentName != null
&& parentShortComponentName.equals(activityShortComponentName)) {
info.append("Parent: ").append(parentShortComponentName).append("\n");
}
ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// don't dump native PIDs for background ANRs unless it is the process of interest
String[] nativeProcs = null;
// For a silent (background) ANR, native stacks are dumped only if the ANR process itself is listed in NATIVE_STACKS_OF_INTEREST.
if (isSilentAnr()) {
for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
if (NATIVE_STACKS_OF_INTEREST[i].equals(processName)) {
nativeProcs = new String[] { processName };
break;
}
}
} else {
nativeProcs = NATIVE_STACKS_OF_INTEREST;
}
int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
ArrayList<Integer> nativePids = null;
if (pids != null) {
nativePids = new ArrayList<>(pids.length);
for (int i : pids) {
nativePids.add(i);
}
}
// For background ANRs, don't pass the ProcessCpuTracker to
// avoid spending 1/2 second collecting stats to rank lastPids.
// Write the ANR stack traces to a file under data/anr. The file name has the form anr_yyyy-MM-dd-HH-mm-ss-SSS;
// see AMS createAnrDumpFile for the details.
File tracesFile = ActivityManagerService.dumpStackTraces(firstPids,
(isSilentAnr()) ? null : processCpuTracker, (isSilentAnr()) ? null : lastPids,
nativePids);
String cpuInfo = null;
if (isMonitorCpuUsage()) {
mService.updateCpuStatsNow();// refresh the CPU statistics
synchronized (mService.mProcessCpuTracker) {
cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
}
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
}
info.append(processCpuTracker.printCurrentState(anrTime));
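Slog.e(TAG, info.toString());// print the ANR summary, including CPU usage, to the main log
if (tracesFile == null) {// no trace file could be created: send SIGQUIT so the process dumps its own threads (this does not kill it)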
// There is no trace file, so dump (only) the alleged culprit's threads to the log
Process.sendSignal(pid, Process.SIGNAL_QUIT);
}
StatsLog.write(StatsLog.ANR_OCCURRED, uid, processName,
activityShortComponentName == null ? "unknown": activityShortComponentName,
annotation,
(this.info != null) ? (this.info.isInstantApp()
? StatsLog.ANROCCURRED__IS_INSTANT_APP__TRUE
: StatsLog.ANROCCURRED__IS_INSTANT_APP__FALSE)
: StatsLog.ANROCCURRED__IS_INSTANT_APP__UNAVAILABLE,
isInterestingToUserLocked()
? StatsLog.ANROCCURRED__FOREGROUND_STATE__FOREGROUND
: StatsLog.ANROCCURRED__FOREGROUND_STATE__BACKGROUND,
getProcessClassEnum(),
(this.info != null) ? this.info.packageName : "");
// Add the report to DropBoxManager; from this point the ANR information reported by DropBoxManager is visible in the main log
mService.addErrorToDropBox("anr", this, processName, activityShortComponentName,
parentShortComponentName, parentPr, annotation, cpuInfo, tracesFile, null);
if (mWindowProcessController.appNotResponding(info.toString(), () -> kill("anr", true),
() -> {
synchronized (mService) {
mService.mServices.scheduleServiceTimeoutLocked(this);
}
})) {
return;
}
synchronized (mService) {
// mBatteryStatsService can be null if the AMS is constructed with injector only. This
// will only happen in tests.
if (mService.mBatteryStatsService != null) {
mService.mBatteryStatsService.noteProcessAnr(processName, uid);// notify BatteryStatsService
}
if (isSilentAnr() && !isDebugging()) {// a silent (background) ANR kills the process directly, with no dialog
kill("bg anr", true);
return;
}
// Set the app's notResponding state, and look up the errorReportReceiver
makeAppNotRespondingLocked(activityShortComponentName,
annotation != null ? "ANR " + annotation : "ANR", info.toString());
// mUiHandler can be null if the AMS is constructed with injector only. This will only
// happen in tests.
// Send a message to the UI handler to show the AppNotResponding dialog.
if (mService.mUiHandler != null) {
// Bring up the infamous App Not Responding dialog
Message msg = Message.obtain();
msg.what = ActivityManagerService.SHOW_NOT_RESPONDING_UI_MSG;
msg.obj = new AppNotRespondingDialog.Data(this, aInfo, aboveSystem);
mService.mUiHandler.sendMessage(msg);
}
}
}
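The way appNotResponding splits processes into firstPids and lastPids decides whose stacks end up in the trace file, so it is worth isolating. The sketch below reproduces only that selection; the Proc and Selection types are hypothetical, and the caller is assumed to pass the ANR process's own pid as parentPid when there is no parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone version of the PID selection in appNotResponding; all names are hypothetical. */
final class AnrPidSelection {
    static final class Proc {
        final int pid;
        final boolean persistent;
        final boolean treatLikeActivity;

        Proc(int pid, boolean persistent, boolean treatLikeActivity) {
            this.pid = pid;
            this.persistent = persistent;
            this.treatLikeActivity = treatLikeActivity;
        }
    }

    static final class Selection {
        final List<Integer> firstPids = new ArrayList<>(); // always dumped, in this order
        final List<Integer> lastPids = new ArrayList<>();  // dumped only if among the top CPU users
    }

    static Selection select(List<Proc> lru, int anrPid, int parentPid, int systemPid,
            boolean silentAnr) {
        Selection s = new Selection();
        s.firstPids.add(anrPid);
        if (silentAnr) {
            return s; // background ANR: only the ANR process itself
        }
        if (parentPid != anrPid) s.firstPids.add(parentPid);
        if (systemPid != anrPid && systemPid != parentPid) s.firstPids.add(systemPid);
        // Walk the LRU list from most to least recently used.
        for (int i = lru.size() - 1; i >= 0; i--) {
            Proc p = lru.get(i);
            if (p.pid <= 0 || p.pid == anrPid || p.pid == parentPid || p.pid == systemPid) {
                continue;
            }
            if (p.persistent || p.treatLikeActivity) {
                s.firstPids.add(p.pid);
            } else {
                s.lastPids.add(p.pid);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<Proc> lru = Arrays.asList(new Proc(10, false, false), new Proc(20, true, false),
                new Proc(30, false, true), new Proc(40, false, false));
        Selection s = select(lru, 10, 10, 1, false);
        System.out.println("firstPids=" + s.firstPids + " lastPids=" + s.lastPids);
    }
}
```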
public boolean appNotResponding(String info, Runnable killAppCallback,
Runnable serviceTimeoutCallback) {
Runnable targetRunnable = null;
synchronized (mAtm.mGlobalLock) {
if (mAtm.mController == null) {
return false;
}
try {
// 0 == show dialog, 1 = keep waiting, -1 = kill process immediately
int res = mAtm.mController.appNotResponding(mName, mPid, info);
if (res != 0) {
if (res < 0 && mPid != MY_PID) {
targetRunnable = killAppCallback;
} else {
targetRunnable = serviceTimeoutCallback;
}
}
} catch (RemoteException e) {
mAtm.mController = null;
Watchdog.getInstance().setActivityController(null);
return false;
}
}
if (targetRunnable != null) {
targetRunnable.run();
return true;
}
return false;
}
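The controller branch above boils down to a three-way decision on the controller's return value. A minimal sketch of that decision, with a hypothetical class and enum, might look like this:

```java
/** Stand-alone version of the decision in WindowProcessController.appNotResponding; names are hypothetical. */
final class AnrControllerDecision {
    enum Action { SHOW_DIALOG, KEEP_WAITING, KILL_PROCESS }

    /**
     * @param controllerResult 0 = show dialog, 1 = keep waiting, -1 = kill process immediately
     * @param pid              pid of the process that is not responding
     * @param systemPid        pid of system_server, which is never killed here
     */
    static Action decide(int controllerResult, int pid, int systemPid) {
        if (controllerResult == 0) {
            return Action.SHOW_DIALOG;
        }
        if (controllerResult < 0 && pid != systemPid) {
            return Action.KILL_PROCESS;
        }
        // Every other non-zero result re-arms the service timeout and waits again.
        return Action.KEEP_WAITING;
    }

    public static void main(String[] args) {
        System.out.println(decide(0, 1234, 1000));
        System.out.println(decide(1, 1234, 1000));
        System.out.println(decide(-1, 1234, 1000));
        System.out.println(decide(-1, 1000, 1000));
    }
}
```

One detail that is easy to miss: a negative result for system_server itself falls through to the keep-waiting branch rather than killing the process.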
public static File dumpStackTraces(ArrayList<Integer> firstPids,
ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids,
ArrayList<Integer> nativePids) {
ArrayList<Integer> extraPids = null;
Slog.i(TAG, "dumpStackTraces pids=" + lastPids + " nativepids=" + nativePids);// written to the system log
// Measure CPU usage as soon as we're called in order to get a realistic sampling
// of the top users at the time of the request.
// For a background ANR processCpuTracker is null, so no CPU information is collected
if (processCpuTracker != null) {
processCpuTracker.init();
try {
Thread.sleep(200);
} catch (InterruptedException ignored) {
}
processCpuTracker.update();
// We'll take the stack crawls of just the top apps using CPU.
final int N = processCpuTracker.countWorkingStats();
extraPids = new ArrayList<>();
for (int i = 0; i < N && extraPids.size() < 5; i++) {
ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i);
if (lastPids.indexOfKey(stats.pid) >= 0) {
if (DEBUG_ANR) Slog.d(TAG, "Collecting stacks for extra pid " + stats.pid);
extraPids.add(stats.pid);
} else {
Slog.i(TAG, "Skipping next CPU consuming process, not a java proc: "
+ stats.pid);
}
}
}
final File tracesDir = new File(ANR_TRACE_DIR);// the ANR trace directory, data/anr
// Each set of ANR traces is written to a separate file and dumpstate will process
// all such files and add them to a captured bug report if they're recent enough.
maybePruneOldTraces(tracesDir);
// NOTE: We should consider creating the file in native code atomically once we've
// gotten rid of the old scheme of dumping and lot of the code that deals with paths
// can be removed.
File tracesFile = createAnrDumpFile(tracesDir);// create a file named anr_yyyy-MM-dd-HH-mm-ss-SSS
if (tracesFile == null) {
return null;
}
dumpStackTraces(tracesFile.getAbsolutePath(), firstPids, nativePids, extraPids);
return tracesFile;
}
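The extraPids selection above walks processes in the order reported by the CPU tracker and keeps at most five that also appear in lastPids. As a sketch (hypothetical names, and assuming the input list is already sorted by CPU usage, highest first):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone version of the extraPids selection in dumpStackTraces; all names are hypothetical. */
final class ExtraPidPicker {
    static List<Integer> pick(List<Integer> pidsByCpuDescending, Set<Integer> lastPids, int limit) {
        List<Integer> extraPids = new ArrayList<>();
        for (int i = 0; i < pidsByCpuDescending.size() && extraPids.size() < limit; i++) {
            int pid = pidsByCpuDescending.get(i);
            // Only Java processes that were recorded in lastPids are worth dumping.
            if (lastPids.contains(pid)) {
                extraPids.add(pid);
            }
        }
        return extraPids;
    }

    public static void main(String[] args) {
        List<Integer> byCpu = Arrays.asList(7, 3, 9, 4, 8, 2, 6);
        Set<Integer> lastPids = new HashSet<>(Arrays.asList(3, 4, 8, 2, 6, 5, 1));
        System.out.println(pick(byCpu, lastPids, 5));
    }
}
```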
public static void dumpStackTraces(String tracesFile, ArrayList<Integer> firstPids,
ArrayList<Integer> nativePids, ArrayList<Integer> extraPids) {
Slog.i(TAG, "Dumping to " + tracesFile);// written to the system log
// We don't need any sort of inotify based monitoring when we're dumping traces via
// tombstoned. Data is piped to an "intercept" FD installed in tombstoned so we're in full
// control of all writes to the file in question.
// We must complete all stack dumps within 20 seconds.
long remainingTime = 20 * 1000;
// First collect all of the stacks of the most important pids.
// Dump the stack traces of the Java processes to tracesFile
if (firstPids != null) {
int num = firstPids.size();
for (int i = 0; i < num; i++) {
Slog.i(TAG, "Collecting stacks for pid " + firstPids.get(i));
final long timeTaken = dumpJavaTracesTombstoned(firstPids.get(i), tracesFile,
remainingTime);
remainingTime -= timeTaken;
if (remainingTime <= 0) {
Slog.e(TAG, "Aborting stack trace dump (current firstPid=" + firstPids.get(i) +
"); deadline exceeded.");
return;
}
if (DEBUG_ANR) {
Slog.d(TAG, "Done with pid " + firstPids.get(i) + " in " + timeTaken + "ms");
}
}
}
// Next collect the stacks of the native pids
// Dump the stack traces of the native processes to tracesFile
if (nativePids != null) {
for (int pid : nativePids) {
Slog.i(TAG, "Collecting stacks for native pid " + pid);
final long nativeDumpTimeoutMs = Math.min(NATIVE_DUMP_TIMEOUT_MS, remainingTime);
final long start = SystemClock.elapsedRealtime();
Debug.dumpNativeBacktraceToFileTimeout(
pid, tracesFile, (int) (nativeDumpTimeoutMs / 1000));
final long timeTaken = SystemClock.elapsedRealtime() - start;
remainingTime -= timeTaken;
if (remainingTime <= 0) {
Slog.e(TAG, "Aborting stack trace dump (current native pid=" + pid +
"); deadline exceeded.");
return;
}
if (DEBUG_ANR) {
Slog.d(TAG, "Done with native pid " + pid + " in " + timeTaken + "ms");
}
}
}
// Lastly, dump stacks for all extra PIDs from the CPU tracker.
if (extraPids != null) {
for (int pid : extraPids) {
Slog.i(TAG, "Collecting stacks for extra pid " + pid);
final long timeTaken = dumpJavaTracesTombstoned(pid, tracesFile, remainingTime);
remainingTime -= timeTaken;
if (remainingTime <= 0) {
Slog.e(TAG, "Aborting stack trace dump (current extra pid=" + pid +
"); deadline exceeded.");
return;
}
if (DEBUG_ANR) {
Slog.d(TAG, "Done with extra pid " + pid + " in " + timeTaken + "ms");
}
}
}
Slog.i(TAG, "Done dumping");// the dump is complete
}
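All three loops in dumpStackTraces share one 20-second budget, which is why a trace file can be missing the later processes. The following sketch models that budget with made-up per-dump costs; the class name is hypothetical, and it assumes each dump is cut off once the remaining budget is used up.

```java
/** Toy model of the shared time budget in dumpStackTraces; all names are hypothetical. */
final class DumpBudget {
    /** Returns how many dumps are attempted before the budget runs out. */
    static int dumpsAttempted(long budgetMs, long[] dumpCostMs) {
        long remaining = budgetMs;
        int attempted = 0;
        for (long cost : dumpCostMs) {
            attempted++;
            // Assumption: a dump never runs past the time that is left.
            long timeTaken = Math.min(cost, remaining);
            remaining -= timeTaken;
            if (remaining <= 0) {
                break; // "deadline exceeded": every later pid is skipped
            }
        }
        return attempted;
    }

    public static void main(String[] args) {
        long[] costs = {5_000, 8_000, 9_000, 1_000};
        System.out.println("dumps attempted: " + dumpsAttempted(20_000, costs));
    }
}
```

When reading a trace file, this means that a slow dump of an early pid can be the reason a later process is absent.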
private static synchronized File createAnrDumpFile(File tracesDir) {
if (sAnrFileDateFormat == null) {
sAnrFileDateFormat = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss-SSS");
}
final String formattedDate = sAnrFileDateFormat.format(new Date());
final File anrFile = new File(tracesDir, "anr_" + formattedDate);
try {
if (anrFile.createNewFile()) {
FileUtils.setPermissions(anrFile.getAbsolutePath(), 0600, -1, -1); // -rw-------
return anrFile;
} else {
Slog.w(TAG, "Unable to create ANR dump file: createNewFile failed");
}
} catch (IOException ioe) {
Slog.w(TAG, "Exception creating ANR dump file:", ioe);
}
return null;
}
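To see what a file name produced by this pattern looks like, the formatting step can be run on its own. This is a sketch: the class and method names are hypothetical, and the explicit Locale and TimeZone are added here only to make the output reproducible, whereas the AOSP code above uses the defaults.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Reproduces the ANR file naming pattern; class and method names are hypothetical. */
final class AnrFileName {
    static String forDate(Date date, TimeZone zone) {
        // Same pattern as createAnrDumpFile: year-month-day-hour-minute-second-millisecond.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss-SSS", Locale.US);
        format.setTimeZone(zone);
        return "anr_" + format.format(date);
    }

    public static void main(String[] args) {
        System.out.println(forDate(new Date(), TimeZone.getDefault()));
    }
}
```

Because the millisecond field is part of the name, files sort chronologically when listed by name.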
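Having read the code above, we can answer the two questions raised earlier. To summarize:
Question 1. How does the system react when an ANR occurs?
1. It checks whether the process is executing services in the foreground or the background. For foreground execution it checks whether the elapsed time exceeds SERVICE_TIMEOUT (20s); for background execution, SERVICE_BACKGROUND_TIMEOUT (200s). If the time has not yet elapsed, it re-arms the timer: mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg ? (nextTime + SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT)).
2. On timeout, whether foreground or background, it builds anrMessage, logs Slog.w(TAG, "Timeout executing service: " + timeout), and passes anrMessage to proc.appNotResponding.
3. If a controller was registered through ActivityManager.getService().setActivityController, that is, the ANR is handled by custom code, the controller is consulted first and the process is killed with kill("anr", true) when the controller asks for it.
4. The CPU statistics are refreshed. In some special cases no data/anr trace file is written; see the code comments above.
5. An event log entry is written with the keyword am_anr.
6. Background processes are treated specially: no CPU information is collected and the stacks of other PIDs are not dumped.
7. If the chip vendor has customized AnrManager and AnrManager handles the ANR, nothing further is done, including the data/anr trace file and the dialog. The stock AnrManager.startAnrDump does nothing, so it is up to each chip vendor to supply an implementation. MTK uses the property persist.vendor.anr.enhancement to decide whether to handle the ANR itself. It is normally off, so the standard Google ANR flow runs.
8. The ANR stack traces are written to a file under data/anr. The file name has the form anr_yyyy-MM-dd-HH-mm-ss-SSS; see AMS createAnrDumpFile for the details.
9. A background process is killed directly, with no dialog. For a foreground process, a message is sent to the UI handler and the AppNotResponding dialog is shown.
A few points from the above deserve attention: 1. Not every ANR produces a data/anr trace file; the exceptions are ANRs handled by custom code and cases where creating the trace file fails. 2. A background process gets no dialog and is killed directly, and only its own PID is dumped, not the other processes.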
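Question 2. Which logs does the system emit when an ANR occurs, and which of them show that an ANR has happened?
1. Slog.w(TAG, "Timeout executing service: " + timeout), with TAG ActivityManager.
2. EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags, annotation). This entry is not written in some cases: when custom ANR handling has already killed the process, and in the isShuttingDown, isNotResponding, isCrashing, killedByAm and killed cases.
3. A file is created under data/anr/ that holds the stack traces of the collected PIDs; the CPU information goes to the main log and to DropBox.
4. DropBoxManager also records ANR trace information.
5. For a foreground process, an AppNotRespondingDialog is shown as well.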
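The text markers listed above can be searched for mechanically in a captured log. The sketch below does this with plain substring matching; the class name is hypothetical, and the sample lines in main are made up for illustration rather than copied from a real device, so the exact line layout of a real logcat capture may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Finds lines that indicate a service ANR; the class name and sample lines are hypothetical. */
final class AnrLogScanner {
    // Markers taken from the strings in the code walked through above.
    private static final String[] MARKERS = {
        "Timeout executing service: ", // Slog.w in serviceTimeout
        "am_anr",                      // event log tag
        "ANR in ",                     // first line of the main-log summary
    };

    static List<String> findAnrLines(List<String> logLines) {
        List<String> hits = new ArrayList<>();
        for (String line : logLines) {
            for (String marker : MARKERS) {
                if (line.contains(marker)) {
                    hits.add(line);
                    break; // one match per line is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList(
                "W ActivityManager: Timeout executing service: ServiceRecord{com.example/.SlowService}",
                "I am_anr: [0,1234,com.example,0,executing service com.example/.SlowService]",
                "D SomeOtherTag: unrelated line",
                "E ActivityManager: ANR in com.example");
        for (String hit : findAnrLines(sample)) {
            System.out.println(hit);
        }
    }
}
```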