转载自:http://wenku.baidu.com/link?url=fUXG2wO3ECS2nZorr6MtTH8_PNCQESk-h8UpQdt1NH0_AjtgwbwW2b7PIO7oJ0OyJJzr62uB0202rXyNBK3tQtqGV3hxr_WVw6UrB7imO_a
内容目录
1 ANR是如何产生的
1.1 ANR产生条件
ANR的产生需要同时满足三个条件:
- 主线程:只有应用程序进程的主线程响应超时才会产生ANR;
- 超时时间:产生ANR的上下文不同,超时时间也不同,但只要超过这个时间上限没有响应就会产生ANR;
- 输入事件/特定操作:输入事件是指按键、触屏等设备输入事件,特定操作是指BroadcastReceiver和Service的生命周期中的各个函数调用。
产生ANR的上下文不同,导致ANR原因也不同,主要有以下三种情况:
- 应用进程的主线程对输入事件在5s内没有处理完毕;
- 应用进程的主线程在执行BroadcastRecevier的onReceive函数时10s内没有处理完毕;
- 应用进程的主线程在执行Service的各个生命周期函数时20s内没有处理完毕;
1.2 引起ANR的根本原因
引起ANR的根本原因,总的来说可以归纳为两类:
- 应用进程自身引起的,比如:主线程阻塞、挂起、死循环,执行耗时操作等;
- 其他进程引起的,比如:其他进程CPU占用率过高,导致当前应用进程无法抢占到CPU时间片。常见的问题如文件读写频繁,io进程CPU占用率过高,导致当前应用出现ANR;
2 ANR原因分析方法
2.1 通过logcat日志分析原因
举一个开发过程中的ANR实例,《从云盘上下载视频到本地,进入下载中心删除任务和文件,进入文件管理提示‘文件管理无响应’》,在logcat日志中输出了以下ANR信息:
01-21 23:58:40.265 975 991 E ActivityManager: ANR in com.letv.filemanager 01-21 23:58:40.265 975 991 E ActivityManager: Reason: Executing service com.letv.filemanager/.netstorage.service.DlnaService 01-21 23:58:40.265 975 991 E ActivityManager: Load: 5.62 / 4.3 / 3.55 01-21 23:58:40.265 975 991 E ActivityManager: CPU usage from 30331ms to 0ms ago: 01-21 23:58:40.265 975 991 E ActivityManager: 26% 1760/com.letv.airplay: 24% user + 1.6% kernel / faults: 1094 minor 01-21 23:58:40.265 975 991 E ActivityManager: 15% 787/cdnclient: 0% user + 15% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 13% 298/kswapd0: 0% user + 13% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 9.5% 1437/com.letv.dmr: 9% user + 0.5% kernel / faults: 572 minor 01-21 23:58:40.265 975 991 E ActivityManager: 9.1% 953/flush-179:0: 0% user + 9.1% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5.8% 788/collector: 0.8% user + 4.9% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5% 419/mmcqd: 0% user + 5% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 2.9% 814/ld-linux.so.3: 1.1% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 2.5% 822/dd: 0.2% user + 2.2% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.4% 2533/com.letv.threeScreen: 0.7% user + 0.7% kernel / faults: 134 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.5% 817/logcat: 0% user + 0.4% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.4% 7/events/0: 0% user + 0.4% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.4% 1254/com.letv.t2.launcher: 0.4% user + 0% kernel / faults: 20 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.2% 975/system_server: 0% user + 0.1% kernel / faults: 34 minor 1 major 01-21 23:58:40.265 975 991 E ActivityManager: 0.2% 1452/com.letv.downloads: 0.1% user + 0.1% kernel / faults: 40 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.1% 808/virtualkeypad: 0% user + 0.1% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.1% 1292/RTW_CMD_THREAD: 0% user + 0.1% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 753/jbd2/mmcblk9-8: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 8/events/1: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 177/bdi-default: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 786/letvmanager: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 1472/com.letv.filemanager: 0% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0% 1737/com.letv.medialog: 0% user + 0% kernel / faults: 48 minor 01-21 23:58:40.265 975 991 E ActivityManager: 62% TOTAL: 7.2% user + 10% kernel + 43% iowait + 1% softirq 01-21 23:58:40.265 975 991 E ActivityManager: CPU usage from 3195ms to 3733ms later: 01-21 23:58:40.265 975 991 E ActivityManager: 15% 975/system_server: 3.7% user + 11% kernel / faults: 1 minor 01-21 23:58:40.265 975 991 E ActivityManager: 15% 991/ActivityManager: 3.7% user + 11% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 7.5% 1760/com.letv.airplay: 7.5% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5.6% 2615/SocketListener(: 5.6% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 5.5% 419/mmcqd: 0% user + 5.5% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 298/kswapd0: 0% user + 3.7% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 788/collector: 0% user + 3.7% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 866/collector: 0% user + 3.7% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.8% 865/collector: 0% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 3.7% 814/ld-linux.so.3: 1.8% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.8% 1009/814Linux hotplu: 1.8% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 1.8% 1013/IRQThread_157: 0% user + 1.8% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 0.3% 1254/com.letv.t2.launcher: 0.3% user + 0% kernel / faults: 1 minor 01-21 23:58:40.265 975 991 E ActivityManager: 0.3% 1254/etv.t2.launcher: 0.3% user + 0% kernel 01-21 23:58:40.265 975 991 E ActivityManager: 54% TOTAL: 2.9% user + 5.9% kernel + 45% iowait |
我们可以通过ActivityManagerService的appNotResponding方法看一下ANR信息是如何记录到logcat日志中的:
.\framework\base\services\Java\com\Android\server\am\ActivityManagerService.java
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { ...... synchronized (this) { // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down. if (mShuttingDown) { //重启时忽略ANR Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation); return; } else if (app.notResponding) { //之前已处理过则不再处理 Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation); return; } else if (app.crashing) { //已经Crash的进程则不再处理ANR Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation); return; }
//设置ANR标志,用于防止重复处理 // In case we come through here for the same app before completing // this one, mark as anring now so we will bail out. app.notResponding = true;
// Log the ANR to the event log. EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid, app.processName, app.info.flags, annotation);
//将发生ANR的应用pid放入firstPids集合中 // Dump thread traces as quickly as we can, starting with "interesting" processes. firstPids.add(app.pid);
//将父PID也放入firstPids集合 int parentPid = app.pid; if (parent != null && parent.app != null && parent.app.pid > 0) parentPid = parent.app.pid; if (parentPid != app.pid) firstPids.add(parentPid);
//将主PID放入firstPids集合 if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
//将persistent进程放入firstPids集合 for (int i = mLruProcesses.size() - 1; i >= 0; i--) { ProcessRecord r = mLruProcesses.get(i); if (r != null && r.thread != null) { int pid = r.pid; if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) { if (r.persistent) { firstPids.add(pid); } else { lastPids.put(pid, Boolean.TRUE); } } } } } // 将ANR信息写入log // Log the ANR to the main log. StringBuilder info = new StringBuilder(); info.setLength(0); info.append("ANR in ").append(app.processName); if (activity != null && activity.shortComponentName != null) { info.append(" (").append(activity.shortComponentName).append(")"); } info.append("\n"); info.append("PID: ").append(app.pid).append("\n"); if (annotation != null) { info.append("Reason: ").append(annotation).append("\n"); } if (parent != null && parent != activity) { info.append("Parent: ").append(parent.shortComponentName).append("\n"); }
final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids, NATIVE_STACKS_OF_INTEREST);
String cpuInfo = null; if (MONITOR_CPU_USAGE) { updateCpuStatsNow(); synchronized (mProcessCpuThread) { cpuInfo = mProcessCpuTracker.printCurrentState(anrTime); } info.append(processCpuTracker.printCurrentLoad()); info.append(cpuInfo); }
info.append(processCpuTracker.printCurrentState(anrTime));
Slog.e(TAG, info.toString()); ...... }
|
AMS在发生ANR时,会按以下格式将ANR信息记录到logcat日志中:
ActivityManager: ANR in [Process Name]([short component name]) ActivityManager: PID: [Application Pid] ActivityManager: Reason: [Annotation] ActivityManager: Parent: [short component name of parent] [Process CPU state] |
其中的Reason信息可以给出ANR产生原因的一些详细信息,
· 输入事件处理引起的ANR,提示信息格式为:“Inputdispatching timed out [anr reason]”
· Service处理引起的ANR,提示信息格式为:“Executingservice [Package name]/[Short class name]”
· Broadcast处理引起的ANR,提示信息格式为:“Broadcast of [Intent focused byBroadcast receiver]”
其中的Process CPU state 信息格式如下:
ActivityManager: Load: [Load1] / [Load5] / [Load15] ActivityManager: CPU usage from [上次采样与现在的时间差] ms to [当前采样与现在的时间差] ms ago/later: ActivityManager: [总的CPU时间占用率]% [PID]/[Process Name]: [用户CPU时间占用率]% user + [系统CPU时间占用率]% kernel + [IO等待CPU时间占用率]% iowait + [硬中断CPU时间占用率]% irq + [软中断CPU时间占用率]% softirq / faults: [次要页错误/主要页错误] minor/major ActivityManager: ..... ActivityManager: [CPU时间占用率合计统计] |
其中Load1, Load5, Load15分别为CPU 1分钟平均任务负载数,5分钟平均任务负载数,15分钟平均任务负载数,平均任务负载数和CPU占用率并没有必然联系,可以作为参考信息。可以通过分析各进程的CPU时间占用率,来判断是否为某些进程长期占用CPU导致该进程无法获取到足够的CPU处理时间,而导致发生ANR。
这里需要重点关注的是Load1,各进程总的CPU时间占用率,用户CPU时间占用率,系统CPU时间占用率,以及iowait CPU时间占用率。
2.2 通过trace文件分析原因
ActivityManagerService的appNotResponding方法在写入logcat日志的同时,还会将ANR时的stack trace信息写入到trace文件,具体代码流程如下:
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { ...... File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids, NATIVE_STACKS_OF_INTEREST); ...... } public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) { //判断系统属性,确认trace文件路径 String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null); if (tracesPath == null || tracesPath.length() == 0) { return null; } //若trace文件不存在则创建,并设置可读写属性 File tracesFile = new File(tracesPath); try { File tracesDir = tracesFile.getParentFile(); if (!tracesDir.exists()) { tracesFile.mkdirs(); if (!SELinux.restorecon(tracesDir)) { return null; } } FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1); // drwxrwxr-x
if (clearTraces && tracesFile.exists()) tracesFile.delete(); tracesFile.createNewFile(); FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw- } catch (IOException e) { Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e); return null; } //执行dump操作 dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs); return tracesFile; }
private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) { //设置file observer // Use a FileObserver to detect when traces finish writing. // The order of traces is considered important to maintain for legibility. FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) { @Override public synchronized void onEvent(int event, String path) { notify(); } };
try { observer.startWatching(); //向firstPids进程集合发送关闭信号,虚拟机接收到SIGNAL_QUIT信号后会将进程中各个线程的函//数堆栈信息输出到traces.txt文件中 // First collect all of the stacks of the most important pids. if (firstPids != null) { try { int num = firstPids.size(); for (int i = 0; i < num; i++) { synchronized (observer) { Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT); observer.wait(200); // Wait for write-close, give up after 200msec } } } catch (InterruptedException e) { Log.wtf(TAG, e); } }
// Next collect the stacks of the native pids if (nativeProcs != null) { int[] pids = Process.getPidsForCommands(nativeProcs); if (pids != null) { for (int pid : pids) { Debug.dumpNativeBacktraceToFile(pid, tracesPath); } } } //进行CPU占用率采样,并将占用率排在前几位的进程的stack trace信息dump到文件 // Lastly, measure CPU usage. if (processCpuTracker != null) { processCpuTracker.init(); System.gc(); processCpuTracker.update(); try { synchronized (processCpuTracker) { processCpuTracker.wait(500); // measure over 1/2 second. } } catch (InterruptedException e) { } processCpuTracker.update();
// We'll take the stack crawls of just the top apps using CPU. final int N = processCpuTracker.countWorkingStats(); int numProcs = 0; for (int i=0; i<N && numProcs<5; i++) { ProcessCpuTracker.Stats stats = processCpuTracker.getWorkingStats(i); if (lastPids.indexOfKey(stats.pid) >= 0) { numProcs++; try { synchronized (observer) { Process.sendSignal(stats.pid, Process.SIGNAL_QUIT); observer.wait(200); // Wait for write-close, give up after 200msec } } catch (InterruptedException e) { Log.wtf(TAG, e); }
} } } } finally { observer.stopWatching(); } } |
其中firstPids的Pid内容可以参考appNotResponding方法中的实现逻辑。
3 ANR解决方法
因为ANR主要是因为主线程由于耗时操作被阻塞而产生的,所以常见的解决方法是不在主线程做耗时操作,具体实现时需要注意以下几点:
· 主线程需要做耗时操作时,比如网络访问、数据库操作及位图变换等,必须启动一子线程处理,并利用handler来更新UI;
· 子线程尽量使用Android提供的API,比如HandlerThread,AsyncTask,AsyncQueryHandler等,这些API都提供了对于线程的系统级管理。如果应用直接使用Thread实现的话,则需要对这些子线程进行显式管理,比如线程池及线程周期的控制,以防止系统资源和内存泄漏;
· Broadcast Receiver中如果有耗时操作,可以放到service中来处理;
· 在后台子线程处理耗时操作时,为了提高用户体验,可以在前台界面显示某些动画或者progress bar;
4 参考资料
http://rayleeya.iteye.com/blog/1955652