引起ANR问题的根本原因,总的来说可以归纳为两类:
- 应用进程自身引起的,例如: 主线程阻塞、挂起、死循环 应用进程的其他线程的CPU占用率高,使得主线程无法抢占到CPU时间片 2.其他进程间接引起的,例如: 当前应用进程进行进程间通信请求其他进程,其他进程的操作长时间没有反馈 其他进程的CPU占用率高,使得当前应用进程无法抢占到CPU时间片 分析ANR问题时,以上述可能的2种原因为线索,通过分析各种日志信息,大多数情况下你就可以很容易找到问题所在了。
从log分析
在BottomBarManager里, 当点击find_in_page button时, sleep 10秒,模拟一个ANR。 用"ActivityManager"作为filter tag, 可以得到下面的信息。(btw, ultraedit有个filter 字符串的功能非常好, 看看linux下有没有类似的功能?) 可以得到ANR发生在哪个进程, reason, 在ANR发生前一段时间和发生后的一段时间, CPU占用率的统计信息.
06-02 11:39:42.260 743-943/? E/ActivityManager﹕ ANR in com.qihoo.browser (com.qihoo.browser/org.chromium. chrome.browser.ChromeTabbedActivity) Reason: keyDispatchingTimedOut //CPU前一分钟、五分钟、十五分钟的CPU平均负载, //CPU平均负载可以理解为一段时间内正在使用和等待使用CPU的活动进程的平均数量。 Load: 5.16 / 9.69 / 30.66 //请注意ago,表示ANR发生之前的一段时间内的CPU使用率,并不是某一时刻的值 CPU usage from 34388ms to -1ms ago: 4.1% 32614/com.qihoo.browser: 2.5% user + 1.6% kernel / faults: 465 minor 1 major 3% 379/adbd: 0.2% user + 2.8% kernel / faults: 1653 minor 2.6% 743/system_server: 1.8% user + 0.7% kernel / faults: 689 minor 2.5% 2326/com.qihoo.daemon: 2.2% user + 0.2% kernel / faults: 601 minor 2.4% 1009/RX_Thread: 0% user + 2.4% kernel 2% 280/surfaceflinger: 1.2% user + 0.8% kernel / faults: 1361 minor 1.8% 2675/com.lbe.security:service: 1.7% user + 0.1% kernel / faults: 749 minor 1.1% 1768/mpdecision: 0.1% user + 1% kernel 0.9% 29165/kworker/0:2: 0% user + 0.9% kernel 0.7% 23576/com.qihoo.appstore: 0.5% user + 0.1% kernel / faults: 1401 minor 0.5% 28500/kworker/0:0: 0% user + 0.5% kernel 0.4% 1055/com.android.systemui: 0.3% user + 0% kernel / faults: 48 minor 0.4% 6850/kworker/u:1: 0% user + 0.4% kernel 0.3% 23077/com.qihoo.appstore:PluginP01: 0.2% user + 0.1% kernel / faults: 200 minor 0.2% 271/mediaserver: 0.2% user + 0% kernel / faults: 46 minor 0.3% 6883/kworker/u:37: 0% user + 0.3% kernel 0.2% 302/illumination_service: 0% user + 0.2% kernel 0.2% 134/mmcqd/0: 0% user + 0.2% kernel 0.1% 32727/com.qihoo.browser:privileged_process0: 0% user + 0% kernel / faults: 288 minor 0.1% 2885/com.qihoo.browser:sandboxed_process1: 0% user + 0% kernel 0.1% 3378/com.sonyericsson.playnowchina.android: 0.1% user + 0% kernel 0.1% 2027/com.cleanmaster.mguard_cn:service: 0% user + 0.1% kernel / faults: 3 minor 0.1% 1008/TX_Thread: 0% user + 0.1% kernel 0.1% 28341/com.sonyericsson.dlna: 0.1% user + 0% kernel 0% 130/cfinteractive: 0% user + 0% kernel 0% 293/ric: 0% user + 0% kernel / faults: 93 minor 0% 2468/com.qihoo360.mobilesafe.clean: 0% user + 0% kernel / faults: 45 minor 0% 4861/com.tencent.mm: 0% user + 0% kernel / faults: 9 minor 0% 127/irq/294-clearpa: 0% user + 0% kernel 0% 281/zygote: 0% user + 0% kernel / faults: 129 minor 0% 1007/MC_Thread: 0% user + 0% kernel 0% 2648/com.qihoo.appstore_CoreDaemon: 0% user + 0% kernel 0% 29640/com.lbe.security: 0% user + 0% kernel / faults: 1 minor 0% 155/jbd2/mmcblk0p26: 0% user + 0% kernel 0% 170/flush-179:0: 0% user + 0% kernel 0% 269/servicemanager: 0% user + 0% kernel 0% 276/system_monitor: 0% user + 0% kernel 0% 291/qrngd: 0% user + 0% kernel 0% 365/sdcard: 0% user + 0% kernel 0% 1041/wpa_supplicant: 0% user + 0% kernel 0% 1360/com.android.nfc: 0% user + 0% kernel 0% 2958/lbesec.loader: 0% user + 0% kernel / faults: 14 minor 0% 5128/com.tencent.mm:push: 0% user + 0% kernel / faults: 2 minor 0% 6028/com.UCMobile:SearchBackgroundProcess: 0% user + 0% kernel +0% 3318/com.sonyericsson.textinput.chinese: 0% user + 0% kernel +0% 3544/sh: 0% user + 0% kernel +0% 3546/logcat: 0% user + 0% kernel +0% 3682/migration/1: 0% user + 0% kernel +0% 3683/kworker/1:0: 0% user + 0% kernel +0% 3684/ksoftirqd/1: 0% user + 0% kernel 9.7% TOTAL: 4.8% user + 4.2% kernel + 0.3% iowait + 0.3% softirq //这里是later,表示ANR发生之后 CPU usage from 1656ms to 2187ms later: 8.7% 743/system_server: 0% user + 8.7% kernel / faults: 4 minor 7% 943/InputDispatcher: 0% user + 7% kernel 1.7% 1199/Binder_6: 0% user + 1.7% kernel 5.2% 379/adbd: 0% user + 5.2% kernel / faults: 27 minor 3.5% 379/adbd: 0% user + 3.5% kernel 1.7% 1009/RX_Thread: 0% user + 1.7% kernel 1.7% 1768/mpdecision: 0% user + 1.7% kernel 1.7% 1784/mpdecision: 0% user + 1.7% kernel 1.2% 6883/kworker/u:37: 0% user + 1.2% kernel 1.3% 29165/kworker/0:2: 0% user + 1.3% kernel 2.3% TOTAL: 0.1% user + 0.3% kernel + 1.8% iowait
06-02 12:00:15.753 743-825/? I/ActivityManager﹕ Killing ProcessRecord{4190b0d8 11880:com.qihoo.browser/u0a10192}: user's request 06-02 12:00:15.813 743-1332/? I/ActivityManager﹕ Process com.qihoo.browser (pid 11880) has died.
用"Proces"作为filter tag,可以看到
//向浏览器进程(11880)发送signal3, 也就是SIGNAL_QUIT. 06-02 12:00:10.889 743-943/? I/Process﹕ Sending signal. PID: 11880 SIG: 3
用"dalvikvm"作为filter tag,可以看到
com.qihoo.browser进程处理SIGNAL_QUIT signal, 并把anr信息写入traces.txt文件.
进程的虚拟机实例接收到SIGNAL_QUIT信号后会由"Signal Catcher"线程将进程中各个线程的函数堆栈信息输出到traces.txt文件中.
值得注意的是, 当前运行的其他进程也会把当前的函数堆栈信息输出到traces.txt文件中, 但发生ANR的进程正常情况下会第一个输出,所以一般情况下我们只看traces.txt的开头就行了。06-02 11:51:31.252 8857-8862/com.qihoo.browser I/dalvikvm﹕ threadid=3: reacting to signal 3 06-02 11:51:31.422 8857-8862/com.qihoo.browser I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt' 06-02 11:51:31.422 743-748/? I/dalvikvm﹕ threadid=3: reacting to signal 3 06-02 11:51:32.243 1275-1280/? I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt' 06-02 11:51:32.253 1282-1287/? I/dalvikvm﹕ threadid=3: reacting to signal 3 06-02 11:51:32.283 1282-1287/? I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt' 06-02 11:51:32.283 1348-1353/? I/dalvikvm﹕ threadid=3: reacting to signal 3 06-02 11:51:32.283 1360-1365/? I/dalvikvm﹕ Wrote stack traces to '/data/anr/traces.txt'
从以上的分析能够知道, 发生ANR时Android为我们提供了两种“利器”:traces文件和CPU使用率。
分析CPU负载和CPU使用率
这部分讲的还不是很清楚, 就先不做总结了, 以后有需要再系统的学习这小节.
从traces.txt分析
只需要看开头,每次发生ANR, 这个文件都会被清空,写入新的内容. 如果想查看以前发生ANR的信息, 可以去查看DB文件.
----- pid 11880 at 2016-06-02 12:00:10 ----- Cmd line: com.qihoo.browser DALVIK THREADS: (mutexes: tll=0 tsl=0 tscl=0 ghl=0) "main" prio=5 tid=1 TIMED_WAIT | group="main" sCount=1 dsCount=0 obj=0x41458b50 self=0x414481f0 | sysTid=11880 nice=-6 sched=0/0 cgrp=apps handle=1075152348 | state=S schedstat=( 8455108583 3379699755 13157 ) utm=664 stm=181 core=0 at java.lang.VMThread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:1031) at java.lang.Thread.sleep(Thread.java:1013) at com.qihoo.browser.BottomBarManager.handlePopupMenuAction(BottomBarManager.java:1220) at com.qihoo.browser.BottomBarManager.actionPerformed(BottomBarManager.java:629) at com.qihoo.browser.view.PopupMenu.onClick(PopupMenu.java:514) at android.view.View.performClick(View.java:4278) at android.view.View$PerformClick.run(View.java:17430) at android.os.Handler.handleCallback(Handler.java:725) at android.os.Handler.dispatchMessage(Handler.java:92) at android.os.Looper.loop(Looper.java:213) at android.app.ActivityThread.main(ActivityThread.java:5092) at java.lang.reflect.Method.invokeNative(Native Method) at java.lang.reflect.Method.invoke(Method.java:511) at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:797) at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:564) at dalvik.system.NativeStart.main(Native Method) "pool-9-thread-1" prio=5 tid=52 WAIT | group="main" sCount=1 dsCount=0 obj=0x41e3a6b8 self=0x7ce144e0 | sysTid=12261 nice=0 sched=0/0 cgrp=apps handle=2008331720 | state=S schedstat=( 37597659 95306395 149 ) utm=2 stm=1 core=1 at java.lang.Object.wait(Native Method) - waiting on <0x41e3a7e0> (a java.lang.VMThread) held by tid=52 (pool-9-thread-1) at java.lang.Thread.parkFor(Thread.java:1231) at sun.misc.Unsafe.park(Unsafe.java:323) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:159) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2019) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:413) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1013) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1073) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:573) at java.lang.Thread.run(Thread.java:856) //“Signal Catcher”负责接收和处理kernel发送的各种信号,例如SIGNAL_QUIT、SIGNAL_USR1等就是被该线程 //接收到并处理的,traces.txt 文件中的内容就是由该线程负责输出的,可以看到它的状态是RUNNABLE. "Signal Catcher" daemon prio=5 tid=3 RUNNABLE | group="system" sCount=0 dsCount=0 obj=0x41735a58 self=0x6256b008 | sysTid=11885 nice=0 sched=0/0 cgrp=apps handle=1074396464 | state=R schedstat=( 41015624 121002197 61 ) utm=1 stm=3 core=0 at dalvik.system.NativeStart.run(Native Method) //JDWP线程是支持虚拟机调试的线程,不需要关心 "JDWP" daemon prio=5 tid=4 VMWAIT | group="system" sCount=1 dsCount=0 obj=0x41735b50 self=0x72d47008 | sysTid=11886 nice=0 sched=0/0 cgrp=apps handle=1074396224 | state=S schedstat=( 3051760 9521482 15 ) utm=0 stm=0 core=1 #00 pc 000171ec /system/lib/libc.so (select+20) #01 pc 00062faf /system/lib/libdvm.so #02 pc 00065b41 /system/lib/libdvm.so #03 pc 0005637b /system/lib/libdvm.so #04 pc 0000e4b8 /system/lib/libc.so (__thread_entry+72) #05 pc 0000dba4 /system/lib/libc.so (pthread_create+160) at dalvik.system.NativeStart.run(Native Method)
线程有很多状态,了解这些状态的意义对分析ANR的原因是有帮助的.
DropBox中的log
traces.txt只保留最后一次发生ANR时的信息, android 2.2开始增加了DropBox功能, 保留历史上发生的所有ANR的log.
“/data/system/dropbox”是DB指定的文件存放位置.
日志保存的最长时间, 默认是3天.root@android:/data/system/dropbox # ls SYSTEM_BOOT@1464834107910.txt SYSTEM_TOMBSTONE@1464595870188.txt.gz SYSTEM_TOMBSTONE@1464835783944.txt.gz data_app_anr@1464666793921.txt.gz data_app_anr@1464838359233.txt.gz data_app_anr@1464838782715.txt.gz data_app_anr@1464839493673.txt.gz data_app_anr@1464840013776.txt.gz system_app_anr@1464838619773.txt.gz system_app_anr@1464838693689.txt.gz system_app_anr@1464838742647.txt.gz root@android:/data/system/dropbox #
SystemServer在启动时, 会创建并添加DROPBOX_SERVICE.
//SystemServer.java private void startOtherServices() { ... ... try { Slog.i(TAG, "DropBox Service"); ServiceManager.addService(Context.DROPBOX_SERVICE, new DropBoxManagerService(context, new File("/data/system/dropbox"))); } catch (Throwable e) { reportWtf("starting DropBoxManagerService", e); } ... ... }
framework是如何输出ANR信息到traces.txt文件中去的
ActivityManagerService.java中的appNotResponding()方法.
final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) { //firstPids和lastPids两个集合存放那些将会在traces中输出信息的进程的进程号 ArrayList<Integer> firstPids = new ArrayList<Integer>(5); SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20); //mController是IActivityController接口的实例,是为Monkey测试程序预留的,默认为null if (mController != null) { try { // 0 == continue, -1 = kill process immediately int res = mController.appEarlyNotResponding(app.processName, app.pid, annotation); if (res < 0 && app.pid != MY_PID) { app.kill("anr", true); } } catch (RemoteException e) { mController = null; Watchdog.getInstance().setActivityController(null); } } long anrTime = SystemClock.uptimeMillis(); if (MONITOR_CPU_USAGE) { updateCpuStatsNow(); //更新CPU使用率 } synchronized (this) { //一些特定条件下会忽略ANR // PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down. if (mShuttingDown) { Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation); return; } else if (app.notResponding) { Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation); return; } else if (app.crashing) { Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation); return; } // In case we come through here for the same app before completing // this one, mark as anring now so we will bail out. app.notResponding = true; // Log the ANR to the event log. EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid, app.processName, app.info.flags, annotation); //当前发生ANR的应用进程被第一个添加进firstPids集合中. //所以会第一个向traces文件中写入信息。反过来说,traces文件中出现的第一个进程正常情况下就是发生ANR的那个进程。 // Dump thread traces as quickly as we can, starting with "interesting" processes. firstPids.add(app.pid); int parentPid = app.pid; if (parent != null && parent.app != null && parent.app.pid > 0) parentPid = parent.app.pid; if (parentPid != app.pid) firstPids.add(parentPid); if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID); for (int i = mLruProcesses.size() - 1; i >= 0; i--) { ProcessRecord r = mLruProcesses.get(i); if (r != null && r.thread != null) { int pid = r.pid; if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) { if (r.persistent) { firstPids.add(pid); } else { lastPids.put(pid, Boolean.TRUE); } } } } } // Log the ANR to the main log. StringBuilder info = new StringBuilder(); info.setLength(0); info.append("ANR in ").append(app.processName); if (activity != null && activity.shortComponentName != null) { info.append(" (").append(activity.shortComponentName).append(")"); } info.append("\n"); info.append("PID: ").append(app.pid).append("\n"); if (annotation != null) { info.append("Reason: ").append(annotation).append("\n"); } if (parent != null && parent != activity) { info.append("Parent: ").append(parent.shortComponentName).append("\n"); } final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true); //dumpStackTraces()是输出ANR信息到traces文件的函数 File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids, NATIVE_STACKS_OF_INTEREST); String cpuInfo = null; if (MONITOR_CPU_USAGE) { updateCpuStatsNow(); synchronized (mProcessCpuTracker) { cpuInfo = mProcessCpuTracker.printCurrentState(anrTime); } //输出ANR发生前一段时间内的CPU使用率 info.append(processCpuTracker.printCurrentLoad()); info.append(cpuInfo); } //输出ANR发生后一段时间内的CPU使用率 info.append(processCpuTracker.printCurrentState(anrTime)); //此处, 把ANR的信息输出到main log中. Slog.e(TAG, info.toString()); if (tracesFile == null) { // There is no trace file, so dump (only) the alleged culprit's threads to the log Process.sendSignal(app.pid, Process.SIGNAL_QUIT); } //将ANR信息同时输出到DropBox中 addErrorToDropBox("anr", app, app.processName, activity, parent, annotation, cpuInfo, tracesFile, null); //在Android4.0中可以设置是否不显示ANR提示对话框,如果设置的话就不会显示对话框,并且会杀掉ANR进程 // Unless configured otherwise, swallow ANRs in background processes & kill the process. boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(), Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0; synchronized (this) { mBatteryStatsService.noteProcessAnr(app.processName, app.uid); if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) { app.kill("bg anr", true); return; } // Set the app's notResponding state, and look up the errorReportReceiver makeAppNotRespondingLocked(app, activity != null ? activity.shortComponentName : null, annotation != null ? "ANR " + annotation : "ANR", info.toString()); // 显示ANR提示对话框 // Bring up the infamous App Not Responding dialog Message msg = Message.obtain(); HashMap<String, Object> map = new HashMap<String, Object>(); msg.what = SHOW_NOT_RESPONDING_MSG; msg.obj = map; msg.arg1 = aboveSystem ? 1 : 0; map.put("app", app); if (activity != null) { map.put("activity", activity); } mUiHandler.sendMessage(msg); } }
public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) { //系统属性“dalvik.vm.stack-trace-file”用来配置trace信息输出文件 //之所以trace信息会输出到“/data/anr/traces.txt”文件中,就是系统属性“dalvik.vm.stack-trace-file”设置的 //adb shell下, 可以使用setprop和getprop对系统属性进行设置和读取 //getprop dalvik.vm.stack-trace-file //setprop dalvik.vm.stack-trace-file /tmp/stack-traces.txt String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null); if (tracesPath == null || tracesPath.length() == 0) { return null; } File tracesFile = new File(tracesPath); try { File tracesDir = tracesFile.getParentFile(); if (!tracesDir.exists()) { tracesDir.mkdirs(); if (!SELinux.restorecon(tracesDir)) { return null; } } //FileUtils.setPermissions是个很有用的函数,设置文件属性时经常会用到 FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1); // drwxrwxr-x //clearTraces为true,会删除旧文件,创建新文件 if (clearTraces && tracesFile.exists()) tracesFile.delete(); tracesFile.createNewFile(); FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw- } catch (IOException e) { Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e); return null; } dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs); return tracesFile; }
private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids, ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) { // Use a FileObserver to detect when traces finish writing. // The order of traces is considered important to maintain for legibility. //使用FileObserver监听SignalCatcher线程是否已经完成写入traces文件的操作 FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) { @Override public synchronized void onEvent(int event, String path) { notify(); //让其他阻塞在observer.wait()这行代码的线程继续往下执行. } }; try { observer.startWatching(); //首先输出firstPids集合中指定的进程,这些也是对ANR问题来说最重要的进程 // First collect all of the stacks of the most important pids. if (firstPids != null) { try { int num = firstPids.size(); for (int i = 0; i < num; i++) { synchronized (observer) { //前面提到的SIGNAL_QUIT, 由SignalCatcher线程处理Process.SIGNAL_QUIT信号, 并把进程中的各个线程输出到traces.txt文件中. //具体代码可以查看/dalvik/vm/目录下的SignalCatcher.cpp::logThreadStacks函数和Thread.cpp:: dvmDumpAllThreadsEx函数。 Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT); observer.wait(200); // Wait for write-close, give up after 200msec //block here, 直到另一个线程调用了observer.notify() or observer.notifyAll()方法, 再或是等待了200毫秒超时后, 才会继续往下执行代码. } } } catch (InterruptedException e) { Slog.wtf(TAG, e); } } } finally { observer.stopWatching(); } }
Object.java final void notify() //Causes a thread which is waiting on this object's monitor (by means of calling one of the wait() methods) to be woken up. final void notifyAll() //Causes all threads which are waiting on this object's monitor (by means of calling one of the wait() methods) to be woken up. final void wait() //Causes the calling thread to wait until another thread calls the notify() or notifyAll() method of this object. final void wait(long millis) //Causes the calling thread to wait until another thread calls the notify() or notifyAll() method //of this object or until the specified timeout expires.
refer to :
作者:ahking17
链接:http://www.jianshu.com/p/8964812972be
來源:简书
著作权归作者所有。商业转载请联系作者获得授权,非商业转载请注明出处。
android ANR分析
最新推荐文章于 2024-07-27 20:57:44 发布