Android-ANR

一. ANR概述

什么是ANR

ANR,是“Application Not Responding”的缩写,即“应用程序无响应”。在Android中,如果应用程序主线程在超时时间内对输入事件没有处理完毕,或者对特定操作没有执行完毕,就会出现ANR。对于输入事件没有处理完毕产生的ANR,Android会显示一个对话框,提示用户当前应用程序没有响应,用户可以选择继续等待或者关闭这个应用程序。

ANR类型

ANR一般有以下几种场景:
在这里插入图片描述

二.ANR原理

ANR触发流程

ANR是一套监控Android应用响应是否及时的机制,可以把发生ANR比作是引爆炸弹,那么整个流程包含三部分组成:

1.埋定时炸弹:中控系统(system_server进程)启动倒计时,在规定时间内如果目标(应用进程)没有干完所有的活,则中控系统会定向炸毁(杀进程)目标。
2.拆炸弹:在规定的时间内干完工地的所有活,并及时向中控系统报告完成,请求解除定时炸弹,则幸免于难。
3.引爆炸弹:中控系统立即封装现场,抓取快照,搜集目标执行慢的罪证(traces),便于后续的案件侦破(调试分析),最后是炸毁目标。
下面通过service来讲解整个ANR的触发流程,其他BroadcastReceiver、ContentProvider触发流程大同小异,可自行了解。

service超时机制

下面来看看埋炸弹与拆炸弹在整个服务启动(startService)过程所处的环节。
在这里插入图片描述

图解1:

1.客户端(App进程)向中控系统(system_server进程)发起启动服务的请求
2.中控系统派出一名空闲的通信员(binder_1线程)接收该请求,紧接着向组件管家(ActivityManager线程)发送消息,埋下定时炸弹
3.通讯员1号(binder_1)通知工地(service所在进程)的通信员准备开始干活
4.通讯员3号(binder_3)收到任务后转交给包工头(main主线程),加入包工头的任务队列(MessageQueue)
5.包工头经过一番努力干完活(完成service启动的生命周期)
6.包工头执行完成后,立刻向中控系统汇报工作已完成
7.中控系统的通讯员2号(binder_2)收到包工头的完工汇报后,立刻拆除炸弹。如果在炸弹倒计时结束前拆除炸弹则相安无事,否则会引发爆炸(触发ANR)

埋炸弹

在Service进程attach到system_server进程的过程中会调用realStartServiceLocked()方法来埋下炸弹.

2.3.1 AS.realStartServiceLocked

private final void realStartServiceLocked(ServiceRecord r, ProcessRecord app, boolean execInFg) throws RemoteException {
    ...
    //发送delay消息(SERVICE_TIMEOUT_MSG)
    bumpServiceExecutingLocked(r, execInFg, "create");
    try {
        ...
        //最终执行服务的onCreate()方法
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                app.repProcState);
    } catch (DeadObjectException e) {
        mAm.appDiedLocked(app);
        throw e;
    } finally {
        ...
    }
}

2.3.2 AS.bumpServiceExecutingLocked

private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    ... 
    scheduleServiceTimeoutLocked(r.app);
}

void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    
    //当超时后仍没有remove该SERVICE_TIMEOUT_MSG消息,则执行service Timeout流程
    mAm.mHandler.sendMessageAtTime(msg,
        proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
}

该方法的主要工作发送delay消息(SERVICE_TIMEOUT_MSG). 炸弹已埋下。

拆炸弹

那么什么时候会拆除这颗炸弹的引线呢? 经过Binder等层层调用进入目标进程的主线程handleCreateService()的过程.

2.4.1 AT.handleCreateService

private void handleCreateService(CreateServiceData data) {
	...
	java.lang.ClassLoader cl = packageInfo.getClassLoader();
	Service service = (Service) cl.loadClass(data.info.name).newInstance();
	...

	try {
		//创建ContextImpl对象
		ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
		context.setOuterContext(service);
		//创建Application对象
		Application app = packageInfo.makeApplication(false, mInstrumentation);
		service.attach(context, this, data.info.name, data.token, app,
				ActivityManagerNative.getDefault());
		//调用服务onCreate()方法 
		service.onCreate();
		
		//拆除炸弹引线
		ActivityManagerNative.getDefault().serviceDoneExecuting(
				data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
	} catch (Exception e) {
		...
	}
}

在这个过程会创建目标服务对象,以及回调onCreate()方法, 紧接再次经过多次调用回到system_server来执行serviceDoneExecuting.

2.4.2 AS.serviceDoneExecutingLocked

private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying, boolean finishing) {
    ...
    if (r.executeNesting <= 0) {
        if (r.app != null) {
            r.app.execServicesFg = false;
            r.app.executingServices.remove(r);
            if (r.app.executingServices.size() == 0) {
                //当前服务所在进程中没有正在执行的service
                mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
        ...
    }
    ...
}

该方法的主要工作是当service启动完成,则移除服务超时消息SERVICE_TIMEOUT_MSG。

引爆炸弹

在system_server进程中有一个Handler线程, 名叫”ActivityManager”.当倒计时结束便会向该Handler线程发送 一条信息SERVICE_TIMEOUT_MSG。

2.5.1 MainHandler.handleMessage

final class MainHandler extends Handler {
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case SERVICE_TIMEOUT_MSG: {
                ...
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
            ...
        }
        ...
    }
}

2.5.2 AS.serviceTimeout

void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;

    synchronized(mAm) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime =  now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, " ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        }
    }

    if (anrMessage != null) {
        //当存在timeout的service,则执行appNotResponding
        mAm.appNotResponding(proc, null, null, false, anrMessage);
    }
}

当出现ANR时,都是调用到AMS.appNotResponding()方法。

Input类型ANR

当应用程序的窗口处于活动状态并且能够接收输入事件(例如按键事件、触摸事件等)时,系统底层上报的事件就会被 InputDispatcher 分发给该应用。开发者通常根据需要接收并处理不同的事件,或者创建自定义的视图控件来处理事件。
InputDispatcher 运行在 system_server 进程的一个子线程中,每当接收到一个新的输入事件,InputDispatcher 就会检测前一个已经发给应用程序的输入事件是否已经处理完毕(InBoundQueue, OutBoundQueue,WaitQueue),如果超时,会通过一系列的回调通知AMS报告ANR发生,需要注意的是,产生这种 ANR 的前提是要有输入事件,如果没有输入事件, 即使主线程阻塞了也不会报告 ANR。从设计的角度看,此时系统会推测用户没有继续点击该应用,寄希望于一段时间后阻塞会自行消失,因此会暂时“隐瞒不报”。

此类 ANR 发生时的关键字是:”Input event dispatching timed out sending to”,其中会有多种Reason,此类 ANR的超时时间默认为5秒,小内存设备通常会根据情况做适当的延长调整。

ANR处理流程

appNotResponding处理流程

final void appNotResponding(ProcessRecord app, ActivityRecord activity, ActivityRecord parent, boolean aboveSystem, final String annotation) {
    ...
    updateCpuStatsNow(); //第一次 更新cpu统计信息,目前京东崩溃统计没有event.log信息
    synchronized (this) {
      //PowerManager.reboot() 会阻塞很长时间,因此忽略关机时的ANR
      if (mShuttingDown) {
          return;
      } else if (app.notResponding) {
          return;
      } else if (app.crashing) {
          return;
      }
      //记录ANR到EventLog
      EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
              app.processName, app.info.flags, annotation);
              
      // 将当前进程添加到firstPids
      firstPids.add(app.pid);
      int parentPid = app.pid;
      
      //将system_server进程添加到firstPids
      if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
      
      for (int i = mLruProcesses.size() - 1; i >= 0; i--) {
          ProcessRecord r = mLruProcesses.get(i);
          if (r != null && r.thread != null) {
              int pid = r.pid;
              if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
                  if (r.persistent) {
                      firstPids.add(pid); //将persistent进程添加到firstPids
                  } else {
                      lastPids.put(pid, Boolean.TRUE); //其他进程添加到lastPids
                  }
              }
          }
      }
    }
    
    // 记录ANR输出到main log
    StringBuilder info = new StringBuilder();
    info.setLength(0);
    info.append("ANR in ").append(app.processName);
    if (activity != null && activity.shortComponentName != null) {
        info.append(" (").append(activity.shortComponentName).append(")");
    }
    info.append("\n");
    info.append("PID: ").append(app.pid).append("\n");
    if (annotation != null) {
        info.append("Reason: ").append(annotation).append("\n");
    }
    if (parent != null && parent != activity) {
        info.append("Parent: ").append(parent.shortComponentName).append("\n");
    }
    
    //创建CPU tracker对象
    final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
    //输出traces信息
    File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, 
            lastPids, NATIVE_STACKS_OF_INTEREST);
            
    updateCpuStatsNow(); //第二次更新cpu统计信息
    //记录当前各个进程的CPU使用情况
    synchronized (mProcessCpuTracker) {
        cpuInfo = mProcessCpuTracker.printCurrentState(anrTime);
    }
    //记录当前CPU负载情况
    info.append(processCpuTracker.printCurrentLoad());
    info.append(cpuInfo);
    //记录从anr时间开始的Cpu使用情况
    info.append(processCpuTracker.printCurrentState(anrTime));
    //输出当前ANR的reason,以及CPU使用率、负载信息
    Slog.e(TAG, info.toString()); 
    
    //将traces文件 和 CPU使用率信息保存到dropbox,即data/system/dropbox目录
    addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
            cpuInfo, tracesFile, null);

    synchronized (this) {
        ...
        //后台ANR的情况, 则直接杀掉
        if (!showBackground && !app.isInterestingToUserLocked() && app.pid != MY_PID) {
            app.kill("bg anr", true);
            return;
        }

        //设置app的ANR状态,病查询错误报告receiver
        makeAppNotRespondingLocked(app,
                activity != null ? activity.shortComponentName : null,
                annotation != null ? "ANR " + annotation : "ANR",
                info.toString());

        //重命名trace文件
        String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
        if (tracesPath != null && tracesPath.length() != 0) {
            //traceRenameFile = "/data/anr/traces.txt"
            File traceRenameFile = new File(tracesPath);
            String newTracesPath;
            int lpos = tracesPath.lastIndexOf (".");
            if (-1 != lpos)
                // 新的traces文件= /data/anr/traces_进程名_当前日期.txt
                newTracesPath = tracesPath.substring (0, lpos) + "_" + app.processName + "_" + mTraceDateFormat.format(new Date()) + tracesPath.substring (lpos);
            else
                newTracesPath = tracesPath + "_" + app.processName;

            traceRenameFile.renameTo(new File(newTracesPath));
        }
                
        //弹出ANR对话框
        Message msg = Message.obtain();
        HashMap<String, Object> map = new HashMap<String, Object>();
        msg.what = SHOW_NOT_RESPONDING_MSG;
        msg.obj = map;
        msg.arg1 = aboveSystem ? 1 : 0;
        map.put("app", app);
        if (activity != null) {
            map.put("activity", activity);
        }
        
        //向ui线程发送,内容为SHOW_NOT_RESPONDING_MSG的消息
        mUiHandler.sendMessage(msg);
    }
    
}

当发生ANR时, 会按顺序依次执行:

1.输出ANR Reason信息到EventLog. 也就是说ANR触发的时间点最接近的就是EventLog中输出的am_anr信息;
2.收集并输出重要进程列表中的各个线程的traces信息,该方法较耗时;
3.输出当前各个进程的CPU使用情况以及CPU负载情况;
4.将traces文件和 CPU使用情况信息保存到dropbox,即data/system/dropbox目录
5.根据进程类型,来决定直接后台杀掉,还是弹框告知用户.

ANR输出重要进程的traces信息,这些进程包含:
1.firstPids队列:第一个是ANR进程,第二个是system_server,剩余是所有persistent进程;
2.Native队列:是指/system/bin/目录的mediaserver,sdcard 以及surfaceflinger进程;
3.lastPids队列: 是指mLruProcesses中的不属于firstPids的所有进程。

三.ANR分析

指导思想:通过各种线索,寻求主线程在过去一段时间内发生阻塞的可能原因。

trace文件解读

----- pid 888 at 2021-11-11 22:22:22 -----
Cmd line: system_server
ABI: arm
Build type: optimized
Zygote loaded classes=4113 post zygote classes=3239
Intern table: 57550 strong; 9315 weak
//共加载16动态库 
JNI: CheckJNI is off; globals=2418 (plus 115 weak)
Libraries: /system/lib/libandroid.so /system/lib/libandroid_servers.so /system/lib/libaudioeffect_jni.so /system/lib/libcompiler_rt.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/librs_jni.so /system/lib/libsechook.so /system/lib/libshell_jni.so /system/lib/libsoundpool.so /system/lib/libwebviewchromium_loader.so /system/lib/libwifi-service.so /vendor/lib/libalarmservice_jni.so /vendor/lib/liblocationservice.so libjavacore.so (16)
//已分配堆内存大小40MB,其中29M已用,总分配207772个对象 
Heap: 27% free, 29MB/40MB; 307772 objects
... //省略GC相关信息

//当前进程总99个线程
DALVIK THREADS (99):
//主线程调用栈
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x75bd9fb0 self=0x5573d4f770
  | sysTid=12078 nice=-2 cgrp=default sched=0/0 handle=0x7fa75fafe8
  | state=S schedstat=( 5907843636 827600677 5112 ) utm=453 stm=137 core=0 HZ=100
  | stack=0x7fd64ef000-0x7fd64f1000 stackSize=8MB
  | held mutexes=
  //内核栈
  kernel: __switch_to+0x70/0x7c
  kernel: SyS_epoll_wait+0x2a0/0x324
  kernel: SyS_epoll_pwait+0xa4/0x120
  kernel: cpu_switch_to+0x48/0x4c
  native: #00 pc 0000000000069be4 /system/lib64/libc.so (__epoll_pwait+8)
  native: #01 pc 000000000001cca4 /system/lib64/libc.so (epoll_pwait+32)
  native: #02 pc 000000000001ad74 /system/lib64/libutils.so (_ZN7android6Looper9pollInnerEi+144)
  native: #03 pc 000000000001b154 /system/lib64/libutils.so (_ZN7android6Looper8pollOnceEiPiS1_PPv+80)
  native: #04 pc 00000000000d4bc0 /system/lib64/libandroid_runtime.so (_ZN7android18NativeMessageQueue8pollOnceEP7_JNIEnvP8_jobjecti+48)
  native: #05 pc 000000000000082c /data/dalvik-cache/arm64/system@framework@boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+144)
  at android.os.MessageQueue.nativePollOnce(Native method)
  at android.os.MessageQueue.next(MessageQueue.java:323)
  at android.os.Looper.loop(Looper.java:135)
  at com.android.server.SystemServer.run(SystemServer.java:290)
  at com.android.server.SystemServer.main(SystemServer.java:175)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:738)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:628)

//线程名: Binder_1(如有daemon则代表守护线程);prio: 线程优先级;tid: 线程内部id;线程状态: NATIVE
"Binder_1" prio=5 tid=8 Native
//group: 线程所属的线程组;sCount: 线程挂起次数;dsCount: 用于调试的线程挂起次数;obj: 当前线程关联的java线程对象;self: 当前线程地址
  | group="main" sCount=1 dsCount=0 obj=0x12c610a0 self=0x5573e5c750
  | sysTid=12092 nice=0 cgrp=default sched=0/0 handle=0x7fa2743450
  //schedstat: CPU调度时间统计
  | state=S schedstat=( 796240075 863170759 3586 ) utm=50 stm=29 core=1 HZ=100
  | stack=0x7fa2647000-0x7fa2649000 stackSize=1013KB
  | held mutexes=
... //此处省略剩余的N个线程.

主要信息说明:
main:main标识是主线程。其他线程“Binder_1”,1表示线程id,逐步递增
prio:线程优先级,默认是5
tid:tid不是线程的id,是线程唯一标识ID
group:是线程组名称
sCount:该线程被挂起的次数
dsCount:是线程被调试器挂起的次数
obj:对象地址
self:该线程Native的地址
sysTid:是线程号(主线程的线程号和进程号相同)
nice:是线程的调度优先级
sched:分别标志了线程的调度策略和优先级
cgrp:调度归属组
handle:线程处理函数的地址
state:是调度状态
schedstat:从 /proc/[pid]/task/[tid]/schedstat读出,三个值分别表示线程在cpu上执行的时间、线程的等待时间和线程执行的时间片长度,不支持这项信息的三个值都是0;
utm:是线程用户态下使用的时间值(单位是jiffies)
stm:是内核态下的调度时间值
core:是最后执行这个线程的cpu核的序号。

分析流程

除了trace日志,我们还有很多其他日志信息可以用来帮助分析。

1.要确定ANR的种类。

2.查看CPU的占用,看IOWait的占比是否很高,判断是否在等待IO。

3.找到anr的第一时间点,对于Input类型ANR来说,可以看到“Input event dispatching timed out sending to ”,这里是 ANR被触发的第一时间点。在明确了这个时间点后,再根据不同ANR的类型分析。如广播超时需要分析第一时间点前10秒(60秒)的广播队列信息。

4.检查主线程是否存在死锁、阻塞、主线程耗时操作等问题,结合源码分析排查问题。

分析 ANR 问题时通常需要通盘考虑,综合log中各方面的信息。切忌只看问题的一个侧面就轻率作出结论。而且traces.txt的可用性并不是很高,可能反应的问题并不是根本原因。确实有一些ANR问题很难调查清楚,因为整个系统不稳定的因素很多。

实例分析

实例1:

"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x72e03498 self=0x792f010800
| sysTid=8250 nice=-4 cgrp=default sched=0/0 handle=0x79b5f18ed0
| state=S schedstat=( 933532789 105856790 3028 ) utm=62 stm=30 core=6 HZ=100
| stack=0x7ff5db9000-0x7ff5dbb000 stackSize=8192KB
| held mutexes=
at com.tencent.map.geolocation.TencentLocationManager.getInstance(unavailable:-1)
- waiting to lock <0x0d50926a> (a java.lang.Class<com.tencent.map.geolocation.TencentLocationManager>) held by thread 66
at com.jingdong.common.lbs.jdlocation.JDLocationTencentSDK.getInstance(unavailable:-1)
- locked <0x0fe6d35b> (a java.lang.Class<com.jingdong.common.lbs.jdlocation.JDLocationTencentSDK>)
at com.jingdong.common.lbs.jdlocation.JDLocationSDK$14.run(unavailable:-1)
at android.os.Handler.handleCallback(Handler.java:900)
at android.os.Handler.dispatchMessage(Handler.java:103)
at android.os.Looper.loop(Looper.java:219)
at android.app.ActivityThread.main(ActivityThread.java:8347)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:513)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1055)
//省略

"pool-6-thread-2" prio=5 tid=66 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x1898b040 self=0x78b5bf9800
| sysTid=8584 nice=0 cgrp=default sched=0/0 handle=0x788faf2d50
| state=S schedstat=( 22107813 25765621 59 ) utm=0 stm=1 core=4 HZ=100
| stack=0x788f9f0000-0x788f9f2000 stackSize=1039KB
| held mutexes=
kernel: (couldn't read /proc/self/task/8584/stack)
native: #00 pc 00000000000baf14  /apex/com.android.runtime/lib64/bionic/libc.so (wait4+4)
native: #01 pc 00000000001faaa4  /apex/com.android.runtime/lib64/libart.so (art::ExecAndReturnCode(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*)+760)
native: #02 pc 00000000001fae58  /apex/com.android.runtime/lib64/libart.so (art::Exec(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*)+40)
native: #03 pc 000000000046b1b0  /apex/com.android.runtime/lib64/libart.so (art::OatFileAssistant::Dex2Oat(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*)+2528)
native: #04 pc 0000000000467fec  /apex/com.android.runtime/lib64/libart.so (art::OatFileAssistant::GenerateOatFileNoChecks(art::OatFileAssistant::OatFileInfo&, art::CompilerFilter::Filter, art::ClassLoaderContext const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*)+3496)
native: #05 pc 0000000000466ed0  /apex/com.android.runtime/lib64/libart.so (art::OatFileAssistant::MakeUpToDate(bool, art::ClassLoaderContext*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*)+416)
native: #06 pc 000000000046eb68  /apex/com.android.runtime/lib64/libart.so (art::OatFileManager::OpenDexFilesFromOat(char const*, _jobject*, _jobjectArray*, art::OatFile const**, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>>*)+340)
native: #07 pc 00000000004208d4  /apex/com.android.runtime/lib64/libart.so (art::DexFile_openDexFileNative(_JNIEnv*, _jclass*, _jstring*, _jstring*, int, _jobject*, _jobjectArray*)+140)
at dalvik.system.DexFile.openDexFileNative(Native method)
at dalvik.system.DexFile.openDexFile(DexFile.java:365)
at dalvik.system.DexFile.<init>(DexFile.java:107)
at dalvik.system.DexFile.<init>(DexFile.java:80)
at dalvik.system.DexPathList.loadDexFile(DexPathList.java:444)
at dalvik.system.DexPathList.makeDexElements(DexPathList.java:393)
at dalvik.system.DexPathList.<init>(DexPathList.java:164)
at dalvik.system.BaseDexClassLoader.<init>(BaseDexClassLoader.java:131)
at dalvik.system.BaseDexClassLoader.<init>(BaseDexClassLoader.java:89)
at dalvik.system.DexClassLoader.<init>(DexClassLoader.java:55)
at com.tencent.map.geolocation.a.a.b.a(unavailable:-1)
- locked <0x072360c5> (a com.tencent.map.geolocation.a.a.b)
at com.tencent.map.geolocation.a.a.b.a(unavailable:-1)
at com.tencent.map.geolocation.a.a.f.a(unavailable:-1)
- locked <0x0eea6a1a> (a com.tencent.map.geolocation.a.a.f)
at com.tencent.map.geolocation.TencentLocationManager.<init>(unavailable:-1)
at com.tencent.map.geolocation.TencentLocationManager.getInstance(unavailable:-1)
- locked <0x0d50926a> (a java.lang.Class<com.tencent.map.geolocation.TencentLocationManager>)
at com.jingdong.common.lbs.jdlocation.JDLocationSDK$1.run(unavailable:-1)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.lang.Thread.run(Thread.java:929)

找到main线程,发现waiting to lock <0x0d50926a> (a java.lang.Class<com.tencent.map.geolocation.TencentLocationManager>) held by thread 66,main 线程正在尝试获取线程66的0x0d50926a lock,全局索索0x0d50926a找到对应的线程66。发现线程66 lock了0x0d50926a并开始openDexFileNative,现在需要翻看源码。

public static synchronized TencentLocationManager getInstance(Context var0) {
    if (e == null) {
        if (var0 == null) {
            throw new NullPointerException("context is null");
        }

        if (var0.getApplicationContext() == null) {
            throw new NullPointerException("application context is null");
        }

        e = new TencentLocationManager(var0.getApplicationContext());
    }

    return e;
}

synchronized修饰单例中,子线程进行初始化,主线程获取实例时需等子线程完成,子线程有操作耗时,造成主线程卡顿。此类问题可以考虑是否可以放到子线程中操作,类似问题有readDeviceUUID。

实例2:

 "main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 obj=0x73ee6470 self=0xb4d76500
  | sysTid=22831 nice=0 cgrp=default sched=0/0 handle=0xb6f4bc00
  | state=S schedstat=( 0 0 0 ) utm=22 stm=22 core=0 HZ=100
  | stack=0xbe283000-0xbe285000 stackSize=8MB
  | held mutexes=
  native: #00 pc 000410ac  /system/lib/libc.so (__ioctl+8)
  native: #01 pc 000477e5  /system/lib/libc.so (ioctl+14)
  native: #02 pc 0001e7c5  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+132)
  native: #03 pc 0001ee17  /system/lib/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+38)
  native: #04 pc 0001efcd  /system/lib/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+124)
  native: #05 pc 00019fb7  /system/lib/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+30)
  native: #06 pc 00086de9  /system/lib/libandroid_runtime.so (???)
  native: #07 pc 00d94629  /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+140)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:503)
  at android.net.INetworkPolicyManager$Stub$Proxy.getNetworkPolicies(INetworkPolicyManager.java:410)
  at android.net.NetworkPolicyManager.getNetworkPolicies(NetworkPolicyManager.java:174)
  at com.android.settings.net.NetworkPolicyEditor.read(NetworkPolicyEditor.java:57)
  at com.android.settings.DataUsageSummary.onCreate(DataUsageSummary.java:361)
  at android.app.Fragment.performCreate(Fragment.java:2202)
  at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:942)
  at android.app.FragmentManagerImpl.moveToState(FragmentManager.java:1148)
  at android.app.BackStackRecord.run(BackStackRecord.java:793)
  at android.app.FragmentManagerImpl.execPendingActions(FragmentManager.java:1535)
  at android.app.FragmentManagerImpl.executePendingTransactions(FragmentManager.java:562)
  at com.android.settings.SettingsActivity.switchToFragment(SettingsActivity.java:1084)
  at com.android.settings.SettingsActivity.onCreate(SettingsActivity.java:657)
  at android.app.Activity.performCreate(Activity.java:6251)
  at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1110)
  at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2370)
  at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2477)
  at android.app.ActivityThread.-wrap11(ActivityThread.java:-1)
  at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1345)
  at android.os.Handler.dispatchMessage(Handler.java:102)
  at android.os.Looper.loop(Looper.java:148)
  at android.app.ActivityThread.main(ActivityThread.java:5438)
  at java.lang.reflect.Method.invoke!(Native method)
  at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:762)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:652)
  主线程被对端block,而对端是在systemserver中的NetworkPolicyManager。
  接下来继续去查NetworkPolicyManager为何会被block。
  通常情况下,可以搜索getNetworkPolicies,一般的对端的函数并不会修改函数名:
  看到我们的对端:
  "Binder_4" prio=5 tid=56 Blocked
  | group="main" sCount=1 dsCount=0 obj=0x1321a0a0 self=0xad31e200
  | sysTid=2491 nice=0 cgrp=default sched=0/0 handle=0x9fd00930
  | state=S schedstat=( 0 0 0 ) utm=46441 stm=46582 core=1 HZ=100
  | stack=0x9fc04000-0x9fc06000 stackSize=1014KB
  | held mutexes=
  at com.android.server.net.NetworkPolicyManagerService.getNetworkPolicies(NetworkPolicyManagerService.java:1696)
  - waiting to lock <0x07439315> (a java.lang.Object) held by thread 35
  at android.net.INetworkPolicyManager$Stub.onTransact(INetworkPolicyManager.java:145)
  at android.os.Binder.execTransact(Binder.java:453
  被Tid=35的人拿住了一把锁(0x07439315),那么继续看tid=35是谁,有两种方法:
  1. 搜索tid=35
  2. 搜索0x07439315,找到  - locked <0x07439315> (a java.lang.Object)
  
  "NetworkPolicy" prio=5 tid=35 TimedWaiting
  | group="main" sCount=1 dsCount=0 obj=0x12d98940 self=0x9f91f700
  | sysTid=2415 nice=0 cgrp=default sched=0/0 handle=0xa0f33930
  | state=S schedstat=( 0 0 0 ) utm=7681 stm=7783 core=0 HZ=100
  | stack=0xa0e31000-0xa0e33000 stackSize=1038KB
  | held mutexes=
  at java.lang.Object.wait!(Native method)
  - waiting on <0x02580c1b> (a java.lang.Object)
  at java.lang.Thread.parkFor$(Thread.java:1220)
  - locked <0x02580c1b> (a java.lang.Object)
  at sun.misc.Unsafe.park(Unsafe.java:299)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2053)
  at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:372)
  at com.android.server.NativeDaemonConnector$ResponseQueue.remove(NativeDaemonConnector.java:634)
  at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:426)
  at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:345)
  at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:340)
  at com.android.server.NetworkManagementService.setInterfaceQuota(NetworkManagementService.java:1712)
  - locked <0x0b0f91b8> (a java.lang.Object)
  at com.android.server.net.NetworkPolicyManagerService.setInterfaceQuota(NetworkPolicyManagerService.java:2421)
  at com.android.server.net.NetworkPolicyManagerService.updateNetworkRulesLocked(NetworkPolicyManagerService.java:1232)
  at com.android.server.net.NetworkPolicyManagerService$14.onReceive(NetworkPolicyManagerService.java:1060)
  - locked <0x07439315> (a java.lang.Object)
  at android.app.LoadedApk$ReceiverDispatcher$Args.run(LoadedApk.java:881)
  at android.os.Handler.handleCallback(Handler.java:739)
  at android.os.Handler.dispatchMessage(Handler.java:95)
  at android.os.Looper.loop(Looper.java:148)
  at android.os.HandlerThread.run(HandlerThread.java:61)
  
  可以看到,NetworkPolicy在通过NativeDaemonConnector和netd通信(setInterfaceQuota)
  我们结合log来看下是否有有用信息,在发生ANR的前后,查找netd相关的打印:
  06-19 15:29:00.997  1235  1270 I am_anr  : [0,22831,com.android.settings,818429509,Input dispatching timed out (Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]
  06-19 15:29:05.683  1235  2415 E NetdConnector: NDC Command {55445 bandwidth setiquota seth_w0 9223372036854775807} took too long (4755ms
  06-19 15:29:05.723  1235  2491 I dvm_lock_sample: [system_server,1,Binder_4,4919,NetworkPolicyManagerService.java,1696,-,1056,100]
  
  排查日志时,定位到问题点后,需根据anr类型向前排查,排查多长时间需根据anr类型确定,比如前台广播的anr需要向前找10s,查看当时处理的消息内容。  

  我们可以猜测ANR的原因是netd的cmd耗时太久导致的,那么在netd可能无法修改的情况下,我们应该如何去解决这个问题呢,将可能存在block的操作放到非UI线程中去做。  

四.如何避免ANR

1.合理使用主线程,主线程又叫UI线程,UI线程只做跟UI相关的工作。耗时的工作(比如数据库操作,I/O,连接网络或者别的有可能阻碍UI线程的block)把它放入单独的线程处理。

2.合理使用并遵循Android生命周期,避免在onCreate()和onResume()等做过多耗时的事情。

3.sharedPreference的使用:sharedPreference的commit()方法是同步的,apply()方法一般是异步执行的。在主线程尽量使用apply()替换。sharedPreference是全量写,所以尽量统一执行apply()(apply()方法在Activity stop的时候主线程会等待写入完成),sp存储内容不宜过大。

4.如果I/O阻塞,一般来说就是文件读写或数据库操作执行在主线程了,可以通过开辟子线程的方式异步执行。

5.减少复杂的layout,避免布局深层嵌套,使用merge和include减少层级,使用ViewStub懒加载减少不必要渲染。可使用Hierarchy Viewer工具检测View性能。

6.避免频繁过多的创建线程或者其它大对象。

7.避免加载过大数据和图片。对于图片展示较多的情况,应当考虑加载缩略图等方案。
8.避免滥用广播,如果处理一个广播需要花费较长时间,应当通过Service来完成。

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值