Advanced Android Performance Optimization: Locating and Fixing an ANR Caused by a File Descriptor Leak from Handler Usage

Introduction

During one round of testing, a test case for an unusual scenario ended in an ANR that was ultimately traced back to Handler usage.

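I. Reproducing the Scenario

The process with package name com.crazyai.whiteboard.systemui communicates over Binder with the process with package name com.crazyai.voiceframework. Concretely, the com.crazyai.whiteboard.systemui process directly calls com.crazyai.voiceframework.core.AIVoiceManager.disableWaken.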

//com.crazyai.voiceframework.core.AIVoiceManager 
public int disableWaken() {
    MoLogUtil.d(TAG,"DUI#FRAMEWORK stop waken done");
    return postMessage(new WakeupMessage()
                       .setAction(Constants.Wakeup.ACTION.STOP)
                       .toRemoteMessage());
}

private int postMessage(@NonNull RemoteMessage message) {
    if (checkPrepare()) {
        try {
            //{@link RemoteInteractImpl #callIPC}
            return getBaseManager().callIPC(message);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    return ResultCode.RESULT_BIND_GET_ERROR;
} 

Part of com.lamy.aivoiceserver.core.RemoteInteractImpl is shown below:

@Override
    public synchronized int callIPC(RemoteMessage remoteMessage) throws RemoteException {
         return processCall(remoteMessage);
    }

    private int processCall(RemoteMessage message) {
        if(message!=null) {
        ...
            if(!TextUtils.isEmpty(msg)) {
                BaseCommand command = null;
                JSONObject object = null;
                try {
                    object = new JSONObject(msg);
                    command = new WakeupCommand(object);
                } catch (JSONException e) {
                    return ResultCode.RESULT_UNKNOWN_ERROR;
                }
                if (command != null) {
                    command.callIPC(object);
                }
                return ResultCode.RESULT_OK;
            }
        }
        return ResultCode.RESULT_UNKNOWN_ERROR;
    }

This is where the problem lies:

public class WakeupCommand extends BaseCommand {
    protected JSONObject cmdJson;
    protected HandlerThread handlerThread;
    protected SyncHandler workHandler;
    // ## The culprit: every instance starts a new HandlerThread that is never quit ##
    public WakeupCommand(@NonNull JSONObject json,RemoteInteractImpl impl) {
        super(json,impl);
        this.remoteInteract=impl;
        this.cmdJson = json;
        this.handlerThread = new HandlerThread("VBOT#"+TAG);
        this.handlerThread.start();
        this.workHandler = new SyncHandler(handlerThread.getLooper());
    }
    ...
}

When this interface method is called frequently within a short time through the Handler mechanism, an ANR occurs after a while.

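II. ANR

1. ANR overview

When an app fails to respond for a certain period, the system shows the user a dialog called the "Application Not Responding" (ANR) dialog. The user can choose "Wait" to let the app keep running, or "Force close". When an ANR occurs on a development device, the system writes a traces.txt file under /data/anr, with the most recent ANR at the top. Export it to the local machine with adb:

$ adb pull /data/anr/traces.txt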

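2. Main types of ANR

  • KeyDispatchTimeout: the main type; a key or touch event gets no response within 5 s.
  • BroadcastTimeout: a BroadcastReceiver fails to finish within 10 s.
  • ServiceTimeout: rare; a Service fails to finish within 20 s.

3. Default limits in Android (exceeding them triggers an ANR):

  • An Activity must respond to input within 5 seconds (the main type).
  • A BroadcastReceiver must finish within 10 seconds (foreground broadcasts).
  • A Service must finish within 20 seconds (foreground services; a rare type).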

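4. Why the UI thread times out and triggers an ANR

  • The current event never gets a chance to be handled, for example because the UI thread is still busy with another event that blocks it.
  • The current event is being handled but takes too long to complete.
  • The main thread is blocked.
  • Work in a BroadcastReceiver does not finish within 10 s (IntentService can be used instead), or a Service task takes longer than 20 s.
  • A deadlock occurs.
  • Animations that need heavy computation can overload the CPU.
  • The main thread performs time-consuming work (IO, network operations, and so on). This often happens in the onDraw method of custom views.
    (A related note on custom view performance: creating objects in onDraw easily causes memory churn. Drawing is invoked constantly, so it produces many garbage objects, GC runs frequently, and the UI drops frames and stutters.)
  • A Service performs time-consuming work. A Service also runs on the main thread: its lifecycle callbacks, like those of other app components, run on the main thread and can affect UI operations or block other main-thread work, so time-consuming work in a Service should be moved to a worker thread.
  • Time-consuming work is done in the onCreate or onResume callbacks of an Activity.

Code that runs on the UI thread mainly includes:
Activity: onCreate(), onResume(), onDestroy(), onKeyDown(), onClick()
AsyncTask: onPreExecute(), onProgressUpdate(), onPostExecute(), onCancelled()
Main-thread Handler: handleMessage(), post(Runnable r)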
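
The causes above all come down to one thing: the UI thread does not get around to its next queued event in time. That idea can be modeled off-device. The following is a minimal plain-Java sketch, not Android code and not how the system detects ANRs internally: a single-threaded executor stands in for the main Looper, and respondsWithin is a hypothetical helper that posts a ping and waits for it with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AnrWatchdogSketch {

    /** Posts a ping to the single-threaded queue and reports whether it ran within timeoutMs. */
    static boolean respondsWithin(ExecutorService mainLike, long timeoutMs) {
        CountDownLatch pong = new CountDownLatch(1);
        mainLike.execute(pong::countDown);
        try {
            return pong.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns {idleQueueResponds, blockedQueueResponds}. */
    static boolean[] demo() {
        // Assumption: one worker thread with a FIFO queue is a fair stand-in for a main Looper.
        ExecutorService mainLike = Executors.newSingleThreadExecutor();
        try {
            boolean idle = respondsWithin(mainLike, 2000);
            // Simulate a long blocking call (for example a synchronous IPC) on the "main" thread.
            mainLike.execute(() -> {
                try {
                    Thread.sleep(1500);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // The ping is queued behind the blocking task, so it cannot run within 200 ms.
            boolean blocked = respondsWithin(mainLike, 200);
            return new boolean[] {idle, blocked};
        } finally {
            mainLike.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("idle queue responds: " + r[0] + ", blocked queue responds: " + r[1]);
    }
}
```

The timeouts here (2000 ms, 1500 ms, 200 ms) are arbitrary illustration values, not the system's 5 s / 10 s / 20 s limits.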

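5. Steps for avoiding ANRs

  • Layout optimization: use the Hierarchy Viewer tool.
  • Remove unused widgets and hierarchy levels, and avoid overdraw as far as possible.
  • Choose the ViewGroup with the lower overhead. For example, where either RelativeLayout or LinearLayout would work, prefer LinearLayout, because RelativeLayout is more complex and uses more CPU.
  • Use the <include>, <merge> and <ViewStub> tags to reuse layouts, reduce hierarchy levels, and load lazily (inflate only when needed).
  • Drawing optimization: avoid heavy work in a view's onDraw method. Because onDraw may be called very frequently, avoid allocating new objects in it, since that easily triggers GC, and do not do time-consuming work there.
  • Thread optimization: use a thread pool to manage and reuse threads, and avoid creating large numbers of Thread objects.
  • A convenient way to run a long operation such as IO on a worker thread is AsyncTask: subclass it and implement doInBackground(). (Note that AsyncTask is deprecated as of API level 30 in favor of java.util.concurrent.)
  • When using Thread or HandlerThread, consider calling Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND) or java.lang.Thread.setPriority(int priority) to give it background priority. Otherwise it can still hurt responsiveness, because a new Thread by default gets the same priority as the main thread. Lowering the priority reduces the CPU time the other threads consume and leaves more for the main thread.
  • Avoid time-consuming code in the onCreate and onResume callbacks of an Activity and do as little as possible there. In fact, any method that runs on the UI thread should be as short and fast as possible. Potentially long operations such as network or database access, or long computations such as resizing a bitmap, should run on a worker thread.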
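
The thread pool advice in the list above can be illustrated with a small plain-Java sketch. It runs many short tasks on a fixed pool and reports how many distinct threads actually executed them. The class and method names are hypothetical, and the pool size is an arbitrary example value.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolReuseSketch {

    /** Runs taskCount small tasks on a fixed pool and returns how many distinct threads ran them. */
    static int distinctThreadsUsed(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> names = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> names.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // stop accepting new work; already queued tasks still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return names.size();
    }

    public static void main(String[] args) {
        // 300 tasks never need more than 4 threads here.
        System.out.println("threads used: " + distinctThreadsUsed(4, 300));
    }
}
```

Compare this with creating one thread per task: the pool caps the number of threads no matter how often the work is submitted, which is exactly the property the code in section I lacks.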

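III. Analysis and Diagnosis

1. Analyzing the APLOG

When an ANR occurs, the APLOG usually contains the obvious keyword ANR. The log gives us several key clues:

  • AIVoiceManager is called in the process with PID 2906.
  • WakeupCommand runs in the process with PID 3549.
  • The process com.crazyai.whiteboard.systemui (PID 2906) hit an input dispatching timeout: the event at the head of its wait queue had been waiting 5531.5 ms without a response, which triggered the ANR.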
...
01-07 09:38:26.039 I/AIVoiceManager->->( 2906): ★【IPC_1.0】RemoteMessage={"WakeupMessage":{"action":1}}
01-07 09:38:26.039 I/RemoteInteractImpl->( 3549): ★【IPC_1.1】RemoteMessage={"WakeupMessage":{"action":1}}
01-07 09:38:26.040 I/CommandFactory->( 3549): ★【IPC_1.2】command=com.crazyai.aivoiceserver.command.WakeupCommand@8ee4aa1★
01-07 09:38:26.041 I/WakeupCommand->->( 3549): ★【IPC_1.3】【Wake】message={"WakeupMessage":{"action":1}}
01-07 09:38:26.041 D/DUIBot->( 3549): ¤DUI#TTS init=false,ASR init=false,Wake init=false¤
01-07 09:38:26.041 W/DUIBot->( 3549): caused by $com.crazyai.libduivoice.DUIVoiceBot on line 588 call disableWaken DUI#VBOT no need close!$
01-07 09:38:26.042 D/VoiceAssistant( 2906): VoiceAssistant removeView null
01-07 09:39:16.757 E/ActivityManager( 2420): ANR in com.crazyai.whiteboard.systemui
01-07 09:39:16.757 E/ActivityManager( 2420): PID: 2906
01-07 09:39:16.757 E/ActivityManager( 2420): Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 6.  Wait queue head age: 5531.5ms.)
01-07 09:39:16.757 E/ActivityManager( 2420): Load: 14.27 / 13.19 / 10.4
01-07 09:39:16.757 E/ActivityManager( 2420): CPU usage from 0ms to 5319ms later (2021-01-07 09:39:11.385 to 2021-01-07 09:39:16.705):
01-07 09:39:16.757 E/ActivityManager( 2420):   48% 2420/system_server: 31% user + 17% kernel / faults: 3809 minor 8 major
01-07 09:39:16.757 E/ActivityManager( 2420):   13% 2087/vendor.hisilicon.hardware.hwtvmw@1.0-service: 4.8% user + 8.2% kernel
01-07 09:39:16.757 E/ActivityManager( 2420):   11% 3978/com.ecloud.eairplay: 9.7% user + 1.6% kernel / faults: 1240 minor
01-07 09:39:16.757 E/ActivityManager( 2420):   9.9% 1291/HI_VPSS_Process: 0% user + 9.9% kernel
01-07 09:39:16.757 E/ActivityManager( 2420):   9.9% 2906/com.crazyai.whiteboard.systemui: 7.1% user + 2.8% kernel / faults: 1301 minor 2 major
01-07 09:39:16.757 E/ActivityManager( 2420):   9% 2098/surfaceflinger: 4.1% user + 4.8% kernel / faults: 694 minor
01-07 09:39:16.757 E/ActivityManager( 2420):   6.5% 2243/pqserver: 0.7% user + 5.8% kernel
01-07 09:39:16.757 E/ActivityManager( 2420):   3% 2892/com.hisilicon.tvui: 1.8% user + 1.1% kernel / faults: 798 minor
01-07 09:39:16.757 E/ActivityManager( 2420):   2.6% 2223/audioserver: 1.5% user + 1.1% kernel / faults: 118 minor
01-07 09:39:16.757 E/ActivityManager( 2420):   0% 3392/com.crazyai.timemanager: 0% user + 0% kernel / faults: 1095 minor
...

Those are all the clues the APLOG provides. In particular, remember the two PIDs above.
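
Pulling the three key facts (process name, PID, wait queue head age) out of a long log by eye is error-prone, so a small parser can help. The sketch below is plain Java with hypothetical class and method names. It assumes the ActivityManager line formats seen in the excerpt above, which may differ between Android versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnrLogSketch {
    // Three lines copied from the log excerpt above (the Reason line is shortened in the middle).
    static final String SAMPLE = String.join("\n",
        "01-07 09:39:16.757 E/ActivityManager( 2420): ANR in com.crazyai.whiteboard.systemui",
        "01-07 09:39:16.757 E/ActivityManager( 2420): PID: 2906",
        "01-07 09:39:16.757 E/ActivityManager( 2420): Reason: Input dispatching timed out"
            + " (Wait queue length: 6.  Wait queue head age: 5531.5ms.)");

    private static final Pattern PROCESS = Pattern.compile("ANR in (\\S+)");
    private static final Pattern PID = Pattern.compile("\\bPID: (\\d+)");
    private static final Pattern HEAD_AGE = Pattern.compile("Wait queue head age: ([0-9.]+)ms");

    /** Returns the first capture group of the first match, or null if the pattern is absent. */
    static String firstGroup(Pattern pattern, String log) {
        Matcher m = pattern.matcher(log);
        return m.find() ? m.group(1) : null;
    }

    /** Condenses an ANR report into one line: process, PID and wait queue head age. */
    static String summarize(String log) {
        return firstGroup(PROCESS, log)
            + " pid=" + firstGroup(PID, log)
            + " headAgeMs=" + firstGroup(HEAD_AGE, log);
    }

    public static void main(String[] args) {
        System.out.println(summarize(SAMPLE));
    }
}
```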

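2. Analyzing /data/anr/traces.txt

/data/anr/traces.txt stores detailed stack snapshots of each process and thread at the time of the ANR. Each entry starts with the thread name; sysTid is the Linux thread ID (for the main thread it equals the process ID); pc is the program counter; the value in parentheses is the offset from the function's entry address, and so on.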

.......
"main" prio=5 tid=1 Native
  | group="main" sCount=1 dsCount=0 flags=1 obj=0x73658810 self=0x7c3aebea00
  | sysTid=2906 nice=0 cgrp=default sched=0/0 handle=0x7c3f6a19b0
  | state=S schedstat=( 0 0 0 ) utm=999 stm=279 core=3 HZ=100
  | stack=0x7fe11c7000-0x7fe11c9000 stackSize=8MB
  | held mutexes=
  kernel: __switch_to+0x90/0xb8
  kernel: binder_thread_read+0x444/0x1008
  kernel: binder_ioctl_write_read+0x16c/0x318
  kernel: binder_ioctl+0x514/0x760
  kernel: do_vfs_ioctl+0xb0/0x760
  kernel: SyS_ioctl+0x94/0xa8
  kernel: el0_svc_naked+0x24/0x28
  native: #00 pc 00000000000686bc  /system/lib64/libc.so (__ioctl+4)
  native: #01 pc 0000000000023ed0  /system/lib64/libc.so (ioctl+132)
  native: #02 pc 0000000000061aa8  /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+256)
  native: #03 pc 0000000000062840  /system/lib64/libbinder.so (_ZN7android14IPCThreadState15waitForResponseEPNS_6ParcelEPi+340)
  native: #04 pc 0000000000062560  /system/lib64/libbinder.so (_ZN7android14IPCThreadState8transactEijRKNS_6ParcelEPS1_j+216)
  native: #05 pc 0000000000056e30  /system/lib64/libbinder.so (_ZN7android8BpBinder8transactEjRKNS_6ParcelEPS1_j+72)
  native: #06 pc 0000000000121500  /system/lib64/libandroid_runtime.so (???)
  native: #07 pc 0000000000933cb4  /system/framework/arm64/boot-framework.oat (Java_android_os_BinderProxy_transactNative__ILandroid_os_Parcel_2Landroid_os_Parcel_2I+196)
  at android.os.BinderProxy.transactNative(Native method)
  at android.os.BinderProxy.transact(Binder.java:748)
  at com.crazyai.voiceframework.IRemoteInteract$Stub$Proxy.callIPC(IRemoteInteract.java:7)
  at com.crazyai.voiceframework.core.AIVoiceManager.postMessage(AIVoiceManager.java:3)
  at com.crazyai.voiceframework.core.AIVoiceManager.disableWaken(AIVoiceManager.java:5)
  at com.crazyai.whiteboard.systemui.voice.VoiceAssistant.disable(VoiceAssistant.kt:1)
  at com.crazyai.whiteboard.systemui.voice.VoiceAssistant.initAssistant(VoiceAssistant.kt:4)
  at com.crazyai.whiteboard.systemui.voice.VoiceAssistant.access$initAssistant(VoiceAssistant.kt:1)
  at com.crazyai.whiteboard.systemui.voice.VoiceAssistant$readyReceiver$1$a.run(VoiceAssistant.kt:1)
  at android.os.Handler.handleCallback(Handler.java:789)
  at android.os.Handler.dispatchMessage(Handler.java:98)
  at android.os.Looper.loop(Looper.java:164)
  at android.app.ActivityThread.main(ActivityThread.java:6553)
  at java.lang.reflect.Method.invoke(Native method)
  at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:240)
  at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:767)
  ....

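From the trace file we can tentatively conclude that when com.crazyai.whiteboard.systemui talks to com.crazyai.voiceframework over Binder, the ANR arises on the call chain com.crazyai.whiteboard.systemui.voice.VoiceAssistant.disable -> com.crazyai.voiceframework.core.AIVoiceManager.disableWaken -> com.crazyai.voiceframework.core.AIVoiceManager.postMessage -> com.crazyai.voiceframework.IRemoteInteract$Stub$Proxy.callIPC. The main thread is sitting inside BinderProxy.transact waiting for the remote side, but the trace alone does not show why.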
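
The check done by eye above (is the main thread parked in a Binder transaction, and which app frame issued it) can be sketched in plain Java. The names are hypothetical, and the parsing assumes the traces.txt layout shown in the excerpt: a quoted thread name on the header line and frames prefixed with "at".

```java
import java.util.ArrayList;
import java.util.List;

class TraceSketch {
    // A shortened copy of the "main" thread entry from the trace above.
    static final String SAMPLE = String.join("\n",
        "\"main\" prio=5 tid=1 Native",
        "  | sysTid=2906 nice=0 cgrp=default sched=0/0 handle=0x7c3f6a19b0",
        "  at android.os.BinderProxy.transactNative(Native method)",
        "  at android.os.BinderProxy.transact(Binder.java:748)",
        "  at com.crazyai.voiceframework.IRemoteInteract$Stub$Proxy.callIPC(IRemoteInteract.java:7)",
        "  at com.crazyai.voiceframework.core.AIVoiceManager.postMessage(AIVoiceManager.java:3)");

    /** Collects the Java frames ("at ..." lines) that belong to the named thread. */
    static List<String> framesOf(String trace, String threadName) {
        List<String> frames = new ArrayList<>();
        boolean inThread = false;
        for (String line : trace.split("\n")) {
            if (line.startsWith("\"")) {
                // A new thread entry starts; remember whether it is the one we want.
                inThread = line.startsWith("\"" + threadName + "\"");
            } else if (inThread && line.trim().startsWith("at ")) {
                frames.add(line.trim().substring(3));
            }
        }
        return frames;
    }

    /** True if the innermost Java frame is the native Binder transaction call. */
    static boolean blockedInBinderCall(List<String> frames) {
        return !frames.isEmpty()
            && frames.get(0).startsWith("android.os.BinderProxy.transactNative");
    }

    /** Returns the first frame outside the platform packages, or null if there is none. */
    static String firstAppFrame(List<String> frames) {
        for (String frame : frames) {
            if (!frame.startsWith("android.") && !frame.startsWith("java.")
                    && !frame.startsWith("com.android.")) {
                return frame;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> frames = framesOf(SAMPLE, "main");
        System.out.println("blocked in Binder call: " + blockedInBinderCall(frames));
        System.out.println("issued by: " + firstAppFrame(frames));
    }
}
```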

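3. Analyzing the files under /data/tombstones

The /data/tombstones directory holds crash dumps. When Android, which runs on top of the Linux kernel, hits an exception, the Android layer is usually restarted automatically, which can make the problem hard to reproduce and debug. So when a crash occurs in the Android layer, the context of the process (the stacks of all its threads) is saved to a tombstone, which mainly includes:

  • The standard opening marker line "*** *** *** ... ***"
  • dump_header_info: header information
  • dump_thread_info: thread information
  • dump_signal_info: signal information
  • dump_probable_cause: probable cause
  • dump_registers: register contents
  • log_backtrace: the backtrace
  • dump_stack: stack contents
  • dump_memory_and_code: memory contents
  • dump_all_maps: memory maps
  • dump_log_file(log, pid, "system", tail): the system log
  • dump_log_file(log, pid, "main", tail): the main log

The tombstone is produced roughly as follows:

  • A native process hits an exception, for example a NULL pointer dereference.
  • The operating system handles the exception through the exception vector table and then sends a signal.
  • The signal handler registered in debuggerd_init receives and handles it.
  • A pseudo thread is created to launch the crash_dump process, which collects crash information for every thread in the current process.
  • The tombstoned process starts at boot and registers its sockets then, waiting for connections.
  • When crash_dump connects to tombstoned, it gets back a file descriptor for a file under /data/tombstones/, depending on the dump_type it passes.
  • crash_dump then writes the details of all threads into the tombstone file through the engrave_tombstone function.
  • The result is the corresponding tombstone_XX file under /data/tombstones.

Files are named like tombstone_00. Below is part of the key information. In the process with PID 3549, more than 300 threads named VBOT#WakeupComm had been created within a short time, and more than 100 Binder thread-pool threads as well, all with different TIDs. Most importantly, the process aborted with SIGABRT because of 'Could not create epoll instance: Too many open files'.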

*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'unknown'
Revision: '0'
ABI: 'arm64'
pid: 3549, tid: 10427, name: VBOT#WakeupComm  >>> com.lamy.aivoiceserver <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
Abort message: 'Could not create epoll instance: Too many open files'
    x0   0000000000000000  x1   00000000000028bb  x2   0000000000000006  x3   0000000000000008
    x4   0000000000000000  x5   0000000000000000  x6   0000000000000000  x7   7f7f7f7f7f7f7f7f
    x8   0000000000000083  x9   bdee45110d16c533  x10  0000000000000000  x11  0000000000000001
    x12  ffffffffffffffff  x13  ffffffffffffffff  x14  ff00000000000000  x15  ffffffffffffffff
    x16  0000007c3d173308  x17  0000007c3d1155d0  x18  0000007c3acee480  x19  0000000000000ddd
    x20  00000000000028bb  x21  00000000703fb870  x22  00000000000028bb  x23  0000000000000001
    x24  00000000703a2da8  x25  0000000013080088  x26  00000000130800b0  x27  0000000013080100
    x28  0000007c3adc4000  x29  0000007c00ffd950  x30  0000007c3d0c9994
    sp   0000007c00ffd910  pc   0000007c3d1155d8  pstate 0000000060000000
    v0   2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e  v1   0000007c3d1d59800000007c00ffd830
    v2   2e6761742e676f6c2e74736973726570  v3   80200800000000008020080080000000
    v4   00000000000000000000000000000000  v5   00000000000000000000000000000000
    v6   40000000400000004000000000000000  v7   80200802802008028020080280200802
    v8   00000000000000000000000000000000  v9   00000000000000000000000000000000
    v10  00000000000000000000000000000000  v11  00000000000000000000000000000000
    v12  00000000000000000000000000000000  v13  00000000000000000000000000000000
    v14  00000000000000000000000000000000  v15  00000000000000000000000000000000
    v16  40100401401004014010040140100401  v17  400040004000000040404000a800a880
    v18  40000000400000004000000000000000  v19  00000000000000000000007c00000000
    v20  00000000000000000000007cffe5b533  v21  00000000000000003a000088ffe5b533
    v22  000000000000000000ffffffffffffff  v23  00000000000000000000000000000000
    v24  00000000000000003afd5f3800000000  v25  000000000000000001fe01ca016a0066
    v26  00000000000000000000000000000000  v27  000000000000000000000000ebad808b
    v28  000000000000000000000000ebad808c  v29  000000000000000000000000ebad808d
    v30  000000000000000000000000ebad808e  v31  000000000000000000000000ffffffff
    fpsr 00000013  fpcr 00000000

pid: 3549, tid: 3586, name: Binder:3549_2  >>> com.lamy.aivoiceserver <<<
signal 5 (SIGTRAP), code -32763 (?), fault addr 0x273e00000e02
    x0   0000000000000003  x1   00000000c0306201  x2   0000007c247792d8  x3   0000000000000000
    x4   0000000000000000  x5   0000000000000000  x6   0000000000000000  x7   00335f393435333a
    x8   000000000000001d  x9   0000007c24779298  x10  00000000ffffffd0  x11  0000007c24779260
    x12  0000007c24779298  x13  0000007c247792d0  x14  0000000000000100  x15  0000007c24778f78
    x16  0000007c3b772dc8  x17  0000007c3d0cfe4c  x18  0000007c3acee480  x19  0000007c31633058
    x20  0000007c316330c0  x21  0000007c31633000  x22  00000000fffffff7  x23  0000000000000100
    x24  0000000000000004  x25  0000000000000000  x26  0000000000000000  x27  0000000000000000
    x28  0000000012c32a80  x29  0000007c247792c0  x30  0000007c3d0cfed4
    sp   0000007c247791e0  pc   0000007c3d1146bc  pstate 00000000a0000000
    v0   0000007c245800000000007c00000001  v1   0000007c247791300000007c247791a0
    v2   000000003a7000676e69636172742e64  v3   00000000000000000000000000000000
    v4   80200802000008000000000000000000  v5   00000000000004000000000000100000
    v6   00000000000000000000000000000400  v7   80200802802008028020080280200802
    v8   00000000000000000000000000000000  v9   00000000000000000000000000000000
    v10  00000000000000000000000000000000  v11  00000000000000000000000000000000
    v12  00000000000000000000000000000000  v13  00000000000000000000000000000000
    v14  00000000000000000000000000000000  v15  00000000000000000000000000000000
    v16  40100401401004014010040140100401  v17  a00a000800000004aa08000400040010
    v18  80200802000008000000000000000400  v19  00000000000000000000007c00000000
    v20  00000000000000000000007cffe5b533  v21  00000000000000003a000088ffe5b533
    v22  000000000000000000ffffffffffffff  v23  002e0041002400440000000000000000
    v24  00000000000000003afd5f3800000000  v25  007401fa00be008801fe01ca016a0066
    v26  003a0000000000680000000000000000  v27  000000000000000000000000ebad808b
    v28  000000000000000000000000ebad808c  v29  000000000000000000000000ebad808d
    v30  000000000000000000000000ebad808e  v31  00000000000000000000000012c32968
    fpsr 00000013  fpcr 00000000

backtrace:
    #00 pc 00000000000686bc  /system/lib64/libc.so (__ioctl+4)
    #01 pc 0000000000023ed0  /system/lib64/libc.so (ioctl+132)
    #02 pc 0000000000061aa8  /system/lib64/libbinder.so (_ZN7android14IPCThreadState14talkWithDriverEb+256)
    #03 pc 0000000000061c18  /system/lib64/libbinder.so (_ZN7android14IPCThreadState20getAndExecuteCommandEv+24)
    #04 pc 00000000000622e8  /system/lib64/libbinder.so (_ZN7android14IPCThreadState14joinThreadPoolEb+60)
    #05 pc 0000000000083944  /system/lib64/libbinder.so (_ZN7android10PoolThread10threadLoopEv+24)
    #06 pc 000000000001160c  /system/lib64/libutils.so (_ZN7android6Thread11_threadLoopEPv+280)
    #07 pc 00000000000b7a5c  /system/lib64/libandroid_runtime.so (_ZN7android14AndroidRuntime15javaThreadShellEPv+136)
    #08 pc 0000000000065f88  /system/lib64/libc.so (_ZL15__pthread_startPv+36)
    #09 pc 000000000001ed24  /system/lib64/libc.so (__start_thread+68)
	
	--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
pid: 3549, tid: 3993, name: VBOT#WakeupComm  >>> com.lamy.aivoiceserver <<<
signal 5 (SIGTRAP), code -32763 (?), fault addr 0x273e00000f99
    x0   fffffffffffffffc  x1   0000007c23465bf8  x2   0000000000000010  x3   00000000ffffffff
    x4   0000000000000000  x5   0000000000000008  x6   0000007c3f549000  x7   0000000000027526
    x8   0000000000000016  x9   7fffffffffffffff  x10  000000000000000c  x11  0000000000000000
    x12  0000000000000018  x13  000000005ff6607d  x14  0031880de3719bc0  x15  00000e57b65572b5
    x16  0000007c3d173498  x17  0000007c3d0cb4d0  x18  0000000000000008  x19  0000007c3161d180
    x20  0000007c3161d228  x21  00000000ffffffff  x22  00000000ffffffff  x23  0000007c3161d180
    x24  0000000000000028  x25  000000000000000c  x26  0000007c31618a40  x27  000000006ff4bd88
    x28  00000000704faf40  x29  0000007c23465ba0  x30  0000007c3d0cb504
    sp   0000007c23465b90  pc   0000007c3d1145d0  pstate 0000000060000000
    v0   00000000000000000000000000000000  v1   00000000000000000000007c3dbcb1c0
    v2   00000000000000006e4965746f6d6552  v3   00000000000000000000000000000000
    v4   00000000000000000000000000000000  v5   00000000000000004000000000000000
    v6   00000000000000000000000000000000  v7   00000000000000008020080280200802
    v8   00000000000000000000000000000000  v9   00000000000000000000000000000000
    v10  00000000000000000000000000000000  v11  00000000000000000000000000000000
    v12  00000000000000000000000000000000  v13  00000000000000000000000000000000
    v14  00000000000000000000000000000000  v15  00000000000000000000000000000000
    v16  40100401401004014010040140100401  v17  a008000200000000a802000040404000
    v18  80200800000000020000000000000000  v19  00000000000000000000007c00000000
    v20  00000000000000000000007cffe5b533  v21  00000000000000003a000088ffe5b533
    v22  000000000000000000ffffffffffffff  v23  002e0041002400440000000000000000
    v24  00000000000000003afd5f3800000000  v25  007401fa00be008801fe01ca016a0066
    v26  003a0000000000680000000000000000  v27  000000000000000000000000ebad808b
    v28  000000000000000000000000ebad808c  v29  000000000000000000000000ebad808d
    v30  000000000000000000000000ebad808e  v31  000000000000000000000000ffffffff
    fpsr 00000013  fpcr 00000000
    ...
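
Counting hundreds of per-thread entries by hand is tedious. The following plain-Java sketch tallies threads by name from the "pid: ..., tid: ..., name: ..." header lines. The names are hypothetical, and the header format is assumed to match the excerpt above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TombstoneThreadsSketch {
    // Thread header lines copied from the tombstone excerpt above.
    static final String SAMPLE = String.join("\n",
        "pid: 3549, tid: 10427, name: VBOT#WakeupComm  >>> com.lamy.aivoiceserver <<<",
        "pid: 3549, tid: 3586, name: Binder:3549_2  >>> com.lamy.aivoiceserver <<<",
        "pid: 3549, tid: 3993, name: VBOT#WakeupComm  >>> com.lamy.aivoiceserver <<<");

    private static final Pattern HEADER =
        Pattern.compile("^pid: \\d+, tid: \\d+, name: (.+?)\\s+>>>", Pattern.MULTILINE);

    /** Counts thread entries per thread name, sorted by name. */
    static Map<String, Integer> countByName(String tombstone) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = HEADER.matcher(tombstone);
        while (m.find()) {
            // Fold "Binder:3549_2", "Binder:3549_3", ... into a single "Binder" bucket.
            String name = m.group(1).replaceAll(":\\d+_\\d+$", "");
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByName(SAMPLE));
    }
}
```

A name whose count keeps growing between two dumps is a strong hint that threads are being created per request and never stopped.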

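At this point a preliminary conclusion is possible. HandlerThreads were created frequently but never released: each one's Looper keeps looping, so the thread never exits. The Handler mechanism is built on Linux epoll, so every new Looper opens file descriptors, and they are closed only when the Looper is explicitly quit. More importantly, the Linux kernel limits how many file descriptors a single process may have open. With descriptors only being opened and never closed, the process eventually exceeds that limit, which produces 'Could not create epoll instance: Too many open files'.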
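
The arithmetic behind this conclusion can be shown with a toy model. The sketch below is plain Java and only simulates the bookkeeping; it does not open real descriptors or use Android classes. It assumes two descriptors per looper (an event fd and an epoll fd) and a limit of 1024, which is an illustrative value and not the measured limit of the device in this article.

```java
class FdLeakModel {

    /** A toy per-process descriptor table with a hard limit. */
    static final class FdTable {
        private final int limit;
        private int open;

        FdTable(int limit) {
            this.limit = limit;
        }

        void open(int n) {
            if (open + n > limit) {
                throw new IllegalStateException("Too many open files");
            }
            open += n;
        }

        void close(int n) {
            open -= n;
        }
    }

    /** A toy looper that holds two descriptors until quit() is called. */
    static final class ToyLooper {
        private final FdTable table;
        private boolean quit;

        ToyLooper(FdTable table) {
            this.table = table;
            table.open(2); // assumption: one event fd plus one epoll fd
        }

        void quit() {
            if (!quit) {
                quit = true;
                table.close(2);
            }
        }
    }

    /** Creates one looper per call and returns how many calls succeeded before the limit was hit. */
    static int callsBeforeFailure(int limit, int calls, boolean quitAfterUse) {
        FdTable table = new FdTable(limit);
        for (int i = 0; i < calls; i++) {
            ToyLooper looper;
            try {
                looper = new ToyLooper(table);
            } catch (IllegalStateException e) {
                return i;
            }
            if (quitAfterUse) {
                looper.quit();
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("without quit: " + callsBeforeFailure(1024, 1000, false));
        System.out.println("with quit:    " + callsBeforeFailure(1024, 1000, true));
    }
}
```

In the model, leaking loopers fail after limit / 2 calls, while loopers that are quit after use never hit the limit. A real process fails earlier, because jars, pipes and sockets also occupy descriptors.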

4. Inspecting /proc/pid/fd

Here pid is the ID of the target process.

To further verify the conclusion, root the device and run the following in adb shell:

Hi3751V811:/ # cd proc/
Hi3751V811:/proc # ps -A | grep aiv
u0_a46        3239  2057 1528200  99976 SyS_epoll_wait      0 S com.crazyai.aivoiceserver
Hi3751V811:/proc # cd 3239
Hi3751V811:/proc/3239 # cd fd
Hi3751V811:/proc/3239/fd # ls -all

This lists all the file descriptors the process currently has open:


Hi3751V811:/proc/3239/fd # ls -all
total 0
dr-x------ 2 u0_a46 u0_a46  0 2021-01-11 14:56 .
dr-xr-xr-x 9 u0_a46 u0_a46  0 2021-01-11 14:56 ..
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 0 -> /dev/null
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 10 -> /system/framework/core-libart.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 11 -> /system/framework/conscrypt.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 12 -> /system/framework/okhttp.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 13 -> /system/framework/legacy-test.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 14 -> /system/framework/bouncycastle.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 15 -> /system/framework/ext.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 16 -> /system/framework/framework.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 17 -> /system/framework/telephony-common.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 18 -> /system/framework/voip-common.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 19 -> /system/framework/ims-common.jar
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 2 -> /dev/null
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 20 -> /system/framework/apache-xml.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 21 -> /system/framework/org.apache.http.legacy.boot.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 22 -> /system/framework/android.hidl.base-V1.0-java.jar
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 23 -> /system/framework/android.hidl.manager-V1.0-java.jar

lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 34 -> anon_inode:[eventfd]
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 35 -> anon_inode:[eventpoll]
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 36 -> /system/app/LamyAIVoiceServer/LamyAIVoiceServer.apk
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 37 -> anon_inode:[eventfd]
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 38 -> anon_inode:[eventpoll]
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 39 -> anon_inode:[eventfd]
l-wx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 4 -> /sys/kernel/debug/tracing/trace_marker
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 40 -> anon_inode:[eventpoll]
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 41 -> anon_inode:[eventfd]
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 42 -> anon_inode:[eventpoll]
lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 43 -> pipe:[28897]
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 44 -> anon_inode:[eventfd]
lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 45 -> anon_inode:[eventpoll]

...

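For reasons of space the listing is not shown in full. At the time of the crash, process 3549 had more than 1200 file descriptors open, and many of them were repeated entries pointing at the same kind of target, which confirmed the diagnosis.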
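
With more than a thousand entries, it helps to group the listing by target type instead of reading it line by line. The sketch below is plain Java with hypothetical names. It assumes the "N -> target" layout of the ls output above and counts entries per kind.

```java
import java.util.Map;
import java.util.TreeMap;

class FdListingSketch {
    // A few lines copied from the /proc/3239/fd listing above.
    static final String SAMPLE = String.join("\n",
        "lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 0 -> /dev/null",
        "lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 34 -> anon_inode:[eventfd]",
        "lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 35 -> anon_inode:[eventpoll]",
        "lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 37 -> anon_inode:[eventfd]",
        "lrwx------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 38 -> anon_inode:[eventpoll]",
        "lr-x------ 1 u0_a46 u0_a46 64 2021-01-11 14:56 43 -> pipe:[28897]");

    /** Maps a link target to a coarse kind such as "eventfd", "eventpoll", "pipe" or "file". */
    static String kindOf(String target) {
        String anon = "anon_inode:[";
        if (target.startsWith(anon) && target.endsWith("]")) {
            return target.substring(anon.length(), target.length() - 1);
        }
        if (target.startsWith("pipe:")) {
            return "pipe";
        }
        if (target.startsWith("socket:")) {
            return "socket";
        }
        if (target.startsWith("/")) {
            return "file";
        }
        return "other";
    }

    /** Counts descriptors per kind, sorted by kind; lines without " -> " are skipped. */
    static Map<String, Integer> countKinds(String listing) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : listing.split("\n")) {
            int arrow = line.indexOf(" -> ");
            if (arrow < 0) {
                continue;
            }
            counts.merge(kindOf(line.substring(arrow + 4).trim()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countKinds(SAMPLE));
    }
}
```

In a leak like the one described here, the eventfd and eventpoll counts would be expected to grow in step with each other while the other kinds stay flat.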

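IV. The Fix

Since we know that every new HandlerThread's Looper opens new file descriptors, there is one core idea for avoiding the problem: do not create redundant HandlerThreads, and quit any HandlerThread promptly once it is no longer needed.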

public class WakeupCommand extends BaseCommand {
    public static volatile WakeupCommand wakeCommand;
    private WakeupCommand(){
        super();
    }
    private WakeupCommand(@NonNull JSONObject json,RemoteInteractImpl impl) {
        remoteInteract=impl;
        this.cmdJson = json;
        handlerThread = new HandlerThread("VBOT#"+TAG);
        handlerThread.start();
        workHandler = new SyncHandler(handlerThread.getLooper());
    }
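    // Note: json and impl are only consumed when the single instance is first created.
    protected static WakeupCommand getInstance(@NonNull JSONObject json,RemoteInteractImpl impl){
        if(wakeCommand==null){
            synchronized (WakeupCommand.class){
                // Re-check under the lock so that two racing callers cannot both create an instance.
                if(wakeCommand==null){
                    wakeCommand=new WakeupCommand(json,impl);
                }
            }
        }
        return wakeCommand;
    }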
    ...
}
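
The same idea, one shared worker thread plus an explicit way to stop it, can be modeled in plain Java. The sketch below is not the project's code and uses no Android classes: a single-threaded ExecutorService stands in for the HandlerThread, shutdown() stands in for quitting it, and all names (including the thread name VBOT#Worker) are hypothetical. Per-call data is passed with each task instead of being stored in the singleton.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedWorkerSketch {
    private static volatile SharedWorkerSketch instance;

    // One worker thread for the whole process, created only once.
    private final ExecutorService worker =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "VBOT#Worker"));

    private SharedWorkerSketch() {
    }

    /** Double-checked locking: the second null check happens under the lock. */
    static SharedWorkerSketch getInstance() {
        SharedWorkerSketch local = instance;
        if (local == null) {
            synchronized (SharedWorkerSketch.class) {
                local = instance;
                if (local == null) {
                    instance = local = new SharedWorkerSketch();
                }
            }
        }
        return local;
    }

    /** Handles one command on the shared thread; the payload travels with the task. */
    Future<String> handle(String payload) {
        Callable<String> task = () -> Thread.currentThread().getName() + " handled " + payload;
        return worker.submit(task);
    }

    /** Stops the shared thread and forgets the instance; a later getInstance() creates a new one. */
    static void shutdown() {
        synchronized (SharedWorkerSketch.class) {
            if (instance != null) {
                instance.worker.shutdown();
                instance = null;
            }
        }
    }

    /** Sends the given number of commands and returns how many distinct threads handled them. */
    static int distinctThreadsFor(int calls) {
        Set<String> names = new HashSet<>();
        try {
            for (int i = 0; i < calls; i++) {
                names.add(getInstance().handle("cmd" + i).get().split(" ")[0]);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            shutdown();
        }
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println("threads used for 300 calls: " + distinctThreadsFor(300));
    }
}
```

On Android the counterpart of shutdown() would be calling quit() or quitSafely() on the HandlerThread when the component that owns it is torn down.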

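Finally, I hope this article gives you an approach to solving this kind of problem, rather than something to copy verbatim whenever "Could not create epoll instance: Too many open files" shows up.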
