Android Inter-Process Communication with binder - Possible Exceptions

Overview

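In day-to-day app execution, the binder-related exception you are most likely to run into is RemoteException. This article, however, only analyzes exceptions caused by the binder mechanism itself; a RemoteException is how some other exception caused by server-side logic shows up on the client side.
The exceptions related to the binder mechanism are: android.app.RemoteServiceException: can't deliver broadcast; JavaBinder: !!! FAILED BINDER TRANSACTION !!!; TransactionTooLargeException; DeadSystemException; DeadObjectException. Do any of them look familiar?
All of these exceptions are closely tied to the numbers discussed in the previous article, Android Inter-Process Communication with binder - Several Important Numbers.
Android Inter-Process Communication with binder - In Practice
Android Inter-Process Communication with binder - Several Important Numbers
Android Inter-Process Communication with binder - debug transaction
Android Inter-Process Communication with binder - The Important Tool aidl
Android Inter-Process Communication with binder - The Upper-Layer Protocol IPCThreadState
Android Inter-Process Communication with binder - The Utility Class Parcel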

Exception analysis

can’t deliver broadcast

Fatal Exception: android.app.RemoteServiceException: can't deliver broadcast
   at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1813)
   at android.os.Handler.dispatchMessage(Handler.java:102)
   at android.os.Looper.loop(Looper.java:154)
   at android.app.ActivityThread.main(ActivityThread.java:6776)
   at java.lang.reflect.Method.invoke(Method.java)
   at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:1520)
   at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1410)

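At first glance this stack trace looks like a system problem that has nothing to do with the app. That was my own judgment the first time I saw it, and I stubbornly believed that BroadcastQueue must have gone wrong.
To analyze this problem we have to read the source. First, pin down the code path around the point where the exception is thrown. Let's look at the code together (all code in this article comes from aospxref.com, android11):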

    void performReceiveLocked(ProcessRecord app, IIntentReceiver receiver,
            Intent intent, int resultCode, String data, Bundle extras,
            boolean ordered, boolean sticky, int sendingUser)
            throws RemoteException {
        // Send the intent to the receiver asynchronously using one-way binder calls.
        if (app != null) {
            if (app.thread != null) {
                // If we have an app thread, do the call through that so it is
                // correctly ordered with other one-way calls.
                try {
                    app.thread.scheduleRegisteredReceiver(receiver, intent, resultCode,
                            data, extras, ordered, sticky, sendingUser, app.getReportedProcState());
                // TODO: Uncomment this when (b/28322359) is fixed and we aren't getting
                // DeadObjectException when the process isn't actually dead.
                //} catch (DeadObjectException ex) {
                // Failed to call into the process.  It's dying so just let it die and move on.
                //    throw ex;
                } catch (RemoteException ex) {
                    // Failed to call into the process. It's either dying or wedged. Kill it gently.
                    synchronized (mService) {
                        Slog.w(TAG, "Can't deliver broadcast to " + app.processName
                                + " (pid " + app.pid + "). Crashing it.");
                        app.scheduleCrash("can't deliver broadcast");
                    }
                    throw ex;
                }
            } else {
                // Application has died. Receiver doesn't exist.
                throw new RemoteException("app.thread must not be null");
            }
        } else {
            receiver.performReceive(intent, resultCode, data, extras, ordered,
                    sticky, sendingUser);
        }
    }

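This exception is triggered when scheduleRegisteredReceiver throws a RemoteException, meaning that something went wrong while the app.thread binder call was being delivered to the server side. The server here is not the system_server process; it is the server in the binder sense, which in this case is the app process. (A side question: where does this IBinder object get assigned from?) Since this is a binder call, there must be an aidl interface behind it, so let's find the interface description: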

frameworks/base/core/java/android/app/IApplicationThread.aidl
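The method in the aidl file looks like an ordinary interface declaration. At first glance it is a perfectly normal synchronous binder call, yet it is actually an asynchronous one. So where is the oneway keyword written?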
    void scheduleRegisteredReceiver(IIntentReceiver receiver, in Intent intent,
            int resultCode, in String data, in Bundle extras, boolean ordered,
            boolean sticky, int sendingUser, int processState);

The entire aidl interface is declared as asynchronous:
oneway interface IApplicationThread {
...
}
Every function in the IApplicationThread interface is a oneway call.

Here you need to understand the execution flow of a binder async call:
[Figure: binder async call execution flow]
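This leads to another point: oneway binder calls are executed by only one thread on the server side; the details are in the driver code. We will analyze this carefully in a later article on the underlying principles.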
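As a rough illustration of the point above, the following plain-Java sketch (not binder code) models a oneway receiver as a single worker thread: the caller hands off each call without waiting, and the receiver handles the calls one at a time in arrival order. The class OnewaySketch and the method deliver are hypothetical names, and the single-thread executor is only an assumed stand-in for what the driver does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Toy model of oneway delivery: the caller never blocks, the receiver handles one call at a time. */
final class OnewaySketch {
    /** Posts count calls and returns the order in which the receiver handled them. */
    static List<Integer> deliver(int count) {
        // Assumption: one worker thread stands in for the serialized oneway queue.
        ExecutorService receiver = Executors.newSingleThreadExecutor();
        List<Integer> handled = Collections.synchronizedList(new ArrayList<>());
        for (int i = 1; i <= count; i++) {
            final int call = i;
            // execute() returns right away; the caller does not wait for a reply.
            receiver.execute(() -> handled.add(call));
        }
        receiver.shutdown();
        try {
            if (!receiver.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("receiver did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(handled);
    }

    public static void main(String[] args) {
        System.out.println("handled in order: " + deliver(5));
    }
}
```

Because the caller does not wait, a slow receiver lets pending calls pile up, and each pending call keeps holding its share of the receiver's buffer.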
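The parameters of this interface do not add up to much, so how can the call exceed 512k? The answer is that earlier binder calls have already taken up the space: synchronous and asynchronous binder calls use the same buffer region. To get to the bottom of this problem you therefore still need to find out which earlier binder calls were involved, how many bytes they transferred, and whether they were slow to execute.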
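The effect described above can be sketched with a toy model in plain Java (this is not Android code). It assumes the 512k budget mentioned in the text and treats the buffer as a simple byte counter; BinderBufferSketch and its methods are hypothetical names, and the real driver allocator is more involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a binder buffer: a byte budget shared by all in-flight transactions. */
final class BinderBufferSketch {
    // Assumption: the 512 KB figure used in the text above.
    static final int ASYNC_BUDGET_BYTES = 512 * 1024;

    private final int capacity;
    private int used;
    private final Deque<Integer> inFlight = new ArrayDeque<>();

    BinderBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    /** Reserves space for a transaction; a false result models a failed transaction. */
    boolean tryEnqueue(int parcelBytes) {
        if (parcelBytes > capacity - used) {
            return false;
        }
        used += parcelBytes;
        inFlight.addLast(parcelBytes);
        return true;
    }

    /** The receiver finished the oldest transaction, so its space is released. */
    void completeOldest() {
        Integer size = inFlight.pollFirst();
        if (size != null) {
            used -= size;
        }
    }

    public static void main(String[] args) {
        BinderBufferSketch buffer = new BinderBufferSketch(ASYNC_BUDGET_BYTES);
        // Earlier calls that the receiver has not processed yet keep their space.
        int accepted = 0;
        while (buffer.tryEnqueue(100 * 1024)) {
            accepted++;
        }
        System.out.println("accepted 100 KB calls: " + accepted);
        // A small call now fails even though it is tiny by itself.
        System.out.println("30 KB call fits: " + buffer.tryEnqueue(30 * 1024));
        buffer.completeOldest();
        System.out.println("30 KB call fits after one completes: " + buffer.tryEnqueue(30 * 1024));
    }
}
```

The failing call is not the culprit in this model; the earlier, still pending calls are, which is why the investigation has to look at what was sent before the crash.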

FAILED BINDER TRANSACTION

DeadObjectException

With the principle analyzed above, this failed binder transaction is also a binder memory space problem.

E/JavaBinder: !!! FAILED BINDER TRANSACTION !!!  (parcel size = 40084)                                            
W/System.err: android.os.DeadObjectException: Transaction failed on small parcel; remote process probably died    
W/System.err:     at android.os.BinderProxy.transactNative(Native Method)                                         
W/System.err:     at android.os.BinderProxy.transact(Binder.java:764)                                             
W/System.err:     at c.t.myapplication.IMyAidlInterface$Stub$Proxy.failedBinderError(IMyAidlInterface.java:149)   
W/System.err:     at c.t.myapplication.MainActivity$2.run(MainActivity.java:47)                                   
W/System.err:     at java.lang.Thread.run(Thread.java:764)   

The code that produces this exception:

        // Client-side calling code
        int i = 0;
        int[] val = new int[10000];
        while (i<30) {
            i++;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        binder.failedBinderError(val);
                    } catch (RemoteException e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
        // Service-side code; sleep simulates a time-consuming operation
        @Override
        public void failedBinderError(int[] val) throws RemoteException {
            try {
                Thread.sleep(1000000);  //Simulated long time operation
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

TransactionTooLargeException

 W/System.err: android.os.TransactionTooLargeException: data parcel size 1200084 bytes                               
 W/System.err:     at android.os.BinderProxy.transactNative(Native Method)                                           
 W/System.err:     at android.os.BinderProxy.transact(Binder.java:764)                                               
 W/System.err:     at c.t.myapplication.IMyAidlInterface$Stub$Proxy.failedBinderError(IMyAidlInterface.java:149)     
 W/System.err:     at c.t.myapplication.MainActivity$2.run(MainActivity.java:47)                                     
 W/System.err:     at java.lang.Thread.run(Thread.java:764)     

For easy comparison, here is the code that produces this exception as well:

        // Client-side calling code
        int i = 0;
        int[] val = new int[300000];
        while (i<30) {
            i++;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        binder.failedBinderError(val);
                    } catch (RemoteException e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
        // The service-side code is the same function as above

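The code that produces these two exceptions is almost identical: both start 30 threads that call the binder method at the same time, and the synchronously called binder method is slow. Every call that succeeds occupies memory on the binder service side; after enough repetitions the memory is exhausted, and one of the subsequent binder calls returns a failure. Do you remember how much service-side memory a binder call can occupy in this example? If not, go back to the previous article on binder's important numbers.
Two different exceptions appear because the only difference in the test code is the size of the int array val. When a binder transfer fails, a parcel of 200k or less produces the first exception, and a parcel larger than 200k produces the second.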
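The parcel sizes in the two logs can be checked with a little arithmetic. The sketch below is plain Java with hypothetical names; the 84-byte overhead is simply what the two log lines imply (40084 - 10000 * 4 and 1200084 - 300000 * 4), and the buffer size of 1 MB minus two 4 KB pages is an assumption about a typical app process, not something stated in this article.

```java
/** Back-of-the-envelope numbers for the two test cases above. */
final class ParcelSizeSketch {
    // Overhead implied by the two log lines; its exact breakdown is not analyzed here.
    static final int OBSERVED_OVERHEAD_BYTES = 84;
    // Assumption: a buffer of 1 MB minus two 4 KB pages for a typical app process.
    static final int ASSUMED_BUFFER_BYTES = 1024 * 1024 - 2 * 4096;

    /** Parcel size for an int[] argument of the given length. */
    static int parcelSize(int intCount) {
        return intCount * Integer.BYTES + OBSERVED_OVERHEAD_BYTES;
    }

    /** How many such parcels fit at once, ignoring alignment and driver bookkeeping. */
    static int concurrentCallsThatFit(int parcelBytes) {
        return ASSUMED_BUFFER_BYTES / parcelBytes;
    }

    public static void main(String[] args) {
        for (int count : new int[] {10000, 300000}) {
            int size = parcelSize(count);
            System.out.println("int[" + count + "] -> " + size + " bytes, fits "
                    + concurrentCallsThatFit(size) + " at once");
        }
    }
}
```

Under these assumptions roughly 25 of the 30 small calls can be pending at the same time, while a single 1,200,084-byte parcel does not fit at all, so the large case may fail on its very first call rather than after repeated calls.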
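Let's go straight to the code.
After a binder call fails, signalExceptionForError is executed and throws the exception that matches the specific error. This article only cares about FAILED_TRANSACTION: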

void signalExceptionForError(JNIEnv* env, jobject obj, status_t err,
        bool canThrowRemoteException, int parcelSize)
{
    switch (err) {
        case UNKNOWN_ERROR:
            jniThrowException(env, "java/lang/RuntimeException", "Unknown error");
            break;
        case NO_MEMORY:
            jniThrowException(env, "java/lang/OutOfMemoryError", NULL);
            break;
        case INVALID_OPERATION:
            jniThrowException(env, "java/lang/UnsupportedOperationException", NULL);
            break;
        case BAD_VALUE:
            jniThrowException(env, "java/lang/IllegalArgumentException", NULL);
            break;
        case BAD_INDEX:
            jniThrowException(env, "java/lang/IndexOutOfBoundsException", NULL);
            break;
        case BAD_TYPE:
            jniThrowException(env, "java/lang/IllegalArgumentException", NULL);
            break;
        case NAME_NOT_FOUND:
            jniThrowException(env, "java/util/NoSuchElementException", NULL);
            break;
        case PERMISSION_DENIED:
            jniThrowException(env, "java/lang/SecurityException", NULL);
            break;
        case NOT_ENOUGH_DATA:
            jniThrowException(env, "android/os/ParcelFormatException", "Not enough data");
            break;
        case NO_INIT:
            jniThrowException(env, "java/lang/RuntimeException", "Not initialized");
            break;
        case ALREADY_EXISTS:
            jniThrowException(env, "java/lang/RuntimeException", "Item already exists");
            break;
        case DEAD_OBJECT:
            // DeadObjectException is a checked exception, only throw from certain methods.
            jniThrowException(env, canThrowRemoteException
                    ? "android/os/DeadObjectException"
                            : "java/lang/RuntimeException", NULL);
            break;
        case UNKNOWN_TRANSACTION:
            jniThrowException(env, "java/lang/RuntimeException", "Unknown transaction code");
            break;
        case FAILED_TRANSACTION: {
            ALOGE("!!! FAILED BINDER TRANSACTION !!!  (parcel size = %d)", parcelSize);
            const char* exceptionToThrow;
            char msg[128];
            // TransactionTooLargeException is a checked exception, only throw from certain methods.
            // FIXME: Transaction too large is the most common reason for FAILED_TRANSACTION
            //        but it is not the only one.  The Binder driver can return BR_FAILED_REPLY
            //        for other reasons also, such as if the transaction is malformed or
            //        refers to an FD that has been closed.  We should change the driver
            //        to enable us to distinguish these cases in the future.
            if (canThrowRemoteException && parcelSize > 200*1024) {
                // bona fide large payload
                exceptionToThrow = "android/os/TransactionTooLargeException";
                snprintf(msg, sizeof(msg)-1, "data parcel size %d bytes", parcelSize);
            } else {
                // Heuristic: a payload smaller than this threshold "shouldn't" be too
                // big, so it's probably some other, more subtle problem.  In practice
                // it seems to always mean that the remote process died while the binder
                // transaction was already in flight.
                exceptionToThrow = (canThrowRemoteException)
                        ? "android/os/DeadObjectException"
                        : "java/lang/RuntimeException";
                snprintf(msg, sizeof(msg)-1,
                        "Transaction failed on small parcel; remote process probably died");
            }
            jniThrowException(env, exceptionToThrow, msg);
        } break;
        case FDS_NOT_ALLOWED:
            jniThrowException(env, "java/lang/RuntimeException",
                    "Not allowed to write file descriptors here");
            break;
        case UNEXPECTED_NULL:
            jniThrowNullPointerException(env, NULL);
            break;
        case -EBADF:
            jniThrowException(env, "java/lang/RuntimeException",
                    "Bad file descriptor");
            break;
        case -ENFILE:
            jniThrowException(env, "java/lang/RuntimeException",
                    "File table overflow");
            break;
        case -EMFILE:
            jniThrowException(env, "java/lang/RuntimeException",
                    "Too many open files");
            break;
        case -EFBIG:
            jniThrowException(env, "java/lang/RuntimeException",
                    "File too large");
            break;
        case -ENOSPC:
            jniThrowException(env, "java/lang/RuntimeException",
                    "No space left on device");
            break;
        case -ESPIPE:
            jniThrowException(env, "java/lang/RuntimeException",
                    "Illegal seek");
            break;
        case -EROFS:
            jniThrowException(env, "java/lang/RuntimeException",
                    "Read-only file system");
            break;
        case -EMLINK:
            jniThrowException(env, "java/lang/RuntimeException",
                    "Too many links");
            break;
        default:
            ALOGE("Unknown binder error code. 0x%" PRIx32, err);
            String8 msg;
            msg.appendFormat("Unknown binder error code. 0x%" PRIx32, err);
            // RemoteException is a checked exception, only throw from certain methods.
            jniThrowException(env, canThrowRemoteException
                    ? "android/os/RemoteException" : "java/lang/RuntimeException", msg.string());
            break;
    }
}

This code hardly needs explanation; it is all keywords. It just assembles the exception message and calls JNI to throw the exception into the VM.
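For readers who prefer Java, the FAILED_TRANSACTION branch can be mirrored as a small pure function. This is only a restatement of the native logic shown above for experimenting on a plain JVM; FailedTransactionSketch and exceptionFor are hypothetical names and nothing in the framework calls them.

```java
/** Java restatement of the FAILED_TRANSACTION branch of signalExceptionForError. */
final class FailedTransactionSketch {
    // Same threshold as the native code: 200 * 1024 bytes.
    static final int LARGE_PARCEL_THRESHOLD = 200 * 1024;

    /** Returns the name of the exception class chosen for a failed transaction. */
    static String exceptionFor(int parcelSize, boolean canThrowRemoteException) {
        if (canThrowRemoteException && parcelSize > LARGE_PARCEL_THRESHOLD) {
            // Large payload: reported as "transaction too large".
            return "android.os.TransactionTooLargeException";
        }
        // Small payload: the heuristic assumes that the remote process died.
        return canThrowRemoteException
                ? "android.os.DeadObjectException"
                : "java.lang.RuntimeException";
    }

    public static void main(String[] args) {
        System.out.println(exceptionFor(40084, true));
        System.out.println(exceptionFor(1200084, true));
    }
}
```

Note that the choice depends only on the size of the failing parcel, not on why the buffer was full, so a DeadObjectException raised here does not prove that the remote process actually died.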

DeadSystemException

    /**
     * Rethrow this exception when we know it came from the system server. This
     * gives us an opportunity to throw a nice clean
     * {@link DeadSystemException} signal to avoid spamming logs with
     * misleading stack traces.
     * <p>
     * Apps making calls into the system server may end up persisting internal
     * state or making security decisions based on the perceived success or
     * failure of a call, or any default values returned. For this reason, we
     * want to strongly throw when there was trouble with the transaction.
     *
     * @throws RuntimeException
     */
    @NonNull
    public RuntimeException rethrowFromSystemServer() {
        if (this instanceof DeadObjectException) {
            throw new RuntimeException(new DeadSystemException());
        } else {
            throw new RuntimeException(this);
        }
    }

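Throughout an app's execution, its lifecycle, view display, and so on all require communication with the services provided by system_server (activity, window, package). If a binder call to the system fails, you may be puzzled when analyzing the log: system_server looks perfectly healthy in the log, yet the app reports that the system is dead.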
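The confusing report can be reproduced on a plain JVM with stand-in classes. In the sketch below, the nested exception classes only imitate the android.os ones and rethrowFromSystemServer copies the logic quoted above; SystemCall, callSystem, and causeOf are hypothetical helpers that imitate how a manager class wraps a call into system_server.

```java
/** Plain-JVM imitation of how a failed system call surfaces as DeadSystemException. */
final class DeadSystemSketch {
    // Stand-ins for the android.os classes so that the sketch runs without Android.
    static class RemoteException extends Exception {
        RuntimeException rethrowFromSystemServer() {
            if (this instanceof DeadObjectException) {
                throw new RuntimeException(new DeadSystemException());
            } else {
                throw new RuntimeException(this);
            }
        }
    }

    static class DeadObjectException extends RemoteException {}

    static class DeadSystemException extends DeadObjectException {}

    /** Hypothetical shape of a binder call into a system service. */
    interface SystemCall {
        void run() throws RemoteException;
    }

    /** Imitates a manager method: any RemoteException is rethrown as unchecked. */
    static void callSystem(SystemCall call) {
        try {
            call.run();
        } catch (RemoteException e) {
            throw e.rethrowFromSystemServer();
        }
    }

    /** Returns the simple class name of the cause the app would see, or "none". */
    static String causeOf(SystemCall call) {
        try {
            callSystem(call);
            return "none";
        } catch (RuntimeException e) {
            return e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // A small failed transaction is reported as DeadObjectException first.
        System.out.println(causeOf(() -> { throw new DeadObjectException(); }));
    }
}
```

In this model a DeadObjectException produced by a full buffer is indistinguishable from one produced by a real process death, which matches the situation where the app reports a dead system while system_server keeps running.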

That wraps up the common binder exceptions. Next, we will talk about how to debug binder problems when they occur and find their root cause.
