Stability Issues: ANR-input

码龙1234

Last modified: 2024-06-02 15:06:48

Column: # ANR  Tags: android, stability, ANR

First published: 2024-06-01 00:14:53

Article link: https://blog.csdn.net/over_qqqq/article/details/139338901

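Following up on the previous post, [Introduction to Android ANR], this article digs into how input-type ANRs arise. Advanced material on resolving this kind of ANR will be covered in detail in the next post, [Stability Issues: ANR-input, Advanced].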

Introduction to input ANR

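An Android app's input events are all consumed by the main thread, so if the main thread is busy running a time-consuming function, an ANR can occur. That raises a question: is the main thread simply not allowed to run time-consuming functions? It is allowed. A slow function on the main thread causes an ANR only when it collides with one of the checks that detect ANRs. Suppose an app receives no broadcasts, handles no input events (a background program, for example), and has already finished starting up: time-consuming work on its main thread will then never produce an ANR. It follows that broadcasts and input events are each a detection point for ANRs. Put more abstractly, the system defines a set of scenarios that must complete within a fixed time. When a scenario completes, its timeout message is removed from the loop; if it is not removed in time, the timeout message runs and triggers the ANR.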
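The "arm a timeout, remove it on completion" idea can be sketched in a few lines of C++. This is only a toy model under my own assumptions: `DeadlineMonitor`, its method names, and the string scenario keys are hypothetical and do not exist in the framework.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model, not framework code: a deadline is armed when work is
// handed to the app and disarmed when the app reports completion.
class DeadlineMonitor {
public:
    // Arm a deadline for `scenario` that expires at nowMs + timeoutMs.
    void arm(const std::string& scenario, int64_t nowMs, int64_t timeoutMs) {
        deadlines_[scenario] = nowMs + timeoutMs;
    }

    // The app finished in time: remove the pending deadline.
    void disarm(const std::string& scenario) { deadlines_.erase(scenario); }

    // Scenarios whose deadline has passed at nowMs (these would become ANRs).
    std::vector<std::string> expired(int64_t nowMs) const {
        std::vector<std::string> out;
        for (const auto& kv : deadlines_) {
            if (nowMs >= kv.second) out.push_back(kv.first);
        }
        return out;
    }

private:
    std::map<std::string, int64_t> deadlines_;
};
```

In this model a busy main thread by itself is harmless: with nothing armed, `expired()` stays empty no matter how much time passes. Only an armed deadline that is never disarmed shows up as a timeout.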

How it works

In AMS, appNotResponding is called from inputDispatchingTimedOut, so the next step is to trace backwards and find out who calls that method.

    /**
     * Handle input dispatching timeouts.
     * @return whether input dispatching should be aborted or not.
     */
    boolean inputDispatchingTimedOut(ProcessRecord proc, String activityShortComponentName,
            ApplicationInfo aInfo, String parentShortComponentName,
            WindowProcessController parentProcess, boolean aboveSystem,
            TimeoutRecord timeoutRecord) {
        try {
            Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "inputDispatchingTimedOut()");
            // ... (elided; the block closed by the brace below is opened here)
                mAnrHelper.appNotResponding(proc, activityShortComponentName, aInfo,
                        parentShortComponentName, parentProcess, aboveSystem, timeoutRecord,
                        /*isContinuousAnr*/ true);
            }
            // ... (elided)
        } finally {
            Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
        }
    }

The call originates in InputDispatcher.cpp:

void InputDispatcher::sendWindowUnresponsiveCommandLocked(const sp<IBinder>& token,
                                                          std::optional<int32_t> pid,
                                                          std::string reason) {
    auto command = [this, token, pid, reason = std::move(reason)]() REQUIRES(mLock) {
        scoped_unlock unlock(mLock);
        mPolicy.notifyWindowUnresponsive(token, pid, reason);
    };
    postCommandLocked(std::move(command));
}

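On every loop iteration the dispatcher looks at the time of the next ANR check. If the current time is earlier than that check time, it returns the time point, which is then used as the epoll timeout; otherwise it calls the method above and triggers the ANR.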

nsecs_t InputDispatcher::processAnrsLocked() {
    const nsecs_t currentTime = now();
    nsecs_t nextAnrCheck = LLONG_MAX;
    // ... (elided)

    // Check if any connection ANRs are due
    nextAnrCheck = std::min(nextAnrCheck, mAnrTracker.firstTimeout());
    if (currentTime < nextAnrCheck) { // most likely scenario
        return nextAnrCheck;          // everything is normal. Let's check again at nextAnrCheck
    }

    // If we reached here, we have an unresponsive connection.
    std::shared_ptr<Connection> connection = getConnectionLocked(mAnrTracker.firstToken());
    if (connection == nullptr) {
        ALOGE("Could not find connection for entry %" PRId64, mAnrTracker.firstTimeout());
        return nextAnrCheck;
    }
    connection->responsive = false;
    // Stop waking up for this unresponsive connection
    mAnrTracker.eraseToken(connection->inputChannel->getConnectionToken());
    onAnrLocked(connection);
    return LLONG_MIN;
}

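The part elided from the code above handles a different ANR, one caused by window switching. It is out of scope for this article, so it has been removed.
nextAnrCheck takes the minimum of the candidate times. Ignoring the focused-window logic, assume the timeout held in mAnrTracker is the smallest.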
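To make the "earliest deadline first" behaviour concrete, here is a minimal sketch of a tracker with the same method names that appear in the excerpt (`firstTimeout`, `firstToken`, `eraseToken`). Everything else is my assumption: `AnrTrackerSketch`, `AnrCheck`, `checkAnrs`, the `int` token type, and the choice of an ordered set as storage are illustrative, and the real class may be organized differently.

```cpp
#include <cstdint>
#include <limits>
#include <set>
#include <utility>

using nsecs_t = int64_t;
using Token = int;          // stand-in for the connection token (assumption)
const Token kNoToken = -1;  // hypothetical "no connection" marker

// Keeps (deadline, token) pairs ordered so the earliest deadline is first.
class AnrTrackerSketch {
public:
    void insert(nsecs_t timeoutTime, Token token) {
        entries_.insert(std::make_pair(timeoutTime, token));
    }

    // Drop every deadline that belongs to `token`.
    void eraseToken(Token token) {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second == token) {
                it = entries_.erase(it);
            } else {
                ++it;
            }
        }
    }

    nsecs_t firstTimeout() const {
        return entries_.empty() ? std::numeric_limits<nsecs_t>::max()
                                : entries_.begin()->first;
    }

    Token firstToken() const {
        return entries_.empty() ? kNoToken : entries_.begin()->second;
    }

private:
    std::set<std::pair<nsecs_t, Token>> entries_;
};

struct AnrCheck {
    nsecs_t nextCheck;  // when to look again, or the minimum value after an ANR
    bool anr;           // true when a deadline has passed
    Token token;        // the unresponsive connection, if any
};

// Same shape as the check in the excerpt: return early in the common case.
AnrCheck checkAnrs(AnrTrackerSketch& tracker, nsecs_t currentTime) {
    const nsecs_t nextAnrCheck = tracker.firstTimeout();
    if (currentTime < nextAnrCheck) {
        return {nextAnrCheck, false, kNoToken};
    }
    const Token token = tracker.firstToken();
    tracker.eraseToken(token);  // stop waking up for this connection
    return {std::numeric_limits<nsecs_t>::min(), true, token};
}
```

With deadlines 5000 (token 1), 3000 (token 2) and 7000 (token 2), a check at time 2999 just reports 3000 as the next wake-up, while a check at time 3000 flags token 2 and removes both of its entries, leaving 5000 as the next deadline.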

void InputDispatcher::startDispatchCycleLocked(nsecs_t currentTime,
                                               const std::shared_ptr<Connection>& connection) {
    // ... (elided)
    while (connection->status == Connection::Status::NORMAL && !connection->outboundQueue.empty()) {
        // ... (elided)
        const std::chrono::nanoseconds timeout = getDispatchingTimeoutLocked(connection);
        dispatchEntry->timeoutTime = currentTime + timeout.count();
        // ... (remainder of the loop body elided)
    }
    // ... (elided)
}

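What gets inserted into mAnrTracker is the dispatchEntry's timeout time together with the connection (identified by its token). In this case the connection is the app's socket object.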

std::chrono::nanoseconds InputDispatcher::getDispatchingTimeoutLocked(
        const std::shared_ptr<Connection>& connection) {
    if (connection->monitor) {
        return mMonitorDispatchingTimeout;
    }
    const sp<WindowInfoHandle> window =
            getWindowHandleLocked(connection->inputChannel->getConnectionToken());
    if (window != nullptr) {
        return window->getDispatchingTimeout(DEFAULT_INPUT_DISPATCHING_TIMEOUT);
    }
    return DEFAULT_INPUT_DISPATCHING_TIMEOUT;
}

DEFAULT_INPUT_DISPATCHING_TIMEOUT defaults to 5 s, but the effective value depends on what getDispatchingTimeout returns.

WindowProcessController.java
    public long getInputDispatchingTimeoutMillis() {
        synchronized (mAtm.mGlobalLock) {
            return isInstrumenting() || isUsingWrapper()
                    ? INSTRUMENTATION_KEY_DISPATCHING_TIMEOUT_MILLIS :
                    DEFAULT_DISPATCHING_TIMEOUT_MILLIS;
        }
    }

As you can see, the timeout is not a fixed constant: it is computed by a function and can change at runtime.

Events

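KeyEvent and MotionEvent make up our input events. The InputDispatcher thread in system_server calls dispatchOnce() -> mLooper->pollOnce and waits there for events to arrive. Events are delivered from the inputflinger side (InputReader).
startDispatchCycleLocked calls connection->inputPublisher for the app, sending the event to the app that owns the focused window, where it is consumed.
dispatchOnce() -> dispatchOnceInnerLocked dispatches events; once dispatching is done, processAnrsLocked checks whether any event has already timed out.
Consider an extreme case: the first key event happens and a second event never does. Will an ANR still occur? Yes. After the first key event is pushed to the app, a next wake-up time is set. epoll_wait returns at that time and the loop runs; even if there is no new event, it still checks whether anything has timed out.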

nextWakeupTime = std::min(nextWakeupTime, nextAnrCheck);
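The "one event, then silence" case can be replayed with a simulated clock. The sketch below is a single-connection toy model, not AOSP code: `DispatchLoopSketch`, `onEventPublished`, `onEventFinished`, `kNever` and `kWakeNow` are names I made up, and the returned wake-up time simply stands in for the poll timeout.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

using nsecs_t = int64_t;
const nsecs_t kNever = std::numeric_limits<nsecs_t>::max();    // sleep indefinitely
const nsecs_t kWakeNow = std::numeric_limits<nsecs_t>::min();  // run again at once

// Hypothetical single-connection model of the dispatcher loop.
struct DispatchLoopSketch {
    nsecs_t pendingDeadline = kNever;  // deadline of the event the app still owes us
    bool anrRaised = false;

    // An event was published to the app at `now` with the given timeout.
    void onEventPublished(nsecs_t now, nsecs_t timeout) { pendingDeadline = now + timeout; }

    // The app reported that it finished handling the event.
    void onEventFinished() { pendingDeadline = kNever; }

    // Counterpart of the ANR check: next check time, or kWakeNow after an ANR.
    nsecs_t processAnrs(nsecs_t now) {
        if (now < pendingDeadline) return pendingDeadline;  // most likely scenario
        anrRaised = true;
        pendingDeadline = kNever;  // stop waking up for this connection
        return kWakeNow;
    }

    // One loop iteration; the return value is when the poll should time out.
    nsecs_t dispatchOnce(nsecs_t now) {
        nsecs_t nextWakeupTime = kNever;  // this model has no further events queued
        const nsecs_t nextAnrCheck = processAnrs(now);
        nextWakeupTime = std::min(nextWakeupTime, nextAnrCheck);
        return nextWakeupTime;
    }
};
```

Publishing one event at time 0 with a timeout of 5000 makes `dispatchOnce(0)` return 5000. Calling `dispatchOnce(5000)` afterwards, with no second event and no finish signal, sets `anrRaised`. If `onEventFinished()` is called first, the loop goes back to sleeping indefinitely.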

Conclusion

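To locate and fix this kind of ANR, keep two rules in mind. First, do not put time-consuming functions in the call chain that dispatches key events. Second, the main thread should be relatively idle while key events are being delivered.
Problem categories:

The message being handled is itself slow: network, IO, binder, and other calls whose completion time is unpredictable.
A large number of abnormal messages are queued ahead of the key event. No single message is slow, but there are so many of them that the event is eventually handled too late.
A message ahead of the key event was slow, but by the time the AMS mechanism fires the main thread is idle again, and the stack shows it parked in nativePollOnce.
These categories account for most ANRs. Looking back, ANR problems are not that hard to solve after all.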
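The three categories come down to the same arithmetic: the input event is late when the time spent on messages ahead of it plus its own handling time reaches the timeout. A rough sketch, assuming a strictly serial main thread and the 5 s default mentioned earlier (`queueDelayMs` and `exceedsTimeout` are hypothetical helper names):

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Simplified model: the main thread runs queued messages one after another,
// so an input event waits for the sum of the run times queued ahead of it.
int64_t queueDelayMs(const std::vector<int64_t>& aheadMs) {
    return std::accumulate(aheadMs.begin(), aheadMs.end(), int64_t{0});
}

// True when waiting time plus the event's own handling time reaches the timeout.
bool exceedsTimeout(const std::vector<int64_t>& aheadMs, int64_t handlerMs,
                    int64_t timeoutMs) {
    return queueDelayMs(aheadMs) + handlerMs >= timeoutMs;
}
```

For example, 600 queued messages of 10 ms each add up to 6000 ms, which is already past a 5000 ms timeout even though no single message looks slow. A single 5500 ms message ahead of the event has the same effect, and once it returns the main thread may well be idle again by the time the stack is captured.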
