Android Input System (8): How ANR Works

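This article explains how ANR (Application Not Responding) is triggered in Android, focusing on ANRs related to input event handling. It reviews the input pipeline from the perspective of InputReader, InputDispatcher, and the UI thread, shows how a dispatch delay is detected against the 5-second timeout and how the ANR is then handled, and also covers the input deadlock detection mechanism and how an ANR is monitored and reported.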

1 Overview

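An ANR is triggered when input events are handled too slowly. But how does ANR work internally, and which scenarios produce one? As the saying goes, to do a good job one must first sharpen one's tools: the previous articles walked through the whole input framework precisely to prepare for this one. Before analyzing how and when an ANR is triggered, let's first review the input flow.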

1.1 InputReader

[Figure: InputReader processing flow]

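InputReader's work falls into the following parts:

1. Call EventHub's getEvents() to read input events from the /dev/input/eventX device nodes and convert each raw input_event struct into a RawEvent. The matching InputMapper then converts the RawEvent into the corresponding EventEntry: a key event becomes a KeyEntry, a touch event becomes a MotionEntry.

  • Conversion result: input_event -> EventEntry

2. Append the event to the tail of InputDispatcher's mInboundQueue. Before it is enqueued, the event passes through the following two filters; it is then queued and the dispatcher is woken:

  • IMS.interceptKeyBeforeQueueing: a hook for adding policy logic before the event is dispatched
  • IMS.filterInputEvent: can intercept events; an event for which it returns false is dropped immediately, never enters mInboundQueue, and is not dispatched any further; otherwise processing continues with the next step
  • enqueueInboundEventLocked: appends the input event to the tail of mInboundQueue
  • mLooper->wake: wakes the InputDispatcher thread when needed

3. KeyboardInputMapper.processKey() records the timestamp of the key down event.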
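
The enqueue path in step 2 can be sketched as below. This is a minimal model, not the AOSP code: Event, InboundQueueModel, and wakeCount() are hypothetical names, and the rule "wake only when the queue was empty" is an assumption standing in for the real wake-up conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for an EventEntry; only the field this sketch needs.
struct Event {
    std::string name;
};

// Simplified model of the path from the filter into mInboundQueue.
class InboundQueueModel {
public:
    // `filter` plays the role of IMS.filterInputEvent: returning false drops the event.
    explicit InboundQueueModel(std::function<bool(const Event&)> filter)
        : mFilter(std::move(filter)) {
        assert(mFilter != nullptr);  // the sketch requires a callable filter
    }

    // Returns true when the event was queued, false when the filter dropped it.
    bool notify(const Event& event) {
        if (!mFilter(event)) {
            return false;  // intercepted: never reaches mInboundQueue
        }
        // Assumption: the dispatcher only needs a wake-up when the queue was empty.
        const bool needWake = mInboundQueue.empty();
        mInboundQueue.push_back(event);
        if (needWake) {
            ++mWakeCount;  // stands in for mLooper->wake()
        }
        return true;
    }

    std::size_t size() const { return mInboundQueue.size(); }
    int wakeCount() const { return mWakeCount; }

private:
    std::function<bool(const Event&)> mFilter;
    std::deque<Event> mInboundQueue;
    int mWakeCount = 0;
};
```

With a filter that rejects one particular event, notify() returns false for that event and the queue length stays unchanged, which mirrors the "never enters mInboundQueue" behaviour described above.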

1.2 InputDispatcher

[Figure: InputDispatcher processing flow]

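1. dispatchOnceInnerLocked(): takes an EventEntry from InputDispatcher's mInboundQueue. The time at which this method starts executing (currentTime) becomes the delivery time (deliveryTime) of the resulting DispatchEntry.

2. dispatchKeyLocked(): when certain conditions are met, posts the command doInterceptKeyBeforeDispatchingLockedInterruptible.

3. enqueueDispatchEntryLocked(): creates a DispatchEntry and adds it to the connection's outboundQueue.

4. startDispatchCycleLocked(): takes the DispatchEntry from outboundQueue and moves it to the connection's waitQueue.

5. runCommandsLockedInterruptible(): loops over mCommandQueue and executes every command in it. Commands are added to mCommandQueue through postCommandLocked(). The ANR callback command is executed at this point.

6. handleTargetsNotReadyLocked(): checks whether the wait has exceeded 5s to decide whether to call onANRLocked().

In step 15 of the flow diagram, sendMessage delivers the input event to the app side; once the app has finished handling the event, it sends back a finishInputEvent() notification. Execution then returns to pollOnce().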
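
The life cycle of a DispatchEntry across the two per-connection queues can be sketched as follows. This is a simplified model under stated assumptions: ConnectionModel and its methods are hypothetical names, the real socket write is omitted, and treating sequence number 0 as invalid is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for a DispatchEntry, identified by its sequence number.
struct DispatchEntry {
    std::uint32_t seq;
};

class ConnectionModel {
public:
    // Models enqueueDispatchEntryLocked(): a new entry starts in the outbound queue.
    void enqueue(std::uint32_t seq) {
        assert(seq != 0);  // assumption: 0 is not a valid sequence number here
        mOutboundQueue.push_back(DispatchEntry{seq});
    }

    // Models startDispatchCycleLocked(): each entry is "sent" and then parked
    // in the wait queue until the app reports that it has finished.
    void startDispatchCycle() {
        while (!mOutboundQueue.empty()) {
            mWaitQueue.push_back(mOutboundQueue.front());
            mOutboundQueue.pop_front();
        }
    }

    // Models the app's finished signal: the matching entry leaves the wait queue.
    // Returns false when no entry with that sequence number is waiting.
    bool finish(std::uint32_t seq) {
        auto it = std::find_if(mWaitQueue.begin(), mWaitQueue.end(),
                               [seq](const DispatchEntry& e) { return e.seq == seq; });
        if (it == mWaitQueue.end()) {
            return false;
        }
        mWaitQueue.erase(it);
        return true;
    }

    std::size_t outboundSize() const { return mOutboundQueue.size(); }
    std::size_t waitSize() const { return mWaitQueue.size(); }

private:
    std::deque<DispatchEntry> mOutboundQueue;
    std::deque<DispatchEntry> mWaitQueue;
};
```

An entry that the app never finishes simply stays in the wait queue, and it is the length and age of that queue that the ANR checks later inspect.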

1.3 UI Thread

[Figure: UI thread processing flow]

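  • The InputDispatcher thread listens on the server end of the socket; when a message arrives, InputDispatcher.handleReceiveCallback() is called
  • The UI main thread listens on the client end of the socket; when a message arrives, NativeInputEventReceiver.handleEvent() is called

ANR is triggered mainly inside InputDispatcher, so the next section walks through the trigger path from the ANR point of view.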

2 ANR Handling Flow

The ANR time window is the interval between the call to findFocusedWindowTargetsLocked() during the current dispatch and the next call to resetANRTimeoutsLocked(). The following 5 functions trigger that reset, all of them in InputDispatcher.cpp:

  • dispatchOnceInnerLocked
  • setInputDispatchMode
  • setFocusedApplication
  • releasePendingEventLocked
  • resetAndDropEverythingLocked

In short, resetANRTimeoutsLocked gets a chance to run in the following 5 scenarios:

  • Unfreezing the screen, and system boot/shutdown (thawInputDispatchingLw, setEventDispatchingLw, which end up calling setInputDispatchMode)
  • A change of the app focused by WMS (WMS.setFocusedApp, IMS.setFocusedApplication, setFocusedApplication)
  • Setting an input filter (IMS.setInputFilter, which calls resetAndDropEverythingLocked)
  • Dispatching the next event (dispatchOnceInnerLocked)
  • The end of a dispatch (done is true at the end of dispatchOnceInnerLocked, which then calls releasePendingEventLocked)

When the InputDispatcher thread, while executing findFocusedWindowTargetsLocked(), reaches handleTargetsNotReadyLocked and the 5s timeout has elapsed, it calls onANRLocked().
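
The timing rule described above can be modelled in a few lines. This is a sketch under stated assumptions, not the real handleTargetsNotReadyLocked(): AnrTimerModel, Result, and kDefaultInputDispatchingTimeoutNs are hypothetical names, and the 5s value is taken from the text.

```cpp
#include <cassert>
#include <cstdint>

using nsecs_t = std::int64_t;

// 5s expressed in nanoseconds (value taken from the article).
constexpr nsecs_t kDefaultInputDispatchingTimeoutNs = 5000LL * 1000000LL;

class AnrTimerModel {
public:
    enum class Result { Pending, TimedOut };

    // Called every time the dispatcher finds that the target is not ready.
    // The first call opens the ANR window; later calls only compare against it.
    Result onTargetsNotReady(nsecs_t currentTime) {
        if (!mWaiting) {
            mWaiting = true;
            mWaitStartTime = currentTime;
            mWaitTimeoutTime = currentTime + kDefaultInputDispatchingTimeoutNs;
        }
        assert(currentTime >= mWaitStartTime);  // time is expected to be monotonic
        return currentTime >= mWaitTimeoutTime ? Result::TimedOut : Result::Pending;
    }

    // Models resetANRTimeoutsLocked(): closes the current ANR window.
    void resetTimeouts() { mWaiting = false; }

private:
    bool mWaiting = false;
    nsecs_t mWaitStartTime = 0;
    nsecs_t mWaitTimeoutTime = 0;
};
```

In this model, a window opened at time 0 stays Pending until exactly 5s have passed; calling resetTimeouts() in between starts a fresh window on the next call, which is why any of the five reset paths listed above prevents the ANR.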

2.1 onANRLocked

void InputDispatcher::onANRLocked(nsecs_t currentTime,
    const sp<InputApplicationHandle>& applicationHandle,
    const sp<InputWindowHandle>& windowHandle,
    nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) {
    
    float dispatchLatency = (currentTime - eventTime) * 0.000001f;
    float waitDuration = (currentTime - waitStartTime) * 0.000001f;

    ALOGI("Application is not responding: %s. "
            "It has been %0.1fms since event, %0.1fms since wait started. Reason: %s",
            getApplicationWindowLabelLocked(applicationHandle, windowHandle).string(),
            dispatchLatency, waitDuration, reason);

    // Capture a record of the dispatcher state at the time of the ANR
    time_t t = time(NULL);
    struct tm tm;
    localtime_r(&t, &tm);
    char timestr[64];
    strftime(timestr, sizeof(timestr), "%F %T", &tm);
    mLastANRState.clear();
    mLastANRState.append(INDENT "ANR:\n");
    mLastANRState.appendFormat(INDENT2 "Time: %s\n", timestr);
    mLastANRState.appendFormat(INDENT2 "Window: %s\n",
            getApplicationWindowLabelLocked(applicationHandle, windowHandle).string());
    mLastANRState.appendFormat(INDENT2 "DispatchLatency: %0.1fms\n", dispatchLatency);
    mLastANRState.appendFormat(INDENT2 "WaitDuration: %0.1fms\n", waitDuration);
    mLastANRState.appendFormat(INDENT2 "Reason: %s\n", reason);
    dumpDispatchStateLocked(mLastANRState);

    // Post the ANR command to mCommandQueue
    CommandEntry* commandEntry = postCommandLocked(
            & InputDispatcher::doNotifyANRLockedInterruptible);
    commandEntry->inputApplicationHandle = applicationHandle;
    commandEntry->inputWindowHandle = windowHandle;
    commandEntry->reason = reason;
}

onANRLocked() collects the ANR information, then builds a CommandEntry whose callback is doNotifyANRLockedInterruptible and adds it to mCommandQueue.

On the next iteration of InputDispatcher.dispatchOnce, runCommandsLockedInterruptible() takes every command out of mCommandQueue and executes them one by one, which runs the ANR handler doNotifyANRLockedInterruptible.
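
The deferred-command pattern described above (post now, run on the next loop iteration) can be sketched as follows. CommandQueueModel is a hypothetical name, and std::function replaces the member-function pointer plus CommandEntry fields that the real code uses.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

class CommandQueueModel {
public:
    using Command = std::function<void()>;

    // Models postCommandLocked(): the command is only queued, not executed.
    void postCommand(Command command) {
        assert(command != nullptr);  // the sketch does not accept empty commands
        mCommandQueue.push_back(std::move(command));
    }

    bool haveCommands() const { return !mCommandQueue.empty(); }

    // Models runCommandsLockedInterruptible(): drains the queue in FIFO order.
    // Returns true when at least one command ran.
    bool runCommands() {
        if (mCommandQueue.empty()) {
            return false;
        }
        do {
            Command command = std::move(mCommandQueue.front());
            mCommandQueue.pop_front();
            command();  // a command may post further commands; they run in this pass too
        } while (!mCommandQueue.empty());
        return true;
    }

private:
    std::deque<Command> mCommandQueue;
};
```

Posting a command has no visible effect until runCommands() is called, which matches the way the ANR notification is delayed until the dispatcher loop comes around again.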

2.2 doNotifyANRLockedInterruptible

InputDispatcher.cpp

void InputDispatcher::doNotifyANRLockedInterruptible(
        CommandEntry* commandEntry) {
    mLock.unlock();
    
    nsecs_t newTimeout = mPolicy->notifyANR(
        commandEntry->inputApplicationHandle, commandEntry->inputWindowHandle,
        commandEntry->reason);

    mLock.lock();
    // newTimeout = 5s
    resumeAfterTargetsNotReadyTimeoutLocked(newTimeout,
            commandEntry->inputWindowHandle != NULL
            ? commandEntry->inputWindowHandle->getInputChannel() : NULL);
}

As already established, mPolicy here is the NativeInputManager.

2.3 NativeInputManager.notifyANR

com_android_server_input_InputManagerService.cpp

nsecs_t NativeInputManager::notifyANR(
    const sp<InputApplicationHandle>& inputApplicationHandle,
    const sp<InputWindowHandle>& inputWindowHandle, const String8& reason) {
    ......
    JNIEnv* env = jniEnv();
    ScopedLocalFrame localFrame(env);

    jobject tokenObj = javaObjectForIBinder(env, token);
    jstring reasonObj = env->NewStringUTF(reason.c_str());

    // Call into the Java method
    jlong newTimeout = env->CallLongMethod(mServiceObj,
                gServiceClassInfo.notifyANR, tokenObj,
                reasonObj);
    if (checkAndClearExceptionFromCallback(env, "notifyANR")) {
        newTimeout = 0; // an exception was thrown: clear it and reset the timeout
    } else {
        assert(newTimeout >= 0);
    }
    return newTimeout;
}

First, look at register_android_server_InputManager:

int register_android_server_InputManager(JNIEnv* env) {
    int res = jniRegisterNativeMethods(env,
    "com/android/server/input/InputManagerService",
    gInputManagerMethods, NELEM(gInputManagerMethods));

    jclass clazz;
    FIND_CLASS(clazz, "com/android/server/input/InputManagerService");
    ......
    GET_METHOD_ID(gServiceClassInfo.notifyANR, clazz,
            "notifyANR",
            "(Landroid/os/IBinder;Ljava/lang/String;)J");
    ......
}

So gServiceClassInfo.notifyANR refers to IMS.notifyANR.

2.4 IMS.notifyANR

private long notifyANR(IBinder token, String reason) {
    return mWindowManagerCallbacks.notifyANR(
            token, reason);
}

Here mWindowManagerCallbacks is the InputManagerCallback object.

2.5 InputManagerCallback.notifyANR

InputManagerCallback.java

public long notifyANR(IBinder token, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    boolean aboveSystem = false;
    synchronized (mService.mGlobalLock) {
        if (token != null) {
            windowState = mService.windowForClientLocked(null, token, false);
            if (windowState != null) {
                appWindowToken = windowState.mAppToken;
            }
        }
        // Log the input dispatch timeout
        if (windowState != null) {
            Slog.i(TAG_WM, "Input event dispatching timed out "
                    + "sending to " + windowState.mAttrs.getTitle()
                    + ".  Reason: " + reason);
            // Figure out whether this window is layered above system windows.
            // We need to do this here to help the activity manager know how to
            // layer its ANR dialog.
            int systemAlertLayer = mService.mPolicy.getWindowLayerFromTypeLw(
                    TYPE_APPLICATION_OVERLAY,
                    windowState.mOwnerCanAddInternalSystemWindow);
            aboveSystem = windowState.mBaseLayer > systemAlertLayer;
        } else if (appWindowToken != null) {
            Slog.i(TAG_WM, "Input event dispatching timed out "
                    + "sending to application " + appWindowToken.stringName
                    + ".  Reason: " + reason);
        } else {
            Slog.i(TAG_WM, "Input event dispatching timed out "
                    + ".  Reason: " + reason);
        }
        mService.saveANRStateLocked(appWindowToken, windowState, reason);
    }

    // All the calls below need to happen without the WM
    // lock held since they call into AM.
    mService.mAtmInternal.saveANRState(reason);
        
    if (appWindowToken != null && appWindowToken.appToken != null) {
        final boolean abort = appWindowToken.keyDispatchingTimedOut(reason,
                (windowState != null) ? windowState.mSession.mPid : -1);
        if (! abort) {
            return appWindowToken.inputDispatchingTimeoutNanos; //5s
        }
    } else if (windowState != null) {
        long timeout = mService.mAmInternal.inputDispatchingTimedOut(
                windowState.mSession.mPid, aboveSystem, reason);
        if (timeout >= 0) {
            return timeout * 1000000L; //5s
        }
    }
    return 0;
}

AppWindowToken.java

boolean keyDispatchingTimedOut(String reason, int windowPid) {
    return mActivityRecord != null
            && mActivityRecord.keyDispatchingTimedOut(reason, windowPid);
}

When an input-related ANR occurs, the ANR information is written to the system log with the tag WindowManager. There are 3 kinds of log lines:

  • Input event dispatching timed out sending to [windowState.mAttrs.getTitle()]
  • Input event dispatching timed out sending to application [appWindowToken.stringName]
  • Input event dispatching timed out

2.6 DispatchingTimedOut

2.6.1 ActivityRecord.keyDispatchingTimedOut

final class ActivityRecord extends ConfigurationContainer {
    ......
    public boolean keyDispatchingTimedOut(String reason, int windowPid) {
        ActivityRecord anrActivity;
        WindowProcessController anrApp;
        boolean windowFromSameProcessAsActivity;
        synchronized (mAtmService.mGlobalLock) {
            anrActivity = getWaitingHistoryRecordLocked();
            anrApp = app;
            windowFromSameProcessAsActivity =
                    !hasProcess() || app.getPid() == windowPid || windowPid == -1;
        }

        if (windowFromSameProcessAsActivity) {
            return mAtmService.mAmInternal.inputDispatchingTimedOut(
                    anrApp.mOwner, anrActivity.shortComponentName,
                    anrActivity.appInfo, shortComponentName, app, false, reason);
        } else {
            // In this case another process added windows using
            // this activity token. So, we call the
            // generic service input dispatch timed out
            // method so that the right process is blamed.
            return mAtmService.mAmInternal.inputDispatchingTimedOut(
                    windowPid, false /* aboveSystem */, reason) < 0;
        }
    }
}

2.6.2 AMS.inputDispatchingTimedOut

long inputDispatchingTimedOut(int pid, final boolean aboveSystem,
    String reason) {
        if (checkCallingPermission(FILTER_EVENTS) !=
            PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }
        ProcessRecord proc;
        long timeout;
        synchronized (this) {
            synchronized (mPidsSelfLocked) {
                proc = mPidsSelfLocked.get(pid); // look up the process record by pid
            }
            // The timeout defaults to KEY_DISPATCHING_TIMEOUT_MS, i.e. 5s
            timeout = proc != null ?
                    proc.getInputDispatchingTimeout() : KEY_DISPATCHING_TIMEOUT_MS;
        }

        if (inputDispatchingTimedOut(proc, null, null, null,
            null, aboveSystem, reason)) {
            return -1;
        }
        return timeout;
}


boolean inputDispatchingTimedOut(ProcessRecord proc,
    String activityShortComponentName, ApplicationInfo aInfo,
    String parentShortComponentName, WindowProcessController parentProcess,
    boolean aboveSystem, String reason) {
        if (checkCallingPermission(FILTER_EVENTS) !=
            PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }

        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }

        if (proc != null) {
            synchronized (this) {
                if (proc.isDebugging()) {
                    return false;
                }

                if (proc.getActiveInstrumentation() != null) {
                    Bundle info = new Bundle();
                    info.putString("shortMsg", "keyDispatchingTimedOut");
                    info.putString("longMsg", annotation);
                    finishInstrumentationLocked(
                            proc, Activity.RESULT_CANCELED, info);
                    return true;
                }
            }
            proc.appNotResponding(activityShortComponentName, aInfo,
                    parentShortComponentName, parentProcess,
                    aboveSystem, annotation);
        }
        return true;
}

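appNotResponding dumps the traces of the important processes at the time of the ANR, among other information. Back in [Section 2.2], once the ANR has been handled, resumeAfterTargetsNotReadyTimeoutLocked is called.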

2.7 resumeAfterTargetsNotReadyTimeoutLocked

InputDispatcher.cpp

void InputDispatcher::resumeAfterTargetsNotReadyTimeoutLocked(
    nsecs_t newTimeout, const sp<InputChannel>& inputChannel) {
    if (newTimeout > 0) {
        // Extend the wait by newTimeout (another 5s)
        mInputTargetWaitTimeoutTime = now() + newTimeout;
    } else {
        // Give up.
        mInputTargetWaitTimeoutExpired = true;

        // Input state will not be realistic.  Mark it out of sync.
        if (inputChannel.get()) {
            ssize_t connectionIndex = getConnectionIndexLocked(inputChannel);
            if (connectionIndex >= 0) {
                sp<Connection> connection =
                        mConnectionsByFd.valueAt(connectionIndex);
                sp<IBinder> token = connection->inputChannel->getToken();

                if (token != nullptr) {
                    removeWindowByTokenLocked(token);
                }

                if (connection->status == Connection::STATUS_NORMAL) {
                    CancelationOptions options(
                            CancelationOptions::CANCEL_ALL_EVENTS,
                            "application not responding");
                    synthesizeCancelationEventsForConnectionLocked(connection, options);
                }
            }
        }
    }
}
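
How the value returned through notifyANR decides between "keep waiting" and "give up" can be sketched as follows. This is an illustrative model only: WaitState, toNewTimeoutNs(), and resumeAfterTimeout() are hypothetical names, and the window removal and cancelation events performed by the real function are left out.

```cpp
#include <cassert>
#include <cstdint>

using nsecs_t = std::int64_t;

// Hypothetical container for the two dispatcher fields touched by the function above.
struct WaitState {
    nsecs_t timeoutTime = 0;      // models mInputTargetWaitTimeoutTime
    bool timeoutExpired = false;  // models mInputTargetWaitTimeoutExpired
};

// Mirrors the conversion in InputManagerCallback.notifyANR: a non-negative
// timeout in milliseconds becomes nanoseconds, anything else becomes 0.
nsecs_t toNewTimeoutNs(long long timeoutMs) {
    return timeoutMs >= 0 ? timeoutMs * 1000000LL : 0;
}

// Models the branch in resumeAfterTargetsNotReadyTimeoutLocked().
void resumeAfterTimeout(WaitState& state, nsecs_t now, nsecs_t newTimeout) {
    assert(now >= 0);
    if (newTimeout > 0) {
        state.timeoutTime = now + newTimeout;  // keep waiting for newTimeout more
    } else {
        state.timeoutExpired = true;  // give up on the unresponsive target
    }
}
```

A positive timeout therefore postpones the next ANR check, while 0 (for example after the ANR has been reported and the app is to be abandoned) marks the wait as expired.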

3 Input Deadlock Detection

3.1 IMS.start

InputManagerService.java

public void start() {
    ......
    Watchdog.getInstance().addMonitor(this);
    ......
}

InputManagerService implements the Watchdog.Monitor interface and, during startup, adds itself to the Watchdog thread's monitor queue.

3.2 IMS.monitor

Watchdog then calls IMS.monitor() periodically:

    @Override
    public void monitor() {
        synchronized (mInputFilterLock) { }
        nativeMonitor(mPtr);
    }

nativeMonitor goes through JNI into the following function:

static void nativeMonitor(JNIEnv* /* env */, jclass /* clazz */, jlong ptr) {
    NativeInputManager* im = reinterpret_cast<NativeInputManager*>(ptr);

    im->getInputManager()->getReader()->monitor();
    im->getInputManager()->getDispatcher()->monitor();
}

3.3 InputReader::monitor

InputReader.cpp

void InputReader::monitor() {
    // Acquire and release mLock once to make sure the reader is not deadlocked
    mLock.lock();
    mEventHub->wake();
    mReaderIsAliveCondition.wait(mLock);
    mLock.unlock();

    // Check the EventHub
    mEventHub->monitor();
}

After acquiring mLock, monitor() enters the Condition's wait() and blocks until the broadcast() in the InputReader thread's loopOnce() wakes it.

void InputReader::loopOnce() {
    size_t count = mEventHub->getEvents(timeoutMillis, mEventBuffer, EVENT_BUFFER_SIZE);
    ......
    {
        AutoMutex _l(mLock);
        mReaderIsAliveCondition.broadcast();
        if (count) {
            processEventsLocked(mEventBuffer, count);
        }
    }
    ......
    mQueuedListener->flush();
}

3.3.1 EventHub::monitor

EventHub.cpp

void EventHub::monitor() {
    // Acquire and release mLock once to make sure the EventHub is not deadlocked
    mLock.lock();
    mLock.unlock();
}

3.4 InputDispatcher::monitor

InputDispatcher.cpp

void InputDispatcher::monitor() {
    std::unique_lock _l(mLock);
    mLooper->wake();
    mDispatcherIsAlive.wait(_l);
}

After acquiring mLock, monitor() waits on the condition variable until the notify_all() in the InputDispatcher thread's dispatchOnce() wakes it.

void InputDispatcher::dispatchOnce() {
    nsecs_t nextWakeupTime = LONG_LONG_MAX;
    {
        std::scoped_lock _l(mLock);
        mDispatcherIsAlive.notify_all();
        if (!haveCommandsLocked()) {
            dispatchOnceInnerLocked(&nextWakeupTime);
        }
        if (runCommandsLockedInterruptible()) {
            nextWakeupTime = LONG_LONG_MIN;
        }
    }

    nsecs_t currentTime = now();
    int timeoutMillis = toMillisecondTimeoutDelay(currentTime, nextWakeupTime);
    mLooper->pollOnce(timeoutMillis); // enters epoll_wait
}

3.5 Recap

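InputManagerService is added to Watchdog's monitor queue so that deadlocks are checked for periodically.

The check covers EventHub, InputReader, InputDispatcher, and InputManagerService. The principle is simple: try to acquire each lock and release it again.

Finally, adb shell dumpsys input shows the current input state of the Android system. The output consists of three parts, EventHub.dump(), InputReader.dump(), and InputDispatcher.dump(); if an input ANR has occurred, the state of the last ANR is printed as well.

In that output, mPendingEvent is the input event currently being processed.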

4 Summary

4.1 ANR Categories

These logs are produced by InputManagerCallback.notifyANR (Section 2.5). When an ANR occurs, the system log contains the following message with TAG = WindowManager:

Input event dispatching timed out xxx. Reason: + reason, where xxx is one of:

  • Window: sending to windowState.mAttrs.getTitle()
  • Application: sending to application appWindowToken.stringName
  • Other: empty

The Reason is mainly one of the following types:

4.1.1 Reason Types

Produced by checkWindowReadyForMoreInputLocked, the ANR reasons mainly fall into the following categories:

  • No window, but a focused application: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up
  • Window paused: Waiting because the [targetType] window is paused
  • Window not connected: Waiting because the [targetType] window's input channel is not registered with the input dispatcher. The window may be in the process of being removed
  • Window connection dead: Waiting because the [targetType] window's input connection is [Connection.Status]. The window may be in the process of being removed
  • Window connection full: Waiting because the [targetType] window's input channel is full. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length]
  • Key event, with a non-empty outbound queue or wait queue: Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length]
  • Non-key event, with a non-empty wait queue whose head event was delivered more than 500ms ago: Waiting to send non-key event because the [targetType] window has not finished processing certain input events that were delivered to it over 500ms ago. Wait queue length: [waitQueue length]. Wait queue head age: [wait duration]

where

  • targetType: either "focused" or "touched"
  • Connection.Status: one of "NORMAL", "BROKEN", "ZOMBIE"

In addition, updateDispatchStatistics() can be implemented in findFocusedWindowTargetsLocked and findTouchedWindowTargetsLocked to help analyze ANR problems.
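
The last two reason types differ only in how strict the readiness check is, which the following sketch tries to make concrete. It is a simplified model, not the body of checkWindowReadyForMoreInputLocked: checkWindowReady() and kStreamAheadEventTimeoutNs are names chosen for this example, the earlier checks (paused, unregistered, full channel) are omitted, and the returned strings are shortened.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using nsecs_t = std::int64_t;

// 500ms in nanoseconds: how long a non-key event tolerates a stuck wait-queue head.
constexpr nsecs_t kStreamAheadEventTimeoutNs = 500LL * 1000000LL;

// Returns an empty string when the window can take more input,
// otherwise a shortened version of the wait reason.
std::string checkWindowReady(bool isKeyEvent, std::size_t outboundLen, std::size_t waitLen,
                             nsecs_t currentTime, nsecs_t waitHeadDeliveryTime) {
    if (isKeyEvent) {
        // Key events are strict: every previously delivered event must be finished.
        if (outboundLen > 0 || waitLen > 0) {
            return "Waiting to send key event";
        }
        return "";
    }
    // Non-key events may stream ahead unless the oldest unfinished event is too old.
    if (waitLen > 0) {
        assert(currentTime >= waitHeadDeliveryTime);
        if (currentTime >= waitHeadDeliveryTime + kStreamAheadEventTimeoutNs) {
            return "Waiting to send non-key event";
        }
    }
    return "";
}
```

Under this model a touch stream keeps flowing while the app is at most 500ms behind, whereas a single unfinished event is enough to hold back a key event.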

4.2 Dropped Event Categories

dropInboundEventLocked logs the reason an event was dropped:

  • DROP_REASON_POLICY: "inbound event was dropped because the policy consumed it"
  • DROP_REASON_DISABLED: "inbound event was dropped because input dispatch is disabled"
  • DROP_REASON_APP_SWITCH: "inbound event was dropped because of pending overdue app switch"
  • DROP_REASON_BLOCKED: "inbound event was dropped because the current application is not responding and the user has started interacting with a different application"
  • DROP_REASON_STALE: "inbound event was dropped because it is stale"
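
The mapping in the list above amounts to a switch over the drop reason. The sketch below uses a hypothetical scoped enum DropReason and helper dropReasonMessage() rather than the DROP_REASON_* constants of the real source; the message strings are the ones quoted in the list.

```cpp
#include <cassert>
#include <string>

// Hypothetical scoped enum standing in for the DROP_REASON_* constants.
enum class DropReason { Policy, Disabled, AppSwitch, Blocked, Stale };

// Returns the log message associated with each drop reason.
std::string dropReasonMessage(DropReason reason) {
    switch (reason) {
        case DropReason::Policy:
            return "inbound event was dropped because the policy consumed it";
        case DropReason::Disabled:
            return "inbound event was dropped because input dispatch is disabled";
        case DropReason::AppSwitch:
            return "inbound event was dropped because of pending overdue app switch";
        case DropReason::Blocked:
            return "inbound event was dropped because the current application is not "
                   "responding and the user has started interacting with a different "
                   "application";
        case DropReason::Stale:
            return "inbound event was dropped because it is stale";
    }
    assert(false && "unhandled DropReason");
    return "";
}
```

Searching the log for these fixed strings is usually the quickest way to tell whether a missing input event was dropped by the dispatcher rather than lost by the app.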

Other:

  • doDispatchCycleFinishedLockedInterruptible logs events whose dispatch took longer than 2s
  • findFocusedWindowTargetsLocked can be used to collect wait-time statistics