An Analysis of How ANR Works

Latest recommended article published on 2024-07-27 20:57:44

程序员Android

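I. Overview

An ANR is triggered when input events are handled too slowly. But how does ANR work internally, and which scenarios produce one? As the saying goes, "to do a good job, one must first sharpen one's tools": the previous articles walked through the whole input framework to lay the groundwork for this one. Before analyzing how and when an ANR is triggered, let's first review the input flow.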

1.1 InputReader

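InputReader's main work is as follows:

1. Call EventHub's getEvents() to read input_event structs from the device nodes under /dev/input and convert them into RawEvent structs. Each RawEvent is then converted by the matching InputMapper into the corresponding EventEntry: a key event becomes a KeyEntry, a touch event becomes a MotionEntry.

Conversion result: input_event -> EventEntry

2. Append the event to the tail of the mInboundQueue queue. Before it is enqueued, the event passes through two filters; the full sequence is:

IMS.interceptKeyBeforeQueueing: business logic can be added here before the event is dispatched
IMS.filterInputEvent: can intercept events; an event for which it returns false is intercepted right away, never joins mInboundQueue, and is not dispatched any further; otherwise processing moves on to the next step
enqueueInboundEventLocked: puts the event at the tail of mInboundQueue
mLooper->wake: wakes the InputDispatcher thread when needed

3. KeyboardInputMapper.processKey() records the time at which the key-down event occurred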

1.2 InputDispatcher

1. dispatchOnceInnerLocked(): takes an EventEntry from InputDispatcher's mInboundQueue. The time at which this method starts executing (currentTime) becomes the delivery time (deliveryTime) of the resulting DispatchEntry
2. dispatchKeyLocked(): under certain conditions, adds the command doInterceptKeyBeforeDispatchingLockedInterruptible
3. enqueueDispatchEntryLocked(): creates a DispatchEntry and adds it to the connection's outbound queue
4. startDispatchCycleLocked(): takes the DispatchEntry from outboundQueue and moves it to the connection's waitQueue
5. runCommandsLockedInterruptible(): loops over mCommandQueue and processes every command in turn. Commands are added to mCommandQueue through postCommandLocked(). The ANR callback command is executed at this point
6. handleTargetsNotReadyLocked(): checks whether the wait has exceeded 5s to decide whether to call onANRLocked()
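
The movement of an event between the two per-connection queues in steps 3 and 4 can be pictured with a small model. This is only a sketch under simplifying assumptions: MiniConnection, enqueue, startDispatchCycle and finish are hypothetical names, events are reduced to sequence numbers, and no socket is involved.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

// Hypothetical miniature of one connection; not the AOSP Connection class.
struct MiniConnection {
    std::deque<uint32_t> outboundQueue;  // created, not yet sent to the app
    std::deque<uint32_t> waitQueue;      // sent, waiting for the app to finish

    // Models enqueueDispatchEntryLocked(): queue an event for sending.
    void enqueue(uint32_t seq) { outboundQueue.push_back(seq); }

    // Models startDispatchCycleLocked(): every pending event is "sent"
    // and parked in waitQueue until the app acknowledges it.
    void startDispatchCycle() {
        while (!outboundQueue.empty()) {
            waitQueue.push_back(outboundQueue.front());
            outboundQueue.pop_front();
        }
    }

    // Models the app's finished signal for one event.
    // Returns false if that sequence number is not waiting.
    bool finish(uint32_t seq) {
        auto it = std::find(waitQueue.begin(), waitQueue.end(), seq);
        if (it == waitQueue.end()) return false;
        waitQueue.erase(it);
        return true;
    }
};
```

An event that the app never finishes stays in waitQueue, which is exactly the state that the ANR reasons later in the article report.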

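In step 15 of the flow, sendMessage delivers the input event to the app side. Once the app has finished handling the event, it sends back a finishInputEvent() message, and execution then returns to pollOnce().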

1.3 UI Thread

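The InputDispatcher thread listens on the server end of the socket; when a message arrives, InputDispatcher.handleReceiveCallback() is called back
The UI main thread listens on the client end of the socket; when a message arrives, NativeInputEventReceiver.handleEvent() is called back

An ANR is triggered mainly inside InputDispatcher, so the following sections revisit the flow from the ANR point of view.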

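II. ANR handling flow

The ANR time window is the interval, within the current dispatch, from the execution of findFocusedWindowTargetsLocked() to the next execution of resetANRTimeoutsLocked(). The reset is performed from the following 5 places, all in InputDispatcher.cpp: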

resetAndDropEverythingLocked
releasePendingEventLocked
setFocusedApplication
dispatchOnceInnerLocked
setInputDispatchMode

Put simply, resetANRTimeoutsLocked gets a chance to run in the following 4 scenarios:

Thawing the screen, and the moments of system boot and shutdown (thawInputDispatchingLw, setEventDispatchingLw)
A change of the app focused by WMS (WMS.setFocusedApp, WMS.removeAppToken)
Setting the input filter (IMS.setInputFilter)
Dispatching an event again (dispatchOnceInnerLocked)

When findFocusedWindowTargetsLocked() on the InputDispatcher thread calls into handleTargetsNotReadyLocked and the 5s timeout has elapsed, onANRLocked() is called.
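
The time window described above can be modelled as a deadline that is set when the wait starts and cleared on reset. The following is a minimal sketch, not the AOSP implementation: TargetWaitState, WaitResult, onTargetNotReady and kDefaultTimeoutNs are hypothetical names, and the 5s value is taken from the text.

```cpp
#include <cstdint>

using nsecs_t = int64_t;
constexpr nsecs_t kDefaultTimeoutNs = 5000000000LL;  // 5s, as stated in the article

enum class WaitResult { Pending, TimedOut };

// Hypothetical, simplified bookkeeping for one wait window.
struct TargetWaitState {
    bool waiting = false;       // has a wait window been opened?
    nsecs_t waitStartTime = 0;  // when the window was opened
    nsecs_t timeoutTime = 0;    // deadline after which an ANR is reported

    // Called each time dispatch finds that the target is not ready.
    WaitResult onTargetNotReady(nsecs_t now) {
        if (!waiting) {
            waiting = true;
            waitStartTime = now;
            timeoutTime = now + kDefaultTimeoutNs;
        }
        return now >= timeoutTime ? WaitResult::TimedOut : WaitResult::Pending;
    }

    // Models resetANRTimeoutsLocked(): closes the window.
    void reset() { waiting = false; }
};
```

Because the deadline is only set on the first call of a window, repeated calls while the target is still not ready do not push the deadline back; only a reset does.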

2.1 onANRLocked

void InputDispatcher::onANRLocked(nsecs_t currentTime,
    const sp<InputApplicationHandle>& applicationHandle,
    const sp<InputWindowHandle>& windowHandle,
    nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) {
    
    float dispatchLatency = (currentTime - eventTime) * 0.000001f;
    float waitDuration = (currentTime - waitStartTime) * 0.000001f;

    ALOGI("Application is not responding: %s. "
            "It has been %0.1fms since event, %0.1fms since wait started. Reason: %s",
            getApplicationWindowLabelLocked(applicationHandle, windowHandle).string(),
            dispatchLatency, waitDuration, reason);

    // Capture a snapshot of the state at the time of the ANR
    time_t t = time(NULL);
    struct tm tm;
    localtime_r(&t, &tm);
    char timestr[64];
    strftime(timestr, sizeof(timestr), "%F %T", &tm);
    mLastANRState.clear();
    mLastANRState.append(INDENT "ANR:\n");
    mLastANRState.appendFormat(INDENT2 "Time: %s\n", timestr);
    mLastANRState.appendFormat(INDENT2 "Window: %s\n",
    getApplicationWindowLabelLocked(applicationHandle, windowHandle).string());
    mLastANRState.appendFormat(INDENT2 "DispatchLatency: %0.1fms\n", dispatchLatency);
    mLastANRState.appendFormat(INDENT2 "WaitDuration: %0.1fms\n", waitDuration);
    mLastANRState.appendFormat(INDENT2 "Reason: %s\n", reason);
    dumpDispatchStateLocked(mLastANRState);

    // Add the ANR command to mCommandQueue
    CommandEntry* commandEntry = postCommandLocked(
            &InputDispatcher::doNotifyANRLockedInterruptible);
    commandEntry->inputApplicationHandle = applicationHandle;
    commandEntry->inputWindowHandle = windowHandle;
    commandEntry->reason = reason;
}

When an ANR occurs, onANRLocked() adds doNotifyANRLockedInterruptible to mCommandQueue. InputDispatcher.dispatchOnce then runs runCommandsLockedInterruptible(), which takes every command out of mCommandQueue and executes them one by one. The command that corresponds to the ANR is doNotifyANRLockedInterruptible, which we look at next.
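
The "post now, run later" pattern behind mCommandQueue can be illustrated with a few lines of standard C++. This is a sketch only: MiniDispatcher, postCommand, runCommands and onAnr are hypothetical names, and the notified vector merely stands in for the policy callback.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the deferred-command pattern; not the AOSP code.
class MiniDispatcher {
public:
    // Models postCommandLocked(): only records the work to be done.
    void postCommand(std::function<void()> command) {
        mCommandQueue.push_back(std::move(command));
    }

    // Models runCommandsLockedInterruptible(): drains the queue in FIFO order.
    // Returns true if at least one command ran.
    bool runCommands() {
        if (mCommandQueue.empty()) return false;
        while (!mCommandQueue.empty()) {
            std::function<void()> command = std::move(mCommandQueue.front());
            mCommandQueue.pop_front();
            command();
        }
        return true;
    }

    // Models onANRLocked(): the notification is deferred, not performed here.
    void onAnr(const std::string& reason) {
        postCommand([this, reason] { notified.push_back(reason); });
    }

    std::vector<std::string> notified;  // stands in for the policy callback

private:
    std::deque<std::function<void()>> mCommandQueue;
};
```

Deferring the callback keeps the code that detects the timeout short, and lets the slow part run at a point where the dispatcher is prepared to drop its lock.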

3.2 doNotifyANRLockedInterruptible

InputDispatcher.cpp

void InputDispatcher::doNotifyANRLockedInterruptible(
        CommandEntry* commandEntry) {
    mLock.unlock();
    
    nsecs_t newTimeout = mPolicy->notifyANR(
        commandEntry->inputApplicationHandle, commandEntry->inputWindowHandle,
        commandEntry->reason);

    mLock.lock();
    //newTimeout =5s
    resumeAfterTargetsNotReadyTimeoutLocked(newTimeout,
            commandEntry->inputWindowHandle != NULL
            ? commandEntry->inputWindowHandle->getInputChannel() : NULL);
}

mPolicy refers to NativeInputManager.

3.3 NativeInputManager.notifyANR

com_android_server_input_InputManagerService.cpp

nsecs_t NativeInputManager::notifyANR(
    const sp<InputApplicationHandle>& inputApplicationHandle,
    const sp<InputWindowHandle>& inputWindowHandle, const String8& reason) {
    ......
    JNIEnv* env = jniEnv();
    ScopedLocalFrame localFrame(env);

    jobject tokenObj = javaObjectForIBinder(env, token);
    jstring reasonObj = env->NewStringUTF(reason.c_str());

    // Call the Java method
    jlong newTimeout = env->CallLongMethod(mServiceObj,
                gServiceClassInfo.notifyANR, tokenObj,
                reasonObj);
    if (checkAndClearExceptionFromCallback(env, "notifyANR")) {
        newTimeout = 0; // an exception was thrown: clear it and reset the timeout
    } else {
        assert(newTimeout >= 0);
    }
    return newTimeout;
}

First, look at register_android_server_InputManager:

int register_android_server_InputManager(JNIEnv* env) {
    int res = jniRegisterNativeMethods(env,
    "com/android/server/input/InputManagerService",
    gInputManagerMethods, NELEM(gInputManagerMethods));

    jclass clazz;
    FIND_CLASS(clazz, "com/android/server/input/InputManagerService");
    ......
    GET_METHOD_ID(gServiceClassInfo.notifyANR, clazz,
            "notifyANR",
            "(Landroid/os/IBinder;Ljava/lang/String;)J");
    ......
}

This shows that gServiceClassInfo.notifyANR refers to IMS.notifyANR.

3.4 IMS.notifyANR

private long notifyANR(IBinder token, String reason) {
    return mWindowManagerCallbacks.notifyANR(
            token, reason);
}

Here mWindowManagerCallbacks refers to the InputManagerCallback object.

3.5 InputManagerCallback.notifyANR

InputManagerCallback.java

public long notifyANR(IBinder token, String reason) {
    AppWindowToken appWindowToken = null;
    WindowState windowState = null;
    boolean aboveSystem = false;
    synchronized (mService.mGlobalLock) {
        if (token != null) {
                windowState = mService.windowForClientLocked(null, token, false);
                if (windowState != null) {
                    appWindowToken = windowState.mAppToken;
                }
        }
        // Log the input dispatching timeout
        if (windowState != null) {
                Slog.i(TAG_WM, "Input event dispatching timed out "
                        + "sending to " + windowState.mAttrs.getTitle()
                        + ".  Reason: " + reason);
                // Figure out whether this window is layered above system windows.
                // We need to do this here to help the activity manager know how to
                // layer its ANR dialog.
                int systemAlertLayer = 
                mService.mPolicy.getWindowLayerFromTypeLw(
                TYPE_APPLICATION_OVERLAY,
                windowState.mOwnerCanAddInternalSystemWindow);
                aboveSystem = windowState.mBaseLayer > systemAlertLayer;
            } else if (appWindowToken != null) {
                Slog.i(TAG_WM, "Input event dispatching timed out "
                        + "sending to application " + appWindowToken.stringName
                        + ".  Reason: " + reason);
            } else {
                Slog.i(TAG_WM, "Input event dispatching timed out "
                        + ".  Reason: " + reason);
            }
        mService.saveANRStateLocked(appWindowToken, windowState, reason);
    }

    // All the calls below need to happen without the WM
    // lock held since they call into AM.
    mService.mAtmInternal.saveANRState(reason);
        
    if (appWindowToken != null && appWindowToken.appToken != null) {
        final boolean abort = appWindowToken.keyDispatchingTimedOut(reason,
                (windowState != null) ? windowState.mSession.mPid : -1);
        if (! abort) {
            return appWindowToken.inputDispatchingTimeoutNanos; //5s
        }
    } else if (windowState != null) {
        long timeout = mService.mAmInternal.inputDispatchingTimedOut(
                windowState.mSession.mPid, aboveSystem, reason);
        if (timeout >= 0) {
            return timeout * 1000000L; //5s
        }
    }
    return 0;
}

AppWindowToken.java
boolean keyDispatchingTimedOut(String reason, int windowPid) {
        return mActivityRecord != null &&
        mActivityRecord.keyDispatchingTimedOut(reason, windowPid);
    }

When an input-related ANR occurs, the ANR information is written to the system log with the tag WindowManager. There are 3 kinds of log line:

Input event dispatching timed out sending to [windowState.mAttrs.getTitle()]
Input event dispatching timed out sending to application [appWindowToken.stringName]
Input event dispatching timed out
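
To make the three variants easy to compare, the branch structure of notifyANR can be restated as a small string builder. This is an illustrative sketch: anrLogLine is a hypothetical helper, and an empty string stands in for a missing window or app token.

```cpp
#include <string>

// Hypothetical helper that mirrors the three logging branches of notifyANR.
// An empty windowTitle means "no window"; an empty appTokenName means "no app token".
std::string anrLogLine(const std::string& windowTitle,
                       const std::string& appTokenName,
                       const std::string& reason) {
    std::string line = "Input event dispatching timed out ";
    if (!windowTitle.empty()) {
        line += "sending to " + windowTitle;                 // window case
    } else if (!appTokenName.empty()) {
        line += "sending to application " + appTokenName;    // application case
    }
    // In the remaining case nothing is appended before the reason.
    return line + ".  Reason: " + reason;
}
```

The window case is checked first, so a log line names the application only when no window could be resolved from the token.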

3.6 DispatchingTimedOut

3.6.1 ActivityRecord.keyDispatchingTimedOut

final class ActivityRecord extends ConfigurationContainer {
    ......

    public boolean keyDispatchingTimedOut(String reason, int windowPid) {
        ActivityRecord anrActivity;
        WindowProcessController anrApp;
        boolean windowFromSameProcessAsActivity;
        synchronized (mAtmService.mGlobalLock) {
            anrActivity = getWaitingHistoryRecordLocked();
            anrApp = app;
            windowFromSameProcessAsActivity =
                    !hasProcess() || app.getPid() == windowPid || windowPid == -1;
        }

        if (windowFromSameProcessAsActivity) {
            return mAtmService.mAmInternal.inputDispatchingTimedOut(
            anrApp.mOwner, anrActivity.shortComponentName,
            anrActivity.appInfo, shortComponentName, app, false, reason);
        } else {
            // In this case another process added windows using
            // this activity token. So, we call the
            // generic service input dispatch timed out
            // method so that the right process is blamed.
            return mAtmService.mAmInternal.inputDispatchingTimedOut(
                    windowPid, false /* aboveSystem */, reason) < 0;
        }
    }
}

3.6.2 AMS.inputDispatchingTimedOut

long inputDispatchingTimedOut(int pid, final boolean aboveSystem,
    String reason) {
        if (checkCallingPermission(FILTER_EVENTS) !=
            PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }
        ProcessRecord proc;
        long timeout;
        synchronized (this) {
            synchronized (mPidsSelfLocked) {
                proc = mPidsSelfLocked.get(pid); // look up the process record by pid
            }
            // The timeout is KEY_DISPATCHING_TIMEOUT_MS, i.e. timeout = 5s
            timeout = proc != null ?
            proc.getInputDispatchingTimeout() : KEY_DISPATCHING_TIMEOUT_MS;
        }

        if (inputDispatchingTimedOut(proc, null, null, null,
            null, aboveSystem, reason)) {
            return -1;
        }
        return timeout;
}


boolean inputDispatchingTimedOut(ProcessRecord proc,
    String activityShortComponentName, ApplicationInfo aInfo,
    String parentShortComponentName, WindowProcessController parentProcess,
    boolean aboveSystem, String reason) {
        if (checkCallingPermission(FILTER_EVENTS) !=
            PackageManager.PERMISSION_GRANTED) {
            throw new SecurityException("Requires permission " + FILTER_EVENTS);
        }

        final String annotation;
        if (reason == null) {
            annotation = "Input dispatching timed out";
        } else {
            annotation = "Input dispatching timed out (" + reason + ")";
        }

        if (proc != null) {
            synchronized (this) {
                if (proc.isDebugging()) {
                    return false;
                }

                if (proc.getActiveInstrumentation() != null) {
                    Bundle info = new Bundle();
                    info.putString("shortMsg", "keyDispatchingTimedOut");
                    info.putString("longMsg", annotation);
                    finishInstrumentationLocked(
                    proc, Activity.RESULT_CANCELED, info);
                    return true;
                }
            }
            proc.appNotResponding(activityShortComponentName, aInfo,
                    parentShortComponentName, parentProcess,
                    aboveSystem, annotation);
        }
        return true;
}

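appNotResponding dumps important diagnostic information from the scene, such as the traces of the key processes. Back in [Section 3.2], once the ANR has been handled, resumeAfterTargetsNotReadyTimeoutLocked is called.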

3.7 resumeAfterTargetsNotReadyTimeoutLocked

InputDispatcher.cpp

void InputDispatcher::resumeAfterTargetsNotReadyTimeoutLocked(
    nsecs_t newTimeout, const sp<InputChannel>& inputChannel) {
    if (newTimeout > 0) {
        // Extend the timeout by another 5s
        mInputTargetWaitTimeoutTime = now() + newTimeout;
    } else {
        // Give up.
        mInputTargetWaitTimeoutExpired = true;

        // Input state will not be realistic.  Mark it out of sync.
        if (inputChannel.get()) {
            ssize_t connectionIndex =
            getConnectionIndexLocked(inputChannel);
            if (connectionIndex >= 0) {
                sp<Connection> connection =
                mConnectionsByFd.valueAt(connectionIndex);
                sp<IBinder> token = connection->inputChannel->getToken();

                if (token != nullptr) {
                    removeWindowByTokenLocked(token);
                }

                if (connection->status == Connection::STATUS_NORMAL) {
                    CancelationOptions options(
                    CancelationOptions::CANCEL_ALL_EVENTS,
                    "application not responding");
                    synthesizeCancelationEventsForConnectionLocked(connection, options);
                }
            }
        }
    }
}
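
The two outcomes of this method, together with the millisecond-to-nanosecond conversion done on the Java side (timeout * 1000000L), can be sketched as follows. The names WaitDeadline, resume and millisToNanos are hypothetical, and the window removal and event cancellation of the give-up branch are deliberately left out.

```cpp
#include <cstdint>

using nsecs_t = int64_t;

// Converts a millisecond timeout to nanoseconds, as "timeout * 1000000L" does.
constexpr nsecs_t millisToNanos(int64_t ms) { return ms * 1000000LL; }

// Hypothetical reduction of the wait state touched by the method above.
struct WaitDeadline {
    nsecs_t timeoutTime = 0;  // next point in time at which an ANR is reported
    bool expired = false;     // set once the dispatcher gives up on the target

    // newTimeout > 0: keep waiting until now + newTimeout.
    // newTimeout <= 0: give up on the current wait.
    void resume(nsecs_t now, nsecs_t newTimeout) {
        if (newTimeout > 0) {
            timeoutTime = now + newTimeout;
        } else {
            expired = true;
        }
    }
};
```

A return value of 0 from the policy therefore means "stop waiting", while a positive value means "report again if nothing has changed after that long".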

IV. Input deadlock detection mechanism

4.1 IMS.start

InputManagerService.java

public void start() {
    ......
    Watchdog.getInstance().addMonitor(this);
    ......
}

InputManagerService implements the Watchdog.Monitor interface and, during startup, adds itself to the monitor queue of the Watchdog thread.

4.2 IMS.monitor

Watchdog then periodically calls IMS.monitor():

    @Override
    public void monitor() {
        synchronized (mInputFilterLock) { }
        nativeMonitor(mPtr);
    }

nativeMonitor goes through JNI and enters the following method:

static void nativeMonitor(JNIEnv* /* env */, jclass /* clazz */, jlong ptr) {
    NativeInputManager* im = reinterpret_cast<NativeInputManager*>(ptr);

    im->getInputManager()->getReader()->monitor();
    im->getInputManager()->getDispatcher()->monitor();
}

4.3 InputReader.monitor

InputReader.cpp

void InputReader::monitor() {
    // Acquire and release mLock once to make sure the reader is not deadlocked
    mLock.lock();
    mEventHub->wake();
    mReaderIsAliveCondition.wait(mLock);
    mLock.unlock();

    // Check the EventHub
    mEventHub->monitor();
}

After acquiring mLock, monitor() enters the wait() method of the Condition and waits to be woken by the broadcast() call in the InputReader thread's loopOnce().

void InputReader::loopOnce() {
    size_t count = mEventHub->getEvents(timeoutMillis, mEventBuffer, EVENT_BUFFER_SIZE);
    ......
    {
        AutoMutex _l(mLock);
        mReaderIsAliveCondition.broadcast();
        if (count) {
            processEventsLocked(mEventBuffer, count);
        }
    }
    ......
    mQueuedListener->flush();
}

4.3.1 EventHub.monitor

EventHub.cpp

void EventHub::monitor() {
    // Acquire and release mLock once to make sure the EventHub is not deadlocked
    mLock.lock();
    mLock.unlock();
}

4.4 InputDispatcher

InputDispatcher.cpp

void InputDispatcher::monitor() {
    std::unique_lock _l(mLock);
    mLooper->wake();
    mDispatcherIsAlive.wait(_l);
}

After acquiring mLock, monitor() waits on the condition variable until it is woken by the notify_all() call in the InputDispatcher thread's dispatchOnce().

void InputDispatcher::dispatchOnce() {
    nsecs_t nextWakeupTime = LONG_LONG_MAX;
    {
        std::scoped_lock _l(mLock);
        mDispatcherIsAlive.notify_all();
        if (!haveCommandsLocked()) {
            dispatchOnceInnerLocked(&nextWakeupTime);
        }
        if (runCommandsLockedInterruptible()) {
            nextWakeupTime = LONG_LONG_MIN;
        }
    }

    nsecs_t currentTime = now();
    int timeoutMillis = toMillisecondTimeoutDelay(currentTime, nextWakeupTime);
    mLooper->pollOnce(timeoutMillis); // enters epoll_wait
}

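4.5 Summary

By adding InputManagerService to Watchdog's monitor queue, the system periodically checks whether a deadlock has occurred. The check covers EventHub, InputReader, InputDispatcher and InputManagerService. The principle is simple: try to acquire each lock and then release it.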
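
The acquire-then-release idea can be tried out in isolation with a timed mutex. Note that this is a variation chosen for illustration, not what the listings above do: the monitor() methods simply block, and it is the Watchdog that notices when the call does not return in time. lockIsResponsive and its budget parameter are hypothetical.

```cpp
#include <chrono>
#include <mutex>

// Hypothetical probe: returns true if the lock could be taken within the budget.
// No work is done while the lock is held; taking it at all is the health check.
bool lockIsResponsive(std::timed_mutex& lock, std::chrono::milliseconds budget) {
    if (!lock.try_lock_for(budget)) {
        return false;  // another thread has held the lock for too long
    }
    lock.unlock();     // release immediately so the probe does not block anyone
    return true;
}
```

A caller would pass the mutex that guards the component under test, for example lockIsResponsive(someMutex, std::chrono::milliseconds(50)), and treat a false result as a sign that the owning thread is stuck.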

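Finally, adb shell dumpsys input shows the device's current input state. The output comes from three sources: EventHub.dump(), InputReader.dump() and InputDispatcher.dump(). If an input ANR has occurred, the state of the last ANR is printed as well.

In that output, mPendingEvent is the event currently being processed.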

V. Summary

5.1 ANR categories

The categorization is done by InputManagerCallback.notifyANR in [Section 3.5]. When an ANR occurs, the following appears in the system log with TAG = WindowManager:

"Input event dispatching timed out xxx. Reason: " + reason, where xxx is one of:

Window type: sending to windowState.mAttrs.getTitle()
Application type: sending to application appWindowToken.stringName
Other: empty

The Reason falls mainly into the following types:

5.1.1 Reason types

These are produced by checkWindowReadyForMoreInputLocked. The main ANR reasons are:

No window, but an application: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up
Window paused: Waiting because the [targetType] window is paused
Window not connected: Waiting because the [targetType] window's input channel is not registered with the input dispatcher. The window may be in the process of being removed
Window connection dead: Waiting because the [targetType] window's input connection is [Connection.Status]. The window may be in the process of being removed
Window connection full: Waiting because the [targetType] window's input channel is full. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length]
Key event, with a non-empty outbound queue or wait queue: Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: [outboundQueue length]. Wait queue length: [waitQueue length]
Non-key event, with a non-empty wait queue whose head event was dispatched more than 500ms ago: Waiting to send non-key event because the [targetType] window has not finished processing certain input events that were delivered to it over 500ms ago. Wait queue length: [waitQueue length]. Wait queue head age: [time waited]

where

targetType: either "focused" or "touched"
Connection.Status: one of "NORMAL", "BROKEN" or "ZOMBIE"
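
The difference between the key and non-key rules in the list above is easy to miss, so here is a reduced sketch of just those checks. It assumes a simplified view of a connection: ConnectionSnapshot, whyNotReady and kStreamAheadTimeoutNs are hypothetical names, the returned labels are shorthand rather than the real reason strings, and the channel-registration, status and "channel full" checks are omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

using nsecs_t = int64_t;
constexpr nsecs_t kStreamAheadTimeoutNs = 500 * 1000000LL;  // 500ms, from the reason text

// Hypothetical snapshot of one connection; not the AOSP Connection class.
struct ConnectionSnapshot {
    bool paused = false;
    std::size_t outboundQueueLength = 0;
    std::size_t waitQueueLength = 0;
    nsecs_t waitQueueHeadDeliveryTime = 0;  // meaningful only if waitQueueLength > 0
};

// Returns an empty string when the window can take more input,
// otherwise a short label saying why dispatch has to wait.
std::string whyNotReady(const ConnectionSnapshot& c, bool isKeyEvent, nsecs_t now) {
    if (c.paused) return "paused";
    if (isKeyEvent) {
        // Keys wait until everything delivered earlier has been finished.
        if (c.outboundQueueLength > 0 || c.waitQueueLength > 0) {
            return "key: queues not empty";
        }
    } else {
        // Other events wait only if the oldest unfinished event is too old.
        if (c.waitQueueLength > 0 &&
            now >= c.waitQueueHeadDeliveryTime + kStreamAheadTimeoutNs) {
            return "non-key: wait queue head too old";
        }
    }
    return "";
}
```

Under these rules a single unfinished event blocks the next key immediately, while touch and other motion events keep flowing for up to 500ms behind it.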

In addition, ANR problems can be analyzed by implementing updateDispatchStatisticsLocked() in the two methods findFocusedWindowTargetsLocked and findTouchedWindowTargetsLocked.

5.2 Categories of dropped events

These are produced by dropInboundEventLocked, which logs why an event was dropped:

DROP_REASON_POLICY: "inbound event was dropped because the policy consumed it"
DROP_REASON_DISABLED: "inbound event was dropped because input dispatch is disabled"
DROP_REASON_APP_SWITCH: "inbound event was dropped because of pending overdue app switch"
DROP_REASON_BLOCKED: "inbound event was dropped because the current application is not responding and the user has started interacting with a different application"
DROP_REASON_STALE: "inbound event was dropped because it is stale"
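
When filtering logs, it can help to have the mapping above in one place. The sketch below restates it as a lookup function; DropReason and dropReasonMessage are hypothetical names introduced for illustration, while the message texts are the ones listed above.

```cpp
#include <string>

// Hypothetical enum mirroring the DROP_REASON_* names listed above.
enum class DropReason { Policy, Disabled, AppSwitch, Blocked, Stale };

// Returns the log message that belongs to a drop reason.
std::string dropReasonMessage(DropReason reason) {
    switch (reason) {
        case DropReason::Policy:
            return "inbound event was dropped because the policy consumed it";
        case DropReason::Disabled:
            return "inbound event was dropped because input dispatch is disabled";
        case DropReason::AppSwitch:
            return "inbound event was dropped because of pending overdue app switch";
        case DropReason::Blocked:
            return "inbound event was dropped because the current application is not "
                   "responding and the user has started interacting with a different "
                   "application";
        case DropReason::Stale:
            return "inbound event was dropped because it is stale";
    }
    return "unknown drop reason";  // unreachable for valid enum values
}
```

Of the five, only the Blocked message mentions an unresponsive application, so it is the one worth searching for when an ANR is suspected.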

Other: