从源码角度看traces.txt是如何生成的

简介

原文请看我的博客:从源码角度看traces.txt是如何生成的

traces.txt 位于安卓系统下/data/anr目录下,当系统中有应用出现ANR时,framework会在弹出Dialog的同时dump出当前各线程的堆栈情况,方便开发者分析出ANR的root cause。

ANR是Application Not Responsing 的简称,简而言之,就是安卓系统内置提示用户应用界面没有反应的机制,是用来避免应用界面一直卡顿,增加系统用户友好度的一种方式。

造成ANR的原因很多,都是因为在主线程执行任务太久阻塞了界面更新导致的,主要有以下几类:

  1. Broadcast Timeout: 前台广播执行超过10s, 后台广播执行超过60s (要注意的是,只有串行的广播才有超时机制,并行的广播并不会超时,也就是说,如果广播是动态注册的,直接调用sendBroadcast触发,如果主线程Looper中排在后面的Message不会触发超时机制,那么即时这个广播是前台广播,系统也永远不会弹出框提示用户超时了)
  2. Service Timeout: 前台服务之星超过20s, 后台服务之星超过200s
  3. Provider Timeout: 内容提供者,publish超过10s
  4. Input Timeout: 按键触摸事件分发超过5s

能够造成ANR的前提是任务是在主线程上执行的,执行什么样的任务主要有以下几点:

  1. 执行耗时任务过久,如文件读取或存储,网络访问获取文件太耗时
  2. 线程被阻塞过久,或者干脆出现了死锁
  3. 线程饥饿,如Binder线程总共16个,Binder主线程占有一个,剩余的15个工作线程都被占满
  4. CPU饥饿,负载值过大,虽然代码正常额但任务一直没有来得及执行

那么回到traces.txt文件,它到底包含了什么信息可以帮助开发者找到ANR问题的根源呢。

在这篇文章,我先从一份traces.txt的日志实例开始解析,然后再通过追踪源码来解释traces.txt是如何生成的。

我是基于Android 5.0原生代码进行分析的,最新的一些代码逻辑放到了AppErrors类中了,有些小差别,但是大体的逻辑和步骤没有变化,不会影响阅读。

日志实例

// 触发ANR的应用pid及触发时间
----- pid 30307 at 2017-09-03 14:51:14 -----
Cmd line: com.example.androidyue.bitmapdemo

JNI: CheckJNI is off; workarounds are off; pins=0; globals=272

DALVIK THREADS:
(mutexes: tll=0 tsl=0 tscl=0 ghl=0)

// JAVA 态线程状态

// 线程名称; 优先级; 线程id; 线程状态
"main" prio=5 tid=1 TIMED_WAIT
  // 线程组; 线程suspend计数; 线程debug suspend计数; 同辈线程对象; 当前线程对象
  | group="main" sCount=1 dsCount=0 obj=0x416eaf18 self=0x416d8650
  // 线程id; 进程优先级; 调度者优先级; 调度者组名; 线程缓存句柄
  | sysTid=30307 nice=0 sched=0/0 cgrp=apps handle=1074565528
  // native线程状态; 调度者状态; 执行用户代码的总时间; 执行系统代码的总时间; 任务cpu使用数量
  | state=S schedstat=( 0 0 0 ) utm=5 stm=4 core=3
  // 线程调用栈
  at java.lang.VMThread.sleep(Native Method)
  at java.lang.Thread.sleep(Thread.java:1044)
  at java.lang.Thread.sleep(Thread.java:1026)
  at com.example.androidyue.bitmapdemo.MainActivity$1.run(MainActivity.java:27)
  at android.app.Activity.runOnUiThread(Activity.java:4794)
  at com.example.androidyue.bitmapdemo.MainActivity.onResume(MainActivity.java:33)
  at android.app.Instrumentation.callActivityOnResume(Instrumentation.java:1282)
  at android.app.Activity.performResume(Activity.java:5405)

// Native态线程状态

"Binder_5" prio=5 tid=32 NATIVE
  | group="main" sCount=1 dsCount=0 obj=0x41dd6570 self=0x678fa458
  | sysTid=29473 nice=0 sched=0/0 cgrp=apps handle=1732666432
  | state=S schedstat=( 578000 10600000 3 ) utm=0 stm=0 core=0
  #00  pc 00016fe4  /system/lib/libc.so (__ioctl+8)
  #01  pc 0002a97d  /system/lib/libc.so (ioctl+16)
  #02  pc 00016ba1  /system/lib/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+132)
  #03  pc 00017363  /system/lib/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+154)
  #04  pc 0001b15d  /system/lib/libbinder.so
  #05  pc 00011267  /system/lib/libutils.so (android::Thread::_threadLoop(void*)+114)
  #06  pc 0004679f  /system/lib/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+66)
  #07  pc 00010dcd  /system/lib/libutils.so
  #08  pc 0000e3d8  /system/lib/libc.so (__thread_entry+72)
  #09  pc 0000dac4  /system/lib/libc.so (pthread_create+160)
  at dalvik.system.NativeStart.run(Native Method)复制代码

关键词对照

关键词解释
prio优先级
tid线程id
group组名
sCount挂起次数
dsCount调试下 挂起计数
obj同辈线程对象
self当前线程对象
sysTid线程id
nice进程优先级
sched调度者优先级
cgrp调度者组名
handle线程缓存句柄
statenative线程状态
schedstat调度者状态
utm用户态CPU时间
stm内核态CPU时间
core运行所在核

线程状态对照

JAVA状态CPP状态解释
TERMINATEDZOMBIE线程死亡
RUNNABLERUNNING/RUNNABLE可运行或正在运行
TIMED_WAITINGTIMED_WAIT执行了带有超时参数的wait, sleep或join函数
BLOCKEDMONITOR线程阻塞,等待获取对象锁
WAITINGWAIT执行了无超时参数的wait
NEWINITIALIZING新建,正在初始化中
NEWSTARTING新建,正在启动
RUNNABLENATIVE正在执行JNI本地函数
WAITINGVMWAIT正在等待VM资源
RUNNABLESUSPENDED线程暂停,通常由于GC或者DEBUG被挂起

源码分析 (虚拟机)

我将从代码的最底层开始分析,再慢慢到触发点的顶端,先从cpp开始

Thread::DumpState

thread.cc

void Thread::DumpState(std::ostream& os, const Thread* thread, pid_t tid) {
  // 组名
  std::string group_name;
  // 优先级
  int priority;
  // 是否是daemon线程
  bool is_daemon = false;
  // 拿到当前线程对象指针
  Thread* self = Thread::Current();

  if (thread != nullptr) {
    ...
  }


  if (gAborting == 0 && self != nullptr && thread != nullptr && thread->tlsPtr_.opeer != nullptr) {
    ScopedObjectAccessUnchecked soa(self);
    priority = soa.DecodeField(WellKnownClasses::java_lang_Thread_priority)
        ->GetInt(thread->tlsPtr_.opeer);
    is_daemon = soa.DecodeField(WellKnownClasses::java_lang_Thread_daemon)
        ->GetBoolean(thread->tlsPtr_.opeer);

    ...

    if (thread_group != nullptr) {
      ...
      group_name = (group_name_string != nullptr) ? group_name_string->ToModifiedUtf8() : "<null>";
    }
  } else {
    priority = GetNativePriority();
  }

  std::string scheduler_group_name(GetSchedulerGroupName(tid));
  if (scheduler_group_name.empty()) {
    scheduler_group_name = "default";
  }

  // 这里开始输出线程状态
  if (thread != nullptr) {
    // 线程名输出
    os << '"' << *thread->tlsPtr_.name << '"';
    if (is_daemon) {
      os << " daemon";
    }
    // 优先级; ID; 状态
    os << " prio=" << priority
       << " tid=" << thread->GetThreadId()
       << " " << thread->GetState();
    if (thread->IsStillStarting()) {
      os << " (still starting up)";
    }
    os << "\n";
  } else {
    os << '"' << ::art::GetThreadName(tid) << '"'
       << " prio=" << priority
       << " (not attached)\n";
  }

  // 线程组; 线程suspend计数; 线程debug suspend计数; 同辈线程对象; 当前线程对象
  if (thread != nullptr) {
    MutexLock mu(self, *Locks::thread_suspend_count_lock_);
    os << "  | group=\"" << group_name << "\""
       << " sCount=" << thread->tls32_.suspend_count
       << " dsCount=" << thread->tls32_.debug_suspend_count
       << " obj=" << reinterpret_cast<void*>(thread->tlsPtr_.opeer)
       << " self=" << reinterpret_cast<const void*>(thread) << "\n";
  }

  // 线程id; 进程优先级; 调度者组名
  os << "  | sysTid=" << tid
     << " nice=" << getpriority(PRIO_PROCESS, tid)
     << " cgrp=" << scheduler_group_name;

  // 调度者优先级; 缓存句柄
  if (thread != nullptr) {
    int policy;
    sched_param sp;
    CHECK_PTHREAD_CALL(pthread_getschedparam, (thread->tlsPtr_.pthread_self, &policy, &sp),
                       __FUNCTION__);
    os << " sched=" << policy << "/" << sp.sched_priority
       << " handle=" << reinterpret_cast<void*>(thread->tlsPtr_.pthread_self);
  }
  os << "\n";

  // Grab the scheduler stats for this thread.
  std::string scheduler_stats;
  if (ReadFileToString(StringPrintf("/proc/self/task/%d/schedstat", tid), &scheduler_stats)) {
    scheduler_stats.resize(scheduler_stats.size() - 1);  // Lose the trailing '\n'.
  } else {
    scheduler_stats = "0 0 0";
  }

  char native_thread_state = '?';
  int utime = 0;
  int stime = 0;
  int task_cpu = 0;
  GetTaskStats(tid, &native_thread_state, &utime, &stime, &task_cpu);

  // native线程状态; 调度者状态; 执行用户代码的总时间; 执行系统代码的总时间; 任务cpu使用数量
  os << "  | state=" << native_thread_state
     << " schedstat=( " << scheduler_stats << " )"
     << " utm=" << utime
     << " stm=" << stime
     << " core=" << task_cpu
     << " HZ=" << sysconf(_SC_CLK_TCK) << "\n";
  if (thread != nullptr) {
    // dump 线程当前栈
    os << "  | stack=" << reinterpret_cast<void*>(thread->tlsPtr_.stack_begin) << "-"
        << reinterpret_cast<void*>(thread->tlsPtr_.stack_end) << " stackSize="
        << PrettySize(thread->tlsPtr_.stack_size) << "\n";
    // dump 线程锁信息
    os << "  | held mutexes=";
    for (size_t i = 0; i < kLockLevelCount; ++i) {
      if (i != kMonitorLock) {
        BaseMutex* mutex = thread->GetHeldMutex(static_cast<LockLevel>(i));
        if (mutex != nullptr) {
          os << " \"" << mutex->GetName() << "\"";
          if (mutex->IsReaderWriterMutex()) {
            ReaderWriterMutex* rw_mutex = down_cast<ReaderWriterMutex*>(mutex);
            if (rw_mutex->GetExclusiveOwnerTid() == static_cast<uint64_t>(tid)) {
              os << "(exclusive held)";
            } else {
              os << "(shared held)";
            }
          }
        }
      }
    }
    os << "\n";
  }
}复制代码

GetTaskStats

utils.cc

void GetTaskStats(pid_t tid, char* state, int* utime, int* stime, int* task_cpu) {
  *utime = *stime = *task_cpu = 0;
  std::string stats;
  // 从stat节点获取数据
  if (!ReadFileToString(StringPrintf("/proc/self/task/%d/stat", tid), &stats)) {
    return;
  }
  // Skip the command, which may contain spaces.
  stats = stats.substr(stats.find(')') + 2);
  // Extract the three fields we care about.
  std::vector<std::string> fields;
  Split(stats, ' ', &fields);
  *state = fields[0][0];
  *utime = strtoull(fields[11].c_str(), nullptr, 10);
  *stime = strtoull(fields[12].c_str(), nullptr, 10);
  *task_cpu = strtoull(fields[36].c_str(), nullptr, 10);
}复制代码

DumpUnattachedThread

thread_list.cc

// 读取该线程中的所有状态信息:JAVA态,Native态,Kernel态
static void DumpUnattachedThread(std::ostream& os, pid_t tid) NO_THREAD_SAFETY_ANALYSIS {
  // [接上]
  Thread::DumpState(os, nullptr, tid);
  DumpKernelStack(os, tid, "  kernel: ", false);

  if (false) {
    DumpNativeStack(os, tid, "  native: ");
  }
  os << "\n";
}

void ThreadList::DumpUnattachedThreads(std::ostream& os) {
  DIR* d = opendir("/proc/self/task");
  if (!d) {
    return;
  }

  Thread* self = Thread::Current();
  dirent* e;
  // 获取当前进程中所有线程,dump这些线程的信息
  while ((e = readdir(d)) != nullptr) {
    char* end;
    pid_t tid = strtol(e->d_name, &end, 10);
    if (!*end) {
      bool contains;
      {
        MutexLock mu(self, *Locks::thread_list_lock_);
        contains = Contains(tid);
      }
      if (!contains) {
        // [接上]
        DumpUnattachedThread(os, tid);
      }
    }
  }
  closedir(d);
}

bool ThreadList::Contains(Thread* thread) {
  return find(list_.begin(), list_.end(), thread) != list_.end();
}复制代码

DumpForSigQuit

thread_list.cc

void ThreadList::DumpForSigQuit(std::ostream& os) {
  ...
  DumpUnattachedThreads(os);
}复制代码

runtime.cc

void Runtime::DumpForSigQuit(std::ostream& os) {
  GetClassLinker()->DumpForSigQuit(os);
  GetInternTable()->DumpForSigQuit(os);
  GetJavaVM()->DumpForSigQuit(os);
  GetHeap()->DumpForSigQuit(os);
  TrackedAllocators::Dump(os);
  os << "\n";

  // [接上]
  // dump 线程列表
  thread_list_->DumpForSigQuit(os);
  BaseMutex::DumpAll(os);
}复制代码

HandleSigQuit

singal_catcher.cc

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  // 输出进程id以及当前dump时间
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);

  // 输出系统信息
  std::string fingerprint = runtime->GetFingerprint();
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  // [接上]
  runtime->DumpForSigQuit(os);

  ...
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}复制代码

SignalCatcher::Run

void* SignalCatcher::Run(void* arg) {
...
  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    // 等待信号
    int signal_number = signal_catcher->WaitForSignal(self, signals);
...

    switch (signal_number) {
    case SIGQUIT:
      // [接上]
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

int SignalCatcher::WaitForSignal(Thread* self, SignalSet& signals) {
...
  int signal_number = signals.Wait();
...
  return signal_number;
}复制代码

源码分析 (framework)

可见,android framework通过发送信号3来通知虚拟机来进行dump到traces.txt文件中的

dumpStackTraces

ActivityManagerService.java

private static void dumpStackTraces(String tracesPath, ArrayList<Integer> firstPids,
                                    ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
    // 当trace写完后,监听并触发onEvent方法
    FileObserver observer = new FileObserver(tracesPath, FileObserver.CLOSE_WRITE) {
        @Override
        public synchronized void onEvent(int event, String path) {
            notify();
        }
    };

    try {
        // 开始监听
        observer.startWatching();

        // 首先收集firstPids链表里的进程信息
        if (firstPids != null) {
            try {
                int num = firstPids.size();
                for (int i = 0; i < num; i++) {
                    synchronized (observer) {
                        // 发送信号3来触发
                        Process.sendSignal(firstPids.get(i), Process.SIGNAL_QUIT);
                        observer.wait(200);  // Wait for write-close, give up after 200msec
                    }
                }
            } catch (InterruptedException e) {
                Slog.wtf(TAG, e);
            }
        }

        // 收集native进程信息
        ...

        // 最后收集CPU信息
        ...
    } finally {
        observer.stopWatching();
    }
}复制代码
public static File dumpStackTraces(boolean clearTraces, ArrayList<Integer> firstPids,
                                   ProcessCpuTracker processCpuTracker, SparseArray<Boolean> lastPids, String[] nativeProcs) {
    // 获取trace文件路径
    String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
...

    File tracesFile = new File(tracesPath);
    try {
...
        FileUtils.setPermissions(tracesDir.getPath(), 0775, -1, -1);  // drwxrwxr-x

        if (clearTraces && tracesFile.exists()) tracesFile.delete();
        tracesFile.createNewFile();
        FileUtils.setPermissions(tracesFile.getPath(), 0666, -1, -1); // -rw-rw-rw-
    } catch (IOException e) {
        Slog.w(TAG, "Unable to prepare ANR traces file: " + tracesPath, e);
        return null;
    }

    // [接上]
    dumpStackTraces(tracesPath, firstPids, processCpuTracker, lastPids, nativeProcs);
    return tracesFile;
}复制代码

appNotResponding

ActivityManagerService.java

final void appNotResponding(ProcessRecord app, ActivityRecord activity,
                            ActivityRecord parent, boolean aboveSystem, final String annotation) {
    ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
    SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
...
    final ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);

    // [接上]
    File tracesFile = dumpStackTraces(true, firstPids, processCpuTracker, lastPids,
            NATIVE_STACKS_OF_INTEREST);
...
    Slog.e(TAG, info.toString());
    if (tracesFile == null) {
        // There is no trace file, so dump (only) the alleged culprit's threads to the log
        Process.sendSignal(app.pid, Process.SIGNAL_QUIT);
    }

    addErrorToDropBox("anr", app, app.processName, activity, parent, annotation,
            cpuInfo, tracesFile, null);
...
}复制代码

触发ANR的方式上面我已经列举过了,这里我以触发Broadcast Timeout时进行分析

AppNotResponding.run

BroadcastQueue.java

private final class AppNotResponding implements Runnable {
    private final ProcessRecord mApp;
    private final String mAnnotation;

    public AppNotResponding(ProcessRecord app, String annotation) {
        mApp = app;
        mAnnotation = annotation;
    }

    @Override
    public void run() {
        // [接上]
        mService.appNotResponding(mApp, null, null, false, mAnnotation);
    }
}复制代码

broadcastTimeoutLocked

BroadcastQueue.java

final void broadcastTimeoutLocked(boolean fromMsg) {
...
    String anrMessage = null;
...
    if (app != null) {
        anrMessage = "Broadcast of " + r.intent.toString();
    }
...
    if (anrMessage != null) {
        // [接上]
        mHandler.post(new AppNotResponding(app, anrMessage));
    }
}复制代码

BroadcastHandler.handleMessage

private final class BroadcastHandler extends Handler {
    public BroadcastHandler(Looper looper) {
        super(looper, null, true);
    }

    @Override
    public void handleMessage(Message msg) {
        switch (msg.what) {
...
            case BROADCAST_TIMEOUT_MSG: {
                synchronized (mService) {
                    // [接上]
                    broadcastTimeoutLocked(true);
                }
            } break;
...
        }
    }
};复制代码

setBroadcastTimeoutLocked

BroadcastQueue.java

final void setBroadcastTimeoutLocked(long timeoutTime) {
    if (! mPendingBroadcastTimeoutMessage) {
        Message msg = mHandler.obtainMessage(BROADCAST_TIMEOUT_MSG, this);
        mHandler.sendMessageAtTime(msg, timeoutTime);
        mPendingBroadcastTimeoutMessage = true;
    }
}

final void processNextBroadcast(boolean fromMsg) {
    if (! mPendingBroadcastTimeoutMessage) {
        long timeoutTime = r.receiverTime + mTimeoutPeriod;
        // [接上]
        setBroadcastTimeoutLocked(timeoutTime);
    }
}复制代码

总结

下面总结一下触发广播ANR后输出traces.txt的过程:

  1. 有序广播在注册后发送
  2. AMS设置广播超时时间,通过Handler进行延迟设定
  3. 广播超时触发超时机制
  4. Framework层发送信号给虚拟机
  5. 虚拟机dump出特定进程所有线程信息并存储到traces.txt文件中
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值