SignalCatcher

Latest recommended article published on 2024-07-15 16:23:25

啃着地瓜数星星

Article link: https://blog.csdn.net/u013989732/article/details/78914528

This article is based on Android 7.1.

1. Starting the SignalCatcher thread

1.1 StartSignalCatcher

runtime.cc

void Runtime::InitNonZygoteOrPostFork(
    JNIEnv* env, bool is_system_server, NativeBridgeAction action, const char* isa) {
  ...
  StartSignalCatcher();
  ...
}

As shown above, the SignalCatcher thread is started from InitNonZygoteOrPostFork.

runtime.cc

void Runtime::StartSignalCatcher() {
  if (!is_zygote_) {
    signal_catcher_ = new SignalCatcher(stack_trace_file_);
  }
}

If the process is not the zygote, a SignalCatcher is created. It follows that the zygote process has no SignalCatcher thread, which can be confirmed with adb shell ps -t.

1.2 Creating the SignalCatcher

1.2.1 SignalCatcher(…)

signal_catcher.cc

SignalCatcher::SignalCatcher(const std::string& stack_trace_file)
    : stack_trace_file_(stack_trace_file),
      lock_("SignalCatcher lock"),
      cond_("SignalCatcher::cond_", lock_),
      thread_(nullptr) {
  SetHaltFlag(false);

  // Create a raw pthread; its start routine will attach to the runtime.
  CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread");

  Thread* self = Thread::Current();
  MutexLock mu(self, lock_);
  while (thread_ == nullptr) {
    cond_.Wait(self);
  }
}

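CHECK_PTHREAD_CALL(pthread_create, (&pthread_, nullptr, &Run, this), "signal catcher thread") effectively calls pthread_create(&pthread_, nullptr, &Run, this): it creates a new thread whose start routine is Run(this) and stores the new thread's ID in pthread_. The constructor then waits on cond_ until the new thread has set thread_.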
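The start-up handshake in the constructor can be sketched outside ART as follows. This is a minimal illustration, not the ART code: CatcherSketch, thread_ready_ and ThreadReady are hypothetical names, and std::mutex and std::condition_variable stand in for ART's Mutex and ConditionVariable.

```cpp
#include <pthread.h>

#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for SignalCatcher: the constructor spawns a raw
// pthread and blocks until the new thread has announced that it is ready.
class CatcherSketch {
 public:
  CatcherSketch() {
    // pthread_create returns 0 on success; the new thread starts in Run(this).
    started_ok_ = (pthread_create(&pthread_, nullptr, &Run, this) == 0);
    if (!started_ok_) return;
    std::unique_lock<std::mutex> lock(lock_);
    // The predicate guards against spurious wakeups, like the
    // `while (thread_ == nullptr)` loop in the original constructor.
    cond_.wait(lock, [this] { return thread_ready_; });
  }

  ~CatcherSketch() {
    // Join so that the members outlive the worker thread.
    if (started_ok_) pthread_join(pthread_, nullptr);
  }

  bool ThreadReady() {
    std::lock_guard<std::mutex> lock(lock_);
    return thread_ready_;
  }

 private:
  static void* Run(void* arg) {
    CatcherSketch* catcher = static_cast<CatcherSketch*>(arg);
    {
      std::lock_guard<std::mutex> lock(catcher->lock_);
      catcher->thread_ready_ = true;  // ART stores the attached Thread* instead.
    }
    catcher->cond_.notify_all();
    return nullptr;
  }

  pthread_t pthread_{};
  bool started_ok_ = false;
  std::mutex lock_;
  std::condition_variable cond_;
  bool thread_ready_ = false;
};
```

Once the constructor returns, the caller knows the worker thread has published its state, which is the guarantee the real constructor provides for thread_.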

1.2.2 SignalCatcher::Run

signal_catcher.cc

void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != nullptr);

  Runtime* runtime = Runtime::Current();
  // Attach the current thread to the running JavaVM.
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsAotCompiler()));

  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // Set up mask with signals we want to handle.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    // See 1.2.3.
    int signal_number = signal_catcher->WaitForSignal(self, signals);
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return nullptr;
    }

    switch (signal_number) {
    case SIGQUIT:
      signal_catcher->HandleSigQuit();
      break;
    case SIGUSR1:
      signal_catcher->HandleSigUsr1();
      break;
    default:
      LOG(ERROR) << "Unexpected signal %d" << signal_number;
      break;
    }
  }
}

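As shown above, Run adds the signals it wants to sigwait() for (SIGQUIT and SIGUSR1), calls WaitForSignal to wait for one of them to arrive, and then dispatches on the signal number.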

1.2.3 WaitForSignal

signal_catcher.cc

int SignalCatcher::WaitForSignal(Thread* self, SignalSet& signals) {
  ScopedThreadStateChange tsc(self, kWaitingInMainSignalCatcherLoop);

  // Signals for sigwait() must be blocked but not ignored.  We
  // block signals like SIGQUIT for all threads, so the condition
  // is met.  When the signal hits, we wake up, without any signal
  // handlers being invoked.
  int signal_number = signals.Wait();
  if (!ShouldHalt()) {
    // Let the user know we got the signal, just in case the system's too screwed for us to
    // actually do what they want us to do...
    LOG(INFO) << *self << ": reacting to signal " << signal_number;

    // If anyone's holding locks (which might prevent us from getting back into state Runnable), say so...
    Runtime::Current()->DumpLockHolders(LOG(INFO));
  }

  return signal_number;
}

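Note the comment above: the signals passed to sigwait() must be blocked but not ignored. Because signals such as SIGQUIT are blocked for all threads (see the next section), the condition is met. When a signal arrives, the waiting thread wakes up and no signal handler is invoked.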

1.2.4 BlockSignals

runtime.cc

bool Runtime::Init(RuntimeArgumentMap&& runtime_options_in) {
  ...
  BlockSignals();
  ...
}

void Runtime::BlockSignals() {
  SignalSet signals;
  signals.Add(SIGPIPE);
  // SIGQUIT is used to dump the runtime's state (including stack traces).
  signals.Add(SIGQUIT);
  // SIGUSR1 is used to initiate a GC.
  signals.Add(SIGUSR1);
  signals.Block();
}

The signals are blocked while the virtual machine is being created, in Runtime::Init.
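The block-then-sigwait pattern described in 1.2.3 and 1.2.4 can be reproduced with plain POSIX calls. The sketch below is an illustration under that assumption: BlockDumpSignals and WaitForOne are hypothetical helper names, while pthread_sigmask and sigwait are the standard functions that ART's SignalSet presumably wraps.

```cpp
#include <pthread.h>
#include <signal.h>

// Blocks SIGQUIT and SIGUSR1 for the calling thread. Threads created
// afterwards inherit this mask, so blocking early covers the whole process.
inline bool BlockDumpSignals(sigset_t* set) {
  sigemptyset(set);
  sigaddset(set, SIGQUIT);
  sigaddset(set, SIGUSR1);
  return pthread_sigmask(SIG_BLOCK, set, nullptr) == 0;
}

// Waits synchronously for one of the blocked signals. No signal handler
// runs; sigwait() just hands back the pending signal number.
inline int WaitForOne(const sigset_t* set) {
  int signal_number = 0;
  if (sigwait(set, &signal_number) != 0) return -1;
  return signal_number;
}
```

A dedicated thread would call WaitForOne in a loop and switch on the result, as SignalCatcher::Run does. Because the signals are blocked, a signal sent before the wait starts stays pending and is not lost.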

2. HandleSigQuit

When the process receives SIGQUIT (signal 3), signal_catcher->HandleSigQuit() is called to dump runtime information and stack traces.

2.1 SignalCatcher::HandleSigQuit

signal_catcher.cc

void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  std::ostringstream os;
  // ----- pid 2830 at 2017-11-16 11:22:53 -----
  os << "\n"
      << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";
  // Cmd line: system_server
  DumpCmdLine(os);

  std::string fingerprint = runtime->GetFingerprint();
  // Build fingerprint: 'Xiaomi/cancro_wc_lte/cancro:6.0.1/MMB29M/1.1.1:user/test-keys'
  os << "Build fingerprint: '" << (fingerprint.empty() ? "unknown" : fingerprint) << "'\n";
  // ABI: 'arm'
  os << "ABI: '" << GetInstructionSetString(runtime->GetInstructionSet()) << "'\n";
  // Build type: optimized
  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  runtime->DumpForSigQuit(os);

  if ((false)) {
    std::string maps;
    if (ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  // ----- end 2830 -----
  os << "----- end " << getpid() << " -----\n";
  Output(os.str());
}

2.2 DumpForSigQuit

runtime.cc

void Runtime::DumpForSigQuit(std::ostream& os) {
  // Zygote loaded classes=4188 post zygote classes=3570
  GetClassLinker()->DumpForSigQuit(os);
  // Intern table: 59686 strong; 10043 weak
  GetInternTable()->DumpForSigQuit(os);
  // JNI: CheckJNI is off; globals=1993 (plus 2995 weak)
  // Libraries: /system/lib/hw/gralloc.msm8974.so ...
  GetJavaVM()->DumpForSigQuit(os);

  oat_file_manager_->DumpForSigQuit(os);
  if (GetJit() != nullptr) {
    GetJit()->DumpForSigQuit(os);
  } else {
    os << "Running non JIT\n";
  }
  TrackedAllocators::Dump(os);
  os << "\n";

  thread_list_->DumpForSigQuit(os);
  BaseMutex::DumpAll(os);
}

thread_list_->DumpForSigQuit(os) is the key call: it dumps the stack traces.

2.3 thread_list_->DumpForSigQuit

2.3.1 ThreadList::DumpForSigQuit

thread_list.cc

void ThreadList::DumpForSigQuit(std::ostream& os) {
  {
    ScopedObjectAccess soa(Thread::Current());
    // Only print if we have samples.
    if (suspend_all_historam_.SampleSize() > 0) {
      Histogram<uint64_t>::CumulativeData data;
      suspend_all_historam_.CreateHistogram(&data);
      suspend_all_historam_.PrintConfidenceIntervals(os, 0.99, data);  // Dump time to suspend.
    }
  }
  bool dump_native_stack = Runtime::Current()->GetDumpNativeStackOnSigQuit();
  Dump(os, dump_native_stack);
  // Dump the stack traces of threads in this process that are not attached to the runtime.
  DumpUnattachedThreads(os, dump_native_stack);
}

2.3.2 ThreadList::Dump

thread_list.cc

void ThreadList::Dump(std::ostream& os, bool dump_native_stack) {
  {
    MutexLock mu(Thread::Current(), *Locks::thread_list_lock_);
    os << "DALVIK THREADS (" << list_.size() << "):\n";
  }
  DumpCheckpoint checkpoint(&os, dump_native_stack);
  size_t threads_running_checkpoint;
  {
    // Use SOA to prevent deadlocks if multiple threads are calling Dump() at the same time.
    ScopedObjectAccess soa(Thread::Current());
    threads_running_checkpoint = RunCheckpoint(&checkpoint);
  }
  if (threads_running_checkpoint != 0) {
    checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint);
  }
}

As shown above, Dump creates a DumpCheckpoint object named checkpoint and passes it to RunCheckpoint(&checkpoint). Let's look at what DumpCheckpoint is.

2.3.3 DumpCheckpoint

thread_list.cc

class DumpCheckpoint FINAL : public Closure {
 public:
  DumpCheckpoint(std::ostream* os, bool dump_native_stack)
      : os_(os),
        barrier_(0),
        backtrace_map_(dump_native_stack ? BacktraceMap::Create(getpid()) : nullptr),
        dump_native_stack_(dump_native_stack) {}

  void Run(Thread* thread) OVERRIDE {
    Thread* self = Thread::Current();
    std::ostringstream local_os;
    {
      ScopedObjectAccess soa(self);
      // 1. Dump the thread: state only for timed-out threads, otherwise the full traces.
      if (!timeout_threads_.empty()
          && find(timeout_threads_.begin(), timeout_threads_.end(), thread) != timeout_threads_.end()) {
        Thread::DumpState(local_os, thread, thread->GetTid(), true);
      } else {
        thread->Dump(local_os, dump_native_stack_, backtrace_map_.get());
      }
    }
    local_os << "\n";
    {
      // Use the logging lock to ensure serialization when writing to the common ostream.
      MutexLock mu(self, *Locks::logging_lock_);
      *os_ << local_os.str();
    }
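    // 2. Once a thread has finished its dump in Run, it notifies barrier_, which decrements
    // count_. When count_ reaches 0, every thread has finished dumping. The thread that
    // finished its dump is also removed from thread_list_.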
    barrier_.Pass(self, thread);
  }

 private:
  // The common stream that will accumulate all the dumps.
  std::ostream* const os_;
  // The barrier to be passed through and for the requestor to wait upon.
  Barrier barrier_;
  // A backtrace map, so that all threads use a shared info and don't reacquire/parse separately.
  std::unique_ptr<BacktraceMap> backtrace_map_;
  // Whether we should dump the native stack.
  const bool dump_native_stack_;
  std::list<Thread*> timeout_threads_;
};

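Two points stand out:

Constructing a DumpCheckpoint only initializes its member variables.
DumpCheckpoint's Run method does two things:
- It is where the dump is actually performed.
- Once a thread has finished its dump in Run, it notifies barrier_, which decrements count_. When count_ reaches 0, every thread has finished dumping. The thread that finished its dump is also removed from thread_list_.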
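The way Run keeps concurrent dumps from interleaving can be sketched as follows. This is a simplified illustration rather than the ART class: DumpClosureSketch and Remaining are hypothetical names, a plain counter stands in for barrier_, and a fixed string stands in for the output of thread->Dump().

```cpp
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical simplification of DumpCheckpoint: each call formats into a
// private buffer, appends it to the shared stream under a lock, and then
// reports completion by decrementing a counter.
class DumpClosureSketch {
 public:
  DumpClosureSketch(std::ostream* os, int expected)
      : os_(os), remaining_(expected) {}

  void Run(const std::string& thread_name) {
    // Build the text without holding any lock, so slow formatting in one
    // thread does not stall the others.
    std::ostringstream local_os;
    local_os << "\"" << thread_name << "\" dumped\n\n";
    // Hold the lock only for the append and the completion count.
    std::lock_guard<std::mutex> lock(lock_);
    *os_ << local_os.str();
    --remaining_;  // Stands in for barrier_.Pass().
  }

  int Remaining() {
    std::lock_guard<std::mutex> lock(lock_);
    return remaining_;
  }

 private:
  std::ostream* const os_;
  std::mutex lock_;
  int remaining_;
};
```

Writing into a local buffer first means each thread's dump lands in the shared stream as one contiguous block, whichever thread happens to run it.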

Next, let's look at what RunCheckpoint does.

2.3.4 ThreadList::RunCheckpoint

thread_list.cc

size_t ThreadList::RunCheckpoint(Closure* checkpoint_function, bool isDumpCheckpoint) {  // isDumpCheckpoint defaults to false
  Thread* self = Thread::Current();
  Locks::mutator_lock_->AssertNotExclusiveHeld(self);
  Locks::thread_list_lock_->AssertNotHeld(self);
  Locks::thread_suspend_count_lock_->AssertNotHeld(self);

  std::vector<Thread*> suspended_count_modified_threads;
  size_t count = 0;
  {
    // Call a checkpoint function for each thread, threads which are suspend get their checkpoint
    // manually called.
    MutexLock mu(self, *Locks::thread_list_lock_);
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    if(isDumpCheckpoint) {
      ((DumpCheckpoint *)checkpoint_function)->SetThreadList(self, list_);
    }
    // 1. Handle each thread in list_ according to its state.
    count = list_.size();
    for (const auto& thread : list_) {
      if (thread != self) {
        while (true) {
          if (thread->RequestCheckpoint(checkpoint_function)) {
            // This thread will run its checkpoint some time in the near future.
            break;
          } else {
            // For a suspended thread, first raise its suspend count, then add it to
            // suspended_count_modified_threads, which is processed further below.
            if (thread->GetState() == kRunnable) {
              // Spurious fail, try again.
              continue;
            }
            thread->ModifySuspendCount(self, +1, nullptr, false);
            suspended_count_modified_threads.push_back(thread);
            break;
          }
        }
      }
    }
  }

  // Run the checkpoint on ourself while we wait for threads to suspend.
  // 2. The Signal Catcher thread runs the checkpoint function's Run on itself here to dump its own thread.
  checkpoint_function->Run(self);

  // Run the checkpoint on the suspended threads.
  for (const auto& thread : suspended_count_modified_threads) {
    if (!thread->IsSuspended()) {
      if (ATRACE_ENABLED()) {
        std::ostringstream oss;
        thread->ShortDump(oss);
        ATRACE_BEGIN((std::string("Waiting for suspension of thread ") + oss.str()).c_str());
      }
      // Busy wait until the thread is suspended.
      const uint64_t start_time = NanoTime();
      do {
        ThreadSuspendSleep(kThreadSuspendInitialSleepUs);
      } while (!thread->IsSuspended());
      const uint64_t total_delay = NanoTime() - start_time;
      // Shouldn't need to wait for longer than 1000 microseconds.
      constexpr uint64_t kLongWaitThreshold = MsToNs(1);
      ATRACE_END();
    }
    // We know for sure that the thread is suspended at this point.
    // 3. Run the checkpoint function's Run method on the suspended thread.
    checkpoint_function->Run(thread);
    {
      MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
      // 4. The thread has been dumped, so lower its suspend count by 1.
      thread->ModifySuspendCount(self, -1, nullptr, false);
    }
  }

  {
    // 5. Imitate ResumeAll, threads may be waiting on Thread::resume_cond_ since we raised their
    // suspend count. Now the suspend_count_ is lowered so we must do the broadcast.
    MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
    Thread::resume_cond_->Broadcast(self);
  }
  // Returns the size of list_, that is, the number of threads in the thread list.
  return count;
}

2.3.5 WaitForThreadsToRunThroughCheckpoint

thread_list.cc

class DumpCheckpoint FINAL : public Closure {
 public:
  void WaitForThreadsToRunThroughCheckpoint(size_t threads_running_checkpoint) {
    Thread* self = Thread::Current();
    ThreadState new_state = kWaitingForCheckPointsToRun;
    if(Locks::abort_lock_->IsExclusiveHeld(self) && self->GetState() == kRunnable) {
      new_state = kRunnable;
    }
    ScopedThreadStateChange tsc(self, new_state);
    bool timed_out = barrier_.Increment(self, threads_running_checkpoint, kDumpWaitTimeout);
    if (timed_out) {
      // Avoid a recursive abort.
      LOG(ERROR) << "Unexpected time out during dump checkpoint.";

      std::list<Thread*> list = barrier_.GetThreadList(self);
      timeout_threads_.assign(list.begin(), list.end());
      {
        // abnormal dump
        MutexLock mu(self, *Locks::logging_lock_);
        *os_ << " ------- " << timeout_threads_.size() << " threads dump checkpoint timed out --------\n\n";
      }
      for (const auto& thread : timeout_threads_) {
        bool contains = false;
        {
          MutexLock mu(self, *Locks::thread_list_lock_);
          std::list<Thread*> thread_list = Runtime::Current()->GetThreadList()->GetList();
          contains = find(thread_list.begin(), thread_list.end(), thread) != thread_list.end();
        }
        // 1. detached thread should have already passed the barrier
        // 2. only kRunnable thread have been set a checkpoint function
        // 3. non kRunnable thread is dumped by this thread, will not timeout
        if (contains && thread->HasCheckpointFunction(this)) {
          thread->RunCheckpointFunction();
        }
      }
    }
  }
};

barrier.cc

bool Barrier::Increment(Thread* self, int delta, uint32_t timeout_ms) {
  MutexLock mu(self, lock_);
  SetCountLocked(self, count_ + delta);
  bool timed_out = false;
  if (count_ != 0) {
    uint32_t timeout_ns = 0;
    uint64_t abs_timeout = NanoTime() + MsToNs(timeout_ms);
    for (;;) {
      timed_out = condition_.TimedWait(self, timeout_ms, timeout_ns);
      if (timed_out || count_ == 0) return timed_out;
      // Compute time remaining on timeout.
      uint64_t now = NanoTime();
      int64_t time_left = abs_timeout - now;
      if (time_left <= 0) return true;
      timeout_ns = time_left % (1000*1000);
      timeout_ms = time_left / (1000*1000);
    }
  }
  return timed_out;
}

Increment returns in one of two cases: on timeout, or when count_ == 0 (that is, every thread has finished its dump).

So WaitForThreadsToRunThroughCheckpoint waits for all threads to finish dumping, and applies special handling to threads that have not finished by the time the wait times out.
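The count-to-zero wait with a timeout can be sketched with the standard library. This is a simplified illustration, not art::Barrier: BarrierSketch is a hypothetical name, and wait_for with a predicate replaces the manual recomputation of the remaining time in the original loop.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified barrier. The count starts at 0, each finished
// thread calls Pass() to decrement it, and the requester calls Increment()
// with the number of threads it expects.
class BarrierSketch {
 public:
  void Pass() {
    std::lock_guard<std::mutex> lock(lock_);
    --count_;
    if (count_ == 0) cond_.notify_all();
  }

  // Adds `delta` to the count, then waits until the count is zero or the
  // timeout expires. Returns true on timeout, like Barrier::Increment.
  bool Increment(int delta, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(lock_);
    count_ += delta;
    if (count_ == 0) return false;  // Everyone passed before we got here.
    // wait_for returns the predicate's value when it stops waiting, so a
    // false result means the count was still non-zero at the deadline.
    bool reached_zero =
        cond_.wait_for(lock, timeout, [this] { return count_ == 0; });
    return !reached_zero;
  }

 private:
  std::mutex lock_;
  std::condition_variable cond_;
  int count_ = 0;
};
```

Because passes that arrive before Increment drive the count negative, the order in which threads finish and the requester starts waiting does not matter: the count only reaches zero once every expected thread has passed.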

2.3.6 RequestCheckpoint

thread.cc

bool Thread::RequestCheckpoint(Closure* function) {
  union StateAndFlags old_state_and_flags;
  old_state_and_flags.as_int = tls32_.state_and_flags.as_int;
  if (old_state_and_flags.as_struct.state != kRunnable) {
    return false;  // 1. Fail, thread is suspended and so can't run a checkpoint.
  }

  uint32_t available_checkpoint = kMaxCheckpoints;
  for (uint32_t i = 0 ; i < kMaxCheckpoints; ++i) {
    if (tlsPtr_.checkpoint_functions[i] == nullptr) {
      available_checkpoint = i;
      break;
    }
  }
  if (available_checkpoint == kMaxCheckpoints) {
    // 2. No checkpoint functions available, we can't run a checkpoint
    return false;
  }
  // 3. 设置 checkpoint_function
  tlsPtr_.checkpoint_functions[available_checkpoint] = function;

  // Checkpoint function installed now install flag bit.
  // We must be runnable to request a checkpoint.
  DCHECK_EQ(old_state_and_flags.as_struct.state, kRunnable);
  union StateAndFlags new_state_and_flags;
  new_state_and_flags.as_int = old_state_and_flags.as_int;
  new_state_and_flags.as_struct.flags |= kCheckpointRequest;
  bool success = tls32_.state_and_flags.as_atomic_int.CompareExchangeStrongSequentiallyConsistent(
      old_state_and_flags.as_int, new_state_and_flags.as_int);
  if (UNLIKELY(!success)) {
    // The thread changed state before the checkpoint was installed.
    CHECK_EQ(tlsPtr_.checkpoint_functions[available_checkpoint], function);
    tlsPtr_.checkpoint_functions[available_checkpoint] = nullptr;
  } else {
    CHECK_EQ(ReadFlag(kCheckpointRequest), true);
    TriggerSuspend();
  }
  return success;
}

This method only applies to kRunnable threads. It installs the checkpoint function on the thread, and when the thread reaches a checkpoint it runs the functions stored in checkpoint_functions. In our case that means DumpCheckpoint's Run method.
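The core of RequestCheckpoint is a compare-and-swap on the combined state-and-flags word: the flag is set only if the thread is still runnable at the moment of the swap. The sketch below illustrates that idea under simplifying assumptions. ThreadSketch, Pack, StateOf, NoopCheckpoint, the bit layout and the constant values are all hypothetical, and a single slot replaces the kMaxCheckpoints array.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative constants only; they are not ART's real values.
constexpr uint32_t kCheckpointRequestFlag = 1u << 1;
constexpr uint32_t kStateRunnable = 0u;
constexpr uint32_t kStateSuspended = 1u;

// Assumed layout: state in the high 16 bits, flags in the low 16 bits.
inline uint32_t Pack(uint32_t state, uint32_t flags) {
  return (state << 16) | (flags & 0xFFFFu);
}
inline uint32_t StateOf(uint32_t word) { return word >> 16; }

using CheckpointFn = void (*)();
inline void NoopCheckpoint() {}

struct ThreadSketch {
  std::atomic<uint32_t> state_and_flags{Pack(kStateRunnable, 0)};
  CheckpointFn checkpoint = nullptr;  // One slot instead of an array.

  bool RequestCheckpoint(CheckpointFn fn) {
    uint32_t old_word = state_and_flags.load();
    if (StateOf(old_word) != kStateRunnable) return false;  // Not runnable.
    if (checkpoint != nullptr) return false;                // Slot taken.
    checkpoint = fn;
    uint32_t new_word = old_word | kCheckpointRequestFlag;
    // The swap succeeds only if the word is unchanged, so the flag is
    // never set on a thread that left the runnable state in between.
    bool success = state_and_flags.compare_exchange_strong(old_word, new_word);
    if (!success) checkpoint = nullptr;  // Roll back the installation.
    return success;
  }
};
```

A failed swap is what RunCheckpoint treats as a "spurious fail": the caller re-reads the thread state and either retries or falls back to the suspended-thread path.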

2.4 Summary

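The analysis above shows the following:

In DumpForSigQuit, RunCheckpoint does most of the work. It dumps threads differently depending on whether they are suspended or kRunnable:
- Suspended threads are collected in suspended_count_modified_threads, and DumpCheckpoint's Run method (the dump) is later executed for each thread in that list. In this case every suspended thread is dumped on the "Signal Catcher" thread.
- For kRunnable threads, RequestCheckpoint installs the checkpoint function, and the thread runs the functions in checkpoint_functions when it reaches a checkpoint. In this case the dump runs on each thread itself.
After RunCheckpoint returns, checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint) is called:
- threads_running_checkpoint is the size of the whole thread_list, that is, the number of threads to dump.
- Once a thread has finished its dump in Run, it notifies barrier_, which decrements count_. When count_ reaches 0, every thread has finished dumping, and the thread that finished its dump is also removed from thread_list_. The wait here is for the barrier_ count to reach 0, and if that does not happen before the timeout, the timed-out threads get special handling.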
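The split between the two paths can be modelled in a few lines. The sketch below is a toy model, not ART code: FakeThread, RunCheckpointSketch and ReachCheckpoint are hypothetical names, threads are plain structs, and the requesting thread's dump of itself is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a thread with only the fields the sketch needs.
struct FakeThread {
  bool runnable = false;
  bool checkpoint_pending = false;  // Installed, to be run by the thread itself.
  bool dumped = false;
};

// Returns the number of threads the caller must wait for, mirroring how
// RunCheckpoint returns the size of the thread list.
inline std::size_t RunCheckpointSketch(std::vector<FakeThread>& threads) {
  std::vector<FakeThread*> suspended;
  for (FakeThread& t : threads) {
    if (t.runnable) {
      t.checkpoint_pending = true;  // Like RequestCheckpoint() succeeding.
    } else {
      suspended.push_back(&t);      // Like suspended_count_modified_threads.
    }
  }
  for (FakeThread* t : suspended) {
    t->dumped = true;  // The requester runs the closure on the thread's behalf.
  }
  return threads.size();
}

// What a runnable thread does when it reaches its next checkpoint.
inline void ReachCheckpoint(FakeThread& t) {
  if (t.checkpoint_pending) {
    t.checkpoint_pending = false;
    t.dumped = true;
  }
}
```

In this model a runnable thread stays undumped until it calls ReachCheckpoint, which is the situation the barrier timeout in WaitForThreadsToRunThroughCheckpoint exists to handle.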