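When analyzing system stability problems, if the system hangs we often run "kill -3 pid" to print the Java call stack of every thread in the system_server process, then use the thread states and call stacks to narrow down the problem. Likewise, when an app's UI stays frozen for a long time, the same command can capture its Java call stacks for analysis. Note that "kill -3" cannot produce traces for native processes; use debuggerd for those instead. Sometimes, however, no trace is printed at all. To find out why, we first need to understand how "kill -3 pid" works.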
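For reference, "kill -3 pid" only asks the kernel to deliver SIGQUIT to the target process. Everything else happens inside that process. The sketch below does the same thing from C++ with kill(2). SendSigQuit is a hypothetical helper name, not an Android API, and whether a trace actually appears still depends entirely on how the target handles the signal.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, not an Android API: the C++ equivalent of "kill -3 <pid>".
// kill(2) only hands SIGQUIT to the kernel for delivery; whether a trace is
// written is decided entirely by how the target process handles the signal.
bool SendSigQuit(pid_t pid) {
  // pid 0 or a negative pid would signal a whole process group (or, for -1,
  // every process we are allowed to signal), so refuse it outright.
  if (pid <= 0) {
    std::fprintf(stderr, "refusing to send SIGQUIT to pid %d\n", static_cast<int>(pid));
    return false;
  }
  if (kill(pid, SIGQUIT) != 0) {
    // Typical causes: ESRCH (no such process) or EPERM (not permitted).
    std::fprintf(stderr, "kill(%d, SIGQUIT) failed: %s\n", static_cast<int>(pid),
                 std::strerror(errno));
    return false;
  }
  return true;
}
```

Keep in mind that the default action for SIGQUIT is to terminate the process with a core dump, so sending it to a process that neither blocks nor handles it will normally kill that process.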
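The "Signal Catcher" thread. Every process forked from Zygote starts a "Signal Catcher" thread, which exists solely to receive and handle the SIGQUIT and SIGUSR1 signals sent to the process. Note that the Zygote process itself has no "Signal Catcher" thread, so no trace can be printed for it. Running "ps -t pid" lists all threads of process pid, and one of them is the "Signal Catcher" thread.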
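The Signal Catcher follows the classic "dedicated signal thread" pattern. To my understanding, ART blocks SIGQUIT and SIGUSR1 early during startup so that every thread inherits the blocked mask, and the Signal Catcher then takes them off synchronously with sigwait(); treat that detail as an assumption to check against your own source tree. The sketch below reproduces the pattern with plain POSIX threads. All names (CatcherState, BlockCatcherSignals, CatcherMain, RunCatcherDemo) are hypothetical and this is not ART code.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical names throughout; this only illustrates the pattern.
struct CatcherState {
  sigset_t signals;   // signals the catcher waits for
  int to_handle = 0;  // demo only: exit after this many signals
  int quit_count = 0;
  int usr1_count = 0;
};

// Step 1: block the signals in the calling thread before any other thread is
// created, so every thread inherits the mask and none is interrupted by them.
void BlockCatcherSignals(sigset_t* set) {
  sigemptyset(set);
  sigaddset(set, SIGQUIT);
  sigaddset(set, SIGUSR1);
  int rc = pthread_sigmask(SIG_BLOCK, set, nullptr);
  assert(rc == 0);
  (void)rc;
}

// Step 2: one thread takes the blocked signals off the pending set with
// sigwait() and dispatches on the number, much like the loop in Run().
void* CatcherMain(void* arg) {
  auto* state = static_cast<CatcherState*>(arg);
  for (int i = 0; i < state->to_handle; ++i) {
    int sig = 0;
    if (sigwait(&state->signals, &sig) != 0) break;
    switch (sig) {
      case SIGQUIT: ++state->quit_count; break;  // ART would dump stacks here
      case SIGUSR1: ++state->usr1_count; break;  // ART would force a GC here
      default: std::fprintf(stderr, "Unexpected signal %d\n", sig); break;
    }
  }
  return nullptr;
}

// Demo: start the catcher, send this process one SIGQUIT and one SIGUSR1
// (what "kill -3" and "kill -10" do from outside), then wait for the catcher.
CatcherState RunCatcherDemo() {
  CatcherState state;
  BlockCatcherSignals(&state.signals);
  state.to_handle = 2;
  pthread_t tid;
  if (pthread_create(&tid, nullptr, CatcherMain, &state) != 0) return state;
  kill(getpid(), SIGQUIT);
  kill(getpid(), SIGUSR1);
  pthread_join(tid, nullptr);
  return state;
}
```

Because the signals are blocked everywhere and consumed with sigwait(), the handling code runs on an ordinary thread rather than inside an asynchronous signal handler, so it may take locks and allocate memory, which dumping every thread's stack clearly requires.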
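Starting the "Signal Catcher" thread. The startup flow is simple, as shown in the figure below; you can follow it and walk through the code yourself (based on Android 5.1).
The main logic in the sequence diagram above lives in art/runtime/signal_catcher.cc. The following sections analyze three of the functions in the diagram: Run(), HandleSigQuit() and Output().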
1. Run()

```cpp
void* SignalCatcher::Run(void* arg) {
  SignalCatcher* signal_catcher = reinterpret_cast<SignalCatcher*>(arg);
  CHECK(signal_catcher != NULL);

  // Attach this pthread to the runtime under the name "Signal Catcher".
  Runtime* runtime = Runtime::Current();
  CHECK(runtime->AttachCurrentThread("Signal Catcher", true, runtime->GetSystemThreadGroup(),
                                     !runtime->IsCompiler()));
  Thread* self = Thread::Current();
  DCHECK_NE(self->GetState(), kRunnable);
  {
    // Publish the Thread* and wake any thread waiting on cond_ for the catcher to start.
    MutexLock mu(self, signal_catcher->lock_);
    signal_catcher->thread_ = self;
    signal_catcher->cond_.Broadcast(self);
  }

  // The set of signals this thread waits for.
  SignalSet signals;
  signals.Add(SIGQUIT);
  signals.Add(SIGUSR1);

  while (true) {
    // Block until one of the signals in the set arrives.
    int signal_number = signal_catcher->WaitForSignal(self, signals);

    // Shutdown request: detach from the runtime and let the thread exit.
    if (signal_catcher->ShouldHalt()) {
      runtime->DetachCurrentThread();
      return NULL;
    }

    switch (signal_number) {
      case SIGQUIT:
        signal_catcher->HandleSigQuit();  // kill -3: dump thread stacks
        break;
      case SIGUSR1:
        signal_catcher->HandleSigUsr1();  // kill -10: force a GC
        break;
      default:
        // The original streams a printf-style "%d" here; print the number directly.
        LOG(ERROR) << "Unexpected signal " << signal_number;
        break;
    }
  }
}
```
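Two details of Run() are easy to miss: the lock_/cond_ block at the top is a start-up handshake, and ShouldHalt() is checked immediately after WaitForSignal() returns. As far as I recall, ART's SignalCatcher constructor waits on cond_ until thread_ is set, and its destructor sets the halt flag and then wakes the catcher by sending SIGQUIT to that very thread; this is from memory, so verify it against your tree. The hypothetical MiniCatcher class below reproduces both mechanisms with std::mutex, std::condition_variable and POSIX threads. It is a sketch, not ART code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical sketch of the catcher's lifecycle; not ART code.
class MiniCatcher {
 public:
  MiniCatcher() {
    sigemptyset(&signals_);
    sigaddset(&signals_, SIGQUIT);
    sigaddset(&signals_, SIGUSR1);
    // Block in the creating thread; the new thread inherits the mask.
    int rc = pthread_sigmask(SIG_BLOCK, &signals_, nullptr);
    assert(rc == 0);
    rc = pthread_create(&pthread_, nullptr, &MiniCatcher::Run, this);
    assert(rc == 0);
    (void)rc;
    // Start-up handshake: wait until the catcher thread reports in
    // (the role of lock_/cond_/thread_ in Run()).
    std::unique_lock<std::mutex> lock(mu_);
    cond_.wait(lock, [this] { return started_; });
  }

  ~MiniCatcher() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      halt_ = true;
    }
    // Wake the thread with a signal it is already waiting for, then join it.
    pthread_kill(pthread_, SIGQUIT);
    pthread_join(pthread_, nullptr);
  }

  MiniCatcher(const MiniCatcher&) = delete;
  MiniCatcher& operator=(const MiniCatcher&) = delete;

  // Demo helper: the in-process equivalent of "kill -3 <own pid>".
  void RequestDump() { kill(getpid(), SIGQUIT); }

  // Blocks until at least n SIGQUITs have been handled.
  void WaitForQuitCount(int n) {
    std::unique_lock<std::mutex> lock(mu_);
    cond_.wait(lock, [this, n] { return quit_count_ >= n; });
  }

  int quit_count() {
    std::lock_guard<std::mutex> lock(mu_);
    return quit_count_;
  }

 private:
  static void* Run(void* arg) {
    auto* self = static_cast<MiniCatcher*>(arg);
    {
      std::lock_guard<std::mutex> lock(self->mu_);
      self->started_ = true;
    }
    self->cond_.notify_all();
    while (true) {
      int sig = 0;
      if (sigwait(&self->signals_, &sig) != 0) continue;
      std::lock_guard<std::mutex> lock(self->mu_);
      if (self->halt_) return nullptr;  // the ShouldHalt() check
      if (sig == SIGQUIT) {
        ++self->quit_count_;  // ART would call HandleSigQuit() here
        self->cond_.notify_all();
      }
    }
  }

  sigset_t signals_;
  pthread_t pthread_{};
  std::mutex mu_;
  std::condition_variable cond_;
  bool started_ = false;
  bool halt_ = false;
  int quit_count_ = 0;
};
```

The design choice worth noting is the shutdown path: instead of adding a separate wake-up mechanism, the thread is woken with one of the signals it already waits for, and the halt flag tells it to exit rather than dump.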
2. HandleSigQuit()

```cpp
void SignalCatcher::HandleSigQuit() {
  Runtime* runtime = Runtime::Current();
  ThreadList* thread_list = runtime->GetThreadList();

  // Suspend every other thread so their stacks stay stable while being dumped;
  // afterwards this thread holds the mutator lock exclusively.
  thread_list->SuspendAll();
  Thread* self = Thread::Current();
  Locks::mutator_lock_->AssertExclusiveHeld(self);
  const char* old_cause = self->StartAssertNoThreadSuspension("Handling SIGQUIT");
  ThreadState old_state = self->SetStateUnsafe(kRunnable);

  // Build the whole report in memory first; the file is written later in Output().
  std::ostringstream os;
  os << "\n"
     << "----- pid " << getpid() << " at " << GetIsoDate() << " -----\n";

  DumpCmdLine(os);

  os << "ABI: " << GetInstructionSetString(runtime->GetInstructionSet()) << "\n";

  os << "Build type: " << (kIsDebugBuild ? "debug" : "optimized") << "\n";

  // Dumps the runtime state, including the stacks of all threads.
  runtime->DumpForSigQuit(os);

  if (false) {  // Disabled: would append /proc/self/maps to the report.
    std::string maps;
    if (ReadFileToString("/proc/self/maps", &maps)) {
      os << "/proc/self/maps:\n" << maps;
    }
  }
  os << "----- end " << getpid() << " -----\n";
  CHECK_EQ(self->SetStateUnsafe(old_state), kRunnable);
  self->EndAssertNoThreadSuspension(old_cause);
  thread_list->ResumeAll();

  // Handle a pending checkpoint request only now that the other threads are resumed.
  if (self->ReadFlag(kCheckpointRequest)) {
    self->RunCheckpointFunction();
  }
  // Write the report to the trace file, or to the log if no file is configured.
  Output(os.str());
}
```
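HandleSigQuit() assembles the entire report in a std::ostringstream while all other threads are suspended, and only hands it to Output() after ResumeAll(), presumably so that the potentially slow file write does not lengthen the stop-the-world pause. The hypothetical BuildSigQuitReport() below reproduces just the framing lines from the code above, which helps when scanning a trace file for one process's section. Two assumptions: the "Cmd line:" line follows what DumpCmdLine() typically prints, and `body` stands in for the output of Runtime::DumpForSigQuit().

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper reproducing the framing that HandleSigQuit() writes.
// `iso_date` stands in for GetIsoDate(), `cmdline` for what DumpCmdLine()
// prints (assumed "Cmd line: ..." format), `body` for DumpForSigQuit().
std::string BuildSigQuitReport(int pid, const std::string& iso_date,
                               const std::string& cmdline, const std::string& abi,
                               bool debug_build, const std::string& body) {
  assert(pid > 0);
  std::ostringstream os;
  os << "\n"
     << "----- pid " << pid << " at " << iso_date << " -----\n";
  os << "Cmd line: " << cmdline << "\n";
  os << "ABI: " << abi << "\n";
  os << "Build type: " << (debug_build ? "debug" : "optimized") << "\n";
  os << body;  // thread stacks and other runtime state would go here
  os << "----- end " << pid << " -----\n";
  return os.str();
}
```

Since each report reaches the file through a single Output() call, a complete dump is normally delimited by a "----- pid N at ..." line and a matching "----- end N -----" line.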
3. Output()

```cpp
void SignalCatcher::Output(const std::string& s) {
  // No trace file configured: the report goes to the log instead.
  if (stack_trace_file_.empty()) {
    LOG(INFO) << s;
    return;
  }

  ScopedThreadStateChange tsc(Thread::Current(), kWaitingForSignalCatcherOutput);
  int fd = open(stack_trace_file_.c_str(), O_APPEND | O_CREAT | O_WRONLY, 0666);
  if (fd == -1) {
    // If the file cannot be opened, the report is dropped and only this error is logged.
    PLOG(ERROR) << "Unable to open stack trace file '" << stack_trace_file_ << "'";
    return;
  }
  std::unique_ptr<File> file(new File(fd, stack_trace_file_));
  if (!file->WriteFully(s.data(), s.size())) {
    PLOG(ERROR) << "Failed to write stack traces to '" << stack_trace_file_ << "'";
  } else {
    LOG(INFO) << "Wrote stack traces to '" << stack_trace_file_ << "'";
  }
}
```
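Output() relies on File::WriteFully() to push the whole string to the file descriptor. A plain write(2) may write fewer bytes than requested or be interrupted by a signal, so a full-write helper has to loop. The hypothetical AppendReport() below mirrors Output()'s control flow using only POSIX calls (open with O_APPEND | O_CREAT | O_WRONLY, then a write loop that retries on EINTR). It is not ART's File class, and its messages only imitate the log lines shown above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical stand-in for Output() + File::WriteFully(): append the whole
// report to `path`, retrying short writes and EINTR. Returns false (and says
// why) at the two points where Output() would log an error.
bool AppendReport(const std::string& path, const std::string& s) {
  if (path.empty()) {  // Output() falls back to LOG(INFO) here
    std::fwrite(s.data(), 1, s.size(), stderr);
    return true;
  }
  int fd = open(path.c_str(), O_APPEND | O_CREAT | O_WRONLY, 0666);
  if (fd == -1) {
    std::fprintf(stderr, "Unable to open stack trace file '%s': %s\n", path.c_str(),
                 std::strerror(errno));
    return false;
  }
  const char* p = s.data();
  size_t left = s.size();
  while (left > 0) {
    ssize_t n = write(fd, p, left);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted before writing: retry
      std::fprintf(stderr, "Failed to write stack traces to '%s': %s\n", path.c_str(),
                   std::strerror(errno));
      close(fd);
      return false;
    }
    assert(static_cast<size_t>(n) <= left);  // write() never exceeds the request
    p += n;
    left -= static_cast<size_t>(n);
  }
  close(fd);
  return true;
}
```

As in Output(), each failure path reports why the trace is missing, which is exactly what to look for when a trace file stays empty.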
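Summary: once you are familiar with this flow, the next time a trace fails to appear you can roughly locate the cause from the log messages in the code above, such as "Unable to open stack trace file" or "Failed to write stack traces to". Finally, a word on the two signals: SIGQUIT (kill -3 pid) makes the process print its Java traces, while SIGUSR1 (kill -10 pid) forces the process to run a garbage collection.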
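When no trace appears at all, another thing that can be checked from outside the process is its signal mask. This assumes the runtime blocks SIGQUIT and SIGUSR1 so that only the Signal Catcher receives them, which is an assumption about ART's setup and not something shown in the code above. The SigBlk line of /proc/<pid>/status reports the main thread's blocked signals as a hex mask in which bit n-1 stands for signal n (per-thread values live under /proc/<pid>/task/<tid>/status). The helpers below (ReadProcStatus, ParseSigMask, MaskHasSignal) are hypothetical and only parse that mask; the sample value 0x1204 in the note that follows is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <signal.h>
#include <sstream>
#include <string>

// Hypothetical helpers for inspecting a process's signal masks on Linux/Android.

// Reads /proc/<pid>/status into a string; returns "" if it cannot be read.
std::string ReadProcStatus(int pid) {
  std::ifstream f("/proc/" + std::to_string(pid) + "/status");
  std::ostringstream ss;
  if (f) ss << f.rdbuf();
  return ss.str();
}

// Finds a line such as "SigBlk:\t0000000000001204" and parses its hex mask.
bool ParseSigMask(const std::string& status_text, const std::string& field,
                  std::uint64_t* mask) {
  assert(mask != nullptr);
  const std::string key = field + ":";
  std::istringstream in(status_text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, key.size(), key) != 0) continue;
    std::istringstream value(line.substr(key.size()));
    std::uint64_t parsed = 0;
    if (!(value >> std::hex >> parsed)) return false;
    *mask = parsed;
    return true;
  }
  return false;
}

// In these masks, bit (sig - 1) represents signal number sig.
bool MaskHasSignal(std::uint64_t mask, int sig) {
  return sig >= 1 && sig <= 64 && ((mask >> (sig - 1)) & 1u) != 0;
}
```

For example, with "SigBlk:\t0000000000001204", MaskHasSignal(mask, SIGQUIT) and MaskHasSignal(mask, SIGUSR1) both return true on ARM and x86 Linux, where SIGQUIT is 3 and SIGUSR1 is 10; the remaining bit 0x1000 corresponds to SIGPIPE (13).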