5. ThreadList::DumpForSigQuit
Let's look at how ThreadList performs the dump:
void ThreadList::DumpForSigQuit(std::ostream& os) {
{
ScopedObjectAccess soa(Thread::Current());
// Only print if we have samples.
if (suspend_all_historam_.SampleSize() > 0) { // This histogram records how long each SuspendAll took; dump it only if it has samples
Histogram<uint64_t>::CumulativeData data;
suspend_all_historam_.CreateHistogram(&data);
suspend_all_historam_.PrintConfidenceIntervals(os, 0.99, data); // Dump time to suspend.
}
}
Dump(os); // Dump the thread list
DumpUnattachedThreads(os); // Dump the threads in this process that are not attached to the runtime
}
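The histogram printout can be pictured with a simplified sketch. It assumes the 0.99 interval is reported as the 0.5th and 99.5th percentiles of the recorded SuspendAll durations. `SuspendAllSamples`, `AddValue` and `ConfidenceInterval` are hypothetical names. The class keeps raw samples instead of buckets, so it is not ART's Histogram class:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for suspend_all_historam_: it keeps raw SuspendAll durations
// (e.g. in microseconds) instead of bucketing them like ART's Histogram class.
class SuspendAllSamples {
 public:
  void AddValue(uint64_t value) { samples_.push_back(value); }
  size_t SampleSize() const { return samples_.size(); }

  // Nearest-rank percentile for p in [0, 1]; requires at least one sample.
  uint64_t Percentile(double p) const {
    assert(!samples_.empty());
    std::vector<uint64_t> sorted = samples_;
    std::sort(sorted.begin(), sorted.end());
    size_t rank = static_cast<size_t>(std::ceil(p * static_cast<double>(sorted.size())));
    rank = std::min(std::max<size_t>(rank, 1), sorted.size());
    return sorted[rank - 1];
  }

  // Symmetric interval holding `interval` of the samples, e.g. 0.99 -> [0.5%, 99.5%].
  std::pair<uint64_t, uint64_t> ConfidenceInterval(double interval) const {
    const double lower = (1.0 - interval) / 2.0;
    return {Percentile(lower), Percentile(lower + interval)};
  }

 private:
  std::vector<uint64_t> samples_;
};
```

As with the `SampleSize() > 0` guard in DumpForSigQuit, callers should check for samples before asking for an interval, because `Percentile` asserts a non-empty sample set.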
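void ThreadList::Dump(std::ostream& os) {
{
MutexLock mu(Thread::Current(), *Locks::thread_list_lock_);
os << "DALVIK THREADS (" << list_.size() << "):\n";
}
DumpCheckpoint checkpoint(&os);
size_t threads_running_checkpoint = RunCheckpoint(&checkpoint);
if (threads_running_checkpoint != 0) {
checkpoint.WaitForThreadsToRunThroughCheckpoint(threads_running_checkpoint);
}
}
class DumpCheckpoint : public Closure {
// ... (listing truncated in the source: Run(Thread*) writes each thread's dump into a
// local stream, local_os, using the shared backtrace_map_) ...
};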
In other words, the thread list is dumped by having each thread run DumpCheckpoint, which dumps that thread's state and backtrace.
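A minimal model of this pattern follows. A closure's Run(thread) builds one thread's dump in a local stream. It then appends that dump to the shared stream under a lock and passes a barrier, and the requester waits on that barrier until every thread has passed. The sketch uses std::thread and std::condition_variable. `Closure`, `DumpClosure`, `Barrier`, `FakeThread` and `DumpAll` are hypothetical stand-ins, not ART's classes. Here every worker runs the closure on itself, as a Runnable thread would at its next checkpoint:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

struct FakeThread {  // hypothetical: just a name and a state string to print
  std::string name;
  std::string state;
};

class Closure {
 public:
  virtual ~Closure() = default;
  virtual void Run(FakeThread* thread) = 0;
};

// Counts passes; the requester blocks until `expected` threads have passed.
class Barrier {
 public:
  void Pass() {
    std::lock_guard<std::mutex> lock(mu_);
    ++passed_;
    cv_.notify_all();
  }
  void WaitFor(int expected) {
    assert(expected >= 0);
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return passed_ >= expected; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int passed_ = 0;
};

class DumpClosure : public Closure {
 public:
  explicit DumpClosure(std::ostream* os) : os_(os) {}
  void Run(FakeThread* thread) override {
    std::ostringstream local_os;  // build this thread's dump without holding the shared lock
    local_os << "\"" << thread->name << "\" " << thread->state << "\n";
    {
      std::lock_guard<std::mutex> lock(output_mu_);  // serialize writes to the shared stream
      *os_ << local_os.str();
    }
    barrier_.Pass();
  }
  void WaitForThreadsToRunThroughCheckpoint(int n) { barrier_.WaitFor(n); }

 private:
  std::ostream* const os_;
  std::mutex output_mu_;
  Barrier barrier_;
};

std::string DumpAll(std::vector<FakeThread>& threads) {
  std::ostringstream out;
  DumpClosure checkpoint(&out);
  std::vector<std::thread> workers;
  for (auto& t : threads) {
    FakeThread* tp = &t;
    workers.emplace_back([&checkpoint, tp] { checkpoint.Run(tp); });
  }
  checkpoint.WaitForThreadsToRunThroughCheckpoint(static_cast<int>(threads.size()));
  for (auto& w : workers) w.join();
  return out.str();
}
```

Each dump is formatted into `local_os` first, so the shared lock is held only for the final append. The order of lines in the output depends on thread scheduling.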
Next, let's see how each thread gets to run DumpCheckpoint, in ThreadList::RunCheckpoint:
size_t ThreadList::RunCheckpoint(Closure* checkpoint_function) {
Thread* self = Thread::Current();
Locks::mutator_lock_->AssertNotExclusiveHeld(self);
Locks::thread_list_lock_->AssertNotHeld(self);
Locks::thread_suspend_count_lock_->AssertNotHeld(self);
if (kDebugLocking && gAborting == 0) {
CHECK_NE(self->GetState(), kRunnable);
}
std::vector<Thread*> suspended_count_modified_threads;
size_t count = 0;
{
// Step 1: Runnable threads and suspended threads are handled differently.
// Call a checkpoint function for each thread, threads which are suspend get their checkpoint
// manually called. (Every thread runs the checkpoint function itself; for suspended threads, we call it on their behalf.)
MutexLock mu(self, *Locks::thread_list_lock_);
MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
count = list_.size();
for (const auto& thread : list_) {
if (thread != self) {
while (true) {
// For a Runnable thread, add checkpoint_function to the thread's list of checkpoint functions; the thread runs it when it reaches a checkpoint
if (thread->RequestCheckpoint(checkpoint_function)) {
// This thread will run its checkpoint some time in the near future.
break;
} else {
// We are probably suspended, try to make sure that we stay suspended.
// The thread switched back to runnable.
if (thread->GetState() == kRunnable) {
// Spurious fail, try again.
continue;
}
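// A suspended thread is collected here and handled separately later. To keep its state from changing meanwhile, raise its suspend count by 1,
// so that even if the original suspend request ends, the count stays non-zero and the thread cannot return to Runnable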
thread->ModifySuspendCount(self, +1, false);
suspended_count_modified_threads.push_back(thread);
break;
}
}
}
}
}
// Run the checkpoint on ourself while we wait for threads to suspend.
checkpoint_function->Run(self); // The requesting thread (here, the Signal Catcher) runs the checkpoint function on itself, dumping its own stack
// Run the checkpoint on the suspended threads.
for (const auto& thread : suspended_count_modified_threads) {
if (!thread->IsSuspended()) {
if (ATRACE_ENABLED()) {
std::ostringstream oss;
thread->ShortDump(oss);
ATRACE_BEGIN((std::string("Waiting for suspension of thread ") + oss.str()).c_str());
}
// Busy wait until the thread is suspended.
const uint64_t start_time = NanoTime();
do {
ThreadSuspendSleep(kThreadSuspendInitialSleepUs);
} while (!thread->IsSuspended());
const uint64_t total_delay = NanoTime() - start_time;
// Shouldn't need to wait for longer than 1000 microseconds.
constexpr uint64_t kLongWaitThreshold = MsToNs(1);
ATRACE_END();
if (UNLIKELY(total_delay > kLongWaitThreshold)) {
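LOG(WARNING) << "Long wait of " << PrettyDuration(total_delay) << " for "
<< *thread << " suspension!";
}
}
// We know for sure that the thread is suspended at this point.
checkpoint_function->Run(thread); // Run the checkpoint (the dump) on the suspended thread's behalf
{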
MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
thread->ModifySuspendCount(self, -1, false); // This thread's dump is done: drop the extra suspend count, it no longer needs to stay suspended
}
}
{
// Imitate ResumeAll, threads may be waiting on Thread::resume_cond_ since we raised their
// suspend count. Now the suspend_count_ is lowered so we must do the broadcast.
MutexLock mu2(self, *Locks::thread_suspend_count_lock_);
Thread::resume_cond_->Broadcast(self); // Wake the suspended threads so they can re-check their suspend count and resume
}
return count;
}
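The two paths in RunCheckpoint can be modeled as a single-threaded sketch with hypothetical types (`SimThread`, `RunCheckpointSketch`). It omits locking, the requester's own checkpoint and the busy-wait for suspension. A Runnable thread accepts the request and runs the closure later, at its next safepoint. A suspended thread refuses the request. The requester then raises its suspend count, runs the closure on its behalf, and lowers the count again. In ART, that last step is followed by a broadcast on resume_cond_:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class State { kRunnable, kSuspended };

// Hypothetical thread model: no real concurrency, only RunCheckpoint's bookkeeping.
struct SimThread {
  std::string name;
  State state;
  int suspend_count = 0;
  std::vector<std::function<void(SimThread*)>> pending;  // queued checkpoint functions

  // Only a Runnable thread accepts a checkpoint request.
  bool RequestCheckpoint(const std::function<void(SimThread*)>& fn) {
    if (state != State::kRunnable) return false;
    pending.push_back(fn);
    return true;
  }

  // Called when the thread reaches a safepoint: run and clear pending checkpoints.
  void ReachSafepoint() {
    auto fns = std::move(pending);
    pending.clear();
    for (auto& fn : fns) fn(this);
  }

  // A suspended thread may only become Runnable once its suspend count is 0.
  bool TryResume() {
    if (suspend_count > 0) return false;
    state = State::kRunnable;
    return true;
  }
};

size_t RunCheckpointSketch(std::vector<SimThread>& threads,
                           const std::function<void(SimThread*)>& fn) {
  std::vector<SimThread*> modified;
  for (auto& t : threads) {
    if (!t.RequestCheckpoint(fn)) {
      ++t.suspend_count;  // keep it suspended while we run its checkpoint
      modified.push_back(&t);
    }
  }
  for (SimThread* t : modified) {
    fn(t);               // run the checkpoint on the suspended thread's behalf
    --t->suspend_count;  // ART would now broadcast resume_cond_ so waiters re-check
  }
  return threads.size();
}
```

The extra count matters because the original suspender may resume the thread at any moment. The +1 guarantees the thread is still suspended while the requester walks its stack.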
Two points here need explaining:
1. The kRunnable and kSuspended thread states:
enum ThreadState {
// Thread.State JDWP state
kTerminated = 66, // TERMINATED TS_ZOMBIE Thread.run has returned, but Thread* still around
kRunnable, // RUNNABLE TS_RUNNING runnable
kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout
kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep()
kBlocked, // BLOCKED TS_MONITOR blocked on a monitor
kWaiting, // WAITING TS_WAIT in Object.wait()
kWaitingForGcToComplete, // WAITING TS_WAIT blocked waiting for GC
kWaitingForCheckPointsToRun, // WAITING TS_WAIT GC waiting for checkpoints to run
kWaitingPerformingGc, // WAITING TS_WAIT performing GC
kWaitingForDebuggerSend, // WAITING TS_WAIT blocked waiting for events to be sent
kWaitingForDebuggerToAttach, // WAITING TS_WAIT blocked waiting for debugger to attach
kWaitingInMainDebuggerLoop, // WAITING TS_WAIT blocking/reading/processing debugger events
kWaitingForDebuggerSuspension, // WAITING TS_WAIT waiting for debugger suspend all
kWaitingForJniOnLoad, // WAITING TS_WAIT waiting for execution of dlopen and JNI on load code
kWaitingForSignalCatcherOutput, // WAITING TS_WAIT waiting for signal catcher IO to complete
kWaitingInMainSignalCatcherLoop, // WAITING TS_WAIT blocking/reading/processing signals
kWaitingForDeoptimization, // WAITING TS_WAIT waiting for deoptimization suspend all
kWaitingForMethodTracingStart, // WAITING TS_WAIT waiting for method tracing to start
kWaitingForVisitObjects, // WAITING TS_WAIT waiting for visiting objects
kWaitingForGetObjectsAllocated, // WAITING TS_WAIT waiting for getting the number of allocated objects
kStarting, // NEW TS_WAIT native thread started, not yet ready to run managed code
kNative, // RUNNABLE TS_RUNNING running in a JNI native method
kSuspended, // RUNNABLE TS_RUNNING suspended by GC or debugger
};
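Of these, three states apply to a thread that is running:
kRunnable, // running; may allocate on the Java heap and call Java methods
kNative, // executing a JNI native method; does not affect Java heap allocation or GC, and makes no Java calls
kSuspended, // the thread is actually waiting on its way into Runnable, for the resume condition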
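kRunnable means the thread is currently running.
kSuspended works as follows: whenever a thread switches from another state to kRunnable, it checks whether it has a pending suspend request (kSuspendRequest).
If it does, the thread waits instead of continuing and stays in kSuspended until its suspend count changes and drops to 0; only then does it switch to Runnable.
This is also why GC uses SuspendAll: once the threads are suspended, none of them can touch the Java heap, so the GC thread can operate on the heap safely.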
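The suspend-count handshake can be sketched with a mutex and a condition variable. `SuspendableThread`, `TransitionToRunnable` and `ModifySuspendCount` here are hypothetical simplifications. ART's real state transition also uses atomic state-and-flag words and handles checkpoints:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class SuspendableThread {
 public:
  enum State { kNative, kSuspended, kRunnable };

  // Another thread (e.g. a GC doing SuspendAll) raises or lowers the count.
  void ModifySuspendCount(int delta) {
    std::lock_guard<std::mutex> lock(mu_);
    suspend_count_ += delta;
    assert(suspend_count_ >= 0);
    if (suspend_count_ == 0) resume_cond_.notify_all();  // like resume_cond_->Broadcast
  }

  // Called by the thread itself before it may touch the managed heap again.
  void TransitionToRunnable() {
    std::unique_lock<std::mutex> lock(mu_);
    if (suspend_count_ > 0) {
      state_ = kSuspended;  // parked: cannot run managed code or allocate
      resume_cond_.wait(lock, [this] { return suspend_count_ == 0; });
    }
    state_ = kRunnable;
  }

  State state() {
    std::lock_guard<std::mutex> lock(mu_);
    return state_;
  }

 private:
  std::mutex mu_;
  std::condition_variable resume_cond_;
  int suspend_count_ = 0;
  State state_ = kNative;
};
```

The predicate form of `wait` re-checks the count after every wakeup. This makes spurious wakeups harmless and means a broadcast reaching a thread whose count is still non-zero simply puts it back to sleep.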
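2. CheckPoint
Any discussion of checkpoints has to start with safepoints.
safepoint: code compiled by ART periodically polls the runtime to find out whether it needs to execute some particular code; these polling points can be regarded as safepoints.
Safepoints can be used to pause a Java thread, and also to implement the checkpoint mechanism.
For example, when thread A, which is executing Java code, reaches a safepoint, it calls CheckSuspend. If the thread has a checkpoint request, it runs the thread's checkpoint function at that point;
if it has a suspend request, it performs the suspend check, which puts the thread into the suspended (paused) state.
So an ART checkpoint is best seen as one of the features implemented on top of safepoints.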
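The dispatch at a safepoint (check the flags, then either run pending checkpoints or enter the suspend check) can be sketched as follows. The flag names mirror the terms used in this article. `ThreadFlag`, `SafepointThread` and this `CheckSuspend` are a hypothetical, single-threaded model, not ART's implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum ThreadFlag : uint32_t {
  kSuspendRequest = 1u << 0,
  kCheckpointRequest = 1u << 1,
};

struct SafepointThread {
  uint32_t flags = 0;
  std::vector<std::function<void()>> checkpoints;
  std::vector<std::string> log;  // records what happened at each safepoint

  void RequestCheckpoint(std::function<void()> fn) {
    checkpoints.push_back(std::move(fn));
    flags |= kCheckpointRequest;
  }

  // Polled at every safepoint; the fast path is a single flag test.
  void CheckSuspend() {
    if (flags == 0) return;
    if (flags & kCheckpointRequest) {
      auto pending = std::move(checkpoints);
      checkpoints.clear();
      flags &= ~static_cast<uint32_t>(kCheckpointRequest);
      for (auto& fn : pending) fn();
      log.push_back("ran checkpoint");
    }
    if (flags & kSuspendRequest) {
      // A real runtime would block here until the suspend count drops to 0.
      log.push_back("suspend check");
    }
  }
};
```

Keeping the common case down to one test of `flags` is what makes it affordable to poll at every loop back-edge and method entry.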
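The following passage is quoted from an answer online:
Author: RednaxelaFX
Link: https://www.zhihu.com/question/48996839/answer/113801448
Source: Zhihu
Copyright belongs to the author. For commercial reproduction, contact the author for permission; for non-commercial reproduction, please credit the source.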
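From the perspective of the compiler and the interpreter, ART has two kinds of safepoints:
Active safepoints: the compiled code or the interpreter actively checks for a safepoint request and, if it needs to enter a safepoint, jumps to the corresponding handler.
The ART interpreter places active safepoints at loop back-edges (specifically, at the source of the jump, before it is taken) and at method returns (return / throw exception).
The ART Optimizing Compiler places active safepoints at loop back-edges (again, at the source of the jump) and at method entry.
Passive safepoints: every non-inlined call site is a passive safepoint. No code is actively executed there; it is just an ordinary method call.
It has to count as a safepoint because once execution reaches the call site, control passes to the callee, and the callee may enter a safepoint. A safepoint may need to walk the stack frames, so the caller must be at a safepoint too.
The idea behind where to place safepoints: after the runtime issues a safepoint request, the program must reach the nearest safepoint promptly and hand control to the runtime.
What counts as "promptly"? It is enough for the execution time to be bounded; the real-time requirement is not strict.
This leads to a further assumption: forward-executing code (straight-line code, including conditional branches) always finishes in bounded time, so it can be ignored. Code that may run for a long time is either a loop or a method call, so inserting safepoints in those two places is enough to guarantee promptness.
Whether the safepoint goes at method entry or exit, or at the source or target of a loop back-edge, is an implementation detail; it is enough to pick one side.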
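The placement argument can be made concrete with a small sketch. The loop body is straight-line code and needs no poll. A single poll at the loop back-edge bounds how long a request can go unnoticed. The globals `g_checkpoint_requested`, `g_stop` and `g_checkpoints_run` are hypothetical, and the "checkpoint" only increments a counter. This illustrates the idea, not how ART's compiler emits the check:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

std::atomic<bool> g_checkpoint_requested{false};  // hypothetical per-thread request flag
std::atomic<bool> g_stop{false};
std::atomic<int> g_checkpoints_run{0};

inline void SafepointPoll() {
  // Cheap relaxed load first; exchange() consumes the request so it runs once.
  if (g_checkpoint_requested.load(std::memory_order_relaxed) &&
      g_checkpoint_requested.exchange(false, std::memory_order_acq_rel)) {
    g_checkpoints_run.fetch_add(1, std::memory_order_acq_rel);
  }
}

// Long-running "managed" loop: the only poll is at the back-edge.
uint64_t HotLoop() {
  uint64_t sum = 0;
  for (uint64_t i = 0; !g_stop.load(std::memory_order_acquire); ++i) {
    sum += i;         // straight-line body: bounded time, no poll needed
    SafepointPoll();  // back-edge safepoint
  }
  return sum;
}
```

Once the flag is set, the worker notices it within one loop iteration, provided it is scheduled. That is exactly the bounded-latency property the quote describes.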
So, back to this line from earlier:
// For a Runnable thread, add checkpoint_function to the thread's list of checkpoint functions; the thread runs it when it reaches a checkpoint
if (thread->RequestCheckpoint(checkpoint_function)) {
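For a Runnable thread, we set checkpoint_function and a checkpoint request, so the thread will eventually reach a checkpoint and run checkpoint_function.
As mentioned above, safepoints have only loose latency requirements. For a rough sense of scale: because safepoints sit at method entries, returns and loop back-edges, a running thread normally reaches a checkpoint within the execution of a single function.
It can still be delayed by other factors, such as thread scheduling. If thread A is Runnable and about to reach a safepoint but is no longer scheduled, it will not reach the safepoint until it runs again.
In this example, the normal flow is: a Runnable thread reaches a safepoint, finds the checkpoint request, and runs the checkpoint function.
Here that checkpoint function has been set to DumpCheckpoint's Run() function, so the thread dumps itself.
That covers where the dump is triggered for both suspended and Runnable threads.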