背景描述
公司里面有一个部门产生的故障,故障问题是服务器oom,导致服务不可用。最终排查到的原因是有一个方法,在内部创建了线程池。每次调用该方法都会创建一个线程池,从而导致了oom。
问题分析
抛砖引玉,先思考一下问题
这个oom是由于什么原因造成的呢?是常规的堆内存溢出导致的oom吗?创建的线程池对象,会被gc回收吗?
源码分析
在使用线程池运行任务的时候,做了哪些事情
// 核心运行运行的代码为addWorker() ++++++++++++++++++++
if (workerCountOf(c) < corePoolSize) {
if (addWorker(command, true))
return;
c = ctl.get();
}
if (isRunning(c) && workQueue.offer(command)) {
int recheck = ctl.get();
if (! isRunning(recheck) && remove(command))
reject(command);
else if (workerCountOf(recheck) == 0)
addWorker(null, false);
}
else if (!addWorker(command, false))
reject(command);
// work 构造器
Worker(Runnable firstTask) {
setState(-1); // inhibit interrupts until runWorker
// 传入的任务为firstTask对象 ++++++++++++++++++++
this.firstTask = firstTask;
// 此处将work对象放入到线程中 ++++++++++++++++++++
this.thread = getThreadFactory().newThread(this);
}
Worker w = null;
try {
// 创建work 对象 ++++++++++++++++++++
w = new Worker(firstTask);
final Thread t = w.thread;
if (t != null) {
final ReentrantLock mainLock = this.mainLock;
mainLock.lock();
try {
// Recheck while holding lock.
// Back out on ThreadFactory failure or if
// shut down before lock acquired.
int rs = runStateOf(ctl.get());
if (rs < SHUTDOWN ||
(rs == SHUTDOWN && firstTask == null)) {
if (t.isAlive()) // precheck that t is startable
throw new IllegalThreadStateException();
workers.add(w);
int s = workers.size();
if (s > largestPoolSize)
largestPoolSize = s;
workerAdded = true;
}
} finally {
mainLock.unlock();
}
if (workerAdded) {
// work对象的线程执行 ++++++++++++++++++++
t.start();
workerStarted = true;
}
}
} finally {
if (! workerStarted)
addWorkerFailed(w);
}
// 此处为死循环进行执行。 执行task或者去获取task进行执行 ++++++++++++++++++++
while (task != null || (task = getTask()) != null) {
w.lock();
// If pool is stopping, ensure thread is interrupted;
// if not, ensure thread is not interrupted. This
// requires a recheck in second case to deal with
// shutdownNow race while clearing interrupt
if ((runStateAtLeast(ctl.get(), STOP) ||
(Thread.interrupted() &&
runStateAtLeast(ctl.get(), STOP))) &&
!wt.isInterrupted())
wt.interrupt();
try {
beforeExecute(wt, task);
Throwable thrown = null;
try {
task.run();
} catch (RuntimeException x) {
thrown = x; throw x;
} catch (Error x) {
thrown = x; throw x;
} catch (Throwable x) {
thrown = x; throw new Error(x);
} finally {
afterExecute(task, thrown);
}
} finally {
task = null;
w.completedTasks++;
w.unlock();
}
}
try {
Runnable r = timed ?
workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
// 如果是核心线程,此处会阻塞去获取任务 ++++++++++++++++++++
workQueue.take();
if (r != null)
return r;
timedOut = true;
} catch (InterruptedException retry) {
timedOut = false;
}
为何会oom
从源码看,此处执行的线程会一直阻塞在队列处,进行获取任务的操作,所以该线程会一直存在jvm中。(这一块也正是线程池可以复用的核心逻辑)
因为jvm和系统的限制,创建的线程数量是有上限的,所以当线程数量达到峰值时,就会报OOM:unable to create new native thread.
测试
核心问题点,在与用线程池执行了一个任务,导线程池中的核心线程被启用,然后进入到了阻塞状态
例子1:当不调用线程池时,只创建,不会引起oom
public static void main(String[] args) throws InterruptedException {
AtomicInteger count = new AtomicInteger(0);
while (true) {
// 可以不进行休眠,否则jvm监控图的数据更新速度很慢
Thread.sleep(1);
System.err.println("创建一个线程池:" + count.addAndGet(1));
ThreadExecutor pool = new ThreadExecutor(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1024));
// pool.execute(() -> System.err.println("123"));
}
}
- jvm监控图
这种情况受到影响的只有堆空间,但不会造成OOM,因为并没有去启用这个线程,当堆使用率达到限制就会触发GC进行回收,不会造成内存泄漏,但是超高并发下,堆的使用会造成负荷。
例子2:当调用了线程池,启用了核心线程,就会造成OOM
public static void main(String[] args) throws InterruptedException {
AtomicInteger count = new AtomicInteger(0);
while (true) {
// 必须要休眠,否则来不及打开jvm监控图
Thread.sleep(1);
System.err.println("创建一个线程池:" + count.addAndGet(1));
ThreadExecutor pool = new ThreadExecutor(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(1024));
pool.execute(() -> System.err.println("123"));
}
}
- jvm监控图
当调用了线程池的execute方法后,线程池就会去创建核心线程(Work),死循环并阻塞在队列获取任务处,即使GC回收了对象,线程依然会继续阻塞在这里。
我的电脑最大可创建线程数大概为 4*1024 个线程,所以达到峰值后,将无法继续创建线程,这时就会报oom