Analyzing Exceptions in Thread Pool Worker Threads

Latest recommended article published on 2022-08-05 20:32:56

长大的小小

Category: juc. Tags: thread pool, thread pool worker threads, thread exception analysis

Article link: https://blog.csdn.net/qq_39722475/article/details/105620380

Problem Description

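I recently ran into a problem where a scheduled job stopped running. The technologies involved are Quartz, a semaphore, and a thread pool.

I analyzed the semaphore first and ruled it out.
I suspected that the Quartz version was too old, but I have not verified that yet.
That leaves the thread pool to analyze.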

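Guessing and searching Baidu as I went, I was also misled by an unreliable blog post... I came to believe that an exception in a worker thread had not been handled correctly and was blocking the whole job.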

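Forgive me: when I first wrote this code more than a year ago I only knew the basic usage, and I was still at the "we don't produce code, we just move it around" level.
Now that I have hit a problem, I am trying to analyze it from the source code.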

java.util.concurrent.AbstractExecutorService

public <T> List<Future<T>> invokeAll(Collection<? extends Callable<T>> tasks)
        throws InterruptedException {
        if (tasks == null)
            throw new NullPointerException();
        ArrayList<Future<T>> futures = new ArrayList<>(tasks.size());
        try {
            for (Callable<T> t : tasks) {
                RunnableFuture<T> f = newTaskFor(t);
                futures.add(f);
                execute(f);
            }
            for (int i = 0, size = futures.size(); i < size; i++) {
                Future<T> f = futures.get(i);
                if (!f.isDone()) {
                    try { f.get(); }
                    catch (CancellationException ignore) {}
                    catch (ExecutionException ignore) {}
                }
            }
            return futures;
        } catch (Throwable t) {
            cancelAll(futures);
            throw t;
        }
    }

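Why start here? Because I originally used a for loop to add future tasks and then a second loop calling future.get. Later I switched to the invokeAll method to make sure all worker threads finish. I had never actually clicked into this method before.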

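At first glance nothing looks special, but note the exception handling: if anything other than the two exceptions caught in the inner block occurs, for example an InterruptedException, the outer catch calls cancelAll, which cancels the tasks that have not finished yet.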

So which exception does not abort the run? Pay attention, this is the key point: the ExecutionException in the inner catch block. Let's follow Future's get method.

java.util.concurrent.FutureTask

public V get() throws InterruptedException, ExecutionException {
        int s = state;
        if (s <= COMPLETING)
            s = awaitDone(false, 0L);
        return report(s);
    }

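awaitDone waits for the task to finish and returns the task's final state; the task's return value or thrown exception is stored in the outcome field. report then produces the result according to that state.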

private V report(int s) throws ExecutionException {
        Object x = outcome;
        if (s == NORMAL)
            return (V)x;
        if (s >= CANCELLED)
            throw new CancellationException();
        throw new ExecutionException((Throwable)x);
    }

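Note that a worker thread's exception is wrapped in an ExecutionException. Combined with the invokeAll code above, this means an exception in one task does not interrupt the other tasks, and invokeAll returns normally.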
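To make that conclusion concrete, here is a minimal sketch, not the code from my project. The class name InvokeAllFailureDemo, the three tasks, and the "boom" message are all made up for illustration. It submits three tasks through invokeAll, lets the second one throw, and then reads each Future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; the names and tasks are assumptions for illustration.
class InvokeAllFailureDemo {

    // Runs three tasks through invokeAll; the second one throws.
    // Returns one line per task, in submission order.
    static List<String> runTasks() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            tasks.add(() -> "first ok");
            tasks.add(() -> { throw new IllegalStateException("boom"); });
            tasks.add(() -> "third ok");

            // invokeAll ignores ExecutionException internally, so it returns normally.
            List<Future<String>> futures = pool.invokeAll(tasks);

            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    // The task's own exception is available as the cause.
                    results.add("failed: " + e.getCause().getMessage());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        runTasks().forEach(System.out::println);
    }
}
```

The failing task only shows up when its own Future is read; the first and third tasks still return their values.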

The analysis above rules out worker-thread exceptions. So could the queue have run out of capacity?

ThreadPoolExecutor

public void execute(Runnable command) {
        if (command == null)
            throw new NullPointerException();
        /*
         * Proceed in 3 steps:
         *
         * 1. If fewer than corePoolSize threads are running, try to
         * start a new thread with the given command as its first
         * task.  The call to addWorker atomically checks runState and
         * workerCount, and so prevents false alarms that would add
         * threads when it shouldn't, by returning false.
         *
         * 2. If a task can be successfully queued, then we still need
         * to double-check whether we should have added a thread
         * (because existing ones died since last checking) or that
         * the pool shut down since entry into this method. So we
         * recheck state and if necessary roll back the enqueuing if
         * stopped, or start a new thread if there are none.
         *
         * 3. If we cannot queue task, then we try to add a new
         * thread.  If it fails, we know we are shut down or saturated
         * and so reject the task.
         */
        int c = ctl.get();
        if (workerCountOf(c) < corePoolSize) {
            if (addWorker(command, true))
                return;
            c = ctl.get();
        }
        if (isRunning(c) && workQueue.offer(command)) {
            int recheck = ctl.get();
            if (! isRunning(recheck) && remove(command))
                reject(command);
            else if (workerCountOf(recheck) == 0)
                addWorker(null, false);
        }
        else if (!addWorker(command, false))
            reject(command);
    }

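If the worker count is below the core pool size, a new thread is started (addWorker); otherwise the task is handed to workQueue. Only if the queue refuses the task does the pool try to add a non-core thread, and failing that it rejects the task.
I create the pool with newFixedThreadPool, which uses a LinkedBlockingQueue by default; the maximum capacity of this queue is 2^31-1, i.e. 2147483647.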
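The queue capacity can be checked directly with a small sketch. The class name FixedPoolQueueDemo and the pool size of 4 are assumptions for illustration, and the cast to ThreadPoolExecutor relies on how the JDK happens to implement newFixedThreadPool rather than on anything the API promises.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical demo class; the pool size 4 is an arbitrary choice.
class FixedPoolQueueDemo {

    // Returns the remaining capacity of the (still empty) work queue.
    static int remainingCapacityOfFixedPoolQueue() {
        ExecutorService service = Executors.newFixedThreadPool(4);
        try {
            // Assumption: the returned service is a plain ThreadPoolExecutor,
            // which is an implementation detail of the JDK.
            ThreadPoolExecutor pool = (ThreadPoolExecutor) service;
            BlockingQueue<Runnable> queue = pool.getQueue();
            return queue.remainingCapacity();
        } finally {
            service.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(remainingCapacityOfFixedPoolQueue());
        System.out.println(Integer.MAX_VALUE);
    }
}
```

Both lines print the same number, which matches the 2^31-1 figure above.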

So we need to look at the method that adds workers.
ThreadPoolExecutor

// Incomplete excerpt: only part of the method is shown
private boolean addWorker(Runnable firstTask, boolean core) {
        retry:
        for (int c = ctl.get();;) {
            // Check if queue empty only if necessary.
            if (runStateAtLeast(c, SHUTDOWN)
                && (runStateAtLeast(c, STOP)
                    || firstTask != null
                    || workQueue.isEmpty()))
                return false;

            for (;;) {
                if (workerCountOf(c)
                    >= ((core ? corePoolSize : maximumPoolSize) & COUNT_MASK))
                    return false;
                if (compareAndIncrementWorkerCount(c))
                    break retry;
                c = ctl.get();  // Re-read ctl
                if (runStateAtLeast(c, SHUTDOWN))
                    continue retry;
                // else CAS failed due to workerCount change; retry inner loop
            }
        }

& COUNT_MASK is a bitwise AND that caps the maximum value. What is that value? 2^29-1, i.e. 536870911.
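The arithmetic can be reproduced outside the JDK. In the sketch below the two constants are local copies written the way I understand ThreadPoolExecutor defines them (they are private there, so they cannot be read directly); CountMaskDemo and effectiveBound are made-up names.

```java
// Hypothetical demo class; the constants mirror private fields of
// ThreadPoolExecutor and are redefined here as an assumption.
class CountMaskDemo {

    // The low 29 bits of ctl hold the worker count; the high 3 bits hold the run state.
    static final int COUNT_BITS = Integer.SIZE - 3;
    static final int COUNT_MASK = (1 << COUNT_BITS) - 1;

    // Same masking as in addWorker: the configured size is ANDed with COUNT_MASK.
    static int effectiveBound(int configuredPoolSize) {
        return configuredPoolSize & COUNT_MASK;
    }

    public static void main(String[] args) {
        System.out.println(COUNT_MASK);                        // 536870911
        System.out.println(effectiveBound(8));                 // 8
        System.out.println(effectiveBound(Integer.MAX_VALUE)); // 536870911
    }
}
```

For any realistic pool size the mask changes nothing; it only matters for sizes above 2^29-1.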

According to the report, about 700 items were stuck before the restart. So it does not look like a queue-capacity problem.

Summary

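No real result so far, but someone offered a constructive idea: the database connections may have been exhausted, which would cause the blocking. More on that later.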

Addendum

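I used to wonder why the main method does not exit even after the thread pool has run out of tasks (the little red square in the IDE stays lit). The explanation is in the source: runWorker loops on getTask(), and an idle core thread blocks there waiting for the next task. The pool's threads are non-daemon by default, so the JVM stays alive until the pool is shut down.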

ThreadPoolExecutor

final void runWorker(Worker w) {
        Thread wt = Thread.currentThread();
        Runnable task = w.firstTask;
        w.firstTask = null;
        w.unlock(); // allow interrupts
        boolean completedAbruptly = true;
        try {
            while (task != null || (task = getTask()) != null) {
                w.lock();
                // If pool is stopping, ensure thread is interrupted;
                // if not, ensure thread is not interrupted.  This
                // requires a recheck in second case to deal with
                // shutdownNow race while clearing interrupt
                if ((runStateAtLeast(ctl.get(), STOP) ||
                     (Thread.interrupted() &&
                      runStateAtLeast(ctl.get(), STOP))) &&
                    !wt.isInterrupted())
                    wt.interrupt();
                try {
                    beforeExecute(wt, task);
                    try {
                        task.run();
                        afterExecute(task, null);
                    } catch (Throwable ex) {
                        afterExecute(task, ex);
                        throw ex;
                    }
                } finally {
                    task = null;
                    w.completedTasks++;
                    w.unlock();
                }
            }
            completedAbruptly = false;
        } finally {
            processWorkerExit(w, completedAbruptly);
        }
    }

I include this code to point out that ThreadPoolExecutor can be subclassed to add custom pre- and post-processing (beforeExecute and afterExecute).
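A minimal sketch of such a subclass follows. FailureRecordingPool, its failures list, and the demo method are hypothetical names invented for this example. One detail matters here: tasks submitted through submit or invokeAll are wrapped in a FutureTask, so their exception is stored in the future and afterExecute receives null; the hook has to call get() itself, following the pattern described in the afterExecute Javadoc.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical subclass for illustration; not part of the JDK or of my project.
class FailureRecordingPool extends ThreadPoolExecutor {

    // Descriptions of task failures seen in afterExecute (hypothetical field).
    final List<String> failures = new CopyOnWriteArrayList<>();

    FailureRecordingPool(int nThreads) {
        // Same shape as a fixed pool: core == max, unbounded queue.
        super(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        // Pre-processing hook: runs on the worker thread right before task.run().
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // For FutureTasks t is null; the real exception must be read via get().
        if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
            try {
                ((Future<?>) r).get();
            } catch (CancellationException ce) {
                t = ce;
            } catch (ExecutionException ee) {
                t = ee.getCause();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
        if (t != null) {
            failures.add(t.getClass().getSimpleName() + ": " + t.getMessage());
        }
    }

    // Submits one failing task and returns what afterExecute recorded.
    static List<String> demo() throws InterruptedException {
        FailureRecordingPool pool = new FailureRecordingPool(1);
        Callable<String> failing = () -> { throw new IllegalStateException("boom"); };
        pool.submit(failing);
        pool.shutdown();
        // Waiting for termination guarantees afterExecute has finished.
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return pool.failures;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```

Recording failures in a list is only one option; the same hook could write to a log instead.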

private void processWorkerExit(Worker w, boolean completedAbruptly) {
        if (completedAbruptly) // If abrupt, then workerCount wasn't adjusted
            decrementWorkerCount();

        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            completedTaskCount += w.completedTasks;
            workers.remove(w);
        } finally {
            mainLock.unlock();
        }

        tryTerminate();

        int c = ctl.get();
        if (runStateLessThan(c, STOP)) {
            if (!completedAbruptly) {
                int min = allowCoreThreadTimeOut ? 0 : corePoolSize;
                if (min == 0 && ! workQueue.isEmpty())
                    min = 1;
                if (workerCountOf(c) >= min)
                    return; // replacement not needed
            }
            addWorker(null, false);
        }
    }
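The question from the start of this addendum, why the program keeps running after the pool has no more tasks, can be observed with a rough sketch. IdleWorkerDemo and observeWorker are made-up names, and the 5-second timeouts are arbitrary. The sketch captures the pool's single worker thread, checks it while the pool is idle, and checks it again after shutdown.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; names and timeouts are assumptions for illustration.
class IdleWorkerDemo {

    // Returns {isDaemon, aliveWhileIdle, aliveAfterShutdown} for the single worker thread.
    static boolean[] observeWorker() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicReference<Thread> worker = new AtomicReference<>();

        // Run one task so that the pool creates its worker, and remember that thread.
        Runnable captureThread = () -> worker.set(Thread.currentThread());
        pool.submit(captureThread).get();

        Thread t = worker.get();
        boolean daemon = t.isDaemon();
        // The task is done, but the core thread is still alive, waiting for more work.
        boolean aliveWhileIdle = t.isAlive();

        // After shutdown the idle worker is released and exits.
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        t.join(5000);
        boolean aliveAfterShutdown = t.isAlive();

        return new boolean[] { daemon, aliveWhileIdle, aliveAfterShutdown };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(observeWorker())); // [false, true, false]
    }
}
```

A non-daemon thread that is still alive keeps the JVM running, so calling shutdown() once the work is done is what lets main exit.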