A Simple Analysis of How Thread Pools Work

sutonline

Last modified: 2023-08-17 17:51:11

First published: 2021-09-08 22:41:42

Article link: https://blog.csdn.net/kang389110772/article/details/120190693

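I remember being asked in an interview four years ago why one should use a thread pool. I gave the simple answer: a thread pool avoids creating threads over and over, and creating a thread is a relatively expensive operation. But how does a thread pool actually work? I never looked into that question, so here is a summary.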

Overall structure

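Whether you create a pool through Executors or build one yourself with a ThreadFactory from Guava (ThreadFactoryBuilder), what you get in the common cases is a ThreadPoolExecutor, so that class is the target of this analysis.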
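As a quick check of that claim, here is a minimal sketch. It relies on an implementation detail: on the OpenJDK builds I am aware of, Executors.newFixedThreadPool returns a plain ThreadPoolExecutor, although the Javadoc only promises an ExecutorService. The class name ExecutorsDemo is made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

class ExecutorsDemo {
    /** Reports whether the pool handed back by Executors is a ThreadPoolExecutor. */
    static boolean fixedPoolIsThreadPoolExecutor() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Implementation detail of OpenJDK, not a documented guarantee.
            return pool instanceof ThreadPoolExecutor;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fixedPoolIsThreadPoolExecutor());
    }
}
```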

The ctl variable

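First we have to mention a somewhat magical variable, ctl. For our purposes it is enough to know that it packs two pieces of information into a single int: the pool's run state in the high 3 bits and the number of workers in the remaining low 29 bits.

[Figure: bit layout of ctl, run state in the high 3 bits and worker count in the low 29 bits]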
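The packing can be sketched in a few lines. The constants and helper names below mirror the private ones in the JDK 8 ThreadPoolExecutor source; the wrapper class CtlDemo is only for illustration.

```java
class CtlDemo {
    // Mirrors the private constants in JDK 8's ThreadPoolExecutor.
    static final int COUNT_BITS = Integer.SIZE - 3;      // 29 bits for the worker count
    static final int CAPACITY = (1 << COUNT_BITS) - 1;   // mask with the low 29 bits set
    static final int RUNNING = -1 << COUNT_BITS;         // run states live in the high 3 bits
    static final int SHUTDOWN = 0 << COUNT_BITS;
    static final int STOP = 1 << COUNT_BITS;

    /** Packs a run state and a worker count into one int. */
    static int ctlOf(int runState, int workerCount) { return runState | workerCount; }

    /** Extracts the run state (high 3 bits). */
    static int runStateOf(int c) { return c & ~CAPACITY; }

    /** Extracts the worker count (low 29 bits). */
    static int workerCountOf(int c) { return c & CAPACITY; }

    public static void main(String[] args) {
        int c = ctlOf(RUNNING, 5);
        System.out.println(workerCountOf(c));            // 5
        System.out.println(runStateOf(c) == RUNNING);    // true
    }
}
```

Because RUNNING is negative and the other states are not, a comparison such as rs >= SHUTDOWN is enough to tell "still running" from "shutting down or beyond".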

The core worker set

private final HashSet<Worker> workers = new HashSet<Worker>();

This HashSet of Worker objects is the set of our worker threads. Let's look at the Worker code:

private final class Worker
        extends AbstractQueuedSynchronizer
        implements Runnable {

    final Thread thread;
    /** Initial task to run.  Possibly null. */
    Runnable firstTask;

    Worker(Runnable firstTask) {
        setState(-1); // inhibit interrupts until runWorker
        this.firstTask = firstTask;
        // The worker passes itself as the Runnable of its own thread
        this.thread = getThreadFactory().newThread(this);
    }

    // remaining members omitted
}

A Worker mainly holds two things: a Thread and a firstTask. Worker also implements Runnable, and the Runnable it passes to the Thread is itself.

Now let's look at Thread's code. After new, the constructor mainly calls the init method:

private void init(ThreadGroup g, Runnable target, String name,
                      long stackSize, AccessControlContext acc,
                      boolean inheritThreadLocals) {
        if (name == null) {
            throw new NullPointerException("name cannot be null");
        }

        this.name = name;

        ...
}

But no thread is actually allocated here; the constructor only creates a plain Java object.
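One way to observe this from the outside is Thread.getState(): a freshly constructed Thread reports NEW until start() is called. A minimal sketch (the class name ThreadStateDemo is made up for this example):

```java
class ThreadStateDemo {
    /** Returns the thread's state before start() and after it has finished. */
    static Thread.State[] statesBeforeAndAfter() {
        Thread t = new Thread(() -> { });
        Thread.State before = t.getState();   // only a Java object so far
        t.start();                            // this is where the OS thread is requested
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new Thread.State[] { before, t.getState() };
    }

    public static void main(String[] args) {
        Thread.State[] states = statesBeforeAndAfter();
        System.out.println(states[0] + " -> " + states[1]);
    }
}
```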

The task queue

private final BlockingQueue<Runnable> workQueue;

A thread-safe queue; there is not much more to say about it here.

Let's see how a worker thread gets started and how it is kept alive.

Starting up

From java.util.concurrent.ThreadPoolExecutor#addWorker we can see that once the worker has been added successfully, the thread inside the Worker is started:

if (workerAdded) {
   t.start();
   workerStarted = true;
}

So let's look at Thread.start.

try {
    start0();
    started = true;
} finally {
...

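Here we see a call to the native method start0. Following it into the JVM's native source (jvm.cpp in HotSpot) shows that it asks the operating system to create a thread, which then calls Thread.run. (I have not pasted that code because, honestly, I could not fully follow it.)

Now the key question: what happens once the Thread has finished running this task?

The answer is that it waits for the next task.

The relevant source is java.util.concurrent.ThreadPoolExecutor#runWorker: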

final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    ...
    boolean completedAbruptly = true;
    try {
        while (task != null || (task = getTask()) != null) {
            ...
            try {
                ...
                try {
                    task.run();
                } catch (RuntimeException x) {
                    thrown = x; throw x;
                } catch (Error x) {
                    thrown = x; throw x;
                } catch (Throwable x) {
                    thrown = x; throw new Error(x);
                } finally {
                    afterExecute(task, thrown);
                }
            } finally {
                task = null;
                ...
            }
        }
        completedAbruptly = false;
    } finally {
        processWorkerExit(w, completedAbruptly);
    }
}

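After a task finishes, the loop goes back to task = getTask() and waits there. So the thread is in fact still running: it has merely switched from executing a task to waiting for one, and it has not exited.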
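The same idea can be reduced to a toy worker: one long-lived thread that loops on a blocking queue. This is only a sketch of the principle, not the JDK's implementation; the class TinyWorkerDemo and the thread name are invented for illustration.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class TinyWorkerDemo {
    /** Runs the given number of tasks on one worker and returns how many distinct threads ran them. */
    static int distinctThreadsUsed(int tasks) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        Set<String> names = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Runnable task = queue.take();   // idle worker blocks here, it does not exit
                    task.run();
                }
            } catch (InterruptedException e) {
                // interrupted while waiting: leave the loop and let the thread end
            }
        }, "tiny-worker");
        worker.start();

        for (int i = 0; i < tasks; i++) {
            queue.add(() -> {
                names.add(Thread.currentThread().getName());
                done.countDown();
            });
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            worker.interrupt();   // stop the worker once we are finished
        }
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctThreadsUsed(5));
    }
}
```

All tasks are executed by the same thread, which is exactly the saving the interview answer at the top was about.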

Now let's look at the getTask method.

for (;;) {
    int c = ctl.get();
    int rs = runStateOf(c);

    // Check if queue empty only if necessary.
    if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
        decrementWorkerCount();
        return null;
    }

    int wc = workerCountOf(c);

    // Are workers subject to culling?
    boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

    if ((wc > maximumPoolSize || (timed && timedOut))
        && (wc > 1 || workQueue.isEmpty())) {
        if (compareAndDecrementWorkerCount(c))
            return null;
        continue;
    }

    try {
        Runnable r = timed ?
            workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
            workQueue.take();
        if (r != null)
            return r;
        timedOut = true;
    } catch (InterruptedException retry) {
        timedOut = false;
    }
}

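From the code we can see:

It first checks the pool state: if the pool is in STOP or beyond, or in SHUTDOWN with an empty queue, it decrements the worker count and returns null.
Next it computes timed, that is, whether this worker may be culled: either core threads are allowed to time out (allowCoreThreadTimeOut, false by default) or the current worker count is greater than corePoolSize.
If timed is true, it polls the queue with keepAliveTime as the timeout. If the poll times out, timedOut is set and the next loop iteration normally returns null, so the worker exits and its thread ends. If timed is false, it blocks in take() until a task arrives.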
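The timeout path can be observed with a small experiment. The sketch below assumes a 500 ms keep-alive and enables allowCoreThreadTimeOut so that timed is true even for the single core worker; the class name KeepAliveDemo and the chosen numbers are illustrative only.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class KeepAliveDemo {
    /** Returns {pool size right after a task ran, pool size after the idle worker timed out}. */
    static int[] poolSizes() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 500, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true);   // core workers now poll with keepAliveTime too
        CountDownLatch ran = new CountDownLatch(1);
        try {
            pool.execute(ran::countDown);
            ran.await();
            int afterTask = pool.getPoolSize();   // the worker is still alive, waiting in getTask
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(10);
            while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(50);                 // wait for the poll to time out
            }
            return new int[] { afterTask, pool.getPoolSize() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] sizes = poolSizes();
        System.out.println(sizes[0] + " -> " + sizes[1]);
    }
}
```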

How do tasks get in?

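This part comes from a production issue. Our RPC calls let us configure the thread pool, and to reduce response time I made the pool's queue very small, fixed at 8. As a result we frequently hit thread pool overflow (rejected tasks), yet the number of active threads was low, so running out of threads should not have been possible.

Intuitively, a task that comes in is handed straight to a worker thread. But is that really what happens?
Let's look at the source once more.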

public void execute(Runnable command) {
        if (command == null)
            throw new NullPointerException();
        /*
         * Proceed in 3 steps:
         *
         * 1. If fewer than corePoolSize threads are running, try to
         * start a new thread with the given command as its first
         * task.  The call to addWorker atomically checks runState and
         * workerCount, and so prevents false alarms that would add
         * threads when it shouldn't, by returning false.
         *
         * 2. If a task can be successfully queued, then we still need
         * to double-check whether we should have added a thread
         * (because existing ones died since last checking) or that
         * the pool shut down since entry into this method. So we
         * recheck state and if necessary roll back the enqueuing if
         * stopped, or start a new thread if there are none.
         *
         * 3. If we cannot queue task, then we try to add a new
         * thread.  If it fails, we know we are shut down or saturated
         * and so reject the task.
         */
        int c = ctl.get();
        if (workerCountOf(c) < corePoolSize) {
            if (addWorker(command, true))
                return;
            c = ctl.get();
        }
        if (isRunning(c) && workQueue.offer(command)) {
            int recheck = ctl.get();
            if (! isRunning(recheck) && remove(command))
                reject(command);
            else if (workerCountOf(recheck) == 0)
                addWorker(null, false);
        }
        else if (!addWorker(command, false))
            reject(command);
    }
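The three steps described in the comment above can be watched with a deliberately tiny pool. This sketch assumes corePoolSize = 1, maximumPoolSize = 2 and a bounded queue of capacity 1, and blocks every task on a latch so nothing finishes while we submit; the class name ExecuteOrderDemo is invented for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecuteOrderDemo {
    /** Submits four blocking tasks and records where each one ended up. */
    static String trace() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();                      // keep the worker busy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        StringBuilder sb = new StringBuilder();
        try {
            for (int i = 1; i <= 4; i++) {
                try {
                    pool.execute(blocker);
                    sb.append("threads=").append(pool.getPoolSize())
                      .append(",queued=").append(pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    sb.append("rejected");            // default AbortPolicy
                }
                if (i < 4) sb.append(" | ");
            }
        } finally {
            release.countDown();                      // let everything finish
            pool.shutdown();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // threads=1,queued=0 | threads=1,queued=1 | threads=2,queued=1 | rejected
        System.out.println(trace());
    }
}
```

The second task is queued even though the pool is allowed a second thread; that thread is only created for the third task, once the queue is full.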

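So the intuition is wrong: once corePoolSize workers exist, a new task goes to the queue first, a thread beyond the core size is created only when the queue refuses the task, and the task is rejected only when that fails as well. Is there then a setup that matches our intuition, where a task runs as soon as an idle thread is available? There is. Reading our RPC framework's code, the key piece is a "virtual" queue: SynchronousQueue.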
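To see what a SynchronousQueue changes, here is a sketch of a pool configured with corePoolSize = 0 and maximumPoolSize = 2 (numbers chosen only for the demonstration; Executors.newCachedThreadPool uses the same queue type with an effectively unbounded maximum). The class name DirectHandoffPoolDemo is made up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class DirectHandoffPoolDemo {
    /** Submits three blocking tasks to a pool backed by a SynchronousQueue. */
    static String trace() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, 2, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();                      // keep the worker busy
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        StringBuilder sb = new StringBuilder();
        try {
            for (int i = 1; i <= 3; i++) {
                try {
                    pool.execute(blocker);
                    sb.append("threads=").append(pool.getPoolSize())
                      .append(",queued=").append(pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    sb.append("rejected");            // no idle worker and no room for a new one
                }
                if (i < 3) sb.append(" | ");
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // threads=1,queued=0 | threads=2,queued=0 | rejected
        System.out.println(trace());
    }
}
```

Nothing is ever buffered: a task is either taken by a worker that is already waiting, gets a new thread, or is rejected.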

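Let's look at this queue's put method. (execute itself calls offer, which goes through the same transfer call with timed = true and nanos = 0.)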

public void put(E e) throws InterruptedException {
    if (e == null) throw new NullPointerException();
    if (transferer.transfer(e, false, 0) == null) {
        Thread.interrupted();
        throw new InterruptedException();
    }
}

If transfer returns null, put clears the interrupt flag and throws InterruptedException. Otherwise the hand-off succeeded.
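The hand-off behaviour is easy to try on the queue by itself. In this sketch the non-blocking offer fails when no consumer is waiting, while a timed offer succeeds once a consumer thread calls take(); the class name SynchronousQueueDemo and the 5 second timeout are arbitrary choices for the example.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

class SynchronousQueueDemo {
    /** With nobody waiting in take() or poll(), offer() has nowhere to put the element. */
    static boolean offerWithoutConsumer() {
        return new SynchronousQueue<String>().offer("task");
    }

    /** A timed offer waits for a consumer and succeeds when one takes the element. */
    static boolean offerWithConsumer() {
        SynchronousQueue<String> queue = new SynchronousQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                queue.take();                         // pairs up with the producer's offer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);                     // never keeps the JVM alive
        consumer.start();
        try {
            boolean handedOff = queue.offer("task", 5, TimeUnit.SECONDS);
            consumer.join(5000);
            return handedOff;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(offerWithoutConsumer());   // false
        System.out.println(offerWithConsumer());
    }
}
```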

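Next, let's look at how the transferer implements transfer. The version shown here is TransferQueue, which is used in fair mode; the default non-fair mode uses TransferStack, which works on the same principle.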

E transfer(E e, boolean timed, long nanos) {
    /* Basic algorithm is to loop trying to take either of
     * two actions:
     *
     * 1. If queue apparently empty or holding same-mode nodes,
     *    try to add node to queue of waiters, wait to be
     *    fulfilled (or cancelled) and return matching item.
     *
     * 2. If queue apparently contains waiting items, and this
     *    call is of complementary mode, try to fulfill by CAS'ing
     *    item field of waiting node and dequeuing it, and then
     *    returning matching item.
     *
     * In each case, along the way, check for and try to help
     * advance head and tail on behalf of other stalled/slow
     * threads.
     *
     * The loop starts off with a null check guarding against
     * seeing uninitialized head or tail values. This never
     * happens in current SynchronousQueue, but could if
     * callers held non-volatile/final ref to the
     * transferer. The check is here anyway because it places
     * null checks at top of loop, which is usually faster
     * than having them implicitly interspersed.
     */

    QNode s = null; // constructed/reused as needed
    boolean isData = (e != null);

    for (;;) {
        QNode t = tail;
        QNode h = head;
        // If head or tail is null the queue is not fully initialized yet, so spin
        if (t == null || h == null)         // saw uninitialized value
            continue;                       // spin

        // The queue is empty (head == tail) or the tail node has the same mode as this call
        if (h == t || t.isData == isData) { // empty or same-mode
            // If tail.next is already set, the tail is lagging: CAS the tail forward so that tail.next is null again
            QNode tn = t.next;
            if (t != tail)                  // inconsistent read
                continue;
            if (tn != null) {               // lagging tail
                advanceTail(t, tn);
                continue;
            }
            // Timed call with no time left to wait (as in offer): return null
            if (timed && nanos <= 0)        // can't wait
                return null;
            // Wrap our element in a QNode and CAS it in as t.next
            if (s == null)
                s = new QNode(e, isData);
            if (!t.casNext(null, s))        // failed to link in
                continue;

            // Make our node the new tail
            advanceTail(t, s);              // swing tail and wait
            // Wait until e is matched (a producer paired with a consumer)
            Object x = awaitFulfill(s, e, timed, nanos);
            // If s was cancelled, clean it up
            if (x == s) {                   // wait was cancelled
                clean(t, s);
                return null;
            }

            // If s is still linked, advance the head and clear s.waiter
            if (!s.isOffList()) {           // not already unlinked
                advanceHead(t, s);          // unlink if head
                if (x != null)              // and forget fields
                    s.item = s;
                s.waiter = null;
            }

            // Return the matched item
            return (x != null) ? (E)x : e;
        } else {
            // complementary mode: the queue holds waiting nodes of the opposite type
            QNode m = h.next;               // node to fulfill
            if (t != tail || m == null || h != head)
                continue;                   // inconsistent read

            Object x = m.item;
            if (isData == (x != null) ||    // m already fulfilled
                x == m ||                   // m cancelled
                !m.casItem(x, e)) {         // lost CAS
                advanceHead(h, m);          // dequeue and retry
                continue;
            }

            // Advance the head past m
            advanceHead(h, m);              // successfully fulfilled
            // Wake up the thread waiting on m
            LockSupport.unpark(m.waiter);
            // Return the item
            return (x != null) ? (E)x : e;
        }
    }
}

awaitFulfill is the method that waits for a match.

Object awaitFulfill(QNode s, E e, boolean timed, long nanos) {
    /* Same idea as TransferStack.awaitFulfill */
    final long deadline = timed ? System.nanoTime() + nanos : 0L;
    // Get the current thread
    Thread w = Thread.currentThread();
    int spins = ((head.next == s) ?
                 (timed ? maxTimedSpins : maxUntimedSpins) : 0);
    for (;;) {
        // Cancel on interrupt or timeout; otherwise keep spinning or parking until matched
        if (w.isInterrupted())
            s.tryCancel(e);
        Object x = s.item;
        // s.item no longer equals e: the node was either matched or cancelled (item == s), so return x
        if (x != e)
            return x;
        if (timed) {
            nanos = deadline - System.nanoTime();
            if (nanos <= 0L) {
                s.tryCancel(e);
                continue;
            }
        }
        if (spins > 0)
            --spins;
        // No waiter recorded yet: register the current thread as s.waiter
        else if (s.waiter == null)
            s.waiter = w;
        else if (!timed)
            LockSupport.park(this);
        else if (nanos > spinForTimeoutThreshold)
            LockSupport.parkNanos(this, nanos);
    }
}
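The core of awaitFulfill is a familiar pattern: check a condition in a loop and park until another thread changes it and calls unpark. A stripped-down sketch of that pattern, without the spinning, timeout and cancellation handling of the real method (ParkDemo and its fields are invented names):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

class ParkDemo {
    /** Hands a non-null value to a waiting thread and returns what that thread saw. */
    static String handOff(String value) {
        AtomicReference<String> slot = new AtomicReference<>();
        AtomicReference<String> received = new AtomicReference<>();

        Thread waiter = new Thread(() -> {
            // Re-check in a loop: park() may return without a matching unpark().
            while (slot.get() == null) {
                LockSupport.park(slot);
            }
            received.set(slot.get());
        });
        waiter.start();

        slot.set(value);                 // fulfil the waiter's condition first...
        LockSupport.unpark(waiter);      // ...then wake it (harmless if it never parked)
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println(handOff("task"));
    }
}
```

Setting the value before calling unpark matters: if the order were reversed, the waiter could wake up, see the slot still empty, and park again with no one left to wake it.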