Blocking Thread Pool

Relevant documentation:

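Queuing
Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:
    If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
    If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
    If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case the task is rejected.
There are three general strategies for queuing:
    Direct handoffs (note: Executors.newCachedThreadPool creates threads on demand and never lets a task wait in a queue). A good default choice for a work queue is a SynchronousQueue, which hands tasks off to threads without otherwise holding them. Here, an attempt to queue a task fails if no thread is immediately available to run it, so a new thread is constructed. This policy avoids lockups when handling sets of requests that might have internal dependencies. Direct handoffs generally require an unbounded maximumPoolSize to avoid rejecting newly submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive, on average, faster than they can be processed.
    Unbounded queues (note: Executors.newFixedThreadPool has a fixed number of threads, but accepts tasks without blocking and places them in an unbounded wait queue). Using an unbounded queue (for example, a LinkedBlockingQueue without a predefined capacity) causes new tasks to wait in the queue whenever all corePoolSize threads are busy. Thus, no more than corePoolSize threads are ever created. (The value of maximumPoolSize therefore has no effect.) This may be appropriate when each task is completely independent of the others, so tasks cannot affect each other's execution; for example, in a web page server. While this style of queuing can be useful for smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive, on average, faster than they can be processed.
    Bounded queues (note: no Executors factory method offers this out of the box). A bounded queue (for example, an ArrayBlockingQueue) helps prevent resource exhaustion when used with a finite maximumPoolSize, but can be more difficult to tune and control. Queue sizes and maximum pool sizes may be traded off against each other: using large queues and small pools minimizes CPU usage, OS resources, and context-switching overhead, but can lead to artificially low throughput. If tasks frequently block (for example, if they are I/O bound), a system may be able to schedule time for more threads than you otherwise allow. Using small queues generally requires larger pool sizes, which keeps the CPUs busier but may incur unacceptable scheduling overhead, which also decreases throughput.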
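The three strategies correspond to three ways of constructing a ThreadPoolExecutor. The following is a minimal sketch, not a recommendation: the class name QueuingStrategies and the pool sizes are illustrative assumptions, while the first two factory bodies mirror what Executors.newCachedThreadPool and Executors.newFixedThreadPool build.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only.
final class QueuingStrategies {

    // Direct handoff: the queue holds nothing, so every task needs a thread right away.
    static ThreadPoolExecutor directHandoff() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }

    // Unbounded queue: at most `threads` workers, the backlog can grow without limit.
    static ThreadPoolExecutor unboundedQueue(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Bounded queue: both the backlog and the thread count are capped,
    // so a saturated pool rejects further tasks.
    static ThreadPoolExecutor boundedQueue(int core, int max, int capacity) {
        return new ThreadPoolExecutor(core, max, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(capacity));
    }

    public static void main(String[] args) {
        ThreadPoolExecutor[] pools = {
            directHandoff(), unboundedQueue(4), boundedQueue(2, 4, 10)
        };
        for (ThreadPoolExecutor pool : pools) {
            // remainingCapacity() shows how much backlog each queue type can still hold
            System.out.println(pool.getQueue().getClass().getSimpleName()
                    + " remainingCapacity=" + pool.getQueue().remainingCapacity());
            pool.shutdown();
        }
    }
}
```

Only the third variant bounds memory use on the producer side, and it does so by rejecting tasks rather than by making the producer wait.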

---------------------------------------------------

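As explained below, execute() accesses the blocking queue (even a bounded one) with offer(), not put(). A blocking thread pool therefore cannot be obtained simply by constructing a ThreadPoolExecutor with a bounded blocking queue.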

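The semaphore-controlled implementation is the recommended one. As its implementation details below show, the new class's execute() is also a thread-safe method. submit() calls execute() under the hood, so it does not need to be reimplemented.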

---------------------------------------------------

Creating a NotifyingBlockingThreadPoolExecutor

October 23, 2008

A thread pool is a useful tool for performing a collection of tasks in parallel. This becomes more and more relevant as CPUs introduce multi-core architectures that can benefit from parallelizing our programs. Java 5 introduced this framework as part of the new concurrency support, with the ThreadPoolExecutor class and other assisting classes. The ThreadPoolExecutor framework is powerful yet flexible, allowing user-specific configurations and providing relevant hooks and saturation strategies to deal with a full queue. To best follow this article, you may find it useful to open the ThreadPoolExecutor Java API (http://java.sun.com/javase/6/docs/api/java/util/concurrent/ThreadPoolExecutor.html) in a parallel tab.

The Need for a Blocking Thread Pool

Recently, my colleague Yaneeve Shekel had the need for a thread pool that would work on several tasks in parallel but would wait to add new tasks until a free thread was available to handle them. This is not something bizarre: in fact, this need is quite common. Yaneeve needed it to analyze a huge directory with a very long list of files, where there was no point in piling on more and more FileAnalyzeTask instances without a free thread to handle them. The analyze operation takes some time, while the speed at which we can pile up files for analysis is much higher. Thus, submitting tasks without regard to thread availability would create a huge queue and a possible memory problem, for no benefit.

Other cases in which you'd need a thread pool that can wait to add new tasks:

  • Doing some in-memory task on a long list of database records. You would not want to turn every record into a task in the ThreadPoolExecutor queue while the threads are busy with some long operation on previous records, as doing this would exhaust your memory. The right way to do it is to query the database, run over the result set and create enough tasks for a fixed-size queue, and then wait until there is room in the queue. You can use a cursor to represent the result set, but even if you get back a dynamic result set, the database will not reply with the entire bulk of records; it sends a limited number of records and updates your result set object as you move forward through it. When the queue is ready for more tasks, the producer reads the next records from the database.

  • Analyzing a long file with "independent lines": each line can be analyzed separately by a different thread. Again, there is no sense in reading the entire file into LineTask objects if there is no available thread to handle them. This scenario is in fact a true need raised in a forum (http://www.ibm.com/developerworks/forums/message.jspa?messageID=13785440) asking for a recommended solution.

The problem is that ThreadPoolExecutor doesn't give you the required behavior -- blocking when the queue is full -- out of the box. A feature request was even submitted to the Java Bug database (Bug Id 6648211, "Need for blocking ThreadPoolExecutor", http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6648211), but it was put on "very low priority," as the user is supposedly able to implement this behavior quite easily.

At first glance it looks odd; you would think that a ThreadPoolExecutor with a bounded BlockingQueue gives you exactly this behavior. But it does not. In fact, by default it throws RejectedExecutionException if a task is submitted and the queue is full. This happens because ThreadPoolExecutor.execute(Runnable) does not call the blocking method BlockingQueue.put(...) when queuing a task, but rather the non-blocking BlockingQueue.offer(...), which in effect has a timeout of 0 and means "try but do not wait." If the result is false (the offer failed) and the pool cannot add another thread, it calls the saturation policy -- the RejectedExecutionHandler assigned to this thread pool -- and the default handler throws an exception. Though this may seem to lack logic, it is in fact a design decision: it lets the user react to the fact that a task is rejected, rather than having the framework decide to wait or block.
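The rejection behavior is easy to reproduce. The following is a small sketch under stated assumptions: the class name RejectionDemo, the single worker thread, and the queue capacity of 1 are all chosen for illustration, and the lambdas assume Java 8 or later. The tasks wait on a latch so the queue stays full while further tasks are submitted.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, for illustration only.
final class RejectionDemo {

    // Submits `submissions` tasks to a pool with one thread and a queue of capacity 1,
    // and returns how many of them were rejected by the default saturation policy.
    static int countRejections(int submissions) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(1));
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < submissions; i++) {
                try {
                    // Each task blocks until the latch opens, keeping the worker busy.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // execute() used offer(), the queue was full, and no thread could be added
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejections(5));
    }
}
```

With one running task and one queued task, every further submission is rejected immediately instead of waiting for room in the queue.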

Suggested Solutions

There are several ways to allow blocking on a full queue:

  • We may implement our own BlockingThreadPoolExecutor and override the execute(...) method, so that it calls BlockingQueue.put(...) instead of BlockingQueue.offer(...). But this may not be so elegant, as we interfere quite brutally in how execute() works (and we cannot call super.execute(...) since we do the queuing ourselves).

  • There is the option to create a ThreadPoolExecutor with the CallerRunsPolicy reject strategy. In the case of a full queue, this strategy sends the exceeding task to be executed by the thread that called execute() (the producer), thus killing two birds with one stone: the task is handled, and the producer is busy handling the task rather than overloading the queue with additional tasks. There are, however, two flaws in this strategy. First, the task is not handled in the order it was produced; this is usually not so problematic, as there is no real guarantee on the order of context switches between the worker threads, which influences task progress and order anyway. Second, while the producer is working on its task, no one fills the queue. So if one or more of the worker threads finish their tasks while the producer is still working, they become idle. Minimizing this requires fine tuning of the queue size, but you can never guarantee to avoid the situation. It would have been nice if there were a way to set up the ThreadPoolExecutor in a true Leader-Followers manner (a design pattern in which the producer gets to run the task while a thread from the pool becomes the new producer), but the CallerRunsPolicy strategy does not work like that. (The C++ ACE framework, http://www.cs.wustl.edu/~schmidt/ACE.html, for example, implements the Leader-Followers pattern. For more details on the pattern, see this presentation: http://www.cs.uu.nl/docs/vakken/no/Leader%20Followers.pdf.)

  • One can implement a simple "counting" ThreadPoolExecutor that uses a Semaphore initialized to the bound that we want to set. The semaphore is decremented by calling acquire() in execute(...), and increased back by calling release() in the afterExecute() hook method, as well as in a catch at the end of execute(...) for the reject scenario. The semaphore acts in this way as a block on the call to execute(...), and you can in fact use an unbounded BlockingQueue in this case. This solution was suggested by Brian Goetz in a forum reply (http://www.ibm.com/developerworks/forums/message.jspa?messageID=13785440), and is also discussed in the book Java Concurrency in Practice, by Goetz et al. (http://www.amazon.com/Java-Concurrency-Practice-Brian-Goetz/dp/0321349601), in listing 8.4. Here is how it will look:

    
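    public class BlockingThreadPoolExecutor 
        extends ThreadPoolExecutor {
            
        // permits = maximum number of tasks queued or running at once
        private final Semaphore semaphore;
    
        public BlockingThreadPoolExecutor(..., int bound, ...) {
            super(...);
            this.semaphore = new Semaphore(bound);
        }
    
        @Override
        public void execute(Runnable task) {
            boolean acquired = false;
            boolean interrupted = false;
            do {
                try {
                    semaphore.acquire();
                    acquired = true;
                } catch (InterruptedException e) {
                    // wait forever, but remember that we were interrupted
                    interrupted = true;
                }                   
            } while(!acquired);
            if (interrupted) {
                // restore the interrupt status for the caller
                Thread.currentThread().interrupt();
            }
    
            try {
                super.execute(task);
            } catch(RuntimeException e) {
                // specifically, handle RejectedExecutionException  
                semaphore.release();
                throw e;
            } catch(Error e) {
                semaphore.release();
                throw e;
            }
        }
    
        @Override
        protected void afterExecute(Runnable r, Throwable t) {
            super.afterExecute(r, t);
            // the task is done: free its permit for the next execute()
            semaphore.release();
        }
    }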
        

    This is a nice solution. A nice adaptation may be to use tryAcquire(timeout, unit), as it is always a better practice to allow a timeout on blocking operations. But anyway, I personally don't like self-managing the blocking operation when the ThreadPoolExecutor may have its own bounded queue. It doesn't make sense to me. I prefer the following solution, which uses the blocking of the bounded queue together with the saturation policy.

  • The fourth solution is to create a ThreadPoolExecutor with a bounded queue and our own RejectedExecutionHandler that blocks on the queue, waiting for it to be ready to take new tasks. We prefer to wait on the queue with a timeout and to notify the user if the timeout occurs, so that we do not wait forever in case of some problem in pulling tasks from the queue. However, for most reasonable scenarios, the caller will not have to take any action when the queue is full, as the producer thread will just wait on the queue. I prefer this approach, as it seems the simplest one that follows the original design of ThreadPoolExecutor.

Which brings us to this code (see the Resources section to download it):


public class BlockingThreadPoolExecutor
    extends ThreadPoolExecutor {

    public BlockingThreadPoolExecutor(
        int poolSize,
        int queueSize,
        long keepAliveTime,
        TimeUnit keepAliveTimeUnit,
        long maxBlockingTime,
        TimeUnit maxBlockingTimeUnit,
        Callable<Boolean> blockingTimeCallback) {

        super(
                poolSize, // Core size
                poolSize, // Max size
                keepAliveTime,
                keepAliveTimeUnit,
                new ArrayBlockingQueue<Runnable>(
                    // to avoid redundant threads
                    Math.max(poolSize, queueSize)
                ), 
                // our own RejectedExecutionHandler (see below)
                new BlockThenRunPolicy(
                    maxBlockingTime,
                    maxBlockingTimeUnit,
                    blockingTimeCallback
                )
        );

        super.allowCoreThreadTimeOut(true);
    }

    @Override
    public void setRejectedExecutionHandler(RejectedExecutionHandler h) {
        // the blocking policy is what makes this pool blocking, so it may not be replaced
        throw new UnsupportedOperationException(
            "setRejectedExecutionHandler is not allowed on this class.");
    }

    // ...
}
    

This is our new blocking thread pool. But as you may see, the real thing is still missing: our own new RejectedExecutionHandler. In the constructor we pass parameters to our superclass, ThreadPoolExecutor. We use the full version of the constructor, since the most important parameter that we wish to pass to our base class is the RejectedExecutionHandler, which is the last parameter. We create a new object of the type BlockThenRunPolicy, our own class (presented in a moment). The name of this saturation policy says exactly what it does: if a task is rejected due to saturation, block on the task submission in the producer thread context, and when there is enough capacity to take the task, accept it. We implement the BlockThenRunPolicy class as a private inner class inside our BlockingThreadPoolExecutor, as no one else needs to know about it.


    // -------------------------------------------------- 
    // Inner private class of BlockingThreadPoolExecutor
    // A reject policy that waits on the queue
    // -------------------------------------------------- 
    private static class BlockThenRunPolicy
        implements RejectedExecutionHandler {

        private long blockTimeout;
        private TimeUnit blockTimeoutUnit;
        private Callable<Boolean> blockTimeoutCallback;

        // Straight-forward constructor
        public BlockThenRunPolicy(...){...}

        // --------------------------------------------------

        @Override
        public void rejectedExecution(
            Runnable task,
            ThreadPoolExecutor executor) {           

            BlockingQueue<Runnable> queue = executor.getQueue();
            boolean taskSent = false;

            while (!taskSent) {

                if (executor.isShutdown()) {
                    throw new RejectedExecutionException(
                        "ThreadPoolExecutor has shutdown "
                        + "while attempting to offer a new task.");
                }

                try {
                    // offer the task to the queue, for a blocking-timeout
                    if (queue.offer(task, blockTimeout, blockTimeoutUnit)) {
                        taskSent = true;
                    }
                    else {
                        // task was not accepted - call the user's Callback
                        Boolean result = null;
                        try {
                            result = blockTimeoutCallback.call();
                        }
                        catch(Exception e) {
                            // wrap the Callback exception and re-throw
                            throw new RejectedExecutionException(e);
                        }
                        // check the Callback result (null is treated as "stop")
                        if(!Boolean.TRUE.equals(result)) {
                            throw new RejectedExecutionException(
                              "User decided to stop waiting "
                              + "for task insertion");
                        }
                        else {
                            // user decided to keep waiting (may log it)
                            continue;
                        }
                    }
                }
                catch (InterruptedException e) {
                    // we need to go back to the offer call...
                }

            } // end of while; an InterruptedException loops back to the offer

        } // end of method rejectedExecution

        // --------------------------------------------------

    } // end of inner private class BlockThenRunPolicy
    
    

Note that we may get a timeout when waiting on the queue, on the call to queue.offer(...). It is always the right practice to use a timeout-enabled version of a blocking call, rather than any "wait-forever" version. This way it is easier to be aware of and troubleshoot cases of thread starvation and deadlocks. In this case, we do not log the event of getting the timeout, as we do not have a logger at hand. But still, this is a major event, especially if we set a long timeout that we do not expect to happen. This is why we ask the user to provide a callback so we can report the event and let the user decide whether to just log and keep waiting or stop the wait.
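To see the timeout-and-callback idea end to end, here is a compact, self-contained sketch. It is not the article's downloadable code: the names BlockOnFullQueueDemo, BlockingPolicy and runAll are hypothetical, the pool and queue sizes are arbitrary, and unlike the handler above it gives up on interruption instead of retrying. The lambdas assume Java 8 or later.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, for illustration only.
final class BlockOnFullQueueDemo {

    // Saturation policy that waits for queue space and consults a callback on each timeout.
    static final class BlockingPolicy implements RejectedExecutionHandler {
        private final long timeout;
        private final TimeUnit unit;
        private final Callable<Boolean> keepWaiting;

        BlockingPolicy(long timeout, TimeUnit unit, Callable<Boolean> keepWaiting) {
            this.timeout = timeout;
            this.unit = unit;
            this.keepWaiting = keepWaiting;
        }

        @Override
        public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
            try {
                while (!executor.isShutdown()) {
                    if (executor.getQueue().offer(task, timeout, unit)) {
                        return; // the queue accepted the task
                    }
                    // Timed out: let the caller decide whether to keep waiting.
                    if (!Boolean.TRUE.equals(keepWaiting.call())) {
                        throw new RejectedExecutionException("Gave up waiting for queue space");
                    }
                }
                throw new RejectedExecutionException("Executor was shut down");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // assumption: treat interruption as a stop request
                throw new RejectedExecutionException("Interrupted while waiting", e);
            } catch (RejectedExecutionException e) {
                throw e;
            } catch (Exception e) {
                throw new RejectedExecutionException("Timeout callback failed", e);
            }
        }
    }

    // Pushes taskCount short tasks through a small pool; returns how many completed.
    static int runAll(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(2),
                new BlockingPolicy(100, TimeUnit.MILLISECONDS, () -> true));
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(5); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // tasks already queued still run
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runAll(20));
    }
}
```

The callback here always answers "keep waiting"; a real one would typically log the timeout first. Note that the handler places the task on the queue directly, which works in this sketch because the two core threads stay alive to drain it.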

Our solution preserves the default behavior of ThreadPoolExecutor, except for the saturation policy. Since we use inheritance, any setter or getter of the original ThreadPoolExecutor can be used, excluding setRejectedExecutionHandler, which we forbid by throwing an exception if it is called. Prometheus (http://prometheus.codehaus.org), another open source approach to the blocking thread pool problem, used a wrapper solution as a straightforward approach (see its API at http://prometheus.codehaus.org/javadoc/main/index.html). However, the wrapper solution requires implementing all ExecutorService interface methods -- in order to be a common ExecutorService -- resulting in a quite cumbersome solution compared to our more organic extension.

Almost Done

We have a BlockingThreadPoolExecutor. But bear with me for a few more moments, as we are about to ask for more.

Remember our problem. We have a huge directory filled with files and we wanted to block on the queue if it is full. But we need something more. When all files are sent to the queue, the producer thread knows it is done sending all the files, but it still needs to wait for the worker threads to finish. And we do not want to shut down the thread pool and wait for it to finish that way, as we are going to use it in a few moments again. What we need is a way to wait for the final tasks sent to the thread pool to complete.

To do that we add a "synchronizer" object for the producer to wait on. The producer waits on a new method we create, which we call await(), but underneath there is a condition that waits for a signal, and this is our Synchronizer. The thread pool signals the Synchronizer when it is idle; that is, when all worker threads are idle. To have this information we simply count the number of tasks currently in process. We do not rely on the getActiveCount() method, as its contract and definition are not clear enough; we prefer to do it ourselves using an AtomicInteger, to make sure that increment and decrement operations are done atomically, without a need to synchronize around ++ or --.

Here we use the beforeExecute() and afterExecute() hook methods, but must take care of tasks that failed at the execute point, before assuming a position in the queue, in which case the counter must be decreased. Our Synchronizer class manages the blocking wait in the await() method, by waiting on a Condition that is signaled only when there are no tasks in process.

The resulting code is this:


public class NotifyingThreadPoolExecutor
    extends ThreadPoolExecutor {

    private final AtomicInteger tasksInProcess = new AtomicInteger();
    // using our own private inner class, see below
    private final Synchronizer synchronizer = new Synchronizer();

    // (constructors are not shown here)

    @Override
    public void execute(Runnable task) {
        // count a new task in process
        tasksInProcess.incrementAndGet();

        try {
            super.execute(task);
        } catch(RuntimeException e) {
            // specifically, handle RejectedExecutionException  
            tasksInProcess.decrementAndGet();
            throw e;
        } catch(Error e) {
            tasksInProcess.decrementAndGet();
            throw e;
        }
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {

        super.afterExecute(r, t);

        // synchronizing on the pool (and all its threads)
        // we need the synchronization to avoid more than one signal
        // if two or more threads decrement almost together and come
        // to the if with 0 tasks together
        synchronized(this) {
            tasksInProcess.decrementAndGet();
            if (tasksInProcess.intValue() == 0) {
                synchronizer.signalAll();
            }
        }
    }

    public void await() throws InterruptedException {
        synchronizer.await();
    }

    // (there is also an await with timeout, see the full source code) 
    
}

    

We now need to provide the Synchronizer class that does the actual locking and synchronization work. We prefer to implement the Synchronizer class as a private inner class inside our NotifyingThreadPoolExecutor, as no one else needs to know about it.



    //--------------------------------------------------------------
    // Inner private class of NotifyingThreadPoolExecutor
    // for signaling when queue is idle
    //--------------------------------------------------------------
    private class Synchronizer {

        private final Lock lock = new ReentrantLock();
        private final Condition done = lock.newCondition();
        private boolean isDone = false;

        // called from the containing class NotifyingThreadPoolExecutor
        private void signalAll() {

            lock.lock(); // MUST lock!
            try {
                isDone = true;
                done.signalAll();
            }
            finally {
                lock.unlock(); // unlock even in case of an exception
            }
        }

        public void await() throws InterruptedException {

            lock.lock(); // MUST lock!
            try {
                while (!isDone) { // loop guards against spurious wake-ups
                    done.await();
                }
            }
            finally {
                isDone = false; // for next call to await
                lock.unlock();  // unlock even in case of an exception
            }
        }
        // (there is also an await with timeout, see the full source code) 

    } // end of private inner class Synchronizer

    //--------------------------------------------------------------

    

As we needed both the notifying and the blocking features together, we combined them into a NotifyingBlockingThreadPoolExecutor, whose code and an example of use can be found in the example source code.
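The "wait until idle, then reuse the pool" usage can be sketched in a self-contained way. This is not the downloadable NotifyingBlockingThreadPoolExecutor: the class IdleAwarePool and its methods are hypothetical, and as a deliberate variation it waits directly on the task counter being zero, under one lock, rather than on a separate isDone flag. It assumes Java 8 or later.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical class, for illustration only.
final class IdleAwarePool extends ThreadPoolExecutor {

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition idle = lock.newCondition();
    private int tasksInProcess; // queued or running tasks; guarded by lock

    IdleAwarePool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    public void execute(Runnable task) {
        lock.lock();
        try {
            tasksInProcess++; // count the task before handing it over
        } finally {
            lock.unlock();
        }
        try {
            super.execute(task);
        } catch (RuntimeException | Error e) {
            taskFinished(); // the task never entered the pool
            throw e;
        }
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        taskFinished();
    }

    private void taskFinished() {
        lock.lock();
        try {
            if (--tasksInProcess == 0) {
                idle.signalAll(); // wake every producer waiting for idleness
            }
        } finally {
            lock.unlock();
        }
    }

    // Blocks until no task is queued or running; the pool stays usable afterwards.
    void awaitIdle() throws InterruptedException {
        lock.lock();
        try {
            while (tasksInProcess > 0) { // loop guards against spurious wake-ups
                idle.await();
            }
        } finally {
            lock.unlock();
        }
    }

    // Runs several batches on one pool, waiting for each batch to finish before the next.
    static int runBatches(int batches, int tasksPerBatch) {
        IdleAwarePool pool = new IdleAwarePool(4);
        AtomicInteger done = new AtomicInteger();
        try {
            for (int b = 0; b < batches; b++) {
                for (int i = 0; i < tasksPerBatch; i++) {
                    pool.execute(done::incrementAndGet);
                }
                pool.awaitIdle(); // every task of this batch has completed here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks completed: " + runBatches(3, 10));
    }
}
```

Waiting on the counter itself means a brief idle moment in the middle of a batch cannot leave a stale "done" state behind for a later await call.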

Conclusions

Occasionally there is a need to accomplish something that is not supported out of the box in a framework at hand. The first thought is usually: "it ought to be there, I must have missed something!" Then, after a while, after we search and investigate, we realize that the thing is indeed missing. At this point we are close to convincing ourselves that there is a reason for not having this ability, and we probably don't really need it. ("There must be a reason for not having it there. Who are we to argue?!") But the bravest of us would not compromise. Good frameworks are built to be extended, and so be it.

In this article we presented the need we faced for a blocking thread pool. This need is not a whim; other people, as can be seen in the resources list, have already raised it. It is a bit surprising that the Java API does not provide this ability, but as seen, there are a couple of good extensions that can be implemented to support it. While implementing this feature, we went through the difference between offer() and put() on a BlockingQueue, the rejection policy of the thread pool framework, the beforeExecute() and afterExecute() hook methods, and some other related java.util.concurrent players, such as Lock, Condition, and AtomicInteger.

The implementation presented here, together with the sample code provided, may serve both as a good solution for the blocking thread pool need and as a reference for other related ThreadPoolExecutor extensions.

Acknowledgments

I would like to thank Yaneeve Shekel for bringing this problem to me and working with me on parts of the code presented here.

Resources


Amir Kirsh serves as the chief programmer of Comverse and as a staff member at the Academic College of Tel-Aviv Yaffo.
