Work Stealing

Before talking about work stealing, we need to talk about task management. Take Tomcat as an example: Tomcat uses multiple threads to handle concurrent connections, so the number of threads can grow very large when many connections are open at the same time.

However, a computer has a fixed number of cores, and the number of threads that can run at the same time is at most the number of (logical) cores. All other threads have to wait in the waiting or ready state. When the running thread starts a blocking IO call, or its time slice runs out, the OS schedules a ready thread to take over the CPU. These context switches are performed by the OS and take time. Too many threads also consume more system resources, such as the memory used for each thread's data structures and stack, and they hurt cache locality.

Ideally, the number of threads equals the number of cores, and each thread runs on its own core without ever being scheduled out. In practice this rarely holds: when a thread invokes a blocking IO API, the OS puts it into the waiting state and schedules another ready thread onto the CPU. To avoid this, many DB drivers are being rewritten to support nonblocking APIs, including the drivers for MongoDB, Redis, and Cassandra. The MySQL DB driver, however, does not support nonblocking IO.
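
How far above the core count a pool should go depends on how long its threads spend blocked. A commonly cited rule of thumb (from Goetz et al., Java Concurrency in Practice) estimates it as follows, where N_cpu is the number of cores, U_cpu is the target CPU utilization between 0 and 1, and W/C is the ratio of time spent waiting to time spent computing. Treat it as a rough starting point, not a precise model.

```latex
N_{threads} = N_{cpu} \times U_{cpu} \times \left(1 + \frac{W}{C}\right)
```

For example, with 8 cores, full utilization (U_cpu = 1), and threads that wait three times as long as they compute (W/C = 3), the estimate is 8 × 1 × (1 + 3) = 32 threads. With no blocking at all (W = 0), it reduces to the core count, which is the ideal case described above.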

We cannot control third-party blocking APIs, but we can control our own code. When we write our own tasks and no blocking IO is involved, there is no need to block one thread while it waits for the output of another.
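
The JDK's ForkJoinPool supports this style directly: a task can split its work and wait for a subtask without simply parking the thread, because when a worker calls join() on a task it forked, it typically pops and runs that task itself if nobody has stolen it, or helps with other pending work. Below is a minimal sketch of a parallel array sum in that style; the class name SumTask, the helper parallelSum, and the 1,000-element threshold are illustrative choices, not part of the original article.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums data[from, to) by recursively splitting the range.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // illustrative cut-off for sequential work
    private final long[] data;
    private final int from;
    private final int to; // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                     // from a pool worker, this pushes 'left' onto the worker's own deque
        long rightSum = right.compute(); // keep the current thread busy with the other half
        long leftSum = left.join();      // if 'left' was not stolen, this thread usually runs it itself
        return leftSum + rightSum;
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // 5000050000
    }
}
```

Forking one half and computing the other half directly means the current thread never sits idle waiting, while idle workers can steal the forked half.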

The JDK and third-party projects have also put a lot of work into managing task dependencies smartly, using as few threads as possible and keeping each thread as busy as possible on the CPU. Examples include ForkJoinPool, Project Reactor, Netty, and Project Loom.

ForkJoinPool was introduced in JDK 7. It is built around the following rules:

  • Each worker thread has its own task queue, which is backed by a circular array.
  • Each worker thread uses push and pop to add tasks to, and remove tasks from, its own queue.
  • ForkJoinPool uses the work-stealing algorithm to balance the workload across threads. The pool also maintains shared queues that hold externally submitted tasks. Each worker thread pops tasks from its own queue. When its own queue is empty, it tries to steal tasks, choosing randomly among the shared queues and other workers' queues. If it finds no task in either, it goes to sleep.

The work queue operations are:

  • push: used by the owning worker thread to push a task onto the top of its own work queue.
  • pop: used by the owning worker thread to pop a task from the top of its own work queue.
  • poll: used by other threads to steal a task from the bottom of a different thread's work queue.
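
The three operations can be sketched with a deliberately simplified deque over a circular array. This is a hypothetical class, SimpleWorkDeque, not ForkJoinPool's implementation: the JDK's WorkQueue is lock-free and grows its array when full, whereas this sketch guards every operation with a lock and uses a fixed power-of-two capacity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, lock-based sketch of a work-stealing deque (not the JDK's lock-free WorkQueue).
class SimpleWorkDeque<T> {
    private final Object[] array; // circular buffer; capacity must be a power of two
    private final int mask;
    private int top;              // next free slot; the owner pushes and pops here
    private int base;             // oldest task; thieves poll from here

    SimpleWorkDeque(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        array = new Object[capacity];
        mask = capacity - 1;
    }

    // Owner thread: add a task at the top.
    synchronized void push(T task) {
        if (top - base == array.length) {
            throw new IllegalStateException("deque full");
        }
        array[top & mask] = task;
        top++;
    }

    // Owner thread: take the most recently pushed task (LIFO), or null if empty.
    @SuppressWarnings("unchecked")
    synchronized T pop() {
        if (top == base) {
            return null;
        }
        top--;
        T task = (T) array[top & mask];
        array[top & mask] = null;
        return task;
    }

    // Other threads: steal the oldest task from the bottom (FIFO), or null if empty.
    @SuppressWarnings("unchecked")
    synchronized T poll() {
        if (top == base) {
            return null;
        }
        T task = (T) array[base & mask];
        array[base & mask] = null;
        base++;
        return task;
    }

    synchronized int size() {
        return top - base;
    }

    public static void main(String[] args) {
        SimpleWorkDeque<String> deque = new SimpleWorkDeque<>(8);
        deque.push("t1");
        deque.push("t2");
        deque.push("t3");
        List<String> order = new ArrayList<>();
        order.add(deque.pop());  // owner takes the newest task: t3
        order.add(deque.poll()); // a thief takes the oldest task: t1
        order.add(deque.pop());  // owner takes what is left: t2
        System.out.println(order); // [t3, t1, t2]
    }
}
```

The owner works at the top in LIFO order, so it keeps handling the most recently created (and likely cache-warm) tasks, while thieves take the oldest tasks from the bottom. In the real lock-free deque, working at opposite ends also means the owner and thieves rarely contend.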

Because the work-stealing algorithm is so general, it can be used to manage task scheduling in other languages as well. Kotlin and Python, for example, have coroutines. A coroutine can be thought of as a lightweight user-space thread: switching between coroutines does not involve the OS scheduler, so task scheduling happens entirely in user space. The benefit is that very few threads can manage a large number of tasks. Project Loom is also bringing Fibers (since renamed virtual threads) to Java; a fiber is similar to a coroutine.

The general trend in application development is toward lighter-weight threads for managing user tasks. Combining fewer OS threads, nonblocking IO, and user-space task management achieves better performance.

CompletableFuture

CompletableFuture uses ForkJoinPool to schedule its tasks: its Async methods that take no Executor argument run on ForkJoinPool.commonPool(). The following is a simple example.

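```java
import java.util.concurrent.CompletableFuture;

public class CompletableFutureDemo {
    public static void main(String[] args) {
        CompletableFuture.supplyAsync(() -> {
            // Task 1: forked onto ForkJoinPool.commonPool()
            try {
                Thread.sleep(30000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the interrupt flag
            }
            System.out.println("current thread is " + Thread.currentThread().getId());
            return Thread.currentThread().getId();
        }).thenCombineAsync(CompletableFuture.supplyAsync(() -> {
            // Task 2: forked independently, so it can run concurrently with task 1
            try {
                Thread.sleep(70000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("current thread is " + Thread.currentThread().getId());
            return Thread.currentThread().getId();
        }), (l, r) -> l + r) // runs only after both tasks have completed
        .thenAcceptAsync(x -> {
            System.out.println("current thread is " + Thread.currentThread().getId());
        }).thenRunAsync(() -> {
            System.out.println("current thread is " + Thread.currentThread().getId());
        }).join(); // common-pool threads are daemons, so keep main alive until the chain finishes
    }
}
```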

As its name suggests, supplyAsync forks one asynchronous task. The task is submitted to ForkJoinPool (the common pool, in this example), which runs it on one of its worker threads.
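
A quick way to see this is to ask which thread ran the supplier. The sketch below passes an explicit ForkJoinPool, an assumption made so the result does not depend on the common pool's configuration. With no executor argument, supplyAsync uses ForkJoinPool.commonPool(), unless the common pool's parallelism is below two, in which case the CompletableFuture documentation says a new thread is created for each task. The class and helper names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;

class SupplyAsyncDemo {
    // Runs a supplier asynchronously on 'pool' and reports whether it
    // executed on one of that pool's worker threads.
    static boolean ranOnWorkerOf(ForkJoinPool pool) {
        CompletableFuture<Boolean> f = CompletableFuture.supplyAsync(() -> {
            Thread t = Thread.currentThread();
            return t instanceof ForkJoinWorkerThread
                    && ((ForkJoinWorkerThread) t).getPool() == pool;
        }, pool);
        return f.join();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2); // small dedicated pool, for illustration
        try {
            System.out.println(ranOnWorkerOf(pool)); // true
        } finally {
            pool.shutdown();
        }
    }
}
```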

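thenCombineAsync combines the results of the two previously submitted tasks. Because it depends on both results, CompletableFuture does not block a thread waiting for them. Instead, it registers the combining function as a dependent stage and, once both tasks have finished, submits it to the pool, where it runs on an existing worker thread rather than on a newly created one. (The non-Async thenCombine instead runs the function in the thread that completes the last input, or in the calling thread if both inputs are already complete.)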
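
The difference between the Async and non-Async variants can be observed directly. In this sketch both inputs are already complete, so thenCombine runs the combining function immediately in the calling thread, while thenCombineAsync hands it to the supplied executor. The explicit ForkJoinPool and the names CombineDemo, combineThread, and combineAsyncThread are illustrative assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ForkJoinPool;

class CombineDemo {
    // Returns the name of the thread that ran thenCombine's function.
    static String combineThread() {
        CompletableFuture<Integer> a = CompletableFuture.completedFuture(1);
        CompletableFuture<Integer> b = CompletableFuture.completedFuture(2);
        // Both inputs are already complete, so the function runs right here, in the caller.
        return a.thenCombine(b, (x, y) -> Thread.currentThread().getName()).join();
    }

    // Same combination, but the Async variant submits the function to 'pool'.
    static String combineAsyncThread(ForkJoinPool pool) {
        CompletableFuture<Integer> a = CompletableFuture.completedFuture(1);
        CompletableFuture<Integer> b = CompletableFuture.completedFuture(2);
        return a.thenCombineAsync(b, (x, y) -> Thread.currentThread().getName(), pool).join();
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            System.out.println("thenCombine ran on:      " + combineThread());
            System.out.println("thenCombineAsync ran on: " + combineAsyncThread(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```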

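thenAcceptAsync and thenRunAsync likewise submit tasks to ForkJoinPool, but each only after the previous stage completes: thenAcceptAsync receives the previous result, while thenRunAsync takes no input. The static runAsync, by contrast, submits a fully independent task.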
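
The ordering guarantee of thenRunAsync can be checked by logging from each stage. This sketch uses an explicit pool and a hypothetical helper, dependentOrder; the runAsync call in main shows a task with no upstream stage.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ForkJoinPool;

class RunAsyncDemo {
    // Logs the order in which a stage and its dependent thenRunAsync action run.
    static List<String> dependentOrder(ForkJoinPool pool) {
        List<String> log = new CopyOnWriteArrayList<>(); // thread-safe, since stages run on pool threads
        CompletableFuture
                .supplyAsync(() -> {
                    log.add("supply");
                    return 42;
                }, pool)
                .thenRunAsync(() -> log.add("thenRun"), pool) // scheduled only after supply completes
                .join();
        return log;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            System.out.println(dependentOrder(pool)); // [supply, thenRun]

            // runAsync has no upstream stage: it is just another task submitted to the pool,
            // so nothing orders it relative to other independent tasks.
            CompletableFuture<Void> independent =
                    CompletableFuture.runAsync(() -> System.out.println("independent task"), pool);
            independent.join();
        } finally {
            pool.shutdown();
        }
    }
}
```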

CompletableFuture has many more APIs. Composing CompletableFuture tasks is very similar to using Promises in frontend JavaScript, and when many callbacks are chained together, the code can become hard to read. Some third-party libraries have introduced async/await constructs to Java, similar to async/await with JavaScript promises.

The work-stealing algorithm is also used in Project Reactor, and it likely appears in many other lower-level task-scheduling frameworks and libraries, as well as in kernel task management.

If you want to know the details of the work-stealing algorithm, you can read the paper and the ForkJoinPool source code:

https://hg.openjdk.java.net/jdk/jdk11/file/1ddf9a99e4ad/src/java.base/share/classes/java/util/concurrent/ForkJoinPool.java#l1348

https://www.dre.vanderbilt.edu/~schmidt/PDF/work-stealing-dequeue.pdf

Please leave a comment if you find anything wrong.

Translated from: https://medium.com/@Ryan_Zheng/work-stealing-distilled-d2ed86d3065d
