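The purpose of the fork/join framework is to recursively split a parallelizable task into smaller tasks and then combine the results of the subtasks into the overall result. It is an implementation of the ExecutorService interface that distributes those subtasks to the worker threads of a thread pool called ForkJoinPool.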
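I. RecursiveTask
To submit tasks to a ForkJoinPool, you create a subclass of RecursiveTask<R>, where R is the type of the result produced by the parallelized task (and by each of its subtasks), or a subclass of RecursiveAction if the task returns no result. To define a RecursiveTask you only need to implement its single abstract method, compute. The source is as follows: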
package java.util.concurrent;
/**
* A recursive result-bearing {@link ForkJoinTask}.
*
* <p>For a classic example, here is a task computing Fibonacci numbers:
*
* <pre> {@code
* class Fibonacci extends RecursiveTask<Integer> {
* final int n;
* Fibonacci(int n) { this.n = n; }
* Integer compute() {
* if (n <= 1)
* return n;
* Fibonacci f1 = new Fibonacci(n - 1);
* f1.fork();
* Fibonacci f2 = new Fibonacci(n - 2);
* return f2.compute() + f1.join();
* }
* }}</pre>
*
* However, besides being a dumb way to compute Fibonacci functions
* (there is a simple fast linear algorithm that you'd use in
* practice), this is likely to perform poorly because the smallest
* subtasks are too small to be worthwhile splitting up. Instead, as
* is the case for nearly all fork/join applications, you'd pick some
* minimum granularity size (for example 10 here) for which you always
* sequentially solve rather than subdividing.
*
* @since 1.7
* @author Doug Lea
*/
public abstract class RecursiveTask<V> extends ForkJoinTask<V> {
private static final long serialVersionUID = 5232453952276485270L;
/**
* The result of the computation.
*/
V result;
/**
* The main computation performed by this task.
* @return the result of the computation
*/
protected abstract V compute();
public final V getRawResult() {
return result;
}
protected final void setRawResult(V value) {
result = value;
}
/**
* Implements execution conventions for RecursiveTask.
*/
protected final boolean exec() {
result = compute();
return true;
}
}
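The compute method defines both the logic for splitting the task into subtasks and the logic for producing the result of a single subtask when it can no longer be split, or when splitting is no longer worthwhile.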
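For the case where a task returns no result, the following is a minimal sketch of a RecursiveAction subclass. The class name DoubleInPlaceAction, the helper doubleAll and the threshold value are assumptions made for illustration only; the sketch uses the common pool rather than a dedicated one.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example: doubles every element of an array in place.
class DoubleInPlaceAction extends RecursiveAction {
    // Assumed cut-off below which the work is done sequentially.
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int start;
    private final int end;

    DoubleInPlaceAction(long[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: process the slice sequentially.
            for (int i = start; i < end; i++) {
                data[i] *= 2;
            }
            return;
        }
        int mid = start + (end - start) / 2;
        // invokeAll forks both halves and returns once both have completed.
        invokeAll(new DoubleInPlaceAction(data, start, mid),
                  new DoubleInPlaceAction(data, mid, end));
    }

    // Convenience entry point that runs the action on the common pool.
    static long[] doubleAll(long[] data) {
        ForkJoinPool.commonPool().invoke(new DoubleInPlaceAction(data, 0, data.length));
        return data;
    }
}
```

Because nothing is returned, compute has a void return type and there is no combine step; the two halves only need to touch disjoint index ranges.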
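- Create a subclass of RecursiveTask, ForkJoinSumCalculator
import lombok.Data;
import lombok.EqualsAndHashCode;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;
@EqualsAndHashCode(callSuper = true)
@Data
// Extend RecursiveTask to create a task that can be used with the fork/join framework
public class ForkJoinSumCalculator extends RecursiveTask<Long> {
    // The array of numbers to be summed
    private final long[] numbers;
    // Start position of the part of the array processed by this subtask
    private final int start;
    // End position (exclusive) of the part of the array processed by this subtask
    private final int end;
    // Size of the array below which the task is no longer split into subtasks
    public static final long THRESHOLD = 10_000;
    /**
     * Private constructor used to recursively create subtasks of the main task
     *
     * @param numbers the array of numbers to be summed
     * @param start start position of the part processed by the subtask
     * @param end end position (exclusive) of the part processed by the subtask
     */
    private ForkJoinSumCalculator(long[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }
    /**
     * Public constructor used to create the main task
     *
     * @param numbers the array of numbers to be summed
     */
    public ForkJoinSumCalculator(long[] numbers) {
        this(numbers, 0, numbers.length);
    }
    /**
     * Simple algorithm that computes the result once a subtask is no longer divisible
     * @return the sum
     */
    private long computeSequentially() {
        long sum = 0;
        for (int i = start; i < end; i++) {
            sum += numbers[i];
        }
        return sum;
    }
    @Override
    protected Long compute() {
        int length = end - start;
        // If the size is less than or equal to the threshold, compute the result sequentially
        if (length <= THRESHOLD) {
            return computeSequentially();
        }
        // Create a subtask to sum the first half of the array
        ForkJoinSumCalculator leftTask = new ForkJoinSumCalculator(numbers, start, start + length / 2);
        // Asynchronously execute the newly created subtask on another thread of the ForkJoinPool
        leftTask.fork();
        // Create a subtask to sum the second half of the array
        ForkJoinSumCalculator rightTask = new ForkJoinSumCalculator(numbers, start + length / 2, end);
        // Execute the second subtask synchronously
        Long rightResult = rightTask.compute();
        // Read the result of the first subtask, waiting if it has not completed yet
        Long leftResult = leftTask.join();
        // The result of this task is the combination of the results of the two subtasks
        return leftResult + rightResult;
    }
    /**
     * In a real application it makes little sense to use more than one ForkJoinPool. Normally you
     * instantiate it once and keep the instance in a static field, making it a singleton.
     * @param n upper bound (inclusive) of the range 1..n to be summed
     * @return the sum of the numbers from 1 to n
     */
    public static long forkJoinSum(long n) {
        long[] numbers = LongStream.rangeClosed(1, n).toArray();
        ForkJoinTask<Long> task = new ForkJoinSumCalculator(numbers);
        return new ForkJoinPool().invoke(task);
    }
}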
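When a ForkJoinSumCalculator task is passed to the ForkJoinPool, it is executed by one of the pool's threads, which calls the task's compute method. This method checks whether the task is small enough to be performed sequentially; if not, it splits the array to be summed into two halves and assigns them to two new ForkJoinSumCalculator instances, which are in turn scheduled for execution by the ForkJoinPool.
The process can therefore repeat recursively, dividing the original task into smaller tasks until the condition under which further splitting is inconvenient or impossible is met. At that point the result of each subtask is computed sequentially, and the (implicit) binary tree of tasks created by the forking process is traversed back toward its root. The partial results of the subtasks are then combined to obtain the result of the whole task.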
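The advice to instantiate the pool only once can be sketched as follows. This is an illustrative variant, not the listing above: the names SharedPoolSum, RangeSum and POOL are hypothetical, and the threshold is an assumed value. ForkJoinPool.commonPool() would be another way to share a single pool.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.LongStream;

class SharedPoolSum {
    // One pool for the whole application, created once and kept in a static field.
    private static final ForkJoinPool POOL = new ForkJoinPool();

    // Assumed cut-off below which a slice is summed sequentially.
    private static final int THRESHOLD = 10_000;

    // Hypothetical task that sums numbers[start, end).
    static final class RangeSum extends RecursiveTask<Long> {
        private final long[] numbers;
        private final int start;
        private final int end;

        RangeSum(long[] numbers, int start, int end) {
            this.numbers = numbers;
            this.start = start;
            this.end = end;
        }

        @Override
        protected Long compute() {
            int length = end - start;
            if (length <= THRESHOLD) {
                long sum = 0;
                for (int i = start; i < end; i++) {
                    sum += numbers[i];
                }
                return sum;
            }
            int mid = start + length / 2;
            RangeSum left = new RangeSum(numbers, start, mid);
            left.fork();                                   // run the left half asynchronously
            long right = new RangeSum(numbers, mid, end).compute(); // right half on this thread
            return left.join() + right;                    // wait for the left half and combine
        }
    }

    // Every call reuses the same pool instead of creating a new one.
    static long sum(long n) {
        long[] numbers = LongStream.rangeClosed(1, n).toArray();
        return POOL.invoke(new RangeSum(numbers, 0, numbers.length));
    }

    public static void main(String[] args) {
        System.out.println("Sum of 1..100000 = " + sum(100_000));
    }
}
```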
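II. Spliterator
Spliterator is another new interface added in Java 8; its name stands for "splittable iterator". Like an Iterator, a Spliterator is used to traverse the elements of a source, but it is designed to do so in parallel.
Java 8 already provides a default Spliterator implementation for all the data structures in the Collections Framework. Collections do not implement Spliterator themselves; instead, the Collection interface provides a spliterator method (with a default implementation) that returns one.
The source of Spliterator is as follows: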
public interface Spliterator<T> {
/**
* If a remaining element exists, performs the given action on it,
* returning {@code true}; else returns {@code false}. If this
* Spliterator is {@link #ORDERED} the action is performed on the
* next element in encounter order. Exceptions thrown by the
* action are relayed to the caller.
*
* @param action The action
* @return {@code false} if no remaining elements existed
* upon entry to this method, else {@code true}.
* @throws NullPointerException if the specified action is null
*/
boolean tryAdvance(Consumer<? super T> action);
/**
* If this spliterator can be partitioned, returns a Spliterator
* covering elements, that will, upon return from this method, not
* be covered by this Spliterator.
*
* <p>If this Spliterator is {@link #ORDERED}, the returned Spliterator
* must cover a strict prefix of the elements.
*
* <p>Unless this Spliterator covers an infinite number of elements,
* repeated calls to {@code trySplit()} must eventually return {@code null}.
* Upon non-null return:
* <ul>
* <li>the value reported for {@code estimateSize()} before splitting,
* must, after splitting, be greater than or equal to {@code estimateSize()}
* for this and the returned Spliterator; and</li>
* <li>if this Spliterator is {@code SUBSIZED}, then {@code estimateSize()}
* for this spliterator before splitting must be equal to the sum of
* {@code estimateSize()} for this and the returned Spliterator after
* splitting.</li>
* </ul>
*
* <p>This method may return {@code null} for any reason,
* including emptiness, inability to split after traversal has
* commenced, data structure constraints, and efficiency
* considerations.
*
* @apiNote
* An ideal {@code trySplit} method efficiently (without
* traversal) divides its elements exactly in half, allowing
* balanced parallel computation. Many departures from this ideal
* remain highly effective; for example, only approximately
* splitting an approximately balanced tree, or for a tree in
* which leaf nodes may contain either one or two elements,
* failing to further split these nodes. However, large
* deviations in balance and/or overly inefficient {@code
* trySplit} mechanics typically result in poor parallel
* performance.
*
* @return a {@code Spliterator} covering some portion of the
* elements, or {@code null} if this spliterator cannot be split
*/
Spliterator<T> trySplit();
/**
* Returns an estimate of the number of elements that would be
* encountered by a {@link #forEachRemaining} traversal, or returns {@link
* Long#MAX_VALUE} if infinite, unknown, or too expensive to compute.
*
* <p>If this Spliterator is {@link #SIZED} and has not yet been partially
* traversed or split, or this Spliterator is {@link #SUBSIZED} and has
* not yet been partially traversed, this estimate must be an accurate
* count of elements that would be encountered by a complete traversal.
* Otherwise, this estimate may be arbitrarily inaccurate, but must decrease
* as specified across invocations of {@link #trySplit}.
*
* @apiNote
* Even an inexact estimate is often useful and inexpensive to compute.
* For example, a sub-spliterator of an approximately balanced binary tree
* may return a value that estimates the number of elements to be half of
* that of its parent; if the root Spliterator does not maintain an
* accurate count, it could estimate size to be the power of two
* corresponding to its maximum depth.
*
* @return the estimated size, or {@code Long.MAX_VALUE} if infinite,
* unknown, or too expensive to compute.
*/
long estimateSize();
/**
* Returns a set of characteristics of this Spliterator and its
* elements. The result is represented as ORed values from {@link
* #ORDERED}, {@link #DISTINCT}, {@link #SORTED}, {@link #SIZED},
* {@link #NONNULL}, {@link #IMMUTABLE}, {@link #CONCURRENT},
* {@link #SUBSIZED}. Repeated calls to {@code characteristics()} on
* a given spliterator, prior to or in-between calls to {@code trySplit},
* should always return the same result.
*
* <p>If a Spliterator reports an inconsistent set of
* characteristics (either those returned from a single invocation
* or across multiple invocations), no guarantees can be made
* about any computation using this Spliterator.
*
* @apiNote The characteristics of a given spliterator before splitting
* may differ from the characteristics after splitting. For specific
* examples see the characteristic values {@link #SIZED}, {@link #SUBSIZED}
* and {@link #CONCURRENT}.
*
* @return a representation of characteristics
*/
int characteristics();
}
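Analysis:
T is the type of the elements traversed by the Spliterator.
The tryAdvance method behaves much like a regular Iterator: it consumes the elements of the Spliterator one by one, in order, returning true if an element was available to process and false if none was left.
trySplit is specific to the Spliterator interface: it partitions off some of the elements to a second Spliterator (the one returned by the method), so that the two can be processed in parallel.
A Spliterator can also estimate how many elements remain to be traversed through its estimateSize method; even an inexact value that can be computed quickly helps to split the structure more or less evenly.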
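The three methods described above can be tried out on the Spliterator that an ArrayList already provides. This is a small sketch; the class and method names are hypothetical, and it only relies on behavior stated in the Spliterator contract (an ORDERED Spliterator hands out a prefix when it is split).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SpliteratorBasics {
    // Splits the source once, then drains both parts with tryAdvance.
    static List<Integer> traverseInTwoParts(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        System.out.println("before split: " + rest.estimateSize());

        // May return null if the implementation declines to split.
        Spliterator<Integer> prefix = rest.trySplit();

        List<Integer> seen = new ArrayList<>();
        if (prefix != null) {
            System.out.println("prefix: " + prefix.estimateSize()
                    + ", rest: " + rest.estimateSize());
            // tryAdvance returns false once the prefix is exhausted.
            while (prefix.tryAdvance(seen::add)) { }
        }
        while (rest.tryAdvance(seen::add)) { }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        System.out.println(traverseInTwoParts(numbers));
    }
}
```

Here the two parts are drained one after the other on a single thread; in a parallel stream each part would be handed to a different task.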
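Splitting process:
The algorithm that splits a Stream into multiple parts is a recursive process.
In the first step, trySplit is called on the first Spliterator and produces a second Spliterator.
In the second step, trySplit is called on both of these Spliterators, which gives four Spliterators in total.
The framework keeps calling trySplit on a Spliterator until it returns null, signalling that the data structure it is processing cannot be split any further.
Finally, in this walkthrough the recursive splitting ends at the fourth step, when every Spliterator returns null from trySplit.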
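The recursive splitting can be imitated by hand. The sketch below splits a Spliterator exhaustively until trySplit returns null and records the size of each resulting leaf; the names are hypothetical, and a real stream pipeline may well decide to stop splitting sooner for efficiency reasons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Spliterator;

class SplitUntilNull {
    // Recursively splits s; once it cannot be split, records its estimated size.
    static <T> void split(Spliterator<T> s, List<Long> leafSizes) {
        Spliterator<T> prefix = s.trySplit();
        if (prefix == null) {
            leafSizes.add(s.estimateSize());
            return;
        }
        split(prefix, leafSizes); // the part that was split off
        split(s, leafSizes);      // what remains in the original Spliterator
    }

    static List<Long> leafSizes(Collection<?> source) {
        List<Long> sizes = new ArrayList<>();
        split(source.spliterator(), sizes);
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8));
        System.out.println("leaf sizes: " + leafSizes(numbers));
    }
}
```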
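The splitting process is also influenced by the characteristics of the Spliterator itself, which are declared through the characteristics method.
characteristics is the last abstract method declared by the Spliterator interface. It returns an int that encodes the set of characteristics of the Spliterator, and clients of the Spliterator can use these characteristics to better control and optimize its use.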
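To make the int encoding concrete, here is a small sketch that decodes the value with hasCharacteristics for a few standard collections. The class CharacteristicsDemo and its describe helper are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {
    // Decodes the bit set returned by characteristics() into flag names.
    static List<String> describe(Spliterator<?> s) {
        List<String> names = new ArrayList<>();
        if (s.hasCharacteristics(Spliterator.ORDERED))    names.add("ORDERED");
        if (s.hasCharacteristics(Spliterator.DISTINCT))   names.add("DISTINCT");
        if (s.hasCharacteristics(Spliterator.SORTED))     names.add("SORTED");
        if (s.hasCharacteristics(Spliterator.SIZED))      names.add("SIZED");
        if (s.hasCharacteristics(Spliterator.NONNULL))    names.add("NONNULL");
        if (s.hasCharacteristics(Spliterator.IMMUTABLE))  names.add("IMMUTABLE");
        if (s.hasCharacteristics(Spliterator.CONCURRENT)) names.add("CONCURRENT");
        if (s.hasCharacteristics(Spliterator.SUBSIZED))   names.add("SUBSIZED");
        return names;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 2);
        System.out.println("ArrayList: " + describe(new ArrayList<>(values).spliterator()));
        System.out.println("HashSet:   " + describe(new HashSet<>(values).spliterator()));
        System.out.println("TreeSet:   " + describe(new TreeSet<>(values).spliterator()));
    }
}
```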
- Implementing a Spliterator
1. Iterative example: develop a simple method that counts the number of words in a String.
// An iterative word-count method
public static int countWordsIteratively(String s) {
    int counter = 0;
    boolean lastSpace = true;
    for (char c : s.toCharArray()) {
        if (Character.isWhitespace(c)) {
            lastSpace = true;
        } else {
            // A word starts when a non-space character follows a space
            if (lastSpace) {
                counter++;
            }
            lastSpace = false;
        }
    }
    return counter;
}
public static void main(String[] args) {
    String sentence =
            " Nel mezzo del cammin di nostra vita " +
            "mi ritrovai in una selva oscura" +
            " ché la dritta via era smarrita ";
    System.out.println("Found " + countWordsIteratively(sentence) + " words");
}
Result:
Found 19 words
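2. Rewriting the word counter in a functional style
Stream<Character> stream = IntStream.range(0, sentence.length()).mapToObj(sentence::charAt);
You can count the words by performing a reduction on this stream. While reducing the stream, you have to carry a state made of two variables: an int counting the words found so far, and a boolean that remembers whether the last Character encountered was a space. Because Java has no tuples, you have to create a new class, WordCounter, to encapsulate this state.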
import lombok.AllArgsConstructor;
import lombok.Data;
@Data
@AllArgsConstructor
public class WordCounter {
    private final int counter;
    private final boolean lastSpace;
    /**
     * As in the iterative algorithm, accumulate traverses the Characters one by one
     */
    public WordCounter accumulate(Character c) {
        if (Character.isWhitespace(c)) {
            return lastSpace ? this : new WordCounter(counter, true);
        } else {
            // Increase the word counter when the previous character was a space and the current one is not
            return lastSpace ? new WordCounter(counter + 1, false) : this;
        }
    }
    /**
     * Combines two WordCounters by adding up their counters
     */
    public WordCounter combine(WordCounter wordCounter) {
        // Only the sum of the counters is needed; lastSpace does not matter here
        return new WordCounter(counter + wordCounter.counter, wordCounter.lastSpace);
    }
    public int getCounter() {
        return counter;
    }
}
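The accumulate method defines how the state of the WordCounter changes or, more precisely, which state is used to build a new WordCounter, because the class is immutable. accumulate is called each time a new Character of the Stream is traversed. The counter is incremented when the previous character was a space and the new one is not.
The second method, combine, is called to aggregate the partial results of two WordCounters that operated on two different subparts of the Character stream; it adds up the internal counters of the two WordCounters.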
public static int countWords(Stream<Character> stream) {
    WordCounter wordCounter = stream.reduce(new WordCounter(0, true),
            WordCounter::accumulate,
            WordCounter::combine);
    return wordCounter.getCounter();
}
public static void main(String[] args) {
    String sentence =
            " Nel mezzo del cammin di nostra vita " +
            "mi ritrovai in una selva oscura" +
            " ché la dritta via era smarrita ";
    // System.out.println("Found " + countWordsIteratively(sentence) + " words");
    Stream<Character> stream = IntStream.range(0, sentence.length()).mapToObj(sentence::charAt);
    System.out.println("Found " + WordCounter.countWords(stream) + " words");
}
Result:
Found 19 words
3. Try to speed up the word count with a parallel stream, as follows:
System.out.println("Found " + countWords(stream.parallel()) + " words");
Result:
Found 25 words
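This is obviously wrong. Because the original String is split at arbitrary positions, a word is sometimes cut in two and then counted twice. This shows that the way a stream is split can affect the result, so moving from a sequential stream to a parallel one can produce a wrong answer.
The solution is to make sure the String is not split at an arbitrary position but only at the end of a word. To do this, you have to implement a Spliterator of Character that splits the String only between two words, and then create the parallel stream from it.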
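The miscount can be reproduced deterministically without any threads. The sketch below counts the words of two chunks independently, each starting from the same initial state as the reduction above (as if a space came before it), and adds the counts. The class and method names are hypothetical.

```java
class SplitMiscount {
    // Same logic as the iterative counter: a word starts after a space.
    static int countWords(String s) {
        int counter = 0;
        boolean lastSpace = true;
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c)) {
                lastSpace = true;
            } else {
                if (lastSpace) {
                    counter++;
                }
                lastSpace = false;
            }
        }
        return counter;
    }

    // Counts each chunk on its own and sums the results, as a combiner would.
    static int countInTwoChunks(String s, int splitPos) {
        return countWords(s.substring(0, splitPos)) + countWords(s.substring(splitPos));
    }

    public static void main(String[] args) {
        String text = "selva oscura";
        System.out.println("whole string:     " + countWords(text));
        System.out.println("split mid-word:   " + countInTwoChunks(text, 3));
        System.out.println("split at a space: " + countInTwoChunks(text, 5));
    }
}
```

Splitting "selva oscura" after "sel" makes "va" look like the start of a new word, so the two chunks add up to three words instead of two; splitting at the space keeps the count correct.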
4. Create WordCounterSpliterator
import java.util.Spliterator;
import java.util.function.Consumer;
public class WordCounterSpliterator implements Spliterator<Character> {
    private final String string;
    private int currentChar = 0;
    public WordCounterSpliterator(String string) {
        this.string = string;
    }
    @Override
    public boolean tryAdvance(Consumer<? super Character> action) {
        // Consume the current character
        action.accept(string.charAt(currentChar++));
        // Return true if there are further characters to be consumed
        return currentChar < string.length();
    }
    @Override
    public Spliterator<Character> trySplit() {
        int currentSize = string.length() - currentChar;
        if (currentSize < 10) {
            // Returning null signals that the String to parse is small enough to be processed sequentially
            return null;
        }
        // Start from the middle of the remaining part as the candidate split position
        for (int splitPos = currentSize / 2 + currentChar; splitPos < string.length(); splitPos++) {
            // Advance the split position until the next space
            if (Character.isWhitespace(string.charAt(splitPos))) {
                // Create a new WordCounterSpliterator that parses the String from the current position to the split position
                Spliterator<Character> spliterator = new WordCounterSpliterator(string.substring(currentChar, splitPos));
                // Set the start position of this WordCounterSpliterator to the split position
                currentChar = splitPos;
                return spliterator;
            }
        }
        return null;
    }
    @Override
    public long estimateSize() {
        return string.length() - currentChar;
    }
    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL | IMMUTABLE;
    }
}
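Explanation:
- The tryAdvance method feeds the Consumer with the Character at the current position of the String and increments the position. The Consumer passed as its argument is an internal Java class that forwards the consumed Character to the set of functions to be applied to it while the stream is traversed. Here there is only one such function, the reducing function, namely the accumulate method of the WordCounter class. tryAdvance returns true if the new cursor position is less than the total length of the String, that is, if there are further Characters to traverse.
- The trySplit method is the most important one in a Spliterator, because it defines the logic used to split the data structure to be traversed. As in the compute method of the RecursiveTask implemented earlier, the first thing to do is set a limit below which no further splitting takes place. Here a very low limit of 10 Characters is used, only to make sure the program performs a few splits on the relatively short String. If the number of remaining Characters is below this limit, you return null to signal that no further splitting is necessary. Otherwise, you set the candidate split position to the middle of the chunk of the String still to be parsed. This position is not used directly, because you want to avoid splitting in the middle of a word, so you move forward until you find a space. Once a suitable split position is found, you create a new Spliterator that traverses the substring from the current position to the split position, set the current position of this Spliterator to the split position (because the part before it will be handled by the new Spliterator), and finally return the new Spliterator.
- The estimateSize of the elements still to be traversed is the difference between the total length of the String parsed by this Spliterator and the position currently being traversed.
- The characteristics method tells the framework that this Spliterator is ORDERED (the order is simply the sequence of Characters in the String), SIZED (the value returned by estimateSize is exact), SUBSIZED (the other Spliterators created by trySplit also have an exact size), NONNULL (there can be no null Characters in the String) and IMMUTABLE (no further Characters can be added while the String is being parsed, because String itself is an immutable class).
You can now use this WordCounterSpliterator with a parallel stream, as follows: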
Spliterator<Character> spliterator = new WordCounterSpliterator(sentence);
Stream<Character> stream = StreamSupport.stream(spliterator, true);
System.out.println("Found " + WordCounter.countWords(stream) + " words");
Correct output:
Found 19 words
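For readers who want to run the whole example without Lombok, the following self-contained sketch is a variant of the approach, not the listing above. It assumes an index-range design (the class names RangeWordCount, Counter and RangeSpliterator are hypothetical) so that no substrings are copied, and its tryAdvance checks for remaining characters before consuming one, which also makes it safe on an empty String.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class RangeWordCount {
    // Immutable reduction state, mirroring WordCounter without Lombok.
    static final class Counter {
        final int words;
        final boolean lastSpace;
        Counter(int words, boolean lastSpace) {
            this.words = words;
            this.lastSpace = lastSpace;
        }
        Counter accumulate(Character c) {
            if (Character.isWhitespace(c)) {
                return lastSpace ? this : new Counter(words, true);
            }
            return lastSpace ? new Counter(words + 1, false) : this;
        }
        Counter combine(Counter other) {
            return new Counter(words + other.words, other.lastSpace);
        }
    }

    // Traverses text[pos, end) and only ever splits at a whitespace character.
    static final class RangeSpliterator implements Spliterator<Character> {
        private final String text;
        private int pos;
        private final int end;
        RangeSpliterator(String text, int pos, int end) {
            this.text = text;
            this.pos = pos;
            this.end = end;
        }
        @Override
        public boolean tryAdvance(Consumer<? super Character> action) {
            if (pos >= end) {
                return false; // nothing left on entry
            }
            action.accept(text.charAt(pos++));
            return true;
        }
        @Override
        public Spliterator<Character> trySplit() {
            int remaining = end - pos;
            if (remaining < 10) {
                return null; // small enough to process sequentially
            }
            for (int split = pos + remaining / 2; split < end; split++) {
                if (Character.isWhitespace(text.charAt(split))) {
                    Spliterator<Character> prefix = new RangeSpliterator(text, pos, split);
                    pos = split; // this Spliterator keeps the part from the space onward
                    return prefix;
                }
            }
            return null;
        }
        @Override
        public long estimateSize() {
            return end - pos;
        }
        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | NONNULL | IMMUTABLE;
        }
    }

    static int countWordsInParallel(String text) {
        Stream<Character> stream =
                StreamSupport.stream(new RangeSpliterator(text, 0, text.length()), true);
        return stream.reduce(new Counter(0, true), Counter::accumulate, Counter::combine).words;
    }

    public static void main(String[] args) {
        String sentence = " Nel mezzo del cammin di nostra vita "
                + "mi ritrovai in una selva oscura"
                + " ché la dritta via era smarrita ";
        System.out.println("Found " + countWordsInParallel(sentence) + " words");
    }
}
```

Every chunk starts either at the beginning of the String or at a whitespace character, so no word is ever cut in two and the partial counts add up to the sequential result.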
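Note:
A Spliterator can bind to its source of elements at the point of first traversal, first split, or first query for estimated size, rather than at the time of its creation. When it does so, it is called a late-binding Spliterator.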
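Late binding can be observed with the Spliterator of an ArrayList, which the JDK documents as late-binding. In this small sketch (the class and method names are hypothetical) an element added after the Spliterator was created, but before its first use, is still traversed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class LateBindingDemo {
    static List<String> traverseAfterAdd() {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        // The Spliterator is created here but not yet bound to the elements.
        Spliterator<String> spliterator = source.spliterator();
        // Structural change made before the first traversal, split or size query.
        source.add("c");
        List<String> seen = new ArrayList<>();
        // Binding happens now, so the element added above is included.
        spliterator.forEachRemaining(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(traverseAfterAdd());
    }
}
```

Modifying the list after the Spliterator has bound to it is a different matter: the ArrayList Spliterator is also fail-fast and is expected to throw ConcurrentModificationException in that case.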