Fork/Join框架介绍 II 【在文档中查找一个词并返回文档或行中所出现这个词的次数】

最新推荐文章于 2024-04-28 11:15:25 发布

arthur.dy.lee

最新推荐文章于 2024-04-28 11:15:25 发布

阅读量872

点赞数

分类专栏： java多线程文章标签： Fork-Join 多线程

java多线程专栏收录该内容

63 篇文章 0 订阅

订阅专栏

查看《Fork/Join框架介绍 I》请点击

简介

通常，使用Java来开发一个简单的并发应用程序时，会创建一些Runnable对象，然后创建对应的Thread 对象来控制程序中这些线程的创建、执行以及线程的状态。自从Java 5开始引入了Executor和ExecutorService接口以及实现这两个接口的类（比如ThreadPoolExecutor）之后，使得Java在并发支持上得到了进一步的提升。

执行器框架（Executor Framework）将任务的创建和执行进行了分离，通过这个框架，只需要实现Runnable接口的对象和使用Executor对象，然后将Runnable对象发送给执行器。执行器再负责运行这些任务所需要的线程，包括线程的创建，线程的管理以及线程的结束。

Java 7则又更进了一步，它包括了ExecutorService接口的另一种实现，用来解决特殊类型的问题，它就是Fork/Join框架，有时也称分解/合并框架。

Fork/Join框架是用来解决能够通过分治技术（Divide and Conquer Technique）将问题拆分成小任务的问题。在一个任务中，先检查将要解决的问题的大小，如果大于一个设定的大小，那就将问题拆分成可以通过框架来执行的小任务。如果问题的大小比设定的大小要小，就可以直接在任务里解决这个问题，然后，根据需要返回任务的结果。下面的图形总结了这个原理。

这里写图片描述

没有固定的公式来决定问题的参考大小（Reference Size），从而决定一个任务是需要进行拆分或不需要拆分，拆分与否仍是依赖于任务本身的特性。可以使用在任务中将要处理的元素的数目和任务执行所需要的时间来决定参考大小。测试不同的参考大小来决定解决问题最好的一个方案，将ForkJoinPool类看作一个特殊的 Executor 执行器类型。这个框架基于以下两种操作。

分解（Fork）操作：当需要将一个任务拆分成更小的多个任务时，在框架中执行这些任务；
合并（Join）操作：当一个主任务等待其创建的多个子任务的完成执行。
Fork/Join框架和执行器框架（Executor Framework）主要的区别在于工作窃取算法（Work-Stealing Algorithm）。与执行器框架不同，使用Join操作让一个主任务等待它所创建的子任务的完成，执行这个任务的线程称之为工作者线程（Worker Thread）。工作者线程寻找其他仍未被执行的任务，然后开始执行。通过这种方式，这些线程在运行时拥有所有的优点，进而提升应用程序的性能。

为了达到这个目标，通过Fork/Join框架执行的任务有以下限制。

任务只能使用fork()和join() 操作当作同步机制。如果使用其他的同步机制，工作者线程就不能执行其他任务，当然这些任务是在同步操作里时。比如，如果在Fork/Join 框架中将一个任务休眠，正在执行这个任务的工作者线程在休眠期内不能执行另一个任务。
任务不能执行I/O操作，比如文件数据的读取与写入。
任务不能抛出非运行时异常（Checked Exception），必须在代码中处理掉这些异常。
Fork/Join框架的核心是由下列两个类组成的。

ForkJoinPool：这个类实现了ExecutorService接口和工作窃取算法（Work-Stealing Algorithm）。它管理工作者线程，并提供任务的状态信息，以及任务的执行信息。
ForkJoinTask：这个类是一个将在ForkJoinPool中执行的任务的基类。

Fork/Join框架提供了在一个任务里执行fork()和join()操作的机制和控制任务状态的方法。通常，为了实现Fork/Join任务，需要实现一个以下两个类之一的子类。

RecursiveAction：用于任务没有返回结果的场景。
RecursiveTask：用于任务有返回结果的场景。

扩展1

ForkJoinPool类还提供了以下方法用于执行任务。

execute (Runnabletask)：这是本范例中使用的execute()方法的另一种版本。这个方法发送一个Runnable任务给ForkJoinPool类。需要注意的是，使用Runnable对象时ForkJoinPool类就不采用工作窃取算法（Work-StealingAlgorithm），ForkJoinPool类仅在使用ForkJoinTask类时才采用工作窃取算法。
invoke(ForkJoinTasktask)：正如范例所示，ForkJoinPool类的execute()方法是异步调用的，而ForkJoinPool类的invoke()方法则是同步调用的。这个方法直到传递进来的任务执行结束后才会返回。
也可以使用在ExecutorService类中声明的invokeAll()和invokeAny()方法，这些方法接收Callable对象作为参数。使用Callable对象时ForkJoinPool类就不采用工作窃取算法（Work-StealingAlgorithm），因此，最好使用执行器来执行Callable对象。
ForkJoinTask类也包含了在范例中所使用的invokeAll()方法的其他版本，这些版本如下。
invokeAll(ForkJoinTask< ?> … tasks) ：这个版本的方法接收一个可变的参数列表，可以传递尽可能多的ForkJoinTask对象给这个方法作为参数。
invokeAll(Collection< T> tasks)：这个版本的方法接受一个泛型类型T的对象集合（比如，ArrayList对象、LinkedList对象或者TreeSet对象）。这个泛型类型T必须是ForkJoinTask类或者它的子类。
虽然ForkJoinPool类是设计用来执行ForkJoinTask对象的，但也可以直接用来执行Runnable和Callable对象。当然，也可以使用ForkJoinTask类的adapt()方法来接收一个Callable对象或者一个Runnable对象，然后将之转化为一个ForkJoinTask对象，然后再去执行。

Fork/Join合并任务的结果

Fork／Join框架提供了执行任务并返回结果的能力。这些类型的任务都是通过RecursiveTask类来实现的。RecursiveTask类继承了ForkJoinTask类，并且实现了由执行器框架（Executor Framework）提供的Future接口。

在任务中，必须使用Java API文档推荐的如下结构：

if (problem size > size){
    tasks=Divide(task);
    execute(tasks);
    groupResults()
    return result;
}else{
    resolve problem;
    return result;
}

如果任务需要解决的问题大于预先定义的大小，那么就要将这个问题拆分成多个子任务，并使用Fork／Join框架来执行这些子任务。执行完成后，原始任务获取到由所有这些子任务产生的结果，合并这些结果，返回最终的结果。当原始任务在线程池中执行结束后，将高效地获取到整个问题的最终结果。

实例

开发一个应用程序，在文档中查找一个词。我们将实现以下两种任务：
一个文档任务，它将遍历文档中的每一行来查找这个词；
一个行任务，它将在文档的一部分当中查找这个词。
所有这些任务将返回文档或行中所出现这个词的次数。

Main.java

package com.packtpub.java7.concurrency.chapter5.recipe02.core;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

import com.packtpub.java7.concurrency.chapter5.recipe02.task.DocumentTask;
import com.packtpub.java7.concurrency.chapter5.recipe02.utils.DocumentMock;

/**
 * Main class of the example. 
 */
public class Main {

    /**
     * Main method of the class
     */
    public static void main(String[] args) {

        // Generate a document with 100 lines and 1000 words per line
        DocumentMock mock=new DocumentMock();
        String[][] document=mock.generateDocument(100, 1000, "the");

        // Create a DocumentTask
        DocumentTask task=new DocumentTask(document, 0, 100, "the");

        // Create a ForkJoinPool
        ForkJoinPool pool=new ForkJoinPool();

        // Execute the Task
        pool.execute(task);

        // Write statistics about the pool
        do {
            System.out.printf("******************************************\n");
            System.out.printf("Main: Parallelism: %d\n",pool.getParallelism());
            System.out.printf("Main: Active Threads: %d\n",pool.getActiveThreadCount());
            System.out.printf("Main: Task Count: %d\n",pool.getQueuedTaskCount());
            System.out.printf("Main: Steal Count: %d\n",pool.getStealCount());
            System.out.printf("******************************************\n");

            try {
                TimeUnit.SECONDS.sleep(1);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }

        } while (!task.isDone());

        // Shutdown the pool
        pool.shutdown();

        // Wait for the finalization of the tasks
        try {
            pool.awaitTermination(1, TimeUnit.DAYS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        // Write the results of the tasks
        try {
            System.out.printf("Main: The word appears %d in the document",task.get());
        } catch (InterruptedException | ExecutionException e) {
            e.printStackTrace();
        }
    }

}

DocumentMock.java

package com.packtpub.java7.concurrency.chapter5.recipe02.utils;

import java.util.Random;

/**
 * This class will simulate a document generating a String array with a determined number
 * of rows (numLines) and columns(numWords). The content of the document will be generated
 * selecting in a random way words from a String array.
 *
 */
public class DocumentMock {

    /**
     * String array with the words of the document
     */
    private String words[]={"the","hello","goodbye","packt","java","thread","pool","random","class","main"};

    /**
     * Method that generates the String matrix
     * @param numLines Number of lines of the document
     * @param numWords Number of words of the document
     * @param word Word we are going to search for
     * @return The String matrix
     */
    public String[][] generateDocument(int numLines, int numWords, String word){

        int counter=0;
        String document[][]=new String[numLines][numWords];
        Random random=new Random();
        for (int i=0; i<numLines; i++){
            for (int j=0; j<numWords; j++) {
                int index=random.nextInt(words.length);
                document[i][j]=words[index];
                if (document[i][j].equals(word)){
                    counter++;
                }
            }
        }
        System.out.printf("DocumentMock: The word appears %d times in the document.\n",counter);
        return document;
    }
}

LineTask.java

package com.packtpub.java7.concurrency.chapter5.recipe02.task;

import java.util.concurrent.ExecutionException;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;


/**
 * Task that will process a fragment of a line of the document. If the 
 * fragment is too big (100 words or more), it split it in two parts
 * and throw to tasks to process each of the fragments.
 * 
 * It returns the number of appearances of the word in the fragment it has
 * to process.
 *
 */
public class LineTask extends RecursiveTask<Integer>{

    /**
     * Serial Version of the class. You have to add it because the
     * ForkJoinTask class implements the serializable interface
     */
    private static final long serialVersionUID = 1L;

    /**
     * A line of the document
     */
    private String line[];

    /**
     * Range of positions the task has to process
     */
    private int start, end;

    /**
     * Word we are looking for
     */
    private String word;

    /**
     * Constructor of the class
     * @param line A line of the document
     * @param start Position of the line where the task starts its process
     * @param end Position of the line where the task starts its process
     * @param word Work we are looking for
     */
    public LineTask(String line[], int start, int end, String word) {
        this.line=line;
        this.start=start;
        this.end=end;
        this.word=word;
    }

    /**
     * If the part of the line it has to process is smaller that 100, it
     * calculates the number of appearances of the word in the block. Else,
     * it divides the block in two blocks and throws to LineTask to calculate
     * the number of appearances.
     */
    @Override
    protected Integer compute() {
        Integer result=null;
        if (end-start<100) {
            result=count(line, start, end, word);
        } else {
            int mid=(start+end)/2;
            LineTask task1=new LineTask(line, start, mid, word);
            LineTask task2=new LineTask(line, mid, end, word);
            invokeAll(task1, task2);
            try {
                result=groupResults(task1.get(),task2.get());
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        return result;
    }

    /**
     * Groups the results of two LineTasks
     * @param number1 The result of the first LineTask
     * @param number2 The result of the second LineTask
     * @return The sum of the numbers
     */
    private Integer groupResults(Integer number1, Integer number2) {
        Integer result;

        result=number1+number2;
        return result;
    }

    /**
     * Count the appearances of a word in a part of a line of a document
     * @param line A line of the document
     * @param start Position of the line where the method begin to count
     * @param end Position of the line where the method finish the count
     * @param word Word the method looks for
     * @return The number of appearances of the word in the part of the line
     */
    private Integer count(String[] line, int start, int end, String word) {
        int counter;
        counter=0;
        for (int i=start; i<end; i++){
            if (line[i].equals(word)){
                counter++;
            }
        }
        try {
            TimeUnit.MILLISECONDS.sleep(10);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return counter;
    }



}

DocumentTask.java

package com.packtpub.java7.concurrency.chapter5.recipe02.task;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.RecursiveTask;

/**
 * Task that will process part of the document and calculate the number of 
 * appearances of the word in that block. If it has to process
 * more that 10 lines, it divides its part in two and throws two DocumentTask
 * to calculate the number of appearances in each block.
 * In other case, it throws LineTasks to process the lines of the block
 *
 */
public class DocumentTask extends RecursiveTask<Integer> {

    /**
     * Serial Version of the class. You have to include it because
     * the ForkJoinTask class implements the Serializable interface
     */
    private static final long serialVersionUID = 1L;

    /**
     * Document to process
     */
    private String document[][];

    /**
     * Range of lines of the document this task has to process
     */
    private int start, end;

    /**
     * Word we are looking for
     */
    private String word;

    /**
     * Constructor of the class
     * @param document Document to process
     * @param start Starting position of the block of the document this task has to process
     * @param end End position of the block of the document this task has to process
     * @param word Word we are looking for
     */
    public DocumentTask (String document[][], int start, int end, String word){
        this.document=document;
        this.start=start;
        this.end=end;
        this.word=word;
    }

    /**
     * If the task has to process more that ten lines, it divide
     * the block of lines it two subblocks and throws two DocumentTask
     * two process them.
     * In other case, it throws LineTask tasks to process each line of its block
     */
    @Override
    protected Integer compute() {
        Integer result=null;
        if (end-start<10){
            result=processLines(document, start,end,word);
        } else {
            int mid=(start+end)/2;
            DocumentTask task1=new DocumentTask(document,start,mid,word);
            DocumentTask task2=new DocumentTask(document,mid,end,word);
            invokeAll(task1,task2);
            try {
                result=groupResults(task1.get(),task2.get());
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        return result;
    }

    /**
     * Throws a LineTask task for each line of the block of lines this task has to process
     * @param document Document to process
     * @param start Starting position of the block of lines it has to process
     * @param end Finish position of the block of lines it has to process
     * @param word Word we are looking for
     * @return
     */
    private Integer processLines(String[][] document, int start, int end,
            String word) {
        List<LineTask> tasks=new ArrayList<LineTask>();

        for (int i=start; i<end; i++){
            LineTask task=new LineTask(document[i], 0, document[i].length, word);
            tasks.add(task);
        }
        invokeAll(tasks);

        int result=0;
        for (int i=0; i<tasks.size(); i++) {
            LineTask task=tasks.get(i);
            try {
                result=result+task.get();
            } catch (InterruptedException | ExecutionException e) {
                e.printStackTrace();
            }
        }
        return new Integer(result);
    }

    /**
     * Method that group the results of two DocumentTask tasks
     * @param number1 Result of the first DocumentTask
     * @param number2 Result of the second DocumentTask
     * @return The sum of the two results
     */
    private Integer groupResults(Integer number1, Integer number2) {
        Integer result;

        result=number1+number2;
        return result;
    }


}

运行结果

DocumentMock: The word appears 9945 times in the document.

Main: Parallelism: 4
Main: Active Threads: 2
Main: Task Count: 1
Main: Steal Count: 0

Main: Parallelism: 4
Main: Active Threads: 4
Main: Task Count: 32
Main: Steal Count: 0

Main: Parallelism: 4
Main: Active Threads: 4
Main: Task Count: 32
Main: Steal Count: 0

Main: Parallelism: 4
Main: Active Threads: 4
Main: Task Count: 32
Main: Steal Count: 0

Main: Parallelism: 4
Main: Active Threads: 4
Main: Task Count: 7
Main: Steal Count: 0

Main: The word appears 9945 in the document

工作原理

在这个范例中，我们实现了两个不同的任务。

DocumentTask类：这个类的任务需要处理由start和end属性决定的文档行。如果这些行数小于10，那么，就每行创建一个LineTask对象，然后在任务执行结束后，合计返回的结果，并返回总数。如果任务要处理的行数大于10，那么，将任务拆分成两组，并创建两个DocumentTask对象来处理这两组对象。当这些任务执行结束后，同样合计返回的结果，并返回总数。
LineTask类：这个类的任务需要处理文档中一行的某一组词。如果一组词的个数小100，那么任务将直接在这一组词里搜索特定词，然后返回查找词在这一组词中出现的次数。否则，任务将拆分这些词为两组，并创建两个LineTask对象来处理这两组词。当这些任务执行完成后，合计返回的结果，并返回总数。
在Main主类中，我们通过默认的构造器创建了ForkJoinPool对象，然后执行DocumentTask类，来处理一个共有100行，每行1,000字的文档。这个任务将问题拆分成DocumentTask对象和LineTask对象，然后当所有的任务执行完成后，使用原始的任务来获取整个文档中所要查找的词出现的次数。由于任务继承了RecursiveTask类，因此能够返回结果。

调用get()方法来获得Task返回的结果。这个方法声明在Future接口里，并由RecursiveTask类实现。

执行程序时，在控制台上，我们可以比较第一行与最后一行的输出信息。第一行是文档生成时被查找的词出现的次数，最后一行则是通过Fork／Join任务计算而来的被查找的词出现的次数，而且这两个数字相同。

扩展2

ForkJoinTask类提供了另一个complete()方法来结束任务的执行并返回结果。这个方法接收一个对象，对象的类型就是RecursiveTask类的泛型参数，然后在任务调用join()方法后返回这个对象作为结果。这一过程采用了推荐的异步任务来返回任务的结果。

由于RecursiveTask类实现了Future接口，因此还有get()方法调用的其他版本：

get(long timeout, TimeUnit unit)：这个版本中，如果任务的结果未准备好，将等待指定的时间。如果等待时间超出，而结果仍未准备好，那方法就会返回null值。
TimeUnit是一个枚举类，有如下的常量：DAYS、HOURS、MICROSECONDS、MILLISECONDS、MINUTES、NANOSECONDS和SECONDS。

转自：http://ifeve.com/java7-concurrency-cookbook-4/