Using JDK 7's Fork/Join Framework

June 27, 2011

Java 7, which is due to be released within a matter of weeks, has many new features. In fact, it contains more big new features than the previous Java SE release, mainly because it has been so long since Java SE 6 shipped. Some of the planned features even had to be deferred to JDK 8. Here's a summary of what's new:

  • JSR-292: Support for dynamically typed languages. Languages like Ruby or Groovy will now execute on the JVM with performance at or close to that of native Java code
  • JSR-334: Also called Project Coin, this is a set of small Java language enhancements, such as strings in switch statements, the diamond operator, multi-catch, and try-with-resources
  • Improved class loading
  • JSR-166: The new Fork/Join framework for enhanced concurrency support
  • Unicode 6.0 and other Internationalization improvements
  • JSR-203: NIO.2, which includes better file system integration, better asynchronous support, multicast, and so on
  • Windows Vista IPv6 support
  • SDP, SCTP, and TLS 1.2 support
  • JDBC 4.1
  • Swing enhancements, Nimbus look-and-feel, enhanced platform window support, and new sound synthesizer
  • Updated XML and Web Services stack
  • Improved system and JVM reporting framework included with MBean enhancements

What got deferred to JDK 8? Here's a summary list:

  • Modular support for the JVM (Project Jigsaw)
  • Enhanced Java annotations
  • Java Closures (Project Lambda)
  • JSR-296: Swing Application Framework to eliminate boilerplate code

For a complete list of enhancements and new features, with full details, click here. For now, let's look at the new Fork/Join framework, and how it helps with Java concurrency.

What Is Fork/Join?

Fork/Join adds a new ExecutorService implementation, ForkJoinPool, that lets you break up processing to be executed concurrently, and recursively, with little effort on your part. It's based on the work of Doug Lea, a thought leader on Java concurrency, at SUNY Oswego. Fork/Join deals with the threading hassles; you just indicate to the framework which portions of the work can be broken apart and handled recursively. It employs a divide-and-conquer algorithm that works like this in pseudocode (as taken from Doug Lea's paper on the subject):

Result doWork(Work work) {
     if (work is small) {
         process the work
     }
     else {
         split up work
         invoke framework to solve both parts
     }
}
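
As a rough illustration of how that pseudocode maps onto the real API, here is a minimal sketch that squares every element of an array in place. The class name SquareAction, the squareAll helper, and the threshold of 1000 elements are assumptions made for this example; they are not part of the framework.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example task: squares every element of a long[] in place.
class SquareAction extends RecursiveAction {
    // Assumed cutoff; at or below this size the work is done directly
    static final int THRESHOLD = 1000;

    private final long[] data;
    private final int start; // inclusive
    private final int end;   // exclusive

    SquareAction(long[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            // The work is small: process it directly
            for (int i = start; i < end; i++) {
                data[i] = data[i] * data[i];
            }
        } else {
            // Split up the work and let the framework solve both parts
            int mid = start + (end - start) / 2;
            invokeAll(new SquareAction(data, start, mid),
                      new SquareAction(data, mid, end));
        }
    }

    // Convenience entry point: runs the task on a new pool and waits for it
    static void squareAll(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new SquareAction(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling SquareAction.squareAll(values) returns once every element has been squared. Each subtask works on a disjoint index range of the same array, so no locking is needed.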

It's your job to determine the amount of work to process before splitting it up. If the pieces are too small, the overhead of the Fork/Join framework may hurt performance. But if the size is just right, the advantage of parallelism will increase performance. For instance, the sample application we'll examine will look for XML files to process in a set of directories. If there are too many files, the code will use the Fork/Join framework to recursively break down the workload across multiple threads. Since XML file processing involves a combination of I/O and CPU work, this is a good use of Fork/Join, provided the CPU-bound parsing dominates; the framework is designed mainly for compute-heavy tasks, and long blocking I/O ties up its worker threads.

The framework handles the threads based on available resources. It also employs a second algorithm called work stealing, where idle threads can steal work from busy threads to help spread the load around without spawning new threads. The same type of algorithm is often used in garbage collectors that use parallel worker threads to walk the heap.
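
To get a feel for this, the following sketch runs a throwaway task on a pool and then reads back two numbers the pool exposes: its parallelism level and an estimate of how many tasks were stolen. The PoolStats and FanOut names and the busy-work loop are made up for illustration; getParallelism() and getStealCount() are ForkJoinPool methods. The steal count varies from run to run, so no particular value should be expected.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class PoolStats {
    // Hypothetical task: fans out into 2^depth small leaves of busy work
    static class FanOut extends RecursiveAction {
        private final int depth;

        FanOut(int depth) {
            this.depth = depth;
        }

        @Override
        protected void compute() {
            if (depth == 0) {
                // Leaf: a little CPU work so that idle workers have a reason to steal
                long x = 0;
                for (int i = 0; i < 50000; i++) {
                    x += i;
                }
                if (x < 0) {
                    throw new IllegalStateException("unreachable");
                }
            } else {
                invokeAll(new FanOut(depth - 1), new FanOut(depth - 1));
            }
        }
    }

    // Runs the task and returns { parallelism, estimated steal count }
    static long[] run(int parallelism, int depth) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            pool.invoke(new FanOut(depth));
            return new long[] { pool.getParallelism(), pool.getStealCount() };
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long[] stats = run(cores, 10);
        System.out.println("parallelism=" + stats[0] + ", steals=" + stats[1]);
    }
}
```

The no-argument ForkJoinPool constructor uses the number of available processors as its parallelism level, which is why the sample application later in this post does not pass one explicitly.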

Java 7 Fork/Join Processing Example

Let's explore a sample application that checks a set of work directories for new XML files. As the files are processed, they're moved out of the work directories and into a special "processed" directory. This sample is loosely based on a news processing system I worked on years ago, where news articles were written to the appropriate directories as they were published. Then, a worker process that periodically checked the directories would process the files, and make them available on a website.

The code below is the complete Fork/Join XML processing application (minus the actual XML processing details). The main class, XMLProcessingForkJoin, periodically kicks off the parsing of files within a directory. It uses the ProcessXMLFiles class, which extends the Fork/Join framework's java.util.concurrent.RecursiveAction base class, to recursively split up and process all the files in the source directory.

import java.io.File;
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class XMLProcessingForkJoin {

    // Static nested class: the task does not need the enclosing instance
    static class ProcessXMLFiles extends RecursiveAction {
        static final int FILE_COUNT_THRESHOLD = 2;
        String sourceDirPath;
        String targetDirPath;
        File[] xmlFiles = null;

        public ProcessXMLFiles(String sourceDirPath, String targetDirPath, File[] xmlFiles) {
            this.sourceDirPath = sourceDirPath;
            this.targetDirPath = targetDirPath;
            this.xmlFiles = xmlFiles;
        }

        @Override
        protected void compute() {
            try {
                // Make sure the directory has been scanned
                if (xmlFiles == null) {
                    // listFiles() returns null if the path is not a readable directory
                    xmlFiles = new File(sourceDirPath).listFiles();
                    if (xmlFiles == null) {
                        System.err.println("Not a readable directory: " + sourceDirPath);
                        return;
                    }
                }

                // Check the number of files
                if (xmlFiles.length <= FILE_COUNT_THRESHOLD) {
                    parseXMLFiles(xmlFiles);
                }
                else {
                    // Split the array of XML files into two roughly equal parts.
                    // Arrays.copyOfRange keeps the File[] type; casting a copied
                    // Object[] to File[] would throw a ClassCastException.
                    int center = xmlFiles.length / 2;
                    File[] part1 = Arrays.copyOfRange(xmlFiles, 0, center);
                    File[] part2 = Arrays.copyOfRange(xmlFiles, center, xmlFiles.length);

                    invokeAll(new ProcessXMLFiles(sourceDirPath, targetDirPath, part1),
                              new ProcessXMLFiles(sourceDirPath, targetDirPath, part2));
                }
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }

        protected void parseXMLFiles(File[] filesToParse) {
            // Parse and copy the given set of XML files
            // ...
        }
    }

    public XMLProcessingForkJoin(String source, String target) {
        // Periodically invoke the following lines of code:
        ProcessXMLFiles process = new ProcessXMLFiles(source, target, null);
        ForkJoinPool pool = new ForkJoinPool();
        pool.invoke(process);
    }

    // Start the XML file parsing process with the Java SE 7 Fork/Join framework
    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("args - please specify source and target dirs");
            System.exit(-1);
        }
        String source = args[0];
        String target = args[1];

        new XMLProcessingForkJoin(source, target);
    }
}

It starts with the main class's constructor, XMLProcessingForkJoin, where a new ProcessXMLFiles object is created and handed off to the Fork/Join framework via a call to ForkJoinPool.invoke(). The framework then calls the object's compute() method. First, a check is made to populate the list of files within the directory. Next, if the number of files to process is at or below a threshold (two files in this case), the files are processed and we're done. Otherwise, the array of files is split into two parts, and two new Fork/Join tasks are created to process each sublist of files, and so on, recursively, until all the files are parsed and processed.
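
The listing leaves the "periodically" part as a comment. One possible way to drive it, sketched here under my own assumptions, is a single-threaded ScheduledExecutorService that submits a fresh task to a shared ForkJoinPool on each run. PeriodicForkJoin and its TaskFactory interface are hypothetical names invented for this sketch; a factory is used because a completed ForkJoinTask is not run again unless it is reinitialized, so creating a new task per run is the simpler route.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class PeriodicForkJoin {
    // Hypothetical callback that supplies a fresh task for every run
    interface TaskFactory {
        RecursiveAction newTask();
    }

    // Starts a scheduler that invokes a new task on the pool, waiting
    // delayMillis between the end of one run and the start of the next
    static ScheduledExecutorService start(final ForkJoinPool pool,
                                          final TaskFactory factory,
                                          long delayMillis) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(new Runnable() {
            @Override
            public void run() {
                try {
                    // Blocks until the whole task tree has completed
                    pool.invoke(factory.newTask());
                } catch (RuntimeException e) {
                    // An uncaught exception would cancel all future runs
                    e.printStackTrace();
                }
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

In the sample application, the factory would return new ProcessXMLFiles(source, target, null). Call shutdown() on the returned scheduler and on the pool when the worker process should stop.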

Since the code just parses XML files and returns no result, I chose to extend RecursiveAction in this application. If your processing actually returns a result that needs to be combined with the results of other Fork/Join subtasks (e.g. sorting, compressing data, tallying numbers, and so on), then you can extend RecursiveTask. I'll take a closer look at this and other changes to the concurrent classes in Java SE 7 in a future blog post.
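
As a small preview of the RecursiveTask case, here is a hedged sketch that tallies the numbers in an array. SumTask, its sum helper, and the threshold are illustrative choices of mine, not something taken from the sample application. The difference from RecursiveAction is that compute() returns a value, and the parent combines the results of its two halves.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example task: sums a range of an int[] and returns the total.
class SumTask extends RecursiveTask<Long> {
    // Assumed cutoff; at or below this size the range is summed directly
    static final int THRESHOLD = 1000;

    private final int[] data;
    private final int start; // inclusive
    private final int end;   // exclusive

    SumTask(int[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = start + (end - start) / 2;
        SumTask left = new SumTask(data, start, mid);
        SumTask right = new SumTask(data, mid, end);
        left.fork();                     // schedule the left half asynchronously
        long rightSum = right.compute(); // work on the right half in this thread
        long leftSum = left.join();      // wait for the left half's result
        return leftSum + rightSum;       // combine the two partial results
    }

    // Convenience entry point: runs the task on a new pool and returns the total
    static long sum(int[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```

Forking one half and computing the other in the current thread keeps that thread busy instead of having it wait idly for both subtasks.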

Happy coding!
-EJB
