Using JDK 7's Fork/Join Framework
Java 7, which is due to be released within a matter of weeks, has many new features. In fact, it contains more new, big features than the previous Java SE version mainly because it's been so long since Java SE 6 was released. Some of the planned features even had to be deferred to JDK 8. Here's a summary of what's new:
- JSR-292: Support for dynamically typed languages. Languages like Ruby or Groovy can use the new invokedynamic instruction to run on the JVM with performance closer to that of native Java code
- JSR-334: Also called Project Coin, this includes a set of small enhancements to the Java language itself, such as strings in switch statements, the diamond operator, try-with-resources, and multi-catch
- Improved class loading
- JSR-166: The new Fork/Join framework for enhanced concurrency support
- Unicode 6.0 and other Internationalization improvements
- JSR-203: NIO.2, which includes better file system integration, better asynchronous support, multicast, and so on
- Windows Vista IPv6 support
- SDP, SCTP, and TLS 1.2 support
- JDBC 4.1
- Swing enhancements, Nimbus look-and-feel, enhanced platform window support, and new sound synthesizer
- Updated XML and Web Services stack
- Improved system and JVM reporting framework included with MBean enhancements
What got deferred to JDK 8? Here's a summary list:
- Modular support for the JVM (Project Jigsaw)
- Enhanced Java annotations
- Java Closures (Project Lambda)
- JSR-296: Swing Application Framework to eliminate boilerplate code
For a complete list of enhancements and new features, with full details, click here. For now, let's look at the new Fork/Join framework, and how it helps with Java concurrency.
What Is Fork/Join?
Fork/Join is an enhancement to the ExecutorService implementation that allows you to more easily break up processing to be executed concurrently, and recursively, with little effort on your part. It's based on the work of Doug Lea, a thought leader on Java concurrency, at SUNY Oswego. Fork/Join deals with the threading hassles; you just indicate to the framework which portions of the work can be broken apart and handled recursively. It employs a divide and conquer algorithm that works like this in pseudocode (as taken from Doug Lea's paper on the subject):
    Result doWork(Work work) {
        if (work is small) {
            process the work
        }
        else {
            split up work
            invoke framework to solve both parts
        }
    }
It's your job to determine the amount of work to process before splitting it up. If it's too granular, the overhead of the Fork/Join framework may hurt performance. But if it's just right, the advantage of parallelism will increase performance. For instance, the sample application we'll examine will look for XML files to process in a set of directories. If there are too many files, the code will use the Fork/Join framework to recursively break down the workload across multiple threads. Since XML file processing involves a combination of I/O and CPU work, this is a perfect use of Fork/Join.
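To make the threshold trade-off concrete, here's a minimal sketch that is separate from the sample application: a RecursiveAction that doubles every element of an array in place. The class name DoubleInPlace, the doubleAll() helper, and the THRESHOLD value of 1,000 are assumptions chosen for illustration; a sensible cutoff depends on how expensive each unit of work is and is best found by measuring.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example class: doubles each element of a long[] in place.
class DoubleInPlace extends RecursiveAction {
    // Assumed cutoff: ranges at or below this size are processed sequentially.
    static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int start; // inclusive
    private final int end;   // exclusive

    DoubleInPlace(long[] data, int start, int end) {
        this.data = data;
        this.start = start;
        this.end = end;
    }

    @Override
    protected void compute() {
        if (end - start <= THRESHOLD) {
            // Small enough: do the work directly, with no further task overhead.
            for (int i = start; i < end; i++) {
                data[i] *= 2;
            }
        } else {
            // Too large: split in half and let the framework run both halves.
            int mid = start + (end - start) / 2;
            invokeAll(new DoubleInPlace(data, start, mid),
                      new DoubleInPlace(data, mid, end));
        }
    }

    // Convenience entry point: runs the action on a fresh pool and returns the array.
    static long[] doubleAll(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new DoubleInPlace(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
        return data;
    }

    public static void main(String[] args) {
        long[] values = new long[10_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        doubleAll(values);
        System.out.println(values[0] + " " + values[1] + " " + values[9_999]); // prints "0 2 19998"
    }
}
```

Setting THRESHOLD to 1 would create a task per element, which is the "too granular" case; setting it to the array length would run everything in a single thread.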
The framework handles the threads based on available resources. It also employs a second algorithm called work stealing, where idle threads can steal work from busy threads to help spread the load around without spawning new threads. The same type of algorithm is often used in garbage collectors that use parallel worker threads to walk the heap.
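You can get a rough view of this behavior through ForkJoinPool's own accessors. The sketch below is only an illustration: the WorkStealingDemo class and its naive Fibonacci task are hypothetical and exist just to generate many small subtasks. getParallelism() reports the pool's target number of worker threads, and getStealCount() returns an estimate of how many tasks were taken from another thread's queue, so that number will vary from run to run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical demo class, not part of the sample application.
class WorkStealingDemo {

    // Deliberately naive Fibonacci task, used only to create lots of subtasks.
    static class Fib extends RecursiveTask<Integer> {
        private final int n;

        Fib(int n) {
            this.n = n;
        }

        @Override
        protected Integer compute() {
            if (n <= 1) {
                return n;
            }
            Fib left = new Fib(n - 1);
            left.fork();                          // queue it; an idle worker may steal it
            int right = new Fib(n - 2).compute(); // compute the other half in this thread
            return right + left.join();           // wait for (or run) the forked half
        }
    }

    static int fib(ForkJoinPool pool, int n) {
        return pool.invoke(new Fib(n));
    }

    public static void main(String[] args) {
        // The no-argument constructor sizes the pool to the available processors.
        ForkJoinPool pool = new ForkJoinPool();
        try {
            System.out.println("fib(20) = " + fib(pool, 20)); // prints "fib(20) = 6765"
            System.out.println("parallelism = " + pool.getParallelism());
            System.out.println("steal count = " + pool.getStealCount());
        } finally {
            pool.shutdown();
        }
    }
}
```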
Java 7 Fork/Join Processing Example
Let's explore a sample application that checks a set of work directories for new XML files. As the files are processed, they're moved out of the work directories and into a special "processed" directory. This sample is loosely based on a news processing system I worked on years ago, where news articles were written to the appropriate directories as they were published. Then, a worker process that periodically checked the directories would process the files, and make them available on a website.
The code below is the complete Fork/Join XML processing application (minus the actual XML processing details). The main class, XMLProcessingForkJoin, starts off the actual parsing of files within a directory periodically. It uses the ProcessXMLFiles class, which extends the Fork/Join framework's java.util.concurrent.RecursiveAction base class, to recursively split up and process all the files in the source directory.
    import java.io.File;
    import java.util.Arrays;
    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    public class XMLProcessingForkJoin {

        class ProcessXMLFiles extends RecursiveAction {
            static final int FILE_COUNT_THRESHOLD = 2;
            String sourceDirPath;
            String targetDirPath;
            File[] xmlFiles = null;

            public ProcessXMLFiles(String sourceDirPath, String targetDirPath, File[] xmlFiles) {
                this.sourceDirPath = sourceDirPath;
                this.targetDirPath = targetDirPath;
                this.xmlFiles = xmlFiles;
            }

            @Override
            protected void compute() {
                try {
                    // Make sure the directory has been scanned
                    if ( xmlFiles == null ) {
                        File sourceDir = new File(sourceDirPath);
                        if ( sourceDir.isDirectory() ) {
                            xmlFiles = sourceDir.listFiles();
                        }
                    }

                    // Nothing to do if the source is not a readable directory
                    if ( xmlFiles == null ) {
                        return;
                    }

                    // Check the number of files
                    if ( xmlFiles.length <= FILE_COUNT_THRESHOLD ) {
                        parseXMLFiles(xmlFiles);
                    }
                    else {
                        // Split the array of XML files into two roughly equal parts
                        int center = xmlFiles.length / 2;
                        File[] part1 = splitArray(xmlFiles, 0, center);
                        File[] part2 = splitArray(xmlFiles, center, xmlFiles.length);
                        invokeAll(
                            new ProcessXMLFiles(sourceDirPath, targetDirPath, part1),
                            new ProcessXMLFiles(sourceDirPath, targetDirPath, part2));
                    }
                }
                catch ( Exception e ) {
                    e.printStackTrace();
                }
            }

            // Returns a copy of array[start..end) as a File[]. Copying into an
            // Object[] and casting it to File[] would throw a ClassCastException.
            protected File[] splitArray(File[] array, int start, int end) {
                return Arrays.copyOfRange(array, start, end);
            }

            protected void parseXMLFiles(File[] filesToParse) {
                // Parse and copy the given set of XML files
                // ...
            }
        }

        public XMLProcessingForkJoin(String source, String target) {
            // Periodically invoke the following lines of code:
            ProcessXMLFiles process = new ProcessXMLFiles(source, target, null);
            ForkJoinPool pool = new ForkJoinPool();
            pool.invoke(process);
        }

        // Start the XML file parsing process with the Java SE 7 Fork/Join framework
        public static void main(String[] args) {
            if ( args.length < 2 ) {
                System.out.println("args - please specify source and target dirs");
                System.exit(-1);
            }
            String source = args[0];
            String target = args[1];
            XMLProcessingForkJoin forkJoinProcess =
                new XMLProcessingForkJoin(source, target);
        }
    }
Execution starts in the main class's constructor, XMLProcessingForkJoin, where a new ProcessXMLFiles object is created and handed off to the Fork/Join framework via a call to ForkJoinPool.invoke(). The framework then calls the object's compute() method. First, a check is made to populate the list of files within the directory. Next, if the number of files to process is at or below a threshold (two files in this case), the files are processed and we're done. Otherwise, the array of files is split into two parts, and two new Fork/Join tasks are created to process each sublist of files, and so on, recursively, until all the files are parsed and processed.
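The listing leaves the body of parseXMLFiles() out. As one hypothetical way to fill it in, the sketch below parses each file with the JDK's built-in DOM parser and then moves it into the target directory with NIO.2's Files.move(). The class name XmlFileMover, the processAndMove() method, and the choice to skip malformed files and leave them in place are all my assumptions for illustration, not the original system's behavior.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original application.
class XmlFileMover {

    // Parses each file and moves the well-formed ones into targetDir.
    // Returns the number of files moved.
    static int processAndMove(File[] filesToParse, File targetDir)
            throws ParserConfigurationException, IOException {
        // DocumentBuilder is not thread-safe, so each call creates its own.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Files.createDirectories(targetDir.toPath());

        int moved = 0;
        for (File file : filesToParse) {
            try {
                Document doc = builder.parse(file);
                // Application-specific processing of 'doc' would go here.
                doc.getDocumentElement().normalize();
            } catch (SAXException e) {
                // Assumption: malformed files are skipped and stay in the source directory.
                continue;
            }
            Path target = targetDir.toPath().resolve(file.getName());
            Files.move(file.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
            moved++;
        }
        return moved;
    }

    public static void main(String[] args) throws Exception {
        // Demo with temporary directories so the sketch runs on its own.
        Path source = Files.createTempDirectory("xml-source");
        Path target = Files.createTempDirectory("xml-processed");
        Files.write(source.resolve("article.xml"),
                "<article><title>Hello</title></article>".getBytes("UTF-8"));

        int moved = processAndMove(source.toFile().listFiles(), target.toFile());
        System.out.println("moved " + moved + " file(s) to " + target);
    }
}
```

Because each subtask would work on its own slice of the file array, no two tasks touch the same file, which is what makes this kind of processing safe to run in parallel.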
Since the code just parses XML files and returns nothing, I chose to extend RecursiveAction in this application. If your processing actually returns a result that needs to be combined with the results of other Fork/Join subtasks (e.g., sorting, compressing data, tallying numbers, and so on), then you can extend RecursiveTask instead. I'll take a closer look at this and other changes to the concurrent classes in Java SE 7 in a future blog.
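For comparison, here's a minimal sketch of the RecursiveTask variant, using the "tallying numbers" case. The SumTask class, its sum() helper, and the THRESHOLD of 1,000 are hypothetical names and values picked for this example. The shape is the same as before, except that compute() returns a value and the parent combines the results of its two halves.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class: sums a range of an int[] and returns the total.
class SumTask extends RecursiveTask<Long> {
    // Assumed cutoff: ranges at or below this size are summed sequentially.
    static final int THRESHOLD = 1_000;

    private final int[] numbers;
    private final int start; // inclusive
    private final int end;   // exclusive

    SumTask(int[] numbers, int start, int end) {
        this.numbers = numbers;
        this.start = start;
        this.end = end;
    }

    @Override
    protected Long compute() {
        if (end - start <= THRESHOLD) {
            long sum = 0;
            for (int i = start; i < end; i++) {
                sum += numbers[i];
            }
            return sum;
        }
        int mid = start + (end - start) / 2;
        SumTask left = new SumTask(numbers, start, mid);
        SumTask right = new SumTask(numbers, mid, end);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // run the right half in this thread
        long leftSum = left.join();      // wait for the left half's result
        return leftSum + rightSum;       // combine the two partial results
    }

    // Convenience entry point: sums the whole array on a fresh pool.
    static long sum(int[] numbers) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(numbers, 0, numbers.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] numbers = new int[10_000];
        for (int i = 0; i < numbers.length; i++) {
            numbers[i] = i + 1;
        }
        System.out.println(sum(numbers)); // prints 50005000
    }
}
```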
Happy coding!
-EJB