Java as a Scientific Programming Language
Java as a Scientific Programming Language (Part 1): More Issues for Scientific Programming in Java
By Ken Ritley
Java was designed as a modern, object-oriented programming language. Features such as platform independence, library management, and threads are important for modern scientific programs. These are good reasons for a scientist to choose Java.
But Java was developed without the specific needs of scientists in mind. This means that in some respects scientists must exercise a bit of creativity and patience.
For the scientist thinking about migrating to Java, there are many important issues to consider. What follows is a useful list of some of them.
Java, like C, does not support an intrinsic complex number type.
For those who may not know, a complex number (say, c) is simply an ordered pair of two numbers (say, a and b), which obeys some very simple rules of arithmetic.
The fundamental equations which describe almost everything in the universe — from weather systems to black holes to the way in which tiger populations depend on how many rabbits they eat — are based on complex numbers, not ordinary numbers. Scientific programs have to be able to handle complex numbers efficiently, and scientific programmers have to be able to code complex formulas easily.
It's actually straightforward to "patch" Java's lack of a complex number data type. One can use a class (let's call it Complex) with two member variables (say, realC and imagC), and separate public methods for all the needed arithmetic operations (addition, subtraction, etc.).
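A minimal sketch of such a class might look like the following. The names Complex, realC, and imagC come from the description above; the method names (add, subtract, multiply, divide) and the ComplexDemo class are illustrative assumptions, not part of any standard library.

```java
// Sketch of a complex-number "patch" class; immutable for simplicity.
final class Complex {
    final double realC; // real part
    final double imagC; // imaginary part

    Complex(double realC, double imagC) {
        this.realC = realC;
        this.imagC = imagC;
    }

    // (a + bi) + (c + di) = (a + c) + (b + d)i
    Complex add(Complex other) {
        return new Complex(realC + other.realC, imagC + other.imagC);
    }

    // (a + bi) - (c + di) = (a - c) + (b - d)i
    Complex subtract(Complex other) {
        return new Complex(realC - other.realC, imagC - other.imagC);
    }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    Complex multiply(Complex other) {
        return new Complex(realC * other.realC - imagC * other.imagC,
                           realC * other.imagC + imagC * other.realC);
    }

    // (a + bi)/(c + di) = ((ac + bd) + (bc - ad)i) / (c^2 + d^2)
    // Naive formula: it can overflow when the components are very large.
    Complex divide(Complex other) {
        double denom = other.realC * other.realC + other.imagC * other.imagC;
        return new Complex((realC * other.realC + imagC * other.imagC) / denom,
                           (imagC * other.realC - realC * other.imagC) / denom);
    }

    @Override
    public String toString() {
        return realC + (imagC < 0 ? " - " : " + ") + Math.abs(imagC) + "i";
    }
}

class ComplexDemo {
    public static void main(String[] args) {
        Complex c1 = new Complex(1.0, 2.0);
        Complex c2 = new Complex(3.0, -1.0);
        Complex c3 = c1.multiply(c2); // stands in for c3 = c1*c2
        System.out.println(c3);       // prints 5.0 + 5.0i
    }
}
```

Note that every arithmetic method returns a new object, which keeps the class simple to reason about but allocates on each operation.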
But there are a few problems with this. One is that such programs may be less efficient, because Java is forced to create a new object for every complex number. For loops with hundreds of thousands of iterations, an intrinsic complex data type would obviously be much faster.
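One common workaround, offered here only as a sketch and not as something the language prescribes, is to keep the real and imaginary parts in separate arrays of primitive doubles so that the inner loop allocates nothing. The class name ComplexArrays and the method sumOfProducts are hypothetical.

```java
class ComplexArrays {
    // Multiplies two arrays of complex numbers element by element and sums
    // the products, without creating any objects inside the loop.
    // Returns a two-element array: {real part, imaginary part} of the sum.
    static double[] sumOfProducts(double[] aRe, double[] aIm,
                                  double[] bRe, double[] bIm) {
        double sumRe = 0.0;
        double sumIm = 0.0;
        for (int i = 0; i < aRe.length; i++) {
            // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
            sumRe += aRe[i] * bRe[i] - aIm[i] * bIm[i];
            sumIm += aRe[i] * bIm[i] + aIm[i] * bRe[i];
        }
        return new double[] { sumRe, sumIm };
    }

    public static void main(String[] args) {
        // a = {1 + 2i, i}, b = {3 - i, i}
        double[] aRe = { 1.0, 0.0 };
        double[] aIm = { 2.0, 1.0 };
        double[] bRe = { 3.0, 0.0 };
        double[] bIm = { -1.0, 1.0 };
        double[] sum = sumOfProducts(aRe, aIm, bRe, bIm);
        System.out.println(sum[0] + " + " + sum[1] + "i"); // prints 4.0 + 5.0i
    }
}
```

The price is readability: the arithmetic is spelled out in terms of real and imaginary parts, which is exactly what a complex type is meant to hide.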
Further, instead of being able to code a simple complex number equation such as c3=c1*c2, Java forces the scientist to code it as, for example, c3=complexMultiply(c1,c2). The complex number "patch" class must also be distributed with each Java program, and depending on how these classes are implemented and defined, Java library subroutines may not easily integrate with one another.
This also breaks the Golden Rule of Programming, that well-written programs should be easily understandable by people. As we'll discuss in Part 2, in a scientific program it's perfectly acceptable and even desirable to use variable names such as e, m, and c, because that is exactly how scientists write E=mc^2! But when nice equations like c3=c1*c2 have to be written with notation like c3=complexMultiply(c1,c2), they become harder to debug and more prone to errors.
Just for fun, here's a snapshot of a scientific programmer's nightmare (see the sidebar "Scientific Programs Should be Understandable by Scientists").
|Scientific Programs Should be Understandable by Scientists|
Imagine a scientific program that may have DOZENS of formulas similar to the example below.
In Fortran, the equation looks like this:
vp = CSQRT( (1-v**2/c**2)/(1-v/c) )
But in Java, it might look like this:
vp = complexSqrt( complexDivide( complexSubtract(1, (complexDivide( complexPow(v,2), complexPow(c,2)))),
However, the Java version won't give the right results, because there's (intentionally) a typing error!
Can you find it? Do you want to debug a scientific program with DOZENS of these formulas?
But not to worry: The language C also lacks an intrinsic complex data type, and this hasn't stopped scientists from writing programs in this language! There are some standards for managing complex numbers, and there's even hope that the makers of Java will see fit to include complex numbers as a future extension of the language.
The Precision Problem
The same Java program will give the same results on all systems. But ... will they be the right results?
For business applications, Java's arithmetic and mathematics certainly offer enough accuracy.
But for some applications, the IEEE 754 standard is actually a constraint. For example, if a hardware platform offers more than the required precision and Java insists that numbers be rounded, then not only accuracy but also execution speed may be sacrificed. For those who are interested, here are some details of how Java handles numerics, more details, and even more details!
The prospective scientific Java programmer should rest easy on this issue: for the majority of the scientific community, IEEE 754 is enough and these "constraints" are likely to be unimportant.
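To make the rounding concrete, the short sketch below shows the usual consequence of 64-bit IEEE 754 arithmetic: decimal fractions are rounded to the nearest representable double, so results are better compared with a tolerance than with ==. The class name PrecisionDemo, the helper nearlyEqual, and the tolerance 1e-12 are illustrative assumptions.

```java
class PrecisionDemo {
    // Compares two doubles using a relative tolerance instead of ==.
    static boolean nearlyEqual(double a, double b, double relTol) {
        double scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;

        // Neither 0.1, 0.2, nor 0.3 is exactly representable in binary,
        // so the rounded sum differs from the rounded constant 0.3.
        System.out.println(sum == 0.3);                   // prints false
        System.out.println(nearlyEqual(sum, 0.3, 1e-12)); // prints true

        // Spacing between 1.0 and the next larger double (2^-52).
        System.out.println(Math.ulp(1.0));
    }
}
```

For most scientific work a relative error of this size is far below the experimental uncertainty, which is the sense in which these constraints are unlikely to matter.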
The IDEs of March
Wander the halls of any scientific research institution and you'll see the same problem played out in many offices: To write and run a Fortran or C program, a scientist will TELNET to a remote workstation (even a very slow machine!). That's where command-line compilers and ASCII text editors live, and those are the only programming tools a scientist may know. The program output is then FTP'd back to the PC, where it is plotted and analyzed, frequently not without trouble because of linefeed/carriage-return differences. The whole procedure is then repeated until the scientist's fingers start to bleed.
It's a very sad loss of productivity since, say, the late 1980s, when a good scientific laboratory might have run a single VMS-based DEC system, loaded with all the tools a scientist needed.
The problem, of course, is not that the productivity has dropped, merely the relative productivity. Many scientists trained before about 1990 have never taken the time to explore the new tools available to programmers. Today, new scientists grow up with PCs in the home and they learn about these new scientific programming tools in college. In fact, scientific courses at universities are changing as a result of these tools.
So ... older scientists take note: Modern programs are written by modern programmers using modern IDEs. An IDE (integrated development environment) is a shell which wraps the editor, compiler, debugger, and output window into one user-friendly package [please see Debugging in Java: Techniques for Bug Eradication]. "I don't have the time to learn a specific new gadget" is a common excuse among some scientists. But really, it's no excuse because IDEs are as universal as CD players. There's a box to put the source code in, buttons to click to compile, start, and stop it, and a little output window that shows what's playing.
Nevertheless, a scientist needs to exercise caution. IDEs bring with them a new set of problems, namely compatibility problems, because they may combine highly portable source code with nonportable library functions or files. For Java the situation is not so bad. Because all aspects of the Java language are standardized, graphics included, very little effort is required to write IDE-independent source code. We'll discuss this in more detail in Part 2.
The User-Interface Issue
Be they theoretical results from a numerical simulation, images from a microscope, or experimental data points with error bars, scientific data need to be seen to be understood. There are excellent Fortran and C compilers for every type of machine, and these languages are almost completely portable, except for graphics.
Because graphics and GUIs are an intrinsic part of Java, they also enjoy "write-once, run-anywhere" status. This includes basic tools for setting up GUIs with windows and boxes and buttons, as well as advanced tools for image processing (such as the Java Advanced Imaging JAI classes). And because of Java's popularity, the scientist new to Java will find the Internet full of free tools and source code for visualizing data, plotting equations, analyzing images, etc.
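As a rough illustration of how little is needed, the sketch below uses only the standard java.awt classes to draw data points with vertical error bars into an off-screen image. The class name ErrorBarPlot, the method plot, and the decision to take coordinates directly in pixels are assumptions made for brevity; a real program would scale physical units to pixels and add axes and labels.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class ErrorBarPlot {
    // Draws each point (x[i], y[i]) as a small square with a vertical
    // error bar of half-length err[i]. All values are pixel coordinates.
    static BufferedImage plot(int[] x, int[] y, int[] err, int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, width, height);   // white background
        g.setColor(Color.BLACK);
        for (int i = 0; i < x.length; i++) {
            g.drawLine(x[i], y[i] - err[i], x[i], y[i] + err[i]); // error bar
            g.fillRect(x[i] - 2, y[i] - 2, 5, 5);                 // point marker
        }
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        BufferedImage img = plot(new int[] { 20, 60 }, new int[] { 50, 30 },
                                 new int[] { 10, 5 }, 100, 100);
        System.out.println(img.getWidth() + " x " + img.getHeight());
    }
}
```

The same image could be shown in a window or written to a file with the standard library; those steps are omitted here to keep the sketch short.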
|The Hardware Problem in the Scientific Laboratory|
Scientists working at the European Synchrotron Radiation Facility (ESRF) in Grenoble, France, know the problems that mixed-hardware environments create.
The ESRF is one of over 30 multi-billion-dollar laboratories worldwide that provide scientists with access to an intense beam of x-rays. It's here where molecular biologists unravel the atomic structure of proteins, physicists explore the basic properties of matter, materials scientists synthesize new materials, and physicians design new methods to combat cancer.
Scientists from around the world travel here to perform experiments at the ESRF, which exclusively uses Unix-based workstations and software to control the experiments and collect the data. But upon returning home to analyze their data, many of these scientists either prefer or need to use tools only available on PCs. There are some solutions to help bridge the hardware gap, but until now scientists have always been forced to choose one platform for writing new software, then "damage-control" the consequences.
The Hardware Independence
Scientific laboratories are filled with both high-performance Unix-based workstations and PCs. The workstations offer speed and stability. PCs offer a more user-friendly work environment with plenty of tools for data analysis, as well as tools like spreadsheets, word processors, and presentation managers that scientists need to write reports and present their findings. As shown in our example (see the sidebar "The Hardware Problem in the Scientific Laboratory"), until now scientists have always been forced to choose one platform for writing new software, then "damage-control" the consequences.
The numerical standardization of Java provides the ultimate scientific solution to this hardware problem: from loop iteration (a problem in Fortran), to the choice of fdlibm mathematical functions, right down to the 64-bit IEEE 754 implementation of floating point arithmetic. This means "write-once, run-anywhere" scientific programs in Java will crunch numbers in exactly the same way on the workstation in the lab as the data are being collected, on the PC in the office, or even on a notebook computer during flights to scientific conferences!
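One way to check this claim in practice, sketched here under stated assumptions, is to compute a value with java.lang.StrictMath (whose documentation specifies results matching the fdlibm algorithms) and print its exact 64-bit pattern, then compare the printed string across machines. The class name ReproducibilityDemo, the helper bits, and the particular expression are hypothetical.

```java
class ReproducibilityDemo {
    // Returns the raw IEEE 754 bit pattern of a double as a hex string,
    // so two machines can be compared bit for bit rather than by
    // a rounded decimal printout.
    static String bits(double x) {
        return Long.toHexString(Double.doubleToLongBits(x));
    }

    public static void main(String[] args) {
        // An arbitrary expression built from StrictMath functions.
        double x = StrictMath.sin(1.0e6) + StrictMath.exp(0.5);

        // Run this on the workstation, the PC, and the notebook:
        // the printed strings are expected to be identical.
        System.out.println(bits(x));
    }
}
```

Comparing bit patterns rather than decimal output avoids being misled by formatting, since two different doubles can print identically when rounded to a few digits.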
So ... while Java may not have been designed with the scientist in mind, its powerful features make it an important if somewhat less-than-ideal platform for modern scientific programming. And with some creativity — and perhaps a bit of patience — any scientific programmer can take full advantage of what Java has to offer.