The JVM (Java Virtual Machine) is a central part of the Java platform.
The gap between human and machine
Before looking at how the JVM works, it helps to understand why Java, C, and C# are called high-level languages.
What these languages have in common is that they are human readable but not machine readable. Before a machine can run an application written in a high-level language, the source code must be converted into a format that the computer (CPU) can understand, no matter which language you use.
//People: it's easy, just multiplication between x and y.
int x = 1;
int y = 2;
int z;
z = x*y;
//Machine: sorry, I cannot understand your code.
Computers understand only one language: machine code. Machine code is a sequence of binary digits, each either 1 or 0. Because each CPU manufacturer decides what each bit pattern means, it is extremely important to convert the source code into the proper machine code for the target CPU. Otherwise the application cannot run as expected, because the CPU cannot interpret it correctly (I call this a compatibility issue).
Who can compensate the gap?
To bridge the gap between human and machine, in other words to convert high-level source code into machine code, we need a "translator". Such a translator is called a compiler.
As mentioned earlier, there are many kinds of CPU, so we need separate compilers for different hardware and operating systems. For example, C code has to be compiled with an Apple Macintosh compatible compiler in order to run on Apple computers; if users also want to run the same code on Windows on the Intel platform, they need another C compiler for Windows.
Simply put, a compiler converts a source code file (a plain text file) into an executable file that can be run on the host computer. In practice, the process is more complex than that.
Below is an example of how a C compiler works (for details, see http://www.codeproject.com/Articles/1825/The-Common-Language-Runtime-CLR-and-Java-Runtime-E#_interpreters):
What are interpreters?
Interpreters look similar to compilers but sit at the other extreme of running programming languages. Pure interpreters do not translate the program ahead of time the way compilers do.
An interpreter takes code written in a high-level language and executes it statement by statement, so a pure interpreter has little chance to optimize the code. Unlike a compiler, it also does not check the syntax of the whole program before running it.
Examples of pure interpreters are some scripting languages that interact with operating systems. Shell scripts in Linux, and batch files (.bat) and command files (.cmd) in Windows, are all examples of purely interpreted languages.
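To make the statement-by-statement behaviour concrete, here is a minimal sketch of a pure interpreter in Java. The toy language (lines of the form "set NAME INT" or "print NAME") and the class name PureInterpreterSketch are made up for this illustration. The point to notice is that the misspelled third line is only discovered when execution reaches it, after the earlier lines have already run.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy language: each line is either "set NAME INT" or "print NAME".
class PureInterpreterSketch {

    // Executes the script one line at a time and returns everything it printed.
    // Nothing is checked ahead of time: a malformed line is only discovered
    // when execution reaches it, after earlier lines have already run.
    static String run(List<String> script) {
        Map<String, Integer> variables = new HashMap<>();
        StringBuilder output = new StringBuilder();
        for (String line : script) {
            String[] parts = line.trim().split("\\s+");   // decode this statement
            if (parts.length == 3 && parts[0].equals("set")) {
                variables.put(parts[1], Integer.parseInt(parts[2]));
            } else if (parts.length == 2 && parts[0].equals("print")) {
                output.append(variables.get(parts[1])).append('\n');
            } else {
                output.append("error: cannot understand '").append(line).append("'\n");
                break;                                    // stop at the first bad line
            }
        }
        return output.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(Arrays.asList("set x 2", "print x", "prnt x", "print x")));
    }
}
```

Running it prints the value of x first and only then reports the bad line, which is the behaviour a compiler would have prevented by rejecting the whole script up front.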
The figure below shows how pure interpreters work:
What is the hybrid approach?
However, most popular modern languages are not purely interpreted; they are either compiled (like C and C++) or use a hybrid approach (like Java).
The figure below shows how the hybrid compiler-interpreter works:
As the diagrams above show, today's popular interpreted languages are not purely interpreted. They use compilation to produce an intermediate code (e.g. Microsoft's Intermediate Language - MSIL, Sun's Java byte code, etc.). It is this intermediate language that the interpreter works on, not the original high-level source code. This approach avoids many of the problems inherent in purely interpreted languages and gives many of the advantages of fully compiled languages.
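The hybrid idea can be sketched in a few lines of Java. This is only a toy: the three opcodes (PUSH, ADD, MUL), the left-to-right expression format and the class name HybridSketch are assumptions made for the illustration, and the "byte code" here is not real JVM byte code. The compile step runs once and checks the whole source; the execute step then works only on the intermediate code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy illustration only: the opcodes below are made up and are not real JVM bytecode.
class HybridSketch {
    static final int PUSH = 0, ADD = 1, MUL = 2;

    // "Compiler": runs once, ahead of time. It checks the whole source and
    // emits a flat list of ints (opcode, optional operand).
    // Source format (hypothetical): "1 * 2 + 3", evaluated left to right.
    static List<Integer> compile(String source) {
        String[] tokens = source.trim().split("\\s+");
        if (tokens.length % 2 == 0) {
            throw new IllegalArgumentException("expected: number (operator number)*");
        }
        List<Integer> code = new ArrayList<>();
        code.add(PUSH);
        code.add(Integer.parseInt(tokens[0]));
        for (int i = 1; i < tokens.length; i += 2) {
            code.add(PUSH);
            code.add(Integer.parseInt(tokens[i + 1]));
            switch (tokens[i]) {
                case "+": code.add(ADD); break;
                case "*": code.add(MUL); break;
                default: throw new IllegalArgumentException("unknown operator " + tokens[i]);
            }
        }
        return code;
    }

    // "Interpreter": works on the intermediate code, never on the source text.
    static int execute(List<Integer> code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int pc = 0; pc < code.size(); pc++) {
            int opcode = code.get(pc);
            if (opcode == PUSH) {
                stack.push(code.get(++pc));      // operand follows the opcode
            } else {
                int right = stack.pop();
                int left = stack.pop();
                stack.push(opcode == ADD ? left + right : left * right);
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        List<Integer> code = compile("1 * 2 + 3");
        System.out.println(code + " -> " + execute(code));
    }
}
```

Because an unknown operator is rejected in compile, a syntax error surfaces before anything runs, while execute stays a simple loop over small integer codes.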
The execution mechanism of compiled and interpreted language:
A compiler does this conversion off-line and in one go (as discussed in the "Who can compensate the gap?" section), whereas an interpreter does the conversion one program statement at a time.
A compiled program runs in a fetch-execute cycle, whereas an interpreted program runs in a decode-fetch-execute cycle. The decoding is done by the interpreter, whereas the fetch and execute operations are done by the CPU. In an interpreter the bottleneck is the decoding phase, and hence an interpreted program may be 30-100% slower than a compiled program.
Below are two figures that illustrate the flow of execution of compilers (first figure) and interpreters (second figure):
As the flowcharts above show, an interpreted program has the overhead of decoding each statement one by one; thus the bottleneck in an interpreted program is the decoding process.
Both compiled and interpreted approaches have their own advantages and disadvantages, which are discussed in the next section. Note that both approaches eventually convert the source code to machine language; only the process differs.
Compare and contrast compiled and interpreted languages (extremely important for linking the concepts of compiler and interpreter with the later sections, which discuss Java platform independence, the JIT compiler and the .NET IL compiler):
Languages can be developed as fully compiled, purely interpreted, or hybrid compiled-interpreted. As a matter of fact, most current programming languages have both compiled and interpreted versions available.
Both compiled and interpreted approaches have their advantages and disadvantages. Let's start with the compiled languages.
Compiled languages (Sample: C and C++)
- One of the biggest advantages of compiled languages is their execution speed. A program written in C/C++ can run 30-70% faster than an equivalent program written in Java.
- Compiled code also takes less memory as compared to an interpreted program.
- On the down side - a compiler is much more difficult to write than an interpreter.
- A compiler does not provide much help in debugging a program - how many times have you hit a null pointer error (typically a segmentation fault) in your C code and spent hours trying to figure out where in your source code it occurred? (Maybe this is why debugging C programs is such annoying work!)
- The executable compiled code is much bigger than equivalent interpreted code, e.g. a C/C++ .exe file is much bigger than an equivalent Java .class file.
- Compiled programs are targeted towards a particular platform and hence are platform dependent.
- Compiled programs do not allow security to be implemented within the code - e.g. a compiled program can access any area of the memory, and can do whatever it wants with your PC (most viruses are written in compiled languages).
- Due to loose security and platform dependence - a compiled language is not particularly suited to be used to develop Internet or web-based applications.
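For contrast with the memory-access point in the list above, here is a small Java sketch of what a managed runtime does instead. The class name BoundsCheckSketch is made up for the example; the behaviour shown (every array access is bounds-checked at run time) is standard Java.

```java
class BoundsCheckSketch {
    // Tries to read one element past the end of a small array.
    // Returns a description of what the runtime did.
    static String readPastEnd() {
        int[] data = {10, 20, 30};
        try {
            int value = data[3];            // index 3 does not exist
            return "read " + value;
        } catch (ArrayIndexOutOfBoundsException e) {
            return "blocked: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd());
    }
}
```

The equivalent out-of-bounds read in C is undefined behaviour and may silently return whatever happens to sit next to the array in memory.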
Interpreted languages
- Interpreted languages provide excellent debugging support. A Java programmer spends only a few minutes fixing a "Null pointer exception", because the Java runtime not only specifies the nature of the exception but also gives the exact line number and function call sequence (the famous stack trace) where the exception occurred. A traditionally compiled program does not provide this facility out of the box.
- Another advantage is that interpreters are much easier to build than compilers.
- One of the biggest advantages of Interpreters is that they make platform-independence possible.
- Interpreted languages also allow a high degree of security - something badly needed for an Internet application.
- Intermediate language code is much smaller than compiled executable code.
- Platform independence, and tight security are the two most important factors that make an interpreted language ideally suited for Internet and web-based applications.
- Interpreted languages have some serious drawbacks. Interpreted applications take up more memory and CPU resources, because in order to run a program written in an interpreted language, the corresponding interpreter must be run first. Interpreters are sophisticated, intelligent and resource-hungry programs, and they take up a lot of CPU cycles and RAM.
- Due to the decode-fetch-execute cycle of interpreted applications, they are much slower than compiled programs.
- Interpreters also do a lot of code optimization and security checking at run time; these extra steps take up even more resources and further slow the application down.
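The stack trace mentioned in the first bullet of this list can be seen with a few lines of Java. This is a minimal sketch; the class and method names (StackTraceSketch, lengthOf, whereItFailed) are invented for the example, while getStackTrace and printStackTrace are standard methods of Throwable.

```java
class StackTraceSketch {
    static int lengthOf(String text) {
        return text.length();               // throws NullPointerException when text is null
    }

    // Triggers the exception on purpose and reports where it was thrown.
    static String whereItFailed() {
        try {
            lengthOf(null);
            return "no exception";
        } catch (NullPointerException e) {
            StackTraceElement top = e.getStackTrace()[0];   // innermost frame
            return top.getClassName() + "." + top.getMethodName();
        }
    }

    public static void main(String[] args) {
        System.out.println("failed in " + whereItFailed());
        try {
            lengthOf(null);
        } catch (NullPointerException e) {
            e.printStackTrace();            // full call sequence with line numbers
        }
    }
}
```

The printed trace lists each method on the call path together with its source line, which is the information the bullet refers to.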
Platform dependence issues for compiled languages:
As explained above, after the compiler compiles the source code to .obj code, a linker converts it to executable code. Both the .obj and the executable code are machine/platform dependent.
In brief, C and C++ programs are platform dependent, which is one of their shortcomings.
How about Java?
To develop a Java application, there is one package you must have installed on your computer: the JDK (Java Development Kit). Like the SDK (Software Development Kit) of other languages, the JDK is a comprehensive set of software that includes all the bits and pieces required for developing Java applications.
JDK includes:
- JVM (Java Virtual Machine)
- JRE (Java Runtime Environment) - Note that JVM is actually a part of JRE.
- Java packages and framework classes
- javac (compiler)
- Java debugger
After completing the application, the programmer uses the compiler to compile the source code (.java) and produce the class file (.class). The class file is an intermediate Java byte code file.
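A class file is an ordinary binary file with a fixed layout, and every class file begins with the same four bytes, 0xCAFEBABE. The sketch below, with the made-up class name ClassFileHeaderSketch, reads those bytes from the class file of java.lang.Object, which ships with every Java runtime. It assumes the runtime exposes that class file as a resource, which is the usual case.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeaderSketch {
    // Reads the first four bytes of the compiled class file for java.lang.Object.
    static int readMagic() {
        try (InputStream in = ClassLoader.getSystemResourceAsStream("java/lang/Object.class")) {
            if (in == null) {
                throw new IllegalStateException("class file not found");
            }
            return new DataInputStream(in).readInt();   // class files are big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(readMagic()));   // prints cafebabe
    }
}
```

The header is the same whichever operating system or CPU the class file was compiled on.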
The byte code file is the key, because it is machine-independent intermediate code that can be executed on any computer with the JRE installed.
What makes Java platform independent is the UBIQUITY of the JRE. JREs are available for most commercial and popular platforms. Programmers complete the code once and the same program will run on any platform.
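As a small illustration, the sketch below (class name WhereAmISketch is made up) can be compiled once and the resulting .class file copied to machines with different operating systems. The standard system properties java.vm.name, os.name and os.arch let the same byte code report which JVM and platform it is currently running on.

```java
class WhereAmISketch {
    // Describes the platform this JVM is running on, using standard system properties.
    static String describe() {
        return System.getProperty("java.vm.name")
                + " on " + System.getProperty("os.name")
                + " (" + System.getProperty("os.arch") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The output differs from machine to machine even though the class file is byte-for-byte identical; only the JRE underneath changes.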
Note that the JDK must be compatible with the platform, which means that different platforms need different JDKs installed. See the figure below:
What is JVM? (extract from web, see the resource at reference section)
Before I discuss the JVM in detail, let me clarify a few related terms.
- Java Development Kit (JDK): This includes ALL the basic Java framework packages, a compiler (javac), the JRE, a JVM, a debugger, etc. In short, all you need to develop, debug, compile and run your Java program.
- Java Runtime Environment (JRE): This is a subset of the JDK. It does not include development tools such as the debugger and compiler. It includes the bare minimum that a computer needs in order to run a .class file (mainly the JVM and the essential class libraries).
- Java Virtual Machine (JVM): The JVM is a part of the JRE. The .class file is passed to the JVM, which then runs the program. The JRE ensures that the code does not violate any of the security restrictions. Remember that the byte code (.class file) is not run directly on the host machine; it needs to be converted to the host machine's language. This conversion is done by the JVM. While converting, the JVM enforces security and may also optimize the code. There are many commercial JVMs on the market, with different capabilities and varying degrees of performance. To produce efficient code with minimum delay, a JVM needs a great amount of intelligence built into it, which also makes the JVM larger. Remember that for a Java program to run, the JVM must be loaded in memory, and a large JVM needs more computer resources than a compact one. So there has to be a fine balance between the size of a JVM and its capabilities. This is one reason why a Java program can be 30-70% slower than an equivalent C++ program.
The initial JVMs were extremely slow and resource hungry, because they actually interpreted the byte code. In recent years many efficient JVMs have surfaced. These JVMs use different compilation techniques to produce efficient machine code in as little time as possible. One such technique is called Just-In-Time (JIT) compilation (available since Java 1.1). This technique is also used in .NET.
Just In Time Compilation (JIT):
JIT compilation is neither traditional (ahead-of-time) compilation nor pure interpretation. It is a compiler, but it works like an interpreter: a hybrid beast! See below:
1. The JIT does not work before the program executes but along with it (running alongside the program makes it look like an interpreter rather than a compiler, but it is still not interpreting). DO NOT THINK that the bytecode has already been translated into native machine code before you run the program! That is WRONG! The bytecode is processed by the JVM (more precisely, by the JIT) only once you start running the program (after all, the JIT is part of the runtime environment).
2. Even once the program is running, the JIT does not compile all the bytecode. It contains sophisticated logic to decide when to compile which part of the bytecode. This is why the approach is named Just-In-Time compilation.
The name "HotSpot" of the Sun (Oracle) JVM was chosen because of this virtual machine's ability to find "hot" spots in code.
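One rough way to watch this happen is to time the same work in several rounds and then ask the running JVM about its compiler. The sketch below is only an informal experiment, not a benchmark: the class name JitSketch and the loop sizes are arbitrary choices, and later rounds are often but not always faster. CompilationMXBean is part of the standard java.lang.management API, and on HotSpot the -XX:+PrintCompilation flag can additionally list methods as they are compiled.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitSketch {
    // A small method that becomes "hot" when called many times.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // Time the same work in several rounds; later rounds are often faster
        // once the JIT has compiled the hot method, but this is not guaranteed.
        for (int round = 1; round <= 5; round++) {
            long start = System.nanoTime();
            long checksum = 0;
            for (int call = 0; call < 20_000; call++) {
                checksum += sumOfSquares(1_000);
            }
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us (checksum " + checksum + ")");
        }

        // Ask the running JVM about its JIT compiler, if it reports one.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " spent " + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```

The checksum is printed so that the loop result is actually used; otherwise the JIT would be free to remove the work altogether.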
What optimizations does the JIT perform?
Let's look more closely at some of the optimizations done by the JIT.
- Inline methods - instead of calling a method on an instance of an object, it copies the method body into the caller's code. Hot methods should be located as close to the caller as possible to avoid call overhead.
- Eliminate locks if the monitor is not reachable from other threads.
- Replace interface calls with direct method calls for methods implemented only once, to eliminate virtual call overhead.
- Join adjacent synchronized blocks on the same object.
- Eliminate dead code.
- Drop memory writes for non-volatile variables.
- Remove pre-checks for NullPointerException and IndexOutOfBoundsException.
- Et cetera.
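The following Java sketch shows the kind of source code that gives a JIT room for several of the optimizations listed above. It is only an illustration of code shapes: the class name JitCandidatesSketch is invented, and whether a particular JVM actually inlines the accessor, removes the lock or drops the unused computation depends on the JVM and its settings. The program's result is the same either way.

```java
class JitCandidatesSketch {
    private final int value;

    JitCandidatesSketch(int value) {
        this.value = value;
    }

    // Tiny accessor: a typical candidate for inlining into its callers.
    int getValue() {
        return value;
    }

    static int total(JitCandidatesSketch[] items) {
        Object localLock = new Object();     // never visible to another thread
        int sum = 0;
        for (JitCandidatesSketch item : items) {
            // The lock below cannot be contended, so a JIT that proves the
            // lock object does not escape may drop the locking entirely.
            synchronized (localLock) {
                sum += item.getValue();
            }
            int unused = sum * 2;            // result never read: dead code
        }
        return sum;
    }

    public static void main(String[] args) {
        JitCandidatesSketch[] items = {
            new JitCandidatesSketch(1), new JitCandidatesSketch(2), new JitCandidatesSketch(3)
        };
        System.out.println(total(items));    // prints 6
    }
}
```

These optimizations are invisible to the program itself: they change how fast the code runs, never what it computes.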
Below is another description of JIT, from Wikipedia:
Just-in-time (JIT) compilers promise to improve the performance of Java applications. Rather than letting the JVM run byte code, a JIT compiler translates code into the host machine's native language. Thus, applications gain the performance enhancement of compiled code while maintaining Java's portability.
Although the JIT compiler greatly improves a program's execution speed (compared with the initial purely interpreted process), it involves the overhead of converting the byte code to native code at run time. For this reason, despite the JIT, Java programs can still be slower than an equivalent C/C++ program.
A Java Applet is a special Java program that is only allowed to run inside a browser window. When you embed a Java Applet in your web page, the browser sees the Applet tag and downloads the byte code (the .class file) for the applet from the specified location. Once the byte code is downloaded, the browser uses the JVM (included in the browser itself) to run the Applet, ensuring that the Applet does not execute any insecure APIs - mainly the APIs that access the client machine hardware.
Given the concept of the JVM, it is clear that any programming language that compiles into Java byte code can use the JVM to run its programs. We are all aware of how Java code (.java) is converted into byte code (.class), which is then run by the JVM on the host machine. What if we wrote a C++ compiler that converts a C++ source file (.cpp) into a Java byte code file (.class) rather than into an .obj file? Theoretically it is possible; whether it is practical is a different issue altogether. In fact, many languages have compilers that produce Java byte code that can then be run by the JVM, for example Groovy. This undercuts Microsoft's claim that the CLR is the only platform to support language agnosticism. The JVM can also be (and in fact already is) used by different languages.
The figure below illustrates briefly how the JVM works:
TO BE CONTINUED (the CLR part is not covered in this article; see my other article, "c# stuff which you must know", which discusses the CLR in particular)!
Conclusion:
Notice:
This article is adapted from "The Common Language Runtime (CLR) and Java Runtime Environment (JRE)", written by Kashif Manzoor.