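How the Java Virtual Machine Works

Introduction

In high-level programming languages such as C and C++, we write a program in a human-readable format, and a program called a compiler translates it into a binary format called executable code, which the machine can understand and execute. Executable code depends on the computer we use to run the program; it is machine dependent. In Java, the process of writing and executing a program is very similar, with one important difference: it lets us write machine-independent code.

Using a compiler, every Java program is compiled to an intermediate-level code called bytecode. We can run the compiled bytecode on any computer that has a Java runtime environment installed. The Java runtime environment consists of the Java Virtual Machine and its supporting code.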

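The JVM Is an Emulation

The tricky part of creating Java bytecode is that the source code is compiled for a machine that does not exist. This machine is called the Java Virtual Machine, and it exists only in our computer's memory. Having the Java compiler create bytecode for a nonexistent machine is only half of the ingenious process that makes the Java architecture neutral (machine independent). The Java interpreter must also make our computer and the bytecode file believe they are running on a real machine. It does this by acting as the intermediary between the virtual machine and our real machine. (See the figure below.)

Figure 1 - The JVM emulation runs on a physical machine

The Java Virtual Machine is responsible for interpreting Java bytecode and translating it into actions or operating system calls. For example, a request to establish a socket connection to a remote machine involves an operating system call. Different operating systems handle sockets in different ways, but the programmer does not need to worry about such details. Handling these translations is the JVM's job, so the operating system and CPU architecture on which the Java software runs do not affect the developer's code. (See the figure below.)

Figure 2 - The JVM handles the translations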
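The socket example above applies to any OS service. The sketch below swaps in file I/O because it needs no network: the same compiled class writes and reads a temporary file on any operating system, and the JVM and its class library turn those calls into the platform's own system calls. The class name PortableIo and the file prefix are illustrative choices, not part of the original article.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// The same bytecode runs unchanged on Windows, Linux or macOS; the JVM and
// its class library map these calls onto each OS's own system calls.
final class PortableIo {
    // Writes text to a new temporary file, reads it back, then deletes the file.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("jvm-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // These properties differ per platform; the program itself does not.
        System.out.println("os.name = " + System.getProperty("os.name"));
        System.out.println("os.arch = " + System.getProperty("os.arch"));
        System.out.println(roundTrip("Hello world!")); // Hello world!
    }
}
```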

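The Basic Parts of the Java Virtual Machine

Creating a virtual machine in our computer's memory requires building every major function of a real computer, down to the environment in which programs run. These functions can be divided into seven basic parts:

·         A set of registers

·         A stack

·         An execution environment

·         A garbage-collected heap

·         A constant pool

·         A method storage area

·         An instruction set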

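Registers

The registers of the Java Virtual Machine are similar to the registers in our computer. However, because the virtual machine is stack based, its registers are not used for passing or receiving arguments. In Java, the registers hold the machine's state and are updated after each bytecode instruction executes. The following four registers hold the state of the virtual machine:

·         frame (the frame register): contains a pointer to the execution environment of the current method.

·         optop (the operand-top register): contains a pointer to the top of the operand stack and is used to evaluate arithmetic expressions.

·         pc (the program counter): contains the address of the next bytecode instruction to execute.

·         vars (the variable register): contains a pointer to the method's local variables.

All these registers are 32 bits wide and can be allocated immediately. This is possible because the compiler knows the sizes of the local variables and the operand stack, and the interpreter knows the size of the execution environment. (This four-register model describes the original JVM design; the current Java Virtual Machine Specification defines only a per-thread pc register and describes frames, local variables, and operand stacks abstractly rather than as registers.)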
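To make the four registers concrete, here is a toy model; it is a sketch under stated assumptions, not how any real JVM is built. It runs the bytecode for c = a + b (iload_0, iload_1, iadd, istore_2, return, using the real JVM opcode values) and records pc and the operand-stack depth after each instruction. The class RegisterMachine, its flat memory array, and treating optop as "next free slot" are hypothetical modelling choices.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not a real JVM) of the pc, optop, vars and frame registers.
final class RegisterMachine {
    // Real JVM opcode values for the few instructions this sketch supports.
    static final int ILOAD_0 = 0x1a, ILOAD_1 = 0x1b, IADD = 0x60, ISTORE_2 = 0x3d, RETURN = 0xb1;

    final int[] memory = new int[16]; // 32-bit slots holding locals and the operand stack
    int frame;  // start of the current frame (this sketch has only one frame)
    int vars;   // start of the current method's local variables
    int optop;  // next free slot on the operand stack
    int pc;     // offset of the next instruction to execute
    final List<String> trace = new ArrayList<>();

    // Runs code in a frame with maxLocals local-variable slots; locals 0 and 1 are a and b.
    void run(byte[] code, int maxLocals, int a, int b) {
        frame = 0;
        vars = frame;                // locals sit at the start of the frame
        optop = vars + maxLocals;    // the operand stack begins right after them
        int stackBase = optop;
        memory[vars] = a;
        memory[vars + 1] = b;
        pc = 0;
        while (true) {
            int opcode = code[pc++] & 0xff; // fetch the 1-byte opcode and advance pc
            switch (opcode) {
                case ILOAD_0: memory[optop++] = memory[vars]; break;
                case ILOAD_1: memory[optop++] = memory[vars + 1]; break;
                case IADD: {
                    int value2 = memory[--optop];
                    int value1 = memory[--optop];
                    memory[optop++] = value1 + value2;
                    break;
                }
                case ISTORE_2: memory[vars + 2] = memory[--optop]; break;
                case RETURN: return;
                default: throw new IllegalStateException("unsupported opcode " + opcode);
            }
            // The register state after every instruction.
            trace.add("pc=" + pc + " depth=" + (optop - stackBase));
        }
    }

    public static void main(String[] args) {
        RegisterMachine m = new RegisterMachine();
        byte[] code = {ILOAD_0, ILOAD_1, IADD, ISTORE_2, (byte) RETURN};
        m.run(code, 3, 7, 5);
        m.trace.forEach(System.out::println);
        System.out.println("c = " + m.memory[m.vars + 2]); // c = 12
    }
}
```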

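The Stack

The Java Virtual Machine uses an operand stack to supply parameters to methods and operations and to receive their results. Most bytecode instructions take their operands from the stack, operate on them, and push the results back onto the stack. Like the virtual machine's registers, each operand stack slot is 32 bits wide.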

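The operand stack is last-in, first-out (LIFO) and expects the operands on it to be in a specific order. For example, the isub instruction expects two integers on top of the stack, which means the preceding instructions must have pushed them there. isub pops both operands, subtracts the top value from the one beneath it, and pushes the result back onto the stack.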
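Because subtraction is not commutative, the order in which isub takes its operands matters. A minimal sketch of the stack traffic for r = a - b, modelling the operand stack with an ArrayDeque (a modelling choice; IsubDemo and subtract are hypothetical names):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Models the operand-stack steps javac emits for `int r = a - b;`.
final class IsubDemo {
    static int subtract(int a, int b) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a);               // iload a: pushed first, ends up below the top
        stack.push(b);               // iload b: now on top of the stack
        int value2 = stack.pop();    // isub pops the top value (b) ...
        int value1 = stack.pop();    // ... then the one beneath it (a)
        stack.push(value1 - value2); // and pushes value1 - value2 (int arithmetic wraps)
        return stack.pop();          // istore or ireturn would consume the result
    }

    public static void main(String[] args) {
        System.out.println(subtract(10, 3)); // 7
    }
}
```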

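In Java, integers are a primitive data type, and each primitive type has its own instructions for operating on operands of that type. For example, lsub performs long-integer subtraction, fsub performs single-precision floating-point subtraction, and dsub performs double-precision floating-point subtraction. Because of this, it is illegal to push two integers onto the stack and then treat them as a single long integer. It is legal, however, to push a 64-bit long integer onto the stack, where it occupies two 32-bit slots.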
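The sketch below models how a 64-bit long occupies two 32-bit slots and how lsub consumes four slots and pushes two. Storing the high word before the low word is an assumption of this model; the JVM Specification leaves the physical layout to each implementation. The model also does not track types, whereas a real JVM's bytecode verifier rejects code that treats two ints as one long. TwoSlotLong is a hypothetical name.

```java
// Toy operand stack of 32-bit slots showing how a 64-bit long takes two slots.
// Storing the high word first is this model's choice, not a JVM requirement.
final class TwoSlotLong {
    private final int[] slots = new int[8];
    private int optop; // index of the next free slot

    void pushLong(long value) {
        slots[optop++] = (int) (value >>> 32); // high 32 bits
        slots[optop++] = (int) value;          // low 32 bits
    }

    long popLong() {
        long low = slots[--optop] & 0xFFFFFFFFL; // mask so the low word is not sign-extended
        long high = slots[--optop];
        return (high << 32) | low;
    }

    int depth() {
        return optop;
    }

    // lsub: pops two longs (four slots) and pushes value1 - value2 (two slots).
    void lsub() {
        long value2 = popLong();
        long value1 = popLong();
        pushLong(value1 - value2);
    }

    public static void main(String[] args) {
        TwoSlotLong stack = new TwoSlotLong();
        stack.pushLong(5_000_000_000L);      // too large for a single 32-bit slot
        stack.pushLong(1L);
        System.out.println(stack.depth());   // 4
        stack.lsub();
        System.out.println(stack.depth());   // 2
        System.out.println(stack.popLong()); // 4999999999
    }
}
```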

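Each method in our Java program has a stack frame associated with it. The stack frame holds the method's state in three sets of data: the method's local variables, its execution environment, and its operand stack. The sizes of the local-variable area and the execution-environment data are fixed when the method is called, whereas the current depth of the operand stack changes as the method's bytecode instructions execute (its maximum depth, however, is computed by the compiler). Because the Java stack is 32 bits wide, 64-bit values are not guaranteed to be 64-bit aligned.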
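Each method invocation gets its own frame on the thread's Java stack. Java code cannot inspect a frame's contents directly, but on Java 9 and later the StackWalker API lists the frames currently on the stack. A short sketch (FrameDemo, outer, and inner are illustrative names):

```java
import java.util.List;
import java.util.stream.Collectors;

// Lists the method names of the frames on the current thread's Java stack,
// innermost (most recently pushed) first. Requires Java 9 or later.
final class FrameDemo {
    static List<String> outer() {
        return inner(); // pushes a frame for inner() on top of outer()'s frame
    }

    static List<String> inner() {
        // The walk starts at the caller of walk(), i.e. this method's frame.
        return StackWalker.getInstance().walk(frames -> frames
                .map(StackWalker.StackFrame::getMethodName)
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(outer());
    }
}
```

Called from main, outer() returns a list that starts with inner, outer, main; inner's frame is on top because it was pushed last. Any frames below main depend on how the program was launched.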

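The Execution Environment

The execution environment is maintained within the stack as a data set and is used to handle dynamic linking, normal method returns, and exception generation. To handle dynamic linking, the execution environment contains symbolic references to the methods and variables of the current method and current class. These symbolic references are translated into actual method calls through dynamic linking against a symbol table.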
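Dynamic linking turns a symbolic reference (a class name, a method name, and a type descriptor) into something that can actually be invoked. The sketch below uses java.lang.invoke only as an analogy: it is not how a JVM resolves invokevirtual internally, but a MethodHandle lookup performs the same kind of name-plus-type resolution and fails when the symbol does not exist. SymbolicLink, resolve, and lengthViaHandle are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Analogy for dynamic linking: resolve a symbolic reference (class name,
// method name, method type) to an invocable MethodHandle at run time.
final class SymbolicLink {
    static MethodHandle resolve(String className, String methodName, MethodType type) {
        try {
            Class<?> owner = Class.forName(className);          // resolve the class name
            return MethodHandles.publicLookup()
                    .findVirtual(owner, methodName, type);      // resolve name + type
        } catch (ReflectiveOperationException e) {
            // Comparable to a linkage failure: the symbol does not resolve.
            throw new IllegalStateException("cannot link " + className + "." + methodName + type, e);
        }
    }

    // Resolves String.length (symbolically java/lang/String.length:()I) and calls it.
    static int lengthViaHandle(String s) {
        MethodHandle length = resolve("java.lang.String", "length", MethodType.methodType(int.class));
        try {
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("Hello world!")); // 12
    }
}
```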

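Whenever a method completes normally, control passes back to the calling method, together with the return value if there is one. The execution environment handles a normal return by restoring the caller's registers and incrementing the caller's program counter past the method call instruction. Execution then continues in the calling method's execution environment.

A method completes normally by executing the return instruction that matches its return type, for example ireturn for an int result, areturn for a reference, or return for a void method.

If a method's return instruction does not match its return type, the method cannot complete normally; current JVMs reject such code during bytecode verification with a VerifyError. Other errors include dynamic linkage failures, such as a class file that cannot be found, and runtime errors, such as a reference outside the bounds of an array. When an error occurs, the execution environment generates an exception.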
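A method completes either normally, through the return instruction that matches its type, or abruptly, by throwing. This sketch shows both outcomes plus a failed class lookup. One assumption to note: Class.forName reports a missing class as ClassNotFoundException, while a missing class referenced directly from compiled code surfaces as NoClassDefFoundError during linking. Completion and its methods are hypothetical names.

```java
// Sketch of normal versus abrupt method completion.
final class Completion {
    // javac compiles this body to aload_0, iload_1, iaload, ireturn.
    static int elementAt(int[] array, int index) {
        return array[index];
    }

    static String describe(int[] array, int index) {
        try {
            return "returned " + elementAt(array, index);  // normal completion
        } catch (ArrayIndexOutOfBoundsException e) {        // runtime error in the callee
            return "threw ArrayIndexOutOfBoundsException";
        }
    }

    static String tryLoad(String className) {
        try {
            Class.forName(className);
            return "loaded " + className;
        } catch (ClassNotFoundException e) {                // class file could not be found
            return "threw ClassNotFoundException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new int[] {4, 5}, 1)); // returned 5
        System.out.println(describe(new int[] {4, 5}, 2)); // threw ArrayIndexOutOfBoundsException
        System.out.println(tryLoad("java.lang.String"));   // loaded java.lang.String
        System.out.println(tryLoad("no.such.Missing"));    // threw ClassNotFoundException
    }
}
```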

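The Garbage-Collected Heap

Every program running in the Java runtime environment is assigned a garbage-collected heap. Because instances of classes are allocated from this heap, it is also called the memory allocation pool. In early JVMs the default heap size on most systems was 1MB; modern JVMs size the heap according to the machine's physical memory, and the initial and maximum sizes can be set with the -Xms and -Xmx options.

Although the heap is given a specific size when the program starts, it can grow, for example when new objects are allocated. To keep the heap from growing too large, objects that are no longer in use are automatically deallocated, or garbage-collected, by the Java Virtual Machine.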
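java.lang.Runtime reports the current heap size (totalMemory), how much of it is free (freeMemory), and the upper limit it may grow to (maxMemory). The sketch below prints these figures and then allocates short-lived arrays that immediately become garbage; running it with different -Xms and -Xmx values shows the starting size and ceiling change. The numbers depend on the JVM and machine, so no output is given, and HeapInfo is a hypothetical name.

```java
// Reports heap figures through java.lang.Runtime (values in bytes).
final class HeapInfo {
    // {free, total (current heap size), max (limit the heap may grow to)}
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        return new long[] {rt.freeMemory(), rt.totalMemory(), rt.maxMemory()};
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        long[] s = snapshot();
        System.out.println("free  MB: " + s[0] / mb);
        System.out.println("total MB: " + s[1] / mb);
        System.out.println("max   MB: " + s[2] / mb); // Long.MAX_VALUE if there is no limit
        // Each chunk becomes unreachable after its iteration, so the collector
        // can reclaim it instead of the heap growing without bound.
        long allocated = 0;
        for (int i = 0; i < 1_000; i++) {
            byte[] chunk = new byte[64 * 1024];
            allocated += chunk.length;
        }
        System.out.println("allocated MB: " + allocated / mb
                + ", heap now MB: " + Runtime.getRuntime().totalMemory() / mb);
    }
}
```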

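Java performs automatic garbage collection in a background thread. In the JVM described here, each thread running in the Java runtime environment has two stacks: one for Java code and one for C (native) code. The memory for these stacks comes from the total system memory pool. Whenever a new thread starts, it is assigned a maximum stack size for Java code and for C code. In early JDKs the defaults on most systems were 400KB for the Java code stack and 128KB for the C code stack (set with the -oss and -ss options); current JVMs set the thread stack size with -Xss.

If our system has limited memory, we can make Java clean up more aggressively and use less memory overall by lowering the maximum heap size (-Xmx, or -mx in early JDKs), which makes garbage collection run more often; smaller thread stacks (-Xss) also reduce the memory each thread reserves. If our system has plenty of memory, a larger heap lets the collector run less often, which reduces background processing.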
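A thread's stack size can also be requested in code. This sketch, a hypothetical probe rather than a tuning recipe, starts a thread with a given stackSize through Thread's four-argument constructor and measures how deep a recursive call gets before StackOverflowError. The Thread documentation allows a JVM to treat stackSize as a hint or ignore it, so only a finite, positive depth is guaranteed. StackDepth and maxDepth are hypothetical names.

```java
// Measures how deep recursion gets on a thread started with a requested stack size.
final class StackDepth {
    private static final class Probe implements Runnable {
        int depth; // frames pushed before the stack overflowed

        private void recurse() {
            depth++;
            recurse();
        }

        @Override
        public void run() {
            try {
                recurse();
            } catch (StackOverflowError e) {
                // Expected: this thread's stack is exhausted; depth holds the count.
            }
        }
    }

    static int maxDepth(long stackSizeBytes) {
        Probe probe = new Probe();
        Thread thread = new Thread(null, probe, "stack-probe", stackSizeBytes);
        thread.start();
        try {
            thread.join(); // join() also makes probe.depth visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("256KB stack: depth " + maxDepth(256L * 1024));
        System.out.println("4MB stack:   depth " + maxDepth(4L * 1024 * 1024));
    }
}
```

On JVMs that honour the request, the larger stack usually allows much deeper recursion.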

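The Constant Pool

Each class in the heap has a constant pool associated with it. Because constants do not change, they are usually created at compile time. Items in the constant pool encode the names used by any method in a particular class, along with literal values such as strings and numbers. The class records how many constants exist and an offset specifying where the list of constants begins within the class description.

All information associated with a constant follows a specific format based on the constant's type. For example, class-level constants represent a class or an interface and have the following format: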

CONSTANT_Class_info {
    u1   tag;
    u2 name_index;
}

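where tag holds the value CONSTANT_Class (7), and name_index is an index into the constant pool whose entry (a CONSTANT_Utf8_info) holds the class name. The class name for int[][] is [[I, and the class name for Thread[] is [Ljava/lang/Thread; (Class.getName() reports the same name with dots, as [Ljava.lang.Thread;).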
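This layout can be read back with a few lines of DataInputStream code. The minimal sketch below walks a constant pool, skipping entries by tag, and resolves every CONSTANT_Class_info through its name_index. Entry sizes follow the class-file format; the sample bytes are a hand-built header plus a two-entry pool, not a complete class file, and ConstantPoolReader, classNames, and sampleClassFile are hypothetical names. CONSTANT_Utf8_info text is a u2 length followed by modified UTF-8, which is exactly what DataInputStream.readUTF reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: resolves each CONSTANT_Class_info (tag 7) to its name.
final class ConstantPoolReader {
    static List<String> classNames(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
            in.readUnsignedShort();                 // minor_version
            in.readUnsignedShort();                 // major_version
            int count = in.readUnsignedShort();     // constant_pool_count; entries are #1..#count-1
            Map<Integer, String> utf8 = new HashMap<>();
            List<Integer> nameIndexes = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();    // u1 tag
                switch (tag) {
                    case 1: utf8.put(i, in.readUTF()); break;               // Utf8
                    case 7: nameIndexes.add(in.readUnsignedShort()); break; // Class: u2 name_index
                    case 8: case 16: case 19: case 20: skip(in, 2); break;  // String, MethodType, Module, Package
                    case 15: skip(in, 3); break;                            // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4); break;         // Integer, Float, member refs, NameAndType, (Invoke)Dynamic
                    case 5: case 6: skip(in, 8); i++; break;                // Long/Double use two entries
                    default: throw new IllegalArgumentException("unknown tag " + tag + " at #" + i);
                }
            }
            List<String> names = new ArrayList<>();
            for (int index : nameIndexes) names.add(utf8.get(index)); // name_index -> Utf8 entry
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    // Header plus #1 = Utf8 "java/lang/Thread" and #2 = Class(name_index 1); not a full class file.
    static byte[] sampleClassFile() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);   // minor_version
            out.writeShort(52);  // major_version (Java 8)
            out.writeShort(3);   // constant_pool_count = number of entries + 1
            out.writeByte(1);    // tag CONSTANT_Utf8
            out.writeUTF("java/lang/Thread");
            out.writeByte(7);    // tag CONSTANT_Class
            out.writeShort(1);   // name_index -> entry #1
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classNames(sampleClassFile())); // [java/lang/Thread]
        System.out.println(int[][].class.getName());       // [[I
        System.out.println(Thread[].class.getName());      // [Ljava.lang.Thread;
    }
}
```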

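The Method Area

Java's method area is similar to the compiled-code areas in the runtime environments of other programming languages. It stores the bytecode instructions of the methods in the compiled code, as well as the symbol table the execution environment needs for dynamic linking. Any debugging information or other data that may need to be associated with a method is stored here as well.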
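Java code cannot read the method area directly, but the reflection API exposes some of the per-method metadata a JVM keeps there. As a sketch, the helper below rebuilds a method's JVM descriptor, the (Ljava/lang/String;)V form that appears in the bytecode listing later in this article, from reflective information. Descriptors and its methods are hypothetical names; on Java 12 and later, Class.descriptorString() produces the per-type part directly.

```java
import java.lang.reflect.Method;

// Rebuilds JVM type and method descriptors from reflective metadata.
final class Descriptors {
    // Descriptor of a single type, e.g. I, J, Ljava/lang/String;, [[I
    static String typeDescriptor(Class<?> c) {
        if (c.isArray()) return c.getName().replace('.', '/'); // getName() is already in [ form
        if (c == void.class) return "V";
        if (c == boolean.class) return "Z";
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == short.class) return "S";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == float.class) return "F";
        if (c == double.class) return "D";
        return "L" + c.getName().replace('.', '/') + ";";
    }

    // Method descriptor: parameter descriptors in parentheses, then the return descriptor.
    static String methodDescriptor(Method m) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : m.getParameterTypes()) sb.append(typeDescriptor(p));
        return sb.append(')').append(typeDescriptor(m.getReturnType())).toString();
    }

    static String descriptorOf(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            return methodDescriptor(owner.getMethod(name, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(descriptorOf(java.io.PrintStream.class, "println", String.class)); // (Ljava/lang/String;)V
        System.out.println(descriptorOf(String.class, "substring", int.class, int.class));    // (II)Ljava/lang/String;
    }
}
```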

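The Bytecode Instruction Set

Although programmers prefer to write code in a high-level format, our computer cannot execute that code directly, which is why Java programs must be compiled before they can run. In general, compiled code is either in a machine-readable format called machine language or in an intermediate-level format such as assembly language or Java bytecode.

The bytecode instructions used by the Java Virtual Machine resemble assembler instructions. If you have ever used assembly language, you know that its instruction set is kept to a minimum for efficiency, and that tasks such as printing to the screen take a series of instructions. The Java language, by contrast, lets us print to the screen with a single line of code, such as: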

System.out.println("Hello world!"); 

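At compile time, the Java compiler converts this print statement into the following bytecode (the final return instruction ends the enclosing void method):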

   0 getstatic #6 <Field java.lang.System.out Ljava/io/PrintStream;>
   3 ldc #1 <String "Hello world!"> 
   5 invokevirtual #7 <Method java.io.PrintStream.println(Ljava/lang/String;)V> 
   8 return 

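The JDK provides a tool for examining bytecode called the Java class file disassembler, javap. Running javap -c on a compiled class at the command line prints the bytecode of its methods.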

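Because bytecode instructions are in such a low-level format, our programs can run at close to the speed of programs compiled to machine language; modern JVMs close most of the remaining gap with a just-in-time (JIT) compiler that translates frequently executed bytecode into native machine code. All machine-language instructions are represented as streams of 0s and 1s. In a low-level language, these streams are replaced by suitable mnemonics, such as the bytecode instruction isub. As with assembly language, the basic format of a bytecode instruction is:

<operation>   <operand(s)>

Therefore, an instruction in the bytecode instruction set consists of a 1-byte opcode specifying the operation to perform, followed by zero or more operands that supply parameters or data used by the operation.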
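Applying that format by hand, the sketch below decodes the bytes behind the Hello world listing shown earlier. The opcode values come from the JVM instruction set (getstatic 0xb2, ldc 0x12, invokevirtual 0xb6, return 0xb1); the constant-pool indexes #6, #1, and #7 are taken from the listing and would differ for another compiler or class. It supports only those opcodes plus isub, so it is a toy decoder, not a replacement for javap. MiniDisassembler is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy decoder: each instruction is a 1-byte opcode followed by a fixed
// number of big-endian operand bytes.
final class MiniDisassembler {
    private static final class Op {
        final String mnemonic;
        final int operandBytes;

        Op(String mnemonic, int operandBytes) {
            this.mnemonic = mnemonic;
            this.operandBytes = operandBytes;
        }
    }

    private static final Map<Integer, Op> OPS = new HashMap<>();
    static {
        OPS.put(0xb2, new Op("getstatic", 2));     // u2 constant-pool index
        OPS.put(0x12, new Op("ldc", 1));           // u1 constant-pool index
        OPS.put(0xb6, new Op("invokevirtual", 2)); // u2 constant-pool index
        OPS.put(0x64, new Op("isub", 0));
        OPS.put(0xb1, new Op("return", 0));
    }

    // Method body reconstructed from the "Hello world!" listing.
    static final byte[] HELLO_WORLD = {
        (byte) 0xb2, 0x00, 0x06,   // getstatic #6
        0x12, 0x01,                // ldc #1
        (byte) 0xb6, 0x00, 0x07,   // invokevirtual #7
        (byte) 0xb1                // return
    };

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xff;            // opcodes are unsigned bytes
            Op op = OPS.get(opcode);
            if (op == null) {
                throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
            int operand = 0;
            for (int i = 1; i <= op.operandBytes; i++) {
                operand = (operand << 8) | (code[pc + i] & 0xff);
            }
            lines.add(pc + " " + op.mnemonic + (op.operandBytes > 0 ? " #" + operand : ""));
            pc += 1 + op.operandBytes;               // next instruction follows the operands
        }
        return lines;
    }

    public static void main(String[] args) {
        disassemble(HELLO_WORLD).forEach(System.out::println);
    }
}
```

Its output, 0 getstatic #6, 3 ldc #1, 5 invokevirtual #7, 8 return, reproduces the offsets in the listing: getstatic and invokevirtual are 3 bytes long, ldc is 2, and return is 1.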

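Summary

The Java Virtual Machine exists only in our computer's memory. Reproducing a machine in memory requires seven key components: a set of registers, a stack, an execution environment, a garbage-collected heap, a constant pool, a method storage area, and a mechanism that ties them all together. That mechanism is the bytecode instruction set.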

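To examine bytecode, we can disassemble class files with the Java class file disassembler, javap. Examining bytecode instructions in detail gives us valuable insight into the inner workings of the Java Virtual Machine and of Java itself. Each bytecode instruction performs a specific function of very limited scope, such as pushing an object onto the stack or popping one off it. Combinations of these basic functions express the complex high-level tasks defined as statements in the Java programming language; surprisingly, a single Java statement sometimes takes dozens of bytecode instructions. By combining these bytecode instructions with the seven key components of the virtual machine, Java achieves its platform independence, which is a large part of what makes it such a versatile language.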

Source: http://www.codeproject.com/Articles/30422/How-the-Java-Virtual-Machine-JVM-Works
