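In OpenJDK, Java bytecode is executed either by the interpreter or by compiled code. The interpreter is the first component the JVM uses to execute bytecode after startup, and one of its core mechanisms is the dispatch table, which makes bytecode dispatch fast.
This article walks through the OpenJDK 17 source code to examine how the interpreter's dispatch table is initialized, how its entries are set, and how it is used, with particular attention to the key code in the TemplateInterpreterGenerator class.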
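I. What is a dispatch table?
A dispatch table maps a bytecode value (Bytecodes::Code) to the address of the code that handles it. While interpreting, the interpreter uses the current bytecode value to look up the handler address in the dispatch table and jumps to it.
In essence it is a table of code addresses indexed by bytecode, as illustrated here:
    dispatch_table[0x60] -> entry for iadd
    dispatch_table[0x2a] -> entry for aload_0
    ...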
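The idea can be tried out in ordinary C++. The following is a minimal sketch, not HotSpot code: MiniStack, Handler, make_table and dispatch are hypothetical names, and plain function pointers stand in for the generated machine-code addresses that the real table holds.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical operand stack, just large enough for the example.
struct MiniStack {
  int slots[8];
  int top = 0;                               // number of live slots
  void push(int v) { slots[top++] = v; }
  int  pop()       { return slots[--top]; }
};

// A handler is plain code reachable through an address.
using Handler = void (*)(MiniStack&);

void do_iadd(MiniStack& s) { int b = s.pop(); int a = s.pop(); s.push(a + b); }
void do_unimplemented(MiniStack&) { /* a real VM would report an error here */ }

// One slot per possible byte value; undefined opcodes share one handler.
std::array<Handler, 256> make_table() {
  std::array<Handler, 256> table;
  table.fill(do_unimplemented);
  table[0x60] = do_iadd;                     // 0x60 is the opcode of iadd
  return table;
}

// Dispatch is one indexed load plus one indirect call.
void dispatch(const std::array<Handler, 256>& table, std::uint8_t opcode, MiniStack& s) {
  table[opcode](s);
}
```

Pushing 2 and 3 and then calling dispatch(table, 0x60, s) leaves 5 on the stack. No switch statement or chain of comparisons is involved, which is where the speed comes from.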
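II. Initialization overview
The interpreter initializes the dispatch table mainly through the following functions:
    void TemplateInterpreterGenerator::set_entry_points_for_all_bytes();
    void TemplateInterpreterGenerator::set_entry_points(Bytecodes::Code code);
    void TemplateInterpreterGenerator::set_short_entry_points(...);
    void TemplateInterpreterGenerator::set_vtos_entry_points(...);
The overall flow is:
- Iterate over all bytecode values.
- Check whether the bytecode is defined.
- If it is a valid bytecode, call set_entry_points() to set up its entry points.
- Otherwise call set_unimplemented() to mark it as illegal or unimplemented.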
III. Reading set_entry_points_for_all_bytes
    void TemplateInterpreterGenerator::set_entry_points_for_all_bytes() {
      for (int i = 0; i < DispatchTable::length; i++) {
        Bytecodes::Code code = (Bytecodes::Code)i;
        if (Bytecodes::is_defined(code)) {
          set_entry_points(code);
        } else {
          set_unimplemented(i);
        }
      }
    }
Key points:
- DispatchTable::length is the number of slots in the table, 256 (one per possible byte value), rather than the number of defined bytecodes.
- Bytecodes::is_defined() decides whether a value is a valid bytecode.
- Valid bytecodes get real entry points; every other slot is pointed at _unimplemented_bytecode.
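IV. Setting a bytecode's EntryPoint
    void TemplateInterpreterGenerator::set_entry_points(Bytecodes::Code code) {
      Template* t = TemplateTable::template_for(code);
      ...
      set_short_entry_points(t, ...);
      set_wide_entry_point(t, ...);
      ...
      Interpreter::_normal_table.set_entry(code, entry);
      Interpreter::_wentry_point[code] = wep;
    }
Each bytecode has several entry points, one per TOS state:
- bep: byte entry point
- zep: boolean
- cep: char
- sep: short
- iep: int
- lep: long
- fep: float
- dep: double
- aep: reference
- vep: void
At run time the interpreter jumps to the entry point that matches the current TOS (Top Of Stack) state, that is, the type of the value the previous bytecode left cached at the top of the stack, with vtos meaning that nothing is cached.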
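To make the relationship between EntryPoint and the per-state tables concrete, here is a small model of it. This is a sketch under assumptions rather than the HotSpot classes: the enum lists the ten states that appear as EntryPoint arguments in the listing, and Address is a hypothetical string label standing in for a generated code address.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// The ten TOS states, one per EntryPoint argument.
enum TosState { btos, ztos, ctos, stos, itos, ltos, ftos, dtos, atos, vtos, number_of_states };

// Hypothetical: a readable label instead of a machine-code address.
using Address = std::string;

// All entry points of one bytecode, indexed by TOS state.
struct EntryPoint {
  std::array<Address, number_of_states> entry;
  EntryPoint() { entry.fill("illegal"); }   // default: no legal way in
  void set(TosState s, const Address& a) { entry[s] = a; }
};

// One row per TOS state, one column per byte value.
struct DispatchTable {
  static constexpr std::size_t length = 256;
  std::array<std::array<Address, length>, number_of_states> table;

  // Scatter the entry points of one bytecode across all rows.
  void set_entry(std::size_t bytecode, const EntryPoint& e) {
    for (std::size_t s = 0; s < number_of_states; s++) table[s][bytecode] = e.entry[s];
  }
  const Address& lookup(TosState s, std::size_t bytecode) const { return table[s][bytecode]; }
};
```

Registering an EntryPoint for 0x60 with only the itos and vtos slots filled in means lookup(itos, 0x60) and lookup(vtos, 0x60) return those labels, while lookup(ltos, 0x60) still returns "illegal". That mirrors how set_entry_points() starts every typed entry at _illegal_bytecode_sequence and overwrites only the ones the template supports.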
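V. Generating the entry code
1. set_short_entry_points()
    void TemplateInterpreterGenerator::set_short_entry_points(...) {
      switch (t->tos_in()) {
        case itos: vep = __ pc(); __ pop(itos); iep = __ pc(); generate_and_dispatch(t); break;
        case atos: ...
        ...
        case vtos: set_vtos_entry_points(t, ...); break;
      }
    }
Depending on the TOS state the template expects on entry (t->tos_in()), this emits the matching pop (__ pop()) followed by the template code and dispatch logic (generate_and_dispatch()). The vep entry pops the operand from the stack first and then falls through to the typed entry point.
2. set_vtos_entry_points()
This handles templates whose tos_in is vtos (void). Each typed entry point first pushes the cached top-of-stack value back onto the stack, and then all of them converge on the shared execution logic.
    void TemplateInterpreterGenerator::set_vtos_entry_points(...) {
      Label L;
      lep = __ pc(); __ push_l(); __ jmp(L);
      ...
      vep = __ pc();
      __ bind(L);
      generate_and_dispatch(t);
    }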
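The "push, then converge" pattern can be modelled without any assembler. The sketch below is only an analogy: Frame, tos_cache, enter_typed and enter_vtos are hypothetical names, and the shared body is an invented template that simply discards the top operand.

```cpp
#include <cassert>
#include <vector>

struct Frame {
  std::vector<long> stack;   // expression stack in memory
  long tos_cache = 0;        // value cached "in a register" when the state is not vtos
};

// Shared body (label L in the listing). Hypothetical template with tos_in == vtos:
// it discards the topmost operand, which it expects to find in memory.
void vtos_body(Frame& f) { f.stack.pop_back(); }

// Typed entry point: spill the cached value, then continue at the shared body.
void enter_typed(Frame& f) {
  f.stack.push_back(f.tos_cache);  // stands in for __ push_l() / __ push_i_or_ptr()
  vtos_body(f);                    // stands in for __ jmp(L)
}

// vtos entry point: nothing is cached, so the body runs directly.
void enter_vtos(Frame& f) { vtos_body(f); }
```

With a stack of {1, 2} and a cached 7, enter_typed() spills the 7 and the body then discards it, leaving {1, 2}. Entering through enter_vtos() with the same stack discards the 2 instead. In both cases the body sees every operand in memory, which is why a single copy of the template code can serve all entry states.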
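VI. Using the dispatch table during interpretation
dispatch_next()
After each bytecode finishes, the interpreter jumps through the dispatch table to the handler of the next instruction.
    void InterpreterMacroAssembler::dispatch_next(TosState state, int step, bool generate_poll) {
      load_unsigned_byte(rbx, Address(_bcp_register, step)); // load the next bytecode
      increment(_bcp_register, step);                        // advance bcp
      dispatch_base(state, Interpreter::dispatch_table(state), true, generate_poll);
    }
dispatch_table(state)
    static address* dispatch_table(TosState state) { return _active_table.table_for(state); }
This returns the dispatch table for the given TOS state. There is one table per state, so ten in total, matching the ten addresses that EntryPoint takes.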
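The three steps of dispatch_next() (load, advance, jump through the table for the current state) can be mimicked in portable C++. This is a rough sketch under stated assumptions: Interp, Handler, make_interp and the handlers are hypothetical, only two TOS states are modelled, and a nested function call replaces the indirect jump that the generated code performs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum TosState { itos, vtos, number_of_states };   // reduced to two states for the sketch

struct Interp;
using Handler = void (*)(Interp&);

struct Interp {
  const std::uint8_t* bcp;                  // bytecode pointer
  Handler table[number_of_states][256];     // one dispatch table per TOS state
  std::vector<int> trace;                   // opcodes in the order they were dispatched
};

// Same order of operations as dispatch_next() in the listing.
void dispatch_next(Interp& in, TosState state, int step) {
  std::uint8_t next = in.bcp[step];   // load the next bytecode
  in.bcp += step;                     // advance bcp
  in.table[state][next](in);          // go through the table selected by `state`
}

// nop (0x00) leaves nothing cached, so it dispatches with vtos and a step of 1.
void h_nop(Interp& in)    { in.trace.push_back(0x00); dispatch_next(in, vtos, 1); }
// return (0xb1) ends the run instead of dispatching again.
void h_return(Interp& in) { in.trace.push_back(0xb1); }
// Every other slot records -1, standing in for an unimplemented bytecode.
void h_bad(Interp& in)    { in.trace.push_back(-1); }

Interp make_interp(const std::uint8_t* code) {
  Interp in;
  in.bcp = code;
  for (auto& row : in.table) for (auto& e : row) e = h_bad;
  in.table[vtos][0x00] = h_nop;
  in.table[vtos][0xb1] = h_return;
  return in;
}
```

Starting with dispatch_next(in, vtos, 0) on the byte sequence {0x00, 0x00, 0xb1} visits nop, nop, return in that order and leaves bcp on the return instruction. The initial step of 0 is the same trick the method entry uses: the first dispatch loads the bytecode at bcp without advancing.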
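VII. Starting bytecode execution (generate_normal_entry)
    address TemplateInterpreterGenerator::generate_normal_entry(bool synchronized) {
      ...
      __ dispatch_next(vtos);
    }
At the end of the method entry code, once the frame has been set up, dispatch_next(vtos) is called to jump to the entry point of the first bytecode to execute.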
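VIII. Debug output (optional)
The code shown in this article also has a debug print call injected into it, used to observe method execution. It is the block marked yym-gaizao in the source listing and appears to be a local modification rather than upstream OpenJDK code:
    __ call(RuntimeAddress(
      CAST_FROM_FN_PTR(address, TemplateInterpreterGenerator::print_debug_info)
    ));
The output includes:
- the name of the method being executed
- the number of parameters
- the number of local variables
- the state of the call stack
This information helps in understanding key details such as how the interpreter initializes local variables and sets up the dispatch table.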
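IX. Summary
The dispatch table is key to the interpreter's execution efficiency: OpenJDK uses it to map every bytecode directly to its entry point. By
- initializing an entry for every bytecode value,
- setting up multiple dispatch paths according to the TOS state, and
- supporting debugging and the handling of invalid bytecodes,
the interpreter handles Java bytecode execution efficiently within a structured and extensible framework.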
## Source code
    void TemplateInterpreterGenerator::set_entry_points_for_all_bytes() {
      for (int i = 0; i < DispatchTable::length; i++) {
        Bytecodes::Code code = (Bytecodes::Code)i;
        if (Bytecodes::is_defined(code)) {
          set_entry_points(code);
        } else {
          set_unimplemented(i);
        }
      }
    }

    void TemplateInterpreterGenerator::set_safepoints_for_all_bytes() {
      for (int i = 0; i < DispatchTable::length; i++) {
        Bytecodes::Code code = (Bytecodes::Code)i;
        if (Bytecodes::is_defined(code)) Interpreter::_safept_table.set_entry(code, Interpreter::_safept_entry);
      }
    }

    void TemplateInterpreterGenerator::set_unimplemented(int i) {
      address e = _unimplemented_bytecode;
      EntryPoint entry(e, e, e, e, e, e, e, e, e, e);
      Interpreter::_normal_table.set_entry(i, entry);
      Interpreter::_wentry_point[i] = _unimplemented_bytecode;
    }

    void TemplateInterpreterGenerator::set_entry_points(Bytecodes::Code code) {
      CodeletMark cm(_masm, Bytecodes::name(code), code);
      // initialize entry points
      assert(_unimplemented_bytecode    != NULL, "should have been generated before");
      assert(_illegal_bytecode_sequence != NULL, "should have been generated before");
      address bep = _illegal_bytecode_sequence;
      address zep = _illegal_bytecode_sequence;
      address cep = _illegal_bytecode_sequence;
      address sep = _illegal_bytecode_sequence;
      address aep = _illegal_bytecode_sequence;
      address iep = _illegal_bytecode_sequence;
      address lep = _illegal_bytecode_sequence;
      address fep = _illegal_bytecode_sequence;
      address dep = _illegal_bytecode_sequence;
      address vep = _unimplemented_bytecode;
      address wep = _unimplemented_bytecode;
      // code for short & wide version of bytecode
      if (Bytecodes::is_defined(code)) {
        Template* t = TemplateTable::template_for(code);
        assert(t->is_valid(), "just checking");
        set_short_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep);
      }
      if (Bytecodes::wide_is_defined(code)) {
        Template* t = TemplateTable::template_for_wide(code);
        assert(t->is_valid(), "just checking");
        set_wide_entry_point(t, wep);
      }
      // set entry points
      EntryPoint entry(bep, zep, cep, sep, aep, iep, lep, fep, dep, vep);
      Interpreter::_normal_table.set_entry(code, entry);
      Interpreter::_wentry_point[code] = wep;
    }

    void TemplateInterpreterGenerator::set_wide_entry_point(Template* t, address& wep) {
      assert(t->is_valid(), "template must exist");
      assert(t->tos_in() == vtos, "only vtos tos_in supported for wide instructions");
      wep = __ pc(); generate_and_dispatch(t);
    }

    void TemplateInterpreterGenerator::set_short_entry_points(Template* t,
                                                              address& bep,
                                                              address& cep,
                                                              address& sep,
                                                              address& aep,
                                                              address& iep,
                                                              address& lep,
                                                              address& fep,
                                                              address& dep,
                                                              address& vep) {
      assert(t->is_valid(), "template must exist");
      switch (t->tos_in()) {
        case btos:
        case ztos:
        case ctos:
        case stos:
          ShouldNotReachHere();  // btos/ctos/stos should use itos.
          break;
        case atos: vep = __ pc(); __ pop(atos); aep = __ pc(); generate_and_dispatch(t); break;
        case itos: vep = __ pc(); __ pop(itos); iep = __ pc(); generate_and_dispatch(t); break;
        case ltos: vep = __ pc(); __ pop(ltos); lep = __ pc(); generate_and_dispatch(t); break;
        case ftos: vep = __ pc(); __ pop(ftos); fep = __ pc(); generate_and_dispatch(t); break;
        case dtos: vep = __ pc(); __ pop(dtos); dep = __ pc(); generate_and_dispatch(t); break;
        case vtos: set_vtos_entry_points(t, bep, cep, sep, aep, iep, lep, fep, dep, vep); break;
        default  : ShouldNotReachHere(); break;
      }
    }

    void TemplateInterpreterGenerator::set_vtos_entry_points(Template* t,
                                                             address& bep,
                                                             address& cep,
                                                             address& sep,
                                                             address& aep,
                                                             address& iep,
                                                             address& lep,
                                                             address& fep,
                                                             address& dep,
                                                             address& vep) {
      assert(t->is_valid() && t->tos_in() == vtos, "illegal template");
      Label L;
    #ifndef _LP64
      fep = __ pc();     // ftos entry point
      __ push(ftos);
      __ jmp(L);
      dep = __ pc();     // dtos entry point
      __ push(dtos);
      __ jmp(L);
    #else
      fep = __ pc();     // ftos entry point
      __ push_f(xmm0);
      __ jmp(L);
      dep = __ pc();     // dtos entry point
      __ push_d(xmm0);
      __ jmp(L);
    #endif // _LP64
      lep = __ pc();     // ltos entry point
      __ push_l();
      __ jmp(L);
      aep = bep = cep = sep = iep = __ pc();      // [abcsi]tos entry point
      __ push_i_or_ptr();
      vep = __ pc();     // vtos entry point
      __ bind(L);
      generate_and_dispatch(t);
    }

    address TemplateInterpreterGenerator::generate_normal_entry(bool synchronized) {
      // determine code generation flags
      bool inc_counter = UseCompiler || CountCompiledCalls || LogTouchedMethods;
      // ebx: Method*
      // rbcp: sender sp
      address entry_point = __ pc();
      const Address constMethod(rbx, Method::const_offset());
      const Address access_flags(rbx, Method::access_flags_offset());
      const Address size_of_parameters(rdx,
                                       ConstMethod::size_of_parameters_offset());
      const Address size_of_locals(rdx, ConstMethod::size_of_locals_offset());
      // get parameter size (always needed)
      __ movptr(rdx, constMethod);
      __ load_unsigned_short(rcx, size_of_parameters);
      // rbx: Method*
      // rcx: size of parameters
      // rbcp: sender_sp (could differ from sp+wordSize if we were called via c2i )
      __ load_unsigned_short(rdx, size_of_locals); // get size of locals in words
      __ subl(rdx, rcx); // rdx = no. of additional locals
      // YYY
      // __ incrementl(rdx);
      // __ andl(rdx, -2);
      // see if we've got enough room on the stack for locals plus overhead.
      generate_stack_overflow_check();
      // yym-gaizao (local modification)
      // #ifdef DEBUG_PRINT_METHOD_NAME
      // ---yym--- print code moved to after the stack overflow check
      {
        // save register state
        __ push(rax);
        __ push(rcx);
        __ push(rdx);
        __ push(rdi);
        __ push(rsi);
        __ push(r8);
        __ push(r9);
        __ push(r10);
        __ push(r11);
        NOT_LP64(__ get_thread(r15_thread));
        __ push(r15);  // 10 registers saved in total = 80 bytes
        // compute the original RSP: current RSP + size of saved registers + red zone
        const int saved_regs_size = 10 * wordSize;  // 10 registers * 8 bytes
        const int red_zone_size = 128;              // red zone size
        __ lea(rsi, Address(rsp, saved_regs_size + red_zone_size));
        // set up arguments
        __ movptr(rdi, rbx);  // Method*
        __ mov(r8, rcx);      // params_size
        __ mov(r9, rdx);      // locals_size
        // align the stack pointer (16-byte alignment)
        __ subptr(rsp, 32);
        // make the call
        __ call(RuntimeAddress(
          CAST_FROM_FN_PTR(address,
                           TemplateInterpreterGenerator::print_debug_info)
        ));
        __ addptr(rsp, 32);   // restore the stack pointer
        // restore registers
        __ pop(r15);
        NOT_LP64(__ restore_thread(r15_thread));
        __ pop(r11);
        __ pop(r10);
        __ pop(r9);
        __ pop(r8);
        __ pop(rsi);
        __ pop(rdi);
        __ pop(rdx);
        __ pop(rcx);
        __ pop(rax);
      }
      // #endif
      // get return address
      __ pop(rax);
      // compute beginning of parameters
      __ lea(rlocals, Address(rsp, rcx, Interpreter::stackElementScale(), -wordSize));
      // rdx - # of additional locals
      // allocate space for locals
      // explicitly initialize locals
      {
        Label exit, loop;
        __ testl(rdx, rdx);
        __ jcc(Assembler::lessEqual, exit); // do nothing if rdx <= 0
        __ bind(loop);
        __ push((int) NULL_WORD); // initialize local variables
        __ decrementl(rdx); // until everything initialized
        __ jcc(Assembler::greater, loop);
        __ bind(exit);
      }
      // initialize fixed part of activation frame
      generate_fixed_frame(false);
      // make sure method is not native & not abstract
    #ifdef ASSERT
      __ movl(rax, access_flags);
      {
        Label L;
        __ testl(rax, JVM_ACC_NATIVE);
        __ jcc(Assembler::zero, L);
        __ stop("tried to execute native method as non-native");
        __ bind(L);
      }
      {
        Label L;
        __ testl(rax, JVM_ACC_ABSTRACT);
        __ jcc(Assembler::zero, L);
        __ stop("tried to execute abstract method in interpreter");
        __ bind(L);
      }
    #endif
      // Since at this point in the method invocation the exception
      // handler would try to exit the monitor of synchronized methods
      // which hasn't been entered yet, we set the thread local variable
      // _do_not_unlock_if_synchronized to true. The remove_activation
      // will check this flag.
      const Register thread = NOT_LP64(rax) LP64_ONLY(r15_thread);
      NOT_LP64(__ get_thread(thread));
      const Address do_not_unlock_if_synchronized(thread,
            in_bytes(JavaThread::do_not_unlock_if_synchronized_offset()));
      __ movbool(do_not_unlock_if_synchronized, true);
      __ profile_parameters_type(rax, rcx, rdx);
      // increment invocation count & check for overflow
      Label invocation_counter_overflow;
      if (inc_counter) {
        generate_counter_incr(&invocation_counter_overflow);
      }
      Label continue_after_compile;
      __ bind(continue_after_compile);
      // check for synchronized interpreted methods
      bang_stack_shadow_pages(false);
      // reset the _do_not_unlock_if_synchronized flag
      NOT_LP64(__ get_thread(thread));
      __ movbool(do_not_unlock_if_synchronized, false);
      // check for synchronized methods
      // Must happen AFTER invocation_counter check and stack overflow check,
      // so method is not locked if overflows.
      if (synchronized) {
        // Allocate monitor and lock method
        lock_method();
      } else {
        // no synchronization necessary
    #ifdef ASSERT
        {
          Label L;
          __ movl(rax, access_flags);
          __ testl(rax, JVM_ACC_SYNCHRONIZED);
          __ jcc(Assembler::zero, L);
          __ stop("method needs synchronization");
          __ bind(L);
        }
    #endif
      }
      // start execution
    #ifdef ASSERT
      {
        Label L;
        const Address monitor_block_top (rbp,
                     frame::interpreter_frame_monitor_block_top_offset * wordSize);
        __ movptr(rax, monitor_block_top);
        __ cmpptr(rax, rsp);
        __ jcc(Assembler::equal, L);
        __ stop("broken stack frame setup in interpreter");
        __ bind(L);
      }
    #endif
      // jvmti support
      __ notify_method_entry();
      __ dispatch_next(vtos);
      // invocation counter overflow
      if (inc_counter) {
        // Handle overflow of counter and compile method
        __ bind(invocation_counter_overflow);
        generate_counter_overflow(continue_after_compile);
      }
      return entry_point;
    }

    void InterpreterMacroAssembler::dispatch_next(TosState state, int step, bool generate_poll) {
      // load next bytecode (load before advancing _bcp_register to prevent AGI)
      load_unsigned_byte(rbx, Address(_bcp_register, step));
      // advance _bcp_register
      increment(_bcp_register, step);
      dispatch_base(state, Interpreter::dispatch_table(state), true, generate_poll);
    }

    // member of TemplateInterpreter
    static address* dispatch_table(TosState state) { return _active_table.table_for(state); }
    // member of DispatchTable
    address* table_for(TosState state) { return _table[state]; }