JVM JIT(JAMVM)

Author: OSTCB

Categories: jvm, java, Java technology, Android technology. Tags: jvm, java, jit

Article link: https://blog.csdn.net/ganyao939543405/article/details/84333848

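Introduction

For a virtual machine that executes bytecode by interpretation, decoding bytecode instructions and dispatching them is very expensive, and JIT compilation exists to win that efficiency back.
Consider 1 + 2: native code needs a single machine instruction, whereas a stack-based interpreter has to fetch the instruction -> match its handler -> jump to the handler -> execute the handler, which costs at least 4 machine instructions. In short, executing one JVM instruction takes several times as many cycles as the equivalent machine instruction.
That may be acceptable for methods that run only occasionally, but for frequently executed methods or code inside loops it wastes a large number of CPU clock cycles.
So at run time the JVM compiles frequently executed stretches of bytecode into machine code for the current platform, which brings the performance of hot code close to that of native code.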
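
To make the fetch -> match -> jump -> execute cost concrete, here is a minimal sketch of a switch-based stack interpreter. The three-opcode instruction set and the name run_switch are made up for illustration and are not JAMVM code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini instruction set, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Interprets `code` on a small operand stack and returns the top of stack. */
int run_switch(const int *code) {
    int stack[16];
    int sp = 0;            /* number of values currently on the stack */
    size_t pc = 0;         /* index of the next instruction */

    for (;;) {
        int opcode = code[pc++];          /* 1. fetch the instruction */
        switch (opcode) {                 /* 2. match it, 3. jump to its handler */
        case OP_PUSH:                     /* 4. execute the handler */
            assert(sp < 16);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

Running the program {OP_PUSH, 1, OP_PUSH, 2, OP_ADD, OP_HALT} goes through the loop four times to compute what native code does with one add instruction.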

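Which code gets JIT-compiled

The JVM needs rules to decide which code to JIT-compile.
Early JVMs used a simple rule: count how often each method is executed and start JIT compilation once the count exceeds a configured threshold.
That granularity is coarse, though. Even within one method there is hot and cold code: if the method body contains a loop with many iterations, the loop body is hot and the code outside the loop is cold.
Current approaches to monitoring code include:

A simple method counter
More elaborate counters: a method counter plus a back-edge counter
Trace
...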
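
The second scheme in the list can be sketched roughly as follows. This is a generic illustration, not JAMVM code: the struct, the function names, and both threshold values are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds; real VMs tune these values. */
#define INVOKE_THRESHOLD   1000
#define BACKEDGE_THRESHOLD 5000

typedef struct {
    unsigned invocations;  /* bumped on every method entry */
    unsigned back_edges;   /* bumped on every backward branch (loop iteration) */
    bool compiled;         /* set once the method has been handed to the JIT */
} MethodProfile;

/* Returns true exactly once, when either counter first reaches its threshold. */
static bool became_hot(MethodProfile *p) {
    if (!p->compiled && (p->invocations >= INVOKE_THRESHOLD ||
                         p->back_edges >= BACKEDGE_THRESHOLD)) {
        p->compiled = true;
        return true;
    }
    return false;
}

/* Call on method entry. */
bool on_method_entry(MethodProfile *p) {
    assert(p != NULL);
    p->invocations++;
    return became_hot(p);
}

/* Call when a backward branch is taken. */
bool on_back_edge(MethodProfile *p) {
    assert(p != NULL);
    p->back_edges++;
    return became_hot(p);
}
```

With the back-edge counter, a method that is called once but spins in a long loop still becomes hot, which a plain method counter would miss.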

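JAMVM's implementation:

If profiling was enabled at build time, the VM measures how hot the code is at run time and uses that to decide whether to JIT it
Otherwise, code is JIT-compiled the first time it is interpreted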

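Profiling

With profiling enabled, JAMVM places a profiling instruction, OPC_PROFILE_REWRITER, at the head of each code block to be profiled, so a counter is incremented every time the block executes.

Adding the profiling instruction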

void addToProfile(MethodBlock *mb, BasicBlock *block, Thread *self) {
    ..............
    // The block's first instruction is redirected to the profiling entry point
    block->start->handler = handler_entry_points[0][OPC_PROFILE_REWRITER];
    ..............
}

When the profiling instruction is executed

DEF_OPC_RW(OPC_PROFILE_REWRITER, ({
        void *handler = inlineProfiledBlock(pc, mb, FALSE);

        if(handler != NULL)
            goto *handler;
    });)

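// If the block has reached the execution threshold, look it up in the profile list and inline it. The profile list is per method, and blocks are added at the head of the list.
// Testing showed that 70% of searches hit within the first 2 entries and 90% within the first 4. This is more consistent than a hash table, whose hit rate drops as table occupancy rises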
void *inlineProfiledBlock(Instruction *pc, MethodBlock *mb, int force_inlining) {
	
    ............
    // Walk the list to find this block's profile info
    for(info = mb->profile_info; info != NULL && info->block->start != pc;
        info = info->next);
    
    // Post-increment the counter; once it has reached the threshold, start the JIT
    if(info != NULL && (force_inlining ||
                        info->profile_count++ >= profile_threshold)) {

        inlineBlock(mb, info->block, self);
        return NULL;
    }

    ret = info == NULL ? NULL : (void*)info->handler;
    ...........
    return ret;
}
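
The list walk and the post-increment test can be isolated in a small stand-alone sketch. ProfileNode, profile_add and profile_hit are hypothetical names that only mirror the shape of the code above; they are not JAMVM's types.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ProfileNode {
    const void *block_start;      /* key: first instruction of the block */
    int count;                    /* executions seen so far */
    struct ProfileNode *next;
} ProfileNode;

/* Links a caller-provided node at the head of the list, so that the most
   recently added blocks are found first. */
void profile_add(ProfileNode **head, ProfileNode *node, const void *block_start) {
    assert(head != NULL && node != NULL);
    node->block_start = block_start;
    node->count = 0;
    node->next = *head;
    *head = node;
}

/* Returns 1 when the block has reached `threshold` and should be inlined.
   Because the counter is post-incremented, that happens on execution
   number threshold + 1. Unknown blocks return 0. */
int profile_hit(ProfileNode *head, const void *block_start, int threshold) {
    ProfileNode *n;

    for (n = head; n != NULL && n->block_start != block_start; n = n->next)
        ;
    if (n == NULL)
        return 0;
    return n->count++ >= threshold;
}
```

A linear list looks naive, but the comment in the source explains the choice: most lookups hit within the first few entries.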

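Starting the JIT

JAMVM's JIT is called inlining. It resembles inlining, but strictly speaking it is code copying: the handler for each bytecode instruction, that is, the code under that instruction's label, is copied and the copies are joined end to end.
This removes the dispatch step at run time. Such a JIT is still somewhat slower than a full JIT, but it is highly portable because it copies code the compiler has already generated.

The interpreter function contains a long series of labels, and each label is the handler for one JVM instruction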

void interpreter_function() {
	label_opc_add1:
		nativecode1
	label_opc_add2:
		nativecode2
	label_opc_add3:
		nativecode3
	label_opc_add4:
		nativecode4			
}

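After JAMVM has parsed the bytecode, every instruction is addressed by a PC pointer, and each PC pointer carries that instruction's handler, that is, the address of the code at the instruction's label.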
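
A toy version of this layout is sketched below. It relies on the GNU C "labels as values" extension (&&label and goto *ptr), which GCC and Clang support but ISO C does not. The opcode set, the Instruction struct, and run_threaded are assumptions made for the example, not JAMVM definitions.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

typedef struct {
    void *handler;   /* address of the label implementing this instruction */
    int operand;
} Instruction;

/* Resolves each opcode to a handler address, then runs the program.
   The program must end with OP_HALT and must keep the stack within bounds. */
int run_threaded(const int *opcodes, const int *operands, size_t n) {
    static void *const labels[OP_COUNT] = { &&do_push, &&do_add, &&do_halt };
    Instruction code[32];
    Instruction *pc = code;
    int stack[16];
    int sp = 0;
    size_t i;

    assert(n > 0 && n <= 32);
    /* "Parsing": look up every handler once, before execution starts. */
    for (i = 0; i < n; i++) {
        assert(opcodes[i] >= 0 && opcodes[i] < OP_COUNT);
        code[i].handler = labels[opcodes[i]];
        code[i].operand = operands[i];
    }

    goto *pc->handler;             /* dispatch the first instruction */

do_push:
    stack[sp++] = pc->operand;
    pc++;                          /* every handler advances the PC itself */
    goto *pc->handler;             /* and dispatches the next instruction */
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    pc++;
    goto *pc->handler;
do_halt:
    return stack[sp - 1];
}
```

Each handler ends with its own indirect jump, so there is no central switch, but the pc++ and the jump are still paid for every instruction.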

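Interpreted execution with the goto-based interpreter

[Figure: not included in the source]

After JIT inlining has processed part of the method's code

[Figure: not included in the source]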

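Inlining copies the native handler code of each JVM instruction in the segment being JIT-compiled into a cache, one handler after another. It then replaces the copied run of JVM instructions with a single handler that points into the cache, and appends to the end of the cached code a jump that returns control to the interpreter.
That is how code copying yields a "JIT" compiler. The resulting code is still slower than that of a true JIT, but the approach is easy to port and depends little on platform-specific assembly.
Where exactly is the performance lost? Besides lacking the optimizations that the JITs in mainstream JVMs such as HotSpot apply, the code produced by this copy-based JIT still carries, for every JVM instruction, the machine code for "PC ++", the increment of the PC pointer.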

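Implementation

Here is a quick walk through the code of the inlining JIT

First, the block to be inlined is preprocessed and certain instructions are replaced

// Start inlining the code blocks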
void inlineBlocks(MethodBlock *mb, BasicBlock *start, BasicBlock *end) {
    BasicBlock *block, *terminator = end->next;
    int ins_start = 0;
    // Iterate over the code blocks
    for(block = start; block != terminator; block = block->next) {
        int i;

        for(i = 0; i < block->length; i++) {
            int cache_depth = block->opcodes[i].cache_depth;
            int opcode = block->opcodes[i].opcode;
            int op1, op2, op3;

            /* The block opcodes contain the "un-quickened" opcode.
               This could have been quickened to one of several quick
               versions. */
            // Select the "quick" variants of the opcode to match what executes after inlining
            switch(opcode) {
                case OPC_LDC:
                    op1 = OPC_LDC_QUICK;
                    op2 = op3 = OPC_LDC_W_QUICK;
                    break;
                ........
            }
        }
    }
    ........
}

Performing the inlining

void inlineSequence(MethodBlock *mb, BasicBlock *start, int ins_start,
                    BasicBlock *end, int ins_end) {
    CodeBlockHeader *hashed_block;
    TestCodeBlock *block;
    int code_len;
    // End address of the copied handler code
    char *pntr;

    /* Calculate sequence length */
    // Code length = total length of the handlers for all bytecode instructions + length of the trailing goto dispatch
    code_len = goto_len + blockSeqCodeLen(start, ins_start, end, ins_end);

    /* The prospective sequence is generated in malloc-ed memory
       so that an existing sequence can be found in the block cache
       even when no code memory is available */
    block = sysMalloc(code_len + sizeof(TestCodeBlock));

    /* Store length at beginning of sequence */
    block->code_len = code_len;

    /* Concatenate the handler bodies together */
    // Copy the corresponding handler code
    pntr = blockSeqCodeCopy(mb, block, start, ins_start, end, ins_end);

    /* Add the dispatch onto the end of the super-instruction */
    // Append the goto dispatch code to the end of the block so it can jump on to other blocks
    memcpy(pntr, goto_start, goto_len);

    /* Look up new block in inlined block cache */
    hashed_block = findCodeBlock(block);
    sysFree(block);

    if(hashed_block != NULL) {
        TRACE("%s.%s Inlined sequence %d, %d\n",
              CLASS_CB(mb->class)->name, mb->name,
              INUM(mb, start, ins_start),
              INUM(mb, end, ins_end));

        /* Replace the start handler with new inlined block,
           and update block joins to point within the sequence */
        // Update the handler, replacing the instructions that were inlined
        updateSeqStarts(mb, (char*)(hashed_block + 1), start, ins_start,
                        end, ins_end);
    }
}

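First the length of the code to copy is computed, and then space is allocated for the code cache

Note that the size of the code under each label is computed at JVM startup and stored in the handler_sizes array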

static int handler_sizes[HANDLERS][LABELS_SIZE];
static int goto_len;

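HANDLERS here corresponds to the three levels of top-of-stack caching, because each level has its own handler implementation
LABELS_SIZE is the number of handlers for JVM instructions, that is, the number of labels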

#define LABELS_SIZE  256
#ifdef USE_CACHE
#define HANDLERS      3
#else
#define HANDLERS      1
#endif

Iterate over the labels, computing and storing the size of each

int calculateRelocatability(int handler_sizes[HANDLERS][LABELS_SIZE]) {
    .....
    // Compute the size of the native code under every label and store it in handler_sizes
    for(i = 0; i < HANDLERS; i++) {
        int j;

        memcpy(sorted_ends, handlers1[END_LABELS+i], LABELS_SIZE * sizeof(char *));
        qsort(sorted_ends, LABELS_SIZE, sizeof(char *), compare);

        for(j = 0; j < LABELS_SIZE; j++) {
            char *entry = handlers1[ENTRY_LABELS+i][j];
            char *end = handlers1[END_LABELS+i][j];
            int len = end - entry;

            if(len > 0) {
                char *nearest_end = findNextLabel(sorted_ends, entry);

                if(nearest_end == end) {
                    if(memcmp(entry, handlers2[ENTRY_LABELS+i][j], len) != 0)
                        len = MEMCMP_FAILED;
                } else
                    len = END_REORDERED;
            } else
                len = END_BEFORE_ENTRY;

            handler_sizes[i][j] = len;
        }
    }
    return goto_len;
}
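
The three checks in the inner loop (end after entry, no other end label in between, identical bytes in a second copy) can be tried out on plain byte buffers. The sketch below is a simplification under stated assumptions: offsets into two equally laid out buffers stand in for label addresses, a linear scan replaces the sorted-ends lookup, and the values of the three reason codes are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reason codes named as in the source; the numeric values are assumptions. */
#define END_BEFORE_ENTRY (-1)
#define END_REORDERED    (-2)
#define MEMCMP_FAILED    (-3)

/* Returns the copyable size of handler j, or a negative reason code.
   code1/code2 are two copies of the same "interpreter"; entry[] and end[]
   hold the start and end offsets of each of the n handlers. */
int handler_size(const unsigned char *code1, const unsigned char *code2,
                 const size_t *entry, const size_t *end, size_t n, size_t j) {
    size_t k;

    assert(j < n);
    if (end[j] <= entry[j])
        return END_BEFORE_ENTRY;

    /* Another handler's end label strictly inside this handler means the
       code was not laid out as one contiguous body. */
    for (k = 0; k < n; k++)
        if (end[k] > entry[j] && end[k] < end[j])
            return END_REORDERED;

    /* If the two copies differ, the bytes are not safe to relocate. */
    if (memcmp(code1 + entry[j], code2 + entry[j], end[j] - entry[j]) != 0)
        return MEMCMP_FAILED;

    return (int)(end[j] - entry[j]);
}
```

Only handlers that come back with a positive size are candidates for copying; the negative codes record why a handler has to stay in the interpreter.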

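Copying the code

This simply iterates over the JVM instructions in the block, finds the code under each instruction's label, and copies it into the cache piece by piece

// Copy the inline code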
char *insSeqCodeCopy(char *code_pntr, Instruction *ins_start_pntr, char **map,
                     BasicBlock **patchers, BasicBlock *block, int start,
                     int len) {

    Instruction *instructions = &block->start[start];
    OpcodeInfo *opcodes = &block->opcodes[start];
    int opcode = OPC_NOP, size = 0, depth, i;

    map[instructions - ins_start_pntr] = code_pntr;

    for(i = 0; i < len; i++) {
        code_pntr += size;
        opcode = opcodes[i].opcode;
        depth = opcodes[i].cache_depth;
        size = handler_sizes[depth][opcode];
        // Copy the native code under the label for this bytecode instruction
        memcpy(code_pntr, instructions[i].handler, size);
    }

    if(branch_patching && opcode >= OPC_IFEQ && opcode <= OPC_JSR) {
        block->u.patch.addr = code_pntr + branch_patch_offsets[depth]
                                                  [opcode - OPC_IFEQ];
        block->u.patch.next = *patchers;
        *patchers = block;
    }

    return code_pntr + size;
}

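Copy the goto dispatch code to the end of the segment, so that after the JIT code has run straight through, the CPU can jump to the next segment or back to the goto interpreter.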

 memcpy(pntr, goto_start, goto_len);
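
Put together, the length calculation, the handler copy, and the trailing dispatch amount to concatenating byte ranges. Here is a minimal simulation with ordinary byte arrays standing in for machine code; Handler, seq_code_len and seq_code_copy are hypothetical names, and nothing in it is executable code memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const unsigned char *body;  /* start of the handler's code */
    size_t size;                /* its precomputed length */
} Handler;

/* Total bytes needed: every handler body plus the trailing dispatch. */
size_t seq_code_len(const Handler *seq, size_t n, size_t goto_len) {
    size_t i, len = goto_len;

    for (i = 0; i < n; i++)
        len += seq[i].size;
    return len;
}

/* Concatenates the handler bodies into `out` and appends the dispatch code.
   Returns the number of bytes written, or 0 if `out` is too small. */
size_t seq_code_copy(unsigned char *out, size_t cap,
                     const Handler *seq, size_t n,
                     const unsigned char *goto_start, size_t goto_len) {
    unsigned char *p = out;
    size_t i;

    assert(out != NULL && seq != NULL && goto_start != NULL);
    if (seq_code_len(seq, n, goto_len) > cap)
        return 0;

    for (i = 0; i < n; i++) {
        memcpy(p, seq[i].body, seq[i].size);
        p += seq[i].size;
    }
    memcpy(p, goto_start, goto_len);   /* dispatch goes last */
    return (size_t)(p - out) + goto_len;
}
```

A real implementation copies into memory that is later executed, which brings in platform concerns (executable mappings, instruction cache maintenance) that this simulation leaves out.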

Install the finished code block in the method structure, replacing the original instructions

void updateSeqStarts(MethodBlock *mb, char *code_pntr, BasicBlock *start,
                     int ins_start, BasicBlock *end, int ins_end) {

    TRACE("Updating start block (%d len %d) %p\n", INUM(mb, start, ins_start),
          start->length - ins_start, code_pntr);

    start->start[ins_start].handler = code_pntr;
    MBARRIER();

    if(start != end) {
        code_pntr += insSeqCodeLen(start, ins_start, start->length - ins_start);

        for(start = start->next; start != end; start = start->next) {
            TRACE("Updating block join (%d len %d) %p\n", INUM(mb, start, 0),
                  start->length, code_pntr);

            start->start->handler = code_pntr;
            MBARRIER();

            code_pntr += insSeqCodeLen(start, 0, start->length);
        }

        TRACE("Updating end block (%d len %d) %p\n", INUM(mb, end, 0),
              ins_end + 1, code_pntr);

        end->start->handler = code_pntr;
        MBARRIER();
    }
}

The next time the interpreter reaches this code, it follows the handler into the JIT-compiled code
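
The handler swap can also be written with C11 atomics, which makes the ordering requirement explicit. This is only a sketch: the source does not show what MBARRIER() expands to, so treating it as "make the new handler visible to other threads in a defined order" is an assumption, and Instruction, install_handler and current_handler are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    _Atomic(void *) handler;  /* where the interpreter jumps for this instruction */
} Instruction;

/* Publishes `code` as the new handler. The release store orders every
   earlier write (for example, filling in the copied code) before the
   pointer becomes visible to other threads. */
void install_handler(Instruction *ins, void *code) {
    assert(ins != NULL && code != NULL);
    atomic_store_explicit(&ins->handler, code, memory_order_release);
}

/* Reads the handler as an interpreter thread would before dispatching.
   A thread sees either the old handler or the new one, never a mix. */
void *current_handler(Instruction *ins) {
    assert(ins != NULL);
    return atomic_load_explicit(&ins->handler, memory_order_acquire);
}
```

Either value is safe to dispatch through: the old handler keeps interpreting the instruction, and the new one enters the copied sequence.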
