JVM Study Notes, Part 7

17号技师

Category: java, JVM

Article link: https://blog.csdn.net/weixin_44737877/article/details/105752970


6. Runtime Optimization

6.1 Just-in-Time Compilation

Tiered Compilation (TieredCompilation)

Let's start with an example:

public class JIT1 {
    public static void main(String[] args) {
        for (int i = 0; i < 200; i++) {
            long start = System.nanoTime();
            // Allocate 1000 objects that are never used afterwards
            for (int j = 0; j < 1000; j++) {
                new Object();
            }
            long end = System.nanoTime();
            // Print the round number and the elapsed time in nanoseconds
            System.out.printf("%d\t%d\n", i, (end - start));
        }
    }
}
===================Output=================================
0	39400
1	27100
2	25700
3	31600
......
......
196	400
197	400
198	400
199	400

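Why does the elapsed time drop so sharply in later rounds?

The JVM divides execution into 5 tiers:

Tier 0: interpreted execution (Interpreter)
Tier 1: compiled and executed by the C1 JIT compiler (no profiling)
Tier 2: compiled and executed by the C1 JIT compiler (basic profiling)
Tier 3: compiled and executed by the C1 JIT compiler (full profiling)
Tier 4: compiled and executed by the C2 JIT compiler

Profiling means collecting data about the program's execution state while it runs, for example method invocation counts and loop back-edge counts.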
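To make the idea of profiling counters concrete, here is a toy model. It is only a sketch: the class name, the single combined threshold, and its value are assumptions made for illustration and do not reflect HotSpot's actual data structures or defaults.

```java
import java.util.HashMap;
import java.util.Map;

public class ProfilingSketch {
    // Hypothetical threshold chosen for illustration only; not a HotSpot default.
    static final int HOT_THRESHOLD = 1_000;

    // Per-method counters: [0] = invocations, [1] = loop back-edges.
    private final Map<String, long[]> counters = new HashMap<>();

    void recordInvocation(String method) {
        counters.computeIfAbsent(method, k -> new long[2])[0]++;
    }

    void recordBackEdge(String method) {
        counters.computeIfAbsent(method, k -> new long[2])[1]++;
    }

    // A method counts as hot once its combined counters reach the threshold.
    boolean isHot(String method) {
        long[] c = counters.getOrDefault(method, new long[2]);
        return c[0] + c[1] >= HOT_THRESHOLD;
    }

    public static void main(String[] args) {
        ProfilingSketch profile = new ProfilingSketch();
        profile.recordInvocation("main");
        for (int j = 0; j < 1_000; j++) {
            profile.recordBackEdge("main"); // one back-edge per loop iteration
        }
        System.out.println("main hot? " + profile.isHot("main"));
    }
}
```

This also shows why a method that is called only once, like main in JIT1, can still become hot: its loops accumulate back-edge counts.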

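Differences between the JIT compiler and the interpreter

The interpreter translates bytecode into machine code as it goes; when it meets the same bytecode again, it repeats the same work
The JIT compiles selected bytecode into machine code and stores it in the Code Cache; when the same code is reached again, the cached machine code runs directly with no recompilation
The interpreter handles bytecode in the same generic way on every platform
The JIT generates machine code specific to the platform it is running on

Most code runs rarely, so there is no point spending time compiling it to machine code; it is simply interpreted. The small fraction that is hot code, on the other hand, is worth compiling to machine code to reach the desired speed. As a rough ranking of execution efficiency: Interpreter < C1 < C2. The overall goal is to find the hot spots (which is where the name HotSpot comes from) and optimize them.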
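A rough way to observe the Interpreter < C1 < C2 ranking is to run one small program under different modes. The sketch below is an assumption-laden illustration: the class name and loop sizes are arbitrary, and a single wall-clock measurement is far less reliable than a proper benchmark harness.

```java
public class InterpreterVsJit {
    // A small arithmetic loop that becomes hot quickly.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // Suggested runs:
        //   java -Xint InterpreterVsJit                    (interpreter only)
        //   java -XX:TieredStopAtLevel=1 InterpreterVsJit  (stop at C1)
        //   java InterpreterVsJit                          (default tiered compilation)
        long start = System.nanoTime();
        long result = 0;
        for (int round = 0; round < 2_000; round++) {
            result += sumOfSquares(10_000);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("result=%d, elapsed=%d ms%n", result, elapsedMs);
    }
}
```

The printed result should be identical in all three modes; only the elapsed time is expected to differ.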
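One of the optimizations at work in the JIT1 example is called escape analysis: it checks whether a newly created object escapes, that is, whether it is used outside the code that creates it. In JIT1 the objects are never used anywhere else, so the allocation can be optimized away. Escape analysis can be turned off with -XX:-DoEscapeAnalysis; rerun the example with that flag: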

/**
 * @description: -XX:+PrintCompilation -XX:-DoEscapeAnalysis
 *                  turns escape analysis off
 * @author: Seldom
 * @time: 2020/4/25 17:13
 */
public class JIT1 {
    public static void main(String[] args) {
        for (int i = 0; i < 200; i++) {
            long start = System.nanoTime();
            for (int j = 0; j < 1000; j++) {
                new Object();
            }
            long end = System.nanoTime();
            System.out.printf("%d\t%d\n", i, (end - start));
        }
    }
}
====================Output===============================
0	42800
1	27700
2	28100
3	27900
......
197	10100
198	9100
199	9200
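To make "escape" concrete, the sketch below contrasts an object that stays inside its method with one that is stored where other code can reach it. The class, the Point type, and the SINK list are all hypothetical names used only for illustration; whether the JIT actually removes the first allocation depends on the JVM and its flags.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Anything stored here is reachable from outside the creating method.
    static final List<Point> SINK = new ArrayList<>();

    // The Point never leaves this method, so it does not escape.
    // The JIT may then avoid the heap allocation entirely.
    static int noEscape(int a, int b) {
        Point p = new Point(a, b);
        return p.x + p.y;
    }

    // The Point is stored in a static list, so it escapes
    // and must be a real heap object.
    static int escapes(int a, int b) {
        Point p = new Point(a, b);
        SINK.add(p);
        return p.x + p.y;
    }

    public static void main(String[] args) {
        System.out.println(noEscape(2, 3));
        System.out.println(escapes(2, 3));
        System.out.println("objects kept alive: " + SINK.size());
    }
}
```

The new Object() in JIT1 is like noEscape: nothing outside the loop body can observe it.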

Method Inlining (Inlining)

public class JIT2 {
    private static int square(final int i) {
        return i * i;
    }

    public static void main(String[] args) {
        System.out.println(square(9));
    }
}

If square turns out to be a hot method and is not too long, it is inlined. Inlining means copying the body of the method into the place where it is called:

System.out.println(9 * 9);

Constant folding can then be applied as well:

System.out.println(81);
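The three stages can be written out by hand to confirm that they are equivalent. This is only a source-level illustration with hypothetical method names; the JIT performs these steps on its internal representation, not on Java source. Note also that javac folds the literal expression 9 * 9 on its own; the point of inlining is that it exposes such constants across a method call.

```java
public class InlineStages {
    private static int square(final int i) {
        return i * i;
    }

    // Stage 0: the original call.
    static int original() {
        return square(9);
    }

    // Stage 1: the body of square pasted into the caller.
    static int afterInlining() {
        return 9 * 9;
    }

    // Stage 2: the constant expression replaced by its value.
    static int afterFolding() {
        return 81;
    }

    public static void main(String[] args) {
        System.out.println(original());
        System.out.println(afterInlining());
        System.out.println(afterFolding());
    }
}
```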

Experiment:

/**
 * @description:    -XX:+UnlockDiagnosticVMOptions
 *                  -XX:+PrintInlining prints inlining decisions (requires the flag above)
 *                  -XX:CompileCommand=dontinline,*JIT3.square forbids inlining of square
 *                  -XX:+PrintCompilation
 * @author: Seldom
 * @time: 2020/4/25 17:13
 */
public class JIT3 {

    public static void main(String[] args) {
        int x = 0;
        for (int i = 0; i < 500; i++) {
            long start = System.nanoTime();
            for (int j = 0; j < 1000; j++) {
                x = square(9);
            }
            long end = System.nanoTime();
            // Print the round number, the result, and the elapsed time in nanoseconds
            System.out.printf("%d\t%d\t%d\n", i, x, (end - start));
        }
    }

    private static int square(final int i) {
        return i * i;
    }
}
=========================================
1. With no flags, some rounds report an elapsed time of 0
2. With inlining of square disabled, 0 no longer appears

Field Optimization

For JMH benchmarking, see: http://openjdk.java.net/projects/code-tools/jmh/
Create a maven project and add the dependencies

    <dependency>
      <groupId>org.openjdk.jmh</groupId>
      <artifactId>jmh-core</artifactId>
      <version>1.21</version>
    </dependency>
    <dependency>
      <groupId>org.openjdk.jmh</groupId>
      <artifactId>jmh-generator-annprocess</artifactId>
      <version>1.21</version>
      <scope>provided</scope>
    </dependency>

Write the benchmark code

package jmh;


import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/**
 * @description:
 * @Warmup warm-up rounds, so the code is fully optimized before measuring
 * @Measurement number of measured rounds
 * @author: Seldom
 * @time: 2020/4/25 20:03
 */
@Warmup(iterations =  3, time = 1)
@Measurement(iterations = 5, time = 1)
@State(Scope.Benchmark)
public class JMH {

    int[] elements = randomInts(1_000);

    private static int[] randomInts(int size){
        Random random = ThreadLocalRandom.current();
        int[] values = new int[size];
        for (int i = 0; i < size; i++) {
            values[i] = random.nextInt();
        }
        return values;
    }

    @Benchmark
    public void test1() {
        for (int i = 0; i < elements.length; i++) {
            doSum(elements[i]);
        }
    }

    @Benchmark
    public void test2() {
        int[] local = this.elements;
        for (int i = 0; i < local.length; i++) {
            doSum(local[i]);
        }
    }

    @Benchmark
    public void test3() {
        for (int element : elements) {
            doSum(element);
        }
    }

    static int sum = 0;


    /**
     * Controls whether this method may be inlined at run time
     * Mode.INLINE allows inlining
     * Mode.DONT_INLINE forbids it
     * @param x the value to add to sum
     */
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    static void doSum(int x) {
        sum+=x;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMH.class.getSimpleName())
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}
=================Inlining allowed===============================
Benchmark   Mode  Cnt        Score         Error  Units
JMH.test1  thrpt    5  2746974.821 ±  608243.266  ops/s
JMH.test2  thrpt    5  2516252.427 ± 1856789.216  ops/s
JMH.test3  thrpt    5  2807244.854 ±  134342.018  ops/s
==================Inlining not allowed==========================
Benchmark   Mode  Cnt       Score       Error  Units
JMH.test1  thrpt    5  349174.504 ± 26968.673  ops/s
JMH.test2  thrpt    5  439675.968 ± 12869.592  ops/s
JMH.test3  thrpt    5  438972.916 ± 12290.416  ops/s
===============================================================
ops/s: throughput (operations per second)
Score: the measured result

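Analysis

In the example above, whether doSum is inlined affects how reads of the elements field are optimized.
If doSum is inlined, test1 is optimized into something like the following (pseudocode):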

    @Benchmark
    public void test1() {
        // the elements field is read once and cached -> int[] local
        for (int i = 0; i < elements.length; i++) { // the remaining 999 length checks <- local
            sum += elements[i]; // 1000 element reads at index i <- local
        }
    }

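Counting 1000 length checks and 1000 element accesses, this saves 1999 reads of the field.
If doSum is not inlined, however, the optimization above is not applied.

As an exercise, mark elements as volatile and compare the results:

===================Inlined===========================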
Benchmark   Mode  Cnt        Score       Error  Units
JMH.test1  thrpt    5   559935.572 ± 51157.743  ops/s
JMH.test2  thrpt    5  2832519.761 ± 36512.447  ops/s
JMH.test3  thrpt    5  2840574.083 ± 59400.081  ops/s
==================Not inlined=========================
Benchmark   Mode  Cnt        Score       Error  Units
JMH.test1  thrpt    5  302513.195 ±  37563.712  ops/s
JMH.test2  thrpt    5  393454.706 ± 143898.389  ops/s
JMH.test3  thrpt    5  391034.345 ± 160287.677  ops/s
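The volatile numbers fit the way each method reads the field. The sketch below is a hypothetical stand-alone class with made-up data, not the benchmark itself: reads of a volatile field may not be cached by the JIT, so a loop that names the field directly re-reads it every time, while a loop over a local copy reads it once. The for-each loop in test3 behaves like the local-copy version because the language evaluates the array expression only once.

```java
public class VolatileFieldSketch {
    // Hypothetical variant of the benchmark state: the field is now volatile.
    volatile int[] elements = {1, 2, 3, 4};

    // Names the field in the loop: every length check and every
    // element access is a separate volatile read (like test1).
    long sumViaField() {
        long total = 0;
        for (int i = 0; i < elements.length; i++) {
            total += elements[i];
        }
        return total;
    }

    // Reads the volatile field once, then works on the local copy (like test2).
    long sumViaLocal() {
        int[] local = this.elements;
        long total = 0;
        for (int i = 0; i < local.length; i++) {
            total += local[i];
        }
        return total;
    }

    public static void main(String[] args) {
        VolatileFieldSketch s = new VolatileFieldSketch();
        System.out.println(s.sumViaField());
        System.out.println(s.sumViaLocal());
    }
}
```

Both methods compute the same sum; only the number of field reads differs.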

6.2 Reflection Optimization

package reflect;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/**
 * @description: reflection optimization
 * @author: Seldom
 * @time: 2020/4/25 20:55
 */
public class Reflect1 {
    public static void foo() {
        System.out.println("foo...");
    }

    public static void main(String[] args) throws Exception {
        Method foo = Reflect1.class.getMethod("foo");
        for (int i = 0; i <= 16; i++) {
            System.out.printf("%d\t", i);
            foo.invoke(null);
        }
        System.in.read();
    }
}

The first 16 calls to foo.invoke (i = 0 through 15) use the NativeMethodAccessorImpl implementation of MethodAccessor

========The check in NativeMethodAccessorImpl=============
++this.numInvocations > ReflectionFactory.inflationThreshold()
==============Looking further============================
private static int inflationThreshold = 15; // threshold
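The switch from the native accessor to a generated one can be modeled with a small delegate pattern. This is a simplified sketch with hypothetical class names, assuming the same counter check as shown above; it is not the JDK source, and the strings it returns merely label which accessor handled the call.

```java
public class InflationSketch {
    // Mirrors the threshold value shown above.
    static final int INFLATION_THRESHOLD = 15;

    interface Accessor {
        String invoke();
    }

    // Forwards every call to whichever accessor is currently installed.
    static final class DelegatingAccessor implements Accessor {
        private Accessor delegate;

        DelegatingAccessor() {
            this.delegate = new NativeLikeAccessor(this);
        }

        void setDelegate(Accessor delegate) {
            this.delegate = delegate;
        }

        @Override
        public String invoke() {
            return delegate.invoke();
        }
    }

    // Stands in for the slow native path; counts its own invocations.
    static final class NativeLikeAccessor implements Accessor {
        private final DelegatingAccessor parent;
        private int numInvocations;

        NativeLikeAccessor(DelegatingAccessor parent) {
            this.parent = parent;
        }

        @Override
        public String invoke() {
            // Once the counter exceeds the threshold, install the fast accessor
            // for future calls; the current call still completes on this path.
            if (++numInvocations > INFLATION_THRESHOLD) {
                parent.setDelegate(new GeneratedLikeAccessor());
            }
            return "native";
        }
    }

    // Stands in for the generated class that calls the target directly.
    static final class GeneratedLikeAccessor implements Accessor {
        @Override
        public String invoke() {
            return "generated";
        }
    }

    public static void main(String[] args) {
        Accessor accessor = new DelegatingAccessor();
        for (int i = 0; i <= 16; i++) {
            System.out.printf("%d\t%s%n", i, accessor.invoke());
        }
    }
}
```

In this model the 16th call (i = 15) is the one that trips the check, and the call after it is the first to take the generated path.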

Using arthas-boot.jar

Run the code in debug mode
Inspect var3.getClass()
Note the value: sun.reflect.GeneratedMethodAccessor1
Run the code
In cmd, run java -jar arthas-boot.jar
Select your own java process
Once attached, type help to see the available commands
Type jad sun.reflect.GeneratedMethodAccessor1

/*
 * Decompiled with CFR.
 *
 * Could not load the following classes:
 *  reflect.Reflect1
 */
package sun.reflect;

import java.lang.reflect.InvocationTargetException;
import reflect.Reflect1;
import sun.reflect.MethodAccessorImpl;

public class GeneratedMethodAccessor1
extends MethodAccessorImpl {
    /*
     * Loose catch block
     * Enabled aggressive block sorting
     * Enabled unnecessary exception pruning
     * Enabled aggressive exception aggregation
     * Lifted jumps to return sites
     */
    public Object invoke(Object object, Object[] arrobject) throws InvocationTargetException {
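        // A rather odd construct: if any arguments are passed, throw IllegalArgumentException
        block4: {
            if (arrobject == null || arrobject.length == 0) break block4;
            throw new IllegalArgumentException();
        }
        try {
            // As you can see, this is now a direct call
            Reflect1.foo();
            // foo has no return value
            return null;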
        }
        catch (Throwable throwable) {
            throw new InvocationTargetException(throwable);
        }
        catch (ClassCastException | NullPointerException runtimeException) {
            throw new IllegalArgumentException(super.toString());
        }
    }
}

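Notes

sun.reflect.noInflation disables inflation: GeneratedMethodAccessor1 is generated right away. The first generation is relatively expensive, so this does not pay off if the method is invoked reflectively only once.
sun.reflect.inflationThreshold changes the inflation threshold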
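Both settings are system properties, so they are passed on the command line with -D. The sketch below uses a hypothetical class name and simply echoes the two properties before making some reflective calls. It assumes a JDK whose reflection works as described in this section (the sun.reflect package); on newer JDKs with a different reflection implementation the properties may have no effect.

```java
import java.lang.reflect.Method;

public class InflationFlags {
    static int calls = 0;

    public static void target() {
        calls++;
    }

    // Invokes target() reflectively and returns how many calls were made.
    static int invokeRepeatedly(int times) throws Exception {
        int before = calls;
        Method m = InflationFlags.class.getMethod("target");
        for (int i = 0; i < times; i++) {
            m.invoke(null);
        }
        return calls - before;
    }

    public static void main(String[] args) throws Exception {
        // Example runs:
        //   java -Dsun.reflect.noInflation=true InflationFlags
        //   java -Dsun.reflect.inflationThreshold=100 InflationFlags
        System.out.println("noInflation = "
                + System.getProperty("sun.reflect.noInflation"));
        System.out.println("inflationThreshold = "
                + System.getProperty("sun.reflect.inflationThreshold"));
        System.out.println("reflective calls made: " + invokeRepeatedly(20));
    }
}
```

A property that was not set on the command line prints as null, which means the default applies.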
