JMH Samples Explained: 1-11 (of 38)

Latest recommended article published on 2022-03-29 15:43:07

一本郑经

Category: JMH. Tags: java, benchmarking, official samples

Article link: https://blog.csdn.net/guojun8446/article/details/115602883

Preface

Source code: https://github.com/frank4600/jmh
JMH column articles

1 HelloWorld

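import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class JMHSample_01_HelloWorld {

    @Benchmark
    public void wellHelloThere() {
        // intentionally empty: this measures the overhead of the harness itself
    }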

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_01_HelloWorld.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }

}

Output

# JMH version: 1.29
# VM version: JDK 1.8.0_221, Java HotSpot(TM) 64-Bit Server VM, 25.221-b11
# VM invoker: C:\Program Files\Java\jdk1.8.0_221\jre\bin\java.exe
# VM options: -Dfile.encoding=UTF-8 -Duser.country=CN -Duser.language=zh -Duser.variant
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.frank.kmh.samples.JMHSample_01_HelloWorld.wellHelloThere

# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration   1: 4274070288.284 ops/s
# Warmup Iteration   2: 4267169096.855 ops/s
# Warmup Iteration   3: 4062262047.465 ops/s
# Warmup Iteration   4: 3972325977.661 ops/s
# Warmup Iteration   5: 3871290584.386 ops/s
Iteration   1: 3756577746.113 ops/s
Iteration   2: 3839831352.045 ops/s
Iteration   3: 3914366005.824 ops/s
Iteration   4: 3984215516.669 ops/s
Iteration   5: 4030862703.572 ops/s


Result "com.frank.kmh.samples.JMHSample_01_HelloWorld.wellHelloThere":
  3905170664.844 ±(99.9%) 423921676.486 ops/s [Average]
  (min, avg, max) = (3756577746.113, 3905170664.844, 4030862703.572), stdev = 110091113.162
  CI (99.9%): [3481248988.359, 4329092341.330] (assumes normal distribution)


# Run complete. Total time: 00:01:40

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                Mode  Cnt           Score           Error  Units
JMHSample_01_HelloWorld.wellHelloThere  thrpt    5  3905170664.844 ± 423921676.486  ops/s

Annotate the method you want to benchmark with @Benchmark, then write a main method to launch the benchmark.
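The Result block in the output above can be reproduced from the five measurement iterations. The following is only a sketch: the class name `ScoreSummary` is made up for this example, and the hard-coded Student's t quantile 8.610 (99.9% two-sided, 4 degrees of freedom) is an assumption taken from a standard t table, not a value read from JMH's source.

```java
final class ScoreSummary {
    // Throughput of the five measurement iterations from the run above, in ops/s.
    static final double[] ITERATIONS = {
        3756577746.113, 3839831352.045, 3914366005.824, 3984215516.669, 4030862703.572
    };

    // Assumed Student's t quantile: 99.9% two-sided interval, n - 1 = 4 degrees of freedom.
    static final double T_999_DF4 = 8.610;

    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double stdev(double[] values) {
        double m = mean(values);
        double squares = 0.0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / (values.length - 1));
    }

    // Half-width of the confidence interval: t * stdev / sqrt(n).
    static double error(double[] values) {
        return T_999_DF4 * stdev(values) / Math.sqrt(values.length);
    }

    public static void main(String[] args) {
        System.out.printf("mean  = %.3f ops/s%n", mean(ITERATIONS));
        System.out.printf("stdev = %.3f ops/s%n", stdev(ITERATIONS));
        System.out.printf("error = %.3f ops/s%n", error(ITERATIONS));
    }
}
```

The ETA of 00:01:40 in the same output follows from similar arithmetic: (5 warmup + 5 measurement iterations) x 10 s x 1 fork = 100 s.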

2 BenchmarkModes

This sample covers the benchmark modes. Here is the main code:

    @Benchmark
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public void measureThroughput() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureAvgTime() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureSamples() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureSingleShot() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

    @Benchmark
    @BenchmarkMode({Mode.Throughput, Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime})
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureMultiple() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

    @Benchmark
    @BenchmarkMode(Mode.All)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public void measureAll() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(100);
    }

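@OutputTimeUnit

The time unit used to report results in each mode. For example, the first method reports how many invocations run per second, and the second reports the average number of microseconds per invocation.

@BenchmarkMode

The benchmark mode:

Mode.Throughput: throughput mode, the number of invocations executed per unit of time
Mode.AverageTime: average time mode, the average execution time per invocation
Mode.SampleTime: sampling mode, which samples individual invocation times and reports percentiles such as p0.50, p0.95 and p0.99
Mode.SingleShotTime: runs the method only once per iteration. It is often combined with zero warmup iterations to measure cold-start performance.
Any combination of the four above: {Mode.Throughput, Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime}
Mode.All: all four modes at once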
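Throughput and average time are reciprocal views of the same measurement, and @OutputTimeUnit only changes the scale. The sketch below works this out for the 100 ms sleep used by every method in this sample; `ModeUnits` and its method names are hypothetical helpers written for this illustration, not part of JMH.

```java
import java.util.concurrent.TimeUnit;

final class ModeUnits {
    // Throughput: how many operations fit into one output time unit.
    static double throughput(long opNanos, TimeUnit outputUnit) {
        return (double) outputUnit.toNanos(1) / opNanos;
    }

    // Average time: how many output time units a single operation takes.
    static double averageTime(long opNanos, TimeUnit outputUnit) {
        return (double) opNanos / outputUnit.toNanos(1);
    }

    public static void main(String[] args) {
        long sleepNanos = TimeUnit.MILLISECONDS.toNanos(100); // one operation = 100 ms sleep
        System.out.println(throughput(sleepNanos, TimeUnit.SECONDS) + " ops/s");
        System.out.println(throughput(sleepNanos, TimeUnit.MICROSECONDS) + " ops/us");
        System.out.println(averageTime(sleepNanos, TimeUnit.MICROSECONDS) + " us/op");
    }
}
```

A real run reports values slightly off these ideal figures because each invocation also pays for the call and the timer granularity.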

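To make test runs quicker, you can reduce the number of iterations and shorten each one.
The example below runs 1 warmup iteration of 10 ms and 2 measurement iterations of 50 ms each.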

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_02_BenchmarkModes.class.getSimpleName())
                .forks(1)
                .warmupIterations(1).warmupTime(TimeValue.milliseconds(10))
                .measurementIterations(2).measurementTime(TimeValue.milliseconds(50))
                .build();

        new Runner(opt).run();
    }

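Here are the final results. In the sample rows, p0.00 is the 0th percentile, i.e. the shortest sampled time, and p1.00 is the longest: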

Benchmark                                                              Mode  Cnt       Score   Error   Units
JMHSample_02_BenchmarkModes.measureAll                                thrpt    2      ≈ 10⁻⁵          ops/us
JMHSample_02_BenchmarkModes.measureMultiple                           thrpt    2      ≈ 10⁻⁵          ops/us
JMHSample_02_BenchmarkModes.measureThroughput                         thrpt    2      10.007           ops/s
JMHSample_02_BenchmarkModes.measureAll                                 avgt    2  100140.100           us/op
JMHSample_02_BenchmarkModes.measureAvgTime                             avgt    2  100215.850           us/op
JMHSample_02_BenchmarkModes.measureMultiple                            avgt    2  100202.400           us/op
JMHSample_02_BenchmarkModes.measureAll                               sample    2  100073.472           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.00              sample       100007.936           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.50              sample       100073.472           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.90              sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.95              sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.99              sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.999             sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.9999            sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p1.00              sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureMultiple                          sample    2   99876.864           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.00    sample        99614.720           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.50    sample        99876.864           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.90    sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.95    sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.99    sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.999   sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.9999  sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p1.00    sample       100139.008           us/op
JMHSample_02_BenchmarkModes.measureSamples                           sample    2   99811.328           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.00      sample        99614.720           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.50      sample        99811.328           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.90      sample       100007.936           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.95      sample       100007.936           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.99      sample       100007.936           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.999     sample       100007.936           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.9999    sample       100007.936           us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p1.00      sample       100007.936           us/op
JMHSample_02_BenchmarkModes.measureAll                                   ss    2  100370.350           us/op
JMHSample_02_BenchmarkModes.measureMultiple                              ss    2  100890.550           us/op
JMHSample_02_BenchmarkModes.measureSingleShot                            ss    2  100254.750           us/op
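The sample rows are easier to read once you see that p0.00 and p1.00 are the smallest and largest samples. The sketch below assumes that measureAll collected exactly two samples (Cnt = 2) and that their values are the p0.00 and p1.00 rows above; under that assumption the mean of the two reproduces the reported score. `SampleSummary` is a hypothetical helper, and it does not attempt to reproduce how JMH estimates the intermediate percentiles.

```java
import java.util.Arrays;

final class SampleSummary {
    // Assumed raw samples for measureAll in us/op, read off the p1.00 and p0.00 rows.
    static final double[] MEASURE_ALL = {100139.008, 100007.936};

    static double min(double[] samples) {
        return Arrays.stream(samples).min().getAsDouble(); // corresponds to p0.00
    }

    static double max(double[] samples) {
        return Arrays.stream(samples).max().getAsDouble(); // corresponds to p1.00
    }

    static double mean(double[] samples) {
        return Arrays.stream(samples).average().getAsDouble(); // the Score column
    }

    public static void main(String[] args) {
        System.out.printf("p0.00 = %.3f us/op%n", min(MEASURE_ALL));
        System.out.printf("p1.00 = %.3f us/op%n", max(MEASURE_ALL));
        System.out.printf("score = %.3f us/op%n", mean(MEASURE_ALL));
    }
}
```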

3 State

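@State is used for multithreaded tests. JMH injects state objects into benchmark methods automatically, much like dependency injection in Spring.

@State(Scope.Thread): thread scope; each thread gets its own instance, similar to a ThreadLocal variable
@State(Scope.Benchmark): benchmark scope; one instance is shared by all threads in the run
@State(Scope.Group): group scope, covered later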
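The difference between the thread and benchmark scopes can be mimicked in plain Java without JMH. In this sketch, a ThreadLocal stands in for Scope.Thread and a single AtomicInteger stands in for Scope.Benchmark; `ScopeAnalogy` and `run` are hypothetical names, and the code is an analogy rather than how JMH manages state internally.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ScopeAnalogy {
    // Analogue of Scope.Thread: every thread lazily gets its own counter.
    static final ThreadLocal<int[]> PER_THREAD = ThreadLocal.withInitial(() -> new int[1]);

    // Starts `threads` workers that each increment both counters `times` times.
    // Returns {final shared total, largest value any thread reached in its own counter}.
    static int[] run(int threads, int times) {
        AtomicInteger shared = new AtomicInteger();       // analogue of Scope.Benchmark
        AtomicInteger maxPerThread = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                int[] own = PER_THREAD.get();
                for (int i = 0; i < times; i++) {
                    own[0]++;                 // private to this thread, no synchronization needed
                    shared.incrementAndGet(); // visible to all threads, must be thread-safe
                }
                maxPerThread.accumulateAndGet(own[0], Math::max);
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {shared.get(), maxPerThread.get()};
    }

    public static void main(String[] args) {
        int[] result = run(3, 4);
        System.out.println("shared total = " + result[0] + ", per-thread max = " + result[1]);
    }
}
```

With 3 threads and 4 increments each, the shared counter ends at 12 while no per-thread counter goes beyond 4.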

Example 1: Scope.Thread

public class JMHSample_03_States_Thread {
    @State(Scope.Thread)
    public static class ThreadState {
        volatile int x = 0;
    }
    @Benchmark
    public void measureUnshared(ThreadState state) throws Exception{
        state.x++;
        TimeUnit.MILLISECONDS.sleep(800);
        System.out.println(state.x);
    }
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_03_States_Thread.class.getSimpleName())
                .threads(3)
                .forks(1)
                .warmupIterations(1).warmupTime(TimeValue.milliseconds(10))
                .measurementIterations(2).measurementTime(TimeValue.seconds(3))
                .build();
        new Runner(opt).run();
    }
}

Output

Iteration   1: 3
4
3
4
4
5
5
6
5
7
6
6
8
7
3.726 ops/s

.threads(3) starts 3 threads. Each thread increments its own copy of the state, so every value is printed 3 times.

Example 2: Scope.Benchmark

    @State(Scope.Benchmark)
    public static class ThreadState {
        volatile AtomicInteger x = new AtomicInteger(0);
    }
    @Benchmark
    public void measureUnshared(ThreadState state) throws Exception{

        TimeUnit.MILLISECONDS.sleep(800);
        System.out.println(state.x.addAndGet(1));
    }
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_03_States_Benchmark.class.getSimpleName())
                .threads(3)
                .forks(1)
                .warmupIterations(1).warmupTime(TimeValue.milliseconds(10))
                .measurementIterations(2).measurementTime(TimeValue.seconds(3))
                .build();
        new Runner(opt).run();
    }

# Fork: 1 of 1
# Warmup Iteration   1: 2
3
1
4
6
5
7
8
3.740 ops/s
Iteration   1: 10
9
11
12
13
14
15
17
16
18
20
19
21
22

All threads share a single state object, so the counter is a thread-safe AtomicInteger.

4 JMHSample_04_DefaultState

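The @State annotation can be placed directly on the benchmark class, where it sets the scope for all of the class's fields. Without it, JMH reports an error.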

@State(Scope.Benchmark)
public class JMHSample_04_DefaultState_Benchmark {
    AtomicInteger x = new AtomicInteger(1);

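5 @Setup and @TearDown

@Setup marks initialization that runs before the benchmark, and @TearDown marks cleanup that runs after it; samples 5 to 7 all deal with them. Both accept a Level argument that sets the granularity. From coarsest to finest:

Level.Trial: once per benchmark run
Level.Iteration: once per iteration
Level.Invocation: once per invocation of the benchmark method

Official sample 7 uses @Setup(Level.Invocation) to sleep before each invocation of the benchmark method, and compares two variants: one with the sleep and one without.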
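To get a feel for how often a fixture runs at each level, the following sketch simply counts setup calls in a simplified model of one run. It is not JMH code: `FixtureLevels` and `simulate` are hypothetical, and the fixed number of invocations per iteration is an assumption (a real iteration is time-bounded, so its invocation count varies).

```java
final class FixtureLevels {
    int trialSetups;
    int iterationSetups;
    int invocationSetups;

    // Simplified model of a single fork: one trial, several iterations, many invocations.
    void simulate(int iterations, int invocationsPerIteration) {
        trialSetups++;                          // Level.Trial: once for the whole run
        for (int it = 0; it < iterations; it++) {
            iterationSetups++;                  // Level.Iteration: once per iteration
            for (int inv = 0; inv < invocationsPerIteration; inv++) {
                invocationSetups++;             // Level.Invocation: before every single call
                // the benchmark method would be invoked here
            }
        }
    }

    public static void main(String[] args) {
        FixtureLevels levels = new FixtureLevels();
        levels.simulate(2, 1000);
        System.out.println(levels.trialSetups + " / " + levels.iterationSetups
                + " / " + levels.invocationSetups);
    }
}
```

The invocation-level count grows with every call, which is why fixtures at that level are only reasonable when the benchmark method itself is slow compared with the fixture.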

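8 Dead Code Elimination

Many benchmarks go wrong because of dead code elimination (DCE): the compiler is smart enough to deduce that some computations are redundant and eliminate them completely. If the eliminated part is the code being benchmarked, the measurement is meaningless. One fix is to return the result from the benchmark method, so that the computation is used and the JIT compiler cannot remove it.

Example 1: return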

public class JMHSample_08_DeadCode {
    private double x = Math.PI;

    @Benchmark
    public void baseline() {
        // do nothing, this is a baseline
    }

    @Benchmark
    public void measureWrong() {
        // This is wrong: result is not used and the entire computation is optimized away.
        Math.log(x);
    }

    @Benchmark
    public double measureRight() {
        // This is correct: the result is being used.
        return Math.log(x);
    }
}

Results:

Benchmark                           Mode  Cnt   Score   Error  Units
JMHSample_08_DeadCode.baseline      avgt    2   0.251          ns/op
JMHSample_08_DeadCode.measureRight  avgt    2  17.758          ns/op
JMHSample_08_DeadCode.measureWrong  avgt    2   0.255          ns/op

Another way to prevent dead code elimination is to use a Blackhole:

Example 2: Blackholes

    @Benchmark
    public void measureRight_2(Blackhole bh) {
        bh.consume(Math.log(x1));
        bh.consume(Math.log(x2));
    }

Results:

Benchmark                               Mode  Cnt   Score   Error  Units
JMHSample_09_Blackholes.baseline        avgt    2  17.809          ns/op
JMHSample_09_Blackholes.measureRight_1  avgt    2  32.758          ns/op
JMHSample_09_Blackholes.measureRight_2  avgt    2  35.107          ns/op
JMHSample_09_Blackholes.measureWrong    avgt    2  17.629          ns/op
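The idea behind consuming a value can be sketched with a hand-rolled sink: each result is written to a volatile field, so it counts as used. This is a deliberately crude stand-in written for illustration; `NaiveSink` is a hypothetical class, it is not how JMH's Blackhole is implemented, and the volatile write adds a cost of its own.

```java
final class NaiveSink {
    // A volatile write is an observable side effect, so the stored value cannot be treated as unused.
    static volatile double sink;

    static void consume(double value) {
        sink = value;
    }

    // Mirrors the shape of measureRight_2: both results are consumed, so neither is dead code.
    static void measureBoth(double x1, double x2) {
        consume(Math.log(x1));
        consume(Math.log(x2));
    }

    public static void main(String[] args) {
        measureBoth(Math.PI, Math.E);
        System.out.println("last consumed value = " + sink);
    }
}
```

In real benchmarks, prefer the Blackhole that JMH passes in, which is designed to keep this overhead small and predictable.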

Constant folding can also invalidate a benchmark:

// JMHSample_10_ConstantFold
    private double x = Math.PI;
    private final double wrongX = Math.PI;

    @Benchmark
    public double baseline() {
        return Math.PI;
    }

    @Benchmark
    public double measureWrong_1() {
        return Math.log(Math.PI);
    }

    @Benchmark
    public double measureWrong_2() {
        return Math.log(wrongX);
    }

    @Benchmark
    public double measureRight() {
        return Math.log(x);
    }

Benchmark                                 Mode  Cnt   Score   Error  Units
JMHSample_10_ConstantFold.baseline        avgt    2   1.988          ns/op
JMHSample_10_ConstantFold.measureRight    avgt    2  16.381          ns/op
JMHSample_10_ConstantFold.measureWrong_1  avgt    2   1.971          ns/op
JMHSample_10_ConstantFold.measureWrong_2  avgt    2   1.931          ns/op
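Part of the reason wrongX behaves like a literal is that a final field initialized with a constant expression is a compile-time constant in Java, so reads of it are replaced with the value when the class is compiled. The sketch below makes that visible by overwriting both fields through reflection; `ConstantFieldDemo` is a hypothetical class, and the reflective write to a final field is used only for demonstration (newer JDKs may warn about it). It shows the compile-time half of the story only, not what the JIT does afterwards.

```java
import java.lang.reflect.Field;

final class ConstantFieldDemo {
    private double x = Math.PI;            // ordinary field: read from the object at run time
    private final double wrongX = Math.PI; // constant variable: reads are inlined at compile time

    double readPlain() {
        return x;
    }

    double readFinal() {
        return wrongX;
    }

    // Overwrites both fields with 1.0 via reflection and returns {readPlain(), readFinal()}.
    static double[] overwriteAndRead() {
        try {
            ConstantFieldDemo demo = new ConstantFieldDemo();
            for (String name : new String[] {"x", "wrongX"}) {
                Field field = ConstantFieldDemo.class.getDeclaredField(name);
                field.setAccessible(true);
                field.setDouble(demo, 1.0);
            }
            return new double[] {demo.readPlain(), demo.readFinal()};
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        double[] values = overwriteAndRead();
        System.out.println("plain field read = " + values[0]);
        System.out.println("final field read = " + values[1]);
    }
}
```

The read of the plain field sees the new value, while the read of the final field still yields Math.PI, which is why benchmark inputs should come from non-final state fields.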

11 Don't Use Loops

The compiler applies a series of optimizations to loops, which distorts the measurement:

    private int reps(int reps) {
        int s = 0;
        for (int i = 0; i < reps; i++) {
            s += (x + y);
        }
        return s;
    }

Benchmark                               Mode  Cnt  Score   Error  Units
JMHSample_11_Loops.measureRight         avgt    2  2.006          ns/op
JMHSample_11_Loops.measureWrong_1       avgt    2  2.081          ns/op
JMHSample_11_Loops.measureWrong_10      avgt    2  0.214          ns/op
JMHSample_11_Loops.measureWrong_100     avgt    2  0.028          ns/op
JMHSample_11_Loops.measureWrong_1000    avgt    2  0.020          ns/op
JMHSample_11_Loops.measureWrong_10000   avgt    2  0.016          ns/op
JMHSample_11_Loops.measureWrong_100000  avgt    2  0.015          ns/op

Q&A

Why shouldn't you use a for loop?
A: The sample explains it as follows:

It would be tempting for users to do loops within the benchmarked method. (This is the bad thing Caliper taught everyone). These tests explain why this is a bad idea. Looping is done in the hope of minimizing the overhead of calling the test method, by doing the operations inside the loop instead of inside the method call. Don’t buy this argument; you will see there is more magic happening when we allow optimizers to merge the loop iterations.

You might notice the larger the repetitions count, the lower the “perceived” cost of the operation being measured. Up to the point we do each addition with 1/20 ns, well beyond what hardware can actually do.

This happens because the loop is heavily unrolled/pipelined, and the operation to be measured is hoisted from the loop. Morale: don’t overuse loops, rely on JMH to get the measurement right.

In short:
The more repetitions the loop runs, the lower the apparent cost of each operation, down to about 1/20 ns per addition, which is far beyond what the hardware can really do. This happens because the loop is heavily unrolled and pipelined, and the measured operation is hoisted out of the loop. The takeaway: don't overuse loops; rely on JMH to get the measurement right.

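If that passage still leaves you puzzled, the post "Understanding loops performance in jvm" offers an explanation: the JIT unrolls the loop so that a pipelined CPU can overlap several additions, and the optimized code looks roughly like this: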

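    // Illustrative sketch of a loop unrolled 5 times; not actual JIT output
    private int repsUnrolled(int reps) {
        int s = 0;
        int i = 0;
        for (; i + 5 <= reps; i += 5) { // unrolled body: 5 additions per pass
            s += (x + y);
            s += (x + y);
            s += (x + y);
            s += (x + y);
            s += (x + y);
        }
        for (; i < reps; i++) {         // leftover iterations when reps is not a multiple of 5
            s += (x + y);
        }
        return s;
    }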
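An optimizer is free to rewrite the loop because the rewritten forms return exactly the same value. The sketch below compares the plain loop with a version where x + y is hoisted out of the loop and with a closed form that needs no loop at all. `LoopForms` and the values x = 1, y = 2 are assumptions made for this example, and whether HotSpot actually produces either form depends on the JVM version and flags.

```java
final class LoopForms {
    static int x = 1; // assumed inputs for this illustration
    static int y = 2;

    // The loop as written in the benchmark.
    static int plain(int reps) {
        int s = 0;
        for (int i = 0; i < reps; i++) {
            s += (x + y);
        }
        return s;
    }

    // The addition x + y does not change between iterations, so it can be computed once.
    static int hoisted(int reps) {
        int sum = x + y;
        int s = 0;
        for (int i = 0; i < reps; i++) {
            s += sum;
        }
        return s;
    }

    // With the invariant hoisted, the whole loop collapses into one multiplication.
    static int closedForm(int reps) {
        return reps * (x + y);
    }

    public static void main(String[] args) {
        for (int reps : new int[] {1, 10, 1000}) {
            System.out.println(reps + ": " + plain(reps) + " " + hoisted(reps) + " " + closedForm(reps));
        }
    }
}
```

Once x + y is computed only once, dividing the loop's total time by the repetition count no longer measures the cost of one addition, which matches the shrinking scores in the table above.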