JMH Performance Testing Framework
Introduction to JMH
-
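JMH, the Java Microbenchmark Harness, is a toolkit and API built specifically for microbenchmarking code. It is a micro benchmark framework developed by the same OpenJDK/Oracle engineers who work on the Java compiler. What is a micro benchmark? Put simply, it is a benchmark at the method level, with timing precision down to the microsecond level or finer (results can also be reported in nanoseconds). JMH is therefore most useful once you have already identified a hot method and want to optimize it further: it lets you quantify the effect of each optimization.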
-
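Typical use cases:
- You want to know quantitatively how long a method takes to run and how its running time depends on the input size n
- A method has two different implementations (for example, implementation A uses a FixedThreadPool and implementation B uses a ForkJoinPool) and you do not know which one performs better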
-
The best way to learn JMH is to read the official code samples, which are clear and easy to follow.
Using JMH
- Adding the Maven dependency
<properties>
<jmh.version>1.21</jmh.version>
</properties>
<dependencies>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>${jmh.version}</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>${jmh.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
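- A first benchmark example
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode({Mode.SampleTime})
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 3, time = 5, timeUnit = TimeUnit.MILLISECONDS)
// batchSize: every sampled operation is a batch of 100000000 calls
@Measurement(iterations = 1, batchSize = 100000000)
@Threads(2)
@Fork(1)
@State(Scope.Benchmark)
public class MyClass {
    Lock lock = new ReentrantLock();
    long i = 0;
    AtomicLong atomicLong = new AtomicLong(0);

    // Increment guarded by a ReentrantLock
    @Benchmark
    public void measureLock() {
        lock.lock();
        try {
            i++;
        } finally {
            lock.unlock(); // release the lock even if the body throws
        }
    }

    // Increment through AtomicLong, which uses CAS internally
    @Benchmark
    public void measureCAS() {
        atomicLong.incrementAndGet();
    }

    // Unsynchronized increment: not thread-safe with @Threads(2), kept as a baseline
    @Benchmark
    public void measureNoLock() {
        i++;
    }
}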
-
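This example has three benchmark methods that compare the cost of incrementing a counter in three ways: under a lock, with CAS (compare-and-swap), and with no synchronization at all.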
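The benchmark measures only speed, so it is worth checking separately which of the three variants still counts correctly when two threads share the state. The following plain-Java sketch does not use JMH; the class name `IncrementStrategies` and the per-thread iteration count are assumptions made for illustration. It runs each strategy on two threads and returns the final totals.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper class, not part of the benchmark above.
final class IncrementStrategies {
    static final int THREADS = 2;          // mirrors @Threads(2)
    static final int PER_THREAD = 200_000; // arbitrary illustrative count

    private final Lock lock = new ReentrantLock();
    private long lockedCounter = 0;
    private long plainCounter = 0;
    private final AtomicLong atomicCounter = new AtomicLong();

    void incrementWithLock() {
        lock.lock();
        try {
            lockedCounter++;
        } finally {
            lock.unlock();
        }
    }

    void incrementWithCas() {
        atomicCounter.incrementAndGet();
    }

    void incrementWithoutLock() {
        plainCounter++; // read-modify-write without synchronization: updates can be lost
    }

    /** Runs the action PER_THREAD times on each of THREADS threads and waits for them. */
    static void runOnThreads(Runnable action) {
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < PER_THREAD; n++) {
                    action.run();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    /** Returns the totals in the order {lock, CAS, no lock}. */
    static long[] runAll() {
        IncrementStrategies s = new IncrementStrategies();
        runOnThreads(s::incrementWithLock);
        runOnThreads(s::incrementWithCas);
        runOnThreads(s::incrementWithoutLock);
        return new long[] {s.lockedCounter, s.atomicCounter.get(), s.plainCounter};
    }

    public static void main(String[] args) {
        long expected = (long) THREADS * PER_THREAD;
        long[] totals = runAll();
        System.out.println("expected = " + expected);
        System.out.println("lock     = " + totals[0]);
        System.out.println("CAS      = " + totals[1]);
        System.out.println("no lock  = " + totals[2] + " (may be lower than expected)");
    }
}
```

The lock and CAS totals always match the expected count. The unsynchronized total can fall short, which is why the no-lock timing is best read as a lower bound on cost and not as a usable alternative.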
-
There are two ways to run the benchmark:
- The first is to build a jar with mvn install and run the jar from the command line
- The second is to write a main method, as follows:
Options options = new OptionsBuilder()
        .include(MyClass.class.getSimpleName())
        .output("D:/Benchmark.log")
        .build();
new Runner(options).run();
-
The test results are as follows:
# JMH version: 1.21
# VM version: JDK 1.8.0_144, Java HotSpot(TM) 64-Bit Server VM, 25.144-b01
# VM invoker: D:\sdk\java\jdk1.8\jre\bin\java.exe
# VM options: -ea -Didea.test.cyclic.buffer.size=1048576 -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2019.1.3\lib\idea_rt.jar=57547:C:\Program Files\JetBrains\IntelliJ IDEA 2019.1.3\bin -Dfile.encoding=UTF-8
# Warmup: 3 iterations, 5 ms each
# Measurement: 1 iterations, 10 s each, 100000000 calls per op
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Sampling time
# Benchmark: com.ctrip.car.testweb.javacode.MyClass.measureCAS
# Run progress: 0.00% complete, ETA 00:00:30
# Fork: 1 of 1
# Warmup Iteration 1: ≈ 10⁻⁴ ms/op
# Warmup Iteration 2: ≈ 10⁻⁴ ms/op
# Warmup Iteration 3: ≈ 10⁻⁴ ms/op
Iteration 1: 3270.359 ±(99.9%) 538.912 ms/op
measureCAS·p0.00: 2805.989 ms/op
measureCAS·p0.50: 3296.723 ms/op
measureCAS·p0.90: 3527.410 ms/op
measureCAS·p0.95: 3527.410 ms/op
measureCAS·p0.99: 3527.410 ms/op
measureCAS·p0.999: 3527.410 ms/op
measureCAS·p0.9999: 3527.410 ms/op
measureCAS·p1.00: 3527.410 ms/op
Result "com.ctrip.car.testweb.javacode.MyClass.measureCAS":
N = 7
mean = 3270.359 ±(99.9%) 538.912 ms/op
Histogram, ms/op:
[2800.000, 2850.000) = 1
[2850.000, 2900.000) = 0
[2900.000, 2950.000) = 0
[2950.000, 3000.000) = 0
[3000.000, 3050.000) = 0
[3050.000, 3100.000) = 0
[3100.000, 3150.000) = 0
[3150.000, 3200.000) = 0
[3200.000, 3250.000) = 2
[3250.000, 3300.000) = 1
[3300.000, 3350.000) = 1
[3350.000, 3400.000) = 0
[3400.000, 3450.000) = 0
[3450.000, 3500.000) = 0
[3500.000, 3550.000) = 2
Percentiles, ms/op:
p(0.0000) = 2805.989 ms/op
p(50.0000) = 3296.723 ms/op
p(90.0000) = 3527.410 ms/op
p(95.0000) = 3527.410 ms/op
p(99.0000) = 3527.410 ms/op
p(99.9000) = 3527.410 ms/op
p(99.9900) = 3527.410 ms/op
p(99.9990) = 3527.410 ms/op
p(99.9999) = 3527.410 ms/op
p(100.0000) = 3527.410 ms/op
# JMH version: 1.21
# VM version: JDK 1.8.0_144, Java HotSpot(TM) 64-Bit Server VM, 25.144-b01
# VM invoker: D:\sdk\java\jdk1.8\jre\bin\java.exe
# VM options: -ea -Didea.test.cyclic.buffer.size=1048576 -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2019.1.3\lib\idea_rt.jar=57547:C:\Program Files\JetBrains\IntelliJ IDEA 2019.1.3\bin -Dfile.encoding=UTF-8
# Warmup: 3 iterations, 5 ms each
# Measurement: 1 iterations, 10 s each, 100000000 calls per op
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Sampling time
# Benchmark: com.ctrip.car.testweb.javacode.MyClass.measureLock
# Run progress: 33.33% complete, ETA 00:00:28
# Fork: 1 of 1
# Warmup Iteration 1: 0.001 ±(99.9%) 0.001 ms/op
# Warmup Iteration 2: ≈ 10⁻³ ms/op
# Warmup Iteration 3: ≈ 10⁻³ ms/op
Iteration 1: 6339.690 ±(99.9%) 33223.544 ms/op
measureLock·p0.00: 3435.135 ms/op
measureLock·p0.50: 3940.549 ms/op
measureLock·p0.90: 14042.530 ms/op
measureLock·p0.95: 14042.530 ms/op
measureLock·p0.99: 14042.530 ms/op
measureLock·p0.999: 14042.530 ms/op
measureLock·p0.9999: 14042.530 ms/op
measureLock·p1.00: 14042.530 ms/op
Result "com.ctrip.car.testweb.javacode.MyClass.measureLock":
N = 4
mean = 6339.690 ±(99.9%) 33223.544 ms/op
Histogram, ms/op:
[ 0.000, 1250.000) = 0
[ 1250.000, 2500.000) = 0
[ 2500.000, 3750.000) = 1
[ 3750.000, 5000.000) = 2
[ 5000.000, 6250.000) = 0
[ 6250.000, 7500.000) = 0
[ 7500.000, 8750.000) = 0
[ 8750.000, 10000.000) = 0
[10000.000, 11250.000) = 0
[11250.000, 12500.000) = 0
[12500.000, 13750.000) = 0
[13750.000, 15000.000) = 1
[15000.000, 16250.000) = 0
[16250.000, 17500.000) = 0
[17500.000, 18750.000) = 0
Percentiles, ms/op:
p(0.0000) = 3435.135 ms/op
p(50.0000) = 3940.549 ms/op
p(90.0000) = 14042.530 ms/op
p(95.0000) = 14042.530 ms/op
p(99.0000) = 14042.530 ms/op
p(99.9000) = 14042.530 ms/op
p(99.9900) = 14042.530 ms/op
p(99.9990) = 14042.530 ms/op
p(99.9999) = 14042.530 ms/op
p(100.0000) = 14042.530 ms/op
# JMH version: 1.21
# VM version: JDK 1.8.0_144, Java HotSpot(TM) 64-Bit Server VM, 25.144-b01
# VM invoker: D:\sdk\java\jdk1.8\jre\bin\java.exe
# VM options: -ea -Didea.test.cyclic.buffer.size=1048576 -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2019.1.3\lib\idea_rt.jar=57547:C:\Program Files\JetBrains\IntelliJ IDEA 2019.1.3\bin -Dfile.encoding=UTF-8
# Warmup: 3 iterations, 5 ms each
# Measurement: 1 iterations, 10 s each, 100000000 calls per op
# Timeout: 10 min per iteration
# Threads: 2 threads, will synchronize iterations
# Benchmark mode: Sampling time
# Benchmark: com.ctrip.car.testweb.javacode.MyClass.measureNoLock
# Run progress: 66.67% complete, ETA 00:00:14
# Fork: 1 of 1
# Warmup Iteration 1: ≈ 10⁻⁴ ms/op
# Warmup Iteration 2: ≈ 10⁻⁴ ms/op
# Warmup Iteration 3: ≈ 10⁻⁴ ms/op
Iteration 1: 261.960 ±(99.9%) 4.585 ms/op
measureNoLock·p0.00: 183.239 ms/op
measureNoLock·p0.50: 262.144 ms/op
measureNoLock·p0.90: 272.944 ms/op
measureNoLock·p0.95: 278.449 ms/op
measureNoLock·p0.99: 281.543 ms/op
measureNoLock·p0.999: 281.543 ms/op
measureNoLock·p0.9999: 281.543 ms/op
measureNoLock·p1.00: 281.543 ms/op
Result "com.ctrip.car.testweb.javacode.MyClass.measureNoLock":
N = 77
mean = 261.960 ±(99.9%) 4.585 ms/op
Histogram, ms/op:
[180.000, 190.000) = 1
[190.000, 200.000) = 0
[200.000, 210.000) = 0
[210.000, 220.000) = 0
[220.000, 230.000) = 0
[230.000, 240.000) = 0
[240.000, 250.000) = 1
[250.000, 260.000) = 25
[260.000, 270.000) = 35
[270.000, 280.000) = 13
Percentiles, ms/op:
p(0.0000) = 183.239 ms/op
p(50.0000) = 262.144 ms/op
p(90.0000) = 272.944 ms/op
p(95.0000) = 278.449 ms/op
p(99.0000) = 281.543 ms/op
p(99.9000) = 281.543 ms/op
p(99.9900) = 281.543 ms/op
p(99.9990) = 281.543 ms/op
p(99.9999) = 281.543 ms/op
p(100.0000) = 281.543 ms/op
# Run complete. Total time: 00:00:39
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
MyClass.measureCAS sample 7 3270.359 ± 538.912 ms/op
MyClass.measureCAS:measureCAS·p0.00 sample 2805.989 ms/op
MyClass.measureCAS:measureCAS·p0.50 sample 3296.723 ms/op
MyClass.measureCAS:measureCAS·p0.90 sample 3527.410 ms/op
MyClass.measureCAS:measureCAS·p0.95 sample 3527.410 ms/op
MyClass.measureCAS:measureCAS·p0.99 sample 3527.410 ms/op
MyClass.measureCAS:measureCAS·p0.999 sample 3527.410 ms/op
MyClass.measureCAS:measureCAS·p0.9999 sample 3527.410 ms/op
MyClass.measureCAS:measureCAS·p1.00 sample 3527.410 ms/op
MyClass.measureLock sample 4 6339.690 ± 33223.544 ms/op
MyClass.measureLock:measureLock·p0.00 sample 3435.135 ms/op
MyClass.measureLock:measureLock·p0.50 sample 3940.549 ms/op
MyClass.measureLock:measureLock·p0.90 sample 14042.530 ms/op
MyClass.measureLock:measureLock·p0.95 sample 14042.530 ms/op
MyClass.measureLock:measureLock·p0.99 sample 14042.530 ms/op
MyClass.measureLock:measureLock·p0.999 sample 14042.530 ms/op
MyClass.measureLock:measureLock·p0.9999 sample 14042.530 ms/op
MyClass.measureLock:measureLock·p1.00 sample 14042.530 ms/op
MyClass.measureNoLock sample 77 261.960 ± 4.585 ms/op
MyClass.measureNoLock:measureNoLock·p0.00 sample 183.239 ms/op
MyClass.measureNoLock:measureNoLock·p0.50 sample 262.144 ms/op
MyClass.measureNoLock:measureNoLock·p0.90 sample 272.944 ms/op
MyClass.measureNoLock:measureNoLock·p0.95 sample 278.449 ms/op
MyClass.measureNoLock:measureNoLock·p0.99 sample 281.543 ms/op
MyClass.measureNoLock:measureNoLock·p0.999 sample 281.543 ms/op
MyClass.measureNoLock:measureNoLock·p0.9999 sample 281.543 ms/op
MyClass.measureNoLock:measureNoLock·p1.00 sample 281.543 ms/op
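- In summary, with each operation being a batch of 100000000 calls: the slowest sample took 14042 ms with the lock, 3527 ms with CAS, and 281 ms with no lock. The mean scores are 6339.690 ms, 3270.359 ms, and 261.960 ms respectively. The lock and CAS figures rest on only 4 and 7 samples, and the error on the lock mean (±33223.544 ms) is larger than the mean itself, so treat the ranking as indicative rather than precise.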
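Because every reported operation is a batch of 100000000 calls, the scores are easier to interpret once they are converted to a cost per single call. The sketch below does that arithmetic; the class name `PerCallCost` is an assumption, and the input values are the mean scores copied from the result table above.

```java
// Hypothetical helper for reading the result table, not part of JMH.
final class PerCallCost {
    /** Calls per sampled operation, taken from batchSize in @Measurement. */
    static final long BATCH_SIZE = 100_000_000L;

    /** Converts a score in milliseconds per batch into nanoseconds per single call. */
    static double nanosPerCall(double millisPerBatch) {
        double nanosPerBatch = millisPerBatch * 1_000_000.0;
        return nanosPerBatch / BATCH_SIZE;
    }

    public static void main(String[] args) {
        // Mean scores in ms/op, copied from the result table.
        System.out.println("CAS:     " + nanosPerCall(3270.359) + " ns/call");
        System.out.println("lock:    " + nanosPerCall(6339.690) + " ns/call");
        System.out.println("no lock: " + nanosPerCall(261.960) + " ns/call");
    }
}
```

This works out to roughly 32.7 ns per call for CAS, 63.4 ns for the lock, and 2.6 ns for the unsynchronized increment, all measured with two threads contending on the same state.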
JMH Basic Concepts
Mode
-
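Mode is the mode JMH uses to run a benchmark. The modes differ in what they measure or in how they measure it. JMH currently has four modes:
- Throughput: overall throughput, for example "how many calls can be executed in 1 s"
- AverageTime: average time per call, for example "each call takes XXX milliseconds on average"
- SampleTime: random sampling of call times, reporting the distribution of the samples, for example "99% of calls finish within XXX milliseconds"
- SingleShotTime: the modes above run each iteration for a fixed length of time (the log above shows 10 s per measurement iteration when no time is set), whereas SingleShotTime runs the method only once per iteration. It is often combined with a warmup count of 0 to measure cold-start performance.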
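To make the difference between the modes concrete, the sketch below derives a throughput figure, an average, and a percentile from one and the same set of call latencies. It is a simplified illustration under stated assumptions: the class name `ModeMetrics` is hypothetical, the calls are assumed to run back to back on one thread, and the nearest-rank percentile used here is not necessarily the method JMH itself applies.

```java
import java.util.Arrays;

// Hypothetical illustration of what each mode reports; this is not JMH code.
final class ModeMetrics {
    /** Throughput-style figure: calls per second, assuming the calls ran back to back. */
    static double throughputPerSecond(long[] latenciesNanos) {
        long totalNanos = Arrays.stream(latenciesNanos).sum();
        return latenciesNanos.length / (totalNanos / 1_000_000_000.0);
    }

    /** AverageTime-style figure: mean nanoseconds per call. */
    static double averageNanos(long[] latenciesNanos) {
        return Arrays.stream(latenciesNanos).average().orElse(Double.NaN);
    }

    /** SampleTime-style figure: nearest-rank percentile of the recorded latencies. */
    static long percentileNanos(long[] latenciesNanos, double percentile) {
        long[] sorted = latenciesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Five made-up latencies: 1, 2, 3, 4 and 10 milliseconds.
        long[] samples = {1_000_000L, 2_000_000L, 3_000_000L, 4_000_000L, 10_000_000L};
        System.out.println("throughput (calls/s): " + throughputPerSecond(samples));
        System.out.println("average (ns):         " + averageNanos(samples));
        System.out.println("p50 (ns):             " + percentileNanos(samples, 50));
        System.out.println("p99 (ns):             " + percentileNanos(samples, 99));
    }
}
```

The same five calls average 4 ms, yet the median is 3 ms and the slowest call takes 10 ms. That gap is the reason to pick SampleTime when tail latency matters more than the mean.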
-
Iteration
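- An iteration is the smallest unit of measurement in JMH. In most modes, one iteration is a fixed time window (often configured as one second; the JMH 1.21 log above shows a default of 10 s). During that window JMH calls the benchmarked method over and over and then, depending on the mode, takes samples, computes throughput, computes the average execution time, and so on.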
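The idea of a time-boxed iteration can be sketched in a few lines of plain Java. This is only a rough model of the concept, not how JMH is implemented; the class name `IterationWindow` and the 50 ms window are assumptions chosen to keep the example quick.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of one time-boxed iteration; JMH's real loop is more careful.
final class IterationWindow {
    /** Calls the task repeatedly until windowNanos has elapsed and returns the call count. */
    static long runIteration(Runnable task, long windowNanos) {
        long start = System.nanoTime();
        long calls = 0;
        do {
            task.run();
            calls++;
        } while (System.nanoTime() - start < windowNanos);
        return calls;
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong();
        long windowNanos = 50_000_000L; // 50 ms, an arbitrary illustrative window
        long calls = runIteration(counter::incrementAndGet, windowNanos);

        // From one iteration, a throughput and an average time can both be derived.
        double seconds = windowNanos / 1_000_000_000.0;
        System.out.println("calls in window:   " + calls);
        System.out.println("approx. calls/s:   " + calls / seconds);
        System.out.println("approx. ns/call:   " + (double) windowNanos / calls);
    }
}
```

Note that this naive loop also pays for a `System.nanoTime()` call on every pass, which distorts the result for very cheap methods. Avoiding that kind of overhead is one of the reasons to use a harness instead of a hand-written loop.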
-
Warmup
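- Warmup means running the benchmarked code before the actual measurement starts. Why is it needed? Because of the JVM's JIT compiler: once a method has been called enough times, the JVM tries to compile it to machine code to speed it up. Warming up first makes the measured results reflect this steady-state performance and not the slower interpreted phase.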
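The effect can be observed, at least roughly, without JMH. The sketch below times several rounds of the same work, discards the warmup rounds, and returns only the measured ones. The class name `WarmupDemo`, the workload, and the round counts are all assumptions for illustration, and the actual timings depend on the machine and JVM.

```java
// Hypothetical hand-rolled warmup; JMH does this for you via @Warmup.
final class WarmupDemo {
    /** Written after every round so the JIT cannot discard the computed values. */
    static volatile long sink;

    /** A small CPU-bound task; the exact arithmetic is arbitrary. */
    static long work(int n) {
        long acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    /** Times one round of the given number of calls, in nanoseconds. */
    static long timeRound(int calls) {
        long start = System.nanoTime();
        long acc = 0;
        for (int c = 0; c < calls; c++) {
            acc += work(1_000);
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    /** Runs warmup rounds whose timings are discarded, then returns the measured rounds. */
    static long[] measure(int warmupRounds, int measuredRounds, int callsPerRound) {
        for (int r = 0; r < warmupRounds; r++) {
            timeRound(callsPerRound); // timing ignored on purpose
        }
        long[] elapsed = new long[measuredRounds];
        for (int r = 0; r < measuredRounds; r++) {
            elapsed[r] = timeRound(callsPerRound);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("cold round (ns): " + timeRound(2_000));
        long[] measured = measure(5, 3, 2_000);
        for (long nanos : measured) {
            System.out.println("warm round (ns): " + nanos);
        }
    }
}
```

On a typical HotSpot JVM the cold round tends to be noticeably slower than the warm ones, although this is not guaranteed on every run.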
@BenchmarkMode
- The benchmark type. SampleTime is used here
@Warmup
- Controls the number of warmup iterations and how long each one runs
@Measurement
- The basic measurement parameters
- iterations: the number of measurement iterations
- time: how long each iteration runs
- timeUnit: the unit of time
@Threads
- The number of test threads in each forked process; a common rule of thumb is twice the number of CPUs
@Fork
- The number of forks. A fork count of 2 means JMH forks two separate JVM processes, one after another, to run the test
@OutputTimeUnit
- The time unit used in the benchmark results, for example seconds, milliseconds, or microseconds
@Benchmark
- Method-level annotation that marks a method as a benchmark target; it is used much like JUnit's @Test
@Param
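- Field-level annotation. @Param specifies several values for a parameter, which makes it well suited to measuring how a method performs with different inputs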
@Setup
- Method-level annotation for preparation work that must run before the test, for example initializing data
@TearDown
- Method-level annotation for cleanup work after the test, for example shutting down a thread pool or closing a database connection
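The way @Param, @Setup, and @TearDown fit together can be pictured as a simple loop: for each parameter value, prepare the state, call the benchmark method many times, then clean up. The plain-Java sketch below emulates that order by hand. It is a conceptual model only and assumes the default behavior of running setup and teardown once per run of a parameter value; the class `LifecycleSketch`, its methods, and the sizes 10 and 1000 are all hypothetical stand-ins, not JMH APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emulation of the benchmark lifecycle; real JMH drives this through annotations.
final class LifecycleSketch {
    /** Stand-in for a field annotated with @Param({"10", "1000"}). */
    static final int[] SIZES = {10, 1000};

    private final List<String> events;
    private final int size;
    private int[] data;

    LifecycleSketch(List<String> events, int size) {
        this.events = events;
        this.size = size;
    }

    /** Stand-in for a @Setup method: builds the input data before any measurement. */
    void setup() {
        data = new int[size];
        for (int i = 0; i < size; i++) {
            data[i] = i;
        }
        events.add("setup size=" + size);
    }

    /** Stand-in for a @Benchmark method: the code whose cost is of interest. */
    long sum() {
        long total = 0;
        for (int value : data) {
            total += value;
        }
        return total;
    }

    /** Stand-in for a @TearDown method: releases whatever setup created. */
    void tearDown() {
        data = null;
        events.add("tearDown size=" + size);
    }

    /** Runs the full lifecycle once per parameter value and records the order of events. */
    static List<String> runAll(int callsPerTrial) {
        List<String> events = new ArrayList<>();
        for (int size : SIZES) {
            LifecycleSketch state = new LifecycleSketch(events, size);
            state.setup();
            long result = 0;
            for (int c = 0; c < callsPerTrial; c++) {
                result = state.sum();
            }
            events.add("benchmark size=" + size + " sum=" + result);
            state.tearDown();
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : runAll(1_000)) {
            System.out.println(event);
        }
    }
}
```

Keeping data construction in the setup step means its cost does not leak into the measured method, which is the main reason these annotations exist.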
@State
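- When @Setup is used, the class must carry this annotation
- @State declares that a class is a "state" and takes a Scope argument that defines how widely that state is shared. Many benchmarks need classes that hold state, and JMH lets you inject them into benchmark methods in a dependency-injection style. There are three scopes:
- Thread: each thread has its own instance of the state
- Group: the state is shared by all threads in the same group
- Benchmark: the state is shared by all threads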
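The practical difference between the Thread and Benchmark scopes is whether worker threads see one object or one object each. The plain-Java sketch below models both cases side by side; it does not use JMH, it leaves out the Group scope, and the class name `ScopeSketch` and the thread and increment counts are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of state sharing; in JMH the scope is chosen with @State(Scope.X).
final class ScopeSketch {
    /** A minimal state object holding one counter. */
    static final class Counter {
        final AtomicLong value = new AtomicLong();
    }

    /**
     * Each worker increments a shared counter (like Scope.Benchmark) and its own
     * counter (like Scope.Thread). Returns {shared, thread 0, thread 1, ...}.
     */
    static long[] run(int threads, int incrementsPerThread) {
        Counter shared = new Counter();
        Counter[] perThread = new Counter[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            Counter own = new Counter();
            perThread[t] = own;
            workers[t] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    shared.value.incrementAndGet(); // contended by every worker
                    own.value.incrementAndGet();    // touched by this worker only
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        long[] totals = new long[threads + 1];
        totals[0] = shared.value.get();
        for (int t = 0; t < threads; t++) {
            totals[t + 1] = perThread[t].value.get();
        }
        return totals;
    }

    public static void main(String[] args) {
        long[] totals = run(2, 1_000);
        System.out.println("shared counter:   " + totals[0]);
        System.out.println("thread 0 counter: " + totals[1]);
        System.out.println("thread 1 counter: " + totals[2]);
    }
}
```

The shared counter ends up with the sum of all increments, while each per-thread counter holds only its own. In a benchmark this choice decides whether contention between threads is part of what gets measured, as it is in the MyClass example, which uses Scope.Benchmark.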