Preface
1 HelloWorld
public class JMHSample_01_HelloWorld {
@Benchmark
public void wellHelloThere() {
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(JMHSample_01_HelloWorld.class.getSimpleName())
.forks(1)
.build();
new Runner(opt).run();
}
}
Output
# JMH version: 1.29
# VM version: JDK 1.8.0_221, Java HotSpot(TM) 64-Bit Server VM, 25.221-b11
# VM invoker: C:\Program Files\Java\jdk1.8.0_221\jre\bin\java.exe
# VM options: -Dfile.encoding=UTF-8 -Duser.country=CN -Duser.language=zh -Duser.variant
# Blackhole mode: full + dont-inline hint
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.frank.kmh.samples.JMHSample_01_HelloWorld.wellHelloThere
# Run progress: 0.00% complete, ETA 00:01:40
# Fork: 1 of 1
# Warmup Iteration 1: 4274070288.284 ops/s
# Warmup Iteration 2: 4267169096.855 ops/s
# Warmup Iteration 3: 4062262047.465 ops/s
# Warmup Iteration 4: 3972325977.661 ops/s
# Warmup Iteration 5: 3871290584.386 ops/s
Iteration 1: 3756577746.113 ops/s
Iteration 2: 3839831352.045 ops/s
Iteration 3: 3914366005.824 ops/s
Iteration 4: 3984215516.669 ops/s
Iteration 5: 4030862703.572 ops/s
Result "com.frank.kmh.samples.JMHSample_01_HelloWorld.wellHelloThere":
3905170664.844 ±(99.9%) 423921676.486 ops/s [Average]
(min, avg, max) = (3756577746.113, 3905170664.844, 4030862703.572), stdev = 110091113.162
CI (99.9%): [3481248988.359, 4329092341.330] (assumes normal distribution)
# Run complete. Total time: 00:01:40
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark Mode Cnt Score Error Units
JMHSample_01_HelloWorld.wellHelloThere thrpt 5 3905170664.844 ± 423921676.486 ops/s
Mark the method you want to benchmark with @Benchmark, then write a main method that launches the benchmark.
2 BenchmarkModes
This sample covers the benchmark modes. Here is the main code:
@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public void measureThroughput() throws InterruptedException {
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureAvgTime() throws InterruptedException {
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureSamples() throws InterruptedException {
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureSingleShot() throws InterruptedException {
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode({Mode.Throughput, Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime})
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureMultiple() throws InterruptedException {
TimeUnit.MILLISECONDS.sleep(100);
}
@Benchmark
@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void measureAll() throws InterruptedException {
TimeUnit.MILLISECONDS.sleep(100);
}
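@OutputTimeUnit
The time unit used to report results in each mode. For example, the first method reports how many calls run per second, and the second reports the average time per call in microseconds.
@BenchmarkMode
The benchmark mode:
- Mode.Throughput: throughput mode, the number of calls executed per unit of time
- Mode.AverageTime: average-time mode, the average execution time per call
- Mode.SampleTime: sampling mode, which samples call times and reports percentiles such as 95% and 99%
- Mode.SingleShotTime: runs the method only once. Often combined with zero warmup iterations to measure cold-start performance.
- Any combination of the four modes above, e.g. {Mode.Throughput, Mode.AverageTime, Mode.SampleTime, Mode.SingleShotTime}
- Mode.All: all four modes at once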
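To make the relationship between throughput and average time concrete, here is a minimal sketch in plain Java. It is not JMH code, and the class and method names are hypothetical. It only shows the arithmetic that turns one raw measurement (an operation count plus an elapsed time) into the two scores; JMH's real measurement loop is more involved.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of JMH: shows how the same raw data
// (operation count + elapsed time) yields the two basic scores.
final class ModeArithmeticSketch {

    // Throughput: operations per one unit of the chosen time unit.
    static double throughput(long ops, long elapsedNanos, TimeUnit unit) {
        double nanosPerUnit = unit.toNanos(1);
        return ops * nanosPerUnit / elapsedNanos;
    }

    // Average time: units of time spent per single operation.
    static double averageTime(long ops, long elapsedNanos, TimeUnit unit) {
        double nanosPerUnit = unit.toNanos(1);
        return elapsedNanos / nanosPerUnit / ops;
    }

    public static void main(String[] args) {
        long ops = 10;                               // assumed: ten calls of about 100 ms each
        long elapsed = TimeUnit.SECONDS.toNanos(1);  // assumed: about one second in total
        System.out.println(throughput(ops, elapsed, TimeUnit.SECONDS));       // ops/s
        System.out.println(throughput(ops, elapsed, TimeUnit.MICROSECONDS));  // ops/us
        System.out.println(averageTime(ops, elapsed, TimeUnit.MICROSECONDS)); // us/op
    }
}
```

With a method that sleeps 100 ms, the throughput expressed in ops/us is tiny, which is why picking a unit that matches the mode matters.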
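To make testing quicker, you can reduce the number of iterations and shorten each one.
The following example runs 1 warmup iteration of 10 ms and 2 measurement iterations of 50 ms each.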
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(JMHSample_02_BenchmarkModes.class.getSimpleName())
.forks(1)
.warmupIterations(1).warmupTime(TimeValue.milliseconds(10))
.measurementIterations(2).measurementTime(TimeValue.milliseconds(50))
.build();
new Runner(opt).run();
}
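Here are the final results. The p0.00 to p1.00 rows in sample mode are percentiles of the sampled times: p0.00 is the minimum, p0.50 the median, and p1.00 the maximum.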
Benchmark Mode Cnt Score Error Units
JMHSample_02_BenchmarkModes.measureAll thrpt 2 ≈ 10⁻⁵ ops/us
JMHSample_02_BenchmarkModes.measureMultiple thrpt 2 ≈ 10⁻⁵ ops/us
JMHSample_02_BenchmarkModes.measureThroughput thrpt 2 10.007 ops/s
JMHSample_02_BenchmarkModes.measureAll avgt 2 100140.100 us/op
JMHSample_02_BenchmarkModes.measureAvgTime avgt 2 100215.850 us/op
JMHSample_02_BenchmarkModes.measureMultiple avgt 2 100202.400 us/op
JMHSample_02_BenchmarkModes.measureAll sample 2 100073.472 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.00 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.50 sample 100073.472 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.90 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.95 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.99 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.999 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p0.9999 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureAll:measureAll·p1.00 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureMultiple sample 2 99876.864 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.00 sample 99614.720 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.50 sample 99876.864 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.90 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.95 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.99 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.999 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p0.9999 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureMultiple:measureMultiple·p1.00 sample 100139.008 us/op
JMHSample_02_BenchmarkModes.measureSamples sample 2 99811.328 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.00 sample 99614.720 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.50 sample 99811.328 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.90 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.95 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.99 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.999 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p0.9999 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureSamples:measureSamples·p1.00 sample 100007.936 us/op
JMHSample_02_BenchmarkModes.measureAll ss 2 100370.350 us/op
JMHSample_02_BenchmarkModes.measureMultiple ss 2 100890.550 us/op
JMHSample_02_BenchmarkModes.measureSingleShot ss 2 100254.750 us/op
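The pN rows of the sample mode are easier to read with a small percentile calculation in hand. The sketch below is plain Java, not JMH's implementation: the class name, the nearest-rank rule and the sample values are my own assumptions, chosen only to show that the 0th percentile is the minimum and the 100th is the maximum.

```java
import java.util.Arrays;

// Hypothetical helper, not JMH's implementation: nearest-rank percentiles
// over a small array of sampled times.
final class PercentileSketch {

    // p is a fraction in [0, 1]; p = 0 gives the minimum, p = 1 the maximum.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length) - 1;  // nearest-rank index
        int index = Math.max(0, Math.min(sorted.length - 1, rank));
        return sorted[index];
    }

    public static void main(String[] args) {
        // Made-up sample times in us/op, for illustration only.
        double[] samples = {100.2, 99.6, 100.0, 100.1, 99.9};
        for (double p : new double[] {0.0, 0.5, 0.9, 1.0}) {
            System.out.println("p" + p + " = " + percentile(samples, p));
        }
    }
}
```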
3 State
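@State is used for multithreaded benchmarks. JMH injects state objects into benchmark methods automatically, much like Spring injects beans.
- @State(Scope.Thread): scoped to a thread; think of it as a ThreadLocal variable
- @State(Scope.Benchmark): scoped to the whole JMH benchmark and shared by all threads
- @State(Scope.Group): scoped to a thread group, covered later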
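The difference between the first two scopes can be imitated without JMH. The following plain-Java sketch is only an analogy (class and method names are hypothetical): one method gives every worker thread a private counter, like Scope.Thread, and the other lets all workers increment one shared counter, like Scope.Benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java analogy, not JMH code: per-thread state vs. one shared state.
final class StateScopeSketch {

    // Like Scope.Thread: every worker owns a private counter.
    static List<Integer> runPerThread(int threads, int callsPerThread) {
        int[] finalValues = new int[threads];
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                int x = 0;                        // private to this thread
                for (int i = 0; i < callsPerThread; i++) {
                    x++;
                }
                finalValues[id] = x;              // each thread writes only its own slot
            });
            workers.add(worker);
            worker.start();
        }
        joinAll(workers);
        List<Integer> result = new ArrayList<>();
        for (int v : finalValues) {
            result.add(v);
        }
        return result;
    }

    // Like Scope.Benchmark: all workers increment the same counter.
    static int runShared(int threads, int callsPerThread) {
        AtomicInteger x = new AtomicInteger(0);   // shared, so it must be thread-safe
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    x.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        joinAll(workers);
        return x.get();
    }

    // Waits for all workers; join also makes their writes visible to the caller.
    private static void joinAll(List<Thread> workers) {
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(runPerThread(3, 4)); // prints [4, 4, 4]
        System.out.println(runShared(3, 4));    // prints 12
    }
}
```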
Example 1: Scope.Thread
public class JMHSample_03_States_Thread {
@State(Scope.Thread)
public static class ThreadState {
volatile int x = 0;
}
@Benchmark
public void measureUnshared(ThreadState state) throws Exception{
state.x++;
TimeUnit.MILLISECONDS.sleep(800);
System.out.println(state.x);
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(JMHSample_03_States_Thread.class.getSimpleName())
.threads(3)
.forks(1)
.warmupIterations(1).warmupTime(TimeValue.milliseconds(10))
.measurementIterations(2).measurementTime(TimeValue.seconds(3))
.build();
new Runner(opt).run();
}
}
Output
Iteration 1: 3
4
3
4
4
5
5
6
5
7
6
6
8
7
3.726 ops/s
threads(3) starts three threads. Each thread has its own state object, so every value is printed three times.
Example 2: Scope.Benchmark
@State(Scope.Benchmark)
public static class ThreadState {
volatile AtomicInteger x = new AtomicInteger(0);
}
@Benchmark
public void measureUnshared(ThreadState state) throws Exception{
TimeUnit.MILLISECONDS.sleep(800);
System.out.println(state.x.addAndGet(1));
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(JMHSample_03_States_Benchmark.class.getSimpleName())
.threads(3)
.forks(1)
.warmupIterations(1).warmupTime(TimeValue.milliseconds(10))
.measurementIterations(2).measurementTime(TimeValue.seconds(3))
.build();
new Runner(opt).run();
}
# Fork: 1 of 1
# Warmup Iteration 1: 2
3
1
4
6
5
7
8
3.740 ops/s
Iteration 1: 10
9
11
12
13
14
15
17
16
18
20
19
21
22
All threads share a single state object, so the counter is a thread-safe AtomicInteger.
4 JMHSample_04_DefaultState
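The @State annotation can also be placed directly on the benchmark class, where it sets the scope for all of the class's fields. Without it, JMH reports an error.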
@State(Scope.Benchmark)
public class JMHSample_04_DefaultState_Benchmark {
AtomicInteger x = new AtomicInteger(1);
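5 @Setup and @TearDown
@Setup runs initialization before the benchmark and @TearDown runs cleanup after it; samples 5 to 7 are all about them. Both accept a Level argument that sets the granularity. From coarsest to finest:
- Level.Trial: benchmark level
- Level.Iteration: iteration level
- Level.Invocation: level of each individual method call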
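One way to picture the three levels is as nested loops. The sketch below is a plain-Java simulation, not JMH code, and its names are hypothetical. It also assumes a fixed number of calls per iteration, whereas real JMH iterations are bounded by time. It records the order in which the fixture hooks would fire.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation, not JMH code: records when each fixture level would fire.
final class FixtureLevelSketch {

    static List<String> simulate(int iterations, int invocationsPerIteration) {
        List<String> events = new ArrayList<>();
        events.add("setup(Trial)");                     // once for the whole run
        for (int it = 0; it < iterations; it++) {
            events.add("setup(Iteration)");             // once per iteration
            for (int inv = 0; inv < invocationsPerIteration; inv++) {
                events.add("setup(Invocation)");        // around every single call
                events.add("benchmark()");
                events.add("tearDown(Invocation)");
            }
            events.add("tearDown(Iteration)");
        }
        events.add("tearDown(Trial)");
        return events;
    }

    // Counts how often a given event was recorded.
    static long count(List<String> events, String name) {
        return events.stream().filter(name::equals).count();
    }

    public static void main(String[] args) {
        List<String> events = simulate(2, 3);
        System.out.println(count(events, "setup(Trial)"));      // prints 1
        System.out.println(count(events, "setup(Iteration)"));  // prints 2
        System.out.println(count(events, "setup(Invocation)")); // prints 6
    }
}
```

The finer the level, the more often the hook runs, so Level.Invocation hooks are called once per benchmark call.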
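Official sample 7 uses @Setup(Level.Invocation) to sleep for a while around each benchmark call. It compares two benchmark methods, one with the sleep and one without.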
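8 Dead-code elimination
The downfall of many benchmarks is dead-code elimination (DCE): the compiler is smart enough to deduce that some computations are redundant and eliminates them completely. If the eliminated part is the code we wanted to benchmark, we are in trouble. One fix is to return the result from the benchmark method, so that the JIT compiler cannot discard the computation.
Example 1: return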
public class JMHSample_08_DeadCode {
private double x = Math.PI;
@Benchmark
public void baseline() {
// do nothing, this is a baseline
}
@Benchmark
public void measureWrong() {
// This is wrong: result is not used and the entire computation is optimized away.
Math.log(x);
}
@Benchmark
public double measureRight() {
// This is correct: the result is being used.
return Math.log(x);
}
}
Results:
Benchmark Mode Cnt Score Error Units
JMHSample_08_DeadCode.baseline avgt 2 0.251 ns/op
JMHSample_08_DeadCode.measureRight avgt 2 17.758 ns/op
JMHSample_08_DeadCode.measureWrong avgt 2 0.255 ns/op
Another way to prevent dead-code elimination is a Blackhole, used as follows:
Example 2: Blackholes
@Benchmark
public void measureRight_2(Blackhole bh) {
bh.consume(Math.log(x1));
bh.consume(Math.log(x2));
}
Results:
Benchmark Mode Cnt Score Error Units
JMHSample_09_Blackholes.baseline avgt 2 17.809 ns/op
JMHSample_09_Blackholes.measureRight_1 avgt 2 32.758 ns/op
JMHSample_09_Blackholes.measureRight_2 avgt 2 35.107 ns/op
JMHSample_09_Blackholes.measureWrong avgt 2 17.629 ns/op
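The idea behind consuming a value can be imitated with a hand-rolled sink. The class below is my own simplified stand-in and is much weaker than JMH's Blackhole: it just stores each value into a volatile field, which the JIT is expected to treat as an observable side effect. Treat it as a sketch of the principle, not as a replacement.

```java
// Hand-rolled stand-in, NOT JMH's Blackhole: a sink that makes results observable.
final class SinkSketch {

    // Volatile write: assumed to be kept by the JIT as a visible side effect,
    // so the value fed into it has to be computed.
    private volatile double sink;

    void consume(double value) {
        sink = value;
    }

    double lastConsumed() {
        return sink;
    }

    public static void main(String[] args) {
        SinkSketch bh = new SinkSketch();
        double x1 = Math.PI;
        double x2 = Math.PI * 2;
        bh.consume(Math.log(x1));  // both results are handed to the sink,
        bh.consume(Math.log(x2));  // so neither call is unused
        System.out.println(bh.lastConsumed());
    }
}
```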
Constant folding can also invalidate a benchmark:
// JMHSample_10_ConstantFold
private double x = Math.PI;
private final double wrongX = Math.PI;
@Benchmark
public double baseline() {
return Math.PI;
}
@Benchmark
public double measureWrong_1() {
return Math.log(Math.PI);
}
@Benchmark
public double measureWrong_2() {
return Math.log(wrongX);
}
@Benchmark
public double measureRight() {
return Math.log(x);
}
Benchmark Mode Cnt Score Error Units
JMHSample_10_ConstantFold.baseline avgt 2 1.988 ns/op
JMHSample_10_ConstantFold.measureRight avgt 2 16.381 ns/op
JMHSample_10_ConstantFold.measureWrong_1 avgt 2 1.971 ns/op
JMHSample_10_ConstantFold.measureWrong_2 avgt 2 1.931 ns/op
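Why does the final field wrongX behave like the literal Math.PI while the plain field x does not? A final field with a constant initializer is a compile-time constant in Java, so its value is known before the code ever runs. The sketch below shows the same effect with strings, where it is directly observable. This is an analogy I am assuming for illustration (javac folding a string concatenation), not what the benchmark itself does, where the JIT folds Math.log of a known constant.

```java
// Plain-Java illustration of compile-time constants, not JMH code.
final class ConstantFoldSketch {

    private final String folded = "PI";  // final + constant initializer: a constant variable
    private String notFolded = "PI";     // ordinary field: its value is read at run time

    // javac folds folded + "!" into the literal "PI!", so both sides are the
    // same interned String. The == (identity) comparison is intentional.
    boolean finalFieldIsFolded() {
        return (folded + "!") == "PI!";
    }

    // Here the concatenation happens at run time and creates a new String object.
    boolean plainFieldIsFolded() {
        return (notFolded + "!") == "PI!";
    }

    public static void main(String[] args) {
        ConstantFoldSketch s = new ConstantFoldSketch();
        System.out.println(s.finalFieldIsFolded());  // prints true
        System.out.println(s.plainFieldIsFolded());  // prints false
    }
}
```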
11 Don't use loops
The compiler applies a whole series of optimizations to loops. Take this one:
private int reps(int reps) {
int s = 0;
for (int i = 0; i < reps; i++) {
s += (x + y);
}
return s;
}
Benchmark Mode Cnt Score Error Units
JMHSample_11_Loops.measureRight avgt 2 2.006 ns/op
JMHSample_11_Loops.measureWrong_1 avgt 2 2.081 ns/op
JMHSample_11_Loops.measureWrong_10 avgt 2 0.214 ns/op
JMHSample_11_Loops.measureWrong_100 avgt 2 0.028 ns/op
JMHSample_11_Loops.measureWrong_1000 avgt 2 0.020 ns/op
JMHSample_11_Loops.measureWrong_10000 avgt 2 0.016 ns/op
JMHSample_11_Loops.measureWrong_100000 avgt 2 0.015 ns/op
Q&A
- Why not use a for loop?
Answer: the sample explains it as follows:
It would be tempting for users to do loops within the benchmarked method. (This is the bad thing Caliper taught everyone). These tests explain why this is a bad idea. Looping is done in the hope of minimizing the overhead of calling the test method, by doing the operations inside the loop instead of inside the method call. Don’t buy this argument; you will see there is more magic happening when we allow optimizers to merge the loop iterations.
You might notice the larger the repetitions count, the lower the “perceived” cost of the operation being measured. Up to the point we do each addition with 1/20 ns, well beyond what hardware can actually do.
This happens because the loop is heavily unrolled/pipelined, and the operation to be measured is hoisted from the loop. Morale: don’t overuse loops, rely on JMH to get the measurement right.
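After reading this passage I was still confused. The post "Understanding loops performance in jvm" offers an explanation: the JIT unrolls the loop so that the CPU can pipeline several additions per pass. The result looks roughly like this: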
// pseudo code
int pipelines = 5;
for(int i = 0; i < length; i += pipelines){
s += (x + y);
s += (x + y);
s += (x + y);
s += (x + y);
s += (x + y);
}
This also explains why the measured time per operation gets lower as the loop count grows.
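The quote also mentions that the measured operation is hoisted out of the loop. The sketch below shows, in plain Java with hypothetical names, which rewrites are semantically allowed for this loop. I am not claiming HotSpot produces exactly these forms, only that the three methods return the same result, so an optimizer is free to do far less work than "reps additions".

```java
// Plain-Java illustration, not JMH code: equivalent forms of the benchmark loop.
final class LoopHoistSketch {

    // The loop as written in the sample's helper.
    static int repsNaive(int x, int y, int reps) {
        int s = 0;
        for (int i = 0; i < reps; i++) {
            s += (x + y);
        }
        return s;
    }

    // x + y never changes inside the loop, so it can be computed once (hoisted).
    static int repsHoisted(int x, int y, int reps) {
        int sum = x + y;
        int s = 0;
        for (int i = 0; i < reps; i++) {
            s += sum;
        }
        return s;
    }

    // Taken to the extreme, reps additions equal one multiplication
    // (valid for reps >= 0; int overflow wraps the same way in all three forms).
    static int repsClosedForm(int x, int y, int reps) {
        return reps * (x + y);
    }

    public static void main(String[] args) {
        System.out.println(repsNaive(1, 2, 100_000));       // prints 300000
        System.out.println(repsHoisted(1, 2, 100_000));     // prints 300000
        System.out.println(repsClosedForm(1, 2, 100_000));  // prints 300000
    }
}
```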
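- How is @OperationsPerInvocation used, and why is it needed here?
Answer: see the post "JMH的@OperationsPerInvocation参数详解" (a detailed explanation of JMH's @OperationsPerInvocation parameter).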
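In short, and as far as I understand it, @OperationsPerInvocation(n) tells JMH that one call of the benchmark method performs n operations, so the reported score is normalized per operation rather than per call. The arithmetic is sketched below in plain Java. The class name is hypothetical and the 1500 ns per call is a made-up figure, picked only because it reproduces the 0.015 ns/op shown for measureWrong_100000.

```java
// Plain-Java arithmetic sketch, not JMH code.
final class OpsPerInvocationSketch {

    // Time per operation when one invocation performs several operations.
    static double nanosPerOperation(double nanosPerInvocation, int operationsPerInvocation) {
        return nanosPerInvocation / operationsPerInvocation;
    }

    public static void main(String[] args) {
        // Assumed figures: one call takes 1500 ns and runs the loop body 100000 times.
        System.out.println(nanosPerOperation(1500.0, 100_000));
    }
}
```

This is why the loop variants in the table can show scores far below one nanosecond: the time of one call is divided by the declared loop count.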