好吧,我将提供CPU如何立即翻译和更新帖子,但是与此同时,您看到的是waaaay差异太小而不必关心.
Java中的字节代码并不表示方法将执行(或不执行)的速度,有两种JIT编译器一旦足够热就会使该方法看起来完全不同.同样,众所周知,一旦编译代码,javac所做的优化很少,真正的优化来自JIT.
为此,我已经使用JMH进行了一些测试,仅使用C1编译器,或者使用GraalVM或完全不使用JIT替换C2 …(后面有很多测试代码,您可以跳过它,只看结果即可,这完成了使用jdk-12 btw).这段代码使用的是JMH-在Java微型基准世界中使用的事实工具(众所周知,如果手工完成,这些基准容易出错).
@Warmup(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
public class BooleanCompare {
public static void main(String[] args) throws Exception {
Options opt = new OptionsBuilder()
.include(BooleanCompare.class.getName())
.build();
new Runner(opt).run();
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
public boolean xor(BooleanExecutionPlan plan) {
return plan.booleans()[0] ^ plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(1)
public boolean plain(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-Xint")
public boolean xorNoJIT(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-Xint")
public boolean plainNoJIT(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
public boolean xorC2Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:-TieredCompilation")
public boolean plainC2Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
public boolean xorC1Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1, jvmArgsAppend = "-XX:TieredStopAtLevel=1")
public boolean plainC1Only(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1,
jvmArgsAppend = {
"-XX:+UnlockExperimentalVMOptions",
"-XX:+EagerJVMCI",
"-Dgraal.ShowConfiguration=info",
"-XX:+UseJVMCICompiler",
"-XX:+EnableJVMCI"
})
public boolean xorGraalVM(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@Fork(value = 1,
jvmArgsAppend = {
"-XX:+UnlockExperimentalVMOptions",
"-XX:+EagerJVMCI",
"-Dgraal.ShowConfiguration=info",
"-XX:+UseJVMCICompiler",
"-XX:+EnableJVMCI"
})
public boolean plainGraalVM(BooleanExecutionPlan plan) {
return plan.booleans()[0] != plan.booleans()[1];
}
}
结果:
BooleanCompare.plain avgt 2 3.125 ns/op
BooleanCompare.xor avgt 2 2.976 ns/op
BooleanCompare.plainC1Only avgt 2 3.400 ns/op
BooleanCompare.xorC1Only avgt 2 3.379 ns/op
BooleanCompare.plainC2Only avgt 2 2.583 ns/op
BooleanCompare.xorC2Only avgt 2 2.685 ns/op
BooleanCompare.plainGraalVM avgt 2 2.980 ns/op
BooleanCompare.xorGraalVM avgt 2 3.868 ns/op
BooleanCompare.plainNoJIT avgt 2 243.348 ns/op
BooleanCompare.xorNoJIT avgt 2 201.342 ns/op
我不是一个多才多艺的人,尽管我有时喜欢这样做,但我不喜欢汇编程序.这是一些有趣的事情.如果这样做:
C1 compiler only with !=
/*
* run many iterations of this with :
* java -XX:+UnlockDiagnosticVMOptions
* -XX:TieredStopAtLevel=1
* "-XX:CompileCommand=print,com/so/BooleanCompare.compare"
* com.so.BooleanCompare
*/
public static boolean compare(boolean left, boolean right) {
return left != right;
}
我们得到:
0x000000010d1b2bc7: push %rbp
0x000000010d1b2bc8: sub $0x30,%rsp ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
; - com.so.BooleanCompare::compare@0 (line 22)
0x000000010d1b2bcc: cmp %edx,%esi
0x000000010d1b2bce: mov $0x0,%eax
0x000000010d1b2bd3: je 0x000000010d1b2bde
0x000000010d1b2bd9: mov $0x1,%eax
0x000000010d1b2bde: and $0x1,%eax
0x000000010d1b2be1: add $0x30,%rsp
0x000000010d1b2be5: pop %rbp
对我来说,这段代码有点明显:将0放入eax,进行比较(edx,esi)->如果不相等,则将1放入eax.返回eax& 1.
C1 compiler with ^:
public static boolean compare(boolean left, boolean right) {
return left ^ right;
}
# parm0: rsi = boolean
# parm1: rdx = boolean
# [sp+0x40] (sp of caller)
0x000000011326e5c0: mov %eax,-0x14000(%rsp)
0x000000011326e5c7: push %rbp
0x000000011326e5c8: sub $0x30,%rsp ;*iload_0 {reexecute=0 rethrow=0 return_oop=0}
; - com.so.BooleanCompare::compare@0 (line 22)
0x000000011326e5cc: xor %rdx,%rsi
0x000000011326e5cf: and $0x1,%esi
0x000000011326e5d2: mov %rsi,%rax
0x000000011326e5d5: add $0x30,%rsp
0x000000011326e5d9: pop %rbp
我真的不知道为什么,这里需要$0x1,%esi,否则我猜这也很简单.
But if I enable C2 compiler, things are a lot more interesting.
/**
* run with java
* -XX:+UnlockDiagnosticVMOptions
* -XX:CICompilerCount=2
* -XX:-TieredCompilation
* "-XX:CompileCommand=print,com/so/BooleanCompare.compare"
* com.so.BooleanCompare
*/
public static boolean compare(boolean left, boolean right) {
return left != right;
}
# parm0: rsi = boolean
# parm1: rdx = boolean
# [sp+0x20] (sp of caller)
0x000000011a2bbfa0: sub $0x18,%rsp
0x000000011a2bbfa7: mov %rbp,0x10(%rsp)
0x000000011a2bbfac: xor %r10d,%r10d
0x000000011a2bbfaf: mov $0x1,%eax
0x000000011a2bbfb4: cmp %edx,%esi
0x000000011a2bbfb6: cmove %r10d,%eax
0x000000011a2bbfba: add $0x10,%rsp
0x000000011a2bbfbe: pop %rbp
我什至都没有看到经典的Epilog推送. mov ebp,esp; sub esp,x,而是通过以下方式(至少对我而言)非常不寻常的方式:
sub $0x18,%rsp
mov %rbp,0x10(%rsp)
....
add $0x10,%rsp
pop %rbp
再说一次,比我更灵活的人可以满怀希望地解释.否则,它就像生成的C1的更好版本:
xor %r10d,%r10d // put zero into r10d
mov $0x1,%eax // put 1 into eax
cmp %edx,%esi // compare edx and esi
cmove %r10d,%eax // conditionally move the contents of r10d into eax
由于分支预测,AFAIK cmp / cmove比cmp / je好-至少这是我读过的内容…
XOR with C2 compiler:
public static boolean compare(boolean left, boolean right) {
return left ^ right;
}
0x000000010e6c9a20: sub $0x18,%rsp
0x000000010e6c9a27: mov %rbp,0x10(%rsp)
0x000000010e6c9a2c: xor %edx,%esi
0x000000010e6c9a2e: mov %esi,%eax
0x000000010e6c9a30: and $0x1,%eax
0x000000010e6c9a33: add $0x10,%rsp
0x000000010e6c9a37: pop %rbp
看起来与C1编译器生成的几乎相同.