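Series contents
JMH study notes: compiling the source and bench modes
JMH study notes: shared State objects
JMH study notes: setup and teardown methods for State objects
JMH study notes: dead-code elimination
JMH study notes: constant folding
JMH study notes: forking
JMH study notes: environment configuration
JMH study notes: handling cache lines
JMH study notes: adding JMH to your own project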
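Preface
False sharing is a common source of performance problems. When two threads operate on adjacent data in memory at the same time (and at least one of them writes), the data may sit on the same cache line, and each write invalidates the other core's copy of that line. The result can be a large slowdown. For background on cache lines, consult other references; this article benchmarks several ways of working around the problem.
The cache line problem
In the code below, two benchmarks belong to the same Group: one reads continuously and the other writes continuously. The two fields they touch are adjacent, so they very likely share a cache line.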
/**
 * BASELINE EXPERIMENT:
 * Because of the false sharing, both reader and writer will experience
 * penalties.
 */
@State(Scope.Group)
public static class StateBaseline {
    int readOnly;
    int writeOnly;
}

@Benchmark
@Group("baseline")
public int reader(StateBaseline s) {
    return s.readOnly;
}

@Benchmark
@Group("baseline")
public void writer(StateBaseline s) {
    s.writeOnly++;
}
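To see why the two fields of StateBaseline are likely to collide, it helps to do the offset arithmetic by hand. The following is a minimal sketch and not part of the JMH sample. It assumes a 64-byte cache line, hypothetical field offsets of 12 and 16 bytes, and an object that happens to start on a line boundary; none of these is guaranteed by the JVM. The class and method names are made up for illustration.

```java
// Illustrative arithmetic only; the offsets and the line size are assumptions.
class CacheLineMath {
    /** Index of the cache line that contains the given byte offset. */
    static long lineIndex(long byteOffset, int lineSize) {
        return byteOffset / lineSize;
    }

    /** True when both offsets fall into the same cache line. */
    static boolean sameLine(long a, long b, int lineSize) {
        return lineIndex(a, lineSize) == lineIndex(b, lineSize);
    }

    public static void main(String[] args) {
        int lineSize = 64;          // assumed; common on x86-64
        long readOnlyOffset = 12;   // hypothetical offset of readOnly
        long writeOnlyOffset = 16;  // hypothetical offset of writeOnly

        // Adjacent fields: both offsets map to line 0.
        System.out.println(sameLine(readOnlyOffset, writeOnlyOffset, lineSize));

        // With 64 bytes of padding in between, the second field moves to offset 80 (line 1).
        System.out.println(sameLine(readOnlyOffset, writeOnlyOffset + 64, lineSize));
    }
}
```

Under these assumptions the first check reports that the fields share a line and the second reports that they do not, which is the effect the approaches in the rest of this article try to achieve.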
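Solving the cache line problem with padding
Padding fields can be inserted so that the two hot fields no longer land on the same cache line. This is not a general solution, because the JVM is free to rearrange field order, even among fields of the same type.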
/**
 * APPROACH 1: PADDING
 *
 * We can try to alleviate some of the effects with padding.
 * This is not versatile because JVMs can freely rearrange the
 * field order, even of the same type.
 */
@State(Scope.Group)
public static class StatePadded {
    int readOnly;
    int p01, p02, p03, p04, p05, p06, p07, p08;
    int p11, p12, p13, p14, p15, p16, p17, p18;
    int writeOnly;
    int q01, q02, q03, q04, q05, q06, q07, q08;
    int q11, q12, q13, q14, q15, q16, q17, q18;
}

@Benchmark
@Group("padded")
public int reader(StatePadded s) {
    return s.readOnly;
}

@Benchmark
@Group("padded")
public void writer(StatePadded s) {
    s.writeOnly++;
}
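The amount of padding declared between the two hot fields can be checked with a few lines of reflection. This is a rough sketch under stated assumptions: the Padded class below only mirrors the first half of StatePadded so that the example compiles on its own, and PaddingBudget and paddingBytes are hypothetical names. It counts declared padding only; it does not prove that the JVM actually places those fields between readOnly and writeOnly.

```java
import java.lang.reflect.Field;

class PaddingBudget {
    // Same shape as the padded state above, reproduced so the sketch is self-contained.
    static class Padded {
        int readOnly;
        int p01, p02, p03, p04, p05, p06, p07, p08;
        int p11, p12, p13, p14, p15, p16, p17, p18;
        int writeOnly;
    }

    /** Sums the size of all int fields whose name starts with the given prefix. */
    static int paddingBytes(Class<?> type, String prefix) {
        int bytes = 0;
        for (Field f : type.getDeclaredFields()) {
            if (f.getType() == int.class && f.getName().startsWith(prefix)) {
                bytes += Integer.BYTES; // an int occupies 4 bytes
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        // 16 padding fields * 4 bytes = 64 bytes, one full line if lines are 64 bytes
        System.out.println(paddingBytes(Padded.class, "p"));
    }
}
```

Sixteen int fields add up to 64 bytes, which is why the sample uses two rows of eight padding fields on each side of writeOnly.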
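Running the test
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
public class JMHSample_22_FalseSharing {

    // ... benchmark methods omitted here

    /**
     * Note the slowdowns.
     */
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_22_FalseSharing.class.getSimpleName())
                .threads(Runtime.getRuntime().availableProcessors())
                // .jvmArgs("-XX:-RestrictContended") // only needed for the @Contended variant
                .build();

        new Runner(opt).run();
    }
}
The test results are as follows: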
# JMH version: 1.26
# VM version: JDK 1.8.0_121, Java HotSpot(TM) 64-Bit Server VM, 25.121-b13
# VM invoker: C:\Program Files\Java\jdk1.8.0_121\jre\bin\java.exe
# VM options: -javaagent:D:\Program Files\JetBrains\IntelliJ IDEA 2019.3.1\lib\idea_rt.jar=52340:D:\Program Files\JetBrains\IntelliJ IDEA 2019.3.1\bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 4 threads (2 groups; 1x "reader", 1x "writer" in each group), will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.openjdk.jmh.samples.JMHSample_22_FalseSharing.baseline
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark                                   Mode  Cnt    Score    Error   Units
JMHSample_22_FalseSharing.baseline         thrpt   25  740.177 ± 40.703  ops/us
JMHSample_22_FalseSharing.baseline:reader  thrpt   25  125.132 ± 38.325  ops/us
JMHSample_22_FalseSharing.baseline:writer  thrpt   25  615.045 ± 22.291  ops/us
JMHSample_22_FalseSharing.padded           thrpt   25  823.430 ± 14.982  ops/us
JMHSample_22_FalseSharing.padded:reader    thrpt   25  257.498 ±  8.415  ops/us
JMHSample_22_FalseSharing.padded:writer    thrpt   25  565.932 ± 15.281  ops/us
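To put the scores above in perspective, the ratios between the padded and baseline runs can be computed directly. The snippet below is just a small helper for that arithmetic (SpeedupFromScores and ratio are hypothetical names). It works on the point estimates only and ignores the error columns, so the ratios should be read as rough indications.

```java
import java.util.Locale;

class SpeedupFromScores {
    /** Ratio of two throughput scores; values above 1.0 mean "after" is faster. */
    static double ratio(double after, double before) {
        return after / before;
    }

    public static void main(String[] args) {
        // Scores in ops/us, copied from the padded and baseline rows of the table above.
        System.out.println(String.format(Locale.ROOT, "reader: %.2fx", ratio(257.498, 125.132)));
        System.out.println(String.format(Locale.ROOT, "writer: %.2fx", ratio(565.932, 615.045)));
        System.out.println(String.format(Locale.ROOT, "group:  %.2fx", ratio(823.430, 740.177)));
    }
}
```

In this run the reader throughput roughly doubles (about 2.06x) with padding, the writer drops slightly (about 0.92x), and the group as a whole gains about 11%.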
Padding can also be achieved through inheritance
/**
 * APPROACH 2: CLASS HIERARCHY TRICK
 *
 * We can alleviate false sharing with this convoluted hierarchy trick,
 * using the fact that superclass fields are usually laid out first.
 * In this construction, the protected field will be squashed between
 * paddings.
 * It is important to use the smallest data type, so that layouter would
 * not generate any gaps that can be taken by later protected subclasses
 * fields. Depending on the actual field layout of classes that bear the
 * protected fields, we might need more padding to account for "lost"
 * padding fields pulled into their superclass gaps.
 */
public static class StateHierarchy_1 {
    int readOnly;
}

public static class StateHierarchy_2 extends StateHierarchy_1 {
    byte p01, p02, p03, p04, p05, p06, p07, p08;
    byte p11, p12, p13, p14, p15, p16, p17, p18;
    byte p21, p22, p23, p24, p25, p26, p27, p28;
    byte p31, p32, p33, p34, p35, p36, p37, p38;
    byte p41, p42, p43, p44, p45, p46, p47, p48;
    byte p51, p52, p53, p54, p55, p56, p57, p58;
    byte p61, p62, p63, p64, p65, p66, p67, p68;
    byte p71, p72, p73, p74, p75, p76, p77, p78;
}

public static class StateHierarchy_3 extends StateHierarchy_2 {
    int writeOnly;
}

public static class StateHierarchy_4 extends StateHierarchy_3 {
    byte q01, q02, q03, q04, q05, q06, q07, q08;
    byte q11, q12, q13, q14, q15, q16, q17, q18;
    byte q21, q22, q23, q24, q25, q26, q27, q28;
    byte q31, q32, q33, q34, q35, q36, q37, q38;
    byte q41, q42, q43, q44, q45, q46, q47, q48;
    byte q51, q52, q53, q54, q55, q56, q57, q58;
    byte q61, q62, q63, q64, q65, q66, q67, q68;
    byte q71, q72, q73, q74, q75, q76, q77, q78;
}

@State(Scope.Group)
public static class StateHierarchy extends StateHierarchy_4 {
}

@Benchmark
@Group("hierarchy")
public int reader(StateHierarchy s) {
    return s.readOnly;
}

@Benchmark
@Group("hierarchy")
public void writer(StateHierarchy s) {
    s.writeOnly++;
}
The test results are as follows:
Benchmark                                    Mode  Cnt    Score    Error   Units
JMHSample_22_FalseSharing.baseline          thrpt   25  698.244 ± 22.066  ops/us
JMHSample_22_FalseSharing.baseline:reader   thrpt   25   88.647 ± 11.051  ops/us
JMHSample_22_FalseSharing.baseline:writer   thrpt   25  609.597 ± 32.784  ops/us
JMHSample_22_FalseSharing.hierarchy         thrpt   25  802.604 ± 22.564  ops/us
JMHSample_22_FalseSharing.hierarchy:reader  thrpt   25  252.829 ± 14.266  ops/us
JMHSample_22_FalseSharing.hierarchy:writer  thrpt   25  549.775 ± 26.877  ops/us
JMHSample_22_FalseSharing.padded            thrpt   25  803.736 ± 19.131  ops/us
JMHSample_22_FalseSharing.padded:reader     thrpt   25  257.035 ±  6.357  ops/us
JMHSample_22_FalseSharing.padded:writer     thrpt   25  546.701 ± 18.636  ops/us
Solving the cache line problem with an array
/**
 * APPROACH 3: ARRAY TRICK
 *
 * This trick relies on the contiguous allocation of an array.
 * Instead of placing the fields in the class, we mangle them
 * into the array at very sparse offsets.
 */
@State(Scope.Group)
public static class StateArray {
    int[] arr = new int[128];
}

@Benchmark
@Group("sparse")
public int reader(StateArray s) {
    return s.arr[0];
}

@Benchmark
@Group("sparse")
public void writer(StateArray s) {
    s.arr[64]++;
}
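The choice of indices 0 and 64 can be read as a distance in bytes. The sketch below works that out; SparseArrayMath and its methods are hypothetical helpers, and the 64-byte line size is again an assumption rather than something the sample states.

```java
class SparseArrayMath {
    /** Distance in bytes between two elements of an int[] array. */
    static int byteDistance(int indexA, int indexB) {
        return Math.abs(indexB - indexA) * Integer.BYTES;
    }

    /** How many whole cache lines fit into the gap between the two elements. */
    static int linesApart(int indexA, int indexB, int lineSize) {
        return byteDistance(indexA, indexB) / lineSize;
    }

    public static void main(String[] args) {
        System.out.println(byteDistance(0, 64));    // 64 elements * 4 bytes = 256 bytes
        System.out.println(linesApart(0, 64, 64));  // 256 / 64 = 4, assuming 64-byte lines
    }
}
```

Because array elements are laid out contiguously, this gap does not depend on how the JVM orders fields, which is what makes the array trick more predictable than plain field padding.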
The @Contended annotation
@State(Scope.Group)
public static class StateContended {
    int readOnly;

    @sun.misc.Contended
    int writeOnly;
}

@Benchmark
@Group("contended")
public int reader(StateContended s) {
    return s.readOnly;
}

@Benchmark
@Group("contended")
public void writer(StateContended s) {
    s.writeOnly++;
}
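This approach requires JDK 8 or later. At run time the JVM must be started with -XX:-RestrictContended, which turns the RestrictContended option off (sets it to false) so that @Contended is honored in application classes. Note that sun.misc.Contended is the JDK 8 location of the annotation; from JDK 9 on it lives in the jdk.internal.vm.annotation package.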
public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
            .include(JMHSample_22_FalseSharing.class.getSimpleName())
            .threads(Runtime.getRuntime().availableProcessors())
            .jvmArgs("-XX:-RestrictContended")
            .build();

    new Runner(opt).run();
}
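Since the flag has to reach the JVM that actually runs the benchmark, it can be useful to confirm that it was passed. The following is a minimal sketch using the standard RuntimeMXBean API; ContendedFlagCheck and isUnrestricted are hypothetical names. The check is a plain string match, so it does not handle the case where the flag is given several times with different signs.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class ContendedFlagCheck {
    /** True when the given JVM arguments switch RestrictContended off. */
    static boolean isUnrestricted(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-RestrictContended");
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("RestrictContended disabled: " + isUnrestricted(jvmArgs));
    }
}
```

Because the benchmark is configured with @Fork(5), it runs in forked JVMs, so a check like this would presumably have to run inside the benchmark JVM (for example from a setup method) rather than in the launcher.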
Summary
Cache lines are a low-level CPU topic; it is worth reading other material on them first to understand the results above.