JMH Study Notes: Ways to Solve the Cache Line Problem

Article link: https://blog.csdn.net/m0_37607945/article/details/111605011

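Series contents

JMH Study Notes: Compiling the Source and Bench Modes
JMH Study Notes: State Shared Objects
JMH Study Notes: Setup and Teardown Methods for State Shared Objects
JMH Study Notes: Dead Code Elimination
JMH Study Notes: Constant Folding
JMH Study Notes: Forking
JMH Study Notes: Environment Configuration
JMH Study Notes: Ways to Handle Cache Lines
JMH Study Notes: Adding JMH to Your Own Project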

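Contents

Series contents
Preface
The cache line problem
Solving the cache line problem with padding
Solving the cache line problem with an array
The @Contended annotation
Summary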

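Preface

False sharing is a common source of performance problems. When two threads operate at the same time on data that sits next to each other in memory (usually with at least one of them writing), the data may end up on the same cache line. Each write then invalidates that line for the other core, which can slow things down dramatically. For background on cache lines, consult the relevant material; this article benchmarks several ways of dealing with the problem.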

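The cache line problem

In the code below, for example, two benchmarks belong to the same Group: one reads as fast as it can and the other writes as fast as it can. Because readOnly and writeOnly are adjacent int fields of the same object, this setup is very likely to run into the cache line problem.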

    /**
     * BASELINE EXPERIMENT:
     * Because of the false sharing, both reader and writer will experience
     * penalties.
     */
    @State(Scope.Group)
    public static class StateBaseline {
        int readOnly;
        int writeOnly;
    }

    @Benchmark
    @Group("baseline")
    public int reader(StateBaseline s) {
        return s.readOnly;
    }

    @Benchmark
    @Group("baseline")
    public void writer(StateBaseline s) {
        s.writeOnly++;
    }

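Solving the cache line problem with padding

Pad the object with extra fields so that the two hot fields do not land on the same cache line. This is not a general solution, because the JVM is free to rearrange the field order, even for fields of the same type.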

    /**
     * APPROACH 1: PADDING
     *
     * We can try to alleviate some of the effects with padding.
     * This is not versatile because JVMs can freely rearrange the
     * field order, even of the same type.
     *
     */
    @State(Scope.Group)
    public static class StatePadded {
        int readOnly;
        int p01, p02, p03, p04, p05, p06, p07, p08;
        int p11, p12, p13, p14, p15, p16, p17, p18;
        int writeOnly;
        int q01, q02, q03, q04, q05, q06, q07, q08;
        int q11, q12, q13, q14, q15, q16, q17, q18;
    }

    @Benchmark
    @Group("padded")
    public int reader(StatePadded s) {
        return s.readOnly;
    }

    @Benchmark
    @Group("padded")
    public void writer(StatePadded s) {
        s.writeOnly++;
    }
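
The amount of padding in StatePadded can be checked with a little arithmetic. The sketch below is only an illustration: the class name PaddingMath and the 64-byte line size are assumptions (64 bytes is common on x86-64, but it is not something the benchmark reports), and the calculation only holds if the JVM keeps the fields in declaration order.

```java
// Sketch: how many bytes of padding StatePadded declares between its two hot fields.
final class PaddingMath {
    // Assumed cache line size; not taken from the original benchmark.
    static final int CACHE_LINE_BYTES = 64;

    /** Bytes occupied by `count` fields of `fieldBytes` bytes each. */
    static int paddingBytes(int count, int fieldBytes) {
        return count * fieldBytes;
    }

    /** True if the padding alone is at least one full (assumed) cache line wide. */
    static boolean spansFullLine(int paddingBytes) {
        return paddingBytes >= CACHE_LINE_BYTES;
    }

    public static void main(String[] args) {
        // StatePadded declares 16 int fields (p01..p08, p11..p18)
        // between readOnly and writeOnly.
        int gap = paddingBytes(16, Integer.BYTES);
        System.out.println("padding between fields: " + gap + " bytes");
        System.out.println("covers an assumed " + CACHE_LINE_BYTES
                + "-byte line: " + spansFullLine(gap));
    }
}
```

Sixteen int fields come to 64 bytes, so under these assumptions readOnly and writeOnly cannot share a 64-byte line. The second group of 16 fields (q01..q18) plays the same role after writeOnly.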

Running the test

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(5)
public class JMHSample_22_FalseSharing {


    ... benchmark methods omitted here

    /**
     * Note the slowdowns.
     */
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(JMHSample_22_FalseSharing.class.getSimpleName())
                .threads(Runtime.getRuntime().availableProcessors())
//                .jvmArgs("-XX:-RestrictContended")
                .build();

        new Runner(opt).run();
    }

}

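The benchmark results are as follows. With padding, reader throughput roughly doubles (about 125 to 257 ops/us), while writer throughput drops slightly (about 615 to 566 ops/us).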

# JMH version: 1.26
# VM version: JDK 1.8.0_121, Java HotSpot(TM) 64-Bit Server VM, 25.121-b13
# VM invoker: C:\Program Files\Java\jdk1.8.0_121\jre\bin\java.exe
# VM options: -javaagent:D:\Program Files\JetBrains\IntelliJ IDEA 2019.3.1\lib\idea_rt.jar=52340:D:\Program Files\JetBrains\IntelliJ IDEA 2019.3.1\bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 1 s each
# Measurement: 5 iterations, 1 s each
# Timeout: 10 min per iteration
# Threads: 4 threads (2 groups; 1x "reader", 1x "writer" in each group), will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: org.openjdk.jmh.samples.JMHSample_22_FalseSharing.baseline

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                   Mode  Cnt    Score    Error   Units
JMHSample_22_FalseSharing.baseline         thrpt   25  740.177 ± 40.703  ops/us
JMHSample_22_FalseSharing.baseline:reader  thrpt   25  125.132 ± 38.325  ops/us
JMHSample_22_FalseSharing.baseline:writer  thrpt   25  615.045 ± 22.291  ops/us
JMHSample_22_FalseSharing.padded           thrpt   25  823.430 ± 14.982  ops/us
JMHSample_22_FalseSharing.padded:reader    thrpt   25  257.498 ±  8.415  ops/us
JMHSample_22_FalseSharing.padded:writer    thrpt   25  565.932 ± 15.281  ops/us

Padding can also be achieved through inheritance

    /**
     * APPROACH 2: CLASS HIERARCHY TRICK
     *
     * We can alleviate false sharing with this convoluted hierarchy trick,
     * using the fact that superclass fields are usually laid out first.
     * In this construction, the protected field will be squashed between
     * paddings.
     *
     * It is important to use the smallest data type, so that the layouter would
     * not generate any gaps that can be taken by later protected subclasses'
     * fields. Depending on the actual field layout of classes that bear the
     * protected fields, we might need more padding to account for "lost"
     * padding fields pulled into their superclass gaps.
     */
    public static class StateHierarchy_1 {
        int readOnly;
    }

    public static class StateHierarchy_2 extends StateHierarchy_1 {
        byte p01, p02, p03, p04, p05, p06, p07, p08;
        byte p11, p12, p13, p14, p15, p16, p17, p18;
        byte p21, p22, p23, p24, p25, p26, p27, p28;
        byte p31, p32, p33, p34, p35, p36, p37, p38;
        byte p41, p42, p43, p44, p45, p46, p47, p48;
        byte p51, p52, p53, p54, p55, p56, p57, p58;
        byte p61, p62, p63, p64, p65, p66, p67, p68;
        byte p71, p72, p73, p74, p75, p76, p77, p78;
    }

    public static class StateHierarchy_3 extends StateHierarchy_2 {
        int writeOnly;
    }

    public static class StateHierarchy_4 extends StateHierarchy_3 {
        byte q01, q02, q03, q04, q05, q06, q07, q08;
        byte q11, q12, q13, q14, q15, q16, q17, q18;
        byte q21, q22, q23, q24, q25, q26, q27, q28;
        byte q31, q32, q33, q34, q35, q36, q37, q38;
        byte q41, q42, q43, q44, q45, q46, q47, q48;
        byte q51, q52, q53, q54, q55, q56, q57, q58;
        byte q61, q62, q63, q64, q65, q66, q67, q68;
        byte q71, q72, q73, q74, q75, q76, q77, q78;
    }

    @State(Scope.Group)
    public static class StateHierarchy extends StateHierarchy_4 {
    }

    @Benchmark
    @Group("hierarchy")
    public int reader(StateHierarchy s) {
        return s.readOnly;
    }

    @Benchmark
    @Group("hierarchy")
    public void writer(StateHierarchy s) {
        s.writeOnly++;
    }
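
The layering in the hierarchy trick can be inspected with reflection. The sketch below uses a hypothetical miniature of the chain (Level1, Level2 and Level3 are made-up names, with only 8 padding bytes instead of 64) and counts the byte fields each level declares. It shows what each class declares, not the offsets the JVM finally assigns.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Sketch: count the padding bytes declared at each level of a class hierarchy.
final class HierarchyPaddingCheck {
    // Hypothetical miniature of StateHierarchy_1..3 from the benchmark.
    static class Level1 { int readOnly; }
    static class Level2 extends Level1 {
        byte p01, p02, p03, p04, p05, p06, p07, p08;
    }
    static class Level3 extends Level2 { int writeOnly; }

    /** Number of instance fields of type byte declared directly in `type`. */
    static int declaredByteFields(Class<?> type) {
        int count = 0;
        for (Field f : type.getDeclaredFields()) {
            boolean isInstance = !Modifier.isStatic(f.getModifiers());
            if (f.getType() == byte.class && isInstance && !f.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Walk from the most derived class up to (but excluding) Object.
        for (Class<?> c = Level3.class; c != Object.class; c = c.getSuperclass()) {
            System.out.println(c.getSimpleName() + ": "
                    + declaredByteFields(c) + " padding byte(s)");
        }
    }
}
```

In the real benchmark each padding level declares 64 byte fields, so a count like this is a quick way to confirm that no padding field was dropped when editing the classes.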

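The benchmark results are as follows. The hierarchy and padded variants perform about the same (reader at about 253 and 257 ops/us respectively), and both are well above the baseline reader (about 89 ops/us).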

Benchmark                                    Mode  Cnt    Score    Error   Units
JMHSample_22_FalseSharing.baseline          thrpt   25  698.244 ± 22.066  ops/us
JMHSample_22_FalseSharing.baseline:reader   thrpt   25   88.647 ± 11.051  ops/us
JMHSample_22_FalseSharing.baseline:writer   thrpt   25  609.597 ± 32.784  ops/us
JMHSample_22_FalseSharing.hierarchy         thrpt   25  802.604 ± 22.564  ops/us
JMHSample_22_FalseSharing.hierarchy:reader  thrpt   25  252.829 ± 14.266  ops/us
JMHSample_22_FalseSharing.hierarchy:writer  thrpt   25  549.775 ± 26.877  ops/us
JMHSample_22_FalseSharing.padded            thrpt   25  803.736 ± 19.131  ops/us
JMHSample_22_FalseSharing.padded:reader     thrpt   25  257.035 ±  6.357  ops/us
JMHSample_22_FalseSharing.padded:writer     thrpt   25  546.701 ± 18.636  ops/us

Solving the cache line problem with an array

/**
 * APPROACH 3: ARRAY TRICK
 *
 * This trick relies on the contiguous allocation of an array.
 * Instead of placing the fields in the class, we mangle them
 * into the array at very sparse offsets.
 */
@State(Scope.Group)
public static class StateArray {
    int[] arr = new int[128];
}

@Benchmark
@Group("sparse")
public int reader(StateArray s) {
    return s.arr[0];
}

@Benchmark
@Group("sparse")
public void writer(StateArray s) {
    s.arr[64]++;
}
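
The spacing used by the array trick can be worked out the same way as for padding. In the sketch below, SparseIndexMath and the 64-byte line size are assumptions made for illustration; the indices 0 and 64 are the ones the benchmark uses.

```java
// Sketch: byte distance between two int[] elements and whether they could share a line.
final class SparseIndexMath {
    // Assumed cache line size; not taken from the original benchmark.
    static final int CACHE_LINE_BYTES = 64;

    /** Distance in bytes between two elements of an int[]. */
    static int byteDistance(int indexA, int indexB) {
        return Math.abs(indexA - indexB) * Integer.BYTES;
    }

    /** Two elements can share a line only if they are less than one line apart. */
    static boolean mayShareLine(int indexA, int indexB) {
        return byteDistance(indexA, indexB) < CACHE_LINE_BYTES;
    }

    public static void main(String[] args) {
        // arr[0] is read and arr[64] is written in the "sparse" group.
        System.out.println("distance: " + byteDistance(0, 64) + " bytes");
        System.out.println("arr[0]/arr[64] may share a line: " + mayShareLine(0, 64));
        System.out.println("arr[0]/arr[1] may share a line: " + mayShareLine(0, 1));
    }
}
```

The two elements are 256 bytes apart, several times the assumed line size, and the 63 elements after arr[64] act as trailing padding. Note that arr[0] can still share a line with whatever the JVM places just before the array data.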

The @Contended annotation

    @State(Scope.Group)
    public static class StateContended {
        int readOnly;

        @sun.misc.Contended
        int writeOnly;
    }

    @Benchmark
    @Group("contended")
    public int reader(StateContended s) {
        return s.readOnly;
    }

    @Benchmark
    @Group("contended")
    public void writer(StateContended s) {
        s.writeOnly++;
    }

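This approach is only supported on JDK 8 and later (sun.misc.Contended is the JDK 8 location; from JDK 9 the annotation lives in jdk.internal.vm.annotation). By default the JVM honors @Contended only in trusted JDK classes, so the benchmark must be run with the JVM flag -XX:-RestrictContended, which switches the RestrictContended option off.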

public static void main(String[] args) throws RunnerException {
   Options opt = new OptionsBuilder()
           .include(JMHSample_22_FalseSharing.class.getSimpleName())
           .threads(Runtime.getRuntime().availableProcessors())
           .jvmArgs("-XX:-RestrictContended")
           .build();

   new Runner(opt).run();
}
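
Since @Contended is silently ignored in application classes when the flag is missing, it can help to check that the running JVM actually received it. The sketch below is one possible way to do that, not part of the JMH sample: ContendedFlagCheck is a hypothetical helper, and it only looks at the arguments reported by RuntimeMXBean.getInputArguments().

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: detect whether -XX:-RestrictContended was passed to this JVM.
final class ContendedFlagCheck {
    /**
     * Returns true if the last RestrictContended setting in `jvmArgs` turns the
     * restriction off. With no setting at all, the HotSpot default (on) applies.
     */
    static boolean contendedUnrestricted(List<String> jvmArgs) {
        boolean unrestricted = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:-RestrictContended")) {
                unrestricted = true;
            } else if (arg.equals("-XX:+RestrictContended")) {
                unrestricted = false;
            }
        }
        return unrestricted;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("@Contended honored in application classes: "
                + contendedUnrestricted(jvmArgs));
    }
}
```

With JMH the flag has to reach the forked benchmark JVM, which is what .jvmArgs(...) in the options above arranges; a check like this would therefore belong in code that runs inside the fork, for example a setup method.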