Local variables inside a loop and performance


dnc8371



Overview

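A question that comes up from time to time is how much work is done allocating a new local variable. My feeling has always been that the code is optimised to the point where this cost is static, i.e. paid once rather than each time the code runs.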

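Recently, Ishwor Gurung suggested moving some local variables outside a loop. I suspected it would make no difference, but I had never tested whether that was actually the case.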

The test

This is the test I ran:

public class LoopTest { // class name assumed; the original listing shows only the methods
    public static void main(String... args) {
        // Repeat both tests so the later results are measured after the JIT has warmed up.
        for (int i = 0; i < 10; i++) {
            testInsideLoop();
            testOutsideLoop();
        }
    }

    // x, y and times are declared inside the loop body.
    private static void testInsideLoop() {
        long start = System.nanoTime();
        int[] counters = new int[144];
        int runs = 200 * 1000;
        for (int i = 0; i < runs; i++) {
            int x = i % 12;
            int y = i / 12 % 12;
            int times = x * y;
            counters[times]++;
        }
        long time = System.nanoTime() - start;
        System.out.printf("Inside: Average loop time %.1f ns%n", (double) time / runs);
    }

    // x, y and times are declared once, before the loop.
    private static void testOutsideLoop() {
        long start = System.nanoTime();
        int[] counters = new int[144];
        int runs = 200 * 1000, x, y, times;
        for (int i = 0; i < runs; i++) {
            x = i % 12;
            y = i / 12 % 12;
            times = x * y;
            counters[times]++;
        }
        long time = System.nanoTime() - start;
        System.out.printf("Outside: Average loop time %.1f ns%n", (double) time / runs);
    }
}
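Both variants write the same values into the same counters, so the only thing that can differ between them is timing. The following is a minimal sketch that confirms this. It assumes the hypothetical helpers countInside and countOutside, which are copies of the two loops that return the counters array instead of printing a time:

```java
import java.util.Arrays;

public class LoopEquivalence {
    // Hypothetical copy of the testInsideLoop body: locals declared inside the loop.
    static int[] countInside(int runs) {
        int[] counters = new int[144];
        for (int i = 0; i < runs; i++) {
            int x = i % 12;
            int y = i / 12 % 12;
            int times = x * y;
            counters[times]++;
        }
        return counters;
    }

    // Hypothetical copy of the testOutsideLoop body: locals declared before the loop.
    static int[] countOutside(int runs) {
        int[] counters = new int[144];
        int x, y, times;
        for (int i = 0; i < runs; i++) {
            x = i % 12;
            y = i / 12 % 12;
            times = x * y;
            counters[times]++;
        }
        return counters;
    }

    public static void main(String[] args) {
        int runs = 200 * 1000;
        boolean same = Arrays.equals(countInside(runs), countOutside(runs));
        System.out.println("Same counters: " + same); // prints "Same counters: true"
    }
}
```

Returning the array also makes the work observable, which the timed versions above do not do.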

The output ends with:

Inside: Average loop time 3.6 ns
Outside: Average loop time 3.6 ns

Increasing the test to 100 million iterations made little difference to the results:

Inside: Average loop time 3.8 ns
Outside: Average loop time 3.8 ns

Replacing the modulus and multiplication with >>, & and + in the loop body:

int x = i & 15;
int y = (i >> 4) & 15;
int times = x + y;

This prints:

Inside: Average loop time 1.2 ns
Outside: Average loop time 1.2 ns
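The cheaper body works because 16 is a power of two. For a non-negative int i, i & 15 equals i % 16 and (i >> 4) & 15 equals i / 16 % 16. This means the shift/mask version buckets i the same way a divide/modulus by 16 would. Note that it uses 16 buckets per axis rather than 12 and adds rather than multiplies, so it is a cheaper workload, not an identical one. The following is a small check of those identities, assuming non-negative i as in the loops above (ShiftMaskCheck is a hypothetical name):

```java
public class ShiftMaskCheck {
    // Returns true if the mask/shift forms match the divide/modulus forms for all 0 <= i < limit.
    static boolean identitiesHold(int limit) {
        for (int i = 0; i < limit; i++) {
            if ((i & 15) != i % 16) return false;             // low 4 bits == i mod 16
            if (((i >> 4) & 15) != i / 16 % 16) return false; // next 4 bits == (i / 16) mod 16
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(identitiesHold(200 * 1000)); // prints "true"
    }
}
```

For negative i the two forms diverge, because % keeps the sign of the dividend while & 15 does not.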

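While modulus is relatively expensive, the resolution of the test is 0.1 ns, less than 1/3 of a clock cycle. Any difference between the two tests larger than that would show up in the results.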
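To see how 0.1 ns compares with a clock cycle, multiply by the clock rate (1 GHz is one cycle per ns). The article does not state the CPU frequency, so the 3.0 GHz below is an assumption for illustration only:

```java
public class ResolutionInCycles {
    // Converts a time in ns to a fraction of one clock cycle: ns * (cycles per ns).
    static double fractionOfCycle(double ns, double clockGHz) {
        return ns * clockGHz; // 1 GHz == 1 cycle per ns
    }

    public static void main(String[] args) {
        double clockGHz = 3.0;     // assumption: hypothetical CPU frequency, not from the article
        double resolutionNs = 0.1; // results are printed with %.1f, i.e. to 0.1 ns
        System.out.printf("0.1 ns = %.2f of a cycle at %.1f GHz%n",
                fractionOfCycle(resolutionNs, clockGHz), clockGHz);
    }
}
```

The fraction stays below 1/3 for any clock under about 3.33 GHz. On a faster CPU, 0.1 ns would be slightly more than a third of a cycle.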

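Using Caliper

As @maaartinus commented, Caliper is a micro-benchmarking library, so I was interested in how much slower it might be than writing the timing code by hand.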

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class LoopCaliperTest { // class name assumed; uses the older Caliper SimpleBenchmark API
    public static void main(String... args) {
        Runner.main(LoopBenchmark.class, args);
    }

    public static class LoopBenchmark extends SimpleBenchmark {
        // Caliper calls each timeXxx method with the number of repetitions to time.
        public void timeInsideLoop(int reps) {
            int[] counters = new int[144];
            for (int i = 0; i < reps; i++) {
                int x = i % 12;
                int y = i / 12 % 12;
                int times = x * y;
                counters[times]++;
            }
        }

        public void timeOutsideLoop(int reps) {
            int[] counters = new int[144];
            int x, y, times;
            for (int i = 0; i < reps; i++) {
                x = i % 12;
                y = i / 12 % 12;
                times = x * y;
                counters[times]++;
            }
        }
    }
}

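The first thing to note is that the code is shorter, as it doesn't include the timing and printing boilerplate. Running this on the same machine as the first test, I get: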

0% Scenario{vm=java, trial=0, benchmark=InsideLoop} 4.23 ns; σ=0.01 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=OutsideLoop} 4.23 ns; σ=0.01 ns @ 3 trials

benchmark   ns linear runtime
InsideLoop 4.23 ==============================
OutsideLoop 4.23 =============================

vm: java
trial: 0

Replacing the modulus with shift and AND gives:

0% Scenario{vm=java, trial=0, benchmark=InsideLoop} 1.27 ns; σ=0.01 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=OutsideLoop} 1.27 ns; σ=0.00 ns @ 3 trials

benchmark   ns linear runtime
InsideLoop 1.27 =============================
OutsideLoop 1.27 ==============================

vm: java
trial: 0

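This is consistent with the first result. The modulus test is only about 0.4-0.6 ns (about two clock cycles) slower, and there is next to no difference for the shift-and-add test. The gap may be due to the way Caliper samples the data, but it doesn't change the outcome.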
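The gaps quoted above follow directly from the reported averages. The following sketch reproduces that arithmetic. Converting to cycles needs a clock speed, which the article does not give, so 3.3 GHz here is an assumption chosen only for illustration (CaliperOverhead is a hypothetical name):

```java
public class CaliperOverhead {
    // Extra time per iteration reported by Caliper relative to the hand-timed loop, in ns.
    static double overheadNs(double caliperNs, double handNs) {
        return caliperNs - handNs;
    }

    public static void main(String[] args) {
        double clockGHz = 3.3; // assumption: hypothetical CPU frequency, not from the article
        // {Caliper average, hand-timed average}, as reported above.
        double[][] pairs = {
            {4.23, 3.6}, // modulus test vs 200 thousand iterations by hand
            {4.23, 3.8}, // modulus test vs 100 million iterations by hand
            {1.27, 1.2}  // shift-and-add test
        };
        for (double[] p : pairs) {
            double ns = overheadNs(p[0], p[1]);
            System.out.printf("%.2f - %.1f = %.2f ns (about %.1f cycles)%n",
                    p[0], p[1], ns, ns * clockGHz);
        }
    }
}
```

At that assumed clock, the 0.43-0.63 ns gap works out to roughly 1.4-2.1 cycles.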

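Without doubt, the timings you get when running a real program are usually longer than in a micro-benchmark. A real program does more work, so caching and branch prediction are less ideal. A small over-estimate of the time taken is likely to be closer to what you can expect in a real program.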