java math.min,Java Math.min / max性能

最新推荐文章于 2024-06-20 16:33:08 发布

茉莉味的糖

最新推荐文章于 2024-06-20 16:33:08 发布

阅读量292

点赞数

文章标签： java math.min

EDIT: maaartinus gave the answer I was looking for and tmyklebu's data on the problem helped a lot, so thanks both! :)

I've read a bit about how HotSpot has some "intrinsics" that injects in the code, specially for Java standard Math libs (from here)

So I decided to give it a try, to see how much difference HotSpot could make against doing the comparison directly (specially since I've heard min/max can compile to branchless asm).

public static final int max ( final int a, final int b )

{

if ( a > b )

{

return a;

}

return b;

}

That's my implementation. From another SO question I've read that using the ternary operator uses an extra register, I haven't found significant differences between doing an if block and using a ternary operator (ie, return ( a > b ) ? a : b ).

Allocating a 8Mb int array (ie, 2 million values), and randomizing it, I do the following test:

try ( final Benchmark bench = new Benchmark( "millis to max" ) )

{

int max = Integer.MIN_VALUE;

for ( int i = 0; i < array.length; ++i )

{

max = OpsMath.max( max, array[i] );

// max = Math.max( max, array[i] );

}

I'm using a Benchmark object in a try-with-resources block. When it finishes, it calls close() on the object and prints the time the block took to complete. The tests are done separately by commenting in/out the max calls in the code above.

'max' is added to a list outside the benchmark block and printed later, so to avoid the JVM optimizing the whole block away.

The array is randomized each time the test runs.

Running the test 6 times, it gives these results:

Java standard Math:

millis to max 9.242167

millis to max 2.1566199999999998

millis to max 2.046396

millis to max 2.048616

millis to max 2.035761

millis to max 2.001044

So fairly stable after the first run, and running the tests again gives similar results.

OpsMath:

millis to max 8.65418

millis to max 1.161559

millis to max 0.955851

millis to max 0.946642

millis to max 0.994543

millis to max 0.9469069999999999

Again, very stable results after the first run.

The question is: Why? Thats quite a big difference there. And I have no idea why. Even if I implement my max() method exactly like Math.max() (ie, return (a >= b) ? a : b ) I still get better results! It makes no sense.

Specs:

CPU: Intel i5 2500, 3,3Ghz.

Java Version: JDK 8 (public march 18 release), x64.

Debian Jessie (testing release) x64.

I have yet to try with 32 bit JVM.

EDIT: Self contained test as requested. Added a line to force the JVM to preload Math and OpsMath classes. That eliminates the 18ms cost of the first iteration for OpsMath test.

// Constant nano to millis.

final double TO_MILLIS = 1.0d / 1000000.0d;

// 8Mb alloc.

final int[] array = new int[(8*1024*1024)/4];

// Result and time array.

final ArrayList results = new ArrayList<>();

final ArrayList times = new ArrayList<>();

// Number of tests.

final int itcount = 6;

// Call both Math and OpsMath method so JVM initializes the classes.

System.out.println("initialize classes " +

OpsMath.max( Math.max( 20.0f, array.length ), array.length / 2.0f ));

final Random r = new Random();

for ( int it = 0; it < itcount; ++it )

{

int max = Integer.MIN_VALUE;

// Randomize the array.

for ( int i = 0; i < array.length; ++i )

{

array[i] = r.nextInt();

}

final long start = System.nanoTime();

for ( int i = 0; i < array.length; ++i )

{

max = Math.max( array[i], max );

// OpsMath.max() method implemented as described.

// max = OpsMath.max( array[i], max );

}

// Calc time.

final double end = (System.nanoTime() - start);

// Store results.

times.add( Double.valueOf( end ) );

results.add( Integer.valueOf( max ) );

}

// Print everything.

for ( int i = 0; i < itcount; ++i )

{

System.out.println( "IT" + i + " result: " + results.get( i ) );

System.out.println( "IT" + i + " millis: " + times.get( i ) * TO_MILLIS );

}

Java Math.max result:

IT0 result: 2147477409

IT0 millis: 9.636998

IT1 result: 2147483098

IT1 millis: 1.901314

IT2 result: 2147482877

IT2 millis: 2.095551

IT3 result: 2147483286

IT3 millis: 1.9232859999999998

IT4 result: 2147482828

IT4 millis: 1.9455179999999999

IT5 result: 2147482475

IT5 millis: 1.882047

OpsMath.max result:

IT0 result: 2147482689

IT0 millis: 9.003616

IT1 result: 2147483480

IT1 millis: 0.882421

IT2 result: 2147483186

IT2 millis: 1.079143

IT3 result: 2147478560

IT3 millis: 0.8861169999999999

IT4 result: 2147477851

IT4 millis: 0.916383

IT5 result: 2147481983

IT5 millis: 0.873984

Still the same overall results. I've tried with randomizing the array only once, and repeating the tests over the same array, I get faster results overall, but the same 2x difference between Java Math.max and OpsMath.max.

解决方案

It's hard to tell why Math.max is slower than a Ops.max, but it's easy to tell why this benchmark strongly favors branching to conditional moves: On the n-th iteration, the probability of

Math.max( array[i], max );

being not equal to max is the probability that array[n-1] is bigger than all previous elements. Obviously, this probability gets lower and lower with growing n and given

final int[] array = new int[(8*1024*1024)/4];

it's rather negligible most of the time. The conditional move instruction is insensitive to the branching probability, it always take the same amount of time to execute. The conditional move instruction is faster than branch prediction if the branch is very hard to predict. On the other hand, branch prediction is faster if the branch can be predicted well with high probability. Currently, I'm unsure about the speed of conditional move compared to best and worst case of branching.1

In your case all but first few branches are fairly predictable. From about n == 10 onward, there's no point in using conditional moves as the branch is rather guaranteed to be predicted correctly and can execute in parallel with other instructions (I guess you need exactly one cycle per iteration).

This seems to happen for algorithms computing minimum/maximum or doing some inefficient sorting (good branch predictability means low entropy per step).

1 Both conditional move and predicted branch take one cycle. The problem with the former is that it needs its two operands and this takes additional instruction. In the end the critical path may get longer and/or the ALUs saturated while the branching unit is idle. Often, but not always, branches can be predicted well in practical applications; that's why branch prediction was invented in the first place.

As for the gory details of timing conditional move vs. branch prediction best and worst case, see the discussion below in comments. My my own benchmark shows that conditional move is significantly faster than branch prediction when branch prediction encounters its worst case, but I can't ignore contradictory results. We need some explanation for what exactly makes the difference. Some more benchmarks and/or analysis could help.