Contents
1.1 x86: MMX, SSE, AVX, AVX2, AVX-512
2.1 Automatic vectorization of Java code
1. SIMD today
1.1 x86: MMX, SSE, AVX, AVX2, AVX-512
8 64-bit registers (MMX) to 32 512-bit registers (AVX-512)
1.2 ARM: NEON, SVE, SVE2
32 128-bit registers (NEON) to 32 registers of 128 to 2048 bits each (SVE)
1.3 POWER: VMX/AltiVec
32 128-bit registers
1.4 MIPS: MSA, LASX
32 128-bit registers in MSA; 32 256-bit registers in LASX (a Loongson-only extension)
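To make the register widths above concrete, the following minimal sketch computes how many elements (lanes) of a given size fit into one register. The class and method names (LaneCount, lanes) are hypothetical and used only for illustration; the widths are the ones listed in this section.

```java
public class LaneCount {
    // Number of elements of elementBits width that fit into one register.
    static int lanes(int registerBits, int elementBits) {
        return registerBits / elementBits;
    }

    public static void main(String[] args) {
        int intBits = Integer.SIZE; // 32-bit Java int
        System.out.println("64-bit register (MMX):     " + lanes(64, intBits) + " ints");
        System.out.println("128-bit register (NEON):   " + lanes(128, intBits) + " ints");
        System.out.println("512-bit register (AVX-512): " + lanes(512, intBits) + " ints");
        System.out.println("2048-bit register (SVE max): " + lanes(2048, intBits) + " ints");
    }
}
```

So a single 512-bit AVX-512 instruction can process 16 ints at once, against 2 for a 64-bit MMX register.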
2. JVM and SIMD today
2.1 Automatic vectorization of Java code
Superword optimizations (-XX:+UseSuperWord) in the HotSpot C2 compiler derive SIMD code from sequential code.
Drawbacks: They are applied only to unrolled loops. Superword optimizations can be very brittle, and they do not (and cannot) cover all use cases.
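As a rough illustration of that brittleness, the sketch below contrasts a loop with independent iterations, which is the usual kind of candidate for superword, with a loop that carries a dependency between iterations. The class and method names are hypothetical, and whether C2 actually vectorizes the first loop depends on the JVM version, the CPU, and the flags in use.

```java
import java.util.Arrays;

public class SuperwordSketch {
    // Element-wise add: every iteration is independent of the others,
    // which makes this a typical candidate for automatic vectorization.
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Prefix sum: each iteration reads the result of the previous one
    // (a loop-carried dependency), so the iterations cannot simply be
    // mapped onto independent SIMD lanes as written.
    static void prefixSum(int[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] += a[i - 1];
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        int[] c = new int[4];
        add(a, b, c);
        prefixSum(a);
        System.out.println(Arrays.toString(c)); // [11, 22, 33, 44]
        System.out.println(Arrays.toString(a)); // [1, 3, 6, 10]
    }
}
```

Both methods are ordinary sequential Java; the difference lies only in whether the compiler can prove that the iterations are independent.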
2.2 JVM intrinsics
For example: array copying, filling, and comparison.
Drawbacks: Intrinsics are point fixes, not a general solution. They are powerful, lightweight, and flexible, but they carry high development costs.
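The three operations named above correspond to ordinary library calls, sketched below. The class and helper names are hypothetical. System.arraycopy is a long-standing HotSpot intrinsic; how Arrays.fill and Arrays.equals are accelerated varies by JDK version and platform, so treat that part as an assumption rather than a guarantee.

```java
import java.util.Arrays;

public class ArrayOpsSketch {
    // Copying: System.arraycopy is commonly replaced by hand-tuned code.
    static int[] copy(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    // Filling: sets every element to the same value.
    static int[] filled(int length, int value) {
        int[] a = new int[length];
        Arrays.fill(a, value);
        return a;
    }

    // Comparison: true if both arrays have the same length and contents.
    static boolean same(int[] x, int[] y) {
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3};
        System.out.println(same(src, copy(src)));          // true
        System.out.println(Arrays.toString(filled(3, 7))); // [7, 7, 7]
    }
}
```

The caller gets the speedup without writing any SIMD code, but only for the specific methods the JVM developers chose to optimize.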
2.3 Vector API
JEP 338: “Vector API (Incubator)”
It provides a reliable way to write performant vectorized code.
int[] A, B, C;
for (int i = 0; i < MAX; i++) {
    A[i] = B[i] + C[i];
}
is equivalent to:
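import jdk.incubator.vector.IntVector; // run with --add-modules jdk.incubator.vector
var S = IntVector.SPECIES_PREFERRED;
int i = 0;
// Vectorized main loop over the largest multiple of S.length() that fits in MAX.
for (; i < S.loopBound(MAX); i += S.length()) {
    var vb = IntVector.fromArray(S, B, i);
    var vc = IntVector.fromArray(S, C, i);
    var va = vb.add(vc);
    va.intoArray(A, i);
}
// Scalar tail for the remaining elements.
for (; i < MAX; i++) {
    A[i] = B[i] + C[i];
}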
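For readers who want to try this out, here is a self-contained sketch of the same element-wise addition that handles array lengths that are not a multiple of the vector length by using a mask. The class and method names are hypothetical. It assumes a JDK that ships the incubating jdk.incubator.vector module (JDK 16 or later) and must be compiled and run with --add-modules jdk.incubator.vector.

```java
import java.util.Arrays;
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorSpecies;

public class VectorAddSketch {
    static final VectorSpecies<Integer> S = IntVector.SPECIES_PREFERRED;

    // Computes a[i] = b[i] + c[i] for every index of a.
    // Assumes b and c are at least as long as a.
    static void add(int[] a, int[] b, int[] c) {
        for (int i = 0; i < a.length; i += S.length()) {
            // The mask switches off lanes that would fall past the end of the array.
            VectorMask<Integer> m = S.indexInRange(i, a.length);
            IntVector vb = IntVector.fromArray(S, b, i, m);
            IntVector vc = IntVector.fromArray(S, c, i, m);
            vb.add(vc).intoArray(a, i, m);
        }
    }

    public static void main(String[] args) {
        int[] b = {1, 2, 3, 4, 5, 6, 7};
        int[] c = {10, 20, 30, 40, 50, 60, 70};
        int[] a = new int[b.length];
        add(a, b, c);
        System.out.println(Arrays.toString(a)); // [11, 22, 33, 44, 55, 66, 77]
    }
}
```

Masking every iteration keeps the code short; on hardware without efficient masked loads and stores, an unmasked main loop followed by a scalar tail may perform better.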