The following computation was implemented in several different ways, and each was timed:
x = abs(*(I2pData+i) /(sqrt(3.0) * (esp + *(I1pData+i)) ));
Original C version
cost time:32.7379
cost time:33.2216
cost time:33.1455
cost time:32.8658
cost time:32.8115
cost time:35.0207
cost time:33.7224
cost time:32.7236
cost time:32.7317
cost time:32.6847
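For reference, the plain loop behind these timings might look roughly like the sketch below. The source does not show the element type of `I1pData` and `I2pData`, so `float` is an assumption, and the function name `ratio_scalar` and its parameters are hypothetical. In C, `abs` takes an `int`, so the sketch uses `fabsf`; it also replaces the per-element `sqrt(3.0)` call with a precomputed constant.

```c
#include <math.h>
#include <stddef.h>

/* sqrt(3) as a precomputed float constant (assumption: float data). */
#define SQRT3_F 1.7320508f

/* Hypothetical helper: out[i] = |i2[i] / (sqrt(3) * (eps + i1[i]))|. */
void ratio_scalar(const float *i1, const float *i2, float *out,
                  size_t n, float eps)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = fabsf(i2[i] / (SQRT3_F * (eps + i1[i])));
    }
}
```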
SSE-accelerated version
cost time:29.2576
cost time:29.2466
cost time:29.2112
cost time:29.2658
cost time:29.4008
cost time:29.5152
cost time:29.2775
cost time:30.8675
cost time:30.4669
cost time:29.5768
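The SSE code itself is not shown in the source. A minimal sketch of how the same expression could be vectorized four floats at a time is given below, again assuming `float` data; `ratio_sse` and its parameters are hypothetical names. The absolute value is taken by clearing the sign bit, and a scalar loop handles the remaining elements. If SSE is not available at compile time, the whole array falls through to the scalar loop.

```c
#include <math.h>
#include <stddef.h>
#if defined(__SSE__) || defined(__SSE2__)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

#define SQRT3_F 1.7320508f

/* Hypothetical helper: SSE version of |i2 / (sqrt(3) * (eps + i1))|. */
void ratio_sse(const float *i1, const float *i2, float *out,
               size_t n, float eps)
{
    size_t i = 0;
#ifdef HAVE_SSE
    const __m128 k    = _mm_set1_ps(SQRT3_F);
    const __m128 e    = _mm_set1_ps(eps);
    const __m128 sign = _mm_set1_ps(-0.0f);   /* only the sign bit set */
    for (; i + 4 <= n; i += 4) {
        __m128 a   = _mm_loadu_ps(i1 + i);    /* unaligned loads */
        __m128 b   = _mm_loadu_ps(i2 + i);
        __m128 den = _mm_mul_ps(k, _mm_add_ps(e, a));
        __m128 q   = _mm_div_ps(b, den);
        /* (~sign) & q clears the sign bit, i.e. takes |q|. */
        _mm_storeu_ps(out + i, _mm_andnot_ps(sign, q));
    }
#endif
    /* Scalar tail (or the whole array when SSE is unavailable). */
    for (; i < n; ++i) {
        out[i] = fabsf(i2[i] / (SQRT3_F * (eps + i1[i])));
    }
}
```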
AVX-accelerated version
cost time:29.0309
cost time:29.0445
cost time:29.0172
cost time:29.0109
cost time:29.0343
cost time:29.0368
cost time:29.0123
cost time:29.0284
cost time:29.0341
cost time:29.038
Plain (non-SIMD) version on multiple cores
cost time:14.768
cost time:11.0459
cost time:10.1506
cost time:9.74029
cost time:9.25993
cost time:8.68711
cost time:8.24942
cost time:8.06414
cost time:7.77311
cost time:7.48125
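The multi-core code is not shown either. Given that the next variant uses OpenMP, one plausible form is a plain loop with a `parallel for` directive, sketched below under the same `float` assumption; `ratio_omp` is a hypothetical name. Compiled without OpenMP support (for example without `-fopenmp` on GCC), the pragma is ignored and the loop simply runs serially.

```c
#include <math.h>

#define SQRT3_F 1.7320508f

/* Hypothetical helper: the scalar loop split across threads by OpenMP. */
void ratio_omp(const float *i1, const float *i2, float *out,
               long n, float eps)
{
    /* Each iteration writes only out[i], so iterations are independent. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        out[i] = fabsf(i2[i] / (SQRT3_F * (eps + i1[i])));
    }
}
```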
OpenMP + SSE version
cost time:79.258
cost time:48.7382
cost time:45.1632
cost time:43.5007
cost time:43.7332
cost time:43.3712
cost time:43.659
cost time:43.4181
cost time:43.6024
cost time:43.7735
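Strangely, combining OpenMP multi-core execution with SSE makes the computation much slower rather than faster: the cost time settles at about 43.5, versus 7.5 to 14.8 for the plain multi-core version and about 29 for single-threaded SSE.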
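The source does not show how OpenMP and SSE were combined, so the cause of the slowdown cannot be read off from the timings alone. As a point of comparison, here is a minimal sketch of one straightforward combination, in which each thread handles contiguous blocks of four floats and the SIMD constants are created once outside the loop. The name `ratio_omp_sse`, the `float` element type, and a non-negative `n` are assumptions; whether this form avoids the slowdown reported above is not something the sketch can confirm.

```c
#include <math.h>
#if defined(__SSE__) || defined(__SSE2__)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

#define SQRT3_F 1.7320508f

/* Hypothetical helper: OpenMP threads, each processing 4-float SSE blocks. */
void ratio_omp_sse(const float *i1, const float *i2, float *out,
                   long n, float eps)
{
    long vec_end = 0;
#ifdef HAVE_SSE
    vec_end = n - (n % 4);                    /* largest multiple of 4 <= n */
    const __m128 k    = _mm_set1_ps(SQRT3_F); /* shared, read-only */
    const __m128 e    = _mm_set1_ps(eps);
    const __m128 sign = _mm_set1_ps(-0.0f);
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < vec_end; i += 4) {
        __m128 a   = _mm_loadu_ps(i1 + i);
        __m128 b   = _mm_loadu_ps(i2 + i);
        __m128 den = _mm_mul_ps(k, _mm_add_ps(e, a));
        __m128 q   = _mm_div_ps(b, den);
        _mm_storeu_ps(out + i, _mm_andnot_ps(sign, q));  /* |q| */
    }
#endif
    /* Scalar tail (or the whole array when SSE is unavailable). */
    for (long i = vec_end; i < n; ++i) {
        out[i] = fabsf(i2[i] / (SQRT3_F * (eps + i1[i])));
    }
}
```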