mgalcu
slurm - *****.out
Optimization 1:
(baseline: true/float64/float64)
42680 - baseline - 376.579s
42685 - true/float32/float32 - 329.579s
42686 - false/float32/float32 - 514.048s
42687 - true/float16/float16 - 402.219s
42688 - false/float16/float16 - 589.722s
42690 - true/float32/float16 - 421.687s
42691 - true/float16/float32 - 478.926s
42692 - true/float64/float32 - 522.439s
42693 - true/float32/float64 - 452.675s
42697 - true/float64/float16 - 380.009s
42698 - true/float16/float64 - 413.311s
42699 - false/float16/float64 - 578.449s
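
To compare these runs at a glance, the times can be turned into speedups against the baseline. This is a minimal sketch: the numbers are copied from the list above, the three-part labels are treated as opaque names (the notes do not say what each field means), and `speedups` is a hypothetical helper name.

```python
# Wall-clock times in seconds, copied from the list above.
BASELINE = 376.579  # job 42680, true/float64/float64

RUNS = {
    "true/float32/float32": 329.579,
    "false/float32/float32": 514.048,
    "true/float16/float16": 402.219,
    "false/float16/float16": 589.722,
    "true/float32/float16": 421.687,
    "true/float16/float32": 478.926,
    "true/float64/float32": 522.439,
    "true/float32/float64": 452.675,
    "true/float64/float16": 380.009,
    "true/float16/float64": 413.311,
    "false/float16/float64": 578.449,
}


def speedups(baseline, runs):
    """Map each configuration to baseline_time / run_time (above 1 means faster)."""
    return {name: baseline / seconds for name, seconds in runs.items()}


if __name__ == "__main__":
    ranked = sorted(speedups(BASELINE, RUNS).items(), key=lambda kv: -kv[1])
    for name, factor in ranked:
        print(f"{name:24s} {factor:.2f}x")
```

In this data only true/float32/float32 is faster than the baseline; every other configuration is slower.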
Optimization 2
Configuration:
true/float32/float32
number | num_intra_threads | num_inter_threads | time |
---|---|---|---|
42827 | 1 | 1 | 216.473s |
42837 | 1 | 2 | 283.214s |
42830 | 1 | 3 | 317.024s |
42836 | 2 | 1 | 229.168s |
42832 | 2 | 2 | 329.669s |
42841 | 2 | 3 | 276.582s |
42839 | 3 | 1 | 234.837s |
42840 | 3 | 2 | 273.078s |
42838 | 3 | 3 | 259.427s |
42842 | 3 | 4 | 272.911s |
42843 | 4 | 1 | 220.982s |
42844 | 4 | 2 | 378.229s |
42845 | 4 | 3 | 388.401s |
42846 | 4 | 4 | 389.674s |
42847 | 5 | 1 | 282.551s |
42848 | 5 | 2 | 290.621s |
42849 | 5 | 3 | 280.024s |
42850 | 5 | 4 | 278.902s |
42851 | 5 | 5 | 269.933s |
42852 | 6 | 1 | 214.897s |
42864 | 6 | 2 | 272.777s |
42853 | 6 | 6 | 275.516s |
42854 | 6 | 8 | 272.305s |
42855 | 7 | 1 | 217.494s |
42856 | 7 | 7 | 291.409s |
42857 | 8 | 1 | 219.555s |
42858 | 8 | 4 | 318.062s |
42865 | 9 | 1 | 221.695s |
42866 | 10 | 1 | 222.555s |
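
The grid above can be searched programmatically for the fastest setting. This is a sketch only: the rows are copied verbatim from the table, and `parse_rows` and `fastest` are hypothetical helper names, not part of any benchmark tooling.

```python
TABLE = """\
42827 | 1 | 1 | 216.473s |
42837 | 1 | 2 | 283.214s |
42830 | 1 | 3 | 317.024s |
42836 | 2 | 1 | 229.168s |
42832 | 2 | 2 | 329.669s |
42841 | 2 | 3 | 276.582s |
42839 | 3 | 1 | 234.837s |
42840 | 3 | 2 | 273.078s |
42838 | 3 | 3 | 259.427s |
42842 | 3 | 4 | 272.911s |
42843 | 4 | 1 | 220.982s |
42844 | 4 | 2 | 378.229s |
42845 | 4 | 3 | 388.401s |
42846 | 4 | 4 | 389.674s |
42847 | 5 | 1 | 282.551s |
42848 | 5 | 2 | 290.621s |
42849 | 5 | 3 | 280.024s |
42850 | 5 | 4 | 278.902s |
42851 | 5 | 5 | 269.933s |
42852 | 6 | 1 | 214.897s |
42864 | 6 | 2 | 272.777s |
42853 | 6 | 6 | 275.516s |
42854 | 6 | 8 | 272.305s |
42855 | 7 | 1 | 217.494s |
42856 | 7 | 7 | 291.409s |
42857 | 8 | 1 | 219.555s |
42858 | 8 | 4 | 318.062s |
42865 | 9 | 1 | 221.695s |
42866 | 10 | 1 | 222.555s |
"""


def parse_rows(text):
    """Parse 'job | intra | inter | time |' lines into (job, intra, inter, seconds)."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        job, intra, inter, seconds = cells
        rows.append((int(job), int(intra), int(inter), float(seconds.rstrip("s"))))
    return rows


def fastest(rows):
    """Return the row with the smallest time."""
    return min(rows, key=lambda row: row[3])


if __name__ == "__main__":
    rows = parse_rows(TABLE)
    print("fastest overall:", fastest(rows))
    print("fastest with inter > 1:", fastest([r for r in rows if r[2] > 1]))
```

On this table the fastest run is intra 6 / inter 1 (214.897s), and the best run with more than one inter thread is intra 3 / inter 3 (259.427s).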
Optimization 3
Further tuning on top of Optimization 2
Configuration:
true/float32/float32
intra: 6
inter: 1
number | OMP_NUM_THREADS | time |
---|---|---|
42877 | - | 214.897s |
42869 | 1 | 329.136s |
42870 | 2 | 332.076s |
42871 | 3 | 313.161s |
42872 | 4 | 340.577s |
42873 | 5 | 332.321s |
42874 | 6 | 348.315s |
42875 | 7 | 475.030s |
42876 | 8 | 479.035s |
42882 | 9 | 272.468s |
42878 | 10 | 271.305s |
42883 | 11 | 271.302s |
42879 | 12 | 274.557s |
42880 | 16 | 377.460s |
42881 | 18 | 350.008s |
42884 - 6 / 3 / 2 - 309.766s
42886 - 10 / 3 / 2 - 261.253s
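
For reference, one way to vary `OMP_NUM_THREADS` per run is to set it in the child process environment. This is a sketch under assumptions: the actual launch command is not recorded in these notes, so a small Python probe stands in for the real job; the `-` row is assumed to mean the variable was left unset; and `env_for_run` and `probe` are hypothetical helper names.

```python
import os
import subprocess
import sys


def env_for_run(omp_threads=None):
    """Copy the current environment, with OMP_NUM_THREADS set or removed.

    omp_threads=None is assumed to correspond to the '-' row (variable unset).
    """
    env = dict(os.environ)
    env.pop("OMP_NUM_THREADS", None)
    if omp_threads is not None:
        env["OMP_NUM_THREADS"] = str(omp_threads)
    return env


def probe(omp_threads=None):
    """Start a child process and report the OMP_NUM_THREADS value it sees."""
    code = "import os; print(os.environ.get('OMP_NUM_THREADS', 'unset'))"
    result = subprocess.run(
        [sys.executable, "-c", code],  # stand-in for the real benchmark command
        env=env_for_run(omp_threads),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for setting in (None, 1, 6, 10):
        print(setting, "->", probe(setting))
```

Building the environment per run keeps each job independent of whatever value the submitting shell happens to export.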