Introduction
The LU kernel factors a dense matrix into the product of a lower triangular matrix and an upper triangular matrix. The dense matrix A is divided into an array of blocks to exploit temporal locality on submatrix elements. To reduce communication, block ownership is assigned using a 2-D scatter decomposition, and each block is updated by the processor that owns it.
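As a rough illustration of what a 2-D scatter decomposition means for block ownership, the sketch below maps block coordinates to a processor id by wrapping them cyclically onto a processor grid. The names `block_owner`, `print_owner_grid`, `proc_rows`, and `proc_cols` are hypothetical, and the row-major numbering of processors is an assumption; the profiled program may compute ownership differently.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not taken from the profiled source): owner of block
 * (bi, bj) on a proc_rows x proc_cols processor grid, assuming a cyclic
 * 2-D scatter and row-major processor numbering. */
static int block_owner(int bi, int bj, int proc_rows, int proc_cols)
{
    assert(proc_rows > 0 && proc_cols > 0);
    return (bi % proc_rows) * proc_cols + (bj % proc_cols);
}

/* Print the owner of every block in an nblocks x nblocks block array. */
static void print_owner_grid(int nblocks, int proc_rows, int proc_cols)
{
    for (int bi = 0; bi < nblocks; bi++) {
        for (int bj = 0; bj < nblocks; bj++)
            printf("%2d ", block_owner(bi, bj, proc_rows, proc_cols));
        printf("\n");
    }
}
```

Calling `print_owner_grid(4, 2, 2)` prints a 4 x 4 grid in which neighbouring blocks belong to different processors, so the work of each factorization step is spread across all of them.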
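1. Hotspot Analysis
1.1 Hotspot Functions
1.2 Hotspot Loops
Format: hot loop (function call count - total execution count at each loop level)
Execution percentage: share of the enclosing function's execution spent in the hot loop
Function: bmod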
bm.L1.1(5559680-88954880-1423278080)
Execution percentage: 95.57%
bm.L1.1.1(5559680-88954880-1423278080-22772449280)
Execution percentage:
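Under the format described above, each successive number is the total execution count one loop level deeper, so dividing a level's total by the one before it gives the average trip count of that loop. A minimal C sketch of that arithmetic follows; `avg_trip_count` and `print_trip_counts` are illustrative names, not part of the profiled program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: average number of iterations of a loop level,
 * computed as (total at this level) / (total at the enclosing level).
 * 64-bit counters are needed because the deepest totals exceed 2^32. */
static unsigned long long avg_trip_count(unsigned long long outer_total,
                                         unsigned long long inner_total)
{
    assert(outer_total > 0);
    return inner_total / outer_total;
}

/* Print the average trip count of every level, given the counts in the
 * order they appear in the report (function calls first). */
static void print_trip_counts(const unsigned long long *totals, int n)
{
    for (int i = 1; i < n; i++)
        printf("level %d: %llu iterations on average\n",
               i, avg_trip_count(totals[i - 1], totals[i]));
}
```

For bm.L1.1.1 the counts 5559680, 88954880, 1423278080, and 22772449280 give a ratio of 16 at every level. That would be consistent with each loop running 16 iterations per entry, for example with a block dimension of 16, although the report does not state the block size.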
1.3 Hotspot Code
/* Function bmod */