Tools
numactl
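GitHub: https://github.com/numactl/numactl
On Ubuntu it can be installed directly with apt (apt install numactl). numactl --hardware lists each node's CPUs and memory, plus the node distance matrix: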
(base) [root@localhost test]# numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 65431 MB
node 0 free: 35258 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 65535 MB
node 1 free: 62369 MB
node 2 cpus: 16 17 18 19 20 21 22 23
node 2 size: 65535 MB
node 2 free: 64002 MB
node 3 cpus: 24 25 26 27 28 29 30 31
node 3 size: 65535 MB
node 3 free: 63536 MB
node 4 cpus: 32 33 34 35 36 37 38 39
node 4 size: 65535 MB
node 4 free: 63747 MB
node 5 cpus: 40 41 42 43 44 45 46 47
node 5 size: 65535 MB
node 5 free: 63872 MB
node 6 cpus: 48 49 50 51 52 53 54 55
node 6 size: 65535 MB
node 6 free: 64035 MB
node 7 cpus: 56 57 58 59 60 61 62 63
node 7 size: 65535 MB
node 7 free: 61804 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 16 16 16 28 28 22 28
1: 16 10 16 16 28 28 28 22
2: 16 16 10 16 22 28 28 28
3: 16 16 16 10 28 22 28 28
4: 28 28 22 28 10 16 16 16
5: 28 28 28 22 16 10 16 16
6: 22 28 28 28 16 16 10 16
7: 28 22 28 28 16 16 16 10
(base) [root@localhost test]#
(base) [root@localhost test]# numastat
node0 node1 node2 node3
numa_hit 17238639 2348706 571063 748099
numa_miss 0 0 0 0
numa_foreign 0 0 0 0
interleave_hit 27721 27729 27726 27734
local_node 17210141 2216993 493770 671023
other_node 28498 131713 77293 77076
node4 node5 node6 node7
numa_hit 1034606 876289 562762 1165051
numa_miss 0 0 0 0
numa_foreign 0 0 0 0
interleave_hit 27715 27744 27724 27727
local_node 956360 751729 472855 735365
other_node 78246 124560 89907 429686
(base) [root@localhost test]#
MLC
Intel's Memory Latency Checker (MLC) is a tool for measuring NUMA memory latency and bandwidth.
rocm_bandwidth_test
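AMD's ROCm software stack includes a similar tool, rocm-bandwidth-test, which measures bandwidth between GPUs and CPUs. Run without arguments, it performs both the all-device unidirectional and bidirectional copy tests (see the "Launch Command" line in the output). In this output, devices 0-7 appear to be the eight CPU NUMA nodes (their NUMA distances match numactl --hardware) and devices 8-11 the GPUs: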
(base) [root@localhost centos7.6]# ./rocm-bandwidth-test
............................................................................................................................................................................................................................................
RocmBandwidthTest Version: 2.3.11
Launch Command is: ./rocm-bandwidth-test (rocm_bandwidth -a + rocm_bandwidth -A)
Device: 0,
Device: 1,
Device: 2,
Device: 3,
Device: 4,
Device: 5,
Device: 6,
Device: 7,
Device: 8, C888888, c5:0.0
Device: 9, C888888, c8:0.0
Device: 10, C888888, cb:0.0
Device: 11, C888888, ce:0.0
Inter-Device Access
D/D 0 1 2 3 4 5 6 7 8 9 10 11
0 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1 1 1 1 1
5 1 1 1 1 1 1 1 1 1 1 1 1
6 1 1 1 1 1 1 1 1 1 1 1 1
7 1 1 1 1 1 1 1 1 1 1 1 1
8 1 1 1 1 1 1 1 1 1 1 1 1
9 1 1 1 1 1 1 1 1 1 1 1 1
10 1 1 1 1 1 1 1 1 1 1 1 1
11 1 1 1 1 1 1 1 1 1 1 1 1
Inter-Device Numa Distance
D/D 0 1 2 3 4 5 6 7 8 9 10 11
0 0 16 16 16 28 28 22 28 42 42 42 42
1 16 0 16 16 28 28 28 22 48 48 48 48
2 16 16 0 16 22 28 28 28 48 48 48 48
3 16 16 16 0 28 22 28 28 48 48 48 48
4 28 28 22 28 0 16 16 16 36 36 36 36
5 28 28 28 22 16 0 16 16 36 36 36 36
6 22 28 28 28 16 16 0 16 20 20 20 20
7 28 22 28 28 16 16 16 0 36 36 36 36
8 42 48 48 48 36 36 20 36 0 40 40 40
9 42 48 48 48 36 36 20 36 40 0 40 40
10 42 48 48 48 36 36 20 36 40 40 0 40
11 42 48 48 48 36 36 20 36 40 40 40 0
Unidirectional copy peak bandwidth GB/s
D/D 0 1 2 3 4 5 6 7 8 9 10 11
0 N/A N/A N/A N/A N/A N/A N/A N/A 8.235595 8.234948 8.234949 8.235756
1 N/A N/A N/A N/A N/A N/A N/A N/A 8.238830 8.238668 8.239155 8.238830
2 N/A N/A N/A N/A N/A N/A N/A N/A 8.238830 8.237859 8.239155 8.238830
3 N/A N/A N/A N/A N/A N/A N/A N/A 8.239154 8.238830 8.238831 8.240125
4 N/A N/A N/A N/A N/A N/A N/A N/A 13.080635 13.020941 13.023370 13.079411
5 N/A N/A N/A N/A N/A N/A N/A N/A 13.081043 13.020941 13.022561 13.081451
6 N/A N/A N/A N/A N/A N/A N/A N/A 13.022558 13.020941 13.022561 13.081859
7 N/A N/A N/A N/A N/A N/A N/A N/A 13.021750 13.020941 13.080230 13.021750
8 7.550506 7.538700 7.549148 7.549148 14.106568 14.106568 14.106093 14.105619 587.026452 14.106568 14.106093 14.105619
9 7.550370 7.538157 7.549147 7.548875 14.106568 14.105619 14.106568 14.106568 14.106568 585.796648 14.106093 14.106568
10 7.551459 7.538430 7.549148 7.549148 14.106096 14.106571 14.106571 14.106571 14.106571 14.106571 586.615944 14.108469
11 7.550371 7.538158 7.549148 7.549012 14.105619 14.106568 14.105144 14.105144 14.106093 14.106568 14.105619 586.615944
Bdirectional copy peak bandwidth GB/s
D/D 0 1 2 3 4 5 6 7 8 9 10 11
0 N/A N/A N/A N/A N/A N/A N/A N/A 11.942445 11.976716 12.012391 11.968857
1 N/A N/A N/A N/A N/A N/A N/A N/A 11.960665 12.067514 12.057456 12.037557
2 N/A N/A N/A N/A N/A N/A N/A N/A 12.024616 12.061441 12.047932 12.067515
3 N/A N/A N/A N/A N/A N/A N/A N/A 12.003796 12.075679 12.070295 12.037212
4 N/A N/A N/A N/A N/A N/A N/A N/A 22.000032 22.024291 22.037603 22.068907
5 N/A N/A N/A N/A N/A N/A N/A N/A 22.087501 22.019088 22.090995 22.081687
6 N/A N/A N/A N/A N/A N/A N/A N/A 22.227966 22.330926 22.292356 22.303021
7 N/A N/A N/A N/A N/A N/A N/A N/A 21.940764 21.974099 22.007539 21.992534
8 11.942445 11.960665 12.024616 12.003796 22.000032 22.087501 22.227966 21.940764 N/A 27.127089 27.107968 27.126924
9 11.976716 12.067514 12.061441 12.075679 22.024291 22.019088 22.330926 21.974099 27.127089 N/A 27.743500 27.713025
10 12.012391 12.057456 12.047932 12.070295 22.037603 22.090995 22.292356 22.007539 27.107968 27.743500 N/A 27.713946
11 11.968857 12.037557 12.067515 12.037212 22.068907 22.081687 22.303021 21.992534 27.126924 27.713025 27.713946 N/A
(base) [root@localhost centos7.6]#
API
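After installing numactl or libnuma-dev, the system should have two headers, /usr/include/numa.h and /usr/include/numaif.h. Their functionality overlaps somewhat, and I haven't fully worked out why it is split this way. Roughly, numaif.h declares the low-level calls that map directly onto Linux system calls (mbind, get_mempolicy, set_mempolicy, migrate_pages, move_pages) together with the MPOL_* constants. numa.h is libnuma's higher-level API built on top of them.
Let's skip the details of each function and go straight to the code.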
// Build (CentOS 7's gcc defaults to C89, so request C99 for the for-loop declarations):
//   gcc -std=gnu99 -o mynuma mynuma.c -lnuma
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <numa.h>
#include <numaif.h>
#include <errno.h>
#include <sys/mman.h>
#ifndef MPOL_F_STATIC_NODES
/* Bug in numaif.h, this should be defined in there. Definition copied
 * from linux/mempolicy.h.
 */
#define MPOL_F_STATIC_NODES (1 << 15)
#endif
void print_bitmask(struct bitmask* mask) {
    printf("bitmask size: %lu \n", mask->size);
    printf("bitmask val: 0x%lx \n", *mask->maskp);
}
// Print total and free memory of each node (numa_node_size64 works in bytes).
void print_nodes(int node_count) {
    long long fsz = 0;
    for (int i = 0; i < node_count; i++) {
        long long sz = numa_node_size64(i, &fsz);
        printf("node: %d, size: %lldMB, free: %lldMB \n", i, sz >> 20, fsz >> 20);
    }
}
int main(int argc, const char* argv[])
{
    int rc = 1;
    if (-1 == numa_available()) {
        printf("numa support not available \n");
        goto out;
    }
    int maxnode = numa_max_node();   // highest node number (7 on this machine)
    print_nodes(maxnode + 1);
    struct bitmask *node_mask;
    node_mask = numa_bitmask_alloc(maxnode + 1);
    if (!node_mask) {
        printf("numa bit mask alloc failed \n");
        goto out;
    }
    print_bitmask(node_mask);
    // bind the memory to node 1
    numa_bitmask_setbit(node_mask, 1);
    print_bitmask(node_mask);
    size_t len = 10UL << 30;   // 10 GB
    // Let the kernel choose the address; mmap signals failure with MAP_FAILED, not NULL.
    char *tenGB = (char *) mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (MAP_FAILED == tenGB) {
        printf("failed to mmap memory, error: %d \n", errno);
        goto free_mask;
    }
    // int mode = MPOL_F_STATIC_NODES | MPOL_BIND;
    // int ret = mbind(tenGB, len, mode, node_mask->maskp, maxnode + 2, 0);
    // if (ret < 0) {
    //     printf("mbind failed, error: %d \n", errno);
    // }
    memset(tenGB, 0x0, len);   // touch every page so the memory is actually allocated
    print_nodes(maxnode + 1);
    munmap(tenGB, len);
    rc = 0;
free_mask:
    numa_bitmask_free(node_mask);
out:
    return rc;
}
(base) [root@localhost test]# ./mynuma
node: 0, size: 65431MB, free: 35245MB
node: 1, size: 65535MB, free: 62371MB
node: 2, size: 65535MB, free: 64003MB
node: 3, size: 65535MB, free: 63537MB
node: 4, size: 65535MB, free: 63744MB
node: 5, size: 65535MB, free: 63884MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61804MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35245MB
node: 1, size: 65535MB, free: 62371MB
node: 2, size: 65535MB, free: 64003MB
node: 3, size: 65535MB, free: 63537MB
node: 4, size: 65535MB, free: 63744MB
node: 5, size: 65535MB, free: 53624MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61804MB
(base) [root@localhost test]#
(base) [root@localhost test]# ./mynuma
node: 0, size: 65431MB, free: 35237MB
node: 1, size: 65535MB, free: 62371MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63535MB
node: 4, size: 65535MB, free: 63745MB
node: 5, size: 65535MB, free: 63882MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61805MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35237MB
node: 1, size: 65535MB, free: 62371MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 54496MB
node: 4, size: 65535MB, free: 62525MB
node: 5, size: 65535MB, free: 63882MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61805MB
(base) [root@localhost test]#
(base) [root@localhost test]# ./mynuma
node: 0, size: 65431MB, free: 35237MB
node: 1, size: 65535MB, free: 62372MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63534MB
node: 4, size: 65535MB, free: 63745MB
node: 5, size: 65535MB, free: 63882MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61805MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35233MB
node: 1, size: 65535MB, free: 62371MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63534MB
node: 4, size: 65535MB, free: 63743MB
node: 5, size: 65535MB, free: 53622MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61805MB
(base) [root@localhost test]#
(base) [root@localhost test]# ./mynuma
node: 0, size: 65431MB, free: 35235MB
node: 1, size: 65535MB, free: 62371MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63534MB
node: 4, size: 65535MB, free: 63744MB
node: 5, size: 65535MB, free: 63880MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61805MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35235MB
node: 1, size: 65535MB, free: 62371MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63534MB
node: 4, size: 65535MB, free: 53483MB
node: 5, size: 65535MB, free: 63880MB
node: 6, size: 65535MB, free: 64035MB
node: 7, size: 65535MB, free: 61805MB
(base) [root@localhost test]#
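As the runs show, the 10 GB buffer does not land on the same node each time. Judging by which node's free memory drops by about 10 GB, it came from node 5, then mostly node 3 (with about 1.2 GB on node 4), then node 5 again, then node 4. Under the default policy the kernel allocates each page on the node of the CPU that first touches it, so the placement most likely follows wherever the scheduler happened to run the process. Next, let's use numactl to give the process a preferred node and try again: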
(base) [root@localhost test]# numactl --preferred=2 ./mynuma
node: 0, size: 65431MB, free: 35237MB
node: 1, size: 65535MB, free: 62368MB
node: 2, size: 65535MB, free: 64001MB
node: 3, size: 65535MB, free: 63410MB
node: 4, size: 65535MB, free: 63744MB
node: 5, size: 65535MB, free: 63862MB
node: 6, size: 65535MB, free: 63862MB
node: 7, size: 65535MB, free: 61677MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35241MB
node: 1, size: 65535MB, free: 62368MB
node: 2, size: 65535MB, free: 53741MB
node: 3, size: 65535MB, free: 63538MB
node: 4, size: 65535MB, free: 63616MB
node: 5, size: 65535MB, free: 63857MB
node: 6, size: 65535MB, free: 63862MB
node: 7, size: 65535MB, free: 61677MB
(base) [root@localhost test]#
(base) [root@localhost test]# numactl --preferred=2 ./mynuma
node: 0, size: 65431MB, free: 35241MB
node: 1, size: 65535MB, free: 62367MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63538MB
node: 4, size: 65535MB, free: 63616MB
node: 5, size: 65535MB, free: 63857MB
node: 6, size: 65535MB, free: 63862MB
node: 7, size: 65535MB, free: 61677MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35241MB
node: 1, size: 65535MB, free: 62367MB
node: 2, size: 65535MB, free: 53741MB
node: 3, size: 65535MB, free: 63538MB
node: 4, size: 65535MB, free: 63616MB
node: 5, size: 65535MB, free: 63857MB
node: 6, size: 65535MB, free: 63862MB
node: 7, size: 65535MB, free: 61677MB
(base) [root@localhost test]#
(base) [root@localhost test]# numactl --preferred=2 ./mynuma
node: 0, size: 65431MB, free: 35241MB
node: 1, size: 65535MB, free: 62366MB
node: 2, size: 65535MB, free: 64001MB
node: 3, size: 65535MB, free: 63538MB
node: 4, size: 65535MB, free: 63744MB
node: 5, size: 65535MB, free: 63728MB
node: 6, size: 65535MB, free: 63862MB
node: 7, size: 65535MB, free: 61676MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35241MB
node: 1, size: 65535MB, free: 62366MB
node: 2, size: 65535MB, free: 53740MB
node: 3, size: 65535MB, free: 63538MB
node: 4, size: 65535MB, free: 63744MB
node: 5, size: 65535MB, free: 63728MB
node: 6, size: 65535MB, free: 63862MB
node: 7, size: 65535MB, free: 61676MB
(base) [root@localhost test]#
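This time the buffer lands on node 2 in every run: only node 2's free memory drops by about 10 GB (from 64001 MB to 53741 MB in the first run).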
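Next, uncomment the mbind code in the program (which binds the buffer to node 1) and run it again, still under numactl --preferred=2: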
(base) [root@localhost test]# numactl --preferred=2 ./mynuma
node: 0, size: 65431MB, free: 35255MB
node: 1, size: 65535MB, free: 62365MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63538MB
node: 4, size: 65535MB, free: 63740MB
node: 5, size: 65535MB, free: 63844MB
node: 6, size: 65535MB, free: 64028MB
node: 7, size: 65535MB, free: 61803MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35257MB
node: 1, size: 65535MB, free: 52126MB
node: 2, size: 65535MB, free: 63981MB
node: 3, size: 65535MB, free: 63538MB
node: 4, size: 65535MB, free: 63740MB
node: 5, size: 65535MB, free: 63844MB
node: 6, size: 65535MB, free: 64028MB
node: 7, size: 65535MB, free: 61803MB
(base) [root@localhost test]#
(base) [root@localhost test]# numactl --preferred=2 ./mynuma
node: 0, size: 65431MB, free: 35254MB
node: 1, size: 65535MB, free: 62364MB
node: 2, size: 65535MB, free: 63999MB
node: 3, size: 65535MB, free: 63537MB
node: 4, size: 65535MB, free: 63741MB
node: 5, size: 65535MB, free: 63844MB
node: 6, size: 65535MB, free: 64028MB
node: 7, size: 65535MB, free: 61803MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
node: 0, size: 65431MB, free: 35254MB
node: 1, size: 65535MB, free: 52124MB
node: 2, size: 65535MB, free: 63979MB
node: 3, size: 65535MB, free: 63537MB
node: 4, size: 65535MB, free: 63741MB
node: 5, size: 65535MB, free: 63844MB
node: 6, size: 65535MB, free: 64028MB
node: 7, size: 65535MB, free: 61803MB
(base) [root@localhost test]#
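Now the 10 GB comes from node 1, the node passed to mbind, even though numactl asked for node 2 (node 2's free memory barely changes). So a policy set through the API takes precedence over numactl. More precisely, mbind attaches a policy to that specific address range. A range policy overrides the process-wide policy that numactl installs with set_mempolicy before starting the program.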
// Build: gcc -std=gnu99 -o mynuma mynuma.c -lnuma
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <numa.h>
#include <numaif.h>
#include <errno.h>
#include <sys/mman.h>
// Old (v1) libnuma API, under which numa_get_membind() returns a nodemask_t
#include <numacompat1.h>
#ifndef MPOL_F_STATIC_NODES
/* Bug in numaif.h, this should be defined in there. Definition copied
 * from linux/mempolicy.h.
 */
#define MPOL_F_STATIC_NODES (1 << 15)
#endif
void print_bitmask(struct bitmask* mask) {
    printf("bitmask size: %lu \n", mask->size);
    printf("bitmask val: 0x%lx \n", *mask->maskp);
}
// Print total and free memory of each node (numa_node_size64 works in bytes).
void print_nodes(int node_count) {
    long long fsz = 0;
    for (int i = 0; i < node_count; i++) {
        long long sz = numa_node_size64(i, &fsz);
        printf("node: %d, size: %lldMB, free: %lldMB \n", i, sz >> 20, fsz >> 20);
    }
}
int main(int argc, const char* argv[])
{
    int rc = 1;
    if (-1 == numa_available()) {
        printf("numa support not available \n");
        goto out;
    }
    int maxnode = numa_max_node();
    print_nodes(maxnode + 1);
    struct bitmask *node_mask;
    node_mask = numa_bitmask_alloc(maxnode + 1);
    if (!node_mask) {
        printf("numa bit mask alloc failed \n");
        goto out;
    }
    print_bitmask(node_mask);
    // bind the memory to node 1
    numa_bitmask_setbit(node_mask, 1);
    print_bitmask(node_mask);
    size_t len = 10UL << 30;   // 10 GB
    char *tenGB = (char *) mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (MAP_FAILED == tenGB) {
        printf("failed to mmap memory, error: %d \n", errno);
        goto free_mask;
    }
    // nodemask_t is a struct wrapping an array of unsigned long; print its first word (nodes 0-63)
    nodemask_t memmask = numa_get_membind();
    printf("memmask: 0x%lx \n", memmask.n[0]);
    // Bind this range only. The kernel reads maxnode - 1 bits, so maxnode + 2 covers nodes 0..maxnode.
    int mode = MPOL_F_STATIC_NODES | MPOL_BIND;
    int ret = mbind(tenGB, len, mode, node_mask->maskp, maxnode + 2, 0);
    if (ret < 0) {
        printf("mbind failed, error: %d \n", errno);
    }
    memset(tenGB, 0x0, len);   // touch every page so the memory is actually allocated
    print_nodes(maxnode + 1);
    munmap(tenGB, len);
    rc = 0;
free_mask:
    numa_bitmask_free(node_mask);
out:
    return rc;
}
The version above adds a call to numa_get_membind() to check whether the process's memory is currently bound to any nodes. Running it again:
(base) [root@localhost test]# numactl --preferred=2 ./mynuma
node: 0, size: 65431MB, free: 35198MB
node: 1, size: 65535MB, free: 62362MB
node: 2, size: 65535MB, free: 63982MB
node: 3, size: 65535MB, free: 63539MB
node: 4, size: 65535MB, free: 63740MB
node: 5, size: 65535MB, free: 63878MB
node: 6, size: 65535MB, free: 64037MB
node: 7, size: 65535MB, free: 61806MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
memmask: 0xff
node: 0, size: 65431MB, free: 35205MB
node: 1, size: 65535MB, free: 52123MB
node: 2, size: 65535MB, free: 63966MB
node: 3, size: 65535MB, free: 63539MB
node: 4, size: 65535MB, free: 63740MB
node: 5, size: 65535MB, free: 63876MB
node: 6, size: 65535MB, free: 64037MB
node: 7, size: 65535MB, free: 61806MB
(base) [root@localhost test]#
(base) [root@localhost test]# numactl --membind=2 ./mynuma
node: 0, size: 65431MB, free: 35192MB
node: 1, size: 65535MB, free: 62364MB
node: 2, size: 65535MB, free: 64000MB
node: 3, size: 65535MB, free: 63539MB
node: 4, size: 65535MB, free: 63743MB
node: 5, size: 65535MB, free: 63873MB
node: 6, size: 65535MB, free: 64037MB
node: 7, size: 65535MB, free: 61806MB
bitmask size: 8
bitmask val: 0x0
bitmask size: 8
bitmask val: 0x2
memmask: 0x4
node: 0, size: 65431MB, free: 35192MB
node: 1, size: 65535MB, free: 52124MB
node: 2, size: 65535MB, free: 63980MB
node: 3, size: 65535MB, free: 63539MB
node: 4, size: 65535MB, free: 63743MB
node: 5, size: 65535MB, free: 63873MB
node: 6, size: 65535MB, free: 64037MB
node: 7, size: 65535MB, free: 61806MB
(base) [root@localhost test]#
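So numactl --membind=2 simply sets the node that the process's memory is bound to. numa_get_membind() now reports 0x4 (node 2 only), compared with 0xff (all eight nodes) under --preferred=2. The mbind() policy on the buffer still takes precedence, so the 10 GB is again allocated on node 1.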
The code was later revised slightly; see https://github.com/raykwok1150/numa-demo