The NUMA Mechanism on Linux

骗人布-

Category: linux

Article link: https://blog.csdn.net/weixin_46600100/article/details/104960969

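NUMA (Non-Uniform Memory Access) means that the cost of a memory access is not the same for every address. In the Linux kernel, the NUMA memory policy system calls date back to version 2.6.7. On today's x86 platforms, where large memory and many CPUs are the norm, NUMA can deliver a real performance gain; configured badly, it is also a serious pitfall. This article walks through configuring and tuning NUMA on Linux from the beginning.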

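Some years ago, the memory controller on x86 machines was not yet integrated into the CPU, and every memory access had to go through the northbridge chip. This arrangement is called UMA (Uniform Memory Access). It is very easy to handle in software: the bus model guarantees that all memory accesses cost the same, so there is no need to consider differences between memory addresses.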

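The x86 platform then shifted from competing on clock frequency to competing on core count. More and more cores were packed onto a single chip, and contention among those cores for memory bandwidth became the bottleneck. Meanwhile, software and OS support for SMP multi-core CPUs matured. Together with various commercial considerations, this pushed x86 to adopt NUMA (Non-Uniform Memory Access).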

In this architecture, each socket has its own integrated memory controller (IMC), and the IMCs of different sockets communicate over a QPI link.

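The architecture then evolved further. Because each socket contains multiple cores that all access memory, a memory access bus resembling the early SMP design appears inside each socket; this bus is called the IMC bus.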

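So in a two-socket system each socket manages half of the memory slots, and accessing memory that does not belong to the current socket has to go through the QPI link. Memory access thus becomes either local or remote, and the latency differs noticeably between the two.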
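The local/remote split can be observed from user space. Here is a minimal sketch, assuming a Linux host that exposes its NUMA topology under /sys/devices/system/node; the helper name print_node_distances is made up for illustration. The kernel reports relative distances, where 10 denotes local access and larger values denote remote nodes.

```shell
#!/bin/sh
# print_node_distances: hypothetical helper, not part of numactl.
# Prints one line per NUMA node with the distance vector reported by the kernel.
# The optional argument overrides the sysfs root (handy for testing).
print_node_distances() {
    root="${1:-/sys/devices/system/node}"
    for dir in "$root"/node[0-9]*; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$dir/distance" ] || continue
        printf '%s: %s\n' "${dir##*/}" "$(cat "$dir/distance")"
    done
}

print_node_distances
```

On a single-node machine this prints one line; on a host without NUMA information in sysfs it prints nothing.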

————————————————————————————————————————————————————————

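Today's CPUs are more complex in practice. Take the Xeon E5-2699 v4 as an example: the two sockets reach each other over two QPI links, each running at 9.6 GT/s. Each socket actually has two memory controllers, each controller has two memory channels, and each channel supports up to three DIMMs. The theoretical peak memory bandwidth of a single socket is 76.8 GB/s, whereas a 9.6 GT/s QPI link carries about 19.2 GB/s in each direction, so the two links together offer about 38.4 GB/s per direction. The QPI links are therefore already a bottleneck for remote access.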
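The bandwidth figures can be checked with a little arithmetic. This is only a rough sketch under assumptions the article does not state: DDR4-2400 DIMMs, 8 bytes per transfer on each memory channel, and 2 bytes of payload per QPI transfer in each direction.

```shell
#!/bin/sh
# Assumed values (not from the article): DDR4-2400, 8 B per memory transfer,
# 2 B of payload per QPI transfer in each direction.
channels=4        # 2 controllers x 2 channels per socket
ddr_mts=2400      # assumed DIMM speed in MT/s
qpi_mts=9600      # 9.6 GT/s expressed in MT/s
qpi_links=2

mem_mbs=$((channels * ddr_mts * 8))     # local memory bandwidth in MB/s
qpi_mbs=$((qpi_links * qpi_mts * 2))    # cross-socket bandwidth per direction in MB/s

echo "local memory bandwidth: ${mem_mbs} MB/s"
echo "QPI, both links, one direction: ${qpi_mbs} MB/s"
```

Under these assumptions the local figure works out to 76800 MB/s and the one-directional QPI figure to 38400 MB/s, so a workload that streams mostly remote memory cannot reach local bandwidth.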

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Linux provides the numactl command for manual tuning (it is not installed by default). On CentOS 7 it can be installed as follows:

[root@realhost /]# yum -y install numactl
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
iso                                                                                                                                                                                    | 3.6 kB  00:00:00
Resolving Dependencies
--> Running transaction check
---> Package numactl.x86_64 0:2.0.12-3.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================================================================================================================
 Package                                          Arch                                            Version                                                  Repository                                    Size
==============================================================================================================================================================================================================
Installing:
 numactl                                          x86_64                                          2.0.12-3.el7                                             iso                                           65 k

Transaction Summary
==============================================================================================================================================================================================================
Install  1 Package

Total download size: 65 k
Installed size: 141 k
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : numactl-2.0.12-3.el7.x86_64                                                                                                                                                                1/1
  Verifying  : numactl-2.0.12-3.el7.x86_64                                                                                                                                                                1/1

Installed:
  numactl.x86_64 0:2.0.12-3.el7

Complete!
[root@realhost /]# numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 1023 MB
node 0 free: 147 MB
node distances:
node   0
  0:  10

As shown, this system has a single node (it is a virtual machine running on Windows) with 1023 MB of memory, of which 147 MB is free.
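To read the total off the output mechanically, the per-node size lines can be summed. A small sketch; total_node_mb is a made-up helper name, and the sample fed to it is the output shown above. On a real host, pipe numactl -H into it instead.

```shell
#!/bin/sh
# total_node_mb: hypothetical helper, not part of numactl.
# Sums the "node N size: X MB" lines of `numactl -H` output read from stdin.
total_node_mb() {
    awk '$1 == "node" && $3 == "size:" { sum += $4 } END { print sum + 0 }'
}

# Sample copied from the session above.
total_node_mb <<'EOF'
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 1023 MB
node 0 free: 147 MB
node distances:
node   0
  0:  10
EOF
```

The "free" lines are deliberately ignored, since free memory is already part of each node's size.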

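Now suppose my virtual machine had multiple nodes and I wanted to run two commands, python param and java param. The best placement is to run python on node 0 and java on node 1, using these commands: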

# --cpubind is deprecated (see the help text); --cpunodebind is its replacement
numactl --cpunodebind=0 --membind=0 python param
numactl --cpunodebind=1 --membind=1 java param
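Binding to a node that does not exist makes numactl fail, so a script that has to run on both single-node and multi-node machines may want a fallback. A minimal sketch, assuming the node directories under /sys/devices/system/node are a good enough existence check; run_on_node is a made-up wrapper, not something numactl ships.

```shell
#!/bin/sh
# run_on_node: hypothetical wrapper, not part of numactl.
# Usage: run_on_node NODE command [args...]
# Binds CPU and memory to NODE when numactl is installed and the node exists;
# otherwise runs the command without any binding.
run_on_node() {
    node="$1"
    shift
    if command -v numactl >/dev/null 2>&1 &&
       [ -d "/sys/devices/system/node/node${node}" ]; then
        numactl --cpunodebind="$node" --membind="$node" "$@"
    else
        "$@"
    fi
}

# Example invocations (commented out; "param" is the placeholder used above):
#   run_on_node 0 python param
#   run_on_node 1 java param
```

Falling back silently is a design choice that favours portability; a stricter script could print a warning or exit instead.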

The numactl command in detail

[root@realhost /]# numactl --help
numactl: unrecognized option '--help'
usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
               [--physcpubind= | -C <cpus>] [--cpunodebind= | -N <nodes>]
               [--membind= | -m <nodes>] [--localalloc | -l] command args ...
       numactl [--show | -s]
       numactl [--hardware | -H]
       numactl [--length | -l <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
               [--strict | -t]
               [--shmid | -I <id>] --shm | -S <shmkeyfile>
               [--shmid | -I <id>] --file | -f <tmpfsfile>
               [--huge | -u] [--touch | -T]
               memory policy | --dump | -d | --dump-nodes | -D

memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
<nodes> is a comma delimited list of node numbers or A-B ranges or all.
Instead of a number a node can also be:
  netdev:DEV the node connected to network device DEV
  file:PATH  the node the block device of path is connected to
  ip:HOST    the node of the network device host routes through
  block:PATH the node of block device path
  pci:[seg:]bus:dev[:func] The node of a PCI device
<cpus> is a comma delimited list of cpu numbers or A-B ranges or all
all ranges can be inverted with !
all numbers and ranges can be made cpuset-relative with +
the old --cpubind argument is deprecated.
use --cpunodebind or --physcpubind instead
<length> can have g (GB), m (MB) or k (KB) suffixes
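The memory policies named in the help text map onto flags roughly as follows. This is a small sketch; policy_flags is a made-up helper, not part of numactl, and app stands for any workload.

```shell
#!/bin/sh
# policy_flags: hypothetical helper that prints the numactl flags for a policy.
# Usage: policy_flags POLICY [NODE]
policy_flags() {
    case "$1" in
        # Spread pages round-robin across all nodes.
        interleave) echo "--interleave=all" ;;
        # Allocate on NODE when possible, fall back to other nodes otherwise.
        preferred)  echo "--preferred=$2" ;;
        # Restrict both CPUs and memory to NODE.
        bind)       echo "--cpunodebind=$2 --membind=$2" ;;
        # Always allocate on the node the process is currently running on.
        local)      echo "--localalloc" ;;
        *)          echo "unknown policy: $1" >&2; return 1 ;;
    esac
}

# Print an example command line instead of running it.
echo "numactl $(policy_flags preferred 1) app"
```

Interleaving tends to suit processes whose memory is touched evenly from all CPUs, while binding suits workloads that fit comfortably inside one node.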
