OpenMP + MPI Programming (5): NUMA Operations, Using numactl, SMP Nodes, and SMP Sockets

NUMA Operations

If memory were completely uniform, there would be no need to worry about questions like “where do processes go?” Only on NUMA systems does the placement of processes/threads and of allocated memory (NUMA control) matter.

The default NUMA control is set through policy. The policy is applied whenever a process is executed, a thread is forked, or memory is allocated; these are all events directed from within the kernel.

  • NUMA control is managed by the kernel.
  • NUMA control can be changed with numactl.
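
Before changing anything, it helps to see the NUMA layout the kernel reports and the policy a process would inherit. A minimal sketch (both are standard numactl options; the output shape depends on the machine):

numactl --hardware    # list NUMA nodes, their CPUs, memory sizes, and node distances
numactl --show        # show the NUMA policy the current shell would run under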

Process Affinity and Memory Policy

One would like to control both the affinity of a process to a certain socket or core and the allocation of its data in memory relative to a socket or core.

  • Individual users can alter kernel policies
    (setting Process Affinity and Memory Policy == PAMPer)
    users can PAMPer their own processes
    root can PAMPer any process
    careful, libraries may PAMPer, too!

  • Means by which Process Affinity and Memory Policy can be changed (see the sketch after this list):
    dynamically, on a running process (knowing its process id)
    at the start of process execution (with a wrapper command)
    within the program, through the F90/C API
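
Hedged examples of all three approaches (the PID 12345 and the core/node numbers are placeholders):

numactl --cpunodebind=0 --membind=0 ./a.out   # at process start, via a wrapper command
taskset -cp 0-7 12345                         # re-pin a running process to cores 0-7
migratepages 12345 1 0                        # move its pages from node 1 to node 0
# within a program: libnuma's C API, e.g. numa_run_on_node(0) and
# numa_set_preferred(0), linked with -lnuma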

Using numactl, at the Process Level

Using the command:
numactl <options specifying socket(s)/core(s)> ./a.out

Quick guide to numactl (common options, per the numactl man page):

-N/--cpunodebind=<nodes>   run only on the CPUs of the given NUMA node(s) (socket level)
-C/--physcpubind=<cpus>    run only on the given physical core(s)
-m/--membind=<nodes>       allocate memory only from the given node(s)
-i/--interleave=<nodes>    interleave memory allocations round-robin across the given nodes
-l/--localalloc            always allocate from the node the process is running on
--preferred=<node>         prefer the given node for allocation, falling back if it is full
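
A few typical invocations, assuming a 2-socket node where each socket is one NUMA node:

numactl -N 1 -m 1 ./a.out    # run on socket 1 and allocate memory there
numactl -C 0-7 ./a.out       # bind to physical cores 0-7
numactl -i all ./a.out       # interleave memory across all NUMA nodes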

SMP Nodes

Hybrid batch script for 16 threads/node

  • Specify total MPI tasks to be started by batch (-n)
    Specify total nodes equal to tasks (-N), i.e. 1 task/node
    Set the number of threads for each process
    PAMPering at job level:
    controls behavior (e.g., process-core affinity) for ALL tasks
    no simple/standard way to control thread-core affinity with numactl
...
#SBATCH -n 10 -N 10
...
setenv OMP_NUM_THREADS 16      # csh syntax: 16 OpenMP threads per task
ibrun numactl -i all ./a.out   # interleave each task's memory across all NUMA nodes
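
Since setenv marks this as a csh job script, a bash version of the same job would look as follows (a minimal sketch; ibrun is TACC's MPI launch wrapper, and the SBATCH options are unchanged):

#!/bin/bash
#SBATCH -n 10
#SBATCH -N 10

export OMP_NUM_THREADS=16        # bash equivalent of the csh setenv
ibrun numactl -i all ./a.out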

SMP Sockets

Hybrid batch script for 2 tasks/node, 8 threads/task;
the example script uses 4 nodes (8 tasks in total)

  • Specify total MPI tasks to be started by batch (-n)
    Specify total nodes equal to tasks/2, so 2 tasks/node (-N)
    Set the number of threads for each process
    PAMPering at process level: must invoke a script to manage affinity
    the tacc_affinity script pins tasks to sockets and ensures local memory allocation
    if tacc_affinity isn’t quite right for your application, use it as a starting point
...
#SBATCH -n 8
#SBATCH -N 4
...
setenv OMP_NUM_THREADS 8       # 8 threads per task: one socket's worth of cores
ibrun tacc_affinity ./a.out    # tacc_affinity pins each task to one socket

What does tacc_affinity do?

It works much like the following script, which extracts the global and local MPI rank, sets the numactl options for each process, and then launches the executable.

  • Ranks on Stampede are always assigned sequentially, node by node
  • The scheduler distributes tasks as evenly as possible across nodes
  • This example pertains to MVAPICH2; for Intel MPI, the I_MPI_PIN_* variables would have to be parsed instead

#!/bin/bash
# Disable MVAPICH2's built-in affinity so the numactl settings below take effect
export MV2_USE_AFFINITY=0
export MV2_ENABLE_AFFINITY=0

# LocalRank, LocalSize, Socket
LR=$MV2_COMM_WORLD_LOCAL_RANK
LS=$MV2_COMM_WORLD_LOCAL_SIZE
SK=$(( 2 * LR / LS ))               # map local ranks evenly onto the node's 2 sockets
[ -z "$SK" ] && echo "SK null!" && exit 1
numactl -N "$SK" -m "$SK" ./a.out   # bind this task and its memory to socket SK
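
To use this script the same way tacc_affinity is used above (ibrun my_affinity ./a.out, where my_affinity is a hypothetical name for the file), the hardcoded ./a.out in the last line would be replaced by the script's own arguments:

numactl -N "$SK" -m "$SK" "$@"      # "$@" forwards the executable and its arguments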