Environment Variables for Process Pinning (Intel® MPI Library)

I_MPI_PIN 

Turn on/off process pinning.

Syntax

I_MPI_PIN=<arg>

Arguments 

<arg> 

Binary indicator

enable | yes | on | 1

Enable process pinning. This is the default value

disable | no | off | 0

Disable process pinning

Description

Set this environment variable to turn off the process pinning feature of the Intel® MPI Library.
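For example, pinning could be switched off for one shell session before launching (the application name below is a placeholder):

```shell
# Any of disable | no | off | 0 turns pinning off for
# subsequent mpirun launches from this shell.
export I_MPI_PIN=disable
echo "I_MPI_PIN=$I_MPI_PIN"

# A launch would then look like (./my_app is a placeholder):
# mpirun -n 4 ./my_app
```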

I_MPI_PIN_MODE 

Choose the pinning method.

Syntax

I_MPI_PIN_MODE=<pinmode>

Arguments

<pinmode> 

Choose the CPU pinning mode

mpd|pm 

Pin processes inside the process manager involved (Multipurpose Daemon*/MPD or Hydra*). This is the default value

lib

Pin processes inside the Intel MPI Library

Description

Set the I_MPI_PIN_MODE environment variable to choose the pinning method. This environment variable is valid only if the I_MPI_PIN environment variable is enabled.

Set the I_MPI_PIN_MODE environment variable to mpd or pm to make the mpd daemon or the Hydra process launcher pin processes through system-specific means, if they are available. The pinning is done before the MPI process launch, so it is possible to co-locate the process CPU and memory in this case. This pinning method has an advantage on a system with Non-Uniform Memory Architecture (NUMA), such as SGI* Altix*, where a processor can access its own local memory faster than non-local memory.

Set the I_MPI_PIN_MODE environment variable to lib to make the Intel® MPI Library pin the processes. This mode does not offer the capability to co-locate the CPU and memory for a process.

I_MPI_PIN_PROCESSOR_LIST 

(I_MPI_PIN_PROCS)

Define a processor subset and the mapping rules for MPI processes within this subset.

Syntax

I_MPI_PIN_PROCESSOR_LIST=<value>    

The environment variable value has the following syntax forms:

1. <proclist>

2. [<procset>][:[grain=<grain>][,shift=<shift>][,preoffset=<preoffset>][,postoffset=<postoffset>]]

3. [<procset>][:map=<map>]

Deprecated Syntax

I_MPI_PIN_PROCS=<proclist>

Note:

The postoffset keyword has an offset alias.

Note:

The second form of the pinning procedure has three steps:

  1. Cyclic shift of the source processor list on the preoffset*grain value.

  2. Round robin shift of the list derived on the first step on the shift*grain value.

  3. Cyclic shift of the list derived on the second step on the postoffset*grain value.

The resulting processor list is used for the consecutive mapping of MPI processes (i-th rank is mapped to the i-th list member).

Note:

The grain, shift, preoffset, and postoffset parameters have a unified definition style.

This environment variable is available for both Intel® and non-Intel microprocessors, but it may perform additional optimizations for Intel microprocessors than it performs for non-Intel microprocessors.

Arguments

<proclist>

A comma-separated list of logical processor numbers and/or ranges of processors. The process with the i-th rank is pinned to the i-th processor in the list. The number should not exceed the number of processors on a node.

<l>

Processor with logical number <l>.

<l>-<m>

Range of processors with logical numbers from <l> to <m>

<k>,<l>-<m>

Processors <k>, as well as <l> through <m>.

 

<procset> 

Specify a processor subset based on the topological numeration. The default value is allcores.

all

All logical processors. This subset is defined to be the number of CPUs on a node.

allcores

All cores (physical CPUs). This subset is defined to be the number of cores on a node.  This is the default value.

If Intel® Hyper-Threading Technology is disabled, allcores equals all.

allsocks

All packages/sockets. This subset is defined to be the number of sockets on a node.

 

<map> 

The mapping pattern used for process placement.

bunch

The processes are mapped to sockets by #processes/#sockets processes. The processes of each socket portion are mapped as close as possible to each other in each socket.

scatter

The processes are mapped as remotely as possible so as not to share common resources: FSB, caches, and cores.

spread

The processes are mapped consecutively with the possibility not to share common resources.

 

<grain> 

Specify the pinning granularity cell for a defined <procset>. The minimal <grain> is a single element of the <procset>. The maximal <grain> is the number of <procset> elements in a socket. The <grain> value must be a multiple of the <procset> value. Otherwise, the minimal grain is assumed. The default value is the minimal <grain>.

<shift>

Specify the granularity of the round robin scheduling shift of the cells for the <procset>. <shift> is measured in the defined <grain> units. The <shift> value must be a positive integer. Otherwise, no shift is performed. The default value is no shift.

<preoffset> 

Specify the cyclic shift of the processor subset <procset> defined before the round robin shifting on the <preoffset> value. The value is measured in the defined <grain> units. The <preoffset> value must be a non-negative integer. Otherwise, no shift is performed. The default value is no shift.

<postoffset> 

Specify the cyclic shift of the processor subset <procset> derived after round robin shifting on the <postoffset> value. The value is measured in the defined <grain> units. The <postoffset> value must be a non-negative integer. Otherwise, no shift is performed. The default value is no shift.

 

<n>

Specify an explicit value of the corresponding parameters previously mentioned. <n> is a non-negative integer.

fine

Specify the minimal value of the corresponding parameter.

core

Specify the parameter value equal to the amount of the corresponding parameter units contained in one core.

cache1

Specify the parameter value equal to the amount of the corresponding parameter units that share an L1 cache.

cache2

Specify the parameter value equal to the amount of the corresponding parameter units that share an L2 cache.

cache3

Specify the parameter value equal to the amount of the corresponding parameter units that share an L3 cache.

cache

The largest value among cache1, cache2, and cache3

socket | sock

Specify the parameter value equal to the amount of the corresponding parameter units contained in one physical package/socket.

half | mid

Specify the parameter value equal to socket/2.

third

Specify the parameter value equal to socket/3.

quarter

Specify the parameter value equal to socket/4.

octavo

Specify the parameter value equal to socket/8.

Description

Set the I_MPI_PIN_PROCESSOR_LIST environment variable to define the processor placement. To avoid conflicts with differing shell versions, the environment variable value may need to be enclosed in quotes.
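For example, quoting keeps characters such as ':', ',' and '=' out of the shell's hands (the value shown is illustrative):

```shell
# Quote the value so shell versions that treat these characters
# specially do not split or expand it.
export I_MPI_PIN_PROCESSOR_LIST="allcores:grain=2,shift=3"
echo "$I_MPI_PIN_PROCESSOR_LIST"
```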

Note:

This environment variable is valid only if I_MPI_PIN is enabled.

The I_MPI_PIN_PROCESSOR_LIST environment variable has the following different syntax variants:

  • Explicit processor list. This comma-separated list is defined in terms of logical processor numbers. The relative node rank of a process is an index to the processor list such that the i-th process is pinned to the i-th list member. This permits the definition of any process placement on the CPUs.

For example, the process mapping for I_MPI_PIN_PROCESSOR_LIST=p0,p1,p2,...,pn is as follows:

Rank on a node:  0    1    2    ...  n-1   n
Logical CPU:     p0   p1   p2   ...  pn-1  pn

  • grain/shift/offset mapping. This method provides cyclic shift of a defined grain along the processor list with steps equal to shift*grain and a single shift on offset*grain at the end. This shifting action is repeated shift times.

For example: grain = 2 logical processors, shift = 3 grains, offset = 0.

On a node with 6n logical CPUs, the three passes select the following processor grains:

Pass 1: {0, 1}, {6, 7}, ..., {6n-6, 6n-5}
Pass 2: {2, 3}, {8, 9}, ..., {6n-4, 6n-3}
Pass 3: {4, 5}, {10, 11}, ..., {6n-2, 6n-1}

The final map, ordered by MPI ranks:

Ranks          Logical CPUs
0, 1           0, 1
2, 3           6, 7
...            ...
2n-2, 2n-1     6n-6, 6n-5
2n, 2n+1       2, 3
2n+2, 2n+3     8, 9
...            ...
4n-2, 4n-1     6n-4, 6n-3
4n, 4n+1       4, 5
4n+2, 4n+3     10, 11
...            ...
6n-2, 6n-1     6n-2, 6n-1
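The three-step procedure in this example can be modeled with a short Python sketch; this is an illustration of the shifting rules described above, not Intel MPI code:

```python
def grain_shift_map(n_units, grain=1, shift=1, preoffset=0, postoffset=0):
    """Model of the grain/shift/preoffset/postoffset pinning procedure.

    Returns the processor list used for consecutive rank mapping
    (rank i is pinned to the i-th element of the result).
    """
    # Partition the source processor list into grains.
    grains = [list(range(i, i + grain)) for i in range(0, n_units, grain)]
    g = len(grains)
    shift = max(shift, 1)  # a non-positive shift means "no shift"
    # Step 1: cyclic shift by preoffset grains.
    grains = grains[preoffset % g:] + grains[:preoffset % g]
    # Step 2: round robin shift with step `shift`: pass 1 takes grains
    # 0, shift, 2*shift, ..., pass 2 takes 1, shift+1, ..., and so on.
    grains = [gr for start in range(shift) for gr in grains[start::shift]]
    # Step 3: cyclic shift by postoffset grains.
    grains = grains[postoffset % g:] + grains[:postoffset % g]
    return [unit for gr in grains for unit in gr]

# The worked example with grain=2, shift=3, offset=0 on a 12-CPU node (n=2):
print(grain_shift_map(12, grain=2, shift=3))
# -> [0, 1, 6, 7, 2, 3, 8, 9, 4, 5, 10, 11]
```

With n=2 this reproduces the final map above: ranks 0,1 go to CPUs 0,1; ranks 2,3 to CPUs 6,7; ranks 4,5 to CPUs 2,3; and so on.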

  • Predefined mapping scenario. In this case popular process pinning schemes are defined as keywords that are selectable at runtime. There are two such scenarios: bunch and scatter.

In the bunch scenario the processes are mapped proportionally to sockets, as closely to each other as possible. This makes sense for partial processor loading, that is, when the number of processes is less than the number of processors.

In the scatter scenario the processes are mapped as remotely as possible so as not to share common resources: FSB, caches, and cores.

In the example below there are two sockets, four cores per socket, one logical CPU per core, and two cores per shared cache.

In this topology, CPUs 0-3 are on the first socket and CPUs 4-7 are on the second; CPU pairs (0,1), (2,3), (4,5), and (6,7) each share a cache.

bunch scenario for 5 processes (three processes on the first socket, two on the second):

Rank         0   1   2   3   4
Logical CPU  0   1   2   4   5

scatter scenario for full loading (8 processes; consecutive ranks land on different sockets and different cache pairs):

Rank         0   1   2   3   4   5   6   7
Logical CPU  0   4   2   6   1   5   3   7
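The two scenarios can be sketched in Python for this example topology. The rules are paraphrased from the text above; in particular, the digit-reversal formulation of scatter is an illustrative interpretation, not Intel MPI code:

```python
def bunch(n_procs, n_sockets=2, cores_per_socket=4):
    """Pack processes onto sockets proportionally, filling each socket
    from its first core (earlier sockets take the remainder)."""
    base, extra = divmod(n_procs, n_sockets)
    mapping = []
    for s in range(n_sockets):
        count = base + (1 if s < extra else 0)
        mapping += [s * cores_per_socket + c for c in range(count)]
    return mapping  # mapping[rank] -> logical CPU

def scatter(n_procs, levels=(2, 2, 2)):
    """Spread consecutive ranks across sockets first, then cache pairs,
    then cores: a mixed-radix digit reversal over the topology levels
    (sockets, cache pairs per socket, cores per pair)."""
    mapping = []
    for rank in range(n_procs):
        r, digits = rank, []
        for size in levels:  # least significant digit = socket level
            digits.append(r % size)
            r //= size
        socket, pair, core = digits
        mapping.append(socket * levels[1] * levels[2] + pair * levels[2] + core)
    return mapping

print(bunch(5))    # -> [0, 1, 2, 4, 5]
print(scatter(8))  # -> [0, 4, 2, 6, 1, 5, 3, 7]
```

Both outputs match the two tables above for the 2-socket, 8-CPU example.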

Examples

To pin the processes to CPU0 and CPU3 on each node globally, use the following command:
$ mpirun -genv I_MPI_PIN_PROCESSOR_LIST 0,3 \
    -n <# of processes> <executable>

To pin the processes to different CPUs on each node individually (CPU0 and CPU3 on host1; CPU1, CPU2, and CPU3 on host2), use the following command:

$ mpirun -host host1 -env I_MPI_PIN_PROCESSOR_LIST 0,3 \

-n <# of processes> <executable> : \

-host host2 -env I_MPI_PIN_PROCESSOR_LIST 1,2,3 \

-n <# of processes> <executable>

To print extra debug information about process pinning, use the following command:
$ mpirun -genv I_MPI_DEBUG 4 -m -host host1 \
    -env I_MPI_PIN_PROCESSOR_LIST 0,3 -n <# of processes> <executable> : \
    -host host2 -env I_MPI_PIN_PROCESSOR_LIST 1,2,3 \
    -n <# of processes> <executable>

Note:

If the number of processes is greater than the number of CPUs used for pinning, the process list is wrapped around to the start of the processor list. 
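This wrap-around behavior amounts to a modulo index into the processor list, as this small Python sketch (an illustration, not library code) shows:

```python
def pin(rank, proclist):
    # With more ranks than listed CPUs, the list wraps around.
    return proclist[rank % len(proclist)]

procs = [0, 3]  # e.g. I_MPI_PIN_PROCESSOR_LIST=0,3
print([pin(r, procs) for r in range(5)])  # -> [0, 3, 0, 3, 0]
```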

I_MPI_PIN_PROCESSOR_EXCLUDE_LIST

Define a subset of logical processors.

Syntax

I_MPI_PIN_PROCESSOR_EXCLUDE_LIST=<proclist>

Arguments

<proclist>

A comma-separated list of logical processor numbers and/or ranges of processors.

<l>

Processor with logical number <l>.

<l>-<m>

Range of processors with logical numbers from <l> to <m>

<k>,<l>-<m>

Processors <k>, as well as<l> through <m>.

 

Description

Set this environment variable to define the logical processors that the Intel® MPI Library does not use for the pinning capability on the intended hosts. Logical processors are numbered as in /proc/cpuinfo.

I_MPI_PIN_CELL

Set this environment variable to define the pinning resolution granularity. I_MPI_PIN_CELL specifies the minimal processor cell allocated when an MPI process is running.

Syntax

I_MPI_PIN_CELL=<cell>

Arguments

<cell> 

Specify the resolution granularity

unit

Basic processor unit (logical CPU)

core

Physical processor core

Description

Set this environment variable to define the processor subset used when a process is running. You can choose from two scenarios:  

  • all possible CPUs in a system (unit value)  

  • all cores in a system (core value) 

The environment variable has effect on both pinning kinds:  

  • one-to-one pinning through the I_MPI_PIN_PROCESSOR_LIST environment variable  

  • one-to-many pinning through the I_MPI_PIN_DOMAIN environment variable 

The default value rules are:

  • If you use I_MPI_PIN_DOMAIN, then the cell granularity is unit.

  • If you use I_MPI_PIN_PROCESSOR_LIST, then the following rules apply:

  • When the number of processes is greater than the number of cores, the cell granularity is unit.

  • When the number of processes is equal to or less than the number of cores, the cell granularity is core.
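These default-value rules can be summarized as a small decision function (an illustrative sketch of the rules above, not library code):

```python
def default_pin_cell(uses_pin_domain, n_procs=0, n_cores=0):
    """Return the default I_MPI_PIN_CELL granularity."""
    if uses_pin_domain:
        return "unit"
    # I_MPI_PIN_PROCESSOR_LIST case: oversubscribed nodes fall back
    # to logical-CPU granularity.
    return "unit" if n_procs > n_cores else "core"

print(default_pin_cell(False, n_procs=16, n_cores=8))  # -> unit
print(default_pin_cell(False, n_procs=8, n_cores=8))   # -> core
```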

Note:

The core value is not affected by enabling or disabling Intel® Hyper-Threading Technology in a system.

I_MPI_PIN_RESPECT_CPUSET 

Respect the process affinity mask.

Syntax

I_MPI_PIN_RESPECT_CPUSET=<value>

Arguments 

<value> 

Binary indicator

enable | yes | on | 1

Respect the process affinity mask. This is the default value

disable | no | off | 0

Do not respect the process affinity mask

Description

If I_MPI_PIN_RESPECT_CPUSET=enable, the Hydra process launcher uses its process affinity mask on each intended host to determine logical processors for applying Intel MPI Library pinning capability.

If I_MPI_PIN_RESPECT_CPUSET=disable, the Hydra process launcher does not use its process affinity mask to determine logical processors for applying Intel MPI Library pinning capability.

I_MPI_PIN_RESPECT_HCA 

In the presence of an InfiniBand* architecture host channel adapter (IBA* HCA), adjust the pinning according to the location of the IBA HCA.

Syntax

I_MPI_PIN_RESPECT_HCA=<value>

Arguments 

<value> 

Binary indicator

enable | yes | on | 1

Use the location of IBA HCA (if available). This is the default value

disable | no | off | 0

Do not use the location of IBA HCA

Description

If I_MPI_PIN_RESPECT_HCA=enable, the Hydra process launcher uses the location of IBA HCA on each intended host for applying Intel MPI Library pinning capability.

If I_MPI_PIN_RESPECT_HCA=disable, the Hydra process launcher does not use the location of IBA HCA on each intended host for applying Intel MPI Library pinning capability.


http://software.intel.com/sites/products/documentation/hpc/ics/impi/41/lin/Reference_Manual/Environment_Variables_Process_Pinning.htm
