Translated directly with Google Translate; enough to give a rough idea of the horovodrun command-line parameters:
usage: horovodrun [-h] [-v] -np NP [-cb] [-p SSH_PORT] [--disable-cache]
[--start-timeout START_TIMEOUT] [--verbose]
[--config-file CONFIG_FILE]
[--fusion-threshold-mb FUSION_THRESHOLD_MB]
[--cycle-time-ms CYCLE_TIME_MS]
[--cache-capacity CACHE_CAPACITY]
[--hierarchical-allreduce | --no-hierarchical-allreduce]
[--hierarchical-allgather | --no-hierarchical-allgather]
[--autotune] [--autotune-log-file AUTOTUNE_LOG_FILE]
[--autotune-warmup-samples AUTOTUNE_WARMUP_SAMPLES]
[--autotune-steps-per-sample AUTOTUNE_STEPS_PER_SAMPLE]
[--autotune-bayes-opt-max-samples AUTOTUNE_BAYES_OPT_MAX_SAMPLES]
[--autotune-gaussian-process-noise AUTOTUNE_GAUSSIAN_PROCESS_NOISE]
[--timeline-filename TIMELINE_FILENAME]
[--timeline-mark-cycles] [--no-stall-check]
[--stall-check-warning-time-seconds STALL_CHECK_WARNING_TIME_SECONDS]
[--stall-check-shutdown-time-seconds STALL_CHECK_SHUTDOWN_TIME_SECONDS]
[--mpi-threads-disable]
[--num-nccl-streams NUM_NCCL_STREAMS]
[--mlsl-bgt-affinity MLSL_BGT_AFFINITY]
[--log-level {TRACE,DEBUG,INFO,WARNING,ERROR,FATAL}]
[--log-hide-timestamp] [-H HOSTS | -hostfile HOSTFILE]
[--gloo | --mpi]
...
Horovod Runner
positional arguments:
command Command to be executed.
optional arguments:
-h, --help show this help message and exit
# Show the version
-v, --version Shows Horovod version.
# Total number of training processes
-np NP, --num-proc NP Total number of training processes.
# Show which frameworks and libraries have been built into Horovod
-cb, --check-build Shows which frameworks and libraries have been built into Horovod.
# SSH port to use on all hosts
-p SSH_PORT, --ssh-port SSH_PORT SSH port on all the hosts.
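# If this flag is not set, horovodrun runs the initialization checks only once every 60 minutes, provided they pass. Otherwise, all checks run every time horovodrun is called.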
--disable-cache If the flag is not set, horovodrun will perform the
initialization checks only once every 60 minutes -- if
the checks successfully pass. Otherwise, all the
checks will run every time horovodrun is called.
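# horovodrun must finish all checks and start the processes before this timeout (default: 30 seconds). The HOROVOD_START_TIMEOUT environment variable can also set the initialization timeout.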
--start-timeout START_TIMEOUT
Horovodrun has to perform all the checks and start the
processes before the specified timeout. The default
value is 30 seconds. Alternatively, The environment
variable HOROVOD_START_TIMEOUT can also be used to
specify the initialization timeout.
# Print extra messages when this flag is set
--verbose If this flag is set, extra messages will printed.
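# Path to a YAML file with Horovod runtime parameter configuration. It overrides any command-line arguments given before it and is overridden by any arguments given after it.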
--config-file CONFIG_FILE
Path to YAML file containing runtime parameter
configuration for Horovod. Note that this will
override any command line arguments provided before
this argument, and will be overridden by any arguments
that come after it.
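The ordering rule for `--config-file` amounts to "last writer wins" when reading the command line left to right. The sketch below only models that rule. It is not horovodrun's actual parser, and the `resolve` function, the `cfg.yaml` name, and the use of flag names as config keys are all hypothetical simplifications.

```python
# Sketch only: models the documented ordering, not horovodrun's real parser.
def resolve(tokens, config_files):
    """Apply (flag, value) pairs in command-line order.

    tokens: list of (flag, value) pairs as they appear on the command line.
    config_files: hypothetical stand-in mapping a file name to its settings.
    """
    settings = {}
    for flag, value in tokens:
        if flag == "--config-file":
            # Values from the file replace anything set earlier on the line.
            settings.update(config_files[value])
        else:
            # A plain flag replaces whatever was set before it,
            # including values that came from the file.
            settings[flag] = value
    return settings


config_files = {"cfg.yaml": {"--cycle-time-ms": 10, "--cache-capacity": 2048}}
tokens = [
    ("--cycle-time-ms", 3),         # before the file: the file overrides it
    ("--config-file", "cfg.yaml"),
    ("--cache-capacity", 512),      # after the file: it overrides the file
]
result = resolve(tokens, config_files)
print(result)
```

Here the cycle time ends up taken from the file and the cache capacity from the flag that follows it.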
# Tunable parameters
tuneable parameter arguments:
# Fusion buffer threshold in MB: the maximum amount of tensor data that can be fused into a single batch during allreduce / allgather. Setting 0 disables tensor fusion. (default: 64)
--fusion-threshold-mb FUSION_THRESHOLD_MB
Fusion buffer threshold in MB. This is the maximum
amount of tensor data that can be fused together into
a single batch during allreduce / allgather. Setting 0
disables tensor fusion. (default: 64)
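# Cycle time in ms: the delay between tensor fusion cycles. A longer cycle time means more batching but higher latency between allreduce / allgather operations. (default: 5)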
--cycle-time-ms CYCLE_TIME_MS
Cycle time in ms. This is the delay between each
tensor fusion cycle. The larger the cycle time, the
more batching, but the greater latency between each
allreduce / allgather operations. (default: 5)
# Maximum number of tensor names to cache, which reduces the coordination needed between workers before allreduce / allgather. (default: 1024)
--cache-capacity CACHE_CAPACITY
Maximum number of tensor names that will be cached to
reduce amount of coordination required between workers
before performing allreduce / allgather. (default:
1024)
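# Perform hierarchical allreduce between workers instead of ring allreduce: a local allreduce / gather within each host, then a parallel cross allreduce between equal local ranks across workers, and finally a local gather.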
--hierarchical-allreduce
Perform hierarchical allreduce between workers instead
of ring allreduce. Hierarchical allreduce performs a
local allreduce / gather within a host, then a
parallel cross allreduce between equal local ranks
across workers, and finally a local gather.
# Explicitly disable hierarchical allreduce so that autotuning cannot change it.
--no-hierarchical-allreduce
Explicitly disable hierarchical allreduce to prevent
autotuning from adjusting it.
# Perform hierarchical allgather between workers instead of ring allgather. See hierarchical allreduce for details of the algorithm.
--hierarchical-allgather
Perform hierarchical allgather between workers instead
of ring allgather. See hierarchical allreduce for
algorithm details.
# Explicitly disable hierarchical allgather so that autotuning cannot change it.
--no-hierarchical-allgather
Explicitly disable hierarchical allgather to prevent
autotuning from adjusting it.
# Autotune parameters
autotune arguments:
# Autotune the parameter values to maximize allreduce / allgather throughput. Any parameter set explicitly is held constant during tuning.
--autotune Perform autotuning to select parameter argument values
that maximimize throughput for allreduce / allgather.
Any parameter explicitly set will be held constant
during tuning.
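# Autotune log file: comma-separated trials with each hyperparameter and its score; the last row holds the best value found.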
--autotune-log-file AUTOTUNE_LOG_FILE
Comma-separated log of trials containing each
hyperparameter and the score of the trial. The last
row will always contain the best value found.
# Number of samples discarded during autotuning warmup (default: 3)
--autotune-warmup-samples AUTOTUNE_WARMUP_SAMPLES
Number of samples to discard before beginning the
optimization process during autotuning. Performance
during the first few batches can be affected by
initialization and cache warmups. (default: 3)
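# Approximate number of steps recorded per sample; the sample score is the median over the batches in the sample (default: 10)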
--autotune-steps-per-sample AUTOTUNE_STEPS_PER_SAMPLE
Number of steps (approximate) to record before
observing a sample. The sample score is defined to be
the median score over all batches within the sample.
The more batches per sample, the less variance in
sample scores, but the longer autotuning will take.
(default: 10)
--autotune-bayes-opt-max-samples AUTOTUNE_BAYES_OPT_MAX_SAMPLES
Maximum number of samples to collect for each Bayesian
optimization process. (default: 20)
--autotune-gaussian-process-noise AUTOTUNE_GAUSSIAN_PROCESS_NOISE
Regularization value [0, 1] applied to account for
noise in samples. (default: 0.8)
# Timeline parameters
timeline arguments: # used to record timeline events
--timeline-filename TIMELINE_FILENAME
JSON file containing timeline of Horovod events used
for debugging performance. If this is provided,
timeline events will be recorded, which can have a
negative impact on training performance.
--timeline-mark-cycles
Mark cycles on the timeline. Only enabled if the
timeline filename is provided.
# Stall check parameters
stall check arguments:
# Disable the stall check
--no-stall-check Disable the stall check. The stall check will log a
warning when workers have stalled waiting for other
ranks to submit tensors.
# Seconds before the stall warning is logged to stderr (default: 60)
--stall-check-warning-time-seconds STALL_CHECK_WARNING_TIME_SECONDS
Seconds until the stall warning is logged to stderr.
(default: 60)
--stall-check-shutdown-time-seconds STALL_CHECK_SHUTDOWN_TIME_SECONDS
Seconds until Horovod is shutdown due to stall.
Shutdown will only take place if this value is greater
than the warning time. (default: 0)
# Library parameters
library arguments:
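# Disable MPI threading support; only applies in MPI mode. Multi-threaded MPI can sometimes slow down other components, but it is required if you want to run mpi4py on top of Horovod.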
--mpi-threads-disable
Disable MPI threading support. Only applies when
running in MPI mode. In some cases, multi-threaded MPI
can slow down other components, but is necessary if
you wish to run mpi4py on top of Horovod.
# Number of NCCL streams. Only applies when running with NCCL support. (default: 1)
--num-nccl-streams NUM_NCCL_STREAMS
Number of NCCL streams. Only applies when running with NCCL support. (default: 1)
# MLSL background thread affinity. Only applies when running with MLSL support. (default: 0)
--mlsl-bgt-affinity MLSL_BGT_AFFINITY
MLSL background thread affinity. Only applies when running with MLSL support. (default: 0)
# Logging parameters
logging arguments:
# Six log levels
--log-level {TRACE,DEBUG,INFO,WARNING,ERROR,FATAL}
Minimum level to log to stderr from the Horovod backend. (default: WARNING).
# Hide the timestamp in Horovod log messages
--log-hide-timestamp Hide the timestamp from Horovod log messages.
host arguments:
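# List of host names with the number of slots available for processes on each, in the form <hostname>:<slots> (e.g. host1:2,host2:4,host3:1 means 2 processes can run on host1, 4 on host2, and 1 on host3). Defaults to localhost:<np> if not specified.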
-H HOSTS, --hosts HOSTS
List of host names and the number of available slots
for running processes on each, of the form:
<hostname>:<slots> (e.g.: host1:2,host2:4,host3:1
indicating 2 processes can run on host1, 4 on host2,
and 1 on host3). If not specified, defaults to using
localhost:<np>
# Path to a host file listing host names and the number of available slots. Each line of the file must have the form: <hostname> slots=<slots>
-hostfile HOSTFILE, --hostfile HOSTFILE
Path to a host file containing the list of host names
and the number of available slots. Each line of the
file must be of the form: <hostname> slots=<slots>
# Controller parameters
controller arguments:
# Run Horovod with the Gloo controller. This is the default if Horovod was not built with MPI support.
--gloo Run Horovod using the Gloo controller. This will be
the default if Horovod was not built with MPI support.
# Run Horovod with the MPI controller. This is the default if Horovod was built with MPI support.
--mpi Run Horovod using the MPI controller. This will be the
default if Horovod was built with MPI support.
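To make the two host formats from the help text concrete, here is a small Python sketch that parses a `-H` string and a hostfile line, then assembles a `horovodrun` argument list. The helper names, the `train.py` script, and the check that `-np` does not exceed the total slots are my own assumptions for illustration; they are not part of horovodrun.

```python
import shlex


def parse_hosts(hosts):
    """Split 'host1:2,host2:4' into [('host1', 2), ('host2', 4)]."""
    pairs = []
    for entry in hosts.split(","):
        name, slots = entry.strip().rsplit(":", 1)
        pairs.append((name, int(slots)))
    return pairs


def parse_hostfile_line(line):
    """Parse one '<hostname> slots=<slots>' line of a host file."""
    name, slot_field = line.split()
    key, value = slot_field.split("=")
    if key != "slots":
        raise ValueError(f"expected 'slots=<n>', got {slot_field!r}")
    return name, int(value)


def build_command(num_proc, hosts, command):
    """Return an argv list for horovodrun.

    The slot check is a sanity check added by this sketch, not a
    statement about how horovodrun itself validates its arguments.
    """
    total = sum(slots for _, slots in parse_hosts(hosts))
    if num_proc > total:
        raise ValueError(f"-np {num_proc} exceeds {total} available slots")
    return ["horovodrun", "-np", str(num_proc), "-H", hosts] + list(command)


# 'train.py' is a hypothetical training script.
argv = build_command(4, "host1:2,host2:4,host3:1", ["python", "train.py"])
print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv)` on a machine where Horovod is installed.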
Show which frameworks and libraries have been built into Horovod:
root@1deac373611b:/DistributedTrain/horovod# horovodrun -cb
Horovod v0.18.1:
Available Frameworks:
[X] TensorFlow
[X] PyTorch
[X] MXNet
Available Controllers:
[X] MPI
[X] Gloo
Available Tensor Operations:
[X] NCCL
[ ] DDL
[ ] MLSL
[X] MPI
[X] Gloo
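If a script needs to check for a particular backend before launching a job, the `-cb` output above can be read programmatically. The following is a minimal sketch that assumes the output keeps the layout shown here ("Available ...:" headings followed by `[X]` / `[ ]` lines); the `parse_check_build` name is hypothetical, and the sample text is copied from the output above.

```python
import re

SAMPLE = """\
Horovod v0.18.1:
Available Frameworks:
[X] TensorFlow
[X] PyTorch
[X] MXNet
Available Controllers:
[X] MPI
[X] Gloo
Available Tensor Operations:
[X] NCCL
[ ] DDL
[ ] MLSL
[X] MPI
[X] Gloo
"""


def parse_check_build(text):
    """Return {section: {name: built?}} from `horovodrun -cb` style output."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = re.match(r"\[( |X)\]\s+(.+)", line)
        if match and current is not None:
            # '[X] name' means built in, '[ ] name' means not built in.
            sections[current][match.group(2)] = match.group(1) == "X"
        elif line.startswith("Available ") and line.endswith(":"):
            # Start a new section, e.g. 'Frameworks' or 'Tensor Operations'.
            current = line[len("Available "):-1]
            sections[current] = {}
    return sections


build = parse_check_build(SAMPLE)
print(build["Tensor Operations"])
```

In practice the text would come from something like `subprocess.run(["horovodrun", "-cb"], capture_output=True, text=True).stdout` rather than a hard-coded sample.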