TensorLayer Official Documentation 1.7.4: CLI - Command Line Interface


Category: TensorLayer

CLI - Command Line Interface

The tensorlayer.cli module provides a command-line tool for some common tasks.

tl train

(Alpha release - usage might change later)

The tensorlayer.cli.train module provides the tl train subcommand.
It helps the user bootstrap a TensorFlow/TensorLayer program for distributed training
using multiple GPU cards or CPUs on a computer.

You first need to set CUDA_VISIBLE_DEVICES
to tell tl train which GPUs are available. If CUDA_VISIBLE_DEVICES is not given,
tl train will try its best to discover all available GPUs.

In distributed training, each TensorFlow program needs a TF_CONFIG environment variable that describes
the cluster. It also needs a master daemon to
monitor all trainers. tl train is responsible
for automatically managing these two tasks.
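For reference, TF_CONFIG is a JSON string following the TensorFlow cluster convention: it names the cluster members and tells the current process which role it plays. tl train generates and sets it for every process it launches, so you normally never write it by hand; the snippet below is only a sketch of its shape, with made-up local addresses and ports.

# illustrative only: tl train sets TF_CONFIG automatically for each process it starts
export TF_CONFIG='{
  "cluster": {
    "ps":     ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"]
  },
  "task": {"type": "worker", "index": 0}
}'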

Usage

tl train [-h] [-p NUM_PSS] [-c CPU_TRAINERS] <file> [args [args ...]]

# example of using GPU 0 and 1 for training mnist
CUDA_VISIBLE_DEVICES="0,1" tl train example/tutorial_mnist_distributed.py

# example of using CPU trainers for inception v3
tl train -c 16 example/tutorial_imagenet_inceptionV3_distributed.py

# example of using GPU trainers for inception v3 with customized arguments
# as CUDA_VISIBLE_DEVICES is not given, tl would try to discover all available GPUs
tl train example/tutorial_imagenet_inceptionV3_distributed.py -- --batch_size 16

Parameters

  • file: the Python file to run.

  • NUM_PSS: the number of parameter servers.

  • CPU_TRAINERS: the number of CPU trainers.

    It is recommended that NUM_PSS + CPU_TRAINERS <= the number of CPU cores (see the combined example after this list).

  • args: any parameters after -- are passed on to the Python program.
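As a worked illustration of the recommendation above: on a machine with 8 CPU cores you might run 2 parameter servers and 6 CPU trainers, since 2 + 6 <= 8, and still pass extra arguments to the program after --. The command below is a sketch assuming that hypothetical machine, reusing the example script and --batch_size flag shown earlier.

# hypothetical 8-core machine: 2 parameter servers + 6 CPU trainers (2 + 6 <= 8)
tl train -p 2 -c 6 example/tutorial_imagenet_inceptionV3_distributed.py -- --batch_size 16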

Notes

A parallel training program requires multiple parameter servers
to help the parallel trainers exchange intermediate gradients.
The best number of parameter servers is often proportional to the
size of your model as well as the number of CPUs available.
You can control the number of parameter servers with the -p parameter.

If you have a single computer with many CPU cores, you can use the -c parameter
to enable CPU-only parallel training.
We do not support GPU-CPU co-training because GPUs and
CPUs run at different speeds; mixing them in training would
create stragglers.
