Installing and Using Horovod on Ubuntu 18.04

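Contents

0 Installing the g++ version Horovod requires

1). Edit the package sources

2). Append the following two lines to the end of the opened file

3). Update the package index

4). Install the compilers and register the alternatives

5). Switch the g++ version

6). Verify the version

Notes:

1. Installing NCCL

Method 1:

Method 2: Download nccl_2.4.8-1+cuda10.0_x86_64.txz (link below; requires an NVIDIA login), extract it, and move it to /usr/local/:

Add the environment variable to /etc/profile:

2. Installing Open MPI

Download the source archive openmpi-4.0.2.tar.gz

Build Open MPI

Check the version:

3. Installing Horovod

Installation

Testing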


 

Reference:

https://github.com/horovod/horovod/blob/master/docs/install.rst 

 

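0 Installing the g++ version Horovod requires

Building Horovod can fail when the system g++ is too new, so you may need to switch compiler versions. On Ubuntu 18.04, gcc/g++ 5 was verified to work (inside Docker, 4.8 worked).

A direct install may report an error, so follow the steps below:

1). Edit the package sources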

sudo gedit /etc/apt/sources.list

2). Append the following two lines to the end of the opened file

deb http://dk.archive.ubuntu.com/ubuntu/ xenial main

deb http://dk.archive.ubuntu.com/ubuntu/ xenial universe

3). Update the package index

sudo apt update

4). Install the compilers and register the alternatives

sudo apt-get install gcc-4.9
sudo apt-get install g++-4.9

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 20

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 20

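At this point, running gcc --version in a terminal still reports the unchanged default version; it has to be switched to 4.9.

5). Switch the g++ version

Use the commands below to choose between the installed gcc and g++ versions: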

> sudo update-alternatives --config gcc
There are 2 choices for the alternative gcc (providing /usr/bin/gcc).

  Selection    Path              Priority   Status
------------------------------------------------------------
  0            /usr/bin/gcc-5     50        auto mode
* 1            /usr/bin/gcc-4.9   20        manual mode
  2            /usr/bin/gcc-5     50        manual mode

Press <enter> to keep the current choice[*], or type selection number: 1




> sudo update-alternatives --config g++
There are 2 choices for the alternative g++ (providing /usr/bin/g++).

  Selection    Path              Priority   Status
------------------------------------------------------------
* 0            /usr/bin/g++-5     50        auto mode
  1            /usr/bin/g++-4.9   20        manual mode
  2            /usr/bin/g++-5     50        manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
update-alternatives: using /usr/bin/g++-4.9 to provide /usr/bin/g++ (g++) in manual mode



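6). Verify the version

# Check the currently active compiler versions
gcc -v
g++ -v

Notes:

# When you switch gcc to another version, make sure g++ is switched to the same version. Otherwise a project configured with cmake will still compile its C++ code with the previous version.

# Removing an alternative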
    

sudo update-alternatives --remove gcc /usr/bin/gcc-4.9
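
The gcc/g++ consistency mentioned in the note above can also be checked from a script. The following is a minimal sketch, not part of the original tutorial: the helper names major_minor, compilers_consistent and installed_version are hypothetical, and it assumes both compilers accept the -dumpversion flag.

```python
import re
import shutil
import subprocess


def major_minor(version_text):
    """Return the leading (major, minor) numbers of a version string such as '4.9.4'.

    If only a major number is present (for example '7'), a 1-tuple is returned.
    """
    match = re.search(r"(\d+)(?:\.(\d+))?", version_text)
    if match is None:
        raise ValueError("no version number in %r" % version_text)
    return tuple(int(part) for part in match.groups() if part is not None)


def compilers_consistent(gcc_version, gxx_version):
    """True when gcc and g++ report the same major.minor version."""
    return major_minor(gcc_version) == major_minor(gxx_version)


def installed_version(compiler):
    """Return the output of `<compiler> -dumpversion`, or None if it is not on PATH."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run([compiler, "-dumpversion"],
                            stdout=subprocess.PIPE,
                            universal_newlines=True,
                            check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    gcc, gxx = installed_version("gcc"), installed_version("g++")
    if gcc is None or gxx is None:
        print("gcc or g++ not found on PATH")
    elif compilers_consistent(gcc, gxx):
        print("gcc and g++ match:", gcc)
    else:
        print("mismatch: gcc %s vs g++ %s" % (gcc, gxx))
```

If the script reports a mismatch, rerun sudo update-alternatives --config for the compiler that still points at the old version.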

1. Installing NCCL

Method 1:

First download the package from https://developer.nvidia.com/nccl/nccl-legacy-downloads

2.4.8-1+cuda10.0 is the matching NCCL version (built for CUDA 10.0).

# option 1: local
sudo dpkg -i nccl-repo-ubuntu1804-2.4.8-ga-cuda10.0_1-1_amd64.deb

# option 2: network
sudo dpkg -i nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt-get update
sudo apt install libnccl2=2.4.8-1+cuda10.0 libnccl-dev=2.4.8-1+cuda10.0

Method 2: Download nccl_2.4.8-1+cuda10.0_x86_64.txz (link below; requires an NVIDIA login), extract it, and move it to /usr/local/:

https://developer.nvidia.com/nccl/nccl-legacy-downloads

tar xvf nccl_2.4.8-1+cuda10.0_x86_64.txz
sudo mv nccl_2.4.8-1+cuda10.0_x86_64 /usr/local/nccl_2.4.8

Add the environment variable to /etc/profile:

export LD_LIBRARY_PATH=/usr/local/nccl_2.4.8/lib:$LD_LIBRARY_PATH
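
To confirm that the exported path actually exposes the NCCL library, a short script can scan the directories listed in LD_LIBRARY_PATH. This is a minimal sketch under stated assumptions: find_nccl_libraries is a hypothetical helper, and it only looks at LD_LIBRARY_PATH, so a library installed through Method 1 into the system library directories will not show up here.

```python
import glob
import os


def find_nccl_libraries(ld_library_path):
    """Return libnccl shared objects found in an LD_LIBRARY_PATH-style string.

    Empty entries and directories that do not exist are skipped silently.
    """
    found = []
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue
        pattern = os.path.join(directory, "libnccl.so*")
        found.extend(sorted(glob.glob(pattern)))
    return found


if __name__ == "__main__":
    libraries = find_nccl_libraries(os.environ.get("LD_LIBRARY_PATH", ""))
    if libraries:
        for path in libraries:
            print("found", path)
    else:
        print("no libnccl.so* found via LD_LIBRARY_PATH")
```

Run it in a new shell (or after source /etc/profile) so that the updated variable is visible to the Python process.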

2. Installing Open MPI

Reference: https://www.open-mpi.org/faq/?category=building#easy-build

Download the source archive openmpi-4.0.2.tar.gz

https://www.open-mpi.org/software/ompi/v4.0/ 

Build Open MPI


$ gunzip -c openmpi-4.0.2.tar.gz | tar xf -
$ cd openmpi-4.0.2
$ ./configure --prefix=/usr/local


<...lots of output...>



$ make all install

If extraction and compilation finish without errors, the build succeeded.

Check the version:

mpiexec --version
mpirun --version
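
If another MPI implementation is already on the PATH, the version commands above may not report the freshly built Open MPI. The sketch below parses the command output to make that check explicit. It assumes the first line has the form "mpirun (Open MPI) 4.0.2"; parse_open_mpi_version is a hypothetical helper, and the sample strings are illustrative rather than captured from the author's machine.

```python
import re


def parse_open_mpi_version(output):
    """Return (major, minor, patch) from `mpirun --version` output.

    Returns None when the text does not look like Open MPI output,
    for example when a different MPI implementation is first on PATH.
    """
    match = re.search(r"\(Open MPI\)\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(number) for number in match.groups())


# Illustrative input; on a real system pass the captured command output instead.
sample = "mpirun (Open MPI) 4.0.2\n\nReport bugs to http://www.open-mpi.org/community/help/\n"
version = parse_open_mpi_version(sample)
print("Open MPI version:", version)
```

A None result suggests that mpirun resolves to something other than the build installed under /usr/local.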

 

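3. Installing Horovod

Reference: https://github.com/horovod/horovod/blob/master/docs/install.rst

https://github.com/horovod/horovod/blob/master/docs/gpus.rst

Mind the required g++ version (see section 0).

Installation

Install with pip. The environment of the target framework (for example, TensorFlow) must be activated first.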

> ldconfig /usr/local/cuda/targets/x86_64-linux/lib/stubs

> HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod -i https://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com

> ldconfig

Testing

import horovod.torch as hvd

import horovod.tensorflow as hvd
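
The two imports above can be wrapped in a small smoke test that reports which Horovod front ends load. This is a minimal sketch: check_frontends is a hypothetical helper, and it only verifies that the modules import, not that NCCL or MPI communication works.

```python
import importlib


def check_frontends(module_names=("horovod.tensorflow", "horovod.torch")):
    """Try to import each module and return {module_name: imported_ok}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:
            # A missing package or an extension that was not built ends up here.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_frontends().items():
        print(name, "OK" if ok else "FAILED")
```

A front end that fails here usually means the corresponding framework was not active in the environment when pip built Horovod.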

(ref: https://zhuanlan.zhihu.com/p/78303865 )

https://github.com/horovod/horovod/tree/master/examples --- tensorflow_mnist.py

import os
import errno
import tensorflow as tf
import horovod.tensorflow as hvd
import numpy as np
import argparse

from tensorflow import keras

layers = tf.layers

tf.logging.set_verbosity(tf.logging.INFO)

# Training settings
parser = argparse.ArgumentParser(description='Tensorflow MNIST Example')
parser.add_argument('--use-adasum', action='store_true', default=False,
                    help='use adasum algorithm to do reduction')
args = parser.parse_args()

def conv_model(feature, target, mode):
    """2-layer convolution model."""
    # Convert the target to a one-hot tensor of shape (batch_size, 10) and
    # with a on-value of 1 for each one-hot vector of length 10.
    target = tf.one_hot(tf.cast(target, tf.int32), 10, 1, 0)

    # Reshape feature to 4d tensor with 2nd and 3rd dimensions being
    # image width and height final dimension being the number of color channels.
    feature = tf.reshape(feature, [-1, 28, 28, 1])

    # First conv layer will compute 32 features for each 5x5 patch
    with tf.variable_scope('conv_layer1'):
        h_conv1 = layers.conv2d(feature, 32, kernel_size=[5, 5],
                                activation=tf.nn.relu, padding="SAME")
        h_pool1 = tf.nn.max_pool(
            h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # Second conv layer will compute 64 features for each 5x5 patch.
    with tf.variable_scope('conv_layer2'):
        h_conv2 = layers.conv2d(h_pool1, 64, kernel_size=[5, 5],
                                activation=tf.nn.relu, padding="SAME")
        h_pool2 = tf.nn.max_pool(
            h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
        # reshape tensor into a batch of vectors
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])

    # Densely connected layer with 1024 neurons.
    h_fc1 = layers.dropout(
        layers.dense(h_pool2_flat, 1024, activation=tf.nn.relu),
        rate=0.5, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Compute logits (1 per class) and compute loss.
    logits = layers.dense(h_fc1, 10, activation=None)
    loss = tf.losses.softmax_cross_entropy(target, logits)

    return tf.argmax(logits, 1), loss


def train_input_generator(x_train, y_train, batch_size=64):
    assert len(x_train) == len(y_train)
    while True:
        p = np.random.permutation(len(x_train))
        x_train, y_train = x_train[p], y_train[p]
        index = 0
        while index <= len(x_train) - batch_size:
            yield x_train[index:index + batch_size], \
                  y_train[index:index + batch_size],
            index += batch_size


def main(_):
    # Horovod: initialize Horovod.
    hvd.init()

    # Keras automatically creates a cache directory in ~/.keras/datasets for
    # storing the downloaded MNIST data. This creates a race
    # condition among the workers that share the same filesystem. If the
    # directory already exists by the time this worker gets around to creating
    # it, ignore the resulting exception and continue.
    cache_dir = os.path.join(os.path.expanduser('~'), '.keras', 'datasets')
    if not os.path.exists(cache_dir):
        try:
            os.mkdir(cache_dir)
        except OSError as e:
            if e.errno == errno.EEXIST and os.path.isdir(cache_dir):
                pass
            else:
                raise

    # Download and load MNIST dataset.
    (x_train, y_train), (x_test, y_test) = \
        keras.datasets.mnist.load_data('MNIST-data-%d' % hvd.rank())

    # The shape of downloaded data is (-1, 28, 28), hence we need to reshape it
    # into (-1, 784) to feed into our network. Also, need to normalize the
    # features between 0 and 1.
    x_train = np.reshape(x_train, (-1, 784)) / 255.0
    x_test = np.reshape(x_test, (-1, 784)) / 255.0

    # Build model...
    with tf.name_scope('input'):
        image = tf.placeholder(tf.float32, [None, 784], name='image')
        label = tf.placeholder(tf.float32, [None], name='label')
    predict, loss = conv_model(image, label, tf.estimator.ModeKeys.TRAIN)

    lr_scaler = hvd.size()
    # By default, Adasum doesn't need scaling when increasing batch size. If used with NCCL,
    # scale lr by local_size
    if args.use_adasum:
        lr_scaler = hvd.local_size() if hvd.nccl_built() else 1

    # Horovod: adjust learning rate based on lr_scaler.
    opt = tf.train.AdamOptimizer(0.001 * lr_scaler)

    # Horovod: add Horovod Distributed Optimizer.
    opt = hvd.DistributedOptimizer(opt, op=hvd.Adasum if args.use_adasum else hvd.Average)

    global_step = tf.train.get_or_create_global_step()
    train_op = opt.minimize(loss, global_step=global_step)

    hooks = [
        # Horovod: BroadcastGlobalVariablesHook broadcasts initial variable states
        # from rank 0 to all other processes. This is necessary to ensure consistent
        # initialization of all workers when training is started with random weights
        # or restored from a checkpoint.
        hvd.BroadcastGlobalVariablesHook(0),

        # Horovod: adjust number of steps based on number of GPUs.
        tf.train.StopAtStepHook(last_step=20000 // hvd.size()),

        tf.train.LoggingTensorHook(tensors={'step': global_step, 'loss': loss},
                                   every_n_iter=10),
    ]

    # Horovod: pin GPU to be used to process local rank (one GPU per process)
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.gpu_options.visible_device_list = str(hvd.local_rank())

    # Horovod: save checkpoints only on worker 0 to prevent other workers from
    # corrupting them.
    checkpoint_dir = './checkpoints' if hvd.rank() == 0 else None
    training_batch_generator = train_input_generator(x_train,
                                                     y_train, batch_size=100)
    # The MonitoredTrainingSession takes care of session initialization,
    # restoring from a checkpoint, saving to a checkpoint, and closing when done
    # or an error occurs.
    with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                           hooks=hooks,
                                           config=config) as mon_sess:
        while not mon_sess.should_stop():
            # Run a training step synchronously.
            image_, label_ = next(training_batch_generator)
            mon_sess.run(train_op, feed_dict={image: image_, label: label_})


if __name__ == "__main__":
    tf.app.run()
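  1. hvd.init() initializes Horovod and starts its background threads and MPI communication.
  2. config.gpu_options.visible_device_list = str(hvd.local_rank()) assigns a different GPU to each process.
  3. opt = tf.train.AdamOptimizer(0.001 * lr_scaler) scales the learning rate with the number of workers (lr_scaler is hvd.size() by default).
  4. opt = hvd.DistributedOptimizer(opt) wraps the regular TensorFlow optimizer with Horovod, which then uses ring-allreduce to average the gradients.
  5. hvd.BroadcastGlobalVariablesHook(0) broadcasts the model parameters from rank 0 to all other processes, so that every worker starts from the same initial parameters.
  6. checkpoint_dir = './checkpoints' if hvd.rank() == 0 else None, passed to tf.train.MonitoredTrainingSession, makes only rank 0 save checkpoints.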
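
The learning-rate and step-count adjustments in the script can be reproduced without TensorFlow or Horovod. The sketch below mirrors the arithmetic of the example above; compute_lr_scaler and steps_per_worker are hypothetical helper names, and the worker counts are plain integers standing in for hvd.size() and hvd.local_size().

```python
def compute_lr_scaler(size, local_size, use_adasum=False, nccl_built=False):
    """Mirror the lr_scaler logic of the MNIST example.

    Without Adasum the learning rate is multiplied by the total number of
    workers. With Adasum it is multiplied by the number of local workers
    when NCCL is available, and left unscaled otherwise.
    """
    if use_adasum:
        return local_size if nccl_built else 1
    return size


def steps_per_worker(total_steps, size):
    """Split a fixed step budget across workers, as StopAtStepHook does above."""
    return total_steps // size


# One machine with 4 GPUs: learning rate 0.001 * 4, and 20000 // 4 steps per worker.
base_lr = 0.001
print(base_lr * compute_lr_scaler(size=4, local_size=4))
print(steps_per_worker(20000, 4))
```

In the real script the same values would come from compute_lr_scaler(hvd.size(), hvd.local_size(), args.use_adasum, hvd.nccl_built()).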

How to run: https://github.com/horovod/horovod#usage

Run on one machine with 4 GPUs (replace ip-1 with the address of your own machine):

mpirun -np 4 -H ip-1:4 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml ob1 -mca btl ^openib python tensorflow_mnist.py

Run on four machines with 4 GPUs each, 16 processes in total (replace the host names with the addresses of your own machines):

mpirun -np 16 -H ip-1:4,ip-2:4,ip-3:4,ip-4:4 -bind-to none -map-by slot -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml ob1 -mca btl ^openib python tensorflow_mnist.py
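
In both commands the value of -np equals the sum of the slot counts given to -H. The following sketch builds the command line from a host list to make that relationship explicit. build_mpirun_command is a hypothetical helper; the flags are copied from the commands above, and the host names are the same placeholders.

```python
def build_mpirun_command(hosts, script="tensorflow_mnist.py"):
    """Build the mpirun argument list for a list of (host, gpu_count) pairs.

    The total process count (-np) is the sum of the per-host GPU counts.
    """
    total_processes = sum(count for _, count in hosts)
    host_spec = ",".join("%s:%d" % (host, count) for host, count in hosts)
    return [
        "mpirun", "-np", str(total_processes), "-H", host_spec,
        "-bind-to", "none", "-map-by", "slot",
        "-x", "NCCL_DEBUG=INFO", "-x", "LD_LIBRARY_PATH", "-x", "PATH",
        "-mca", "pml", "ob1", "-mca", "btl", "^openib",
        "python", script,
    ]


# Four placeholder hosts with 4 GPUs each give 16 processes.
command = build_mpirun_command([("ip-1", 4), ("ip-2", 4), ("ip-3", 4), ("ip-4", 4)])
print(" ".join(command))
```

The resulting list can be passed to subprocess.run once the placeholder host names are replaced with reachable machines.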

 
