linux nvidia 361.run,Nvidia-Docker

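Overview

Thanks to its advantages, container technology is being adopted ever more widely, and traditional virtualization techniques are gradually being adapted to containers. One example is applying SR-IOV (Single-Root Input/Output Virtualization) to containers: Intel's experiments [1] show that network and storage performance can come close to that of the physical device. Meanwhile, in recent years the GPU (Graphics Processing Unit) has kept advancing in areas such as high-performance computing and cloud desktops. Developing, debugging, and using GPU-intensive applications involves varied environments with heavy version dependencies. Building on the strengths of containers in CI/CD, containerizing GPU applications brings the following benefits, and NVIDIA Docker simplifies this tedious work. This article takes a first look at nvidia-docker [2] and walks through a simple hands-on setup.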

Benefits of GPU containerization:

Reproducible builds
Ease of deployment
Isolation of individual devices
Run across heterogeneous driver/toolkit environments
Requires only the NVIDIA driver to be installed
Enables "fire and forget" GPU applications
Facilitate collaboration

[Figure: Example of how CUDA integrates with Docker.]

Test environment

System configuration

The operating system is CentOS.

[root@localhost ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-327.13.1.el7.x86_64 #1 SMP Thu Mar 31 16:04:38 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# yum update -y
[root@localhost ~]# yum install -y wget tmux vim git pciutils kernel-devel kernel-headers gcc make epel-release

GPU details

[root@localhost ~]# lspci | grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [Quadro K4200] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation GF110GL [Tesla C2050 / C2075] (rev a1)
[root@localhost ~]# lspci -v -s 03:00.0
03:00.0 VGA compatible controller: NVIDIA Corporation GK104GL [Quadro K4200] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 1096
    Physical Slot: 2
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
    Memory at d0000000 (64-bit, prefetchable) [size=256M]
    Memory at e0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at d000 [size=128]
    Expansion ROM at fb000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [b4] Vendor Specific Information: Len=14
    Capabilities: [100] Virtual Channel
    Capabilities: [128] Power Budgeting
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024


[root@localhost ~]# lspci -v -s 04:00.0
04:00.0 VGA compatible controller: NVIDIA Corporation GF110GL [Tesla C2050 / C2075] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Tesla C2075
    Physical Slot: 4
    Flags: fast devsel, IRQ 11
    Memory at f8000000 (32-bit, non-prefetchable) [disabled] [size=16M]
    Memory at e8000000 (64-bit, prefetchable) [disabled] [size=128M]
    Memory at f0000000 (64-bit, prefetchable) [disabled] [size=32M]
    I/O ports at c000 [disabled] [size=128]
    Expansion ROM at f9000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [b4] Vendor Specific Information: Len=14
    Capabilities: [100] Virtual Channel
    Capabilities: [128] Power Budgeting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
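The per-device queries above were typed by hand for each bus address. As a rough sketch, the two steps can be chained so that every NVIDIA card is queried in turn. The function name nvidia_vga_slots is hypothetical, and the pattern assumes the cards are reported as "VGA compatible controller", as they are on this machine.

```shell
#!/bin/sh
# nvidia_vga_slots (hypothetical helper): read `lspci` output on stdin and
# print the bus address (first field) of every NVIDIA VGA controller.
nvidia_vga_slots() {
    awk '/VGA compatible controller: NVIDIA/ { print $1 }'
}

# Show verbose details for each NVIDIA GPU found; skipped if lspci is absent.
if command -v lspci >/dev/null 2>&1; then
    for slot in $(lspci | nvidia_vga_slots); do
        lspci -v -s "$slot"
    done
fi
```

On a machine whose cards are listed under a different class string, the awk pattern would need to be adjusted accordingly.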


Installing Docker

[root@localhost ~]# sudo tee /etc/yum.repos.d/docker.repo <<-'EOF'
> [dockerrepo]
> name=Docker Repository
> baseurl=https://yum.dockerproject.org/repo/main/centos/$releasever/
> enabled=1
> gpgcheck=1
> gpgkey=https://yum.dockerproject.org/gpg
> EOF
[root@localhost ~]# yum install docker-engine
[root@localhost ~]# systemctl restart docker
[root@localhost ~]# systemctl enable docker

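Installing the NVIDIA driver

[root@localhost ~]# uname -r
3.10.0-327.13.1.el7.x86_64
[root@localhost ~]# ll /usr/src/kernels/3.10.0-327.13.1.el7.x86_64/

The kernel source version must match the running kernel. If it does not, check and fix the GRUB2 configuration and reboot.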
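The comparison above can also be scripted. The following is a minimal sketch, assuming the kernel-devel sources live under /usr/src/kernels as on this CentOS system; the function name kernel_sources_present and its optional second argument are illustrative only.

```shell
#!/bin/sh
# kernel_sources_present (hypothetical helper): succeed only if a source
# directory exists for the given kernel release.
#   $1 = kernel release (defaults to the running kernel, `uname -r`)
#   $2 = sources root   (assumed default: /usr/src/kernels)
kernel_sources_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/usr/src/kernels}"
    [ -d "${root}/${release}" ]
}

if kernel_sources_present; then
    echo "kernel sources match the running kernel $(uname -r)"
else
    echo "no sources for $(uname -r) under /usr/src/kernels"
fi
```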

For compatibility reasons, an older driver version is installed.

[root@localhost ~]# sh ./NVIDIA-Linux-x86_64-352.79_Tesla_C2050.run
[root@localhost ~]# ll /dev/nvidia*
crw-rw-rw-. 1 root root 195,   0 Apr  6 18:19 /dev/nvidia0
crw-rw-rw-. 1 root root 195,   1 Apr  6 18:19 /dev/nvidia1
crw-rw-rw-. 1 root root 195, 255 Apr  6 18:19 /dev/nvidiactl
crw-rw-rw-. 1 root root 246,   0 Apr  6 18:19 /dev/nvidia-uvm

If nvidia-uvm is missing, load the module manually with modprobe:

[root@localhost ~]# sudo modprobe nvidia_uvm
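The device-node check can be wrapped in a small script so that a missing node is reported by name. This is only a sketch: the function name missing_nvidia_nodes is hypothetical, and the node list (nvidiactl, nvidia-uvm, nvidia0) is taken from the listing above rather than from any driver documentation.

```shell
#!/bin/sh
# missing_nvidia_nodes (hypothetical helper): print every expected device
# node that does not exist.
#   $1 = device directory (defaults to /dev)
missing_nvidia_nodes() {
    dev_dir="${1:-/dev}"
    for node in nvidiactl nvidia-uvm nvidia0; do
        [ -e "${dev_dir}/${node}" ] || echo "${dev_dir}/${node}"
    done
}

missing=$(missing_nvidia_nodes)
if [ -z "$missing" ]; then
    echo "all expected NVIDIA device nodes are present"
else
    echo "missing: $missing"
    echo "if nvidia-uvm is listed, try: sudo modprobe nvidia_uvm"
fi
```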

Installing and configuring the CUDA environment

Installation

[root@localhost ~]# rpm -ivh cuda-repo-rhel7-7.5-18.x86_64.rpm
[root@localhost ~]# yum clean expire-cache
[root@localhost ~]# yum install cuda -y

If a DKMS dependency error appears, check that yum install -y epel-release has been run.

Configuring environment variables

[root@localhost ~]# find / -name nvcc
/usr/local/cuda-7.5/bin/nvcc

This shows that the CUDA version is 7.5 and that it is installed under /usr/local/cuda-7.5/.

[root@localhost ~]# vim /etc/profile
…….
……..
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
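One caveat with the plain export lines: if LD_LIBRARY_PATH starts out empty, the result ends in a trailing colon, and the dynamic linker treats an empty entry as the current directory. Each nested login shell also adds the same entries again. A possible variant that avoids both is sketched below; the helper name prepend_once is hypothetical, and placing it in a file such as /etc/profile.d/cuda.sh is an assumption, not something the original setup does.

```shell
#!/bin/sh
# prepend_once (hypothetical helper): print the colon-separated list $2 with
# directory $1 added at the front, unless $1 is already an entry.
prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;          # already present: unchanged
        *)
            if [ -n "$2" ]; then
                printf '%s:%s\n' "$1" "$2"
            else
                printf '%s\n' "$1"               # empty list: no trailing colon
            fi
            ;;
    esac
}

CUDA_HOME=/usr/local/cuda-7.5                    # path found by `find / -name nvcc`
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```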

Installing nvidia-docker

# Install nvidia-docker and nvidia-docker-plugin
[root@localhost ~]# sudo tar --strip-components=1 -C /usr/bin -xvf /tmp/nvidia-docker_1.0.0.beta.3_amd64.tar.xz && rm /tmp/nvidia-docker*.tar.xz
# Run nvidia-docker-plugin
[root@localhost ~]# sudo -b nohup nvidia-docker-plugin > /tmp/nvidia-docker.log

docker images
REPOSITORY    TAG      IMAGE ID       CREATED       SIZE
nvidia/cuda   latest   22bde803e760   2 weeks ago   1.226 GB

Error:

[root@localhost ~]# nvidia-docker run --rm nvidia/cuda nvidia-smi
docker: Error response from daemon: create nvidia_driver_352.79: create nvidia_driver_352.79: Error looking up volume plugin nvidia-docker: plugin not found.
See 'docker run --help'.

Solution:

nvidia-docker volume setup
docker volume ls
DRIVER   VOLUME NAME
local    nvidia_driver_352.79
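Before running a GPU container, it may help to confirm that the driver volume is really there. The sketch below assumes the volume name from the error message (nvidia_driver_352.79) and uses docker volume ls -q to list names only; the helper has_volume is hypothetical.

```shell
#!/bin/sh
# has_volume (hypothetical helper): succeed if the exact volume name $1
# appears on stdin, one name per line.
has_volume() {
    grep -qxF -- "$1"
}

driver_volume="nvidia_driver_352.79"   # name taken from the error message

# Only query Docker when the CLI is installed.
if command -v docker >/dev/null 2>&1; then
    if docker volume ls -q | has_volume "$driver_volume"; then
        echo "volume $driver_volume exists"
    else
        echo "volume $driver_volume is missing; run: nvidia-docker volume setup"
    fi
fi
```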


Testing

Start several containers and confirm that each container has the NVIDIA devices.

mkdir -p ~/docker/digits
nvidia-docker run -it -p 8080:8080 -v ~/docker/digits:/digits nvidia/cuda
nvidia-docker run -it -p 8081:8080 -v ~/docker/digits:/digits nvidia/cuda
nvidia-docker run -it -p 8082:8080 -v ~/docker/digits:/digits nvidia/cuda
nvidia-docker run -it -p 8083:8080 -v ~/docker/digits:/digits nvidia/cuda


docker ps -a
CONTAINER ID   IMAGE         COMMAND       CREATED              STATUS              PORTS                    NAMES
5d86dbc4047b   nvidia/cuda   "/bin/bash"   About a minute ago   Up About a minute   0.0.0.0:8083->8080/tcp   romantic_williams
0c8d3300140b   nvidia/cuda   "/bin/bash"   About a minute ago   Up About a minute   0.0.0.0:8082->8080/tcp   tiny_shaw
5c927720fa16   nvidia/cuda   "/bin/bash"   2 minutes ago        Up About a minute   0.0.0.0:8081->8080/tcp   drunk_brattain
5ab94e3a21d2   nvidia/cuda   "/bin/bash"   2 minutes ago        Up 2 minutes        0.0.0.0:8080->8080/tcp   evil_pare
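The four containers differ only in the host port they publish. As an illustration, the command lines can be generated rather than typed; the sketch below only prints them (a dry run), because each run -it command takes over the terminal it is started in. The function name print_run_commands and its two parameters are hypothetical.

```shell
#!/bin/sh
# print_run_commands (hypothetical helper): print one `nvidia-docker run`
# command line per host port, without executing anything.
#   $1 = first host port, $2 = number of containers
print_run_commands() {
    first_port="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "nvidia-docker run -it -p $((first_port + i)):8080 -v ~/docker/digits:/digits nvidia/cuda"
        i=$((i + 1))
    done
}

# Host ports 8080 through 8083, all mapped to container port 8080.
print_run_commands 8080 4
```

Each printed line can then be pasted into its own terminal or tmux pane.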

Compiling and running a CUDA program in a container

CUDA source file

root@5ab94e3a21d2:~# cd /digits/
root@5ab94e3a21d2:/digits# ls
hellocuda.cu

Compile with nvcc inside the container

root@5ab94e3a21d2:/digits# nvcc hellocuda.cu -o hellocuda

Run the program inside the container

root@5ab94e3a21d2:/digits# ./hellocuda
16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46

Other tests, such as NVIDIA/DIGITS [4][5], can also be run in the container.

[1] Single-Root Input/Output Virtualization (SR-IOV) with Linux* Containers
