Ubuntu 18.04: detailed environment setup for CUDA, nvidia-driver, and nvidia-docker (phy12321)

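Ubuntu partition layout (2 TB disk):

efi 500M (created automatically)
swap	swap partition, same size as RAM
/	logical partition (ext4, comparable to the C: drive on Windows)	200G
/boot	primary partition (ext4)		200G
/home	logical partition (note: xfs)		1T
/var	primary partition, xfs		600G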

After booting into the system:

Log in to Firefox.

Download and configure TeamViewer.

Switch the apt mirror:
sudo gedit /etc/apt/sources.list

deb https://mirrors.ustc.edu.cn/ubuntu/ bionic main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ bionic main restricted universe multiverse
deb https://mirrors.ustc.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ bionic-updates main restricted universe multiverse
deb https://mirrors.ustc.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ bionic-backports main restricted universe multiverse
deb https://mirrors.ustc.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ bionic-security main restricted universe multiverse
deb https://mirrors.ustc.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse
deb-src https://mirrors.ustc.edu.cn/ubuntu/ bionic-proposed main restricted universe multiverse

sudo apt update
sudo apt upgrade
Alternatively, change the mirror in the graphical Software & Updates tool by picking one of the servers listed under China.
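The ten mirror entries above follow one pattern, so they can be generated rather than typed. The following is a minimal sketch, assuming the same USTC mirror and the bionic codename; the function name gen_sources is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Print sources.list entries for one mirror and one Ubuntu codename.
# Defaults (USTC mirror, bionic) are taken from the listing above.
gen_sources() {
    mirror=${1:-https://mirrors.ustc.edu.cn/ubuntu/}
    codename=${2:-bionic}
    # One deb and one deb-src line per suite, in the same order as above.
    for suite in "" -updates -backports -security -proposed; do
        for kind in deb deb-src; do
            printf '%s %s %s%s main restricted universe multiverse\n' \
                "$kind" "$mirror" "$codename" "$suite"
        done
    done
}
```

One possible use, after backing up the existing file, is `gen_sources | sudo tee /etc/apt/sources.list`, followed by `sudo apt update`.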

Some basic packages

These are used later:

sudo apt install openssh-server htop axel  git build-essential make cmake gcc

NVIDIA driver installation

First disable the nouveau driver:

sudo gedit /etc/modprobe.d/blacklist.conf

Append these two lines at the end:

blacklist nouveau
options nouveau modeset=0

Save and exit, then rebuild the initramfs:

sudo update-initramfs -u

Reboot:

reboot

Verify that nouveau is disabled (the command should print nothing):

lsmod | grep nouveau
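For use in a setup script, the same check can be turned into an exit status. This is a small sketch; the function name nouveau_loaded is hypothetical. It reads an lsmod listing on standard input and matches the module name exactly, so modules that merely list nouveau as a user do not count.

```shell
#!/bin/sh
# Exit status 0 when the lsmod listing on stdin contains the nouveau module,
# non-zero otherwise. Only the first column (the module name) is compared.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}
```

A possible use is `if lsmod | nouveau_loaded; then echo "nouveau is still loaded"; fi`.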

Method 1:

Using driver version 415 as an example:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo apt install nvidia-driver-415 nvidia-prime

Then reboot and verify the installation:
nvidia-smi

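Method 2:

Download the driver file from the NVIDIA website in advance, then stop the display manager (lightdm here; a default Ubuntu 18.04 desktop uses gdm3):

sudo service lightdm stop

Then log in on a text console and run:

sudo ./NVIDIA-Linux-x86_64-430.50.run --no-x-check --no-nouveau-check --no-opengl-files

Then reboot and verify the installation:

nvidia-smi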

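CUDA installation

After downloading the installer, run it directly:

 sudo ./cuda_10.1.105_418.39_linux.run

Do not select the NVIDIA driver component, because the driver is already installed.

Then edit ~/.bashrc (sudo is not needed for a file in your own home directory):

gedit ~/.bashrc

and append these two lines:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Open a new terminal and check that the installation succeeded:

nvcc -V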
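If a script needs the CUDA release number rather than the full banner, it can be extracted from the nvcc output. The sketch below assumes the banner contains a line of the form "Cuda compilation tools, release 10.1, V10.1.105"; the function name cuda_release is hypothetical.

```shell
#!/bin/sh
# Read the output of `nvcc -V` on stdin and print only the release number.
# Prints nothing if no "release X.Y," line is present.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}
```

For example, `nvcc -V | cuda_release` can be compared against the version a project expects.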

Installing Docker

sudo apt-get update
sudo apt-get install  apt-transport-https  ca-certificates curl  software-properties-common
curl -fsSL  https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add - 
sudo add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce

Run systemctl status docker to check that the installation succeeded.

Installing nvidia-docker2

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - 
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl daemon-reload
sudo systemctl restart docker

Check that it works:
sudo docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
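The distribution variable in the steps above is built by sourcing /etc/os-release and joining two of its fields. The following sketch shows the same logic as a function that takes the file path as an argument, so it can be tried on a copy of the file; the name distribution_id is hypothetical.

```shell
#!/bin/sh
# Print ID followed by VERSION_ID from an os-release style file,
# for example "ubuntu18.04". Runs in a subshell so that the sourced
# variables do not leak into the calling shell.
distribution_id() {
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

Running `distribution_id /etc/os-release` shows which repository list the curl command will request.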

Changing the Docker registry mirror:

sudo gedit /etc/docker/daemon.json

{  
    "registry-mirrors": ["http://hub-mirror.c.163.com","https://registry.docker-cn.com"],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Restart Docker for the change to take effect:
sudo systemctl daemon-reload
sudo systemctl restart docker

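Limiting container size:

When Docker's storage driver is overlay2, the disk space a single container may use can be limited.
overlay2.size was introduced in 17.07.0-ce ("Add overlay2.size daemon storage-opt").
For this, overlay2 requires an xfs filesystem mounted with pquota.
The Docker daemon configuration documentation describes the overlay2.size option, which sets the disk space each container may use: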

overlay2.size
Sets the default max size of the container. It is supported only when the backing fs is xfs and mounted with pquota mount option. Under these conditions the user can pass any size less then the backing fs size.

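As the documentation states, the backing filesystem must be xfs and must be mounted with pquota.

Enabling the xfs quota feature:

Enable pquota for /var (where Docker stores its data) in /etc/fstab: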

UUID=cd11c77d-1bb7-4545-ba3c-53069307cf4a /var xfs usrquota,grpquota,prjquota,defaults 0 0
reboot
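After the reboot it is worth confirming that the quota option is actually active, since overlay2.size depends on it. The sketch below reads mount information in /proc/mounts format from standard input; the function name has_prjquota is hypothetical, and it assumes the kernel reports the option as prjquota (pquota is also accepted).

```shell
#!/bin/sh
# Exit status 0 when the given mount point is an xfs filesystem whose
# mount options include prjquota (or its alias pquota).
# Input: lines in /proc/mounts format on stdin.
has_prjquota() {
    awk -v mp="$1" '
        $2 == mp && $3 == "xfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "prjquota" || opts[i] == "pquota") ok = 1
        }
        END { exit !ok }'
}
```

A possible use is `has_prjquota /var < /proc/mounts && echo "project quota enabled on /var"`.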

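Configuring the Docker daemon

Check Docker's default storage path:

docker info

(usually /var/lib/docker)

Docker's storage directory must be on an XFS filesystem. In this article /var is already XFS, so the path does not need to change.

/etc/docker/daemon.json is shown below. It limits each container to 200G of disk space and each log file to 10M: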

{
    "storage-driver": "overlay2",
    "storage-opts": [
        "overlay2.override_kernel_check=true",
        "overlay2.size=200G"
    ],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "10m"
    },
    "registry-mirrors": ["http://hub-mirror.c.163.com","https://registry.docker-cn.com"],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Restart Docker for the change to take effect:
sudo systemctl daemon-reload
sudo systemctl restart docker
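A syntax error in daemon.json, such as a missing comma between two entries, prevents the Docker daemon from starting, so it can help to check the file before restarting. The sketch below assumes python3 is installed (it is on a default Ubuntu 18.04 desktop) and uses its json.tool module only as a parser; the function name valid_json is hypothetical.

```shell
#!/bin/sh
# Exit status 0 when the file parses as JSON, non-zero otherwise.
# Assumption: python3 is available; its output is discarded.
valid_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}
```

A possible use is `valid_json /etc/docker/daemon.json && sudo systemctl restart docker`. This only checks syntax; it does not tell you whether Docker accepts the individual options.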

If the restart fails, delete the "storage-driver", "storage-opts", "log-driver", and "log-opts" entries and restart again. Container size is then no longer limited by default, so add --storage-opt size=70G to the command when running a container from an image. Note that this per-container option has the same requirement as overlay2.size: an xfs backing filesystem mounted with pquota.

Docker daemon configuration reference:

https://docs.docker.com/engine/reference/commandline/dockerd/#docker-runtime-execution-options

Loading a local image

docker load < nginx.tar
docker save nginx:latest > nginx.tar
docker image ls

Starting a container from an image

-p maps ports; -v mounts a host directory (the dataset is mounted read-only with :ro)
sudo nvidia-docker run -it -p 10000:22 -p 10500:3389 -v /home/phy/public_data:/data:ro -v /home/phy/output:/output 410/docker:pytorch bash

Other docker run options:

  • Start the container automatically when the Docker daemon starts:
--restart=always
  • Name the container:
--name xxx
  • Set the container's size limit:
--storage-opt size=70G

Other docker commands

  • Show container sizes:
sudo docker ps -s
  • Delete all stopped containers:
sudo docker container prune
  • Show Docker information:
sudo docker info

Building an image from a Dockerfile

My own Dockerfile, shared for learning and discussion:

FROM nvidia/cudagl:9.2-devel-ubuntu16.04
MAINTAINER phy12321@mail.ustc.edu.cn
RUN apt-get update \
 && sh -c '/bin/echo -e "\n" | apt-get install apt-transport-https'
# source and apt
RUN mv /etc/apt/sources.list /etc/apt/sources.list_bkp \
 && touch /etc/apt/sources.list \
 && echo "deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse" > /etc/apt/sources.list \
 && echo "deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse" >> /etc/apt/sources.list \
 && echo "deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse" >> /etc/apt/sources.list \
 && echo "deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-security main restricted universe multiverse" >> /etc/apt/sources.list
 
# apt  
RUN apt-get update
RUN apt-get install -y openssh-server vim axel net-tools iproute iproute-doc xarclock tmux gedit htop lsb-core git build-essential
 

RUN  rm -rf /var/lib/apt/lists/*
# ssh
RUN mkdir /var/run/sshd
RUN rm -f /etc/ssh/ssh_host_rsa_key /etc/ssh/ssh_host_dsa_key
RUN ssh-keygen -t rsa -N "" -f /etc/ssh/ssh_host_rsa_key
RUN ssh-keygen -t dsa -N "" -f /etc/ssh/ssh_host_dsa_key
RUN sh -c '/bin/echo "root:111111"| chpasswd'
RUN /bin/sed -i 's/.*session.*required.*pam_loginuid.so.*/session optional pam_loginuid.so/g' /etc/pam.d/sshd
RUN /bin/echo -e "LANG=\"en_US.UTF-8\"" > /etc/default/locale
# port
EXPOSE 10003
RUN echo "service ssh start" >> ~/.bashrc \
 && echo "export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}" >> ~/.bashrc \
 && echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> ~/.bashrc \
 && echo "echo '**********************************************'" >> ~/.bashrc \
 && echo "echo '*******   ubuntu 16.04 docker container  *****'" >> ~/.bashrc \
 && echo "tmux" >> ~/.bashrc
CMD bash && apt-get update

Then build the Docker image from the Dockerfile.

First cd to the directory that contains the Dockerfile, then:

 docker build -t image_name  .

Starting a container (from the image built above):

sudo nvidia-docker run -it  --privileged \
--name ubuntu16 \
--restart=always \
-v /home/phy/public_data:/data:ro \
-v /home/phy/output:/output \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v /etc/localtime:/etc/localtime:ro \
-e DISPLAY=unix$DISPLAY \
-e GDK_SCALE -e GDK_DPI_SCALE --net=host  \
cuda9.2_opengl_ubuntu16:base \
bash

Once inside the container, configure the SSH service:

vim /etc/ssh/sshd_config

Change the port and allow root login:

Port 10003
PermitRootLogin yes

Restart the SSH service:

service ssh restart
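The two sshd_config changes can also be applied without opening an editor, which is convenient when the container is recreated often. This is a minimal sketch assuming GNU sed (as shipped in the Ubuntu image); the function name set_sshd_option is hypothetical. It rewrites an existing line for the keyword, commented out or not, and appends the setting if no such line exists.

```shell
#!/bin/sh
# Set "KEY VALUE" in an sshd_config style file.
# Usage: set_sshd_option FILE KEY VALUE
set_sshd_option() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^#?[[:space:]]*${key}[[:space:]]" "$file"; then
        # Replace the existing (possibly commented-out) line in place.
        sed -E -i "s|^#?[[:space:]]*${key}[[:space:]].*|${key} ${value}|" "$file"
    else
        # Keyword not present: append it.
        printf '%s %s\n' "$key" "$value" >> "$file"
    fi
}
```

Inside the container this could be run as `set_sshd_option /etc/ssh/sshd_config Port 10003` and `set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes`, followed by `service ssh restart`.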

References:

https://www.cnblogs.com/nrm1/p/10219434.html
https://www.cnblogs.com/nrm1/p/10219269.html
https://www.cnblogs.com/nrm1/p/10219754.html

Next: forwarding GUI output from inside the container to Windows 10

https://blog.csdn.net/phy12321/article/details/102779888
