1. Install Docker
sudo -i
apt install docker.io
2. Set up the nvidia-docker dependencies
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
apt install curl
curl https://get.docker.com | sh \
&& sudo systemctl --now enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
apt-get update
apt-get install -y nvidia-docker2
systemctl restart docker
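The distribution variable used in the repository URLs above is built by sourcing /etc/os-release and concatenating ID and VERSION_ID. A minimal sketch of what that expression evaluates to follows; the helper name distribution_of is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print $ID$VERSION_ID from an os-release style file.
# The subshell keeps the sourced variables out of the calling shell.
distribution_of() {
    ( . "$1"; echo "$ID$VERSION_ID" )
}

# Usage on the host:
#   distribution_of /etc/os-release
```

On Ubuntu 18.04 this yields ubuntu18.04, which is the path component the nvidia-docker.list URL expects.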
3. Test
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
If the output shows the GPU driver information, the installation succeeded.
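If the test fails, one thing worth checking is whether Docker actually lists an nvidia runtime. The sketch below assumes that the nvidia-docker2 package registered the runtime and that docker info prints a "Runtimes:" line; the function name has_nvidia_runtime is hypothetical.

```shell
# Hypothetical helper: reads `docker info` output on stdin and succeeds
# if the "Runtimes:" line contains the word "nvidia".
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw nvidia
}

# Usage on the host:
#   docker info 2>/dev/null | has_nvidia_runtime && echo "nvidia runtime registered"
```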
4. PyTorch
https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html#running
4.1 Pull a fairly old image: pytorch=1.3, cuda=10.1, ubuntu=18.04
docker pull nvcr.io/nvidia/pytorch:19.10-py3
4.2 Create the container
docker run -it --name torch13 --gpus all -p 6666:22 -v /home/wjd/yolov3:/workspace/yolov3 --shm-size 16g nvcr.io/nvidia/pytorch:19.10-py3 /bin/bash
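Apart from adding --gpus all, this is no different from any other docker run. Host port 6666 is mapped directly to the container's port 22 to make SSH connections easy. --shm-size sets the container's shared memory; without it Docker gives the container only 64MB by default. The CPU first copies data into RAM and then sends it to the GPU, so if too little is allocated you will get DataLoader errors.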
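To confirm that the --shm-size setting took effect, the size of /dev/shm can be checked from inside the container. This is a minimal sketch; the function name shm_size_mib is hypothetical.

```shell
# Hypothetical helper: print the size of a mounted filesystem in MiB.
# Defaults to /dev/shm, the shared-memory mount that --shm-size controls.
shm_size_mib() {
    df -Pk "${1:-/dev/shm}" | awk 'NR == 2 { printf "%d\n", $2 / 1024 }'
}

# Usage inside the container:
#   shm_size_mib
```

With --shm-size 16g the value should be around 16384; a value of 64 suggests the container is still on the Docker default.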
4.3 Switch the Ubuntu apt mirror (see other blog posts for how), then install the SSH server
apt update
apt install openssh-server
4.4 Set a password and allow root login
passwd
vi /etc/ssh/sshd_config
Change the value of PermitRootLogin to yes
service ssh restart
This restarts ssh.
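The PermitRootLogin change can also be made without opening an editor. The following is a sketch assuming GNU sed (as shipped in the Ubuntu-based image); the function name enable_root_login is hypothetical.

```shell
# Hypothetical helper: set "PermitRootLogin yes" in the given sshd_config.
# Rewrites an existing (possibly commented-out) PermitRootLogin line,
# and appends the setting if no such line exists.
enable_root_login() {
    sed -i -E 's/^[#[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$1"
    grep -q '^PermitRootLogin yes' "$1" || echo 'PermitRootLogin yes' >> "$1"
}

# Usage inside the container:
#   enable_root_login /etc/ssh/sshd_config && service ssh restart
```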
5. Other notes
After exiting the container, ssh root@192.168.123.123 -p 6666 connects to it. The IP and port number are both those of the local host machine.
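Note that if the container has been stopped, you need to enter it again and run service ssh restart. Exiting a container created with docker run -it stops the container, so it has to be started again with docker start torch13. You can also create it with docker run -id and then enter it with docker exec -it. Otherwise you will see: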
ssh: connect to host 192.168.123.123 port 6666: Connection refused
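Before retrying ssh, it can help to check whether anything is listening on the mapped port. The sketch below assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the host; the function name port_open is hypothetical.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Arguments: host, port, optional timeout in seconds (default 2).
port_open() {
    timeout "${3:-2}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage on the host:
#   port_open 192.168.123.123 6666 || echo "sshd is not reachable; start the container and restart ssh"
```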
6. TensorFlow
The steps are the same as for PyTorch, so they are not repeated here.
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/running.html#running
docker pull nvcr.io/nvidia/tensorflow:21.09-tf2-py3
7. PyCharm
https://blog.csdn.net/Lin_Danny/article/details/82185023?ops_request_misc=&request_id=&biz_id=102&utm_term=pycharm%E8%BF%9C%E7%A8%8B%E8%B0%83%E8%AF%95&utm_medium=distribute.pc_search_result.none-task-blog-2allsobaiduweb~default-1-82185023.nonecase&spm=1018.2226.3001.4187
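Everything is the same as in that post, except that when configuring SSH the port number must be changed to 6666 instead of the default 22. When you log in this way the environment variables are no longer added by default, and since python lives in the conda environment by default, conda needs to be added to the environment variables as well: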
vi ~/.bashrc
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:/usr/local/cuda/bin:/opt/conda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
source ~/.bashrc
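Because ~/.bashrc is sourced on every login, a plain export PATH=$PATH:... can add the same directory repeatedly when the file is sourced more than once in a session. A minimal sketch of an idempotent alternative follows; the function name path_append is hypothetical.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

# Usage in ~/.bashrc:
#   path_append /usr/local/cuda/bin
#   path_append /opt/conda/bin
#   export PATH
```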