Get the CUDA Docker image
Docker image list: https://hub.docker.com/r/nvidia/cuda
Find the image tag that matches your CUDA version and OS
Pull the image
sudo docker pull nvidia/cuda:11.1.1-cudnn8-devel-centos7
Start a container to verify that the environment works
sudo docker run -it --name mydet2 --rm --gpus all nvidia/cuda:11.1.1-cudnn8-devel-centos7 nvidia-smi -l
Wed Oct 26 09:47:25 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 33%   35C    P0    88W / 350W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 33%   35C    P0     1W / 350W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Both NVIDIA GPUs are visible, so the cards are usable inside the Docker container.
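If you would rather check this from a script than read the table by eye, here is a minimal sketch. It assumes the listing comes from nvidia-smi -L, which prints one "GPU <index>: ..." line per device. The helper names count_gpus and expect_gpus are made up for this example.

```shell
# count_gpus: count the "GPU <index>: ..." lines on stdin.
# Reading from stdin means the listing can come from docker or from a saved log.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# expect_gpus N: succeed only if the listing on stdin shows exactly N GPUs.
expect_gpus() {
    [ "$(count_gpus)" -eq "$1" ]
}

# Possible usage on the host (2 is the GPU count of the machine in this post):
#   sudo docker run --rm --gpus all nvidia/cuda:11.1.1-cudnn8-devel-centos7 \
#       nvidia-smi -L | expect_gpus 2 && echo "GPUs visible in the container"
```

Unlike nvidia-smi -l, which keeps refreshing until interrupted, nvidia-smi -L prints the list once and exits, so it is easier to use in a pipeline.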
Install the torch and detectron2 frameworks inside the mydet2 container
First install Anaconda and switch its package source to a mirror
For choosing versions, see:
https://blog.csdn.net/weixin_40293999/article/details/127377288
A small problem
At this point, date inside the container reports the wrong time zone (UTC rather than local time):
(base) [root@fc52f21b15e6 /]# date
Wed Oct 26 10:03:03 UTC 2022
While at it, install vim and wget, and switch the yum source.
Switching the yum source requires wget,
so install wget first:
yum install wget
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile
* base: mirrors.huaweicloud.com
* extras: mirrors.tuna.tsinghua.edu.cn
* updates: mirrors.tuna.tsinghua.edu.cn
base | 3.6 kB 00:00:00
https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 301 - Moved Permanently
Trying other mirror.
One of the configured repositories failed (cuda),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
upstream. This is most often useful if you are using a newer
distribution release than is supported by the repository (and the
packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=cuda ...
4. Disable the repository permanently, so yum won't use it by default. Yum
will then just ignore the repository until you permanently enable it
again or use --enablerepo for temporary usage:
yum-config-manager --disable cuda
or
subscription-manager repos --disable=cuda
5. Configure the failing repository to be skipped, if it is unavailable.
Note that yum will try to contact the repo. when it runs most commands,
so will have to try and fail each time (and thus. yum will be be much
slower). If it is a very temporary problem though, this is often a nice
compromise:
yum-config-manager --save --setopt=cuda.skip_if_unavailable=true
failure: repodata/repomd.xml from cuda: [Errno 256] No more mirrors to try.
https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 301 - Moved Permanently
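Solution: rename the cuda repo file in /etc/yum.repos.d so that yum no longer tries the failing cuda repository:
mv cuda.repo cuda.repo.bk
After that, running yum install wget again succeeds.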
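The rename works because yum only loads files ending in .repo from its repo directory. Here is a small sketch of the same fix as a reusable function. The name disable_repo and its arguments are made up for this example; the directory is a parameter so the function can be tried on a scratch directory first.

```shell
# disable_repo DIR NAME: rename DIR/NAME.repo to DIR/NAME.repo.bk.
# yum skips files that do not end in .repo, so the repository is no longer used.
disable_repo() (
    dir=$1
    name=$2
    if [ -f "$dir/$name.repo" ]; then
        mv "$dir/$name.repo" "$dir/$name.repo.bk"
    else
        # Nothing to rename; report it but do not treat it as an error.
        echo "no $name.repo in $dir, nothing to do" >&2
    fi
)

# Inside the container this would be:
#   disable_repo /etc/yum.repos.d cuda
```

For a one-off install, the yum error message itself suggests a lighter alternative: yum --disablerepo=cuda install wget.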
Instructions for switching the yum source are here: https://blog.csdn.net/MateSnake/article/details/124088310
Fixing the time zone:
On the host (outside the container):
sudo docker cp /usr/share/zoneinfo/ mydet2:/usr/share/
Inside the container:
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo "Asia/Shanghai" > /etc/timezone
Verify the time:
date
(base) [root@fc52f21b15e6 yum.repos.d]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
(base) [root@fc52f21b15e6 yum.repos.d]# echo "Asia/Shanghai" > /etc/timezone
(base) [root@fc52f21b15e6 yum.repos.d]# date
Wed Oct 26 18:22:38 CST 2022
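The two commands above can be wrapped into one function that first checks the zone file exists, which catches the case where the zoneinfo directory was not copied into the container. This is only a sketch: set_timezone and its optional ROOT argument are hypothetical, and ROOT exists only so the function can be tried against a scratch directory instead of the real /etc.

```shell
# set_timezone ZONE [ROOT]: link ROOT/etc/localtime to the zone file and
# write the zone name to ROOT/etc/timezone. ROOT defaults to the real root.
set_timezone() (
    zone=$1
    root=${2:-}
    zonefile="$root/usr/share/zoneinfo/$zone"
    if [ ! -f "$zonefile" ]; then
        echo "zone file $zonefile not found" >&2
        exit 1
    fi
    ln -sf "$zonefile" "$root/etc/localtime" &&
    echo "$zone" > "$root/etc/timezone"
)

# Inside the container this would be:
#   set_timezone Asia/Shanghai && date
```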
Install vim
yum -y install vim*
Instructions for switching the conda and pip sources are here:
https://blog.csdn.net/weixin_40293999/article/details/126776913
2. Install PyTorch and detectron2
The download failed several times and only went through after a few more attempts; the cause is unknown.
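Since simply repeating the command was what worked, the retries can be automated. Here is a minimal sketch of a generic retry helper. The name retry and its arguments are made up for this example, and it does nothing about the unknown root cause.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
retry() (
    max=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            exit 1
        fi
        echo "attempt $attempt failed, retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
)

# Possible usage with the install command from this section:
#   retry 5 10 conda install -y pytorch==1.8.0 torchvision==0.9.0 \
#       torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
```

A retry only helps with transient failures such as a flaky network. If conda reports PackagesNotFoundError, the command itself is wrong and repeating it will not help.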
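conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
Note the == between each package name and its version. In the first attempt in the log that follows, it was missing for pytorch and torchvision, so conda looked for packages literally named pytorch1.8.0 and torchvision0.9.0 and failed with PackagesNotFoundError; the second attempt, with ==, solved the environment.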
(base) [root@fc52f21b15e6 yum.repos.d]# conda install pytorch1.8.0 torchvision0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
- torchvision0.9.0
- pytorch1.8.0
and use the search bar at the top of the page.
(base) [root@fc52f21b15e6 yum.repos.d]# conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
Collecting package metadata (current_repodata.json): \ done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /root/miniconda3
added / updated specs:
- cudatoolkit=11.1
- pytorch==1.8.0
- torchaudio==0.8.0
- torchvision==0.9.0
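The failed first attempt in the log above is easy to repeat, because a spec like pytorch1.8.0 looks almost right. Here is a rough sketch of a pre-check that flags specs where a version number seems to be glued directly to the package name. The function name check_specs is made up, and the pattern is only a heuristic: it would also flag a package whose real name ends in something like 3.8.

```shell
# check_specs SPEC...: warn about specs that look like "name1.2.3",
# i.e. a version following the name with no = or == in between.
# Exits non-zero if any spec looks suspicious.
check_specs() (
    status=0
    for spec in "$@"; do
        if printf '%s\n' "$spec" | grep -Eq '^[A-Za-z_-]+[0-9]+\.[0-9]'; then
            echo "suspicious spec (missing '=='?): $spec" >&2
            status=1
        fi
    done
    exit "$status"
)

# Possible usage before calling conda:
#   check_specs pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 \
#       && echo "specs look fine"
```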
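Commit the container to an image and push it to Docker Hub
Run these on the host while the mydet2 container is still running: if it was started with --rm as above, Docker removes it as soon as it exits.
### Commit the container to an image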
sudo docker commit mydet2 justin0114/cuda11.1_torch1.8_det2
### Push to Docker Hub
sudo docker push justin0114/cuda11.1_torch1.8_det2
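The two commands can be chained so that the push only runs if the commit succeeded. This is a sketch, not the exact procedure used here: commit_and_push and the DOCKER variable are made up for this example. DOCKER defaults to sudo docker, and setting it to echo gives a dry run. The push also assumes you have already run docker login for the target account.

```shell
# commit_and_push CONTAINER IMAGE: snapshot CONTAINER as IMAGE, then push IMAGE.
# The push is skipped if the commit fails.
# DOCKER is left unquoted on purpose so "sudo docker" splits into two words.
commit_and_push() (
    container=$1
    image=$2
    ${DOCKER:-sudo docker} commit "$container" "$image" &&
    ${DOCKER:-sudo docker} push "$image"
)

# Dry run that only prints the docker arguments:
#   DOCKER=echo commit_and_push mydet2 justin0114/cuda11.1_torch1.8_det2
# Real run:
#   commit_and_push mydet2 justin0114/cuda11.1_torch1.8_det2
```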
Summary
At this point the det2 environment is set up.