[Deep Learning] Building a Docker environment to package a detectron2 segmentation network service (RTX 3080 Ti, CUDA 11.1)

Author: weixin_40293999

Last modified: 2022-12-27 22:38:36

Tags: docker, deep learning, containers

First published: 2022-10-26 18:56:29

Article link: https://blog.csdn.net/weixin_40293999/article/details/127537212

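Contents

Getting the CUDA Docker image
Starting the container and verifying that the environment works
Installing torch and detectron2 in the mydet2 container
- A small problem
Verifying the time
- 2. Installing PyTorch and detectron2
Summary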

Getting the CUDA Docker image

List of Docker images: https://hub.docker.com/r/nvidia/cuda
Find the image that matches your setup, then pull it:

sudo docker pull nvidia/cuda:11.1.1-cudnn8-devel-centos7

Starting the container and verifying that the environment works

 sudo docker run -it --name mydet2 --rm --gpus all  nvidia/cuda:11.1.1-cudnn8-devel-centos7 nvidia-smi -l
Wed Oct 26 09:47:25 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 33%   35C    P0    88W / 350W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 33%   35C    P0     1W / 350W |      0MiB / 12053MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Both NVIDIA GPUs are visible, so the cards are usable inside the Docker environment.
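The verification command above uses --rm and runs nvidia-smi -l, so that container never offers a shell and is deleted as soon as it exits. The later steps (docker cp into mydet2, docker commit mydet2) need a container that persists. The source does not show how the working container was started; the following is a minimal sketch, assuming it was started interactively without --rm. The run helper and the DRY_RUN variable are hypothetical conveniences: by default the script only prints the command.

```shell
#!/bin/sh
# Sketch: start a persistent, interactive container for the later steps.
# Assumption (not shown in the source): no --rm, and a bash shell as the command.
IMAGE="nvidia/cuda:11.1.1-cudnn8-devel-centos7"
NAME="mydet2"

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

start_container() {
  run sudo docker run -it --name "$NAME" --gpus all "$IMAGE" /bin/bash
}

start_container
```

Running the script with DRY_RUN=0 executes the command instead of printing it. The name mydet2 must be free, so stop the nvidia-smi container first.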

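Installing torch and detectron2 in the mydet2 container

First install Anaconda and switch its package source (the logs below show a Miniconda install at /root/miniconda3).

For how to choose the versions, see:
https://blog.csdn.net/weixin_40293999/article/details/127377288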

A small problem

At this point the container's time zone is wrong: date reports UTC instead of local time.

(base) [root@fc52f21b15e6 /]# date
Wed Oct 26 10:03:03 UTC 2022

While at it, I installed vim and wget and switched the yum source.
Switching the yum source requires wget, so install wget first:

yum install wget
Loaded plugins: fastestmirror, ovl
Loading mirror speeds from cached hostfile



 * base: mirrors.huaweicloud.com
 * extras: mirrors.tuna.tsinghua.edu.cn
 * updates: mirrors.tuna.tsinghua.edu.cn
base | 3.6 kB  00:00:00
https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 301 - Moved Permanently
Trying other mirror.


 One of the configured repositories failed (cuda),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=cuda ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable cuda
        or
            subscription-manager repos --disable=cuda

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=cuda.skip_if_unavailable=true

failure: repodata/repomd.xml from cuda: [Errno 256] No more mirrors to try.
https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/repodata/repomd.xml: [Errno 14] HTTPS Error 301 - Moved Permanently

Fix: in /etc/yum.repos.d, rename the failing repo file so that yum ignores it:
mv cuda.repo cuda.repo.bk
Running yum install wget again then succeeds.
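The rename works because yum only loads files ending in .repo from its repository directory. A minimal sketch of the same fix as a reusable function; the name disable_repo and the optional directory argument are hypothetical, added so the function can be tried on a scratch directory first:

```shell
#!/bin/sh
# Sketch: sideline a yum repo definition by renaming NAME.repo to NAME.repo.bk.
# The second argument defaults to the standard CentOS location.
disable_repo() {
  name="$1"
  dir="${2:-/etc/yum.repos.d}"
  src="$dir/$name.repo"
  if [ -f "$src" ]; then
    mv "$src" "$src.bk"
    echo "disabled $name"
  else
    echo "no repo file at $src" >&2
    return 1
  fi
}
```

Inside the container this would be used as disable_repo cuda, followed by yum install -y wget. For a one-off install, the yum output above also suggests yum --disablerepo=cuda, which leaves the file untouched.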

A guide to switching the yum source is here: https://blog.csdn.net/MateSnake/article/details/124088310

Fixing the time zone:
On the host, outside the container:
sudo docker cp /usr/share/zoneinfo/ mydet2:/usr/share/
Inside the container:
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
echo "Asia/Shanghai" > /etc/timezone

Verifying the time

date

(base) [root@fc52f21b15e6 yum.repos.d]# ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
(base) [root@fc52f21b15e6 yum.repos.d]# echo "Asia/Shanghai" > /etc/timezone
(base) [root@fc52f21b15e6 yum.repos.d]# date
Wed Oct 26 18:22:38 CST 2022
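The two in-container commands can be wrapped in a small function that first checks the zone file exists, which catches the case where the zoneinfo directory was never copied in. This is a sketch only; the name set_timezone and the optional root prefix argument are hypothetical, the latter added so the function can be exercised on a scratch directory instead of the real /etc:

```shell
#!/bin/sh
# Sketch: point /etc/localtime at a zone file and record the zone name,
# mirroring the two commands used in the article.
set_timezone() {
  tz="$1"          # e.g. Asia/Shanghai
  root="${2:-}"    # optional filesystem prefix; empty means the real root
  zone="$root/usr/share/zoneinfo/$tz"
  if [ ! -e "$zone" ]; then
    echo "missing zone file: $zone" >&2
    return 1
  fi
  ln -sf "$zone" "$root/etc/localtime"
  echo "$tz" > "$root/etc/timezone"
}
```

Inside the container, set_timezone Asia/Shanghai followed by date should report CST, as in the log above.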

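Install vim (the glob is quoted so that the shell passes it to yum unchanged):
yum -y install 'vim*'

A guide to switching the conda and pip sources is here:
https://blog.csdn.net/weixin_40293999/article/details/126776913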

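2. Installing PyTorch and detectron2

The install failed several times and only went through after a few retries. I did not track down every cause, but the log below shows one: the first attempt was typed without == after pytorch and torchvision, so conda searched for packages literally named pytorch1.8.0 and torchvision0.9.0. The correct command is:
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge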

(base) [root@fc52f21b15e6 yum.repos.d]# conda install pytorch1.8.0 torchvision0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
  - torchvision0.9.0
  - pytorch1.8.0


and use the search bar at the top of the page.
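The PackagesNotFoundError above is what conda reports when a spec such as pytorch1.8.0 carries no = operator: the whole string is taken as a package name. A rough pre-flight check can catch this before conda spends time solving. The sketch below is a heuristic under stated assumptions, not conda's own parser, and the function name looks_malformed is hypothetical:

```shell
#!/bin/sh
# Heuristic sketch: flag a spec that contains version-like text (digit.digit)
# but no "=" operator, e.g. "pytorch1.8.0" instead of "pytorch==1.8.0".
# Returns 0 when the spec looks malformed, 1 otherwise.
looks_malformed() {
  case "$1" in
    *=*)           return 1 ;;  # has an operator such as = or ==
    *[0-9].[0-9]*) return 0 ;;  # version-like text without an operator
    *)             return 1 ;;  # plain package name
  esac
}

for spec in pytorch1.8.0 torchvision0.9.0 torchaudio==0.8.0 cudatoolkit=11.1; do
  if looks_malformed "$spec"; then
    echo "suspicious spec: $spec"
  fi
done
```

A package whose real name contains a dotted number would be flagged too, so treat a hit as a prompt to re-read the command rather than as proof of an error.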


(base) [root@fc52f21b15e6 yum.repos.d]# conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
  environment location: /root/miniconda3
  added / updated specs:
    - cudatoolkit=11.1
    - pytorch==1.8.0
    - torchaudio==0.8.0
    - torchvision==0.9.0
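The log above covers only the PyTorch packages; the source does not show the detectron2 install command. One option, assumed here rather than taken from the article, is the prebuilt wheel index that detectron2's install instructions publish per CUDA and PyTorch version. The sketch below only prints the commands; the helper name detectron2_wheel_index is hypothetical.

```shell
#!/bin/sh
# Sketch: build the detectron2 wheel index URL for a CUDA/PyTorch pair.
# Assumption: the prebuilt wheels are used instead of building from source.
detectron2_wheel_index() {
  cuda_tag="$1"    # e.g. cu111 for CUDA 11.1
  torch_ver="$2"   # e.g. 1.8 for PyTorch 1.8.x
  echo "https://dl.fbaipublicfiles.com/detectron2/wheels/${cuda_tag}/torch${torch_ver}/index.html"
}

# Print the commands to run inside the container (nothing is executed here).
echo "python -m pip install detectron2 -f $(detectron2_wheel_index cu111 1.8)"
echo "python -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'"
```

The second printed command is a quick check that the installed PyTorch build targets CUDA 11.1 and can see the GPUs from inside the container.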

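Committing the container to an image and pushing it to Docker Hub

### Commit the container to an image
sudo docker commit mydet2 justin0114/cuda11.1_torch1.8_det2
### Push the image to the Hub repository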
sudo docker push justin0114/cuda11.1_torch1.8_det2
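Without an explicit tag, both commands operate on the latest tag. A minimal sketch that makes the tag explicit so that later rebuilds do not overwrite this image; the TAG and DRY_RUN variables and the run helper are hypothetical, and by default the script only prints the commands:

```shell
#!/bin/sh
# Sketch: commit the container and push the result under an explicit tag.
CONTAINER="mydet2"
REPO="justin0114/cuda11.1_torch1.8_det2"
TAG="${TAG:-latest}"

# Print the command when DRY_RUN=1 (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

publish_image() {
  run sudo docker commit "$CONTAINER" "$REPO:$TAG"
  run sudo docker push "$REPO:$TAG"
}

publish_image
```

Pushing requires a prior docker login with an account that owns the repository namespace.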

Summary

At this point the det2 environment is set up.
