YMIR Deployment and Troubleshooting

Preface

Official YMIR repository on GitHub: https://github.com/IndustryEssentials/ymir

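1. What is YMIR?

  1. YMIR is an open-source AI algorithm platform that provides data management, data mining, model training, and model validation through a no-code workflow.
  2. It is a one-stop platform for producing and deploying models: free, open, no-code, and easy to pick up.
  3. It lowers the barrier and cost of building algorithm models and covers the full life cycle of AI model production.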

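2. Why use YMIR?

  1. YMIR gives algorithm engineers end-to-end tooling for algorithm development. It offers a one-stop service for the data processing, model training, and other needs that come up during AI development, which helps algorithm work move forward.
  2. It covers the full life cycle of a model: after a model is deployed, it also makes it easy to monitor how the algorithm behaves in the target scenario.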

3. How is YMIR deployed?

See the official documentation: https://github.com/IndustryEssentials/ymir/blob/master/README_zh-CN.md#2-%E5%AE%89%E8%A3%85

Environment dependencies (important)

1. The GPU version requires a GPU with the NVIDIA driver installed: https://www.nvidia.cn/geforce/drivers/
2. Docker and Docker Compose must be installed:
● docker compose >= 1.29.2, docker >= 20.10
● Docker & Docker Compose installation: https://docs.docker.com/get-docker/
● NVIDIA Docker installation: nvidia-docker install-guide
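
The minimum versions listed above can be checked with a small script. What follows is a rough sketch, not part of the official setup: `version_ge` is a helper name made up for this example, and it assumes a `sort` that supports `-V` (GNU coreutils does). Only the Docker client is checked here; Docker Compose could be compared against 1.29.2 the same way.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes a sort(1) with -V (version sort), as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed Docker client against the 20.10 minimum listed above.
if command -v docker >/dev/null 2>&1; then
    # "docker --version" prints a line such as "Docker version 20.10.17, build 100c701".
    docker_version="$(docker --version | sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p')"
    if [ -n "$docker_version" ] && version_ge "$docker_version" "20.10"; then
        echo "docker $docker_version is new enough"
    else
        echo "docker ${docker_version:-unknown} is older than 20.10"
    fi
else
    echo "docker is not installed"
fi
```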

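## Use nvidia-smi to see the highest CUDA version supported by the host's GPU driver
nvidia-smi
## On hosts that support CUDA 11 or later, check that nvidia-docker was installed correctly
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
## On hosts that support CUDA 10, check that nvidia-docker was installed correctly
sudo docker run --rm --gpus all nvidia/cuda:10.2-base-ubuntu18.04 nvidia-smi
## The commands above should print something like the following in the terminal (here the highest supported CUDA version is 11.6)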
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02    Driver Version: 510.60.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:0B:00.0 Off |                    0 |
| N/A   62C    P0    55W /  75W |   4351MiB /  7680MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8132      C                                    4349MiB |
+-----------------------------------------------------------------------------+
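
Choosing between the two test images can be scripted by reading the "CUDA Version" field from the nvidia-smi header. This is only a sketch: the function names `cuda_major_from_smi` and `test_image_for_cuda` are invented for illustration, and sending every version below 11 to the CUDA 10 image is an assumption for drivers older than CUDA 10.

```shell
# Read nvidia-smi output on stdin and print the major CUDA version (e.g. 11).
cuda_major_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Print the test image from this article that matches the major version.
# Assumption: anything below 11 is treated like CUDA 10.
test_image_for_cuda() {
    if [ "$1" -ge 11 ]; then
        echo "nvidia/cuda:11.0.3-base-ubuntu20.04"
    else
        echo "nvidia/cuda:10.2-base-ubuntu18.04"
    fi
}

# Only query the driver when nvidia-smi is actually present.
if command -v nvidia-smi >/dev/null 2>&1; then
    major="$(nvidia-smi | cuda_major_from_smi)"
    if [ -n "$major" ]; then
        echo "Driver supports CUDA ${major}.x; test nvidia-docker with:"
        echo "sudo docker run --rm --gpus all $(test_image_for_cuda "$major") nvidia-smi"
    fi
fi
```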

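3. Recommended server configuration
● NVIDIA GeForce RTX 2080 Ti 12G
● GPU memory usage peaks at 9974 MiB
● Highest CUDA version supported by the GPU driver >= 11.2

PS: Make sure the environment dependencies are installed properly first; otherwise all sorts of problems tend to show up later when installing YMIR.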

1. Pull the YMIR project code

Run the command to clone the project:

git clone https://github.com/IndustryEssentials/ymir

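Notes on the project files:

  1. The .env configuration file: if no GPU is available, install in CPU mode by changing the SERVER_RUNTIME parameter in this file to runc. The file also holds the MySQL and other configuration used by the YMIR project.
  2. The docker-compose file: this is where you specify which images are started.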
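
Switching to CPU mode comes down to one line in .env. The sketch below assumes the setting is stored as a plain `SERVER_RUNTIME=...` line; `set_env_value` is a hypothetical helper, and the sample file contents in the demo are made up for the demonstration, not copied from the real .env.

```shell
# Set KEY=VALUE in an env file: replace the existing line, or append if missing.
set_env_value() {
    local file="$1" key="$2" value="$3" tmp
    tmp="$(mktemp)"
    awk -v k="$key" -v v="$value" '
        BEGIN { FS = "="; found = 0 }
        $1 == k { print k "=" v; found = 1; next }   # rewrite the matching line
        { print }                                    # keep every other line as-is
        END { if (!found) print k "=" v }            # append when the key was absent
    ' "$file" > "$tmp"
    cat "$tmp" > "$file"   # overwrite in place so the file keeps its permissions
    rm -f "$tmp"
}

# Demo on a throwaway file with made-up contents.
demo="$(mktemp)"
printf 'EXAMPLE_SETTING=1\nSERVER_RUNTIME=nvidia\n' > "$demo"
set_env_value "$demo" SERVER_RUNTIME runc
cat "$demo"
rm -f "$demo"
```

Against a real checkout the call would be `set_env_value .env SERVER_RUNTIME runc`, run from the YMIR directory; keeping a backup copy of .env first is a sensible precaution.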

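2. Run the startup script

# Run the start command from the YMIR directory
# Start:
bash ymir.sh start
# Stop:
bash ymir.sh stop

PS: If the images cannot be pulled, try switching Docker to a different registry mirror.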
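
Docker reads registry mirrors from the `registry-mirrors` key of its daemon configuration file, which on Linux is usually /etc/docker/daemon.json. The snippet below only prints the JSON fragment; the mirror URL is a placeholder and `mirror_fragment` is a name invented for this sketch. It deliberately does not write the file, because an existing daemon.json may already contain other settings (for example a runtime entry from the NVIDIA Docker setup) that should be merged by hand rather than overwritten.

```shell
# Print a daemon.json fragment that points Docker at the given registry mirror.
mirror_fragment() {
    printf '{\n  "registry-mirrors": ["%s"]\n}\n' "$1"
}

# Placeholder URL: substitute a mirror you actually have access to.
mirror_fragment "https://mirror.example.com"
```

After merging the key into daemon.json, restart the daemon (on systemd hosts, `sudo systemctl restart docker`) and run `bash ymir.sh start` again.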

Images that were pulled: [screenshot not preserved]
Images that were started: [screenshot not preserved]

3. Open the web page

http://localhost:12001/
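
The containers can take a while to come up, so it may help to poll the address before opening it in a browser. This is a minimal sketch that assumes `curl` is installed; `wait_for_url` and its default retry count and delay are choices made for this example, not anything YMIR prescribes.

```shell
# Poll a URL until it answers with a successful HTTP status.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
    local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
    while [ "$i" -le "$attempts" ]; do
        # --fail turns HTTP errors (4xx/5xx) into a non-zero exit status.
        if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `wait_for_url http://localhost:12001/ && echo "YMIR web UI is up"` waits up to roughly a minute with the defaults.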

4. Handling problems encountered

1. How to build the image

cd /ymir/ymir/web
docker build -f Dockerfile -t industryessentials/ymir-web:release-version .
# Note: once the image is built, enable it in the docker-compose file, then start the ymir project.
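
Before restarting, it can be worth confirming which image tags the compose file actually references, so the freshly built tag matches. The sketch below assumes the compose file is named docker-compose.yml and that each service declares its image on a line of its own as `image: <name>`; `list_compose_images` is a helper name made up here.

```shell
# Print every image referenced by "image:" lines in a compose file.
list_compose_images() {
    awk '$1 == "image:" { print $2 }' "$1"
}

# Assumed file name; adjust if the project uses a different one.
compose_file="docker-compose.yml"
if [ -f "$compose_file" ]; then
    list_compose_images "$compose_file"
fi
```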

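2. Modify the Nginx configuration file

ymir/web/nginx.conf.template is the Nginx configuration file; make your changes there.

TODO (2023-07-22 17:09:10): the latest YMIR version is currently 2.5; I'll write up an installation guide for 2.5 when I get the chance.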

5. Problems encountered during installation

1. Docker access failed
Error message:

permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/version": dial unix /var/run/docker.sock: connect: permission denied
Traceback (most recent call last):
  File "urllib3/connectionpool.py", line 677, in urlopen
  File "urllib3/connectionpool.py", line 392, in _make_request
  File "http/client.py", line 1277, in request
  File "http/client.py", line 1323, in _send_request
  File "http/client.py", line 1272, in endheaders
  File "http/client.py", line 1032, in _send_output
  File "http/client.py", line 972, in send
  File "docker/transport/unixconn.py", line 43, in connect
PermissionError: [Errno 13] Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "requests/adapters.py", line 449, in send
  File "urllib3/connectionpool.py", line 727, in urlopen
  File "urllib3/util/retry.py", line 410, in increment
  File "urllib3/packages/six.py", line 734, in reraise
  File "urllib3/connectionpool.py", line 677, in urlopen
  File "urllib3/connectionpool.py", line 392, in _make_request
  File "http/client.py", line 1277, in request
  File "http/client.py", line 1323, in _send_request
  File "http/client.py", line 1272, in endheaders
  File "http/client.py", line 1032, in _send_output
  File "http/client.py", line 972, in send
  File "docker/transport/unixconn.py", line 43, in connect
urllib3.exceptions.ProtocolError: ('Connection aborted.', PermissionError(13, 'Permission denied'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "docker/api/client.py", line 214, in _retrieve_server_version
  File "docker/api/daemon.py", line 181, in version
  File "docker/utils/decorators.py", line 46, in inner
  File "docker/api/client.py", line 237, in _get
  File "requests/sessions.py", line 543, in get
  File "requests/sessions.py", line 530, in request
  File "requests/sessions.py", line 643, in send
  File "requests/adapters.py", line 498, in send
requests.exceptions.ConnectionError: ('Connection aborted.', PermissionError(13, 'Permission denied'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "docker-compose", line 3, in <module>
  File "compose/cli/main.py", line 81, in main
  File "compose/cli/main.py", line 200, in perform_command
  File "compose/cli/command.py", line 70, in project_from_options
  File "compose/cli/command.py", line 153, in get_project
  File "compose/cli/docker_client.py", line 43, in get_client
  File "compose/cli/docker_client.py", line 170, in docker_client
  File "docker/api/client.py", line 197, in __init__
  File "docker/api/client.py", line 222, in _retrieve_server_version
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))
[618061] Failed to execute script docker-compose

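Cause:
The current user does not have permission to access Docker.
Fix:

# Create the docker group
sudo groupadd docker
# Add the user to the docker group (zmj is the user name in this example)
sudo usermod -aG docker zmj
# Switch the current shell to the new group so the change takes effect
newgrp docker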
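
Group changes made with usermod only reach sessions started afterwards, which is why the permission error can persist in an already open terminal. A small check, sketched here with the made-up helper name `session_has_group`, shows whether the current shell already carries the docker group.

```shell
# Succeeds when the current shell session already has the given group.
session_has_group() {
    id -nG | tr ' ' '\n' | grep -Fxq "$1"
}

if session_has_group docker; then
    echo "docker group is active in this shell"
else
    echo "docker group not active yet: run 'newgrp docker' or log out and back in"
fi
```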