[xinference] (14): On compshare, install the nvidia-docker tooling, start the xinference-gpu inference framework from its Docker image, and run large models with little effort

fly-iot

Categories: compshare, large models, xinference. Tags: docker, container operations

Article link: https://blog.csdn.net/freewebsys/article/details/140193692

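About the compshare compute-sharing platform

compshare is a compute-sharing platform operated by UCloud (优刻得).
UCloud is a well-known vendor-neutral cloud provider in China. It is listed on the STAR Market and is described as China's first publicly listed cloud computing company.
The Compshare GPU compute platform belongs to UCloud and focuses on cost-effective 4090 GPU compute. Instances come with a dedicated IP, can be billed by the hour, day, or month, and have accelerated access to GitHub and Hugging Face.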

https://www.compshare.cn/?ytag=GPU_flyiot_Lcsdn_csdn_display

Video demo

[xinference] (14): On compshare, use nvidia-docker to start the xinference inference framework and run large models with little effort

1. Create a base image with CUDA

https://www.compshare.cn/

2. Install Docker and the NVIDIA container tooling directly with apt (the key step)

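# Build the distribution string from /etc/os-release, e.g. ubuntu20.04
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# Add NVIDIA's package signing key and the apt repository for this distribution
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Refresh the package index and install the NVIDIA container packages;
# as the output below shows, docker.io is installed along with them
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-docker2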

Setting up containerd (1.7.12-0ubuntu2~20.04.1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/containerd.service → /lib/systemd/system/containerd.service.
Setting up nvidia-container-toolkit (1.13.5-1) ...
Setting up docker.io (24.0.7-0ubuntu2~20.04.1) ...
Adding group `docker' (GID 117) ...
Done.
Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /lib/systemd/system/docker.service.
Created symlink /etc/systemd/system/sockets.target.wants/docker.socket → /lib/systemd/system/docker.socket.
Setting up dnsmasq-base (2.90-0ubuntu0.20.04.1) ...
Setting up nvidia-docker2 (2.13.0-1) ...
Setting up ubuntu-fan (0.12.13ubuntu0.1) ...
Created symlink /etc/systemd/system/multi-user.target.wants/ubuntu-fan.service → /lib/systemd/system/ubuntu-fan.service.
Processing triggers for systemd (245.4-4ubuntu3) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for dbus (1.12.16-2ubuntu2) ...
Processing triggers for libc-bin (2.31-0ubuntu9) ...
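The repository URL used in this step depends on the `distribution` string, which is built by sourcing `/etc/os-release` and concatenating `ID` and `VERSION_ID`. If the install fails at the repository step, it can help to look at that string on its own. The following is a minimal sketch; the helper name `detect_distribution` is hypothetical and not part of any NVIDIA tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<ID><VERSION_ID>" from an os-release file.
# Defaults to /etc/os-release, the file read by the install command.
detect_distribution() {
    local os_release="${1:-/etc/os-release}"
    # Source in a subshell so ID and VERSION_ID do not leak into the caller.
    (
        . "$os_release" || exit 1
        printf '%s%s\n' "$ID" "$VERSION_ID"
    )
}

# Usage: print the value that is substituted into the repository URL.
#   detect_distribution
```

On an Ubuntu 20.04 host such as the one in the install output, this would be expected to print `ubuntu20.04`.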

Then edit the Docker configuration:

# cat /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "data-root": "/data/docker",
  "registry-mirrors": [
    "https://registry.dockermirror.com"
  ]
}

3. Grant the current user permission to run Docker

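# Add the current user to the docker group
sudo gpasswd -a $USER docker
# Start a shell with the docker group active, so no re-login is needed
newgrp docker
# Enable Docker at boot, then restart it so the daemon.json changes take effect
sudo systemctl enable docker
sudo systemctl restart docker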
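
After the restart it is worth confirming that Docker actually picked up `daemon.json`. The sketch below inspects the text printed by `docker info` for the `nvidia` runtime and the configured data root. The helper names are hypothetical, and the line labels `Runtimes:` and `Docker Root Dir:` are assumed to match what your Docker version prints.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that inspect the text of `docker info` read from stdin.

# Succeeds if the "Runtimes:" line lists a runtime named exactly "nvidia".
has_nvidia_runtime() {
    awk '/^[[:space:]]*Runtimes:/ {
             for (i = 2; i <= NF; i++) if ($i == "nvidia") found = 1
         }
         END { exit !found }'
}

# Prints the value of the "Docker Root Dir:" line.
docker_root_dir() {
    sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Usage on the host after `sudo systemctl restart docker`:
#   docker info | has_nvidia_runtime && echo "nvidia runtime registered"
#   docker info | docker_root_dir
```

If the `data-root` setting took effect, the second command should report `/data/docker`. Reading from stdin rather than calling `docker` directly keeps the helpers usable on saved output as well.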

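4. Optionally save the configured environment as an image

You can create an image at any time to preserve the software you have installed.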

5. Pull and test the xinference image (about 27 GB, so the download takes a long time)

https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html#docker-image

docker images
REPOSITORY          TAG       IMAGE ID       CREATED      SIZE
xprobe/xinference   latest    693b25cfab7c   6 days ago   27.7GB

docker run -e XINFERENCE_MODEL_SRC=modelscope -p 8080:9997 --gpus all xprobe/xinference:latest xinference-local -H 0.0.0.0 --log-level debug

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.


2024-07-05 01:10:35,270 xinference.core.supervisor 49 INFO     Xinference supervisor 0.0.0.0:55354 started
2024-07-05 01:10:35,886 xinference.core.worker 49 INFO     Starting metrics export server at 0.0.0.0:None
2024-07-05 01:10:35,889 xinference.core.worker 49 INFO     Checking metrics export server...
2024-07-05 01:10:38,039 xinference.core.worker 49 INFO     Metrics server is started at: http://0.0.0.0:37643

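You can then launch models from the web UI, which the port mapping in the `docker run` command exposes on host port 8080. It supports large language models, embedding models, rerank models, and more.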
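
Besides the web UI, the server can be checked from the command line. The sketch below polls the model listing endpoint `/v1/models` until the server responds. The helper names, the default host `localhost`, and the retry settings are assumptions for illustration; port 8080 is the host side of the `-p 8080:9997` mapping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the base URL of the Xinference endpoint as seen
# from the host. 8080 is the host side of the -p 8080:9997 mapping.
xinference_url() {
    local host="${1:-localhost}" port="${2:-8080}"
    printf 'http://%s:%s\n' "$host" "$port"
}

# Hypothetical helper: poll the model listing until the server answers,
# giving up after a fixed number of attempts (default 30, 2 s apart).
wait_for_xinference() {
    local url attempts="${3:-30}" i=0
    url="$(xinference_url "${1:-localhost}" "${2:-8080}")/v1/models"
    while [ "$i" -lt "$attempts" ]; do
        curl -fsS "$url" >/dev/null 2>&1 && return 0
        sleep 2
        i=$((i + 1))
    done
    return 1
}

# Usage: wait for the server, then list the running models.
#   wait_for_xinference localhost 8080 && curl -s "$(xinference_url)/v1/models"
```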

Downloading the model files

2024-07-05 01:10:48,112 xinference.core.worker 49 DEBUG    Leave get_model_count, elapsed time: 0 s
2024-07-05 01:10:48,112 xinference.core.worker 49 DEBUG    Enter launch_builtin_model, args: (<xinference.core.worker.WorkerActor object at 0x7fdbe88a96c0>,), kwargs: {'model_uid': 'sd3-medium-1-0', 'model_name': 'sd3-medium', 'model_size_in_billions': None, 'model_format': None, 'quantization': None, 'model_engine': None, 'model_type': 'image', 'n_gpu': 'auto', 'request_limits': None, 'peft_model_config': None, 'gpu_idx': None}
2024-07-05 01:10:48,112 xinference.core.worker 49 DEBUG    GPU selected: [0] for model sd3-medium-1-0
2024-07-05 01:10:54,021 xinference.model.image.core 49 DEBUG    Image model sd3-medium found in ModelScope.
2024-07-05 01:10:54,095 - modelscope - INFO - PyTorch version 2.3.0 Found.
2024-07-05 01:10:54,097 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-07-05 01:10:54,149 - modelscope - INFO - Loading done! Current index file version is 1.15.0, with md5 b12dbd38edf9ec469b2df324fc5cde83 and a total number of 980 components indexed
Downloading: 100%|██████████| 372/372 [00:00<00:00, 1.81kB/s]
Downloading: 100%|██████████| 740/740 [00:00<00:00, 3.70kB/s]
Downloading: 100%|██████████| 574/574 [00:00<00:00, 1.78kB/s]
Downloading: 100%|██████████| 570/570 [00:00<00:00, 2.86kB/s]
Downloading: 100%|██████████| 739/739 [00:00<00:00, 3.70kB/s]
Downloading: 100%|██████████| 59.0/59.0 [00:00<00:00, 295B/s]
Downloading: 100%|██████████| 160M/160M [00:13<00:00, 12.4MB/s] 
Downloading:  23%|██▎       | 918M/3.88G [01:06<02:09, 24.8MB/s]
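
The log shows several gigabytes being downloaded into `/root/.cache/modelscope` inside the container, so the files are lost when the container is removed. One way to avoid repeating the download is to mount host directories over the cache locations. The sketch below only prints such a `docker run` command so that it can be reviewed first. The helper name and the host directory are hypothetical, and `/root/.xinference` is assumed to be the default Xinference home inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a `docker run` command like the one in step 5,
# with host directories mounted over the container's cache locations so that
# downloaded models survive container removal.
# The host directory (default: $HOME/xinference-data) is an assumption.
xinference_run_cmd() {
    local data_dir="${1:-$HOME/xinference-data}"
    # Each argument is printed on its own line, followed by a trailing backslash.
    printf '%s \\\n' \
        "docker run -e XINFERENCE_MODEL_SRC=modelscope" \
        "  -v $data_dir/xinference:/root/.xinference" \
        "  -v $data_dir/modelscope:/root/.cache/modelscope" \
        "  -p 8080:9997 --gpus all"
    printf '%s\n' "  xprobe/xinference:latest xinference-local -H 0.0.0.0 --log-level debug"
}

# Usage: review the command, then run it.
#   xinference_run_cmd /data/xinference
#   eval "$(xinference_run_cmd /data/xinference)"
```

Printing the command instead of executing it keeps the helper free of side effects and makes the mount paths easy to check before anything is started.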