Deploying custom embedding models with xinference (Docker)


Notes:

  • First published: 2024-08-27
  • Official documentation: https://inference.readthedocs.io/zh-cn/latest/index.html

Deploy xinference with Docker

Create a Dockerfile with the following content:

FROM nvcr.io/nvidia/pytorch:23.10-py3

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

RUN python3 -m pip uninstall -y transformer-engine
RUN python3 -m pip install --upgrade pip


RUN python3 -m pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --no-cache-dir --index-url https://download.pytorch.org/whl/cu121

# If there are network issues, you can download the torch wheel file and install it locally:
# ADD torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl /root/torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl
# RUN python3 -m pip install /root/torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl


RUN python3 -m pip install packaging setuptools==69.5.1 --no-cache-dir -i https://mirror.baidu.com/pypi/simple
RUN python3 -m pip install -U ninja --no-cache-dir -i https://mirror.baidu.com/pypi/simple
RUN python3 -m pip install flash-attn==2.5.8 --no-build-isolation --no-cache-dir
RUN python3 -m pip install "xinference[all]" --no-cache-dir -i https://repo.huaweicloud.com/repository/pypi/simple

EXPOSE 9997

CMD ["sh", "-c", "tail -f /dev/null"]

Build the image:

docker build -t myxinference:latest .

Follow https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html#mount-your-volume-for-loading-and-saving-models to deploy the Docker service.

Also, if you pull models from Hugging Face, the https://hf-mirror.com/ mirror is recommended (remember to set the HF_ENDPOINT environment variable when deploying with Docker).

The rest of this post assumes the deployed service is available at http://localhost:9997

Deploy custom embedding models

Prepare the custom embedding model JSON files

Create the directory custom_models/embedding:

mkdir -p custom_models/embedding

Then create the following model definition JSON files:

360Zhinao-search.json:

{
    "model_name": "360Zhinao-search",
    "dimensions": 1024,
    "max_tokens": 512,
    "language": ["en", "zh"],
    "model_id": "qihoo360/360Zhinao-search",
    "model_format": "pytorch"
}

gte-Qwen2-7B-instruct.json:

{
    "model_name": "gte-Qwen2-7B-instruct",
    "dimensions": 4096,
    "max_tokens": 32768,
    "language": ["en", "zh"],
    "model_id": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "model_format": "pytorch"
}

zpoint_large_embedding_zh.json:

{
    "model_name": "zpoint_large_embedding_zh",
    "dimensions": 1792,
    "max_tokens": 512,
    "language": ["zh"],
    "model_id": "iampanda/zpoint_large_embedding_zh",
    "model_format": "pytorch"
}

Note: for a model that has already been downloaded locally, you can set the model_uri field instead, e.g. file:///path/to/llama-2-7b.
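The three definition files above share the same schema. A small Python helper (illustrative only; the field set simply mirrors the files shown above) can generate such a file and catch a missing field before you attempt registration:

```python
import json
from pathlib import Path

# Field set mirroring the JSON definition files shown above.
REQUIRED_FIELDS = [
    "model_name", "dimensions", "max_tokens",
    "language", "model_id", "model_format",
]

def write_embedding_spec(path: str, spec: dict) -> None:
    """Validate required fields, then write the spec as pretty-printed JSON."""
    missing = [f for f in REQUIRED_FIELDS if f not in spec]
    if missing:
        raise ValueError(f"spec is missing fields: {missing}")
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(
        json.dumps(spec, ensure_ascii=False, indent=4), encoding="utf-8"
    )

write_embedding_spec("custom_models/embedding/360Zhinao-search.json", {
    "model_name": "360Zhinao-search",
    "dimensions": 1024,
    "max_tokens": 512,
    "language": ["en", "zh"],
    "model_id": "qihoo360/360Zhinao-search",
    "model_format": "pytorch",
})
```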

Register the custom embedding models

xinference register --model-type embedding --file custom_models/embedding/360Zhinao-search.json --persist --endpoint http://localhost:9997

xinference register --model-type embedding --file custom_models/embedding/gte-Qwen2-7B-instruct.json --persist --endpoint http://localhost:9997

xinference register --model-type embedding --file custom_models/embedding/zpoint_large_embedding_zh.json --persist --endpoint http://localhost:9997

Launch the custom embedding models

xinference launch --model-type embedding --model-name gte-Qwen2-7B-instruct --model-engine transformers  --model-format pytorch --endpoint http://localhost:9997

xinference launch --model-type embedding --model-name 360Zhinao-search --model-engine transformers  --model-format pytorch --endpoint http://localhost:9997

xinference launch --model-type embedding --model-name zpoint_large_embedding_zh --model-engine transformers  --model-format pytorch --endpoint http://localhost:9997

Launch the bge-m3 and bge-reranker-base models

bge-m3 and bge-reranker-base are commonly used embedding and reranking models. Both ship with xinference, so they can be launched directly without registering a custom JSON definition.

xinference launch --model-name bge-m3 --model-type embedding --endpoint http://localhost:9997

xinference launch --model-name bge-reranker-base --model-type rerank --endpoint http://localhost:9997

Testing with curl

embedding:

curl http://localhost:9997/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "360Zhinao-search",
    "encoding_format": "float"
  }'
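The response carries the vector under data[0].embedding (OpenAI-compatible format). To compare two texts, you would typically take the cosine similarity of their vectors. A minimal, dependency-free sketch (the toy vectors are made up; in practice they come from the endpoint above):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding output:
v1 = [0.1, 0.3, 0.5]
v2 = [0.1, 0.3, 0.5]
print(cosine_similarity(v1, v2))  # identical vectors -> ~1.0
```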

reranking:

curl http://localhost:9997/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
  "model": "bge-reranker-base",
  "query": "I love you",
  "documents": [
    "I hate you",
    "I really like you",
    "天空是什么颜色的",
    "黑芝麻味饼干"
  ],
  "top_n": 3
}'
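The rerank endpoint returns scored results that reference the input documents by index (a Cohere-style shape; the sample payload below is illustrative, not captured from a live server). Mapping such a response back to the top documents:

```python
# Illustrative /v1/rerank response shape; field names follow the
# Cohere-style convention, and the scores are made up.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.11},
        {"index": 1, "relevance_score": 0.93},
        {"index": 2, "relevance_score": 0.02},
        {"index": 3, "relevance_score": 0.01},
    ]
}
documents = ["I hate you", "I really like you", "天空是什么颜色的", "黑芝麻味饼干"]

def top_documents(response: dict, documents: list[str], top_n: int) -> list[str]:
    """Sort ranked results by score and map them back to the document strings."""
    ranked = sorted(
        response["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    return [documents[r["index"]] for r in ranked[:top_n]]

print(top_documents(sample_response, documents, top_n=3))
```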