Deploying custom embedding models with xinference (Docker)


Notes:

  • First published: 2024-08-27
  • Official documentation: https://inference.readthedocs.io/zh-cn/latest/index.html

Deploy xinference with Docker

Create a Dockerfile with the following content:

FROM nvcr.io/nvidia/pytorch:23.10-py3

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

RUN python3 -m pip uninstall -y transformer-engine
RUN python3 -m pip install --upgrade pip


RUN python3 -m pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --no-cache-dir --index-url https://download.pytorch.org/whl/cu121

# If there are network issues, you can download the torch wheel file and install it locally:
# ADD torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl /root/torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl
# RUN python3 -m pip install /root/torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl


RUN python3 -m pip install packaging setuptools==69.5.1 --no-cache-dir -i https://mirror.baidu.com/pypi/simple
RUN python3 -m pip install -U ninja --no-cache-dir -i https://mirror.baidu.com/pypi/simple
RUN python3 -m pip install flash-attn==2.5.8 --no-build-isolation --no-cache-dir
RUN python3 -m pip install "xinference[all]" --no-cache-dir -i https://repo.huaweicloud.com/repository/pypi/simple

EXPOSE 9997

CMD ["sh", "-c", "tail -f /dev/null"]

Build the image:

docker build -t myxinference:latest .

Follow https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html#mount-your-volume-for-loading-and-saving-models to deploy the Docker service.

Also, if you pull models from Hugging Face, the https://hf-mirror.com/ mirror is recommended (remember to set the HF_ENDPOINT environment variable when deploying with Docker).

The rest of this post assumes the deployed service is available at http://localhost:9997

Deploy custom embedding models

Prepare the custom embedding model JSON files

Create the directory custom_models/embedding:

mkdir -p custom_models/embedding

Then create the following model definition JSON files:

360Zhinao-search.json:

{
    "model_name": "360Zhinao-search",
    "dimensions": 1024,
    "max_tokens": 512,
    "language": ["en", "zh"],
    "model_id": "qihoo360/360Zhinao-search",
    "model_format": "pytorch"
}

gte-Qwen2-7B-instruct.json:

{
    "model_name": "gte-Qwen2-7B-instruct",
    "dimensions": 4096,
    "max_tokens": 32768,
    "language": ["en", "zh"],
    "model_id": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "model_format": "pytorch"
}

zpoint_large_embedding_zh.json:

{
    "model_name": "zpoint_large_embedding_zh",
    "dimensions": 1792,
    "max_tokens": 512,
    "language": ["zh"],
    "model_id": "iampanda/zpoint_large_embedding_zh",
    "model_format": "pytorch"
}

Note: for a model that has already been downloaded locally, you can set the model_uri field instead, e.g. file:///path/to/llama-2-7b.
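The three definition files above share the same schema. A small Python helper (illustrative only; the field set simply mirrors the files shown above) can generate such a file and catch a missing field before you attempt registration:

```python
import json
from pathlib import Path

# Field set mirroring the JSON definition files shown above.
REQUIRED_FIELDS = [
    "model_name", "dimensions", "max_tokens",
    "language", "model_id", "model_format",
]

def write_embedding_spec(path: str, spec: dict) -> None:
    """Validate required fields, then write the spec as pretty-printed JSON."""
    missing = [f for f in REQUIRED_FIELDS if f not in spec]
    if missing:
        raise ValueError(f"spec is missing fields: {missing}")
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    Path(path).write_text(
        json.dumps(spec, ensure_ascii=False, indent=4), encoding="utf-8"
    )

write_embedding_spec("custom_models/embedding/360Zhinao-search.json", {
    "model_name": "360Zhinao-search",
    "dimensions": 1024,
    "max_tokens": 512,
    "language": ["en", "zh"],
    "model_id": "qihoo360/360Zhinao-search",
    "model_format": "pytorch",
})
```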

Register the custom embedding models

xinference register --model-type embedding --file custom_models/embedding/360Zhinao-search.json --persist --endpoint http://localhost:9997

xinference register --model-type embedding --file custom_models/embedding/gte-Qwen2-7B-instruct.json --persist --endpoint http://localhost:9997

xinference register --model-type embedding --file custom_models/embedding/zpoint_large_embedding_zh.json --persist --endpoint http://localhost:9997

Launch the custom embedding models

xinference launch --model-type embedding --model-name gte-Qwen2-7B-instruct --model-engine transformers  --model-format pytorch --endpoint http://localhost:9997

xinference launch --model-type embedding --model-name 360Zhinao-search --model-engine transformers  --model-format pytorch --endpoint http://localhost:9997

xinference launch --model-type embedding --model-name zpoint_large_embedding_zh --model-engine transformers  --model-format pytorch --endpoint http://localhost:9997

Launch the bge-m3 and bge-reranker-base models

bge-m3 and bge-reranker-base are commonly used embedding and reranking models. Both ship with xinference, so they can be launched directly without registering a custom JSON definition.

xinference launch --model-name bge-m3 --model-type embedding --endpoint http://localhost:9997

xinference launch --model-name bge-reranker-base --model-type rerank --endpoint http://localhost:9997

Testing with curl

embedding:

curl http://localhost:9997/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "360Zhinao-search",
    "encoding_format": "float"
  }'
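The response carries the vector under data[0].embedding (OpenAI-compatible format). To compare two texts, you would typically take the cosine similarity of their vectors. A minimal, dependency-free sketch (the toy vectors are made up; in practice they come from the endpoint above):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding output:
v1 = [0.1, 0.3, 0.5]
v2 = [0.1, 0.3, 0.5]
print(cosine_similarity(v1, v2))  # identical vectors -> ~1.0
```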

reranking:

curl http://localhost:9997/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
  "model": "bge-reranker-base",
  "query": "I love you",
  "documents": [
    "I hate you",
    "I really like you",
    "天空是什么颜色的",
    "黑芝麻味饼干"
  ],
  "top_n": 3
}'
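The rerank endpoint returns scored results that reference the input documents by index (a Cohere-style shape; the sample payload below is illustrative, not captured from a live server). Mapping such a response back to the top documents:

```python
# Illustrative /v1/rerank response shape; field names follow the
# Cohere-style convention, and the scores are made up.
sample_response = {
    "results": [
        {"index": 0, "relevance_score": 0.11},
        {"index": 1, "relevance_score": 0.93},
        {"index": 2, "relevance_score": 0.02},
        {"index": 3, "relevance_score": 0.01},
    ]
}
documents = ["I hate you", "I really like you", "天空是什么颜色的", "黑芝麻味饼干"]

def top_documents(response: dict, documents: list[str], top_n: int) -> list[str]:
    """Sort ranked results by score and map them back to the document strings."""
    ranked = sorted(
        response["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    return [documents[r["index"]] for r in ranked[:top_n]]

print(top_documents(sample_response, documents, top_n=3))
```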