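Xinference: A Framework for Large Model Deployment and Distributed Inference (Part 4): Integrating LoRA and Deploying Other Models (Vision, Embedding, Rerank, and Image Models)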

Latest recommended article published 2025-05-15 23:15:19

大模型面微调_


License: CC 4.0 BY-SA

Article tags: distributed, embedding, artificial intelligence, large models, AI large models, large model deployment, LoRA

Article link: https://blog.csdn.net/Code1994/article/details/142593292

IV. Integrating LoRA

When launching an LLM or an image model, Xinference can also load LoRA fine-tuned adapters that work on top of the base model.

1. Integrating LoRA at launch time

Xinference does not currently manage LoRA models. You must first download the LoRA model yourself and then pass its storage path to Xinference.

Using the command line:

xinference launch <options> \
    --lora-modules <lora_name1> <lora_model_path1> \
    --lora-modules <lora_name2> <lora_model_path2> \
    --image-lora-load-kwargs <load_params1> <load_value1> \
    --image-lora-load-kwargs <load_params2> <load_value2> \
    --image-lora-fuse-kwargs <fuse_params1> <fuse_value1> \
    --image-lora-fuse-kwargs <fuse_params2> <fuse_value2>

Using the Python client:

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

# Each LoRA adapter needs a name (used later to select it) and a local path
lora_model1 = {'lora_name': '<lora_name1>', 'local_path': '<lora_model_path1>'}
lora_model2 = {'lora_name': '<lora_name2>', 'local_path': '<lora_model_path2>'}
lora_models = [lora_model1, lora_model2]
# Image models only: extra kwargs for diffusers' load_lora_weights / fuse_lora
image_lora_load_kwargs = {'<load_params1>': <load_value1>, '<load_params2>': <load_value2>}
image_lora_fuse_kwargs = {'<fuse_params1>': <fuse_value1>, '<fuse_params2>': <fuse_value2>}

peft_model_config = {
    "image_lora_load_kwargs": image_lora_load_kwargs,
    "image_lora_fuse_kwargs": image_lora_fuse_kwargs,
    "lora_list": lora_models
}

client.launch_model(
    <other_options>,
    peft_model_config=peft_model_config
)

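Note: The image_lora_load_kwargs and image_lora_fuse_kwargs options apply only to image models. They correspond to the extra parameters of the load_lora_weights and fuse_lora interfaces in the diffusers library. If you are launching an LLM, you do not need to set them.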
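Because the image-only options can simply be omitted for an LLM, it can help to assemble peft_model_config in one place. The sketch below builds a dict in the same shape as the launch example above and adds the diffusers kwargs only for image models. The helper name build_peft_model_config, its model_type argument, and the uniqueness check on names are hypothetical choices for illustration, not part of the Xinference API.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_peft_model_config(
    loras: List[Tuple[str, str]],
    model_type: str = "LLM",
    load_kwargs: Optional[Dict[str, Any]] = None,
    fuse_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a peft_model_config dict.

    loras: (lora_name, local_path) pairs; the files must already be on disk,
    since Xinference does not download or manage LoRA models itself.
    model_type: "LLM" or "image" (labels chosen for this sketch).
    """
    names = [name for name, _ in loras]
    if len(names) != len(set(names)):
        # Names select adapters at request time, so duplicates are ambiguous
        raise ValueError("lora_name values must be unique")

    config: Dict[str, Any] = {
        "lora_list": [{"lora_name": n, "local_path": p} for n, p in loras]
    }
    if model_type == "image":
        # Extra kwargs forwarded to diffusers' load_lora_weights / fuse_lora
        config["image_lora_load_kwargs"] = dict(load_kwargs or {})
        config["image_lora_fuse_kwargs"] = dict(fuse_kwargs or {})
    elif load_kwargs or fuse_kwargs:
        raise ValueError("image LoRA kwargs only apply to image models")
    return config
```

For an LLM, build_peft_model_config([("<lora_name1>", "<lora_model_path1>")]) yields a config containing only lora_list, which can be passed as peft_model_config to client.launch_model.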

2. Applying LoRA at inference time

For an LLM, choose one of the launched LoRA adapters for each request by setting the lora_name parameter inside generate_config. The value must match a lora_name you configured at launch time.

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<model_uid>")
model.chat(
    "<prompt>",
    <other_options>,
    generate_config={"lora_name": "<your_lora_name>"}
)
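Since lora_name must match a name configured at launch, checking it on the client side catches typos before the request is sent. A minimal sketch, assuming you still have the lora_list that was passed in peft_model_config; the helper make_generate_config is hypothetical, and any extra keys (for example max_tokens) are passed through unchanged.

```python
from typing import Any, Dict, List


def make_generate_config(
    lora_name: str, lora_list: List[Dict[str, str]], **extra: Any
) -> Dict[str, Any]:
    """Hypothetical helper: build a generate_config that selects one LoRA.

    lora_list is the same list passed as peft_model_config["lora_list"]
    when the model was launched.
    """
    launched = {entry["lora_name"] for entry in lora_list}
    if lora_name not in launched:
        raise KeyError(
            f"LoRA {lora_name!r} was not configured at launch; "
            f"available: {sorted(launched)}"
        )
    # Other generation options are kept; lora_name is added on top
    return {**extra, "lora_name": lora_name}
```

It would be used as generate_config=make_generate_config("<your_lora_name>", lora_models) in the model.chat call above.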

V. Deploying Other Models

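Note: Problems may occur because of the Xinference version or because a model is not fully supported. If so, try an older Xinference version or switch to a similar model. Xinference support is expected to keep improving.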

1. Vision models

1) Deployment

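Vision models are machine learning or deep learning models that process and analyze visual data such as images and video. Their main goal is to understand and interpret visual information in order to perform tasks such as image classification, object detection, image segmentation, and image generation.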

Such a model can take images as input and answer questions about them.

Vision models are deployed in much the same way as LLMs. First click the Launch Model menu, then choose a multimodal model under the LANGUAGE MODELS tab.

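Enter a keyword to search for the model you want to deploy. In this example, we first filter the model list and then search for and select the glm-4v model.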

Fill in the deployment parameters and launch the model.

When deployment finishes, check the running models.

2) Using the Web UI

Chat with the vision model using images and text.

3) Using the API

The model can receive images in two main ways: by passing a link to the image, or by passing a base64-encoded image directly in the request.

1. Using the OpenAI client (image URL)

import openai

client = openai.Client(
    api_key="cannot be empty",
    base_url=f"http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
response = client.chat.completions.create(
    model="<MODEL_UID>",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://xxxx.jpg",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0])

2. Uploading a base64-encoded image

import openai
import base64

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
b64_img = encode_image(image_path)

client = openai.Client(
    api_key="cannot be empty",
    base_url=f"http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
response = client.chat.completions.create(
    model="<MODEL_UID>",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What’s in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{b64_img}",
                    },
                },
            ],
        }
    ],
)
print(response.choices[0])
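The example above hard-codes image/jpeg in the data URL. A minimal sketch that builds the same image_url content part for either a remote link or a local file, guessing the MIME type from the file extension with Python's standard mimetypes module. The helper name image_part is hypothetical, and falling back to image/jpeg for unknown extensions is an assumption.

```python
import base64
import mimetypes
from typing import Any, Dict


def image_part(source: str) -> Dict[str, Any]:
    """Hypothetical helper: return an OpenAI-style image_url content part.

    http(s) links are passed through unchanged; anything else is treated as
    a local file path and embedded as a base64 data URL.
    """
    if source.startswith(("http://", "https://")):
        url = source
    else:
        mime, _ = mimetypes.guess_type(source)
        mime = mime or "image/jpeg"  # assumed fallback for unknown extensions
        with open(source, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")
        url = f"data:{mime};base64,{b64}"
    return {"type": "image_url", "image_url": {"url": url}}
```

The "content" list in the request then becomes [{"type": "text", "text": "What’s in this image?"}, image_part("path_to_your_image.png")], whether the image is a link or a local file.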

2. Embedding models

1) Deployment

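An embedding model converts high-dimensional data (such as text, images, or other data types) into low-dimensional vector representations. These representations capture semantic and structural information, so similar objects end up closer together in the vector space.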

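Text embeddings quantify how related different texts are. They are used in many applications, including search, clustering, recommendation, anomaly detection, diversity measurement, and classification.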

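An embedding is a vector of floating-point numbers. The distance between two vectors indicates how similar they are: a smaller distance means higher relatedness, and a larger distance means lower relatedness.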
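To make the distance idea concrete, here is a minimal pure-Python sketch of two common measures: Euclidean distance (smaller means more related) and cosine similarity (closer to 1 means more related). The three-dimensional vectors are made-up toy values; real bge-base-zh-v1.5 vectors have 768 dimensions, but the arithmetic is the same.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors; smaller means more related."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; closer to 1 means more related."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


# Toy vectors, made up for illustration only
query = [0.9, 0.1, 0.0]
close = [0.8, 0.2, 0.1]
far = [0.0, 0.1, 0.9]
print(euclidean_distance(query, close) < euclidean_distance(query, far))  # True
print(cosine_similarity(query, close) > cosine_similarity(query, far))  # True
```

Cosine similarity ignores vector length and compares only direction, which is why it is a common choice for comparing text embeddings.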

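First click the Launch Model menu and open the Embedding Models tab. Enter a keyword to search for the model you want to deploy; here we search for and select bge-base-zh-v1.5 as an example.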

The model parameters need little or no configuration; simply launch the model.

Wait until the model has been deployed and is running.

2) Using the API

Calling the API with curl:

curl -X 'POST' \
  'http://localhost:9997/v1/embeddings' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "bge-base-zh-v1.5",
  "input": "你好啊"
}'

Response from the embedding model:

{
"object":"list","model":"bge-base-zh-v1.5-1-0",
"data":[{"index":0,"object":"embedding",
"embedding":[0.029834920540452003,-0.019862590357661247,.......,-0.006424838211387396,0.012447659857571125,-0.05162930488586426]}],
"usage":{"prompt_tokens":37,"total_tokens":37}
}
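The vectors sit under data[i].embedding, one entry per input string, with index giving the position of the input. A minimal sketch that extracts them in input order from a response shaped like the one above; the toy response uses made-up two-dimensional vectors, and sorting by index rather than trusting list order is a defensive choice on our part.

```python
import json
from typing import Any, Dict, List


def extract_embeddings(response: Dict[str, Any]) -> List[List[float]]:
    """Return embedding vectors ordered by their input index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Toy response in the shape shown above; vectors are made up and truncated
raw = """{
  "object": "list", "model": "bge-base-zh-v1.5-1-0",
  "data": [
    {"index": 1, "object": "embedding", "embedding": [0.3, 0.4]},
    {"index": 0, "object": "embedding", "embedding": [0.1, 0.2]}
  ],
  "usage": {"prompt_tokens": 8, "total_tokens": 8}
}"""
response = json.loads(raw)
vectors = extract_embeddings(response)
print(vectors)  # [[0.1, 0.2], [0.3, 0.4]]
print(response["usage"]["total_tokens"])  # 8
```

With the OpenAI Python client the same fields are object attributes, for example [d.embedding for d in resp.data].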

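Using the OpenAI client:

import openai

client = openai.Client(
    api_key="cannot be empty",
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
response = client.embeddings.create(
    model="<MODEL_UID>",
    input=["What is the capital of China?"]
)
print(response.data[0].embedding[:5])  # first few values of the vector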

Using the Xinference client:

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

model = client.get_model("<MODEL_UID>")
# Named input_text to avoid shadowing Python's built-in input()
input_text = "What is the capital of China?"
model.create_embedding(input_text)

3. Rerank models

1) Deployment

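Given a query and a list of documents, Rerank reorders the documents from most to least relevant according to their semantic relevance to the query. In Xinference, you call a Rerank model through the Rerank endpoint to rank a list of documents.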

First click the Launch Model menu and open the Rerank Models tab. Enter a keyword to search for the model you want to deploy; here we search for and select bge-reranker-base as an example.

The model parameters need little or no configuration; simply launch the model.

Wait until the model has been deployed and is running.

2) Using the API

You can try the Rerank API with cURL or the Xinference Python client (the OpenAI client has no rerank endpoint):

curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/rerank' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<MODEL_UID>",
    "query": "A man is eating pasta.",
    "documents": [
        "A man is eating food.",
        "A man is eating a piece of bread.",
        "The girl is carrying a baby.",
        "A man is riding a horse.",
        "A woman is playing violin."
    ]
  }'

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")
model = client.get_model("<MODEL_UID>")

query = "A man is eating pasta."
corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
    "A man is riding a horse.",
    "A woman is playing violin."
]
print(model.rerank(corpus, query))
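The rerank call returns scores rather than a reordered list of strings, so a small post-processing step maps results back to the original documents. A minimal sketch, assuming each entry in response["results"] carries index (the position in the input list) and relevance_score, as in Xinference's Cohere-style rerank output; check the actual response of your version before relying on these field names. The helper ranked_documents and the scores below are made up for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


def ranked_documents(
    response: Dict[str, Any], documents: List[str], top_n: Optional[int] = None
) -> List[Tuple[float, str]]:
    """Map rerank results back to the input texts, highest score first.

    Assumes response["results"] entries have "index" (position in
    `documents`) and "relevance_score" keys.
    """
    results = sorted(
        response["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    pairs = [(r["relevance_score"], documents[r["index"]]) for r in results]
    return pairs[:top_n] if top_n is not None else pairs


corpus = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
]
# Made-up scores, deliberately out of order, to exercise the sorting
fake_response = {
    "results": [
        {"index": 2, "relevance_score": 0.01},
        {"index": 0, "relevance_score": 0.95},
        {"index": 1, "relevance_score": 0.40},
    ]
}
for score, doc in ranked_documents(fake_response, corpus, top_n=2):
    print(f"{score:.2f}  {doc}")
```

Sorting on the client side keeps the code correct even if the server already returns results in ranked order.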

4. Image models

1) Deployment

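Image models are machine learning or deep learning models that process, analyze, and understand image data. They can perform tasks such as image classification, object detection, image segmentation, and image generation.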

First click the Launch Model menu and choose an image model under the Image Models tab. Here we search for and select stable-diffusion-v1.5 as an example.

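The model parameters need little or no configuration; simply launch the model. Here we also specify the hub from which the model is downloaded.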

When deployment finishes, check the running models.

2) Using the Web UI

This web interface offers text-to-image, image-to-image, and other features.

3) Using the API

You can try the Text-to-image API with cURL, the OpenAI client, or the Xinference client.

The Images API provides two ways to work with images:

The text-to-image endpoint creates an image from scratch based on a text prompt.

The image-to-image endpoint generates variations of a given image.

API	OpenAI-compatible endpoint
Text-to-Image API	/v1/images/generations
Image-to-image API	/v1/images/variations

Using curl

curl -X 'POST' \
  'http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1/images/generations' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "<MODEL_UID>",
    "prompt": "an apple"
  }'

Using the OpenAI client

import openai

client = openai.Client(
    api_key="cannot be empty",
    base_url="http://<XINFERENCE_HOST>:<XINFERENCE_PORT>/v1"
)
client.images.generate(
    model="<MODEL_UID>",
    prompt="an apple"
)

Using the Xinference client

from xinference.client import Client

client = Client("http://<XINFERENCE_HOST>:<XINFERENCE_PORT>")

model = client.get_model("<MODEL_UID>")
input_text = "an apple"
model.text_to_image(input_text)
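If you request base64 output instead of a URL, the image bytes arrive inside the JSON response. A minimal sketch that writes them to disk, assuming an OpenAI-style Images API response of the form {"created": ..., "data": [{"url": ..., "b64_json": ...}]} and that your Xinference version accepts response_format="b64_json". The helper name save_b64_images, the file naming scheme, and the .png extension are assumptions for illustration.

```python
import base64
import os
from typing import Any, Dict, List


def save_b64_images(
    response: Dict[str, Any], out_dir: str, stem: str = "image"
) -> List[str]:
    """Decode every b64_json entry in an Images-API-style response to a file.

    Entries without b64_json (for example URL-only results) are skipped.
    Returns the list of paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, item in enumerate(response.get("data", [])):
        b64 = item.get("b64_json")
        if not b64:
            continue
        path = os.path.join(out_dir, f"{stem}_{i}.png")  # extension assumed
        with open(path, "wb") as f:
            f.write(base64.b64decode(b64))
        paths.append(path)
    return paths
```

Under those assumptions, usage would look like save_b64_images(model.text_to_image("an apple", response_format="b64_json"), "outputs").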