Introduction
This article is a guide to using the Qwen-Audio-Chat audio large model offline with swift (roughly 16 GB of GPU memory required).
Environment Setup
conda create -n swift python=3.10.16
conda activate swift
pip install ms-swift==3.2.0.post2
pip install vllm==0.7.3
pip install lmdeploy==0.7.1
pip install transformers==4.49.0
conda install ffmpeg=4.3 --channel conda-forge
Running Inference from the Command Line
MODELSCOPE_CACHE=./.cache/modelscope/hub CUDA_VISIBLE_DEVICES=0 swift infer \
--model Qwen/Qwen-Audio-Chat \
--infer_backend pt
<<< 你是谁?
我是来自达摩院的大规模语言模型,我叫通义千问。
--------------------------------------------------
<<< <audio>这是首什么样的音乐
Input an audio path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/music.wav
这是一首风格是Pop的音乐。
--------------------------------------------------
<<< <audio>这段语音说了什么
Input an audio path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav
这段语音中说了中文:"今天天气真好呀"。
--------------------------------------------------
<<< 这段语音是男生还是女生
根据音色判断,这段语音是男性。
--------------------------------------------------
Parameter explanations:
MODELSCOPE_CACHE: path where the model weights are stored.
CUDA_VISIBLE_DEVICES: GPU index to use.
model: name of the model to run. All supported models are listed in the "Supported Models and Datasets" document.
infer_backend: inference backend.
For more inference parameters, see here.
Deploying the Model as a Service
MODELSCOPE_CACHE=./.cache/modelscope/hub CUDA_VISIBLE_DEVICES=0 swift deploy \
--model Qwen/Qwen-Audio-Chat \
--infer_backend pt \
--served_model_name Qwen-Audio-Chat \
--port 8001
Parameter explanations:
MODELSCOPE_CACHE: path where the model weights are stored.
CUDA_VISIBLE_DEVICES: GPU index to use.
model: name of the model to deploy. All supported models are listed in the "Supported Models and Datasets" document.
infer_backend: inference backend.
served_model_name: alias under which the deployed model is served.
port: port number; defaults to 8000.
For more deployment parameters, see here.
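Before wiring up a client, it can help to confirm the service is up. A minimal sketch, assuming the deployment exposes the OpenAI-compatible `/v1/models` route (the same route the openai client below uses to list models):

```python
import json
from urllib.request import urlopen

def model_ids(payload):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m['id'] for m in payload.get('data', [])]

def list_served_models(base_url='http://127.0.0.1:8001'):
    """Query the running deployment for the models it serves."""
    with urlopen(f'{base_url}/v1/models') as resp:
        return model_ids(json.load(resp))
```

With the server started as above, `list_served_models()` should return `['Qwen-Audio-Chat']`.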
Calling the Model API from a Client
- curl
- Input example
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen-Audio-Chat",
"messages": [{"role": "user", "content": [
{"type": "audio", "audio": "http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav"},
{"type": "text", "text": "What does this audio say?"}
]}]
}'
- Output example
{
"model": "Qwen-Audio-Chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The audio says: \"今天天气真好呀\".",
"tool_calls": null
},
"finish_reason": "stop",
"logprobs": null
}
],
"usage": {
"prompt_tokens": 83,
"completion_tokens": 12,
"total_tokens": 95
},
"id": "chatcmpl-692050fbe65b4c06bc7872816f23f410",
"object": "chat.completion",
"created": 1741684676
}
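The usage block reports token accounting for the call. Its fields are internally consistent (83 + 12 = 95), which can be checked programmatically:

```python
# Usage counters copied from the sample response above
usage = {"prompt_tokens": 83, "completion_tokens": 12, "total_tokens": 95}

# total_tokens is the sum of prompt and completion tokens
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
```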
- openai
- Code example
from openai import OpenAI
client = OpenAI(
api_key='EMPTY',
    base_url='http://127.0.0.1:8001/v1',
)
model = client.models.list().data[0].id
print(f'model: {model}')
messages = [{'role': 'user', 'content': [
{'type': 'audio', 'audio': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav'},
{'type': 'text', 'text': 'What does this audio say?'}
]}]
resp = client.chat.completions.create(model=model, messages=messages, max_tokens=512, temperature=0)
query = messages[0]['content']
response = resp.choices[0].message.content
print(f'query: {query}')
print(f'response: {response}')
- Output example
model: Qwen-Audio-Chat
query: [{'type': 'audio', 'audio': 'http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav'}, {'type': 'text', 'text': 'What does this audio say?'}]
response: The audio says: "今天天气真好呀".
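For interactive use, the same request can also be streamed token by token. A hedged sketch, assuming the deployment honors the OpenAI-compatible `stream=True` flag (the field names follow the OpenAI Python SDK; not verified against this particular server):

```python
def stream_chat(client, model, messages):
    """Yield text fragments as the server streams a chat completion."""
    stream = client.chat.completions.create(
        model=model, messages=messages,
        max_tokens=512, temperature=0, stream=True)
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk may carry no content
            yield delta

# Usage with the client from the code example above:
#   for fragment in stream_chat(client, model, messages):
#       print(fragment, end='', flush=True)
```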
- swift
- Code example
from swift.llm import InferRequest, InferClient, RequestConfig
from swift.plugin import InferStats
engine = InferClient(host='127.0.0.1', port=8001)
print(f'models: {engine.models}')
metric = InferStats()
request_config = RequestConfig(max_tokens=512, temperature=0)
# Use two infer_requests to demonstrate batch inference
infer_requests = [
InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}]),
InferRequest(messages=[{'role': 'user', 'content': '<audio>What does this audio say?'}],
audios=['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav']),
]
resp_list = engine.infer(infer_requests, request_config, metrics=[metric])
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')
print(metric.compute())
metric.reset()
- Output example
models: ['Qwen-Audio-Chat']
100%|██████████| 2/2 [00:01<00:00, 1.10it/s]
response0: I am a large language model created by DAMO Academy. I am called QianWen.
response1: The audio says: "今天天气真好呀".
{'num_prompt_tokens': 106, 'num_generated_tokens': 33, 'num_samples': 2, 'runtime': 1.8124737851321697, 'samples/s': 1.1034642356795, 'tokens/s': 18.20715988871175}
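The throughput figures reported by InferStats follow directly from the raw counters: samples/s is num_samples / runtime and tokens/s is num_generated_tokens / runtime, which can be reproduced from the printed dict:

```python
# Counters copied from the InferStats output above
stats = {'num_prompt_tokens': 106, 'num_generated_tokens': 33,
         'num_samples': 2, 'runtime': 1.8124737851321697}

# Derived throughput matches the reported samples/s and tokens/s
samples_per_s = stats['num_samples'] / stats['runtime']
tokens_per_s = stats['num_generated_tokens'] / stats['runtime']
print(samples_per_s, tokens_per_s)  # ≈ 1.1035 samples/s, ≈ 18.207 tokens/s
```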
References
https://swift.readthedocs.io/zh-cn/latest/Instruction/推理和部署.html