GLM-4-Voice is an end-to-end emotional speech model developed by Zhipu AI.
GLM-4-Voice consists of three components: a Tokenizer, a Decoder, and a 9B model. The Tokenizer converts continuous speech into discrete tokens, the Decoder converts those tokens back into continuous speech output, and the 9B model, pre-trained and aligned on top of GLM-4-9B, understands and generates the discretized speech.
GLM-4-Voice can simulate different emotions and intonations such as happiness, sadness, anger, and fear, and reply in an appropriately matched emotional tone, which makes its emotional expression more natural and nuanced than traditional TTS.
GLM-4-Voice supports spoken Chinese and English as well as several Chinese dialects, and is particularly strong in the Beijing dialect, the Chongqing dialect, and Cantonese.
GLM-4-Voice also lets the user interrupt the speech output at any time and issue new instructions to steer the conversation, which makes dialogue more flexible and closer to everyday conversational habits.
GLM-4-Voice can serve customer-support systems, virtual assistants, educational software, and other scenarios, providing a more human-like service experience.
GitHub project: https://github.com/THUDM/GLM-4-Voice
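To make the three-stage architecture above concrete, here is a purely illustrative sketch of the speech-to-speech flow. Every function below is a stub with a hypothetical name; none of them are the repository's actual API.
# NOTE: illustrative stubs only; none of these names exist in GLM-4-Voice.
from typing import List

def tokenize_speech(waveform: bytes) -> List[int]:
    """Tokenizer: continuous speech -> discrete speech tokens (stub)."""
    return [0, 1, 2]

def generate_reply(tokens: List[int]) -> List[int]:
    """9B model: understands the input tokens and generates reply tokens (stub)."""
    return list(reversed(tokens))

def decode_speech(tokens: List[int]) -> bytes:
    """Decoder: discrete speech tokens -> continuous speech waveform (stub)."""
    return bytes(tokens)

reply_audio = decode_speech(generate_reply(tokenize_speech(b"input audio")))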
I. Environment Setup
1. Python environment
Python 3.10 or later is recommended.
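For example, one way to set this up is with conda (the environment name glm4-voice is just a placeholder):
conda create -n glm4-voice python=3.10 -y
conda activate glm4-voice
python --version  # should report 3.10 or later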
2. Installing pip packages
pip install torch==2.3.0+cu118 torchvision==0.18.0+cu118 torchaudio==2.3.0 --extra-index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
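After the packages are installed, a quick sanity check that PyTorch was built against CUDA 11.8 and can see the GPU (a one-off check, not part of the project itself):
import torch

print(torch.__version__)          # expect something like 2.3.0+cu118
print(torch.cuda.is_available())  # should print True on a machine with a CUDA 11.8 driver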
3. Download the glm-4-voice-9b model:
git lfs install
git clone https://modelscope.cn/models/ZhipuAI/glm-4-voice-9b
4. Download the glm-4-voice-tokenizer model:
git lfs install
git clone https://modelscope.cn/models/ZhipuAI/glm-4-voice-tokenizer
5. Download the glm-4-voice-decoder model:
git lfs install
git clone https://modelscope.cn/models/ZhipuAI/glm-4-voice-decoder
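Assuming the three repositories were cloned into the current working directory as above, a quick check that all the weights are in place might look like this:
import os

# paths match the git clone commands above; adjust if you cloned elsewhere
for name in ("glm-4-voice-9b", "glm-4-voice-tokenizer", "glm-4-voice-decoder"):
    print(name, "->", "found" if os.path.isdir(name) else "MISSING")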
II. Functional Testing
1. Running the test:
(1) Calling the model from Python
import argparse
import json
from queue import Queue
from threading import Thread

import torch
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from transformers import AutoModel, AutoTokenizer

class TokenStreamer:
    """Minimal streamer: generate() pushes token ids in, the HTTP handler iterates them out."""
    def __init__(self, skip_prompt: bool = False, timeout=None):
        self.skip_prompt = skip_prompt
        self.token_queue = Queue()
        self.stop_signal = object()
        self.next_tokens_are_prompt = True
        self.timeout = timeout

    def put(self, value):
        # generate() may hand over a batched tensor; flatten the batch dimension first
        if value.dim() > 1:
            value = value.squeeze(0)
        # the first call delivers the prompt tokens; skip them if requested
        if self.skip_prompt and self.next_tokens_are_prompt:
            self.next_tokens_are_prompt = False
            return
        for token in value.tolist():
            self.token_queue.put(token)

    def end(self):
        self.token_queue.put(self.stop_signal)

    def __iter__(self):
        return self

    def __next__(self):
        value = self.token_queue.get(timeout=self.timeout)
        if value is self.stop_signal:
            raise StopIteration
        return value

class ModelWorker:
    def __init__(self, model_path, device='cuda'):
        self.device = device
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
        self.model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(device).eval()

    @torch.inference_mode()
    def generate_stream(self, params):
        prompt = params["prompt"]
        temperature = float(params.get("temperature", 1.0))
        top_p = float(params.get("top_p", 1.0))
        max_new_tokens = int(params.get("max_new_tokens", 256))
        inputs = self.tokenizer([prompt], return_tensors="pt").to(self.device)
        streamer = TokenStreamer(skip_prompt=True)
        # run generate() in a background thread so tokens can be yielded as they arrive;
        # note **inputs: the tokenizer returns a dict (input_ids, attention_mask) that must
        # be unpacked into keyword arguments, not passed as a single 'inputs' value
        thread = Thread(target=self.model.generate, kwargs=dict(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=top_p,
            streamer=streamer,
        ))
        thread.start()
        for token_id in streamer:
            yield json.dumps({"token_id": token_id, "error_code": 0}) + "\n"

    def generate_stream_gate(self, params):
        try:
            yield from self.generate_stream(params)
        except Exception as e:
            print("Caught Unknown Error:", e)
            yield json.dumps({"text": "Server Error", "error_code": 1}) + "\n"

app = FastAPI()

@app.post("/generate_stream")
async def generate_stream_endpoint(request: Request):
    params = await request.json()
    return StreamingResponse(worker.generate_stream_gate(params), media_type="application/json")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="localhost")
    parser.add_argument("--port", type=int, default=10000)
    parser.add_argument("--model-path", type=str, default="glm-4-voice-9b")
    args = parser.parse_args()
    worker = ModelWorker(args.model_path, device='cuda')
    uvicorn.run(app, host=args.host, port=args.port, log_level="info")
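With the server running, the endpoint can be exercised from another terminal. Below is a minimal client sketch, assuming the requests package is installed; the prompt string is a placeholder, since glm-4-voice-9b expects the specific prompt format documented in the GLM-4-Voice repository:
import json
import requests

# stream token ids line by line from the server started above
resp = requests.post(
    "http://localhost:10000/generate_stream",
    json={"prompt": "<your formatted prompt here>", "max_new_tokens": 64},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(json.loads(line))  # e.g. {"token_id": 12345, "error_code": 0}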
To be continued...
For more details, follow: 杰哥新技术