CosyVoice Deployment and Testing

CosyVoice is a large multilingual speech generation model that covers the full workflow of inference, training, and deployment.

1. Installation

1.1 Clone the source code

git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If cloning the submodules fails because of network problems, rerun the following commands until they succeed
cd CosyVoice
git submodule update --init --recursive

1.2 Set up the environment

conda create -n cosyvoice python=3.8
conda activate cosyvoice
# pynini is required by WeTextProcessing; install it with conda, since the conda package works on all platforms.
conda install -y -c conda-forge pynini==2.1.5

# Run from the repository root (skip this cd if you are already inside CosyVoice)
cd CosyVoice
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
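
Before downloading the models, it can help to confirm that the environment resolved correctly. The following is a minimal sketch, not part of the CosyVoice repository: `missing_modules` is a hypothetical helper, and the module list (torch, torchaudio, pynini) is an assumption based on the packages this guide uses. Adjust it to match requirements.txt.

```python
import importlib.util
import shutil
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The module list below is an assumption, not taken from requirements.txt.
    print("python:", sys.version.split()[0])
    print("missing modules:", missing_modules(["torch", "torchaudio", "pynini"]) or "none")
    # shutil.which returns None when the sox binary is not on PATH.
    print("sox binary:", shutil.which("sox") or "not found on PATH")
```

Run it inside the activated cosyvoice environment. An empty "missing modules" list only shows that the packages are importable, not that their versions are compatible.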

1.3 Download the models

# Download with git; make sure git lfs is installed first
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd

# Alternatively, download with the ModelScope SDK (run in Python)
from modelscope import snapshot_download
snapshot_download('iic/CosyVoice-300M', local_dir='pretrained_models/CosyVoice-300M')
snapshot_download('iic/CosyVoice-300M-SFT', local_dir='pretrained_models/CosyVoice-300M-SFT')
snapshot_download('iic/CosyVoice-300M-Instruct', local_dir='pretrained_models/CosyVoice-300M-Instruct')
snapshot_download('iic/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
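
If the models were cloned with git but git lfs was not active, the large weight files are left as small text stubs and loading fails later. The sketch below looks for such stubs by checking for the standard Git LFS pointer header. `find_lfs_pointers` is a hypothetical helper name, and the directory layout is the one used in the commands above.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir):
    """Return files under model_dir that are still Git LFS pointer stubs."""
    stubs = []
    for path in sorted(Path(model_dir).rglob("*")):
        # Skip git metadata and anything that is not a regular file.
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                stubs.append(path)
    return stubs


if __name__ == "__main__":
    for name in ("CosyVoice-300M", "CosyVoice-300M-SFT",
                 "CosyVoice-300M-Instruct", "CosyVoice-ttsfrd"):
        model_dir = Path("pretrained_models") / name
        if not model_dir.is_dir():
            print(name, "-> directory missing")
        else:
            stubs = find_lfs_pointers(model_dir)
            print(name, "->", f"{len(stubs)} LFS stub(s)" if stubs else "ok")
```

If any stubs are reported, running git lfs pull inside that model directory should fetch the real files.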

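2. Basic usage

For zero-shot and cross-lingual inference, use the CosyVoice-300M model. For SFT inference, use the CosyVoice-300M-SFT model. For instruct inference, use the CosyVoice-300M-Instruct model. First, add third_party/Matcha-TTS to your PYTHONPATH: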

export PYTHONPATH=third_party/Matcha-TTS
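
Where setting an environment variable is inconvenient (a notebook, for instance), the same path can be added from inside the script instead. This is a sketch that assumes the script runs from the repository root; `add_matcha_to_path` is a hypothetical helper, not part of CosyVoice.

```python
import os
import sys


def add_matcha_to_path(repo_root="."):
    """Put third_party/Matcha-TTS at the front of sys.path and return its absolute path."""
    matcha_dir = os.path.abspath(os.path.join(repo_root, "third_party", "Matcha-TTS"))
    if matcha_dir not in sys.path:
        sys.path.insert(0, matcha_dir)
    return matcha_dir


# Call this before `from cosyvoice.cli.cosyvoice import CosyVoice`.
add_matcha_to_path()
```

The effect lasts only for the current Python process, so it does not replace the export for other tools such as webui.py.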

Copy the following code into test.py, then run python test.py:

from cosyvoice.cli.cosyvoice import CosyVoice
from cosyvoice.utils.file_utils import load_wav
import torchaudio

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
# sft usage
print(cosyvoice.list_avaliable_spks())
# change stream=True for chunk stream inference
for i, j in enumerate(cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女', stream=False)):
    torchaudio.save('sft_{}.wav'.format(i), j['tts_speech'], 22050)

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M')
# zero_shot usage, <|zh|><|en|><|jp|><|yue|><|ko|> for Chinese/English/Japanese/Cantonese/Korean
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_zero_shot('收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。', '希望你以后能够做的比我还好呦。', prompt_speech_16k, stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], 22050)
# cross_lingual usage
prompt_speech_16k = load_wav('cross_lingual_prompt.wav', 16000)
for i, j in enumerate(cosyvoice.inference_cross_lingual('<|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that\'s coming into the family is a reason why sometimes we don\'t buy the whole thing.', prompt_speech_16k, stream=False)):
    torchaudio.save('cross_lingual_{}.wav'.format(i), j['tts_speech'], 22050)

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-Instruct')
# instruct usage, support <laughter></laughter><strong></strong>[laughter][breath]
for i, j in enumerate(cosyvoice.inference_instruct('在面对挑战时,他展现了非凡的<strong>勇气</strong>与<strong>智慧</strong>。', '中文男', 'Theo \'Crimson\', is a fiery, passionate rebel leader. Fights with fervor for justice, but struggles with impulsiveness.', stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], 22050)
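
The script above writes one file per yielded item. With stream=True the generator is expected to yield several shorter chunks, so it may be more convenient to join them into a single file. The sketch below assumes each item is a dict whose 'tts_speech' value is a tensor of shape (channels, samples), as the calls above suggest. `save_stream` is a hypothetical helper, and its `concat` and `save` parameters exist only so it can be tried without a model.

```python
def save_stream(chunks, path, sample_rate=22050, concat=None, save=None):
    """Join streamed chunks along the time axis and write them to one file.

    By default this uses torch.cat(dim=1) and torchaudio.save; both can be
    replaced through the `concat` and `save` arguments.
    Returns the number of chunks that were joined.
    """
    if concat is None:
        import torch
        concat = lambda parts: torch.cat(parts, dim=1)
    if save is None:
        import torchaudio
        save = torchaudio.save
    parts = [chunk["tts_speech"] for chunk in chunks]
    if not parts:
        raise ValueError("the generator yielded no audio chunks")
    save(path, concat(parts), sample_rate)
    return len(parts)


# Usage with the SFT model loaded as in the script above (assumed, not run here):
# n = save_stream(cosyvoice.inference_sft('你好', '中文女', stream=True), 'sft_stream.wav')
```

Collecting all chunks before saving gives up the latency benefit of streaming; it is meant for checking the output, not for real-time playback.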

3. Web demo

# use pretrained_models/CosyVoice-300M-SFT for sft inference, or pretrained_models/CosyVoice-300M-Instruct for instruct inference
python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M
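
Once webui.py has finished loading the model, the demo should be reachable on the chosen port. The sketch below checks this from another terminal. It assumes the server listens on the local machine at port 50000, as in the command above; `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host="127.0.0.1", port=50000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("web demo reachable:", port_is_open())
```

A successful connection only shows that something is listening on the port; open http://localhost:50000 in a browser to confirm it is the demo page.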
