CosyVoice: Reshaping Voice with Natural Speech Across Languages!

Author: 算家计算

Last modified: 2024-07-31 16:41:20

Category: Model Building; Tags: Artificial Intelligence, AIGC

First published: 2024-07-31 15:57:47

Article link: https://blog.csdn.net/SJJS_1/article/details/140824543


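I. Introduction

CosyVoice is a powerful open-source TTS (text-to-speech) model. It supports several generation modes (inference with pretrained speakers (SFT), zero-shot voice cloning from a short reference clip, cross-lingual synthesis, and instruction-driven synthesis) and produces highly natural, controllable speech.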

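II. Features

Speech synthesis: converts text into natural, fluent speech.
Multilingual support: synthesizes speech in multiple languages, such as English and Chinese.
Personalization: the voice can be chosen from pretrained speakers or cloned from a reference recording, and other parameters such as speaking rate may also be adjustable, allowing personalized speech output.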

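It can be applied in many scenarios, such as:

Assistive technology: voice assistants, automated phone systems, and accessibility tools, helping users access information more easily.
Creative content: artists and content creators can use it to generate distinctive voice effects.
Education and training: voice-overs for educational content or training materials.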

III. Deployment

1. Clone and install

Clone the repository:

git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If cloning the submodules fails because of network errors, rerun the following commands until they succeed
cd CosyVoice
git submodule update --init --recursive
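
The comment above says to rerun the submodule update until it succeeds. Below is a minimal Python sketch of that retry loop; the function name `run_until_success` and its attempt and back-off defaults are illustrative assumptions, not part of the CosyVoice repository.

```python
import subprocess
import time
from typing import Optional, Sequence


def run_until_success(cmd: Sequence[str], max_attempts: int = 5,
                      delay_s: float = 2.0, cwd: Optional[str] = None) -> bool:
    """Re-run `cmd` until it exits with status 0 or the attempts run out.

    Hypothetical helper: returns True on success, False if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(list(cmd), cwd=cwd)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{max_attempts} failed "
              f"(exit code {result.returncode})")
        if attempt < max_attempts:
            # Linear back-off: wait a little longer after each failure.
            time.sleep(delay_s * attempt)
    return False
```

For example, from the directory that contains the clone: `run_until_success(["git", "submodule", "update", "--init", "--recursive"], cwd="CosyVoice")`. A bounded number of attempts is used instead of an endless loop so that a permanent error (such as a wrong URL) does not hang the setup.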

Create the conda environment:

conda create -n cosyvoice python=3.8
conda activate cosyvoice
# pynini is required by WeTextProcessing; install it with conda, which works on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
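
Before moving on, it can help to confirm that the prerequisites are really in place. The helper below is a hypothetical sketch, not part of CosyVoice. It only checks the Python version used above, whether `pynini` is importable, and whether the `sox` and `git-lfs` executables are on `PATH`; it does not verify `libsox-dev`/`sox-devel` or the pip requirements.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_environment() -> Dict[str, bool]:
    """Report which prerequisites from the setup steps appear to be present."""
    return {
        # The conda environment above is created with python=3.8.
        "python_3_8": sys.version_info[:2] == (3, 8),
        # pynini is installed from conda-forge for WeTextProcessing.
        "pynini_importable": importlib.util.find_spec("pynini") is not None,
        # Only the sox binary is checked, not the libsox-dev/sox-devel headers.
        "sox_on_path": shutil.which("sox") is not None,
        # git-lfs is needed in the next step to fetch the model weights.
        "git_lfs_on_path": shutil.which("git-lfs") is not None,
    }


def print_report(report: Dict[str, bool]) -> None:
    """Print one line per check, marking anything that is missing."""
    for name, ok in report.items():
        print(f"[{'ok' if ok else 'MISSING'}] {name}")
```

Run it inside the activated `cosyvoice` environment with `print_report(check_environment())`.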

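2. Download the models

# Download the models with git. Git LFS must be installed (e.g. sudo apt-get install git-lfs); the next line enables it for your user account
git lfs install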
mkdir -p pretrained_models
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git pretrained_models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git pretrained_models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git pretrained_models/CosyVoice-300M-Instruct
git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git pretrained_models/CosyVoice-ttsfrd
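
If Git LFS is not set up, `git clone` can still succeed but leave small pointer files in place of the model weights, and the problem typically only surfaces later as a load error. The sketch below is a hypothetical helper, not from the CosyVoice repo. It scans a model directory for files that still start with the Git LFS pointer header `version https://git-lfs.github.com/spec/v1`. The 1024-byte size cutoff is an assumption, chosen because pointer files are tiny text stubs.

```python
from pathlib import Path
from typing import List

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir: str, max_pointer_size: int = 1024) -> List[Path]:
    """Return files under model_dir that are still un-fetched LFS pointers."""
    root = Path(model_dir)
    pointers = []
    for path in sorted(root.rglob("*")):
        # Skip git's own metadata and anything that is not a regular file.
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        # Pointer stubs are small; real weight files are far larger.
        if path.stat().st_size > max_pointer_size:
            continue
        with path.open("rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                pointers.append(path)
    return pointers
```

Call it on each directory under `pretrained_models/`. A non-empty result means the weights were not downloaded, and running `git lfs pull` inside that directory should fetch them.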

IV. Web Demo

# Use pretrained_models/CosyVoice-300M-SFT for SFT inference, or pretrained_models/CosyVoice-300M-Instruct for instruct inference
python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M
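
The comment above selects the inference mode by switching between the three model directories. Below is a small Python sketch of that mapping. The mode labels `base`, `sft`, and `instruct` and the function `webui_command` are hypothetical names for illustration; the paths, port, and flags are taken from the command above.

```python
from typing import List

# Hypothetical mode labels mapped to the model directories downloaded earlier.
MODEL_DIRS = {
    "base": "pretrained_models/CosyVoice-300M",  # the default in the command above
    "sft": "pretrained_models/CosyVoice-300M-SFT",
    "instruct": "pretrained_models/CosyVoice-300M-Instruct",
}


def webui_command(mode: str = "base", port: int = 50000) -> List[str]:
    """Build the webui.py argument list for the chosen inference mode."""
    if mode not in MODEL_DIRS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODEL_DIRS)}")
    return ["python3", "webui.py", "--port", str(port),
            "--model_dir", MODEL_DIRS[mode]]
```

For example, `subprocess.run(webui_command("sft"), cwd="CosyVoice")` starts the demo with the SFT model, which should then be served on port 50000. Rejecting unknown modes up front gives a clearer error than letting `webui.py` fail on a missing directory.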