Calling the CosyVoice Model from Python: A Complete Tutorial on Emotional Speech Synthesis

曦紫沐

Last modified: 2025-05-07 14:53:45

Column: Speech Models · Tags: CosyVoice, Speech Synthesis, Python Programming

First published: 2025-05-07 14:46:43

Article link: https://blog.csdn.net/qq_41797451/article/details/147762969

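👋 Hi everyone, and welcome to my tech-sharing channel! Today we'll explore a fun and practical technique: using a Python script to call the CosyVoice model and generate speech files that carry emotion. If you're interested in AI speech synthesis, or you're looking for a TTS tool that supports emotional expression, this post should help.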

🧠 What is CosyVoice?

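CosyVoice is a text-to-speech (TTS) model developed by the FunAudioLLM team. The version used here, CosyVoice2-0.5B, is a lightweight model with about 0.5B parameters. Besides turning text into natural, fluent speech, it can be instructed to speak with a particular emotion (such as happy, sad, or neutral), which gives the voice more of a human touch. Note that this post does not run the model locally: the script calls the hosted CosyVoice2-0.5B through SiliconFlow's speech endpoint (https://api.siliconflow.cn/v1/audio/speech), so you need an API key from that service.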

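This makes CosyVoice especially well suited to scenarios such as:

Virtual assistants
Audiobooks / story narration
Video dubbing
Customer-service voice announcements
Game character dialogue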

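📦 Goals of this article

We'll walk through a complete Python script that:

Wraps calls to the CosyVoice API
Lets you choose different voices and emotions
Creates the output directory automatically
Saves the result in MP3 format
Handles errors and reports how long each request took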
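One goal above is saving MP3 files. The script only checks the HTTP status code before writing the response bytes to disk, so a cheap extra safeguard is to confirm that the bytes actually look like MP3 before trusting the file. The helper below, `looks_like_mp3`, is a hypothetical heuristic of our own and not part of the CosyVoice or SiliconFlow APIs. It accepts data that begins with an ID3v2 tag (`ID3`) or with an MPEG audio frame sync (0xFF followed by a byte whose top three bits are set).

```python
def looks_like_mp3(data: bytes) -> bool:
    """Heuristic check that `data` is MP3 audio rather than, say, a text error body.

    An MP3 file usually starts either with an ID3v2 tag (b"ID3") or directly
    with an MPEG audio frame whose header begins with 11 set sync bits.
    """
    if len(data) < 3:
        return False
    if data[:3] == b"ID3":
        return True
    # Frame sync: first byte 0xFF, top 3 bits of the second byte set
    return data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


if __name__ == "__main__":
    print(looks_like_mp3(b"ID3\x04\x00"))          # ID3v2 tag
    print(looks_like_mp3(b"\xff\xfb\x90\x64"))      # MPEG-1 Layer III frame header
    print(looks_like_mp3(b'{"message": "error"}'))  # JSON text, not audio
```

Inside `generate_tts`, this check would go right before the `open(output_path, 'wb')` call, using `response.content` as the argument. It is only a sanity check. It cannot prove that the file will decode correctly.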

🧾 The complete script

# tts_script.py (save under this name to match the usage examples below)
import argparse
import os
import sys
import time

import requests

API_URL = "https://api.siliconflow.cn/v1/audio/speech"


def generate_tts(text, output_path, voice="FunAudioLLM/CosyVoice2-0.5B:alex", emotion="happy"):
    """
    Generate text-to-speech audio using CosyVoice and save to specified path.

    Args:
        text (str): The text to convert to speech
        output_path (str): Path where to save the audio file
        voice (str): Voice ID to use
        emotion (str): Emotion to apply (e.g., "happy", "sad", "neutral")

    Returns:
        bool: True if successful, False otherwise
    """
    start_time = time.perf_counter()
    # The emotion instruction goes before <|endofprompt|>; the text to speak follows it
    formatted_input = f"Can you say it with a {emotion} emotion? <|endofprompt|>{text}"

    payload = {
        "model": "FunAudioLLM/CosyVoice2-0.5B",
        "input": formatted_input,
        "voice": voice,
        "response_format": "mp3",
        "sample_rate": 32000,
        "stream": False,
        "speed": 1,
        "gain": 0
    }
    # Read the key from an environment variable instead of hard-coding it
    # (the name SILICONFLOW_API_KEY is our own choice); falls back to the placeholder
    api_key = os.environ.get("SILICONFLOW_API_KEY", "<api key>")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    try:
        # Make sure the folder that will hold the output file exists
        os.makedirs(os.path.dirname(os.path.abspath(output_path)), exist_ok=True)
        print(f"Generating audio for text: '{text}'")
        # Without a timeout, a stalled connection would block the script indefinitely
        response = requests.post(API_URL, json=payload, headers=headers, timeout=60)

        if response.status_code == 200:
            with open(output_path, 'wb') as audio_file:
                audio_file.write(response.content)

            generation_time = time.perf_counter() - start_time
            print(f"Audio saved successfully to: {output_path}")
            print(f"Generation time: {generation_time:.2f} seconds")
            return True
        else:
            generation_time = time.perf_counter() - start_time
            print(f"Error: API request failed with status code {response.status_code}")
            print(f"Response: {response.text}")
            print(f"Time elapsed: {generation_time:.2f} seconds")
            return False
    # Network errors (requests) and file-system errors (makedirs/open) are reported, not raised
    except (requests.RequestException, OSError) as e:
        generation_time = time.perf_counter() - start_time
        print(f"Exception occurred: {e}")
        print(f"Time elapsed: {generation_time:.2f} seconds")
        return False


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Generate TTS audio using CosyVoice')
    parser.add_argument('--text', type=str, required=True, help='Text to convert to speech')
    parser.add_argument('--output', type=str, required=True, help='Path to save the audio file')
    parser.add_argument('--voice', type=str, default="FunAudioLLM/CosyVoice2-0.5B:alex", help='Voice ID to use')
    parser.add_argument('--emotion', type=str, default="happy", help='Emotion to apply (e.g., happy, sad, neutral)')

    args = parser.parse_args()

    ok = generate_tts(args.text, args.output, args.voice, args.emotion)
    # Exit with a non-zero status on failure so shell scripts can detect it
    sys.exit(0 if ok else 1)

# Available options:  
# FunAudioLLM/CosyVoice2-0.5B:alex,  
# FunAudioLLM/CosyVoice2-0.5B:anna,  
# FunAudioLLM/CosyVoice2-0.5B:bella,  
# FunAudioLLM/CosyVoice2-0.5B:benjamin,  
# FunAudioLLM/CosyVoice2-0.5B:charles,  
# FunAudioLLM/CosyVoice2-0.5B:claire,  
# FunAudioLLM/CosyVoice2-0.5B:david,
# FunAudioLLM/CosyVoice2-0.5B:diana
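Because `generate_tts` builds the request and sends it in the same function, you cannot inspect what will be sent without making a network call. The sketch below pulls the request construction into a separate, hypothetical `build_speech_request` helper. It mirrors the payload fields from the script above and returns the URL, headers and payload without sending anything. Reading the key from `SILICONFLOW_API_KEY` is our own naming convention, not something the API requires.

```python
import json
import os

API_URL = "https://api.siliconflow.cn/v1/audio/speech"
MODEL = "FunAudioLLM/CosyVoice2-0.5B"


def build_speech_request(text, voice=f"{MODEL}:alex", emotion="happy", api_key=None):
    """Return (url, headers, payload) for one CosyVoice request without sending it.

    Hypothetical helper: the fields mirror generate_tts; the environment
    variable name SILICONFLOW_API_KEY is an assumption of this sketch.
    """
    key = api_key or os.environ.get("SILICONFLOW_API_KEY", "")
    if not key:
        raise ValueError("no API key: pass api_key or set SILICONFLOW_API_KEY")
    payload = {
        "model": MODEL,
        # Same natural-language emotion instruction that generate_tts uses
        "input": f"Can you say it with a {emotion} emotion? <|endofprompt|>{text}",
        "voice": voice,
        "response_format": "mp3",
        "sample_rate": 32000,
        "stream": False,
        "speed": 1,
        "gain": 0,
    }
    headers = {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
    return API_URL, headers, payload


if __name__ == "__main__":
    url, headers, payload = build_speech_request("Hello", emotion="sad", api_key="sk-test")
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

With this split, `generate_tts` only needs to call `requests.post(url, json=payload, headers=headers, timeout=60)`, and you can check the payload in tests without an API key or network access.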

📌 How to use it

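1. Calling the function from Python (assuming the script above is saved as tts_script.py in the same directory):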

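# Import from the script file itself; the module name must match the file name
from tts_script import generate_tts

# Sample text
text_to_speak = "The weather is lovely today; I'd like to go out for a walk."
output_file_path = "./output/my_speech.mp3"

print("Starting audio generation...")
result = generate_tts(text_to_speak, output_file_path)
if result:
    print("Audio generated successfully!")
else:
    print("Audio generation failed; check the error messages above.")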

# Or with custom voice and emotion
# generate_tts(
#     text_to_speak, 
#     "./output/custom_speech.mp3",
#     voice="FunAudioLLM/CosyVoice2-0.5B:alex",
#     emotion="neutral"
# )
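To compare emotions, you can render the same sentence once per emotion. The sketch below uses a hypothetical helper, `synthesize_emotions`. It takes the TTS function as a parameter so that you can pass in `generate_tts` from the script, or a stand-in when testing offline. The file-naming scheme `<stem>_<emotion>.mp3` is our own choice.

```python
import os
from typing import Callable, Dict, Iterable


def synthesize_emotions(
    tts: Callable[..., bool],
    text: str,
    out_dir: str,
    emotions: Iterable[str] = ("happy", "sad", "neutral"),
    stem: str = "speech",
) -> Dict[str, bool]:
    """Render `text` once per emotion, e.g. ./output/speech_happy.mp3.

    `tts` is any callable with generate_tts's shape, called as
    tts(text, output_path, emotion=...), returning True on success.
    """
    results = {}
    for emotion in emotions:
        path = os.path.join(out_dir, f"{stem}_{emotion}.mp3")
        results[emotion] = tts(text, path, emotion=emotion)
    return results


if __name__ == "__main__":
    # Offline stand-in for generate_tts: prints what it would do and reports success
    def fake_tts(text, output_path, voice=None, emotion="happy"):
        print(f"would write {output_path} ({emotion})")
        return True

    print(synthesize_emotions(fake_tts, "The weather is lovely today.", "./output"))
```

With the real script, call `synthesize_emotions(generate_tts, text_to_speak, "./output")` after `from tts_script import generate_tts`. The result is a dictionary that maps each emotion to whether its file was generated.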

2. Calling it from the command line:

python tts_script.py --text "Hello, I'm your voice assistant." --output ./output/demo.mp3 --voice FunAudioLLM/CosyVoice2-0.5B:anna --emotion sad
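The command above passes the full voice ID. If you would rather type `--voice anna`, a small hypothetical `resolve_voice` helper can expand the preset names listed at the end of the script into full IDs. Any string that already contains `:` is treated as a full ID and passed through unchanged. This is an assumption of the sketch, not documented API behavior.

```python
MODEL = "FunAudioLLM/CosyVoice2-0.5B"
# Preset voice names from the comment at the end of the script
PRESET_VOICES = {"alex", "anna", "bella", "benjamin", "charles", "claire", "david", "diana"}


def resolve_voice(voice: str) -> str:
    """Expand a short preset name like 'anna' into the full voice ID.

    Strings that already contain ':' are assumed to be full IDs and are
    returned unchanged.
    """
    if ":" in voice:
        return voice
    name = voice.strip().lower()
    if name not in PRESET_VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(PRESET_VOICES)}")
    return f"{MODEL}:{name}"


if __name__ == "__main__":
    print(resolve_voice("anna"))
    print(resolve_voice("FunAudioLLM/CosyVoice2-0.5B:alex"))
```

To use it, change the last call in the script's `__main__` block to `generate_tts(args.text, args.output, resolve_voice(args.voice), args.emotion)`. Both `--voice anna` and the full ID will then work.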

⚙️ Parameters

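| Parameter (CLI / function) | Type | Default | Description |
|---|---|---|---|
| `--text` / `text` | str | none (required) | Text to convert to speech |
| `--output` / `output_path` | str | none (required) | Path of the output audio file |
| `--voice` / `voice` | str | `FunAudioLLM/CosyVoice2-0.5B:alex` | Full voice ID; preset names: alex, anna, bella, benjamin, charles, claire, david, diana |
| `--emotion` / `emotion` | str | `happy` | Emotion to apply, e.g. happy, sad, neutral |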
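The table above shows the arguments that are expected. A quick pre-flight check can catch obvious mistakes before any API call is made. The hypothetical `check_tts_args` helper below flags empty text and output paths that don't end in `.mp3` (the request asks for `response_format: "mp3"`). It also warns about emotions outside the three named in this article. Because the script passes the emotion as free text inside the prompt, other values are only warned about, not rejected.

```python
import os
from typing import List

# Emotions named in this article; the script does not restrict the value itself
KNOWN_EMOTIONS = {"happy", "sad", "neutral"}


def check_tts_args(text: str, output: str, emotion: str) -> List[str]:
    """Return human-readable problems with the arguments (empty list if none)."""
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if os.path.splitext(output)[1].lower() != ".mp3":
        problems.append("output should end in .mp3 because the request asks for MP3 audio")
    if emotion not in KNOWN_EMOTIONS:
        problems.append(f"emotion {emotion!r} is not one of {sorted(KNOWN_EMOTIONS)}; results may vary")
    return problems


if __name__ == "__main__":
    for problem in check_tts_args("  ", "./output/demo.wav", "angry"):
        print("warning:", problem)
```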