Deploying the SenseVoice Multilingual Speech Understanding Model: Latest Hands-On Experience


Author: 杰说新技术

Last modified 2024-07-21 20:52:38


Category: AIGC | Tags: AIGC, artificial intelligence, language models

First published 2024-07-15 06:00:00

Article link: https://blog.csdn.net/m0_71062934/article/details/140423005


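SenseVoice is a multilingual audio foundation model developed by Alibaba Cloud's Tongyi Lab. It focuses on high-accuracy multilingual speech recognition, speech emotion recognition, and audio event detection.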

SenseVoice recognizes more than 50 languages. On Chinese and Cantonese it outperforms Whisper, with a reported accuracy improvement of more than 50%.

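SenseVoice also offers strong emotion recognition, and its audio event detection can pick up music, applause, laughter, crying, coughing, sneezing, and other events common in human-computer interaction.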
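In practice the emotion and event labels come back inline with the transcript. As far as we know, SenseVoice's raw output prefixes the text with markers of the form <|TAG|> (language, emotion, audio event, and so on), and FunASR ships a rich_transcription_postprocess helper (in funasr.utils.postprocess_utils) that the SenseVoice README uses to tidy them up. The sketch below is only a self-contained illustration of separating those markers from the text: it assumes the <|TAG|> format, and the function name split_rich_transcription and the sample string are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed rich-transcription format: a run of "<|TAG|>" markers
# (language, emotion, audio event, ...) followed by the plain text.
TAG_RE = re.compile(r"<\|([^|]+)\|>")


class RichResult(NamedTuple):
    tags: list[str]
    text: str


def split_rich_transcription(raw: str) -> RichResult:
    """Pull every <|TAG|> marker out of `raw`; return the tags and the remaining text."""
    tags = TAG_RE.findall(raw)
    text = TAG_RE.sub("", raw).strip()
    return RichResult(tags, text)


if __name__ == "__main__":
    # Hypothetical output string, written by hand for illustration only.
    sample = "<|en|><|HAPPY|><|Laughter|>That was a great talk."
    result = split_rich_transcription(sample)
    print(result.tags)  # ['en', 'HAPPY', 'Laughter']
    print(result.text)  # That was a great talk.
```

For real deployments, prefer FunASR's own post-processing helper, since it tracks the model's actual tag set.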

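SenseVoice is also fast at inference. Its small variant, SenseVoice-Small, uses a non-autoregressive end-to-end framework: processing 10 seconds of audio takes only 70 ms, which is 15 times faster than Whisper-Large.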
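To put those numbers in perspective, the real-time factor (RTF: processing time divided by audio duration) can be worked out directly. This is plain arithmetic on the figures quoted above, not a benchmark; the "implied" Whisper-Large latency is simply 70 ms multiplied by the stated 15x speedup.

```python
# Figures quoted above: 10 s of audio processed in 70 ms; Whisper-Large said to be 15x slower.
AUDIO_SECONDS = 10.0
SENSEVOICE_SMALL_LATENCY_S = 0.070
SPEEDUP_VS_WHISPER_LARGE = 15


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF = processing time / audio duration; values below 1.0 mean faster than real time."""
    return processing_s / audio_s


rtf_small = real_time_factor(SENSEVOICE_SMALL_LATENCY_S, AUDIO_SECONDS)
implied_whisper_latency_s = SENSEVOICE_SMALL_LATENCY_S * SPEEDUP_VS_WHISPER_LARGE

print(f"SenseVoice-Small RTF: {rtf_small:.3f}")
print(f"Implied Whisper-Large time for 10 s of audio: {implied_whisper_latency_s:.2f} s")
```

This works out to an RTF of about 0.007 for SenseVoice-Small and an implied Whisper-Large time of roughly 1.05 s for the same 10-second clip.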

GitHub repository: https://github.com/FunAudioLLM/SenseVoice

I. Environment Setup

1. Python environment

Python 3.10 or later is recommended.
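A quick way to confirm the interpreter meets that recommendation before installing anything else. This is a minimal sketch; the helper name python_version_ok is illustrative and not part of SenseVoice.

```python
import sys

MIN_PYTHON = (3, 10)  # the minimum version recommended above


def python_version_ok(minimum: tuple = MIN_PYTHON) -> bool:
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_version_ok():
        print(f"Python {current}: OK")
    else:
        sys.exit(f"Python {current} found; 3.10 or later is recommended")
```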

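2. Installing libraries

First install PyTorch 2.0.0 built for CUDA 11.8, together with the matching torchvision and torchaudio releases:

pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118

Then, from the root of the cloned SenseVoice repository (git clone https://github.com/FunAudioLLM/SenseVoice.git), which contains requirements.txt, install the project dependencies plus funasr-onnx and gradio. The -i option points pip at the Tsinghua University PyPI mirror to speed up downloads in mainland China; it can be omitted elsewhere.

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install funasr-onnx gradio -i https://pypi.tuna.tsinghua.edu.cn/simple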
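Once the installs finish, a small standard-library-only script can report which of these packages actually landed in the environment. The package list is an assumption based on the commands above (funasr is expected to be pulled in by requirements.txt), and installed_versions is a hypothetical helper name. Note that these are distribution names as used by pip, not import names.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the install commands above.
PACKAGES = ["torch", "torchvision", "torchaudio", "funasr", "funasr-onnx", "gradio"]


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found: dict[str, str | None] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:12s} {ver or 'NOT INSTALLED'}")
```

If the CUDA build was installed correctly, the torch entry should carry the +cu118 suffix.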

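FFmpeg is also needed for audio processing. On Debian/Ubuntu, install it with the following command (run as root, or prefix it with sudo):

apt install ffmpeg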
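A minimal sketch, assuming you want to check from Python that ffmpeg is reachable and to normalize input audio before inference. It assumes 16 kHz mono WAV as the target format, the sample rate SenseVoice examples typically work with; verify this against your model configuration. The helper names are hypothetical, while the flags used (-y, -i, -ar, -ac) are standard ffmpeg options.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the model expects 16 kHz mono audio; adjust if your setup differs.
TARGET_SAMPLE_RATE = 16000


def ffmpeg_available() -> bool:
    """True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def build_resample_cmd(src: str, dst: str, sample_rate: int = TARGET_SAMPLE_RATE) -> list:
    """Build an ffmpeg command converting `src` to mono audio at `sample_rate` Hz in `dst`."""
    return ["ffmpeg", "-y", "-i", src, "-ar", str(sample_rate), "-ac", "1", dst]


def resample(src: str, dst: str) -> Path:
    """Run the conversion; raises if ffmpeg is missing or exits with an error."""
    if not ffmpeg_available():
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(build_resample_cmd(src, dst), check=True)
    return Path(dst)


if __name__ == "__main__":
    print("ffmpeg on PATH:", ffmpeg_available())
    # Hypothetical file names, for illustration only.
    print(" ".join(build_resample_cmd("input.mp3", "input_16k.wav")))
```

Calling resample("input.mp3", "input_16k.wav") would perform the conversion; the file names are placeholders.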

3. Downloading the SenseVoiceSmall model