FreeSWITCH Simple GUI, Part 33 - Transcribing Call Recordings with FunASR


Test environment

http://myfs.f3322.net:8020/
Username: admin, password: admin

For installing the FreeSWITCH GUI, see: https://blog.csdn.net/jia198810/article/details/137820796

1. About FunASR

FunASR is a deep-learning-based speech recognition toolkit that quickly converts speech to text. It supports recognition in multiple languages, offers high accuracy, low latency, and flexible customization, and is widely used for voice input, transcription, and voice assistants.

Source code: https://github.com/modelscope/FunASR/blob/main/README_zh.md

2. Installing FunASR

Per the user manual, make sure the following dependencies are installed before installing funasr:

python>=3.8
torch>=1.13
torchaudio

Install with pip:

pip3 install -U funasr

Or install from source:

git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
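After installation, a quick sanity check (a minimal sketch using only the standard library) confirms that the dependencies listed above are importable:

```python
import importlib.util

# Check that the dependencies from the manual can be located
# without actually loading them (find_spec only locates a module).
for name in ("torch", "torchaudio", "funasr"):
    spec = importlib.util.find_spec(name)
    print(f"{name}: {'found' if spec is not None else 'MISSING'}")
```

If any of the three lines prints `MISSING`, revisit the pip or source install step above.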

3. Sample code

The sample code from https://github.com/modelscope/FunASR/blob/main/README_zh.md:

# https://github.com/modelscope/FunASR/blob/main/README_zh.md
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

4. Test code

A wide variety of models are available at https://www.modelscope.cn/models?page=1&tasks=auto-speech-recognition.

Each model page includes usage examples. Using one of them, we transcribe a call recording with the following code:

#!/usr/local/python310/bin/python3.10
from funasr import AutoModel
"""
Multi-speaker speech recognition (speaker diarization)
"""
funasr_model = AutoModel(
    model="iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",  # ASR model
    vad_model="damo/speech_fsmn_vad_zh-cn-16k-common-pytorch",  # voice activity detection
    punc_model="damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch",  # punctuation restoration
    spk_model="damo/speech_campplus_sv_zh-cn_16k-common",  # speaker embeddings for diarization
)
res = funasr_model.generate(input="/usr/local/freeswitch/recording/17122111021-1006.mp3", batch_size_s=300)
print(res[0]['text'])
data = res[0]['sentence_info']
for item in data:
    spk = item['spk']      # 0-based speaker index
    text = item['text']
    start = item['start']  # segment start, in milliseconds
    end = item['end']      # segment end, in milliseconds
    print(f"讲话人{spk+1}: {text}")

Recognition output:

root@xiaojia-X9SCI-X9SCA /h/xiaojia [1]# /usr/local/python310/bin/python asr2.py
Key Conformer already exists in model_classes, re-register
Key Linear already exists in adaptor_classes, re-register
Key TransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key LightweightConvolution2DTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolutionTransformerDecoder already exists in decoder_classes, re-register
Key DynamicConvolution2DTransformerDecoder already exists in decoder_classes, re-register
funasr version: 1.1.14.
Check update of funasr, and it would cost few times. You may disable it by set `disable_update=True` in AutoModel
You are using the latest version of funasr-1.1.14
2024-11-16 11:19:40,884 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
/usr/local/python310/lib/python3.10/site-packages/funasr/train_utils/load_pretrained_model.py:39: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ori_state = torch.load(path, map_location=map_location)
2024-11-16 11:19:54,010 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2024-11-16 11:19:54,856 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
2024-11-16 11:19:59,347 - modelscope - WARNING - Using branch: master as version is unstable, use with caution
Detect model requirements, begin to install it: /root/.cache/modelscope/hub/damo/speech_campplus_sv_zh-cn_16k-common/requirements.txt
install model requirements successfully
rtf_avg: 0.177: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.89it/s]
  0%|                                                                                                                              | 0/1 [00:00<?, ?it/s/usr/local/python310/lib/python3.10/site-packages/funasr/models/paraformer/model.py:251: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with autocast(False):
[W1116 11:20:16.032852714 NNPACK.cpp:61] Could not initialize NNPACK! Reason: Unsupported hardware.
rtf_avg: 0.204, time_speech:  62.956, time_escape: 12.815: 100%|███████████████████████████████████████████████████████████| 1/1 [00:12<00:00, 12.85s/it]
 I机驱动没有啊,驱动可以,我联网啊,你那直接有吧?唉,你好,哎,我想问一下,那个就是我们这儿有一台你们那个调度调度服务器。第三年那个它坏了,我就想问一下,你们那边这个能修吧,您哪个公司呃,我们是那个就是我是一个集成,就是给那个人方做服务的,然后叫京东名字,然后我们那个设备是在广元人防办。嗯,你那个网口那个转接到头是吧?嗯,你保里面薄里面,你哪公司我查一下,我让我们销组查一下哦,我们没买过你们东西哦,有u盘吗?也是。那您有你想修那个设备是吧?嗯,对哦,行,那我让我们销售联系你吧,然给您报个价。嗯,好好好好好。
讲话人1:  i 机驱动没有啊,
讲话人1: 驱动可以,
讲话人2: 我联网啊,
讲话人2: 你那直接有吧?
讲话人3: 唉,
讲话人3: 你好,
讲话人3: 哎,
讲话人3: 我想问一下,
讲话人3: 那个就是我们这儿有一台你们那个调度调度服务器。
讲话人3: 第三年那个它坏了,
讲话人3: 我就想问一下,
讲话人3: 你们那边这个能修吧,
讲话人4: 您哪个公司呃,
讲话人3: 我们是那个就是我是一个集成,
讲话人3: 就是给那个人方做服务的,
讲话人3: 然后叫京东名字,
讲话人3: 然后我们那个设备是在广元人防办。
讲话人2: 嗯,
讲话人2: 你那个网口那个转接到头是吧?
讲话人3: 嗯,
讲话人3: 你保里面薄里面,
讲话人4: 你哪公司我查一下,
讲话人4: 我让我们销组查一下哦,
讲话人3: 我们没买过你们东西哦,
讲话人1: 有 u 盘吗?
讲话人3: 也是。
讲话人3: 那您有你想修那个设备是吧?
讲话人3: 嗯,
讲话人3: 对哦,
讲话人4: 行,
讲话人4: 那我让我们销售联系你吧,
讲话人4: 然给您报个价。
讲话人3: 嗯,
讲话人3: 好好好好好。
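The `start` and `end` values collected in the loop above are millisecond offsets into the recording. A small helper (hypothetical, not part of FunASR) can turn a `sentence_info` entry into a timestamped transcript line:

```python
def ms_to_hms(ms: int) -> str:
    """Convert a millisecond offset into an HH:MM:SS.mmm timestamp."""
    s, msec = divmod(ms, 1000)
    h, rem = divmod(s, 3600)
    m, sec = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{sec:02d}.{msec:03d}"

def format_segment(item: dict) -> str:
    """Render one sentence_info entry as a timestamped transcript line."""
    return (f"[{ms_to_hms(item['start'])} - {ms_to_hms(item['end'])}] "
            f"讲话人{item['spk'] + 1}: {item['text']}")

# Example with a fabricated segment in the same shape as sentence_info:
demo = {"spk": 2, "text": "你好,", "start": 1230, "end": 2470}
print(format_segment(demo))  # [00:00:01.230 - 00:00:02.470] 讲话人3: 你好,
```

This format is convenient for saving the transcript to a file next to the recording.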

In the call above, Speaker 1 (讲话人1) and Speaker 2 are background voices; the caller is Speaker 3 and the callee is Speaker 4.

The result is reasonably good.
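Since the diarization labels let us separate the call legs, a short sketch (assuming only the `sentence_info` shape shown above: a list of dicts with `spk` and `text` keys) can drop the background speakers and regroup the text per speaker:

```python
from collections import defaultdict

def group_by_speaker(sentence_info, keep=None):
    """Group sentence_info text by speaker id, optionally keeping only
    the speaker ids in `keep` (e.g. caller/callee, dropping background)."""
    grouped = defaultdict(list)
    for item in sentence_info:
        if keep is None or item["spk"] in keep:
            grouped[item["spk"]].append(item["text"])
    return {spk: "".join(texts) for spk, texts in grouped.items()}

# Fabricated segments in the same shape as res[0]['sentence_info']:
segments = [
    {"spk": 2, "text": "你好,"},
    {"spk": 0, "text": "驱动可以,"},
    {"spk": 3, "text": "您哪个公司,"},
    {"spk": 2, "text": "我想问一下,"},
]
# Keep only ids 2 and 3 (i.e. 讲话人3 and 讲话人4, caller and callee).
print(group_by_speaker(segments, keep={2, 3}))
```

Note that the speaker ids in `sentence_info` are 0-based, while the printed labels (讲话人1, 讲话人2, ...) are 1-based.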

Good luck!
