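PaddleSpeech is an open-source speech processing toolkit from PaddlePaddle. It provides a complete end-to-end speech processing solution, including automatic speech recognition (ASR), text-to-speech (TTS), speech enhancement, and speech translation.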
https://github.com/PaddlePaddle/PaddleSpeech
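I. Verification on a Huawei Kunpeng CPU
1. Purchase a Huawei Cloud Virtual Private Cloud (VPC) and Elastic Cloud Server (ECS)
For the detailed procedure, see the CSDN blog post "创建华为云弹性云服务器ECS流程" (creating a Huawei Cloud ECS).
Skip this step if you already have one.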
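ECS configuration summary:
Basic configuration
Billing mode: Pay-per-use
Region/AZ: CN North-Beijing4 | Randomly assigned
Instance
Specifications: Kunpeng general computing-plus | 8 vCPUs | 16 GiB | kc1.2xlarge.2
Operating system
Image: Huawei Cloud EulerOS 2.0 64bit for kAi2p with HDK 23.0.1 and CANN 7.0.0.1 RC
Storage and backup
System disk: General Purpose SSD, 200 GiB
Network
VPC: select an existing VPC
Source/destination check: Enabled
Security group
default
Public network access
EIP: Dynamic BGP | Billed by traffic | 1 Mbit/s
Cloud server management
ECS name: custom
Login credential: Password
Leave everything else at the defaults.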
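2. Set up the environment
The hardest part of using PaddleSpeech is configuring the environment and installing it. The README and setup.py have not been maintained for a long time, many of the download paths and dependency versions they list are deprecated or no longer valid, and questions in the Issues get no useful replies. I hit all kinds of problems during installation and build, and could only resolve them by combining the error messages with discussions in the Issues.
The following configuration has been verified to run correctly.
Clone the code from GitHub.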
git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
Create a Python 3.9 environment with conda.
conda create --name PaddleSpeech python=3.9
conda activate PaddleSpeech
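The README's dependency notes recommend paddlepaddle<=2.5.1. Version 2.5.1 is no longer available. Version 2.4.2 also works, but the later deployment on the Ascend NPU needs the matching 3.0.0b2 build, so install 3.0.0b2 here.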
python -m pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
To match the paddlepaddle version, use the develop branch of paddlespeech and install the project by building from source. This requires the following conda dependencies.
conda install -y -c conda-forge sox libsndfile swig bzip2
If you see this error:
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- sox
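The conda channels have no build of this dependency for this platform, so install it with pip instead. Note that the sox package on PyPI is a Python wrapper; if the SoX binary itself is needed, install it through the system package manager.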
pip install sox
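When several packages are missing at once, it can help to pull their names out of the conda error text before retrying with pip. The following is a minimal sketch, assuming the error output has the shape quoted above; the function name missing_conda_packages is made up for illustration and is not part of conda or PaddleSpeech.

```python
import re


def missing_conda_packages(conda_output: str) -> list[str]:
    """Return package names listed under a PackagesNotFoundError block."""
    missing = []
    in_block = False
    for line in conda_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("PackagesNotFoundError"):
            in_block = True
            continue
        if in_block:
            match = re.match(r"-\s*([A-Za-z0-9_.\-]+)", stripped)
            if match:
                missing.append(match.group(1))
            elif stripped:
                # Any other non-empty line ends the package list.
                in_block = False
    return missing


# The error text quoted in this section.
sample = """Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- sox
"""
print(missing_conda_packages(sample))  # ['sox']
```

The returned names can then be passed to pip or to the system package manager, depending on where each dependency is actually available.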
Install the dependencies and build.
pip install pytest-runner
# Make sure you are in the root directory of the PaddleSpeech project
pip install .
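This build step produced all kinds of errors. Many of the dependency versions in setup.py are problematic, so following the error messages I changed each failing one to an unpinned requirement. For example: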
ERROR: Ignored the following versions that require a different python version: 1.14.0 Requires-Python >=3.10; 1.14.0rc1 Requires-Python >=3.10; 1.14.0rc2 Requires-Python >=3.10; 1.14.1 Requires-Python >=3.10; 2.1.0 Requires-Python >=3.10; 2.1.0rc1 Requires-Python >=3.10; 2.1.1 Requires-Python >=3.10; 2.1.2 Requires-Python >=3.10; 2.1.3 Requires-Python >=3.10; 3.10.0rc1 Requires-Python >=3.10
ERROR: Could not find a version that satisfies the requirement opencc==1.1.6 (from paddlespeech) (from versions: 0.1, 0.2, 1.1.8, 1.1.9)
ERROR: No matching distribution found for opencc==1.1.6
base = [...
opencc==1.1.6 change to opencc
paddleaudio>=1.1.0 change to paddleaudio
...]
The build succeeds.
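The manual edits to setup.py described above amount to dropping the version specifier from selected requirement strings. Here is a rough sketch of that idea. The helper relax_pins and the entry somepkg==2.0 are hypothetical and only serve as illustration; only the opencc and paddleaudio entries come from the actual error messages.

```python
import re


def relax_pins(requirements, packages):
    """Drop version specifiers for the named packages; leave others untouched."""
    targets = {name.lower() for name in packages}
    relaxed = []
    for req in requirements:
        # The distribution name is everything before the first specifier character.
        name = re.split(r"[<>=!~;\[\s]", req, maxsplit=1)[0]
        relaxed.append(name if name.lower() in targets else req)
    return relaxed


# "somepkg==2.0" is a made-up entry showing that other pins are kept.
base = ["opencc==1.1.6", "paddleaudio>=1.1.0", "somepkg==2.0"]
print(relax_pins(base, ["opencc", "paddleaudio"]))
# ['opencc', 'paddleaudio', 'somepkg==2.0']
```

Unpinning only the packages that actually fail keeps the rest of the dependency set as close to the original as possible. Note that this simple version also drops extras and environment markers from the relaxed entries.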
3. Download the sample audio
PaddleSpeech supports 16 kHz WAV audio. A test sample is provided, and you can also use your own audio:
wget https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav
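If you bring your own audio, it is worth confirming the sample rate before running recognition. The sketch below uses only Python's standard wave module. The helper names are invented for illustration, and the silent demo file exists only so the snippet runs without zh.wav.

```python
import os
import tempfile
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, sample_width_bytes) of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def is_16k_wav(path):
    """True when the file's sample rate is 16 kHz."""
    return wav_format(path)[0] == 16000


def write_silence(path, sample_rate, seconds=1):
    """Write a mono 16-bit silent WAV file (demo helper only)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (sample_rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.wav")
    write_silence(demo, 16000)
    print(wav_format(demo), is_16k_wav(demo))  # (16000, 1, 2) True
```

For the downloaded sample, the same check would be is_16k_wav("zh.wav").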
4. Run the code
Create run_asr.py to perform speech recognition.
import paddle
from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.text.infer import TextExecutor
print(f"Current device: {paddle.device.get_device()}")
# Speech recognition
asr = ASRExecutor()
asr_result = asr(audio_file="zh.wav")
print(asr_result)
# Punctuation restoration
text_punc = TextExecutor()
result = text_punc(text=asr_result)
print(result)
Running it reports a missing dependency, kaldiio.
pip install kaldiio
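On the next run the output still contains errors, but the model runs inference and produces results normally. The first run has to download the models, so it takes a while.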
which: no ccache in (/root/miniconda3/envs/PaddleSpeech/bin:/root/miniconda3/condabin:/usr/local/Ascend/ascend-toolkit/latest/bin:/usr/local/Ascend/ascend-toolkit/latest/compiler/ccec_compiler/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin)
/root/miniconda3/envs/PaddleSpeech/lib/python3.9/site-packages/paddle/utils/cpp_extension/extension_utils.py:686: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
warnings.warn(warning_message)
/root/miniconda3/envs/PaddleSpeech/lib/python3.9/site-packages/_distutils_hack/__init__.py:31: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(
Current device: cpu
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 499M/499M [00:34<00:00, 14.3MB/s]
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: Could not find a version that satisfies the requirement paddlespeech_ctcdecoders (from versions: none)
ERROR: No matching distribution found for paddlespeech_ctcdecoders
2024-11-21 11:13:24.766 | INFO | paddlespeech.s2t.modules.ctc:<module>:45 - paddlespeech_ctcdecoders not installed!
W1121 11:13:24.876871 388779 dygraph_functions.cc:83253] got different data type, run type promotion automatically, this may cause data type been changed.
2024-11-21 11:13:24.943 | INFO | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
我认为跑步最重要的就是给我带来了身体健康
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 438M/438M [00:38<00:00, 11.5MB/s]
[2024-11-21 11:14:26,790] [ INFO] - Loading configuration file /root/.paddlespeech/models/ernie_linear_p7_wudao-punc-zh/1.0/ernie_linear_p7_wudao-punc-zh.tar/ckpt/model_config.json
[2024-11-21 11:14:26,791] [ INFO] - Loading weights file /root/.paddlespeech/models/ernie_linear_p7_wudao-punc-zh/1.0/ernie_linear_p7_wudao-punc-zh.tar/ckpt/model_state.pdparams
[2024-11-21 11:14:29,396] [ INFO] - Loaded weights file from disk, setting weights to model.
[2024-11-21 11:14:38,135] [ INFO] - All model checkpoint weights were used when initializing ErnieForTokenClassification.
[2024-11-21 11:14:38,135] [ INFO] - All the weights of ErnieForTokenClassification were initialized from the model checkpoint at /root/.paddlespeech/models/ernie_linear_p7_wudao-punc-zh/1.0/ernie_linear_p7_wudao-punc-zh.tar/ckpt.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ErnieForTokenClassification for predictions without further training.
[2024-11-21 11:14:38,158] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/ernie-1.0/tokenizer_config.json
[2024-11-21 11:14:38,159] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/ernie-1.0/special_tokens_map.json
我认为,跑步最重要的,就是给我带来了身体健康。
Although they do not affect the result, it is still worth analyzing and resolving these warnings where possible:
- which: no ccache in (/root/miniconda3/envs/PaddleSpeech/bin:/root/miniconda3/condabin:/usr/local/Ascend/ascend-toolkit/latest/bin:/usr/local/Ascend/ascend-toolkit/latest/compiler/ccec_compiler/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin) /root/miniconda3/envs/PaddleSpeech/lib/python3.9/site-packages/paddle/utils/cpp_extension/extension_utils.py:686: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md warnings.warn(warning_message)
This indicates that ccache was not found in the current environment. ccache is a compiler cache that can significantly speed up recompilation. If you do not mind the time spent recompiling all source files, you can ignore this warning. To speed up compilation, install ccache as suggested.
conda install -c conda-forge ccache
- /root/miniconda3/envs/PaddleSpeech/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml warnings.warn(
This warning means that setuptools is replacing distutils and that this replacement may fail in the future. The setuptools project suggests resolving it by upgrading setuptools.
WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip. Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue. To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.
This warning means that pip is being invoked through an old script wrapper, which may cause problems in the future. Invoke pip with python -m pip instead.
python -m pip install --upgrade setuptools
After upgrading setuptools as suggested, the warning is still there.
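- ERROR: Could not find a version that satisfies the requirement paddlespeech_ctcdecoders (from versions: none) ERROR: No matching distribution found for paddlespeech_ctcdecoders
This error means that no version of paddlespeech_ctcdecoders satisfying the requirement could be found, and installing it with pip fails for the same reason. According to feedback in the Issues this package is not required, but since its source is included in the project, I tried building it manually:
cd third_party/ctc_decoders
bash setup.sh
pip install --user .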
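Since the setuptools warning quoted above remains after upgrading, one option is to filter that single message with Python's standard warnings module. This is only a sketch under assumptions: it presumes the message is raised through warnings.warn as a UserWarning (as the log suggests) and that the filter is installed at the top of the script, before paddle is imported. The names SETUPTOOLS_MSG, silence_setuptools_warning, and surviving_warnings are made up for illustration.

```python
import warnings

# Start of the message quoted in the log above; filterwarnings matches it as a
# regular expression against the beginning of the warning text.
SETUPTOOLS_MSG = "Setuptools is replacing distutils"


def silence_setuptools_warning():
    """Hide only this UserWarning; every other warning is still reported."""
    warnings.filterwarnings("ignore", message=SETUPTOOLS_MSG, category=UserWarning)


def surviving_warnings(messages):
    """Emit each text as a UserWarning and return those that are not filtered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # report everything unless filtered
        silence_setuptools_warning()     # inserted in front, so it takes precedence
        for text in messages:
            warnings.warn(text, UserWarning)
    return [str(w.message) for w in caught]


print(surviving_warnings([
    "Setuptools is replacing distutils. Support for replacing an already "
    "imported distutils is deprecated.",
    "No ccache found.",
]))  # ['No ccache found.']
```

In run_asr.py this would mean calling silence_setuptools_warning() before the import paddle line. Filtering by message rather than by category keeps other UserWarnings, such as the ccache notice, visible.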
Rerun run_asr.py:
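5. Build a local demo application
5.1 Look up the elastic public IP and open the port
For the detailed procedure, see the CSDN blog post "查看华为云ECS弹性公网IP和设置开放端口" (viewing the Huawei Cloud ECS elastic public IP and opening ports).
This gives the IP address used to reach the web UI. Here it is http://113.44.138.39:7863/.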
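Once the service is running, a quick TCP connection test can tell a security-group problem apart from an application problem. The sketch below uses only the standard socket module. The function name port_is_open is hypothetical, and the address in the usage note is simply the one from this walkthrough.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

From a client machine, port_is_open("113.44.138.39", 7863) returning False while the server process is running would point toward the security group rules or the EIP binding rather than the application itself.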
5.2 Build the web UI with Gradio
Install the gradio library.
pip install gradio
Create gradio_show.py with the following content:
import gradio as gr
import paddle
from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.text.infer import TextExecutor

print(f"Current device: {paddle.device.get_device()}")

# Initialize the ASR and text (punctuation) executors
text_punc = TextExecutor()
asr = ASRExecutor()

def process_audio(audio_file):
    # Run speech recognition
    asr_result = asr(audio_file=audio_file)
    print(f"ASR Result: {asr_result}")
    # Add punctuation to the recognized text
    result = text_punc(text=asr_result)
    print(f"Punctuation Result: {result}")
    return asr_result, result

# Create the Gradio interface
demo = gr.Interface(
    fn=process_audio,
    inputs=gr.Audio(type="filepath"),  # input is the path of an audio file
    outputs=[
        gr.Textbox(label="ASR Result"),  # raw ASR output
        gr.Textbox(label="Punctuation Result")  # output with punctuation added
    ],
    title="Speech Recognition and Punctuation Restoration",
    description="Upload an audio file to run speech recognition and add punctuation."
)

# Launch the Gradio app
if __name__ == "__main__":
    demo.launch(server_name="0.0.0.0", server_port=7863)
Run gradio_show.py with nohup, then open http://113.44.138.39:7863/ in a browser to reach the service page, where you can upload an audio file for speech recognition:
nohup python gradio_show.py &
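II. Deploying to the Huawei Ascend NPU
Ascend environment:
Chip: Ascend 910B3
CANN version: CANN 7.0.1.5
Driver version: 23.0.6
Operating system: Huawei Cloud EulerOS 2.0
1. Check the NPU hardware information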
npu-smi info
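If the Health status is OK, the NPU and CANN are working normally.
2. Run the code in the NPU environment
2.1 Install the PaddlePaddle NPU plugin
PaddlePaddle's Ascend NPU adaptation currently supports only CANN 8.0.RC1, which would require upgrading the CANN version on the host, so a Docker image is used instead.
Pull the Ascend NPU development image published by PaddlePaddle: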
docker pull registry.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-aarch64-gcc84-py39
Start the container, specifying the visible NPU card IDs:
docker run -it --name paddlespeech -v $(pwd):/work \
--privileged --network=host --shm-size=128G -w=/work \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
registry.baidubce.com/device/paddle-npu:cann80RC1-ubuntu20-$(uname -m)-gcc84-py39 /bin/bash
Install the Paddle NPU plugin package:
python -m pip install paddlepaddle==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
python -m pip install paddle-custom-npu==3.0.0b2 -i https://www.paddlepaddle.org.cn/packages/stable/npu/
Check the installed version:
python -c "import paddle_custom_device; paddle_custom_device.npu.version()"
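Because the framework and the NPU plugin are installed as separate packages, it can be useful to confirm that both report the same version. The following sketch relies only on the standard importlib.metadata module. The helper names are invented for illustration, and the distribution names are taken from the pip commands above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def versions_match(*distributions):
    """True when all distributions are installed at the same version."""
    versions = {installed_version(name) for name in distributions}
    return None not in versions and len(versions) == 1


for name in ("paddlepaddle", "paddle-custom-npu"):
    print(name, installed_version(name))
print("match:", versions_match("paddlepaddle", "paddle-custom-npu"))
```

This only compares the package metadata. It does not replace the paddle_custom_device version check above, which also exercises the plugin itself.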
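2.2 Set up the environment following the Huawei Cloud procedure
Follow the same steps as the ECS environment setup.
The conda dependencies are instead installed directly with the system package manager: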
apt-get install -y sox libsndfile1 swig bzip2 ccache
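Before building, a short check can confirm that the command-line tools from this step are actually on PATH inside the container. This is a minimal sketch using the standard shutil module; missing_tools is a hypothetical helper name. libsndfile1 is a shared library rather than an executable, so it is not covered by this check.

```python
import shutil


def missing_tools(tools):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print(missing_tools(["sox", "swig", "bzip2", "ccache"]))
```

An empty list means all four executables were found. Any name that is printed still needs to be installed.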
2.3 Run the code
Run run_asr.py:
λ devserver-314b-1 /work/PaddleSpeech {develop} python run_asr.py
I1122 17:53:15.621927 46 init.cc:236] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.9/dist-packages/paddle_custom_device
I1122 17:53:15.621981 46 init.cc:145] Try loading custom device libs from: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1122 17:53:16.312119 46 custom_device.cc:1099] Succeed in loading custom runtime in lib: /usr/local/lib/python3.9/dist-packages/paddle_custom_device/libpaddle-custom-npu.so
I1122 17:53:16.319656 46 custom_kernel.cc:63] Succeed in loading 357 custom kernel(s) from loaded lib(s), will be used like native ones.
I1122 17:53:16.319842 46 init.cc:157] Finished in LoadCustomDevice with libs_path: [/usr/local/lib/python3.9/dist-packages/paddle_custom_device]
I1122 17:53:16.319882 46 init.cc:242] CustomDevice: npu, visible devices count: 1
/usr/local/lib/python3.9/dist-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(
W1122 17:53:18.899689 46 ir_context.cc:305] custom_op.my_add_n op already registered.
W1122 17:53:18.899780 46 custom_operator.cc:969] Operator (my_add_n) has been registered.
W1122 17:53:18.899803 46 ir_context.cc:305] custom_op.update_inputs_ op already registered.
W1122 17:53:18.899811 46 custom_operator.cc:969] Operator (update_inputs) has been registered.
W1122 17:53:18.899827 46 ir_context.cc:305] custom_op.get_output_ op already registered.
W1122 17:53:18.899835 46 custom_operator.cc:969] Operator (get_output) has been registered.
W1122 17:53:18.899847 46 ir_context.cc:305] custom_op.set_stop_value_multi_ends op already registered.
W1122 17:53:18.899855 46 custom_operator.cc:969] Operator (set_stop_value_multi_ends) has been registered.
W1122 17:53:18.899875 46 ir_context.cc:305] custom_op.fused_blha_layer_op op already registered.
W1122 17:53:18.899883 46 custom_operator.cc:969] Operator (fused_blha_layer_op) has been registered.
W1122 17:53:18.899893 46 ir_context.cc:305] custom_op.remove_padding op already registered.
W1122 17:53:18.899900 46 custom_operator.cc:969] Operator (remove_padding) has been registered.
W1122 17:53:18.899911 46 ir_context.cc:305] custom_op.fused_get_rotary_embedding op already registered.
W1122 17:53:18.899920 46 custom_operator.cc:969] Operator (fused_get_rotary_embedding) has been registered.
W1122 17:53:18.899931 46 ir_context.cc:305] custom_op.fused_rope op already registered.
W1122 17:53:18.899940 46 ir_context.cc:305] custom_op.fused_rope_grad op already registered.
W1122 17:53:18.899947 46 custom_operator.cc:969] Operator (fused_rope) has been registered.
W1122 17:53:18.899957 46 ir_context.cc:305] custom_op.rms_norm_npu op already registered.
W1122 17:53:18.899967 46 ir_context.cc:305] custom_op.rms_norm_npu_grad op already registered.
W1122 17:53:18.899976 46 custom_operator.cc:969] Operator (rms_norm_npu) has been registered.
W1122 17:53:18.899988 46 ir_context.cc:305] custom_op.fused_mm_reduce_scatter op already registered.
W1122 17:53:18.899997 46 custom_operator.cc:969] Operator (fused_mm_reduce_scatter) has been registered.
W1122 17:53:18.900010 46 ir_context.cc:305] custom_op.fused_allgather_mm op already registered.
W1122 17:53:18.900018 46 custom_operator.cc:969] Operator (fused_allgather_mm) has been registered.
W1122 17:53:18.900029 46 ir_context.cc:305] custom_op.rebuild_padding_v2 op already registered.
W1122 17:53:18.900038 46 custom_operator.cc:969] Operator (rebuild_padding_v2) has been registered.
W1122 17:53:18.900050 46 ir_context.cc:305] custom_op.flash_attention_npu op already registered.
W1122 17:53:18.900063 46 ir_context.cc:305] custom_op.flash_attention_npu_grad op already registered.
W1122 17:53:18.900069 46 custom_operator.cc:969] Operator (flash_attention_npu) has been registered.
W1122 17:53:18.900079 46 ir_context.cc:305] custom_op.write_cache_kv_ op already registered.
W1122 17:53:18.900086 46 custom_operator.cc:969] Operator (write_cache_kv) has been registered.
W1122 17:53:18.900095 46 ir_context.cc:305] custom_op.rebuild_padding op already registered.
W1122 17:53:18.900102 46 custom_operator.cc:969] Operator (rebuild_padding) has been registered.
W1122 17:53:18.900112 46 ir_context.cc:305] custom_op.get_padding_offset op already registered.
W1122 17:53:18.900120 46 custom_operator.cc:969] Operator (get_padding_offset) has been registered.
W1122 17:53:18.900132 46 ir_context.cc:305] custom_op.fused_mm_allreduce op already registered.
W1122 17:53:18.900142 46 custom_operator.cc:969] Operator (fused_mm_allreduce) has been registered.
W1122 17:53:18.900151 46 ir_context.cc:305] custom_op.get_token_penalty_multi_scores_v2_ op already registered.
W1122 17:53:18.900159 46 custom_operator.cc:969] Operator (get_token_penalty_multi_scores_v2) has been registered.
W1122 17:53:18.900171 46 ir_context.cc:305] custom_op.get_padding_offset_v2 op already registered.
W1122 17:53:18.900178 46 custom_operator.cc:969] Operator (get_padding_offset_v2) has been registered.
W1122 17:53:18.900192 46 ir_context.cc:305] custom_op.lm_head op already registered.
W1122 17:53:18.900199 46 custom_operator.cc:969] Operator (lm_head) has been registered.
W1122 17:53:18.900211 46 ir_context.cc:305] custom_op.quant_int8 op already registered.
W1122 17:53:18.900219 46 custom_operator.cc:969] Operator (quant_int8) has been registered.
W1122 17:53:18.900231 46 ir_context.cc:305] custom_op.qkv_transpose_split op already registered.
W1122 17:53:18.900238 46 custom_operator.cc:969] Operator (qkv_transpose_split) has been registered.
W1122 17:53:18.900249 46 ir_context.cc:305] custom_op.save_with_output op already registered.
W1122 17:53:18.900256 46 custom_operator.cc:969] Operator (save_with_output) has been registered.
W1122 17:53:18.900266 46 ir_context.cc:305] custom_op.set_value_by_flags_and_idx op already registered.
W1122 17:53:18.900274 46 custom_operator.cc:969] Operator (set_value_by_flags_and_idx) has been registered.
W1122 17:53:18.900285 46 ir_context.cc:305] custom_op.save_output_ op already registered.
W1122 17:53:18.900293 46 custom_operator.cc:969] Operator (save_output) has been registered.
W1122 17:53:18.900305 46 ir_context.cc:305] custom_op.set_value_by_flags_and_idx_v2_ op already registered.
W1122 17:53:18.900314 46 custom_operator.cc:969] Operator (set_value_by_flags_and_idx_v2) has been registered.
W1122 17:53:18.900324 46 ir_context.cc:305] custom_op.dequant_int8 op already registered.
W1122 17:53:18.900333 46 custom_operator.cc:969] Operator (dequant_int8) has been registered.
W1122 17:53:18.900343 46 ir_context.cc:305] custom_op.get_token_penalty_multi_scores op already registered.
W1122 17:53:18.900350 46 custom_operator.cc:969] Operator (get_token_penalty_multi_scores) has been registered.
W1122 17:53:18.900362 46 ir_context.cc:305] custom_op.step_paddle_ op already registered.
W1122 17:53:18.900370 46 custom_operator.cc:969] Operator (step_paddle) has been registered.
W1122 17:53:18.900379 46 ir_context.cc:305] custom_op.set_stop_value_multi_ends_v2_ op already registered.
W1122 17:53:18.900388 46 custom_operator.cc:969] Operator (set_stop_value_multi_ends_v2) has been registered.
W1122 17:53:18.900398 46 ir_context.cc:305] custom_op.write_int8_cache_kv_ op already registered.
W1122 17:53:18.900404 46 custom_operator.cc:969] Operator (write_int8_cache_kv) has been registered.
W1122 17:53:18.900415 46 ir_context.cc:305] custom_op.encode_rotary_qk_ op already registered.
W1122 17:53:18.900422 46 custom_operator.cc:969] Operator (encode_rotary_qk) has been registered.
W1122 17:53:18.900431 46 ir_context.cc:305] custom_op.transpose_remove_padding op already registered.
W1122 17:53:18.900439 46 custom_operator.cc:969] Operator (transpose_remove_padding) has been registered.
Current device: npu:0
W1122 17:53:40.505782 46 dygraph_functions.cc:83253] got different data type, run type promotion automatically, this may cause data type been changed.
2024-12-06 17:53:41.525 | INFO | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2024-12-06 17:53:45,123] [CRITICAL] transformation.py:149 - Catch a exception from 0th func: LogMelSpectrogramKaldi(fs=16000, n_mels=80, n_frame_shift=10.0, n_frame_length=25.0, dither=1.0))
Traceback (most recent call last):
File "/work/PaddleSpeech/run_asr.py", line 10, in <module>
asr_result = asr(audio_file="zh.wav")
File "/work/PaddleSpeech/paddlespeech/cli/utils.py", line 328, in _warpper
return executor_func(self, *args, **kwargs)
File "/work/PaddleSpeech/paddlespeech/cli/asr/infer.py", line 510, in __call__
self.preprocess(model, audio_file)
File "/work/PaddleSpeech/paddlespeech/cli/asr/infer.py", line 275, in preprocess
audio = preprocessing(audio, **preprocess_args)
File "/work/PaddleSpeech/paddlespeech/audio/transform/transformation.py", line 147, in __call__
xs = [func(x, **_kwargs) for x in xs]
File "/work/PaddleSpeech/paddlespeech/audio/transform/transformation.py", line 147, in <listcomp>
xs = [func(x, **_kwargs) for x in xs]
File "/work/PaddleSpeech/paddlespeech/audio/transform/spectrogram.py", line 372, in __call__
mat = kaldi.fbank(
File "/usr/local/lib/python3.9/dist-packages/paddleaudio/compliance/kaldi.py", line 462, in fbank
strided_input, signal_log_energy = _get_window(
File "/usr/local/lib/python3.9/dist-packages/paddleaudio/compliance/kaldi.py", line 170, in _get_window
offset_strided_input = paddle.nn.functional.pad(
File "/usr/local/lib/python3.9/dist-packages/paddle/nn/functional/common.py", line 2036, in pad
out = _C_ops.pad3d(x, pad, mode, value, data_format)
NotImplementedError: (Unimplemented) npu npu only support mode=constant right now,but received mode is replicate .
[Hint: Expected mode == "constant", but received mode:replicate != "constant":constant.] (at /paddle/backends/npu/kernels/pad3d_kernel.cc:43)
Following the hint, modify the source at File "/usr/local/lib/python3.9/dist-packages/paddleaudio/compliance/kaldi.py", line 170, in _get_window:
if preemphasis_coefficient != 0.0:
    # The NPU pad3d kernel only supports mode='constant'
    if paddle.device.get_device().startswith('npu'):
        mode = 'constant'
    else:
        mode = 'replicate'
    offset_strided_input = paddle.nn.functional.pad(
        strided_input.unsqueeze(0), (1, 0),
        data_format='NCL',
        mode=mode).squeeze(0)  # (m, window_size + 1)
    strided_input = strided_input - preemphasis_coefficient * offset_strided_input[:, :-1]
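Switching the padding mode changes the numbers slightly, so it helps to see where. The pure-Python sketch below mimics the pre-emphasis step on a single frame, with one sample padded on the left. It is an illustration, not the paddleaudio implementation; the helper names are invented, and the coefficient 0.97 and the sample values are assumed.

```python
def pad_left(frame, mode):
    """Prepend one sample: the first sample ('replicate') or zero ('constant')."""
    first = frame[0] if mode == "replicate" else 0.0
    return [first] + list(frame)


def preemphasize(frame, coefficient, mode):
    """y[t] = x[t] - coefficient * x[t-1], with x[-1] supplied by the padding mode."""
    shifted = pad_left(frame, mode)[:-1]
    return [x - coefficient * prev for x, prev in zip(frame, shifted)]


frame = [0.5, 0.25, -0.25, 1.0]  # made-up samples for one frame
rep = preemphasize(frame, 0.97, "replicate")
con = preemphasize(frame, 0.97, "constant")
differing = [i for i, (a, b) in enumerate(zip(rep, con)) if a != b]
print(differing)  # [0]
```

Only the first sample of each frame differs between the two modes: with constant padding it is left unchanged instead of being reduced by the coefficient times itself. The effect on the resulting features is therefore expected to be small, though the NPU output may not match the CPU output bit for bit.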
Running run_asr.py now produces the result: