sherpa-onnx
Calling the Python API
The project comes in two versions, sherpa-onnx and sherpa-ncnn. This article uses sherpa-onnx, which is hosted at:
https://github.com/k2-fsa/sherpa-onnx
The other version is hosted at:
https://github.com/k2-fsa/sherpa-ncnn
The sherpa-onnx package must be installed first.
CPU version:
pip install sherpa-onnx -f https://k2-fsa.github.io/sherpa/onnx/cpu-cn.html
The GPU version requires the matching CUDA build of the package, and CUDA 11.8 must be installed beforehand. Reference:
https://k2-fsa.github.io/sherpa/onnx/python/install.html#method-2-from-pre-compiled-wheels-cpu-cuda
Then install:
pip install sherpa-onnx==1.11.1+cuda -f https://k2-fsa.github.io/sherpa/onnx/cuda.html
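Before moving on, it can help to confirm which build pip actually installed. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and not part of sherpa-onnx. Going by the install command above, a CUDA build would be expected to report a version that carries the `+cuda` suffix.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("sherpa-onnx")
    if found is None:
        print("sherpa-onnx is not installed")
    else:
        print(f"sherpa-onnx {found} is installed")
```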
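1. Keyword spotting (voice wake-up):
The examples are keyword-spotter.py and keyword-spotter-from-microphone.py in the sherpa-onnx/python-api-examples/ directory. keyword-spotter-from-microphone.py spots keywords in real time from a microphone; keyword-spotter.py spots keywords in audio files.
Download the model weights: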
GitHub
cd /path/to/sherpa-onnx
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
tar xf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
rm sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
ls -lh sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01
ModelScope
cd /path/to/sherpa-onnx
git lfs install
git clone https://www.modelscope.cn/pkufool/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.git
ls -lh sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01
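After either download, it is worth checking that the directory holds the files the scripts expect. The following is a minimal sketch, assuming the file names that this article uses as argument defaults; the helper `missing_model_files` is hypothetical and not part of sherpa-onnx.

```python
from pathlib import Path
from typing import List

# Directory created by the tar/git commands above (relative to the current directory).
MODEL_DIR = "./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01"

# File names taken from the argument defaults used in this article.
EXPECTED_FILES = [
    "tokens.txt",
    "encoder-epoch-12-avg-2-chunk-16-left-64.onnx",
    "decoder-epoch-12-avg-2-chunk-16-left-64.onnx",
    "joiner-epoch-12-avg-2-chunk-16-left-64.onnx",
    "test_wavs/test_keywords.txt",
]


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected files that are not present under model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files(MODEL_DIR)
    if missing:
        print("Missing files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All expected model files are present")
```

If anything is reported missing, re-running the download step is usually simpler than patching individual files.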
The code for each script is as follows:
keyword-spotter-from-microphone.py
#!/usr/bin/env python3
# Real-time keyword spotting from a microphone with sherpa-onnx Python API
#
# Please refer to
# https://k2-fsa.github.io/sherpa/onnx/kws/pretrained_models/index.html
# to download pre-trained models
import argparse
import sys
from pathlib import Path
from typing import List

try:
    import sounddevice as sd
except ImportError:
    print("Please install sounddevice first. You can use")
    print()
    print(" pip install sounddevice")
    print()
    print("to install it")
    sys.exit(-1)

import sherpa_onnx


def assert_file_exists(filename: str):
    assert Path(filename).is_file(), (
        f"{filename} does not exist!\n"
        "Please refer to "
        "https://k2-fsa.github.io/sherpa/onnx/kws/pretrained_models/index.html to download it"
    )


def get_args():
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument(
        "--tokens",
        type=str,
        default=r"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/tokens.txt",  # default value set here
        help="Path to tokens.txt",
    )

    parser.add_argument(
        "--encoder",
        type=str,
        default=r"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/encoder-epoch-12-avg-2-chunk-16-left-64.onnx",
        help="Path to the transducer encoder model",
    )

    parser.add_argument(
        "--decoder",
        type=str,
        default=r"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/decoder-epoch-12-avg-2-chunk-16-left-64.onnx",
        help="Path to the transducer decoder model",
    )

    parser.add_argument(
        "--joiner",
        type=str,
        default=r"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/joiner-epoch-12-avg-2-chunk-16-left-64.onnx",
        help="Path to the transducer joiner model",
    )

    parser.add_argument(
        "--num-threads",
        type=int,
        default=1,
        help="Number of threads for neural network computation",
    )

    parser.add_argument(
        "--provider",
        type=str,
        default="cpu",
        help="Valid values: cpu, cuda, coreml",
    )

    parser.add_argument(
        "--max-active-paths",
        type=int,
        default=4,
        help="""
        The number of active paths to keep during decoding.
        """,
    )

    parser.add_argument(
        "--num-trailing-blanks",
        type=int,
        default=1,
        help="""The number of trailing blanks that should follow a keyword.
        Set it to a larger value (e.g. 8) when your keywords have
        overlapping tokens.
        """,
    )

    parser.add_argument(
        "--keywords-file",
        type=str,
        default=r"./sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/test_wavs/test_keywords.txt",
        help="""
        The file containing keywords, one word/phrase per line. Within each
        phrase, the bpe/cjkchar/pinyin tokens are separated by a space. For example:

        ▁HE LL O ▁WORLD
        x iǎo ài t óng x ué
        """,
    )

    parser.add_argument(
        "--keywords-score",
        type=float,
        default=1.0,
        help="""
        The boosting score of each token for keywords. The larger the score,
        the easier it is for a keyword to survive beam search.
        """,
    )

    parser.add_argument(
        "--keywords-threshold",
        type=float,
        default=0.25,
        help="""
        The trigger threshold (i.e. probability) of the keyword. The larger
        the threshold, the harder it is to trigger.
        """,
    )

    return parser.parse_args()


def main():
    args = get_args()

    devices = sd.query_devices()
    if len(devices) == 0:
        print("No microphone devices found")
        sys.exit(0)

    print(devices)
    default_input_device_idx = sd.default.device[0]
    print(f'Use default device: {devices[default_input_device_idx]["name"]