Alibaba DAMO Academy FunASR: an efficient speech-recognition tool (free)
Models:
huggingface-cli download --resume-download funasr/paraformer-zh --local-dir funasr--paraformer-zh
huggingface-cli download --resume-download funasr/fsmn-vad --local-dir funasr--fsmn-vad
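The same snapshots can also be fetched from Python with huggingface_hub (a minimal sketch, assuming huggingface_hub is installed; the repo ids and target directories mirror the two CLI calls above):
from huggingface_hub import snapshot_download

# Mirror the two CLI downloads above into local directories
snapshot_download(repo_id="funasr/paraformer-zh", local_dir="funasr--paraformer-zh")
snapshot_download(repo_id="funasr/fsmn-vad", local_dir="funasr--fsmn-vad")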
Inference code:
from funasr import AutoModel

# Streaming parameters (not used in this offline example; kept for reference):
# chunk_size = [0, 10, 5]      # [0, 10, 5] = 600 ms, [0, 8, 4] = 480 ms
# encoder_chunk_look_back = 4  # number of chunks to look back for encoder self-attention
# decoder_chunk_look_back = 1  # number of encoder chunks to look back for decoder cross-attention

# paraformer-zh is a multi-functional ASR model;
# enable the vad, punc, spk models as needed
model = AutoModel(model=r"E:\project\yuyin\funasr--paraformer-zh", model_revision="v2.0.4",
                  vad_model=r"E:\project\yuyin\funasr--fsmn-vad", vad_model_revision="v2.0.4",
                  punc_model="ct-punc-c", punc_model_revision="v2.0.4",
                  # spk_model="cam++", spk_model_revision="v2.0.2",
                  )
res = model.generate(input=r"C:\Users\Administrator\Documents\WeChat Files\libanggeng\FileStorage\File\2024-12\1.m4a",
                     batch_size_s=300,
                     hotword='魔搭')
print(res)
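model.generate returns a list of result dicts, one per input; per FunASR's README examples the transcript sits under the "text" key, so extracting it looks like:
text = res[0]["text"]  # transcript, with punctuation restored by the punc model
print(text)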
Do not route the run through a proxy/VPN, otherwise it fails with "ct-punc-c is not registered":
Traceback (most recent call last):
  File "E:\project\jijia\tools_jijia\yuyin\yuyinshibie.py", line 13, in <module>
    model = AutoModel(model=r"E:\project\yuyin\funasr--paraformer-zh", model_revision="v2.0.4",
  File "D:\ProgramData\miniconda3\lib\site-packages\funasr\auto\auto_model.py", line 145, in __init__
    punc_model, punc_kwargs = self.build_model(**punc_kwargs)
  File "D:\ProgramData\miniconda3\lib\site-packages\funasr\auto\auto_model.py", line 259, in build_model
    assert model_class is not None, f'{kwargs["model"]} is not registered'
AssertionError: ct-punc-c is not registered
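One workaround (an assumption on my part, not from FunASR's docs) is to clear the proxy environment variables inherited from the shell before importing funasr, so model lookup does not go through the proxy:
import os

# Hypothetical workaround: drop any proxy settings for this process
for var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
    os.environ.pop(var, None)

from funasr import AutoModel  # import only after the proxy variables are cleared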
More examples:
paddle deep speech
dragonfly
A good introduction to its features
Converting audio to feature vectors
GitHub - librosa/librosa: Python library for audio and music analysis
Installing librosa
Tested OK on 2024-04-27 (Windows 11)
pip install librosa
import numpy as np
import torch
import librosa
from transformers import Wav2Vec2Processor, Wav2Vec2Model

def load_example_input(audio_path, processor=None):
    if processor is None:
        processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    # Load the audio resampled to 16 kHz, the rate wav2vec2-base-960h expects
    speech_array, sampling_rate = librosa.load(audio_path, sr=16000)
    audio_feature = np.squeeze(processor(speech_array, sampling_rate=sampling_rate).input_values)
    # Add a batch dimension: (num_samples,) -> (1, num_samples)
    audio_feature = np.reshape(audio_feature, (-1, audio_feature.shape[0]))
    return torch.FloatTensor(audio_feature)

audio_path = r'demo/wav/man.wav'
features = load_example_input(audio_path)
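The code imports Wav2Vec2Model but never uses it; presumably the prepared waveform is meant to be fed through the encoder next. A minimal sketch of that step (model name matches the processor above; output shape per the Hugging Face wav2vec2 docs):
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()
with torch.no_grad():
    # last_hidden_state: (batch, frames, 768) for the base model
    hidden_states = model(features).last_hidden_state
print(hidden_states.shape)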
Speech recognition
pip install SpeechRecognition
pip install pyaudio
import speech_recognition as sr

# Record audio from the default microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Please start speaking...")
    audio = r.listen(source)

# Convert the audio to text via Google's web speech API
try:
    text = r.recognize_google(audio)
    print("Result:", text)
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print(f"Request failed: {e}")