Speech Recognition Study Notes 2024

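This post covers Alibaba's FunASR toolkit for speech recognition, audio feature extraction with the Python library librosa combined with the Wav2Vec2 model (turning audio into feature vectors), and basic speech recognition with the SpeechRecognition library.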


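Contents

Alibaba DAMO Academy FunASR: an efficient, free speech recognition toolkit

Models:

Inference code:

Do not run with a VPN/proxy enabled, otherwise it fails with:

More examples:

paddle deep speech

dragonfly

A good feature overview

Installing librosa

Speech recognition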


Alibaba DAMO Academy FunASR: an efficient, free speech recognition toolkit

GitHub - modelscope/FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Models:

huggingface-cli download --resume-download funasr/paraformer-zh --local-dir funasr--paraformer-zh
huggingface-cli download --resume-download funasr/fsmn-vad --local-dir funasr--fsmn-vad
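The same downloads can also be scripted from Python. This is a minimal sketch, assuming the huggingface_hub package (which provides huggingface-cli) is installed; the helper names local_dir_for and download_models are hypothetical, and the folder naming simply mirrors the --local-dir values used in the commands above.

```python
import os


def local_dir_for(repo_id: str) -> str:
    """Turn a Hub repo id such as "funasr/paraformer-zh" into a folder name like "funasr--paraformer-zh"."""
    return repo_id.replace("/", "--")


def download_models(repo_ids, base_dir="."):
    """Download each repo into base_dir/<folder>, like `huggingface-cli download --local-dir`."""
    # Imported here so local_dir_for works even without huggingface_hub installed
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in repo_ids:
        target = os.path.join(base_dir, local_dir_for(repo_id))
        # snapshot_download returns the local folder the files were written to
        paths.append(snapshot_download(repo_id=repo_id, local_dir=target))
    return paths
```

For example, download_models(["funasr/paraformer-zh", "funasr/fsmn-vad"], base_dir=r"E:\project\yuyin") produces the two folders that the inference code loads.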

Inference code:



from funasr import AutoModel

# Streaming-only settings (for the paraformer-zh-streaming model); the offline call below does not use them
chunk_size = [0, 10, 5]  # [0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4  # number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1  # number of encoder chunks to lookback for decoder cross-attention

# paraformer-zh is a multi-functional asr model
# use vad, punc, spk or not as you need
# model and vad_model point at the local folders downloaded above;
# punc_model is given by name, so FunASR downloads it at run time
model = AutoModel(model=r"E:\project\yuyin\funasr--paraformer-zh", model_revision="v2.0.4",
                  vad_model=r"E:\project\yuyin\funasr--fsmn-vad", vad_model_revision="v2.0.4",
                  punc_model="ct-punc-c", punc_model_revision="v2.0.4",
                  # spk_model="cam++", spk_model_revision="v2.0.2",
                  )
# batch_size_s: dynamic batching, measured as total seconds of audio per batch
# hotword: word(s) to bias recognition toward
res = model.generate(input=r"C:\Users\Administrator\Documents\WeChat Files\libanggeng\FileStorage\File\2024-12\1.m4a",
                     batch_size_s=300,
                     hotword='魔搭')
print(res)
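The chunk_size, encoder_chunk_look_back and decoder_chunk_look_back variables at the top of the script belong to FunASR's streaming mode and are not used by the offline call. The sketch below follows the streaming loop from the FunASR README, assuming a model created with AutoModel(model="paraformer-zh-streaming") and a 16 kHz mono NumPy array speech (for example from soundfile.read). Each chunk_size unit is 60 ms, so [0, 10, 5] feeds 600 ms, i.e. 9600 samples, per call. The helper names are hypothetical.

```python
SAMPLE_RATE = 16000
CHUNK_UNIT_MS = 60  # one chunk_size unit = 60 ms, per the "[0, 10, 5] 600ms" comment in the script


def chunk_stride_samples(chunk_size, sample_rate=SAMPLE_RATE):
    """Number of samples fed to each generate() call: chunk_size[1] units of 60 ms."""
    return chunk_size[1] * CHUNK_UNIT_MS * sample_rate // 1000


def iter_chunks(speech, chunk_size, sample_rate=SAMPLE_RATE):
    """Yield (chunk, is_final) pairs that together cover the whole signal."""
    stride = chunk_stride_samples(chunk_size, sample_rate)
    total = max(1, -(-len(speech) // stride))  # ceiling division; at least one call
    for i in range(total):
        yield speech[i * stride:(i + 1) * stride], i == total - 1


def stream_recognize(model, speech, chunk_size=(0, 10, 5),
                     encoder_chunk_look_back=4, decoder_chunk_look_back=1):
    """Feed audio chunk by chunk; the shared `cache` dict carries model state between calls."""
    cache = {}
    results = []
    for chunk, is_final in iter_chunks(speech, chunk_size):
        res = model.generate(input=chunk, cache=cache, is_final=is_final,
                             chunk_size=list(chunk_size),
                             encoder_chunk_look_back=encoder_chunk_look_back,
                             decoder_chunk_look_back=decoder_chunk_look_back)
        results.append(res)
    return results
```

Passing is_final=True on the last chunk tells the model to flush whatever it is still holding in the cache; with [0, 8, 4] the stride drops to 7680 samples (480 ms).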

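Do not run this with a VPN/proxy enabled, otherwise it fails with the error below. Unlike the two models above, ct-punc-c is passed by name rather than as a local path, so FunASR has to download it at run time: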

ct-punc-c is not registered

Traceback (most recent call last):
  File "E:\project\jijia\tools_jijia\yuyin\yuyinshibie.py", line 13, in <module>
    model = AutoModel(model=r"E:\project\yuyin\funasr--paraformer-zh", model_revision="v2.0.4",
  File "D:\ProgramData\miniconda3\lib\site-packages\funasr\auto\auto_model.py", line 145, in __init__
    punc_model, punc_kwargs = self.build_model(**punc_kwargs)
  File "D:\ProgramData\miniconda3\lib\site-packages\funasr\auto\auto_model.py", line 259, in build_model
    assert model_class is not None, f'{kwargs["model"]} is not registered'
AssertionError: ct-punc-c is not registered
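One possible way around this error, sketched under assumptions: fetch ct-punc-c once while ModelScope is reachable, then pass its local folder as punc_model, the same way model and vad_model are already passed. The ModelScope id below is an assumption (it is the id FunASR appears to map "ct-punc-c" to; check it on modelscope.cn), and the helper names are hypothetical.

```python
import os

# Assumption: ModelScope id that FunASR resolves "ct-punc-c" to; verify before relying on it
CT_PUNC_C_MS_ID = "iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch"


def download_ct_punc_c(cache_dir):
    """Fetch ct-punc-c once, while ModelScope is reachable, and return its local folder."""
    # Imported here so local_model_kwargs works without modelscope installed
    from modelscope.hub.snapshot_download import snapshot_download

    return snapshot_download(CT_PUNC_C_MS_ID, cache_dir=cache_dir)


def local_model_kwargs(asr_dir, vad_dir, punc_dir):
    """Build AutoModel keyword arguments from local folders, failing early with a readable error."""
    dirs = {"model": asr_dir, "vad_model": vad_dir, "punc_model": punc_dir}
    missing = [f"{key}={path}" for key, path in dirs.items() if not os.path.isdir(path)]
    if missing:
        raise FileNotFoundError("model folder(s) not found: " + ", ".join(missing))
    return dirs
```

Usage would be AutoModel(**local_model_kwargs(asr_dir, vad_dir, download_ct_punc_c(cache_dir))). Checking the folders up front gives a clearer message than FunASR's "is not registered" assertion when a path is wrong.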

More examples:

ModelScope community (魔搭社区)

paddle deep speech

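GitHub - yeyupiaoling/PaddlePaddle-DeepSpeech: Chinese speech recognition built on PaddlePaddle. A complete project with good recognition accuracy. Supports training and inference on Windows and Linux, and inference on Nvidia Jetson boards.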

dragonfly

A good feature overview

librosa, a very interesting Python library! - 简书 (Jianshu)

Converting audio to feature vectors

GitHub - librosa/librosa: Python library for audio and music analysis

Installing librosa

Tested OK on 2024.04.27, Windows 11

pip install librosa
pip install transformers torch  # also needed by the Wav2Vec2 example

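import numpy as np
import torch
import librosa
from transformers import Wav2Vec2Processor


def load_example_input(audio_path, processor=None):
    """Load an audio file and return Wav2Vec2 input values with shape (1, num_samples)."""
    if processor is None:
        processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")

    # Wav2Vec2 expects 16 kHz mono audio; librosa resamples and downmixes on load
    speech_array, sampling_rate = librosa.load(audio_path, sr=16000)

    # The processor prepares the waveform for the model (normalizing it if its config says so)
    # and returns a batch of one; squeeze removes that batch axis
    audio_feature = np.squeeze(processor(speech_array, sampling_rate=sampling_rate).input_values)

    # Add the batch dimension back: (num_samples,) -> (1, num_samples)
    audio_feature = np.reshape(audio_feature, (-1, audio_feature.shape[0]))

    return torch.FloatTensor(audio_feature)


audio_path = r'demo/wav/man.wav'

input_values = load_example_input(audio_path)
print(input_values.shape)  # (1, num_samples)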
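load_example_input returns the processor's waveform, i.e. the input to Wav2Vec2, not learned feature vectors. A minimal sketch of the next step, assuming the bare Wav2Vec2Model encoder from transformers and mean pooling over time to get one vector per clip (the pooling choice is an assumption); wav2vec2_features and load_wav2vec2 are hypothetical helper names.

```python
import torch
from transformers import Wav2Vec2Model


def wav2vec2_features(input_values, model):
    """Return (frame_features, clip_vector) for a (batch, num_samples) float tensor."""
    model.eval()
    with torch.no_grad():
        outputs = model(input_values)
    # (batch, n_frames, hidden_size); the conv front end has a 320-sample stride,
    # so 1 s of 16 kHz audio gives 49 frames
    frames = outputs.last_hidden_state
    clip_vector = frames.mean(dim=1)  # mean-pool over time -> (batch, hidden_size)
    return frames, clip_vector


def load_wav2vec2(name="facebook/wav2vec2-base-960h"):
    """Load the bare encoder; the checkpoint's CTC head is dropped (transformers warns about unused weights)."""
    return Wav2Vec2Model.from_pretrained(name)
```

Typical use: frames, vec = wav2vec2_features(load_example_input(audio_path), load_wav2vec2()). For the base model hidden_size is 768, so vec has shape (1, 768); keep frames instead when per-frame features are needed.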

Speech recognition

pip install SpeechRecognition

pip install pyaudio

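import speech_recognition as sr

# Record audio from the default microphone (requires PyAudio)
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Please start speaking...")
    audio = r.listen(source)

# Convert the audio to text with the Google Web Speech API (needs internet access);
# language defaults to "en-US", so pass "zh-CN" for Mandarin
try:
    text = r.recognize_google(audio, language="zh-CN")
    print("Recognition result:", text)
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as e:
    print(f"Request failed: {e}")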
