Inference examples for Chinese speech-to-text in PyTorch using pre-trained models from SpeechBrain and Hugging Face

qq_37401291

Tags: pytorch, deep learning, artificial intelligence, speech recognition

Article link: https://blog.csdn.net/qq_37401291/article/details/128729205

import librosa
import torch
import IPython.display as display
from transformers import Wav2Vec2ForCTC, Wav2Vec2Tokenizer
import warnings
warnings.filterwarnings("ignore")
# !pip install speechbrain


audio_file = "B31_385.wav"
# Load the audio file and resample it to 16 kHz, the rate these wav2vec2 checkpoints expect
audio, sampling_rate = librosa.load(audio_file, sr=16_000)

# Optional: play the audio in a notebook
# display.Audio(audio_file, autoplay=True)

# Load the pre-trained model and tokenizer.
# Note: this checkpoint ships a Wav2Vec2CTCTokenizer, so loading it through
# Wav2Vec2Tokenizer triggers the class-mismatch warning shown in the output.
tokenizer = Wav2Vec2Tokenizer.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn")
model = Wav2Vec2ForCTC.from_pretrained("jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn")

# Convert the raw waveform into a batch of model inputs
input_values = tokenizer(audio, return_tensors='pt').input_values
input_values

# Store the logits (unnormalized per-frame scores over the vocabulary)
logits = model(input_values).logits
logits

# Pick the highest-scoring token id for every frame
# (greedy decoding; argmax does not need a softmax first)
predicted_ids = torch.argmax(logits, dim=-1)

# Pass the predicted ids to the tokenizer's decode method to get the transcription
transcriptions = tokenizer.decode(predicted_ids[0])

transcriptions

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'Wav2Vec2CTCTokenizer'. 
The class this function is called from is 'Wav2Vec2Tokenizer'.





'地<unk>是内部圈层的最外层由封化的土层和坚映的岩石组成所以地<unk>也可称为岩石圈'
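The `torch.argmax` and `tokenizer.decode` steps above amount to greedy CTC decoding: take the best token per frame, merge consecutive repeats, then drop the blank symbol. The sketch below shows that idea on a toy example. The vocabulary, the frame ids, and the choice of id 0 as the blank are all assumptions made for illustration; they are not taken from the checkpoint, and the real tokenizer also handles special tokens and word delimiters.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame predictions, then drop CTC blanks."""
    # Merge consecutive duplicates: [5, 5, 0, 5] -> [5, 0, 5]
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Remove the blank symbol that separates genuine repeats
    kept = [token_id for token_id in collapsed if token_id != blank_id]
    return "".join(id_to_token[token_id] for token_id in kept)


# Toy vocabulary (hypothetical, not the checkpoint's real vocabulary)
toy_vocab = {0: "<pad>", 1: "地", 2: "壳", 3: "层"}
# One predicted id per audio frame, as torch.argmax(logits, dim=-1)[0] would give
toy_frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 0, 3]

decoded = ctc_greedy_decode(toy_frames, toy_vocab)
print(decoded)
```

The blank between the two trailing `3` frames is what lets a character legitimately appear twice in a row; without it the repeats would be merged into one.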


# Note: in SpeechBrain 1.0 and later this class is imported from speechbrain.inference.ASR
from speechbrain.pretrained import EncoderDecoderASR

asr_model = EncoderDecoderASR.from_hparams(source="speechbrain/asr-transformer-aishell",
                                           savedir="pretrained_models/asr-transformer-aishell")
asr_model.transcribe_file(audio_file)

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.





'地价 是 内部 圈层 的 最 外 层 由 分化 的 吐槽 和 签应 的 延迟 组成 所以 地区 而 也 可 称 为 严实 圈'

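# Note: in SpeechBrain 1.0 and later, foreign_class is imported from speechbrain.inference.interfaces
from speechbrain.pretrained.interfaces import foreign_class

# Run inference on the GPU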
asr_model = foreign_class(source="speechbrain/asr-wav2vec2-ctc-aishell", pymodule_file="custom_interface.py",
                          classname="CustomEncoderDecoderASR", run_opts={"device": "cuda"})
asr_model.transcribe_file(audio_file)

Some weights of the model checkpoint at TencentGameMate/chinese-wav2vec2-large were not used when initializing Wav2Vec2Model: ['project_q.bias', 'project_hid.bias', 'quantizer.codevectors', 'quantizer.weight_proj.weight', 'project_q.weight', 'project_hid.weight', 'quantizer.weight_proj.bias']
- This IS expected if you are initializing Wav2Vec2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).





['地',
 '俏',
 '是',
 '内',
 '部',
 '圈',
 '层',
 '的',
 '最',
 '外',
 '层',
 '由',
 '封',
 '化',
 '的',
 '吐',
 '层',
 '和',
 '接',
 '应',
 '的',
 '沿',
 '石',
 '组',
 '成',
 '所',
 '以',
 '地',
 '俏',
 '也',
 '可',
 '称',
 '为',
 '颜',
 '石',
 '圈']
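The three interfaces return differently shaped results: a plain string, a space-separated string, and a list of single characters. To compare them side by side it helps to bring them into one form first. Below is a minimal sketch; `to_plain_text` is a hypothetical helper name, not part of SpeechBrain or transformers, and the sample values are short excerpts of the outputs shown above.

```python
def to_plain_text(result):
    """Return a transcription as a single string with all whitespace removed."""
    # A list or tuple of tokens (e.g. one character per element) is joined first
    if isinstance(result, (list, tuple)):
        result = "".join(result)
    # str.split() with no arguments splits on any run of whitespace
    return "".join(result.split())


# Excerpts of the outputs printed above
transformer_words = "地价 是 内部 圈层"
ctc_chars = ["地", "俏", "是", "内", "部", "圈", "层"]

print(to_plain_text(transformer_words))
print(to_plain_text(ctc_chars))
```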

import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# test_dataset = load_dataset("common_voice", "zh-CN", split="test")

# Wav2Vec2Processor bundles the feature extractor and the CTC tokenizer
processor = Wav2Vec2Processor.from_pretrained("ydshieh/wav2vec2-large-xlsr-53-chinese-zh-cn-gpt")
model = Wav2Vec2ForCTC.from_pretrained("ydshieh/wav2vec2-large-xlsr-53-chinese-zh-cn-gpt")

# Convert the raw waveform into a batch of model inputs.
# Passing sampling_rate=16_000 here would avoid the sampling_rate warning shown in the output.
input_values = processor(audio, return_tensors='pt').input_values
input_values

# Store the logits (unnormalized per-frame scores over the vocabulary)
logits = model(input_values).logits
logits

# Pick the highest-scoring token id for every frame
# (greedy decoding; argmax does not need a softmax first)
predicted_ids = torch.argmax(logits, dim=-1)

# Pass the predicted ids to the processor's decode method to get the transcription
transcriptions = processor.decode(predicted_ids[0])

transcriptions

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
It is strongly recommended to pass the ``sampling_rate`` argument to this function. Failing to do so can result in silent errors that might be hard to debug.





'地壳是内部圈层的最外层由丰化的土层和坚硬的岩始组成所以地壳也可称为岩石圈'
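The post compares the models by eye. One way to put a number on the differences is a character-level edit distance, which is also the basis of the character error rate (CER) commonly reported for Chinese ASR. The sketch below is a plain-Python illustration, not something the original post ran. No ground-truth transcript is given in the post, so the example only measures how far two of the printed hypotheses are from each other; with a real reference you would pass it as the first argument of `char_error_rate`.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Two hypotheses printed in this post (neither is a ground-truth reference)
first = "地<unk>是内部圈层的最外层由封化的土层和坚映的岩石组成所以地<unk>也可称为岩石圈"
last = "地壳是内部圈层的最外层由丰化的土层和坚硬的岩始组成所以地壳也可称为岩石圈"

# Treat each <unk> marker as one unknown character so it counts as a single edit
first = first.replace("<unk>", "?")
print(edit_distance(first, last))
```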
