Download link for the files used in this article
Link: https://pan.baidu.com/s/1RWNVHuXMQleOrEi5vig_bQ
Extraction code: p57s
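Speech Recognition
Speech recognition identifies the content of an audio clip (a wav waveform).
The Fourier transform decomposes a sound in the time domain into a sum of sine waves of different frequencies. The characteristic distribution of the resulting spectral lines links the audio content to text, and this correspondence is the basis for training a model.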
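As a small illustration of the decomposition described above, the NumPy sketch below builds a synthetic one-second signal from two sine waves (50 Hz and 120 Hz, values chosen purely for illustration) and recovers both frequencies and amplitudes from its spectrum with `np.fft.rfft`. Real speech has far more components, but the principle is the same.

```python
import numpy as np

# Sample rate (Hz) and a 1-second time axis (illustrative values)
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate

# A synthetic "sound": 50 Hz with amplitude 1.0 plus 120 Hz with amplitude 0.5
sigs = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Real FFT: complex spectrum for the non-negative frequencies only
spectrum = np.fft.rfft(sigs)
freqs = np.fft.rfftfreq(sigs.size, d=1 / sample_rate)

# Scale magnitudes so a sine of amplitude A appears as a peak of height A
amplitudes = np.abs(spectrum) * 2 / sigs.size

# The two largest peaks sit at the two component frequencies
top2 = np.sort(freqs[np.argsort(amplitudes)[-2:]])
print(top2)
print(amplitudes[[50, 120]])
```

With 1000 samples at 1000 Hz the frequency resolution is exactly 1 Hz, so bin index equals frequency in Hz here; this is why `amplitudes[[50, 120]]` reads off the two components directly.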
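Mel-frequency cepstral coefficients (MFCC) summarize, frame by frame, how the energy of a sound is distributed across frequency bands on the mel scale, which approximates human pitch perception. By default 13 coefficients are kept per frame, and they are closely related to what is being said, so the MFCC matrix can serve as the feature for speech recognition. Recognition then uses hidden Markov models (HMMs): one model is trained per class, a test sample is scored against every model, and the best-matching model determines the recognized content.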
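Written as a formula, the decision rule above picks the class whose model gives the highest log-likelihood. The notation here is introduced only for this sketch: $O = (o_1, \dots, o_T)$ is the MFCC sequence of a test recording (one row $o_t$ per frame) and $\lambda_c$ is the HMM trained on class $c$.

```latex
\hat{c} = \arg\max_{c} \; \log P(O \mid \lambda_c)
```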
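- Prepare multiple audio samples as training data, and label each audio file with its class.
- Read each audio file and compute its MFCC matrix.
- Train one HMM per class on the MFCC matrices.
- Score the test samples with the trained HMMs.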
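The training step usually means stacking the MFCC matrices of all files of one class into a single matrix. The sketch below uses random arrays as hypothetical stand-ins for real MFCC matrices; it stacks them with `np.vstack` and also records each file's frame count. hmmlearn's `fit(X, lengths)` accepts such a `lengths` list so that the model knows where one recording ends and the next begins.

```python
import numpy as np

# Hypothetical stand-ins for the MFCC matrices of three training files of
# one class: each has shape (num_frames, 13), and num_frames varies by file
rng = np.random.default_rng(0)
file_mfccs = [rng.normal(size=(n, 13)) for n in (98, 120, 75)]

# Stack all frames vertically into one (total_frames, 13) training matrix
X = np.vstack(file_mfccs)

# Frame count of each file, in stacking order
lengths = [m.shape[0] for m in file_mfccs]

print(X.shape)   # (293, 13)
print(lengths)   # [98, 120, 75]
```

Without `lengths`, a sequence model treats the stacked matrix as one long recording and learns a spurious transition at every file boundary.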
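MFCC API:
```python
import scipy.io.wavfile as wf
import python_speech_features as sf

# sample_rate: samples per second; sigs: the audio samples
sample_rate, sigs = wf.read('../xx.wav')
# mfcc: one row per frame, 13 coefficients per row
mfcc = sf.mfcc(sigs, sample_rate)
```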
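To show roughly what happens inside `sf.mfcc`, here is a simplified NumPy-only sketch of the usual MFCC pipeline: pre-emphasis, framing, windowing, power spectrum, mel filterbank, log, and DCT. The function names are hypothetical. The defaults are chosen to mirror python_speech_features (25 ms window, 10 ms step, 13 coefficients, 26 filters, 512-point FFT, pre-emphasis 0.97). This sketch applies a Hamming window and an unnormalized DCT and skips the library's cepstral liftering and energy replacement, so its values will not match `sf.mfcc` exactly; only the shape and the overall idea carry over.

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale: roughly linear below ~1 kHz, logarithmic above
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mel_filterbank(nfilt, nfft, sample_rate):
    # nfilt triangular filters whose edges are equally spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), nfilt + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((nfilt, nfft // 2 + 1))
    for j in range(nfilt):
        left, center, right = bins[j], bins[j + 1], bins[j + 2]
        for k in range(left, center):       # rising edge of the triangle
            fbank[j, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            fbank[j, k] = (right - k) / (right - center)
    return fbank

def simple_mfcc(sigs, sample_rate, winlen=0.025, winstep=0.01,
                numcep=13, nfilt=26, nfft=512, preemph=0.97):
    sigs = np.asarray(sigs, dtype=float)
    # 1. Pre-emphasis boosts the high frequencies
    sigs = np.append(sigs[0], sigs[1:] - preemph * sigs[:-1])
    # 2. Split into overlapping frames, zero-padding the tail
    frame_len = int(round(winlen * sample_rate))
    frame_step = int(round(winstep * sample_rate))
    num_frames = 1 + int(np.ceil(max(len(sigs) - frame_len, 0) / frame_step))
    pad = (num_frames - 1) * frame_step + frame_len - len(sigs)
    sigs = np.append(sigs, np.zeros(pad))
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(num_frames)[:, None]
    # 3. Window each frame and take its power spectrum
    frames = sigs[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # 4. Energy per mel band, then log (clipped to avoid log(0))
    energies = power @ mel_filterbank(nfilt, nfft, sample_rate).T
    log_energies = np.log(np.maximum(energies, np.finfo(float).eps))
    # 5. DCT-II of the log band energies; keep the first numcep coefficients
    n = np.arange(nfilt)
    dct = np.cos(np.pi * np.arange(numcep)[:, None] * (2 * n[None, :] + 1) / (2 * nfilt))
    return log_energies @ dct.T

# Usage: one second of a 440 Hz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
mfcc = simple_mfcc(np.sin(2 * np.pi * 440 * t), sample_rate)
print(mfcc.shape)  # (99, 13): 99 frames of 13 coefficients
```

The frame count follows from the framing rule: 1 + ceil((16000 - 400) / 160) = 99.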
Example: MFCC extraction
"""
MFCC提取
"""
import scipy.io.wavfile as wf
import python_speech_features as sf
import matplotlib.pyplot as mp
sample_rate, sigs=wf.read(
'../ml_data/filter.wav')
mfcc = sf.mfcc(sigs, sample_rate)
print(mfcc.shape)
mp.matshow(mfcc.T, cmap='gist_rainbow')
mp.title('MFCC')
mp.ylabel('Features', fontsize=14)
mp.xlabel('Samples', fontsize=14)
mp.tick_params(labelsize=10)
mp.show()
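Hidden Markov model API:
```python
import hmmlearn.hmm as hl

# Build an HMM with Gaussian emissions
# n_components: number of hidden states, each with its own Gaussian
# covariance_type='diag': each state uses a diagonal covariance matrix,
#     i.e. the MFCC dimensions are modeled as uncorrelated within a state
# n_iter: maximum number of EM (Baum-Welch) training iterations
model = hl.GaussianHMM(
    n_components=4,
    covariance_type='diag',
    n_iter=1000)
# mfccs: the stacked MFCC matrix of the training files of one class
model.fit(mfccs)
# Score the MFCC matrix of a test recording with the trained model:
# score() returns the log-likelihood, so a better match gives a higher score
score = model.score(test_mfcc)
```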
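The score returned by `model.score` is the log-likelihood of the whole MFCC sequence under the model, which HMMs compute efficiently with the forward algorithm. The sketch below implements that algorithm in log space for a tiny HMM with diagonal-covariance Gaussian emissions, and checks it against brute-force summation over every hidden-state path. All parameter values are made up for illustration. A fitted `GaussianHMM` learns the same kinds of parameters (start probabilities, transition matrix, per-state means and variances), but this is a standalone sketch, not hmmlearn's code.

```python
import itertools
import numpy as np

def log_gauss_diag(x, mean, var):
    # Log density of a Gaussian with diagonal covariance ('diag'):
    # the feature dimensions are treated as independent
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def forward_log_likelihood(X, startprob, transmat, means, vars_):
    # Forward algorithm in log space: returns log P(X | model)
    n_states = len(startprob)
    log_b = np.array([[log_gauss_diag(x, means[s], vars_[s])
                       for s in range(n_states)] for x in X])  # (T, n_states)
    log_alpha = np.log(startprob) + log_b[0]
    for t in range(1, len(X)):
        # m[i, j] = log alpha_i + log a_ij; log-sum-exp over previous states i
        m = log_alpha[:, None] + np.log(transmat)
        log_alpha = np.logaddexp.reduce(m, axis=0) + log_b[t]
    return np.logaddexp.reduce(log_alpha)

def brute_force_log_likelihood(X, startprob, transmat, means, vars_):
    # Sum P(X, path) over every hidden-state path (only feasible for tiny inputs)
    n_states = len(startprob)
    total = 0.0
    for path in itertools.product(range(n_states), repeat=len(X)):
        p = startprob[path[0]]
        for t in range(1, len(X)):
            p *= transmat[path[t - 1], path[t]]
        for t, s in enumerate(path):
            p *= np.exp(log_gauss_diag(X[t], means[s], vars_[s]))
        total += p
    return np.log(total)

# Made-up 3-state model over 2-dimensional features
startprob = np.array([0.6, 0.3, 0.1])
transmat = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.2, 0.2, 0.6]])
means = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 1.0]])
vars_ = np.array([[1.0, 1.0], [0.5, 2.0], [1.5, 0.5]])
X = np.array([[0.1, -0.2], [2.8, 3.1], [3.2, 2.5], [-2.5, 1.2]])

fwd = forward_log_likelihood(X, startprob, transmat, means, vars_)
brute = brute_force_log_likelihood(X, startprob, transmat, means, vars_)
print(fwd, brute)  # the two values agree up to floating-point error
```

The forward pass costs O(T·N²) for T frames and N states, while the brute-force sum grows as Nᵀ; that gap is why real recognizers always use the forward recursion.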
Example: speech recognition
"""
语音识别
"""
import os
import numpy as np
import scipy.io.wavfile as wf
import python_speech_features as sf
import hmmlearn.hmm as hl
def search_files(directory):
directory = os.path.normpath(directory)
# {'apple':[dir,dir,dir], 'banana':[dir..]}
objects = {}
#当前目录, 当前目录子目录, 文件列表
for curdir,subdirs,files in \
os.walk(directory):
for file in files:
if file.endswith('.wav'):
label = curdir.split(os.path.sep)[-1]
if label not in objects:
objects[label] = []
path = os.path.join(curdir, file)
objects[label].append(path)
return objects
train_samples = \
search_files('../ml_data/speeches/training')
# 整理训练集, 把每一个类别中的音频的mfcc
# 摞在一起, 基于隐马模型开始训练.
train_x, train_y = [], []
for label, filenames in train_samples.items():
mfccs = np.array([])
for filename in filenames:
sample_rate, sigs = wf.read(filename)
mfcc = sf.mfcc(sigs, sample_rate)
if len(mfccs) == 0:
mfccs = mfcc
else:
mfccs = np.append(mfccs, mfcc, axis=0)
train_x.append(mfccs)
train_y.append(label)
# 基于隐马模型进行训练, 把所有类别的模型都存起来
# 一共7个类别循环7次
models = {}
for mfccs, label in zip(train_x, train_y):
model = hl.GaussianHMM(n_components=4,
covariance_type='diag', n_iter=1000)
models[label] = model.fit(mfccs)
# 读取测试集中的文件, 使用每个模型对文件进行
# 评分, 取分值大的模型对应的label作为预测类别
test_samples = \
search_files('../ml_data/speeches/testing')
# 整理测试集, 提取每一个文件的mfcc
test_x, test_y = [], []
for label, filenames in test_samples.items():
mfccs = np.array([])
for filename in filenames:
sample_rate, sigs = wf.read(filename)
mfcc = sf.mfcc(sigs, sample_rate)
if len(mfccs) == 0:
mfccs = mfcc
else:
mfccs = np.append(mfccs, mfcc, axis=0)
test_x.append(mfccs)
test_y.append(label)
# 使用7个模型, 对每一个文件进行预测得分.
pred_test_y = []
# test_x一共7个样本, 遍历7次, 每次验证1个文件
for mfccs in test_x:
best_score, best_label = None, None
for label, model in models.items():
score = model.score(mfccs)
if (best_score is None) or \
(best_score < score):
best_score, best_label=score,label
pred_test_y.append(best_label)
print(test_y)
print(pred_test_y)
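Printing the two label lists side by side works for a handful of files. For a larger test set, a summary helps. The helper below is a hypothetical addition, not part of the original example: it computes accuracy and counts each (true, predicted) pair, where any pair with different labels is a misclassification. The labels in the usage lines are invented for illustration.

```python
from collections import Counter

def evaluate(true_labels, pred_labels):
    # Pair each true label with its prediction
    pairs = list(zip(true_labels, pred_labels))
    # Fraction of test files whose predicted label matches the true label
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Count of each (true, predicted) pair; keys with t != p are errors
    confusion = Counter(pairs)
    return accuracy, confusion

# Usage with made-up labels
test_y = ['apple', 'banana', 'kiwi', 'lime']
pred_test_y = ['apple', 'banana', 'lime', 'lime']
acc, conf = evaluate(test_y, pred_test_y)
print(acc)                     # 0.75
print(conf[('kiwi', 'lime')])  # 1: one 'kiwi' file recognized as 'lime'
```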