TensorFlow Beginner Tutorial (20): Freezing the Speech Recognition Model and Putting It to Use

__Fang Wei__

Category: tensorflow    Tags: tensorflow, speech recognition

Article link: https://blog.csdn.net/rookie_wei/article/details/89721441

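#
# Author: 韦访 (Fang Wei)
# Blog: https://blog.csdn.net/rookie_wei
# WeChat: 1007895847
# When adding me on WeChat, please mention that you came from CSDN
# Everyone is welcome to learn together
#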

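1. Overview

In the previous three installments we trained the speech recognition model. A trained model is only useful once it is put to work, so in this installment we freeze the model and then use it for recognition.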

Environment:

Operating system: Windows 10, 64-bit

GPU: GTX 1080 Ti

Python: 3.7

TensorFlow: 1.15.0

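2. Freezing the model

Freezing the model is simple: we only need to convert the ckpt-format checkpoint into a pb file.

Reference post:

https://blog.csdn.net/rookie_wei/article/details/90546290

I will just give the code here: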

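import os
import sys

import tensorflow as tf
from tensorflow.python.framework import graph_util


def main(argv=None):
    # Check that the checkpoint we need exists in the given directory; exit if it does not.
    save_path = 'model'
    save_file = os.path.join(save_path, 'birnn_speech_recognition.cpkt-170.meta')
    if not os.path.exists(save_file):
        print('Not found ckpt file!')
        sys.exit(1)

    # File name of the pb model we are going to write.
    savePbFile = os.path.join(save_path, 'birnn_speech_recognition.pb')

    with tf.Session() as sess:
        # Load the graph structure from the .meta file.
        saver = tf.train.import_meta_graph(save_file)

        # Restore the weights from the most recently saved checkpoint.
        saver.restore(sess, tf.train.latest_checkpoint(save_path))

        # Freeze the graph: variables become constants, and only the nodes
        # needed to compute the listed tensors are kept.
        output_graph_def = graph_util.convert_variables_to_constants(
            sess=sess,
            input_graph_def=sess.graph_def,
            output_node_names=['input', 'seq_length', 'keep_dropout', 'pred']
        )

        # Save the frozen graph.
        with tf.gfile.GFile(savePbFile, 'wb') as fd:
            fd.write(output_graph_def.SerializeToString())


if __name__ == '__main__':
    tf.app.run()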

After running the code above, if it succeeds, the file birnn_speech_recognition.pb is generated in the model folder.

With that, the freezing step is done.

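3. Application

Next we use the frozen model. In the previous installment we saved all the characters to characters.txt, so we no longer need the multi-gigabyte dataset; we only need to load the characters from characters.txt into a list:

words, _ = load_words_table_()

Then we import the pb model and look up, by name, the tensors we froze above: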

# Open the pb model file
with tf.gfile.GFile(save_pb_file, 'rb') as fd:
    # Parse the serialized graph
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(fd.read())
    # Import into the current default graph, which is sess.graph
    # when the session was created without an explicit graph
    tf.import_graph_def(graph_def, name='')

    # Look up the tensors by name
    input = sess.graph.get_tensor_by_name('input:0')
    seq_length = sess.graph.get_tensor_by_name('seq_length:0')
    dropout = sess.graph.get_tensor_by_name('keep_dropout:0')
    pred = sess.graph.get_tensor_by_name('pred:0')

Next, apply the CTC decoder:

# Apply the CTC beam search decoder
decoder, _ = ctc_ops.ctc_beam_search_decoder(pred, seq_length, merge_repeated=False)
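
To get a feel for what a CTC decoder does, here is a minimal pure-Python sketch of greedy (best-path) decoding: pick the most likely class in every frame, collapse consecutive repeats, then drop the blanks. This is a simpler stand-in for illustration, not the beam search that ctc_beam_search_decoder performs. The function name greedy_ctc_decode and the toy scores are made up; the blank is assumed to be the last class, which is the convention TensorFlow's CTC ops use.

```python
def greedy_ctc_decode(frame_scores, blank_index):
    """Best-path CTC decoding for one utterance (illustrative sketch).

    frame_scores: per-frame score lists, shape [time][num_classes].
    blank_index: index of the CTC blank class.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # 2. Collapse consecutive repeats and 3. drop blanks.
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank_index:
            decoded.append(label)
        previous = label
    return decoded


# Toy example: 3 classes, class 2 is the blank.
toy_scores = [
    [0.9, 0.05, 0.05],  # class 0
    [0.8, 0.1, 0.1],    # class 0 again (repeat, collapsed)
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # class 0 (a new symbol, because a blank came before)
    [0.1, 0.8, 0.1],    # class 1
]
print(greedy_ctc_decode(toy_scores, blank_index=2))  # [0, 0, 1]
```

Beam search keeps several candidate paths instead of only the single best one per frame, so its result can differ from this greedy version.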

Then convert the sparse result into a dense matrix:

# Convert the sparse tensor to a dense one
dense_decoder = tf.sparse_tensor_to_dense(decoder[0], default_value=0)
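
For intuition, the conversion can be imitated in plain Python. A sparse tensor is described by its indices, values, and dense shape; converting it to dense fills a [batch, max_len] grid and pads the unused cells with the default value. The helper name sparse_to_dense_sketch and the sample data below are illustrative only.

```python
def sparse_to_dense_sketch(indices, values, dense_shape, default_value=0):
    """Plain-Python imitation of a 2-D sparse-to-dense conversion."""
    rows, cols = dense_shape
    # Start with a grid filled with the padding value.
    dense = [[default_value] * cols for _ in range(rows)]
    # Write each stored value at its (row, column) position.
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense


# Two decoded sequences of different lengths: [5, 7, 9] and [4].
indices = [(0, 0), (0, 1), (0, 2), (1, 0)]
values = [5, 7, 9, 4]
dense = sparse_to_dense_sketch(indices, values, dense_shape=(2, 3))
print(dense)  # [[5, 7, 9], [4, 0, 0]]
```

Note that a padded 0 cannot be told apart from a real label 0. When only one utterance is decoded per run, as in this tutorial, the dense matrix has a single row and no padding.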

Extract the MFCC features of the audio file to be recognized:

# Extract the MFCC features of the audio file to be recognized
source, source_lengths = get_mfcc(argv[1])

Everything is ready, so run the graph:

# Run the graph (a keep probability of 1.0 disables dropout at inference time)
dense_decoded = sess.run(dense_decoder, feed_dict={input: source, seq_length: source_lengths, dropout: 1.0})

Finally, print the recognition result:

# Print the recognition result
dense_decoded = np.asarray(dense_decoded, dtype=np.int32)
if len(dense_decoded) > 0:
    decoded_str = dense_to_text(dense_decoded[0], words)
    print('Decoded:  {}'.format(decoded_str))
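
The helper dense_to_text comes from the earlier installments and is not shown here. As a rough sketch of the idea only, assuming each label index i maps directly to words[i], the mapping could look like the following. The function name indices_to_text_sketch and the four-character table are hypothetical and are not the author's implementation.

```python
def indices_to_text_sketch(label_indices, words):
    """Map each label index to its character and join the result.

    Assumption: index i corresponds to words[i]; the real helper may differ.
    """
    return ''.join(words[index] for index in label_indices)


# Stand-in for the character table loaded from characters.txt.
demo_words = ['今', '天', '气', '好']
demo_text = indices_to_text_sketch([0, 1, 1, 2, 3], demo_words)
print(demo_text)  # 今天天气好
```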

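Run the following command to execute the code above and recognize the file A11_100.wav:

python test.py A11_100.wav

The script prints the recognized sentence on a line starting with "Decoded:".

Opening data/A11_100.wav.trn in the dataset to check the result shows that the transcript matches our output exactly, so the model works well.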

If you want to test with your own recordings, the audio must be mono, sampled at 16 kHz, in S16_LE format (signed 16-bit little-endian PCM).
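
A recording can be checked against these requirements with Python's standard wave module before feeding it to the model. The sketch below is one possible check; the function names check_wav_format and write_silence are made up for this example, and the demo files are silent clips generated on the fly.

```python
import os
import tempfile
import wave


def check_wav_format(path, expected_rate=16000):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, 'rb') as wav:
        if wav.getnchannels() != 1:
            problems.append('expected 1 channel, got {}'.format(wav.getnchannels()))
        if wav.getframerate() != expected_rate:
            problems.append('expected {} Hz, got {} Hz'.format(
                expected_rate, wav.getframerate()))
        # Sample width is reported in bytes; 2 bytes means 16-bit PCM.
        if wav.getsampwidth() != 2:
            problems.append('expected 16-bit samples, got {}-bit'.format(
                8 * wav.getsampwidth()))
    return problems


def write_silence(path, channels, rate, seconds=0.1):
    """Write a short silent 16-bit WAV file, used only for the demo below."""
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b'\x00\x00' * channels * int(rate * seconds))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, 'good.wav')
    bad = os.path.join(tmp, 'bad.wav')
    write_silence(good, channels=1, rate=16000)
    write_silence(bad, channels=2, rate=44100)
    good_problems = check_wav_format(good)
    bad_problems = check_wav_format(bad)

print(good_problems)  # []
print(bad_problems)   # reports the channel count and the sample rate
```

PCM samples in a WAV file are stored little-endian, so a 16-bit mono 16 kHz WAV file corresponds to the S16_LE requirement.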

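Put simply, your recording must have exactly the same format as the files in the dataset. Also, because we did not add noise during training, your recording should be as noise-free as possible. It would be even better to modify the previous installment's code slightly so that random noise is added during training.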
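
As one possible way to add random noise, the sketch below adds Gaussian noise to raw 16-bit samples before feature extraction. It is an assumption about where the augmentation would go, not the author's training code; the function name add_gaussian_noise and the default noise level of 200 sample units are hypothetical and would need tuning.

```python
import random
from array import array


def add_gaussian_noise(samples, noise_std=200.0, seed=None):
    """Return a noisy copy of signed 16-bit PCM samples.

    samples: array('h') of signed 16-bit samples.
    noise_std: standard deviation of the noise in sample units (hypothetical default).
    seed: optional seed for reproducible noise.
    """
    rng = random.Random(seed)
    noisy = array('h')
    for sample in samples:
        value = int(round(sample + rng.gauss(0.0, noise_std)))
        # Clip to the valid int16 range so the value still fits.
        noisy.append(max(-32768, min(32767, value)))
    return noisy


clean = array('h', [0, 1000, -1000, 32767, -32768])
noisy = add_gaussian_noise(clean, noise_std=200.0, seed=0)
print(list(noisy))
```

Applying this to each training clip with a fresh random seed, and then computing the MFCC features from the noisy samples, would expose the model to slightly different audio in every epoch.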

4. Complete code

The complete code is available at the following link:

https://mianbaoduo.com/o/bread/ZZyXkps=
