SUMMARY OF > Small footprint keyword spotting using deep neural networks

This post summarizes a method for small-footprint keyword spotting (KWS) using deep neural networks (DNNs), compares it against a conventional HMM-based baseline, and discusses the DNN's advantages in runtime and memory footprint. The feature-extraction module produces a feature vector every 10 ms, an RNN-based VAD identifies speech regions, and the DNN predicts posterior probabilities so that posterior handling stays simple. Experiments show the DNN outperforms the HMM baseline in several respects, including robustness and model size.

DNN keyword spotting - summary

Paper link

https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/42537.pdf

1. Problem description

This paper addresses the keyword spotting (KWS) problem in speech recognition.

1.1 Previous approach

  • Large vocabulary continuous speech recognition (LVCSR) -> rich lattices [2,3,4]
    • Offline processing
    • Indexing and searching the lattices for keywords
    • Often used to search large audio databases
  • Keyword/Filler HMM [5,6,7,8,9]
    • Online Viterbi decoding (computationally expensive)
    • One HMM is trained per keyword, and a separate filler HMM models non-keyword segments
  • Large-margin formulation [10,11]
  • RNN [12,13]
    • Needs a longer time span to identify keywords, hence higher latency

2. Model proposed

[Figure: framework of the proposed Deep KWS system]

2.1 Deep KWS

  • Pros:
    • Models whole keywords as well as sub-word units
    • No sequence-search (decoding) algorithm is needed; simple posterior handling suffices (see the sketch after this list)
    • Shorter runtime and lower latency
    • Smaller memory footprint
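As a concrete illustration of decoding-free posterior handling, here is a minimal numpy sketch: smooth the per-frame DNN posteriors with a moving average, then score each frame by combining the per-keyword maxima over a sliding window. The window sizes, the filler-in-column-0 convention, and the function name are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def kws_confidence(posteriors, w_smooth=30, w_max=100):
    """Decoder-free posterior handling (illustrative sketch).

    posteriors: (T, n_labels) per-frame DNN outputs, column 0 assumed
    to be the filler/non-keyword class. Returns one confidence score
    per frame; the detector fires when it exceeds a threshold."""
    T, n = posteriors.shape
    # 1) Smooth each label's posterior with a trailing moving average
    #    to suppress frame-level noise.
    smoothed = np.zeros_like(posteriors)
    for j in range(T):
        smoothed[j] = posteriors[max(0, j - w_smooth + 1) : j + 1].mean(axis=0)
    # 2) Confidence at frame j: geometric mean, over the keyword labels,
    #    of each label's max smoothed posterior in the last w_max frames.
    conf = np.zeros(T)
    for j in range(T):
        win = smoothed[max(0, j - w_max + 1) : j + 1]
        conf[j] = np.prod(win[:, 1:].max(axis=0)) ** (1.0 / (n - 1))
    return conf
```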
2.1.1 Components
2.1.1.1 Feature extraction module

The same as in the baseline HMM system.

  • Rate: one feature vector every 10 ms
    • Computed every 10 ms over a 25 ms window
  • Procedure (sketched below):
    1. Use an RNN-based VAD [14] to identify speech regions
    2. Generate log-filterbank (log-fbank) features and stack sufficient left and right context frames
      • Deep KWS uses 10 future frames and 30 past frames
        • Why asymmetric? → fewer future frames reduce latency
      • The HMM baseline uses 5 future frames and 10 past frames
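A minimal sketch of this frontend under stated assumptions: 16 kHz audio, 40 mel filterbanks, and librosa for the spectrogram; the RNN-VAD step is omitted. The asymmetric 30-past/10-future stacking follows the Deep KWS setting above.

```python
import numpy as np
import librosa

def logfbank_features(signal, sr=16000, n_mels=40):
    """Log mel-filterbank energies: one vector every 10 ms, computed
    over a 25 ms window (frame rate and window size from the paper)."""
    mel = librosa.feature.melspectrogram(
        y=signal, sr=sr,
        n_fft=512,                   # FFT size covering the 25 ms window
        win_length=int(0.025 * sr),  # 25 ms analysis window
        hop_length=int(0.010 * sr),  # 10 ms frame shift
        n_mels=n_mels)
    return np.log(mel + 1e-6).T      # shape: (num_frames, n_mels)

def stack_context(frames, past=30, future=10):
    """Concatenate each frame with `past` left and `future` right
    neighbours; the asymmetric stacking keeps latency low."""
    padded = np.pad(frames, ((past, future), (0, 0)), mode="edge")
    stacked = [padded[t : t + past + 1 + future].reshape(-1)
               for t in range(len(frames))]
    return np.stack(stacked)         # (num_frames, (past+1+future)*n_mels)
```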
2.1.1.2 DNN

Predict posterior probabilities

  • Structure:
    • (FC → ReLU)* → Softmax (see the sketch after this list)
  • Labeling:
    • Labels represent entire words or sub-word units
      • Computationally efficient
      • Simpler posterior handling
      • Training data: $\{x_j, i_j\}_j$, where $x_j$ is the $j^{th}$ frame (with stacked context) and $i_j$ its label index (see below)
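A PyTorch sketch of this stack. The sizes are illustrative assumptions (the paper evaluates small networks, on the order of 3 hidden layers of 128 units); the input dimension matches the 41 stacked 40-dim frames from the frontend sketch above, and the number of output labels is hypothetical.

```python
import torch
import torch.nn as nn

class DeepKWS(nn.Module):
    """(FC -> ReLU)* -> Softmax stack; all sizes are illustrative."""
    def __init__(self, input_dim=41 * 40, hidden=128, n_layers=3, n_labels=4):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(n_layers):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, n_labels))  # logits; softmax applied below
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Posterior p(label i | frame j with stacked context)
        return torch.softmax(self.net(x), dim=-1)
```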
  • Training:
    • Learning-rate decay: exponential
    • Train the DNN by maximizing the cross-entropy training criterion:
      $F(\theta) = \sum_{j} \log p_{i_j j}$
      where $p_{i_j j}$ is the DNN posterior for label $i_j$ at frame $j$ (see the sketch below)
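Maximizing $F(\theta)$ over the training pairs is the same as minimizing the average frame-level cross-entropy loss, so a standard classification training step implements it. A minimal PyTorch sketch under that reading, reusing the hypothetical DeepKWS class above (the exponential learning-rate decay mentioned earlier would be attached via torch.optim.lr_scheduler.ExponentialLR):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch_x, batch_labels):
    """One SGD step on F(theta) = sum_j log p_{i_j j}.

    batch_x: (B, input_dim) stacked-context frames x_j;
    batch_labels: (B,) integer labels i_j."""
    optimizer.zero_grad()
    logits = model.net(batch_x)  # raw scores; softmax is folded into the loss
    # cross_entropy = -(1/B) * sum_j log p_{i_j j}, so minimizing it
    # maximizes the criterion above.
    loss = F.cross_entropy(logits, batch_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```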