Summary of "Small-footprint keyword spotting using deep neural networks"


DNN keyword spotting - summary

Paper link

https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/42537.pdf

1. Problem description

This paper addresses keyword spotting (KWS) in speech recognition.

1.1 Previous approach

  • Large vocabulary continuous speech recognition (LVCSR) -> rich lattices [2,3,4]
    • Offline processing
    • Indexing and searching the lattices for keywords
    • Often used to search large audio databases
  • Keyword/Filler HMM [5,6,7,8,9]
    • Online Viterbi decoding (computationally expensive)
    • One HMM is trained per keyword; a separate filler HMM is trained for non-keyword segments
  • Large-margin formulation [10,11]
  • RNN [12,13]
    • Needs a longer time span to identify keywords, resulting in long latency

2. Model proposed

(Figure: overall architecture of the proposed Deep KWS system)

2.1 Deep KWS

  • Pros:
    • Can model entire keywords as well as sub-word units
    • No need for a sequence search algorithm (decoding)
    • Shorter run time and lower latency
    • Smaller memory footprint
2.1.1 Components
2.1.1.1 Feature extraction module

Same as in the baseline HMM system.

  • Rate: one feature vector every 10 ms
    • Computed every 10 ms over a 25 ms window
  • Procedure:
    1. Use an RNN-based VAD [14] to identify speech regions
    2. Generate log-filterbank features and stack sufficient left and right context frames (see the sketch after this list)
      • Deep KWS uses 10 future frames and 30 past frames
        • Why asymmetric? → to reduce latency
      • The HMM baseline uses 5 future frames and 10 past frames
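A minimal Python sketch of this context-stacking step, assuming `log_fbank` is a `(num_frames, num_mel)` array of log-filterbank features; the 30-past / 10-future frame counts are from the paper, while the 40 mel channels and the edge-padding strategy are illustrative assumptions:

```python
import numpy as np

def stack_context(log_fbank, past=30, future=10):
    """Stack past/future frames around each frame to form the DNN input.

    log_fbank: (num_frames, num_mel) log-filterbank features,
               one row per 10 ms frame (25 ms analysis window).
    Returns:   (num_frames, (past + 1 + future) * num_mel) array.
    """
    # Edge handling is an assumption here: repeat the first/last frame.
    padded = np.pad(log_fbank, ((past, future), (0, 0)), mode="edge")
    window = past + 1 + future
    return np.stack([padded[i:i + window].reshape(-1)
                     for i in range(len(log_fbank))])

# e.g. 100 frames of 40 mel channels -> each frame gets 41 stacked frames
features = stack_context(np.random.randn(100, 40))
print(features.shape)  # (100, 1640)
```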
2.1.1.2 DNN

Predicts posterior probabilities over the labels, given the stacked feature vector.

  • Structure:
    • (FC → ReLU)* → Softmax, i.e. a stack of fully-connected ReLU layers followed by a softmax output layer
  • Labeling:
    • Labels represent entire words / sub-word units
      • Computationally efficient
      • Simpler posterior handling
      • Training data: pairs $\{x_j, i_j\}_j$, where $x_j$ is the feature vector of the $j^{th}$ frame and $i_j$ is its label number (see below)
  • Training:
    • Learning-rate decay: exponential
    • Train the DNN by maximizing the frame-level cross-entropy criterion (a PyTorch sketch follows this list):
      $F(\theta) = \sum_{j} \log p_{i_j j}$
      where $p_{i_j j}$ is the DNN posterior for label $i_j$ at frame $j$.
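A minimal PyTorch sketch of the (FC → ReLU)* → Softmax stack with exponential learning-rate decay; the layer count, layer widths, label count, and decay factor are illustrative assumptions, not the paper's configuration. Note that `nn.CrossEntropyLoss` applies log-softmax internally, so minimizing it over labeled frames is equivalent to maximizing $F(\theta)$ above:

```python
import torch
import torch.nn as nn

class DeepKWS(nn.Module):
    """(FC -> ReLU)* -> Softmax; depth and widths are illustrative."""
    def __init__(self, input_dim=1640, hidden_dim=128, num_labels=4):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(3):  # number of hidden layers is an assumption
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU()]
            dim = hidden_dim
        layers.append(nn.Linear(dim, num_labels))  # logits; softmax lives in the loss
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = DeepKWS()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Exponential learning-rate decay, per the training notes above (gamma is assumed)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
# Minimizing cross-entropy over frames maximizes F(theta) = sum_j log p_{i_j j}
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 1640)            # a batch of stacked frames x_j
labels = torch.randint(0, 4, (32,))  # frame labels i_j
loss = criterion(model(x), labels)
loss.backward()
optimizer.step()
scheduler.step()  # decay the learning rate once per epoch (typical usage)
```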