Speech Recognition - Keyword Detection

xmdxcsj

introduction

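Related names for this task: word spotting, audio indexing, spoken term detection.
The recognizer outputs a word lattice, and the posterior probability of each keyword is computed from that lattice.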

Performance is measured with the actual term-weighted value (ATWV), averaged over all search terms $s$:

$\mathrm{ATWV}=\operatorname{mean}_{s}\left(\frac{N_{correct}(s)}{N_{true}(s)}-\beta\frac{N_{spurious}(s)}{T-N_{true}(s)}\right)$
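where

$N_{correct}(s)$ is the number of correct detections of term $s$,

$N_{true}(s)$ is the number of occurrences of $s$ in the reference,

$N_{spurious}(s)$ is the number of false detections (false alarms) of $s$, and $T$ is the duration of the audio in seconds.

In evaluations, $\beta$ is usually set to 999.9.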
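A small sketch makes the formula concrete. `TermCounts` and `atwv` are hypothetical names chosen for illustration; terms with $N_{true}(s)=0$ are skipped here because the hit rate is undefined for them.

```python
from dataclasses import dataclass


@dataclass
class TermCounts:
    n_correct: int   # N_correct(s): correct detections of term s
    n_true: int      # N_true(s): occurrences of s in the reference
    n_spurious: int  # N_spurious(s): false detections of s


def atwv(counts, total_seconds, beta=999.9):
    """Mean over terms of N_correct/N_true - beta * N_spurious/(T - N_true)."""
    scores = []
    for c in counts:
        if c.n_true == 0:
            continue  # hit rate undefined for terms absent from the reference
        p_hit = c.n_correct / c.n_true
        p_fa = c.n_spurious / (total_seconds - c.n_true)
        scores.append(p_hit - beta * p_fa)
    if not scores:
        raise ValueError("no search term occurs in the reference")
    return sum(scores) / len(scores)


print(atwv([TermCounts(3, 4, 0), TermCounts(1, 2, 0)], total_seconds=3600))  # 0.625
```

With $\beta = 999.9$ and one hour of audio, a single false alarm costs roughly 0.28 for that term ($999.9/(3600-N_{true}(s))$), which is why false alarms are penalized so heavily compared with misses.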
The detection system consists of four parts:
1. speech-to-text engine
Outputs lattices and single-best phonetic transcripts.
2. indexer
The indexer takes these as input and creates an index containing a precomputed list of candidate detection records for each word in the speech-to-text lexicon. The index also contains the phonetic transcripts to accommodate out-of-vocabulary search terms.
3. detector
The detector loads the index and processes a list of search terms, generating a sorted, scored list of detection records for each term.
4. decider
The decider takes the lists of candidate detections and the cost parameter $\beta$ and sets a per-term score threshold for making yes/no decisions.
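One way such a per-term threshold can be derived from the ATWV formula: accepting a detection whose score is a posterior $p$ gains $p/N_{true}(s)$ in expectation and risks $\beta(1-p)/(T-N_{true}(s))$, so it pays off when $p > \beta N_{true}(s)/(T+(\beta-1)N_{true}(s))$. The sketch below assumes scores are calibrated posteriors and, since $N_{true}(s)$ is unknown at search time, estimates it by summing the term's candidate posteriors. `term_threshold` and `decide` are hypothetical names, not part of the system described above.

```python
def term_threshold(posteriors, total_seconds, beta=999.9):
    """Threshold above which accepting a detection raises expected ATWV."""
    # Assumption: N_true(s) is estimated as the sum of candidate posteriors.
    n_true_est = sum(posteriors)
    return beta * n_true_est / (total_seconds + (beta - 1) * n_true_est)


def decide(posteriors, total_seconds, beta=999.9):
    """Yes/no decision for every candidate detection of one search term."""
    threshold = term_threshold(posteriors, total_seconds, beta)
    return [p > threshold for p in posteriors]


print(decide([0.9, 0.1], total_seconds=1000))  # [True, False]
```

Because $\beta$ is large, rare terms get low thresholds: with $T = 36000$ and one expected occurrence the threshold is about 0.027, while a term expected 100 times gets a threshold of about 0.74.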

system

recognition

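Large volumes of offline speech are first segmented and then decoded with a general-purpose speech recognizer, producing lattices whose arcs carry acoustic and language-model scores.
Running keyword detection directly on the recognition output (the single best hypothesis) leads to more misses, in part because of homophones: a keyword can lose to a word with the same pronunciation in the 1-best output, whereas the lattice still keeps it as an alternative.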
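To make the homophone problem concrete, here is a toy lattice in which "write" and "right" compete for the same span. The `Arc` fields, the convention that node ids are in topological order, and the `lm_scale` value are assumptions for illustration, not taken from any particular recognizer. A Viterbi search keeps only "write", so searching the 1-best output for "right" fails, while the lattice still contains it.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    src: int        # source node; nodes are numbered in topological order
    dst: int        # destination node (dst > src)
    word: str
    ac_logp: float  # acoustic log-likelihood
    lm_logp: float  # language-model log-probability


def arc_score(arc, lm_scale=10.0):
    # Combined log score; lm_scale is a hypothetical value, tuned per system in practice.
    return arc.ac_logp + lm_scale * arc.lm_logp


def best_path(arcs, start_node, final_node):
    """Viterbi search for the highest-scoring word sequence."""
    best = {start_node: (0.0, [])}  # node -> (best score so far, words on that path)
    # Sorting by source node visits every incoming arc of a node before its outgoing arcs.
    for arc in sorted(arcs, key=lambda a: a.src):
        if arc.src not in best:
            continue
        score, words = best[arc.src]
        cand = (score + arc_score(arc), words + [arc.word])
        if arc.dst not in best or cand[0] > best[arc.dst][0]:
            best[arc.dst] = cand
    return best[final_node][1]


lattice = [
    Arc(0, 1, "please", -20.0, -2.0),
    Arc(1, 2, "write", -50.0, -1.0),  # combined score -60
    Arc(1, 2, "right", -51.0, -1.2),  # combined score -63
    Arc(2, 3, "now", -15.0, -1.5),
]
one_best = best_path(lattice, 0, 3)
print(one_best)                                  # ['please', 'write', 'now']
print("right" in one_best)                       # False
print(any(a.word == "right" for a in lattice))   # True
```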

indexing

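Building the index. Suppose the distinct candidate words that appear in the lattices are $w_1,w_2,\dots,w_L$.
1. First, compute the posterior probability of every occurrence of each word $w_i$ in a lattice, using the likelihood scores stored in the lattice.
2. Sum the posteriors of occurrences of the same word $w_i$ that fall in the same time span, and use the sum as the final score of that detection.
3. Collect the detections of every word $w_i$ across all lattices into $L$ separate linked lists (one per word), each sorted by posterior in descending order.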
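The three indexing steps can be sketched as follows. Posteriors come from the standard forward-backward algorithm over a lattice whose arcs already carry a combined log score. The `Arc` layout, the "time spans overlap" test used for merging, and the use of Python lists sorted by posterior in place of linked lists are assumptions of this sketch rather than details given in the text.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Arc:
    src: int      # source node; nodes are numbered in topological order
    dst: int      # destination node (dst > src)
    word: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    logp: float   # combined (scaled acoustic + LM) log score of the arc


def logadd(a, b):
    """log(exp(a) + exp(b)) computed without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def arc_posteriors(arcs, start_node, final_node):
    """Step 1: forward-backward posterior of every arc."""
    fwd = defaultdict(lambda: -math.inf)
    fwd[start_node] = 0.0
    for a in sorted(arcs, key=lambda x: x.src):
        fwd[a.dst] = logadd(fwd[a.dst], fwd[a.src] + a.logp)
    bwd = defaultdict(lambda: -math.inf)
    bwd[final_node] = 0.0
    for a in sorted(arcs, key=lambda x: x.dst, reverse=True):
        bwd[a.src] = logadd(bwd[a.src], a.logp + bwd[a.dst])
    total = fwd[final_node]  # log of the summed score of all paths
    return [math.exp(fwd[a.src] + a.logp + bwd[a.dst] - total) for a in arcs]


def merge_overlaps(arcs, posts):
    """Step 2: add up posteriors of same-word arcs whose time spans overlap."""
    by_word = defaultdict(list)
    for a, p in zip(arcs, posts):
        by_word[a.word].append((a.start, a.end, p))
    merged = []
    for word, items in by_word.items():
        items.sort()
        cur_s, cur_e, cur_p = items[0]
        for s, e, p in items[1:]:
            if s < cur_e:  # overlaps the current cluster: accumulate
                cur_e, cur_p = max(cur_e, e), cur_p + p
            else:          # no overlap: close the cluster, start a new one
                merged.append((word, cur_s, cur_e, cur_p))
                cur_s, cur_e, cur_p = s, e, p
        merged.append((word, cur_s, cur_e, cur_p))
    return merged


def build_index(lattices):
    """Step 3: one list per word, detections sorted by posterior (descending)."""
    index = defaultdict(list)
    for utt_id, (arcs, start_node, final_node) in lattices.items():
        posts = arc_posteriors(arcs, start_node, final_node)
        for word, s, e, p in merge_overlaps(arcs, posts):
            index[word].append((p, utt_id, s, e))
    for detections in index.values():
        detections.sort(key=lambda d: d[0], reverse=True)
    return dict(index)


lattices = {
    "u1": ([Arc(0, 1, "please", 0.0, 0.4, 0.0),
            Arc(1, 2, "write", 0.4, 0.8, math.log(0.6)),
            Arc(1, 2, "right", 0.4, 0.8, math.log(0.3)),
            Arc(1, 2, "right", 0.45, 0.8, math.log(0.1)),
            Arc(2, 3, "now", 0.8, 1.1, 0.0)], 0, 3),
    "u2": ([Arc(0, 1, "right", 0.0, 0.3, math.log(0.9)),
            Arc(0, 1, "write", 0.0, 0.3, math.log(0.1))], 0, 1),
}
index = build_index(lattices)
for p, utt, s, e in index["right"]:
    print(f"{utt} {s:.2f}-{e:.2f} {p:.2f}")
```

Running it prints `u2 0.00-0.30 0.90` followed by `u1 0.40-0.80 0.40`: the two "right" arcs in u1 that cover the same span are merged into one detection with posterior 0.4, and the stronger u2 detection is ranked first.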