触发字检测(Trigger Word Detection)

来源:Coursera吴恩达深度学习课程

随着语音识别的发展,越来越多的设备可以通过你的声音来唤醒,这有时被叫做触发字检测系统(rigger word detection systems)。我们来看一看如何建立一个触发字系统。

触发字系统的例子包括Amazon echo,它通过单词Alexa唤醒;还有百度DuerOS设备,通过"小度你好"来唤醒;苹果的Siri用Hey Siri来唤醒;Google Home使用Okay Google来唤醒,这就是触发字检测系统。假如你在卧室中,有一台Amazon echo,你可以在卧室中简单说一句: Alexa, 现在几点了?就能唤醒这个设备。它将会被单词"Alexa"唤醒,并回答你的询问。Andrew想教会我们如何构建一个触发字检测系统

如上图所示,现在有一个这样的RNN结构,我们要做的就是把一个音频片段(an audio clip)计算出它的声谱图特征(spectrogram features)得到特征向量x^<1>, x^<2>, x^<3>...,然后把它放到RNN中,最后定义我们的目标标签y。假如音频片段中的这一点是某人刚刚说完一个触发字,比如"Alexa",或者"小度你好" 或者"Okay Google",那么在这一点之前,你就可以在训练集中把目标标签都设为0,然后在这个点之后把目标标签设为1。假如在一段时间之后,触发字又被说了一次,比如是在这个点说的,那么就可以再次在这个点之后把目标标签设为1。这样的标签方案对于RNN来说是可行的,并且确实运行得非常不错。不过该算法一个明显的缺点就是它构建了一个很不平衡的训练集(a very imbalanced training set),0的数量比1多太多了。

还有一个解决方法,虽然听起来有点简单粗暴,但确实能使其变得更容易训练。如上图所示,比起只在一个时间步上去输出1,其实你可以在输出变回0之前,多次输出1,或说在固定的一段时间内输出多个1。这样就稍微提高了1与0的比例。在音频片段中,触发字刚被说完之后,就把多个目标标签设为1,这里触发字又被说了一次。说完以后,又让RNN去输出1。在之后的编程练习中,你可以进行更多这样的操作

Andrew:这就是触发字检测,希望你能对自己感到自豪。因为你已经学了这么多深度学习的内容,现在你可以只用几分钟时间,就能用一张幻灯片来描述触发字能够实现它,并让它发挥作用。你甚至可能在你的家里用触发字系统做一些有趣的事情,比如打开或关闭电器,或者可以改造你的电脑,使得你或者其他人可以用触发字来操作它。

说明:记录学习笔记,如果错误欢迎指正!转载请联系我。

  • 3
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
基于词在大数据文本的出现频率来进行分词 using System; using System.Text.RegularExpressions; using System.Collections.Generic; using System.Linq; using System.Text; using System.IO; namespace WordDetection { class Program { static WordDetector wordDetector = null; static StreamWriter sw = null; private static void PrintResults() { if (sw == null) return; foreach (string word in wordDetector.FinalWords) { sw.WriteLine("{0}\t{1}", word, wordDetector.Freq[word]); } } static void Main(string[] args) { if (args.Length < 2) { Console.WriteLine("Usage: worddetector <input> <freq output>"); return; } wordDetector = new WordDetector(args[0]); //wordDetector.ProcessOver += PrintResults; var sw = new StreamWriter(args[1]); wordDetector.Process(); PrintResults(); sw.Flush(); sw.Close(); } } public class WordDetector { public Action ProcessOver = null; internal struct CharPos { public char ThisChar; public bool PositionOnRight; public CharPos(char value, bool positionOnRight) { this.ThisChar = value; this.PositionOnRight = positionOnRight; } } public const int MaxWordLength = 5, // 要检测的最长的词组长度 MinFreq = 10; // 词语出现的最小频数 public const double PSvPThreshold = 100, // theta_c EntropyThreshold = 1.3; // theta_f HashSet<string> finalWords = new HashSet<string>(); Dictionary<string, Dictionary<CharPos, int>> words = new Dictionary<string, Dictionary<CharPos, int>>(); Dictionary<string, int> freq = new Dictionary<string, int>(); Dictionary<string, double> ps = new Dictionary<string, double>(); Regex regSplit = new Regex(@"\W+|[a-zA-Z0-9]+", RegexOptions.Compiled | RegexOptions.Multiline); StreamReader sr = null; int total = 0; string _filename = ""; public HashSet<string> FinalWords { get { return finalWords; } } public Dictionary<string, int> Freq { get { return freq; } } public WordDetector (string filename) { _filename = filename; renewStreamReader(); } private void renewStreamReader () { sr = new StreamReader(_filename); } public void StartProcess () { System.Threading.Thread thr = new System.Threading.Thread(new System.Threading.ThreadStart(Process)); thr.Start(); } private void wordInfoEntropy (string word, out double leftEntropy, out double rightEntropy) { leftEntropy = rightEntropy = 0; double totalL = 0, totalR = 0; foreach (KeyValuePair<CharPos, int> pair in words[word]) { if (pair.Key.PositionOnRight) totalR += pair.Value; else totalL += pair.Value; } if (totalL <= 0) leftEntropy = double.MaxValue; if (totalR <= 0) rightEntropy = double.MaxValue; foreach (KeyValuePair<CharPos, int> pair in words[word]) { double p; if (pair.Key.PositionOnRight) { p = (double)pair.Value / totalR; rightEntropy -= p * Math.Log(p); } else { p = (double)pair.Value / totalL; leftEntropy -= p * Math.Log(p); } } } public void Process () { Console.WriteLine("Reading input..."); string line = ""; while ((line = sr.ReadLine()) != null) { total += addParagraph (line); } finalizeParagraph (); sr.Close (); Console.WriteLine("Building candidate word list..."); foreach (KeyValuePair<string, double> pair in ps) { if (pair.Key.Length < 2 || pair.Key.Length > MaxWordLength) continue; double p = 0; for (int i=1; i<pair.Key.Length; ++i) { double t = ps [pair.Key.Substring (0, i)] * ps [pair.Key.Substring (i)]; p = Math.Max (p, t); } if (freq [pair.Key] >= MinFreq && pair.Value / p > PSvPThreshold) words.Add (pair.Key, new Dictionary<CharPos, int>()); } renewStreamReader (); Console.WriteLine("Preparing word/adjacent character list..."); foreach(string cword in freq.Keys) { string wl = cword.Length > 1 ? cword.Substring(1) : "", wr = cword.Length > 1 ? cword.Substring(0, cword.Length - 1) : "", wc = cword.Length > 2 ? cword.Substring(1, cword.Length - 1) : ""; CharPos c = new CharPos('a', false); int frq = freq[cword]; if (words.ContainsKey(wl)) { c = new CharPos(cword[0], false); if (words[wl].ContainsKey(c)) words[wl][c] += frq; else words[wl].Add(c, frq); } if (words.ContainsKey(wr)) { c = new CharPos(cword[cword.Length - 1], true); if (words[wr].ContainsKey(c)) words[wr][c] += frq; else words[wr].Add(c, frq); } if (words.ContainsKey(wc)) { c = new CharPos(cword[0], false); if (words[wc].ContainsKey(c)) words[wc][c] += frq; else words[wc].Add(c, frq); c = new CharPos(cword[cword.Length - 1], true); if (words[wc].ContainsKey(c)) words[wc][c] += frq; else words[wc].Add(c, frq); } } Console.WriteLine("Calculating word information entropy..."); foreach (string word in words.Keys) { double leftEntropy = 0, rightEntropy = 0; wordInfoEntropy(word, out leftEntropy, out rightEntropy); if (leftEntropy < EntropyThreshold || rightEntropy < EntropyThreshold) continue; finalWords.Add(word); } Console.WriteLine("Done. Writing results."); if (ProcessOver != null) ProcessOver.Invoke(); } private int addParagraph (string paragraph) { int incr_total = 0; foreach (string sentence in regSplit.Split(paragraph)) { if (sentence.Length < 2) continue; for (int i = 0; i<sentence.Length; ++i) { for (int j = 1; j<=MaxWordLength+2 && i+j-1<sentence.Length; ++j) { string word = sentence.Substring (i, j); if (!freq.ContainsKey(word)) freq.Add(word, 0); freq [word]++; ++incr_total; } } } return incr_total; } private void finalizeParagraph () { foreach (string key in freq.Keys) ps.Add (key, (double)freq [key] / total); } public void Close () { sr.Close(); } } }
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值