A Statistical Analysis of On-Off Patterns in 16 Conversations
Citation
Brady, P. T. (1968). A Statistical Analysis of On-Off Patterns in 16 Conversations. The Bell System Technical Journal, 47, 73–91.
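The paper defines the talkspurt and reports statistics on talkspurts, pauses, and related events measured from recorded telephone conversations.
Concepts
Definition of talkspurt
- First, run voice activity detection (VAD) against a fixed threshold: if the audio crosses the threshold at least once within a 5 ms interval, that interval is considered to contain speech activity;
- If a run of continuous activity lasts no more than 15 ms, it is treated as noise and discarded;
- Apply a 200 ms fill-in: every gap shorter than 200 ms is filled in.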
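The three steps above can be sketched as a small pipeline. This is a minimal illustration rather than Brady's original apparatus: the function names, the list-of-floats input, and the choice to leave leading and trailing silence unfilled are assumptions made for the example; only the 5 ms, 15 ms, and 200 ms constants come from the paper.

```python
from typing import Iterator, List, Sequence, Tuple

FRAME_MS = 5        # sampling interval of the activity flag (from the paper)
MIN_SPURT_MS = 15   # spurts of this length or shorter are erased
FILL_IN_MS = 200    # gaps shorter than this are filled in


def frame_activity(samples: Sequence[float], sample_rate: int,
                   threshold: float) -> List[int]:
    """Return one flag per 5 ms frame: 1 if the rectified signal crossed the threshold."""
    frame_len = sample_rate * FRAME_MS // 1000
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(1 if any(abs(s) > threshold for s in frame) else 0)
    return flags


def _runs(flags: Sequence[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (value, start_index, length) for each run of equal flags."""
    start = 0
    for i in range(1, len(flags) + 1):
        if i == len(flags) or flags[i] != flags[start]:
            yield flags[start], start, i - start
            start = i


def erase_short_spurts(flags: Sequence[int],
                       max_frames: int = MIN_SPURT_MS // FRAME_MS) -> List[int]:
    """Set to 0 every run of 1s that lasts max_frames frames or fewer."""
    out = list(flags)
    for value, start, length in _runs(flags):
        if value == 1 and length <= max_frames:
            out[start:start + length] = [0] * length
    return out


def fill_in_short_gaps(flags: Sequence[int],
                       gap_frames: int = FILL_IN_MS // FRAME_MS) -> List[int]:
    """Set to 1 every interior run of 0s shorter than gap_frames frames."""
    out = list(flags)
    for value, start, length in _runs(flags):
        # Assumption: silence before the first spurt or after the last is not a gap.
        interior = start > 0 and start + length < len(flags)
        if value == 0 and interior and length < gap_frames:
            out[start:start + length] = [1] * length
    return out


def on_off_pattern(samples: Sequence[float], sample_rate: int,
                   threshold: float) -> List[int]:
    """Threshold detection, then short-spurt erasure, then fill-in."""
    flags = frame_activity(samples, sample_rate, threshold)
    return fill_in_short_gaps(erase_short_spurts(flags))
```

The order matters: short spurts are erased before gaps are filled, so an isolated noise burst cannot bridge two real talkspurts.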
Definition of pause
- A pause is the interval between two talkspurts.
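Given an on-off pattern, talkspurt and pause durations fall out of a run-length pass. The sketch below assumes the pattern is a list of 0/1 flags at 5 ms per flag, and it drops leading and trailing silence because a pause is defined as lying between talkspurts; the function name is hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

FRAME_MS = 5  # duration represented by one flag


def spurt_and_pause_durations(pattern: Sequence[int],
                              frame_ms: int = FRAME_MS) -> Tuple[List[int], List[int]]:
    """Return (talkspurt durations, pause durations) in milliseconds."""
    # Run-length encode the pattern as (value, duration_ms) pairs.
    runs = [(value, sum(1 for _ in group) * frame_ms)
            for value, group in groupby(pattern)]
    # Silence at either end is not bounded by two talkspurts, so it is not a pause.
    while runs and runs[0][0] == 0:
        runs.pop(0)
    while runs and runs[-1][0] == 0:
        runs.pop()
    talkspurts = [duration for value, duration in runs if value == 1]
    pauses = [duration for value, duration in runs if value == 0]
    return talkspurts, pauses
```

Lists like these are the raw material for duration distributions of talkspurts and pauses.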
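Hangover
- Extend the end of every talkspurt by a fixed duration. As the figure below shows, after hangover the original audio goes from three talkspurts to two, because the first and second talkspurts are merged into one.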
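Hangover can be sketched as a countdown that keeps the output active for a fixed number of frames after the last active input frame. The frame-based representation and the function names are assumptions for illustration; the example reproduces the three-to-two merge described above.

```python
from typing import List, Sequence


def apply_hangover(flags: Sequence[int], hangover_frames: int) -> List[int]:
    """Keep the output at 1 for hangover_frames frames after each active frame."""
    out = []
    remaining = 0  # hangover frames still owed after the last active frame
    for flag in flags:
        if flag:
            out.append(1)
            remaining = hangover_frames
        elif remaining > 0:
            out.append(1)
            remaining -= 1
        else:
            out.append(0)
    return out


def count_talkspurts(flags: Sequence[int]) -> int:
    """Count 0-to-1 transitions, treating the frame before the start as 0."""
    previous = [0] + list(flags)
    return sum(1 for prev, cur in zip(previous, flags) if cur and not prev)


raw = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
extended = apply_hangover(raw, hangover_frames=3)
```

Each output frame depends only on past and present input, so the decision can be made frame by frame as the audio arrives.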
Fill-in
- Merge talkspurts whose separation is shorter than a given threshold into a single talkspurt. See the figure below for an example.
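On a list of talkspurt intervals, fill-in reduces to merging neighbours whose gap is below the threshold. The sketch assumes talkspurts are given as (start_ms, end_ms) tuples; that representation and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def fill_in(talkspurts: Sequence[Interval], max_gap_ms: int) -> List[Interval]:
    """Merge talkspurts separated by a gap shorter than max_gap_ms."""
    merged: List[Interval] = []
    for start, end in sorted(talkspurts):
        if merged and start - merged[-1][1] < max_gap_ms:
            # The gap to the previous talkspurt is short: absorb this one into it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


spurts = [(0, 300), (400, 900), (1500, 1800)]
merged = fill_in(spurts, max_gap_ms=200)
```

The 100 ms gap is bridged while the 600 ms gap survives, so three talkspurts become two.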
(The technique of obtaining on-off speech patterns, although already documented, is summarized as follows. A flip-flop is set any time speech (full-wave rectified and unfiltered) from speaker A crosses a threshold. This flip-flop is examined and cleared every 5 milliseconds, with the output being a 1 if the threshold was crossed, 0 otherwise. The resulting string of 1s (spurts) and 0s (gaps) is examined for short spurts; all spurts ≤15 msec are erased. After this is done, all gaps less than 200 msec are filled in to account for momentary interruptions, such as those due to stop consonants. The resulting on-off pattern consists, by definition used here, of talkspurts and pauses. An identical procedure is used for speaker B.)
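Effect of hangover and fill-in on talkspurts
The main purpose of hangover and fill-in is to reduce the number of talkspurts. In VoIP this also means that only relatively long pauses are left to be compressed or stretched, which degrades call quality less. Hangover, however, can substantially increase the total speech activity time. Fill-in can only be decided once both adjacent talkspurts have been observed, so it is not suited to real-time systems; compared with hangover, it fits non-real-time scenarios (monologue) better.
The effect of hangover and fill-in on talkspurts is shown in the figure below.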
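The trade-off can be made concrete with a toy comparison. The sketch treats both operations as one interval merge with different parameters (a hypothetical helper, not taken from the paper): hangover extends each talkspurt and then merges whatever touches, while fill-in extends nothing and merges across short gaps. The interval values are made up for illustration.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms)


def merge_spurts(spurts: Sequence[Interval], extend_ms: int = 0,
                 max_gap_ms: int = 0) -> List[Interval]:
    """Extend each talkspurt by extend_ms, then merge touching or close neighbours."""
    merged: List[Interval] = []
    for start, end in sorted(spurts):
        end += extend_ms
        if merged and (start <= merged[-1][1]
                       or start - merged[-1][1] < max_gap_ms):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def active_ms(spurts: Sequence[Interval]) -> int:
    """Total time covered by the talkspurts."""
    return sum(end - start for start, end in spurts)


raw = [(0, 300), (400, 900), (1500, 1800)]        # made-up example data
with_hangover = merge_spurts(raw, extend_ms=200)   # extend, merge what touches
with_fill_in = merge_spurts(raw, max_gap_ms=200)   # bridge gaps under 200 ms
```

In this example both operations reduce three talkspurts to two, but hangover raises the active time from 1100 ms to 1600 ms, whereas fill-in raises it only to 1200 ms, because fill-in adds time only inside the bridged gap.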
Statistics of talkspurts and pauses
Distribution of talkspurts
Distribution of pauses
References
Brady, P. T. (1968). A Statistical Analysis of On-Off Patterns in 16 Conversations. The Bell System Technical Journal, 47, 73–91.
Gruber, J. G. (1982). A comparison of measured and calculated speech temporal parameters relevant to speech activity detection. IEEE Transactions on Communications, COM-30(4), 728–38. http://doi.org/10.1109/TCOM.1982.1095514