DTW speech recognition code in Java: a source program for DTW-based speech recognition

Latest recommended article published 2022-08-14 11:10:10

咖啡猫的眼泪

Article tags: DTW speech recognition code, Java

Article link: https://blog.csdn.net/weixin_35214885/article/details/114494363

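This article introduces speech recognition technology, including the process of converting speech into machine-readable input and the use of voice activity detection (VAD) in audio processing. MFCCs are a key feature for speech recognition; they model the response of the human auditory system. VAD detects the speech segments in an audio signal and is commonly used in speech coding and recognition to streamline processing and save resources.

Summary generated by CSDN using AI.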

Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words into machine-readable input, for example key presses represented as a string of binary character codes. The term "voice recognition" is sometimes used incorrectly for speech recognition; strictly, it refers to speaker recognition, which attempts to identify who is speaking rather than what is being said. Adding to the confusion, journalists and manufacturers of speech-controlled devices commonly say "voice recognition" when they mean speech recognition.

Mel-frequency cepstral coefficients (MFCCs) are the coefficients that collectively make up a mel-frequency cepstrum (MFC). They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum of a spectrum"). The difference between the ordinary cepstrum and the mel-frequency cepstrum is that in the MFC the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands of the ordinary cepstrum. This frequency warping can allow a better representation of sound.
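To make the "equally spaced on the mel scale" idea concrete, here is a minimal Java sketch. It assumes one commonly used Hz-to-mel formula, m = 2595 · log10(1 + f / 700); the text above does not say which variant is intended, and the class and method names (`MelScale`, `bandCenters`) are made up for illustration. It only computes band center frequencies, not a full MFCC pipeline.

```java
// Illustrative sketch only: names and the choice of mel formula are assumptions.
class MelScale {

    // Convert a frequency in Hz to mels (assumed formula: 2595 * log10(1 + f/700)).
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    // Inverse of hzToMel: convert mels back to Hz.
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    // Center frequencies (in Hz) of numBands bands that are equally spaced
    // on the mel scale between lowHz and highHz (the edges themselves excluded).
    static double[] bandCenters(int numBands, double lowHz, double highHz) {
        double lowMel = hzToMel(lowHz);
        double highMel = hzToMel(highHz);
        double step = (highMel - lowMel) / (numBands + 1);
        double[] centers = new double[numBands];
        for (int i = 0; i < numBands; i++) {
            centers[i] = melToHz(lowMel + step * (i + 1));
        }
        return centers;
    }

    public static void main(String[] args) {
        // Print ten band centers between 0 Hz and 8 kHz.
        for (double c : bandCenters(10, 0.0, 8000.0)) {
            System.out.printf("%.1f Hz%n", c);
        }
    }
}
```

The centers are evenly spaced in mels, so in Hz they sit close together at low frequencies and spread out at high frequencies, which is the warping the paragraph describes.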

Voice activity detection (VAD), also known as speech activity detection or, more simply, speech detection, is a technique used in speech processing in which the presence or absence of human speech is detected in regions of audio (which may also contain music, noise, or other sound) [1]. The main uses of VAD are in speech coding and speech recognition. It can facilitate speech processing, and it can also be used to deactivate some processes during non-speech segments: in VoIP, for example, it avoids unnecessary coding and transmission of silence packets, saving computation and network bandwidth.
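As a rough illustration of frame-level speech detection, the Java sketch below flags a frame as speech when its mean squared amplitude exceeds a fixed threshold. This is only one simple approach, chosen here for brevity; the text does not state which detection method is meant, and practical detectors typically add features such as noise tracking and smoothing. The class name `EnergyVad`, the frame length, and the threshold value are all assumptions.

```java
// Illustrative sketch only: a fixed-threshold short-time energy detector.
class EnergyVad {

    // Split samples into non-overlapping frames of frameLen samples and
    // return one flag per frame: true if the frame's mean squared amplitude
    // is above threshold. Trailing samples that do not fill a frame are ignored.
    static boolean[] detect(double[] samples, int frameLen, double threshold) {
        int numFrames = samples.length / frameLen;
        boolean[] speech = new boolean[numFrames];
        for (int f = 0; f < numFrames; f++) {
            double energy = 0.0;
            for (int i = 0; i < frameLen; i++) {
                double s = samples[f * frameLen + i];
                energy += s * s;
            }
            speech[f] = (energy / frameLen) > threshold;
        }
        return speech;
    }

    // Build a test signal: silence, then a 200 Hz tone, then silence,
    // each segmentLen samples long (assumed sample rate: 8000 Hz).
    static double[] demoSignal(int segmentLen) {
        double[] signal = new double[3 * segmentLen];
        for (int i = 0; i < segmentLen; i++) {
            signal[segmentLen + i] = 0.5 * Math.sin(2.0 * Math.PI * 200.0 * i / 8000.0);
        }
        return signal;
    }

    public static void main(String[] args) {
        boolean[] flags = detect(demoSignal(160), 160, 0.01);
        for (int f = 0; f < flags.length; f++) {
            System.out.println("frame " + f + ": " + (flags[f] ? "speech" : "non-speech"));
        }
    }
}
```

With this synthetic signal only the middle frame is flagged. A recognizer or coder could then skip the frames marked as non-speech, which is the resource saving described above.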