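Starting with version 1.2, speex provides speech preprocessing features such as voice activity detection (VAD), along with noise suppression, echo cancellation, automatic gain control (AGC), a jitter buffer, resampling, and more. They are implemented in the libspeexdsp library.
Once I actually started using it, I ran into all kinds of pitfalls!
First I enabled noise suppression, AGC, and VAD, and the preprocessed audio played back with a crackling, buzzing sound like electrical interference (hard to describe; see the figure).
Because I passed a 20 ms frame length as the frame size when initializing speex, you can see in the figure above that the waveform has an abrupt discontinuity every 20 ms. Each one starts at a 20 ms boundary and lasts about 1.5 ms.
Turning off noise suppression and AGC changed nothing; the output still looked like the figure above.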
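The 20 ms period and the roughly 1.5 ms glitch length are easier to compare when both are expressed in samples. One thing worth checking here: speex_preprocess.h documents the `frame_size` argument of speex_preprocess_state_init as a number of samples, not milliseconds. The sketch below is only a pair of unit-conversion helpers (the function names are made up for illustration). It shows that a 20 ms frame at 16 kHz is 320 samples, while 20 samples cover only 1.25 ms, which is close to the observed glitch length.

```c
/* Number of samples in a frame of frame_ms milliseconds at sample_rate Hz. */
static long frame_ms_to_samples(long sample_rate, long frame_ms)
{
    return sample_rate * frame_ms / 1000;
}

/* Duration, in microseconds, covered by num_samples at sample_rate Hz. */
static long samples_to_us(long sample_rate, long num_samples)
{
    return num_samples * 1000000L / sample_rate;
}
```

For example, frame_ms_to_samples(16000, 20) gives the value to pass as `frame_size` for 20 ms frames of 16 kHz audio, and samples_to_us(16000, 20) gives the duration that a `frame_size` of 20 would actually cover.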
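Looking at preprocess.c in the speexdsp source, I found that speex_preprocess_state_init enables noise suppression by default. Even after I explicitly disabled it with speex_preprocess_ctl, the result was still the same as in the figure above. And there is a comment inside speex_preprocess_run that alarmed me: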
/* If noise suppression is off, don't apply the gain (but then why call this in the first place!) */
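speexdsp's noise suppression also seemed to be just for show: with it enabled, the background noise was not reduced at all in my tests (and the crackling it introduced was added on top).
speexdsp has another problem: even pure background noise may be detected as speech. My impression is that the detection is purely frequency-based, i.e. anything with high-frequency content is treated as voice.
These two issues made the VAD feature completely unusable for me.
Finally, here is the code, for anyone curious enough to try it: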
#include <stdio.h>
#include <stdlib.h>
#include <speex/speex_preprocess.h>

#define SAMPLE_RATE (16000)
#define FRAME_SIZE (20) // frame length in ms
#define SAMPLES_PER_FRAME (SAMPLE_RATE / 1000 * FRAME_SIZE) // 16 samples per ms

int main(void)
{
    size_t n = 0, i;
    long voiceFrames = 0, totalFrames = 0;
    FILE *inFile = fopen("/run/shm/rec_whp.raw", "rb");
    FILE *outFile = fopen("/run/shm/rec_spx2.raw", "wb");
    if (inFile == NULL || outFile == NULL)
    {
        perror("fopen");
        return 1;
    }
    // 2 bytes per sample, single channel
    spx_int16_t *buf = malloc(SAMPLES_PER_FRAME * sizeof(*buf));
    if (buf == NULL)
    {
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    // frame_size is a number of samples, not milliseconds (see speex_preprocess.h).
    // Passing FRAME_SIZE (20) here would make speex process only the first
    // 20 samples (1.25 ms at 16 kHz) of every 20 ms frame.
    SpeexPreprocessState *state = speex_preprocess_state_init(SAMPLES_PER_FRAME, SAMPLE_RATE);

    int denoise = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise); // disable noise suppression
    //speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress); // noise attenuation in dB
    //speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &agc); // automatic gain control
    //speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &agcLevel); // AGC target level

    //int vad = 1, vadProbStart = 80, vadProbContinue = 65;
    int vad = 1, vadProbStart = 99, vadProbContinue = 99;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_VAD, &vad); // voice activity detection
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_PROB_START, &vadProbStart); // probability required for the VAD to go from silence to voice (integer percent)
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_PROB_CONTINUE, &vadProbContinue); // probability required for the VAD to stay in the voice state (integer percent)

    while ((n = fread(buf, sizeof(*buf), SAMPLES_PER_FRAME, inFile)) > 0)
    {
        for (i = n; i < SAMPLES_PER_FRAME; i++)
            buf[i] = 0; // zero-pad a short final frame
        // with VAD enabled, the return value is 1 for speech and 0 for silence/noise
        voiceFrames += speex_preprocess_run(state, buf);
        totalFrames++;
        fwrite(buf, sizeof(*buf), n, outFile); // write back only the samples that were read
    }
    printf("%ld of %ld frames classified as speech\n", voiceFrames, totalFrames);

    free(buf);
    fclose(inFile);
    fclose(outFile);
    speex_preprocess_state_destroy(state);
    return 0;
}
Compile and run:
gcc squelch.c -lspeexdsp
./a.out
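Fortunately, I eventually implemented silence detection with a method I came up with myself. Its range of application is fairly narrow, but it fits our use case.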
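The post does not describe that method, so the following is not it. For readers who want a starting point, one common speex-independent approach is a per-frame energy threshold with a short hangover, sketched here under stated assumptions: the names `EnergyVad`, `frame_energy`, `energy_vad_run`, and both tuning constants are hypothetical, and a pure energy threshold will also fire on loud non-speech noise.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning values; real ones depend on the microphone and noise floor. */
#define ENERGY_THRESHOLD 250000.0 /* mean squared amplitude (an RMS of 500) */
#define HANGOVER_FRAMES  10       /* frames to keep reporting voice after the last loud frame */

typedef struct {
    int hangover; /* remaining frames for which voice is still reported */
} EnergyVad;

/* Mean squared amplitude of one frame of 16-bit samples. */
static double frame_energy(const int16_t *samples, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += (double)samples[i] * (double)samples[i];
    return count ? sum / (double)count : 0.0;
}

/* Returns 1 if the frame is treated as voice, 0 if it is treated as silence. */
static int energy_vad_run(EnergyVad *vad, const int16_t *samples, size_t count)
{
    if (frame_energy(samples, count) >= ENERGY_THRESHOLD) {
        vad->hangover = HANGOVER_FRAMES; /* loud frame: restart the hangover */
        return 1;
    }
    if (vad->hangover > 0) {
        vad->hangover--; /* quiet frame shortly after speech: still voice */
        return 1;
    }
    return 0;
}
```

A caller would zero-initialize one `EnergyVad` and call energy_vad_run once per 320-sample frame. The hangover keeps short pauses between words from being cut off as silence.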
