Extracting the WebRTC VAD (Voice Activity Detection) Module on Its Own for Android

Latest recommended article published on 2024-09-11 15:16:05

Code王工

Category: Android  Tags: webrtc, speech segmentation, audio segmentation, VAD, voice activity detection

Article link: https://blog.csdn.net/always_and_forever_/article/details/81110539

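This article gives a brief walkthrough of extracting and compiling the VAD module from the latest WebRTC source code.

The end goal is to extract the WebRTC VAD module on its own for Android, wrap it in a JNI layer, and produce a .so library with ndk-build.

Let's start with the VAD module's header file, webrtc_vad.h, located at common_audio\vad\include\webrtc_vad.h

In my opinion, before actually compiling this library you should first get a rough idea of what it can do and what its call flow looks like. Start working only once you understand the overall direction and have your thoughts in order. That is exactly what the header file is good for.

When I did this myself, after sorting out my approach, I ended up dividing the whole job into the following modules or steps: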

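1. Create an Android project. When creating it you can optionally check "Include C++ support". I say optionally because it requires some CMake background, although it is quite simple. I decided to use a regular project and build with ndk-build.

2. Extract the VAD module's C/C++ source, that is, pull out the VAD-related files that are actually needed. Some blog posts online cover this; they are fairly old by now but still a useful reference. You can work backwards from the header file and its C implementation, or follow other authors' articles, to find the core files, and then follow each file's #include directives to find the rest.

3. Write the JNI wrapper. This requires some JNI background, which I won't cover here because it is not the focus of this article; you can look it up online.

4. Build the .so library with ndk-build.

5. Write Android test code, for example a recorder that takes speech input, calls into JNI, and returns the result. The VAD result is simply whether someone is speaking or not; in the header file these are the return values 1 - (Active Voice) and 0 - (Non-active Voice).

6. Last step: enjoy!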

Here is the source of the VAD header file:

// Creates an instance to the VAD structure.
VadInst* WebRtcVad_Create();

// Frees the dynamic memory of a specified VAD instance.
//
// - handle [i] : Pointer to VAD instance that should be freed.
void WebRtcVad_Free(VadInst* handle);

// Initializes a VAD instance.
//
// - handle [i/o] : Instance that should be initialized.
//
// returns        : 0 - (OK),
//                 -1 - (null pointer or Default mode could not be set).
int WebRtcVad_Init(VadInst* handle);

// Sets the VAD operating mode. A more aggressive (higher mode) VAD is more
// restrictive in reporting speech. Put in other words the probability of being
// speech when the VAD returns 1 is increased with increasing mode. As a
// consequence also the missed detection rate goes up.
//
// - handle [i/o] : VAD instance.
// - mode   [i]   : Aggressiveness mode (0, 1, 2, or 3).
//
// returns        : 0 - (OK),
//                 -1 - (null pointer, mode could not be set or the VAD instance
//                       has not been initialized).
int WebRtcVad_set_mode(VadInst* handle, int mode);

// Calculates a VAD decision for the |audio_frame|. For valid sampling rates
// frame lengths, see the description of WebRtcVad_ValidRatesAndFrameLengths().
//
// - handle       [i/o] : VAD Instance. Needs to be initialized by
//                        WebRtcVad_Init() before call.
// - fs           [i]   : Sampling frequency (Hz): 8000, 16000, or 32000
// - audio_frame  [i]   : Audio frame buffer.
// - frame_length [i]   : Length of audio frame buffer in number of samples.
//
// returns              : 1 - (Active Voice),
//                        0 - (Non-active Voice),
//                       -1 - (Error)
int WebRtcVad_Process(VadInst* handle, int fs, const int16_t* audio_frame,
                      size_t frame_length);

// Checks for valid combinations of |rate| and |frame_length|. We support 10,
// 20 and 30 ms frames and the rates 8000, 16000 and 32000 Hz.
//
// - rate         [i] : Sampling frequency (Hz).
// - frame_length [i] : Speech frame buffer length in number of samples.
//
// returns            : 0 - (valid combination), -1 - (invalid combination)
int WebRtcVad_ValidRateAndFrameLength(int rate, size_t frame_length);
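
The header comment says that 10, 20 and 30 ms frames are supported at 8000, 16000 and 32000 Hz, and that frame_length is counted in samples. The arithmetic behind that rule can be sketched as follows. This is only an illustration of the documented rule, not the library's implementation, and the names frame_samples and check_rate_and_frame_length are hypothetical helpers that do not exist in WebRTC.

```c
#include <stddef.h>

/* Number of samples in a frame of frame_ms milliseconds at rate Hz,
 * for example 16000 Hz * 10 ms = 160 samples. */
size_t frame_samples(int rate, int frame_ms) {
    return (size_t)(rate / 1000) * (size_t)frame_ms;
}

/* Hypothetical helper (not part of WebRTC): applies the rule stated in the
 * header comment. Returns 0 for a valid combination, -1 otherwise,
 * following the same return convention as the header. */
int check_rate_and_frame_length(int rate, size_t frame_length) {
    static const int rates[] = {8000, 16000, 32000};
    static const int lengths_ms[] = {10, 20, 30};
    for (size_t r = 0; r < 3; ++r) {
        if (rates[r] != rate) continue;
        for (size_t m = 0; m < 3; ++m) {
            if (frame_samples(rate, lengths_ms[m]) == frame_length) return 0;
        }
    }
    return -1;
}
```

With these numbers, a 16000 Hz recording can be fed to the VAD in frames of 160, 320 or 480 samples.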

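WebRtcVad_Create	Creates a VAD instance and allocates its memory
WebRtcVad_Free	Destroys a VAD instance; the argument is the instance, i.e. its pointer
WebRtcVad_Init	Initializes the VAD; same argument as above
WebRtcVad_set_mode	Sets the aggressiveness mode (0-3); a higher mode is more restrictive in reporting speech
WebRtcVad_Process	The core processing function; the arguments are the instance, the sampling rate, the audio frame (16-bit samples), and the frame length in samples
WebRtcVad_ValidRateAndFrameLength	Checks whether a sampling rate and frame length form a valid combination; it is not essential and you can skip it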
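
Note that the header declares audio_frame as const int16_t* and counts frame_length in samples, not bytes. If the recorder hands you raw bytes, they have to be turned into 16-bit samples first. Below is a minimal sketch of that conversion, assuming the recording is 16-bit little-endian PCM (an assumption about the recorder, not something the header states). The function name pcm16le_to_samples is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts little-endian 16-bit PCM bytes into int16_t samples.
 * Writes at most max_samples samples and returns how many were written.
 * A trailing odd byte is ignored. */
size_t pcm16le_to_samples(const uint8_t* bytes, size_t byte_count,
                          int16_t* samples, size_t max_samples) {
    size_t n = byte_count / 2;
    if (n > max_samples) n = max_samples;
    for (size_t i = 0; i < n; ++i) {
        /* Low byte first, then high byte. */
        int u = bytes[2 * i] | (bytes[2 * i + 1] << 8);
        /* Map 0x8000..0xFFFF to the negative range explicitly. */
        samples[i] = (int16_t)(u >= 0x8000 ? u - 0x10000 : u);
    }
    return n;
}
```

A buffer of 320 bytes therefore yields 160 samples, which is one 10 ms frame at 16000 Hz.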

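The functions above are called in this order: create, initialize, process data, free resources. WebRtcVad_set_mode, if used, must come after initialization.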
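
The call order can be sketched in C as follows. The function signatures are the ones from the header quoted earlier. So that the sketch compiles without the WebRTC sources, the #else branch contains stand-in functions with the same signatures. Those stand-ins are my own placeholders (a crude amplitude check with an arbitrary threshold), not WebRTC's algorithm. The USE_WEBRTC_VAD macro, the include path, and the helper vad_decide_once are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef USE_WEBRTC_VAD
#include "webrtc_vad.h" /* the real header; adjust the path to your project */
#else
/* Stand-ins so the sketch builds on its own. NOT the real implementation. */
typedef struct StubVad { int initialized; int mode; } VadInst;
VadInst* WebRtcVad_Create(void) { return calloc(1, sizeof(VadInst)); }
void WebRtcVad_Free(VadInst* h) { free(h); }
int WebRtcVad_Init(VadInst* h) {
    if (h == NULL) return -1;
    h->initialized = 1;
    h->mode = 0;
    return 0;
}
int WebRtcVad_set_mode(VadInst* h, int mode) {
    if (h == NULL || !h->initialized || mode < 0 || mode > 3) return -1;
    h->mode = mode;
    return 0;
}
int WebRtcVad_Process(VadInst* h, int fs, const int16_t* frame, size_t n) {
    (void)fs;
    if (h == NULL || !h->initialized || frame == NULL || n == 0) return -1;
    for (size_t i = 0; i < n; ++i) {
        if (frame[i] > 1000 || frame[i] < -1000) return 1; /* arbitrary threshold */
    }
    return 0;
}
#endif

/* Hypothetical helper: runs create -> init -> set_mode -> process -> free
 * on a single frame. Returns 1 (voice), 0 (no voice) or -1 (error). */
int vad_decide_once(int mode, int fs, const int16_t* frame, size_t n) {
    VadInst* vad = WebRtcVad_Create();
    if (vad == NULL) return -1;
    int result = -1;
    if (WebRtcVad_Init(vad) == 0 && WebRtcVad_set_mode(vad, mode) == 0) {
        result = WebRtcVad_Process(vad, fs, frame, n);
    }
    WebRtcVad_Free(vad); /* always release the instance */
    return result;
}
```

Creating and freeing the instance for every frame keeps the sketch short. In a real recorder loop you would more likely create and initialize the instance once, call WebRtcVad_Process for each frame, and free it when recording stops.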

Part of the Android code: