Detecting User Pickup for AXE-Mode Privacy Numbers via Audio Stream Analysis

Article link: https://blog.csdn.net/weixin_37697242/article/details/122856754

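This post describes how to use the TarsosDSP library for voice activity detection (VAD) to identify and tell apart the waveform segments of the ringback tone, the color ringback tone (CRBT), and the user speaking. By analyzing the waveform and detecting silence, it matches the 450 Hz beep and the continuous-music features, and from that determines the moment the user answers the call. In practice, this makes it possible to detect pickup even when no connect callback is available.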


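Background

When placing outbound calls to users through AXE-mode privacy numbers, we found that not every privacy-number provider offers a connect callback that can be configured.
We therefore need a provider-independent way to detect that the user has answered (for scenarios such as starting a recording or playing a welcome message).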

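Goal

Before bringing in a trained speech model, accurately recognize the ringback beeps and the color ringback tone from the waveform alone, covering more than 90% of cases.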

Research

VAD:
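Voice activity detection (VAD), also called speech endpoint detection or speech boundary detection, aims to identify and remove long silent periods from an audio stream, saving channel resources without degrading service quality. It is an important component of IP telephony. Silence suppression saves scarce bandwidth and helps reduce the end-to-end delay that users perceive.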
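
To make the idea concrete, here is a minimal sketch of an energy-based VAD in plain Java. It is not TarsosDSP's implementation: the class name `EnergyVad`, the helper names, and the -50 dB threshold are assumptions chosen for illustration, and samples are assumed to be floats in [-1, 1].

```java
public class EnergyVad {

    /** RMS level of one frame in dB relative to full scale (samples in [-1, 1]). */
    static double levelDb(float[] frame) {
        double sum = 0.0;
        for (float x : frame) {
            sum += x * x;
        }
        double rms = Math.sqrt(sum / frame.length);
        // Floor the value so an all-zero frame does not produce log10(0).
        return 20.0 * Math.log10(Math.max(rms, 1e-10));
    }

    /** A frame counts as active when its level exceeds the (assumed) threshold. */
    static boolean isActive(float[] frame, double thresholdDb) {
        return levelDb(frame) > thresholdDb;
    }

    public static void main(String[] args) {
        float[] silence = new float[160];          // 20 ms of zeros at 8 kHz
        float[] tone = new float[160];
        for (int n = 0; n < tone.length; n++) {    // 450 Hz sine, amplitude 0.5
            tone[n] = (float) (0.5 * Math.sin(2 * Math.PI * 450 * n / 8000.0));
        }
        System.out.println(isActive(silence, -50.0)); // false
        System.out.println(isActive(tone, -50.0));    // true
    }
}
```

A fixed threshold is the simplest choice; on real call audio it would likely need tuning against the line noise of each provider.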

TarsosDSP:
Git repository: https://github.com/JorenSix/TarsosDSP
TarsosDSP is a Java library for audio processing. Its aim is to provide an easy-to-use interface to practical music processing algorithms implemented, as simply as possible, in pure Java and without any other external dependencies. The library tries to hit the sweet spot between being capable enough to get real tasks done but compact and simple enough to serve as a demonstration on how DSP algorithms works. TarsosDSP features an implementation of a percussion onset detector and a number of pitch detection algorithms: YIN, the Mcleod Pitch method and a “Dynamic Wavelet Algorithm Pitch Tracking” algorithm. Also included is a Goertzel DTMF decoding algorithm, a time stretch algorithm (WSOLA), resampling, filters, simple synthesis, some audio effects, and a pitch shifting algorithm.

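Ringback tone:
Indicates that the called party is being rung. It uses a 450±25 Hz AC source at a send level of -10±3 dBm and is an intermittent tone with a 5 s period: 1 s on, 4 s off, matching the ringing signal.

Color ringback tone:
A continuous, uninterrupted music waveform.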
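
One way to test whether a frame contains the 450 Hz ringback beep is the Goertzel algorithm, which measures the power at a single frequency. The sketch below is an illustration in plain Java, not the code used in this post: the class name `Goertzel450`, the 0.5 ratio threshold, and the 160-sample frame at 8 kHz are assumptions.

```java
public class Goertzel450 {

    /** Goertzel power of one target frequency over a frame of samples. */
    static double goertzelPower(float[] frame, double targetHz, double sampleRate) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * targetHz / sampleRate);
        double s1 = 0.0;
        double s2 = 0.0;
        for (float x : frame) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    /** True when most of the frame's energy sits at 450 Hz (threshold is assumed). */
    static boolean looksLike450Hz(float[] frame, double sampleRate) {
        double total = 0.0;
        for (float x : frame) {
            total += x * x;
        }
        if (total == 0.0) {
            return false; // silent frame, nothing to classify
        }
        // For a pure tone at the target frequency this ratio is close to 1.
        double ratio = 2.0 * goertzelPower(frame, 450.0, sampleRate) / (frame.length * total);
        return ratio > 0.5;
    }

    static float[] sine(double hz, int length, double sampleRate) {
        float[] out = new float[length];
        for (int n = 0; n < length; n++) {
            out[n] = (float) Math.sin(2.0 * Math.PI * hz * n / sampleRate);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(looksLike450Hz(sine(450.0, 160, 8000.0), 8000.0));  // true
        System.out.println(looksLike450Hz(sine(1000.0, 160, 8000.0), 8000.0)); // false
    }
}
```

With 160-sample frames the analysis bin is about 50 Hz wide, so a tone near the edge of the ±25 Hz tolerance gives a noticeably lower ratio; the threshold or the frame length would need adjusting for such lines.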

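Approach

[Figure: waveform of the call audio]
Analysis of the waveform shows three segments, from left to right:

“Please enter the four-digit extension number, followed by the # key”
“Ringing beeps”
“User speaking”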

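The task therefore breaks down into three steps:
1. Skip a fixed duration to get past the extension-number prompt.
2. Run detection on the first active segment after the silence and match it against the color ringback features or the beep features.
3. The moment the audio stops matching those features is the moment the user answers.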
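
The three steps above can be sketched as a small frame-level scan. This is a simplified illustration under stated assumptions, not the PickUp class from this post: it assumes each frame has already been labeled as active or silent and as 450 Hz or not, and the names `PickupSketch`, `findPickupFrame`, `skipFrames`, and `minGapFrames` are hypothetical.

```java
public class PickupSketch {

    /**
     * Returns the index of the first frame that breaks the ringback pattern,
     * or -1 if the pattern never breaks.
     */
    static int findPickupFrame(boolean[] active, boolean[] tone450,
                               int skipFrames, int minGapFrames) {
        int i = skipFrames;                               // step 1: skip the prompt
        while (i < active.length && active[i]) i++;       // tail of the prompt, if any
        while (i < active.length && !active[i]) i++;      // silence before the ringback
        if (i >= active.length) return -1;

        boolean beepMode = tone450[i];                    // step 2: beep or music?
        int silentRun = 0;
        for (; i < active.length; i++) {                  // step 3: find the break
            if (beepMode) {
                // Any active frame that is not a 450 Hz beep breaks the pattern.
                if (active[i] && !tone450[i]) return i;
            } else {
                // Music is continuous, so a long enough silence means it stopped.
                silentRun = active[i] ? 0 : silentRun + 1;
                if (silentRun >= minGapFrames) return i - minGapFrames + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Frames: 0-4 prompt, 5-6 silence, 7-8 beep, 9-12 silence,
        // 13-14 beep, 15-16 silence, 17-19 user speaking.
        boolean[] active = new boolean[20];
        boolean[] tone = new boolean[20];
        for (int f : new int[] {0, 1, 2, 3, 4, 7, 8, 13, 14, 17, 18, 19}) active[f] = true;
        for (int f : new int[] {7, 8, 13, 14}) tone[f] = true;
        System.out.println(findPickupFrame(active, tone, 5, 2)); // 17
    }
}
```

In the music case the sketch reports the start of the first long silence; a real recording may contain short pauses inside the music, which is why the gap length is a parameter rather than a single frame.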

Implementation

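The implementation uses the silence detection and pitch detection that TarsosDSP provides.
Note that you need to add the dependency yourself; the TarsosDSP package is available from the git repository listed in the Research section above.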

Usage:

    public static void main(String[] args) {
        PickUp pickUp = new PickUp("xxx.wav", 8000, 16, 1000, 4500);
        pickUp.start();
        System.exit(0); // terminate the JVM; status 0 signals a normal exit
    }

PickUp:

package xxx;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.SilenceDetector;
import be.tarsos.dsp.io.TarsosDSPAudioFloatConverter;
import be.tarsos.dsp.io.TarsosDSPAudioFormat;
import be.tarsos.dsp.io.UniversalAudioInputStream;
import be.tarsos.dsp.pitch.PitchDetectionHandler;
import be.tarsos.dsp.pitch.PitchDetectionResult;
import be.tarsos.dsp.pitch.PitchProcessor;
import java.io.*;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PickUp {

    // Type of ringback audio detected before pickup
    public enum RingbackType {
        UNCHECK, DU_NORMALITY, DU_OTHER, SONG;
    }

    private ConcurrentLinkedQueue<byte[]> audioQueue = new ConcurrentLinkedQueue<byte[]>();
    private boolean isFinishReadFile = false; // whether the file has been fully read
    private String filePath;
    private String fileName;