silero-vad now has an official Java demo

Latest recommended article published on 2024-07-21 16:14:03

java_lilin

Tags: java freeswitch silero-vad

Article link: https://blog.csdn.net/Java_lilin/article/details/134707945

Previously I used the Android project gkonovalov/android-vad on GitHub (Android Voice Activity Detection (VAD) library; supports WebRTC VAD GMM, Silero VAD DNN, and Yamnet VAD DNN models) as a reference, and porting its Kotlin code to a Java demo took a lot of effort.

Last month the official repository added an example at https://github.com/snakers4/silero-vad/tree/master/examples/java-example, which makes detecting silence in PCM data from Java much simpler:

package org.example;

import ai.onnxruntime.OrtException;
import javax.sound.sampled.*;
import java.util.Map;

public class App {

    private static final String MODEL_PATH = "src/main/resources/silero_vad.onnx";
    private static final int SAMPLE_RATE = 16000;
    private static final float START_THRESHOLD = 0.6f;
    private static final float END_THRESHOLD = 0.45f;
    private static final int MIN_SILENCE_DURATION_MS = 600;
    private static final int SPEECH_PAD_MS = 500;
    private static final int WINDOW_SIZE_SAMPLES = 2048;

    public static void main(String[] args) {
        // Initialize the Voice Activity Detector
        SlieroVadDetector vadDetector;
        try {
            vadDetector = new SlieroVadDetector(MODEL_PATH, START_THRESHOLD, END_THRESHOLD, SAMPLE_RATE, MIN_SILENCE_DURATION_MS, SPEECH_PAD_MS);
        } catch (OrtException e) {
            System.err.println("Error initializing the VAD detector: " + e.getMessage());
            return;
        }

        // Set audio format
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);

        // Get the target data line and open it with the specified format
        TargetDataLine targetDataLine;
        try {
            targetDataLine = (TargetDataLine) AudioSystem.getLine(info);
            targetDataLine.open(format);
            targetDataLine.start();
        } catch (LineUnavailableException e) {
            System.err.println("Error opening target data line: " + e.getMessage());
            return;
        }

        // Main loop to continuously read data and apply Voice Activity Detection
        while (targetDataLine.isOpen()) {
            byte[] data = new byte[WINDOW_SIZE_SAMPLES];

            int numBytesRead = targetDataLine.read(data, 0, data.length);
            if (numBytesRead <= 0) {
                System.err.println("Error reading data from target data line.");
                continue;
            }

            // Apply the Voice Activity Detector to the data and get the result
            Map<String, Double> detectResult;
            try {
                detectResult = vadDetector.apply(data, true);
            } catch (Exception e) {
                System.err.println("Error applying VAD detector: " + e.getMessage());
                continue;
            }

            if (!detectResult.isEmpty()) {
                System.out.println(detectResult);
            }
        }

        // Close the target data line to release audio resources
        targetDataLine.close();
    }
}
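
The demo takes a start threshold (0.6), a lower end threshold (0.45), and a minimum silence duration (600 ms). To show how such parameters can work together, here is a minimal hysteresis sketch. It is a simplified illustration, not the actual logic inside SlieroVadDetector; the class name HysteresisSketch, the event strings, and the idea of counting silence in whole windows are all assumptions made for this example. With the demo's settings, one 2048-byte buffer of 16-bit mono audio holds 1024 samples, about 64 ms at 16 kHz, so 600 ms of silence corresponds to roughly 10 windows.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified hysteresis state machine (illustrative; not the official detector logic). */
final class HysteresisSketch {
    private final float startThreshold;
    private final float endThreshold;
    private final int minSilenceWindows;

    private boolean inSpeech = false;
    private int silentWindows = 0;

    HysteresisSketch(float startThreshold, float endThreshold, int minSilenceWindows) {
        this.startThreshold = startThreshold;
        this.endThreshold = endThreshold;
        this.minSilenceWindows = minSilenceWindows;
    }

    /** Feeds one per-window speech probability; returns "start", "end", or null. */
    String accept(float speechProb) {
        if (!inSpeech) {
            // Speech begins only once the probability reaches the higher threshold.
            if (speechProb >= startThreshold) {
                inSpeech = true;
                silentWindows = 0;
                return "start";
            }
            return null;
        }
        if (speechProb < endThreshold) {
            // Speech ends only after enough consecutive quiet windows.
            silentWindows++;
            if (silentWindows >= minSilenceWindows) {
                inSpeech = false;
                silentWindows = 0;
                return "end";
            }
        } else {
            // Anything at or above the end threshold counts as continued speech here.
            silentWindows = 0;
        }
        return null;
    }

    /** Runs a whole sequence of probabilities and collects events as "name@windowIndex". */
    static List<String> run(float[] probs, float start, float end, int minSilenceWindows) {
        HysteresisSketch sketch = new HysteresisSketch(start, end, minSilenceWindows);
        List<String> events = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            String event = sketch.accept(probs[i]);
            if (event != null) {
                events.add(event + "@" + i);
            }
        }
        return events;
    }
}
```

The gap between the two thresholds is what keeps a probability hovering around a single cutoff from toggling the state on every window.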

To run it, download the ONNX Runtime DLL (from GitHub) and load it:

System.load("F:\\jar\\onnxruntime-win-x64-1.16.3\\lib\\onnxruntime.dll");

With this in place, detecting silence in PCM data obtained from FreeSWITCH becomes straightforward.
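
As a rough sketch of that use case, the helper below cuts a raw PCM byte stream into fixed-size windows and hands each one to a callback, which is where a call such as vadDetector.apply(window, true) could go. The class PcmWindowReader and its method are hypothetical names for this example, and how the bytes actually arrive from FreeSWITCH (file, socket, and so on) is left open. It assumes the stream has the same sample rate and sample format the detector was configured with.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

/** Splits a raw PCM stream into fixed-size byte windows (hypothetical helper). */
final class PcmWindowReader {

    /**
     * Reads windows of exactly windowBytes bytes and passes each to the handler.
     * A trailing partial window is dropped. Returns the number of windows delivered.
     */
    static int forEachWindow(InputStream in, int windowBytes, Consumer<byte[]> handler) {
        int windows = 0;
        try {
            while (true) {
                byte[] window = new byte[windowBytes];
                // readNBytes (Java 9+) blocks until the buffer is full or the stream ends.
                int filled = in.readNBytes(window, 0, windowBytes);
                if (filled < windowBytes) {
                    break;
                }
                handler.accept(window);
                windows++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return windows;
    }
}
```

Allocating a fresh array per window keeps the callback free to hold on to the data; reusing one buffer would be cheaper if the callback never stores it.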

The key call is vadDetector.apply(data, true); at its core, it obtains a float speech probability from the model:

// Call the model to get the prediction probability of speech
float speechProb = 0;
try {
    speechProb = model.call(new float[][]{audioData}, samplingRate)[0];
} catch (OrtException e) {
    throw new RuntimeException(e);
}
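
The model is called with a float array, while the microphone and FreeSWITCH deliver bytes, so a conversion step sits in between. Below is a minimal sketch of one common way to do it for 16-bit signed little-endian mono PCM, which is the format the demo's AudioFormat requests. PcmConverter is a hypothetical name, and the scaling to the range [-1, 1) by dividing by 32768 is an assumption; the official detector's own conversion may differ in detail.

```java
/** Converts 16-bit signed little-endian PCM bytes to floats (illustrative helper). */
final class PcmConverter {

    /** Returns one float per 16-bit sample, scaled to [-1, 1). A trailing odd byte is ignored. */
    static float[] toFloats(byte[] pcm) {
        int sampleCount = pcm.length / 2;
        float[] samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            int low = pcm[2 * i] & 0xff;   // low byte, treated as unsigned
            int high = pcm[2 * i + 1];     // high byte, keeps its sign
            short value = (short) ((high << 8) | low);
            samples[i] = value / 32768.0f;
        }
        return samples;
    }
}
```

Masking only the low byte matters: without the & 0xff, a low byte of 0x80 or above would be sign-extended and corrupt the high bits of the sample.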

If you are interested, see https://item.taobao.com/item.htm?id=653611115230