Getting the microphone volume in Unity: checking whether the microphone is silent

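In Unity, you can record audio with Microphone.Start and play it back to check whether the user has stopped speaking. Read the spectrum data from the AudioSource with GetSpectrumData and use the average of that data to decide whether someone is speaking. A threshold filters out noise: when the average falls below it, the user is considered to have stopped speaking. The check runs in real time, and routing the AudioSource through an AudioMixer whose volume is set to -80 keeps the user from hearing their own voice.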


We use the standard method of recording audio in Unity:

_sendingClip = Microphone.Start(_device, true, 10, 16000);

where _sendingClip is the AudioClip and _device is the device name.

I'd like to know when the user stops speaking, which can happen after 2 seconds, or even 10.

I've looked through various sources but could not find an answer.

The idea is that as soon as the user stops talking, the audio is sent to a speech recognition server without delay, but without cutting off audio while the user is still speaking.
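The sending step needs the samples recorded between the moment speech started and the moment it stopped. Because the clip is recorded with loop = true, it behaves as a ring buffer (10 s × 16000 Hz = 160,000 samples for the call above), so an utterance can straddle the end of the buffer. A minimal sketch, assuming a mono clip and that you store `Microphone.GetPosition(_device)` when speech starts and when it stops: the helper `RingBuffer.ExtractSegment` is hypothetical, not a Unity API. It also assumes the utterance is shorter than the clip, so for utterances of up to 10 s a longer clip than 10 s would be needed.

```csharp
using System;

// Hypothetical helper, not a Unity API: copies the samples written between two
// Microphone.GetPosition readings out of a looping (ring-buffer) clip.
// Assumes a mono clip and an utterance shorter than the clip length.
public static class RingBuffer
{
    public static float[] ExtractSegment(float[] ring, int startPos, int endPos)
    {
        if (ring == null) throw new ArgumentNullException(nameof(ring));
        int length = ring.Length;
        if (startPos < 0 || startPos >= length || endPos < 0 || endPos >= length)
            throw new ArgumentException("Positions must lie inside the buffer.");

        // If the recording wrapped past the end of the clip, the segment crosses the seam.
        // Equal positions are treated as "no new samples".
        int count = endPos >= startPos ? endPos - startPos : length - startPos + endPos;
        var segment = new float[count];

        // Part 1: from startPos towards the end of the buffer.
        int firstPart = Math.Min(count, length - startPos);
        Array.Copy(ring, startPos, segment, 0, firstPart);

        // Part 2: the wrapped-around samples at the start of the buffer (may be empty).
        Array.Copy(ring, 0, segment, firstPart, count - firstPart);
        return segment;
    }
}
```

In Unity, `ring` could be filled with `clip.GetData(ring, 0)` using an array of `clip.samples * clip.channels` floats; the returned segment is what would then be encoded and sent to the server.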

Solutions don't need to be code; a general pointer on where to look would be enough.

Solution

You can assign the recording AudioClip to an AudioSource and play it:

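audioSource.clip = Microphone.Start(_device, true, 60, 16000);
audioSource.loop = true; // keep playing while the 60 s recording buffer wraps around
// Wait until the microphone has delivered its first samples (blocks the main thread briefly).
while (!(Microphone.GetPosition(_device) > 0)) { }
audioSource.Play();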

While it is playing, you can read the spectrum data from the AudioSource. When the user is speaking, the spectrum data shows more peaks, so you can check the average of the spectrum data to determine whether someone is speaking. Set some minimum level, because the recording will probably contain some noise: if the average of the spectrum data is above that level, someone is speaking; if it is below, the user has stopped speaking.

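// Needs `using System.Linq;` at the top of the script for Average().
public AudioSource audioSource;           // the AudioSource playing the microphone clip
float[] clipSampleData = new float[1024]; // length must be a power of two (64 to 8192)
float minimumLevel = 0.001f;              // placeholder noise threshold; calibrate it against silence
bool isSpeaking = false;

void Update()
{
    // Spectrum of the audio the AudioSource is currently playing (channel 0).
    audioSource.GetSpectrumData(clipSampleData, 0, FFTWindow.Rectangular);
    float currentAverageVolume = clipSampleData.Average();

    if (currentAverageVolume > minimumLevel)
    {
        isSpeaking = true;
    }
    else if (isSpeaking)
    {
        isSpeaking = false;
        // Volume is below the level, but the user was speaking before,
        // so the user has stopped speaking.
    }
}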

Put this check in the Update method: the spectrum data then reflects the audio played during the last frame, so detection is close to real time.
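The Update() check reports a stop on the very first quiet frame, so a short pause between words already counts as "stopped speaking", which conflicts with the question's requirement not to cut off a user who is still talking. A minimal variation, not part of the original answer: keep the same threshold test, but report a stop only after the level has stayed below `minimumLevel` for an assumed hold time. `EndOfSpeechDetector`, `ProcessFrame`, and `holdSeconds` are hypothetical names; in Unity you would feed it `clipSampleData.Average()` and `Time.deltaTime` once per frame.

```csharp
using System;

// Hypothetical helper (not from the original answer or from Unity): the same
// threshold test as the Update() check, plus an assumed hold time so that short
// pauses between words are not reported as the end of speech.
public sealed class EndOfSpeechDetector
{
    private readonly float _minimumLevel;
    private readonly float _holdSeconds;
    private float _silentFor; // seconds spent below the threshold since the last speech frame

    public bool IsSpeaking { get; private set; }

    public EndOfSpeechDetector(float minimumLevel, float holdSeconds)
    {
        if (holdSeconds < 0f) throw new ArgumentOutOfRangeException(nameof(holdSeconds));
        _minimumLevel = minimumLevel;
        _holdSeconds = holdSeconds;
    }

    // Call once per frame with the frame's average level and its duration in seconds.
    // Returns true only on the frame where the user is judged to have stopped speaking.
    public bool ProcessFrame(float averageLevel, float deltaTime)
    {
        if (averageLevel > _minimumLevel)
        {
            IsSpeaking = true;
            _silentFor = 0f;
            return false;
        }

        if (!IsSpeaking) return false; // silence before any speech: nothing to report

        _silentFor += deltaTime;
        if (_silentFor < _holdSeconds) return false; // pause is still too short

        IsSpeaking = false;
        _silentFor = 0f;
        return true;
    }
}
```

With `holdSeconds` set to 0 it behaves like the original check; larger values report the stop a little later in exchange for fewer false stops during pauses, so the value is something to tune for your users.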

You can determine the minimum level by recording something silent, either just before the user needs to speak or as a separate setup step.
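One way to turn that silent recording into a threshold, sketched under assumptions: collect the per-frame averages (the `currentAverageVolume` values) while the user is silent, take the loudest one, and multiply it by a safety margin. `NoiseCalibration`, `MinimumLevelFromSilence`, and the default margin of 1.5 are hypothetical choices, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: derives minimumLevel from per-frame average levels
// measured while the user is silent. The margin is an assumed tuning factor.
public static class NoiseCalibration
{
    public static float MinimumLevelFromSilence(IReadOnlyList<float> silentFrameAverages, float margin = 1.5f)
    {
        if (silentFrameAverages == null || silentFrameAverages.Count == 0)
            throw new ArgumentException("Record at least one silent frame.", nameof(silentFrameAverages));
        if (margin < 1f)
            throw new ArgumentOutOfRangeException(nameof(margin), "The margin should not lower the threshold.");

        // Start from the loudest silent frame so ordinary background noise stays below the
        // threshold, then add headroom for noise that was not captured during calibration.
        return silentFrameAverages.Max() * margin;
    }
}
```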

With this solution, the user hears their own voice. To prevent that, route the AudioSource's output to an AudioMixer group and set that group's volume to -80. The spectrum data is still available, but no sound reaches the user. Setting the volume of the AudioSource itself to 0 gives all-zero spectrum data, so use the AudioMixer instead.
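The -80 is a decibel value on the mixer fader, not a linear volume. Converting it with the standard amplitude formula shows how strong the attenuation is:

```latex
g = 10^{L_{\text{dB}}/20}, \qquad g(-80\,\text{dB}) = 10^{-80/20} = 10^{-4} = 0.0001
```

So the mixer group passes about one ten-thousandth of the signal amplitude, which is effectively silent, while the AudioSource's own volume is left unchanged.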
