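The basic idea is to transform the time-domain signal into the frequency domain, process it there, and then transform it back into a time-domain signal. For the algorithm itself, see:
Basic spectral subtraction denoising (speex spectral subtraction noise reduction, C++), CSDN blog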
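As a rough illustration of that round trip, the sketch below pushes a single frame through it using only NumPy: FFT, subtract a noise magnitude estimate, reuse the noisy phase, inverse FFT. The function name `spectral_subtract_frame`, the 8 kHz test tone, and the noise level are assumptions made up for this example; they are not taken from the referenced post.

```python
import numpy as np


def spectral_subtract_frame(frame, noise_mag):
    """Subtract a noise magnitude spectrum from one frame; return time-domain samples."""
    spectrum = np.fft.rfft(frame)                 # time domain -> frequency domain
    mag = np.abs(spectrum)                        # magnitude of each frequency bin
    phase = np.angle(spectrum)                    # phase of each frequency bin
    clean_mag = np.maximum(mag - noise_mag, 0.0)  # subtract the estimate, clamp at zero
    clean_spectrum = clean_mag * np.exp(1j * phase)  # reuse the noisy phase
    return np.fft.irfft(clean_spectrum, n=len(frame))  # back to the time domain


# Hypothetical test signal: a 1 kHz tone at 8 kHz plus white noise
rng = np.random.default_rng(0)
n = 512
t = np.arange(n) / 8000.0
tone = 0.5 * np.sin(2 * np.pi * 1000.0 * t)
noisy = tone + 0.05 * rng.standard_normal(n)

# Noise estimate taken from a separate noise-only frame
noise_only = 0.05 * rng.standard_normal(n)
noise_mag = np.abs(np.fft.rfft(noise_only))

cleaned = spectral_subtract_frame(noisy, noise_mag)
```

Because every bin magnitude can only shrink, the cleaned frame never carries more energy than the noisy one. How much of the noise actually disappears depends on how well the estimate matches the real noise.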
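Update, May 10, 2020: added C# code that uses Speex for noise reduction; see the end of the article.
Denoising speech signals in C# is fairly hard. From what I found, Webrtc or speex can do the job, but in both cases the core idea is to build the C++ code into a DLL that C# then calls. I am not very familiar with C++, and after a lot of fiddling I never got it to work. If you want to look into it, the following articles may help: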
Webrtc: https://www.cnblogs.com/mod109/p/5767867.html
https://www.cnblogs.com/Hard/p/csharp-use-webrtc-noisesuppression.html
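speex: https://www.cnblogs.com/mod109/p/5744468.html
Introduction to the speech enhancement (denoising) algorithm in open-source speex, CSDN blog
https://www.cnblogs.com/zhuweisky/archive/2010/09/16/1827896.html (this one is posted by an account belonging to 傲瑞科技)
speex source code: http://www.speex.org
http://zxy15914507674.gitee.io/shared_resource_name/speex-1.2beta3-win32.zip (I modified this copy of the source, and it has some problems)
The link above has been disabled by Gitee (码云). Download the file directly from my repository, 张祥裕/分享的资源名称: find
speex-1.2beta3-win32.zip and download it.
Among domestic companies, 傲瑞科技 offers multi-party voice chat that can be extended in C#, but it costs money. They also promised to provide source code and delivered nothing of the kind: all the core parts are packaged as DLLs. The articles online are just advertising for the company's products.
In the end I considered implementing the denoising with Python's librosa module and calling it through WCF and XML-RPC (not implemented in this article).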
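The XML-RPC route was not implemented in this article. As a minimal sketch of what the Python side might look like, the block below uses only the standard library's `xmlrpc` modules plus NumPy. The method name `denoise_pcm16`, the loopback address, and the pass-through "denoiser" are placeholders invented for illustration. A C# client would need its own XML-RPC library, which is not shown.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

import numpy as np


def denoise_pcm16(payload):
    """Hypothetical RPC method: takes 16-bit PCM bytes, returns processed bytes."""
    # With default settings, binary arguments arrive as xmlrpc.client.Binary
    samples = np.frombuffer(payload.data, dtype="<i2")
    processed = samples  # placeholder: a real denoiser would transform the samples here
    return xmlrpc.client.Binary(processed.astype("<i2").tobytes())


def start_server(host="127.0.0.1", port=0):
    """Start the server on a background thread; port 0 lets the OS pick a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(denoise_pcm16, "denoise_pcm16")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_denoise(port, pcm_bytes):
    """Client side: send PCM bytes and return the bytes that come back."""
    proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}")
    return proxy.denoise_pcm16(xmlrpc.client.Binary(pcm_bytes)).data


if __name__ == "__main__":
    server = start_server()
    port = server.server_address[1]
    pcm = np.array([0, 1000, -1000, 32767], dtype="<i2").tobytes()
    print(len(call_denoise(port, pcm)), "bytes returned")
    server.shutdown()
    server.server_close()
```

XML-RPC encodes binary payloads as base64, so each audio chunk grows by roughly a third on the wire. That overhead is worth measuring before relying on this approach for real-time chat.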
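Most of this article is adapted from: Spectral subtraction speech denoising (using the first three frames as the noise estimate), CSDN blog
Test environment:
Windows Server 2012
Anaconda
Steps:
The test file used in the code below can be downloaded here: http://zxy15914507674.gitee.io/shared_resource_name/librosa资源文件.rar
The link above has been disabled by Gitee (码云). Download the file directly from my repository, 张祥裕/分享的资源名称: find
librosa资源文件.rar and download it.
1 Install the librosa module; see: Audio processing library: installing and using librosa, CSDN blog
Since I use Anaconda, I installed it with the following command:
conda install -c conda-forge librosa
2 If a NoBackendError is raised, ffmpeg also has to be installed. Run the following command:
conda install ffmpeg -c conda-forge
3 Enter the following code: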
import numpy as np
import librosa
from scipy.io import wavfile


class SpecSub(object):
    def __init__(self, input_wav):
        self.data, self.fs = librosa.load(input_wav, sr=None, mono=True)
        self.noise_frame = 3  # use the first three frames as the noise estimate
        self.frame_duration = 200 / 1000  # frame length: 200 ms
        # np.int was removed in NumPy 1.24; the built-in int does the same job
        self.frame_length = int(self.fs * self.frame_duration)
        self.fft = 2048  # 2048-point FFT

    def main(self):
        noise_data = self.get_noise_data()
        oris = librosa.stft(self.data, n_fft=self.fft)  # short-time Fourier transform
        mag = np.abs(oris)  # magnitude
        angle = np.angle(oris)  # phase
        ns = librosa.stft(noise_data, n_fft=self.fft)
        mag_noise = np.abs(ns)
        mns = np.mean(mag_noise, axis=1)  # mean noise magnitude per frequency bin
        sa = mag - mns.reshape((mns.shape[0], 1))  # reshape so the subtraction broadcasts
        sa = np.maximum(sa, 0)  # a magnitude cannot be negative
        sa0 = sa * np.exp(1.0j * angle)  # reapply the phase
        y = librosa.istft(sa0)  # back to a time-domain signal
        # Clip before converting so loud samples do not wrap around in int16
        pcm = np.clip(y * 32768, -32768, 32767).astype(np.int16)
        wavfile.write('./output.wav', self.fs, pcm)  # save as signed 16-bit WAV

    def get_noise_data(self):
        # Keep the first noise_frame frames as one contiguous segment.
        # Averaging them sample by sample would shrink the noise amplitude
        # and underestimate the noise spectrum.
        return self.data[0:self.noise_frame * self.frame_length]


ss = SpecSub('./test.wav')
ss.main()
print('done')
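The output sounds reasonably good, but an audio file of under 1 MB grew to more than 3 MB after denoising, which is clearly unacceptable for real-time voice chat. (The script always writes uncompressed 16-bit PCM, so an input stored in a more compact encoding would grow.) In addition, the module reads the audio to process from a file rather than from a byte stream. That means the audio data sent over from C# (as a byte array) would have to be restored to an audio file before Python could process it, which clearly won't do. If you know a better approach, I would appreciate your advice.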
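On the byte-stream question: `librosa.stft` operates on a NumPy array of samples, so raw PCM bytes can be decoded in memory without writing a file first. A minimal sketch, assuming the bytes are headerless 16-bit little-endian mono PCM (the two helper names are hypothetical):

```python
import numpy as np


def pcm16_bytes_to_float(pcm_bytes):
    """Decode little-endian 16-bit PCM bytes into float32 samples in [-1, 1)."""
    samples = np.frombuffer(pcm_bytes, dtype="<i2")
    return samples.astype(np.float32) / 32768.0


def float_to_pcm16_bytes(samples):
    """Encode float samples as little-endian 16-bit PCM bytes, clipping overflow."""
    scaled = np.clip(np.round(samples * 32768.0), -32768, 32767)
    return scaled.astype("<i2").tobytes()


# Round trip on a few hand-picked sample values
raw = np.array([0, 1, -1, 12345, 32767, -32768], dtype="<i2").tobytes()
decoded = pcm16_bytes_to_float(raw)
encoded = float_to_pcm16_bytes(decoded)
```

The decoded array has the same form as the first value returned by `librosa.load(..., mono=True)`, so it could stand in for `self.data` in the class above. The sample rate would then have to be supplied by the sender, because raw PCM carries no header.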
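Update, May 10, 2020:
Wrapping the C++ implementation of Speex for use from C#
Demo source download: http://zxy15914507674.gitee.io/shared_resource_name/speexdsp-1.2rc3.rar
The link above has been disabled by Gitee (码云). Download the file directly from my repository, 张祥裕/分享的资源名称: find
speexdsp-1.2rc3.rar and download it.
Demo source directory structure:
For the details of the wrapping, see my blog post: Three ways to convert C++ for use from C# (with pointer conversion and demo source), CSDN blog
Note: the code below only works with audio in standard wav format. A file that is really an mp3 renamed with a .wav extension will not work, even though it still plays, because its file header is different.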
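One way to tell a standard wav file from a renamed mp3 is to inspect the first 44 bytes before processing. The sketch below is an assumption-laden example, not part of the demo: it expects the canonical layout (RIFF, WAVE, a 16-byte `fmt ` chunk with PCM format code 1, and the `data` chunk starting at byte 36), and the function name `parse_wav_header` is made up here.

```python
import struct

# Layout of the canonical 44-byte PCM WAV header (little-endian, no padding)
WAV_HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"


def parse_wav_header(header):
    """Return basic format info from a canonical 44-byte PCM WAV header.

    Raises ValueError when the bytes do not look like such a header,
    for example when an mp3 file was merely renamed to .wav.
    """
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    (riff, _riff_size, wave, fmt_id, fmt_size, audio_format, channels,
     sample_rate, _byte_rate, _block_align, bits_per_sample,
     data_id, data_size) = struct.unpack(WAV_HEADER_FORMAT, header[:44])
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    if fmt_id != b"fmt " or fmt_size != 16 or audio_format != 1:
        raise ValueError("not plain PCM with a 16-byte fmt chunk")
    if data_id != b"data":
        raise ValueError("data chunk does not start at byte 36")
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits_per_sample,
        "data_size": data_size,
    }


# Hand-built header for 48 kHz, mono, 16-bit PCM with an empty data chunk
demo_header = struct.pack(WAV_HEADER_FORMAT, b"RIFF", 36, b"WAVE", b"fmt ",
                          16, 1, 1, 48000, 96000, 2, 16, b"data", 0)
info = parse_wav_header(demo_header)
```

Some valid wav files place extra chunks (such as `LIST`) before `data`, which makes the header longer than 44 bytes. This check rejects those as well, which matches the fixed 44-byte assumption in the C++ code below.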
The core C++ code is as follows:
// SpeexWinProj.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif
#include "speex/speex_jitter.h"
#include "speex/speex_echo.h"
#include "speex/speex_preprocess.h"
#include "speex/speex_resampler.h"
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#define HEADLEN 44
#define SAMPLE_RATE (48000)
#define SAMPLES_PER_FRAME (1024)
#define FRAME_SIZE (SAMPLES_PER_FRAME * 1000/ SAMPLE_RATE)
#define FRAME_BYTES (SAMPLES_PER_FRAME)
union jbpdata {
    unsigned int idx;
    unsigned char data[4];
};

void synthIn(JitterBufferPacket *in, int idx, int span) {
    union jbpdata d;
    d.idx = idx;
    in->data = (char*)d.data;
    in->len = sizeof(d);
    in->timestamp = idx * 10;
    in->span = span * 10;
    in->sequence = idx;
    in->user_data = 0;
}

void jitterFill(JitterBuffer *jb) {
    char buffer[65536];
    JitterBufferPacket in, out;
    int i;
    out.data = buffer;
    jitter_buffer_reset(jb);
    for(i=0;i<100;++i) {
        synthIn(&in, i, 1);
        jitter_buffer_put(jb, &in);
        out.len = 65536;
        if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
            printf("Fill test failed iteration %d\n", i);
        }
        if (out.timestamp != i * 10) {
            printf("Fill test expected %d got %d\n", i*10, out.timestamp);
        }
        jitter_buffer_tick(jb);
    }
}

void TestJitter()
{
    char buffer[65536];
    JitterBufferPacket in, out;
    int i;
    JitterBuffer *jb = jitter_buffer_init(10);
    out.data = buffer;
    /* Frozen sender case */
    jitterFill(jb);
    for(i=0;i<100;++i) {
        out.len = 65536;
        jitter_buffer_get(jb, &out, 10, NULL);
        jitter_buffer_tick(jb);
    }
    synthIn(&in, 100, 1);
    jitter_buffer_put(jb, &in);
    out.len = 65536;
    if (jitter_buffer_get(jb, &out, 10, NULL) != JITTER_BUFFER_OK) {
        printf("Failed frozen sender resynchronize\n");
    } else {
        printf("Frozen sender: Jitter %d\n", out.timestamp - 100*10);
    }
    return ;
}
/// Noise reduction. The first parameter is the name of the file to denoise,
/// the second is the name of the file that receives the denoised output.
void TestNoise(char *pSrcFile, char *pDenoiseFile)
{
    size_t n = 0;
    FILE *inFile = NULL, *outFile = NULL;
    // Bail out if either file cannot be opened
    if (fopen_s(&inFile, pSrcFile, "rb") != 0)
        return;
    if (fopen_s(&outFile, pDenoiseFile, "wb") != 0)
    {
        fclose(inFile);
        return;
    }
    char *headBuf = (char*)malloc(HEADLEN);
    // One frame holds SAMPLES_PER_FRAME 16-bit samples, not SAMPLES_PER_FRAME bytes
    spx_int16_t *dataBuf = (spx_int16_t*)malloc(SAMPLES_PER_FRAME * sizeof(spx_int16_t));
    // Check the allocations before touching the buffers
    assert(headBuf != NULL);
    assert(dataBuf != NULL);
    memset(headBuf, 0, HEADLEN);

    // The frame size passed here is a number of samples
    SpeexPreprocessState *state = speex_preprocess_state_init(SAMPLES_PER_FRAME, SAMPLE_RATE);
    int denoise = 1;
    int noiseSuppress = -25;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);
    int i;
    i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);
    i = 80000;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &i);
    i = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);
    float f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
    f = 0;
    speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);

    // Copy the first 44 bytes straight to the output file. They are the WAV
    // header and must not be denoised, or the output file will not open.
    n = fread(headBuf, 1, HEADLEN, inFile);
    if (n == HEADLEN)
    {
        fwrite(headBuf, 1, HEADLEN, outFile);
        while (1)
        {
            // Zero the buffer so a short final frame is padded with silence
            memset(dataBuf, 0, SAMPLES_PER_FRAME * sizeof(spx_int16_t));
            // Read one frame of up to SAMPLES_PER_FRAME samples
            n = fread(dataBuf, sizeof(spx_int16_t), SAMPLES_PER_FRAME, inFile);
            if (n == 0)
                break;
            // Denoise the frame in place
            speex_preprocess_run(state, dataBuf);
            // Write back only as many samples as were read
            fwrite(dataBuf, sizeof(spx_int16_t), n, outFile);
        }
    }
    free(headBuf);
    free(dataBuf);
    fclose(inFile);
    fclose(outFile);
    speex_preprocess_state_destroy(state);
}
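// The original plan was to take a byte array in and hand a byte array back, so the audio could be sent over the network. I am not familiar enough with C++, and this attempt failed.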
void TestNoise_Buffer(char *input,char *output,int fileSize)
{
// SpeexPreprocessState *state = speex_preprocess_state_init(1024, SAMPLE_RATE);
// int denoise = 1;
// int noiseSuppress = -25;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DENOISE, &denoise);
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_NOISE_SUPPRESS, &noiseSuppress);
//
// int i;
// i = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC, &i);
// i = 80000;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_AGC_LEVEL, &i);
// i = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB, &i);
// float f = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_DECAY, &f);
// f = 0;
// speex_preprocess_ctl(state, SPEEX_PREPROCESS_SET_DEREVERB_LEVEL, &f);
//
// char *dataBuf = (char*)malloc(FRAME_BYTES * 2 );
// memset(dataBuf, 0, FRAME_BYTES);
//
// int numCount=0;
//
//
// while (numCount<fileSize-1024)
// {
// for(int i=0;i<1024;i++)
// {
// *(dataBuf++)=*(input++);
// }
// speex_preprocess_run(state,(spx_int16_t*)dataBuf);
// for(int i=0;i<1024;i++)
// {
// *(output++)=*(dataBuf++);
// }
// numCount=numCount+1024;
//
// }
//
// free(dataBuf);
//
// speex_preprocess_state_destroy(state);
//
//
}
void _TestEcho(char *pSrcFile,char *pEchoFile,char *pAudioFile)
{
#define NN_ECHO 128
#define TAIL 1024
    FILE *echo_fd, *ref_fd, *e_fd;
    short echo_buf[NN_ECHO], ref_buf[NN_ECHO], e_buf[NN_ECHO];
    SpeexEchoState *st;
    SpeexPreprocessState *den;
    int sampleRate = 8000;
    echo_fd = fopen(pSrcFile, "rb");
    ref_fd = fopen(pEchoFile, "rb");
    e_fd = fopen(pAudioFile, "wb");
    st = speex_echo_state_init(NN_ECHO, TAIL);
    den = speex_preprocess_state_init(NN_ECHO, sampleRate);
    speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
    speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
    while (!feof(ref_fd) && !feof(echo_fd))
    {
        fread(ref_buf, sizeof(short), NN_ECHO, ref_fd);
        fread(echo_buf, sizeof(short), NN_ECHO, echo_fd);
        speex_echo_cancellation(st, ref_buf, echo_buf, e_buf);
        speex_preprocess_run(den, e_buf);
        fwrite(e_buf, sizeof(short), NN_ECHO, e_fd);
    }
    speex_echo_state_destroy(st);
    speex_preprocess_state_destroy(den);
    fclose(e_fd);
    fclose(echo_fd);
    fclose(ref_fd);
}
int TestResampler()
{
#define NNTR 256
    spx_uint32_t i;
    short *in;
    short *out;
    float *fin, *fout;
    int count = 0;
    SpeexResamplerState *st = speex_resampler_init(1, 8000, 12000, 10, NULL);
    speex_resampler_set_rate(st, 96000, 44100);
    speex_resampler_skip_zeros(st);
    in = (short*)malloc(NNTR*sizeof(short));
    out = (short*)malloc(2*NNTR*sizeof(short));
    fin = (float*)malloc(NNTR*sizeof(float));
    fout = (float*)malloc(2*NNTR*sizeof(float));
    while (1)
    {
        spx_uint32_t in_len;
        spx_uint32_t out_len;
        fread(in, sizeof(short), NNTR, stdin);
        if (feof(stdin))
            break;
        for (i=0;i<NNTR;i++)
            fin[i]=in[i];
        in_len = NNTR;
        out_len = 2*NNTR;
        /*if (count==2)
            speex_resampler_set_quality(st, 10);*/
        speex_resampler_process_float(st, 0, fin, &in_len, fout, &out_len);
        for (i=0;i<out_len;i++)
            out[i]=floor(.5+fout[i]);
        /*speex_warning_int("writing", out_len);*/
        fwrite(out, sizeof(short), out_len, stdout);
        count++;
    }
    speex_resampler_destroy(st);
    free(in);
    free(out);
    free(fin);
    free(fout);
    return 0;
}
Of these, void TestNoise(char *pSrcFile,char *pDenoiseFile) is the noise-reduction function and the one that needs to be wrapped. I have not looked into the other functions.
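One detail of TestNoise worth spelling out: the first argument of speex_preprocess_state_init is a frame size in samples, and speex_preprocess_run consumes that many 16-bit samples per call. For mono 16-bit PCM, a 1024-sample frame therefore spans 2048 bytes, not 1024. The sketch below works through that arithmetic and shows one way to cut a byte stream into whole frames. The helper names and the zero-padding of the last frame are choices made for this example only.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def frame_duration_ms(samples_per_frame, sample_rate):
    """Duration of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate


def iter_pcm16_frames(pcm_bytes, samples_per_frame=1024):
    """Yield fixed-size frames of 16-bit PCM, zero-padding a short final frame."""
    frame_bytes = samples_per_frame * BYTES_PER_SAMPLE
    for start in range(0, len(pcm_bytes), frame_bytes):
        chunk = pcm_bytes[start:start + frame_bytes]
        if len(chunk) < frame_bytes:
            # Pad with silence so every frame has the size the preprocessor expects
            chunk = chunk + bytes(frame_bytes - len(chunk))
        yield chunk


# 5000 bytes of silence split into 1024-sample frames
frames = list(iter_pcm16_frames(bytes(5000), samples_per_frame=1024))
duration = frame_duration_ms(1024, 48000)
```

At 48000 Hz a 1024-sample frame lasts a little over 21 ms.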
C# code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;

namespace CSharpCall
{
    class Program
    {
        // Load a DLL by name and return its handle
        [DllImport("kernel32")]
        public static extern IntPtr LoadLibrary(string lpFileName);
        // Release a DLL through its handle
        [DllImport("Kernel32")]
        public static extern bool FreeLibrary(IntPtr handle);
        // Look up an exported function by name and return a pointer to it
        [DllImport("Kernel32")]
        public static extern IntPtr GetProcAddress(IntPtr handle, String funcname);

        [UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        unsafe public delegate void TestNoise_delegate(char* pSrcFile, char* pDenoiseFile);
        //[UnmanagedFunctionPointerAttribute(CallingConvention.Cdecl)]
        // unsafe public delegate void TestNoise_Buffer_delegate(char *input,char *ouput,int fileSize);

        unsafe static void Main(string[] args)
        {
            // Load the DLL built from the C++ project
            IntPtr dll = LoadLibrary("SpeexWinProj.dll");
            if (dll == IntPtr.Zero)
            {
                Console.WriteLine("Failed to load SpeexWinProj.dll");
                return;
            }
            IntPtr TestNoise_func = GetProcAddress(dll, "TestNoise");
            if (TestNoise_func == IntPtr.Zero)
            {
                Console.WriteLine("TestNoise is not exported by the DLL");
                FreeLibrary(dll);
                return;
            }
            // Get a delegate instance for the native function TestNoise_func
            TestNoise_delegate TestNoise = (TestNoise_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_func, typeof(TestNoise_delegate));
            string fileNameInput = "test1.wav";
            string fileNameOutPut = "out.wav";
            // StringToHGlobalAnsi allocates unmanaged memory that must be freed with FreeHGlobal
            IntPtr inputPtr = Marshal.StringToHGlobalAnsi(fileNameInput);
            IntPtr outputPtr = Marshal.StringToHGlobalAnsi(fileNameOutPut);
            try
            {
                TestNoise((char*)inputPtr.ToPointer(), (char*)outputPtr.ToPointer());
                Console.WriteLine("Conversion finished");
            }
            finally
            {
                Marshal.FreeHGlobal(inputPtr);
                Marshal.FreeHGlobal(outputPtr);
            }

            // Abandoned attempt at the byte-array version, kept for reference:
            //IntPtr TestNoise_Buffer_func = GetProcAddress(dll, "TestNoise_Buffer");
            // Get a delegate instance for the native function TestNoise_Buffer_func
            //TestNoise_Buffer_delegate TestNoise_Buffer = (TestNoise_Buffer_delegate)Marshal.GetDelegateForFunctionPointer(TestNoise_Buffer_func, typeof(TestNoise_Buffer_delegate));
            //FileStream fs = new FileStream("test1.wav", FileMode.Open);
            //byte []fileBuffer=new byte[fs.Length];
            //fs.Read(fileBuffer, 0, fileBuffer.Length);
            //fs.Close();
            //int sampleRate = 1024;
            //byte[] outbuffer = new byte[fileBuffer.Length];
            //for (int i = 0; i < 44; i++)
            //{
            //    outbuffer[i]=fileBuffer[i];
            //}
            //byte[] input = new byte[fileBuffer.Length - 44];
            //byte[] output = new byte[fileBuffer.Length];
            //for (int i = 0; i <input.Length; i++)
            //{
            //    input[i] = fileBuffer[i+44];
            //}
            //char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();
            //char* outPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(output, 0).ToPointer();
            //TestNoise_Buffer(inPut, outPut, input.Length);
            // [MarshalAs(UnmanagedType.LPArray)] byte[]
            //for (int i = 0; i < output.Length; i++)
            //{
            //    fileBuffer[i + 44] = Convert.ToByte(*(outPut++)) < 255 ? Convert.ToByte(*(outPut++)) : Convert.ToByte(254);
            //}
            //FileStream fw = new FileStream("out1.wav", FileMode.Create);
            //fw.Write(outbuffer, 0, outbuffer.Length);
            //fw.Close();
            //Console.WriteLine("Conversion finished");

            // Release the DLL once the native calls are done
            FreeLibrary(dll);
            Console.ReadKey();
        }
    }
}
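To process audio byte arrays with Speex, see: Wrapping Speex written in C++ from C# to denoise wav audio (byte arrays), CSDN blog
The core data-type conversions are:
1 Convert a string to a pointer
string fileNameInput = "test1.wav";
// The memory returned by StringToHGlobalAnsi is unmanaged; release it with Marshal.FreeHGlobal when done
char* fileName_Iniput = (char*)System.Runtime.InteropServices.Marshal.StringToHGlobalAnsi(fileNameInput).ToPointer();
2 Convert a byte array to a pointer
byte[] input = new byte[1024];
// UnsafeAddrOfPinnedArrayElement requires an array that has already been pinned
GCHandle handle = GCHandle.Alloc(input, GCHandleType.Pinned);
char* inPut = (char*)System.Runtime.InteropServices.Marshal.UnsafeAddrOfPinnedArrayElement(input,0).ToPointer();
// ... pass inPut to the native function, then release the pin
handle.Free();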
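For readers taking the Python route mentioned earlier, the same two conversions can be sketched with the standard `ctypes` module. This is only an analogy to the C# snippets above, not something the demo uses. No DLL is loaded here, and the variable names are made up for this example.

```python
import ctypes

# 1. String -> NUL-terminated char buffer (what a C function receives as char*)
file_name = "test1.wav"
name_buf = ctypes.create_string_buffer(file_name.encode("ascii"))

# 2. Byte array -> C array that shares memory with the Python bytearray (no copy)
data = bytearray(1024)
c_array = (ctypes.c_char * len(data)).from_buffer(data)
address = ctypes.addressof(c_array)  # raw address a native function would be given

# A write through the C view is visible in the original bytearray
c_array[0] = b"\x7f"
```

Because `from_buffer` shares memory instead of copying, anything a native function wrote into `c_array` would appear directly in `data`.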
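That's all for this article. If it helped you, a tip of 0.2 yuan would be a nice encouragement. I'm broke, so think of it as chipping in for my internet bill.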