[Speech Processing] Harmonic Vector Analysis (HVA) for Audio Blind Source Separation (Matlab Implementation)

👨‍🎓Author homepage: 研学社's blog

📋Contents of this post:

💥1 Overview

📚2 Results

🎉3 References

🌈4 Matlab Code and Paper


💥1 Overview

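The paper [1] proposes harmonic vector analysis (HVA), built on a general algorithmic framework for audio blind source separation (BSS) that the same paper introduces. BSS of a convolutive audio mixture is usually performed by multichannel linear filtering when the numbers of microphones and sources are equal (the determined case). The paper addresses this determined BSS in a batch-processing setting. Estimating the demixing filters requires an effective model of the source signals. One successful example is independent vector analysis (IVA), which models the signals through the co-occurrence of frequency components within each source. To allow more freedom in source modeling, the paper presents a general framework for determined BSS. It is based on a plug-and-play scheme that uses a primal-dual splitting algorithm, so the source signals can be modeled implicitly through a time-frequency mask. Within this framework, a determined BSS algorithm can be developed simply by designing a mask that enhances the source signals. As one application, the authors propose HVA by defining a time-frequency mask that enhances the harmonic structure of audio signals through the sparsity of the cepstrum. The experiments reported in the paper show that HVA outperforms IVA and independent low-rank matrix analysis (ILRMA) on both speech and music signals.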
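To make the plug-and-play idea concrete, the iteration can be written out as it appears in the Matlab listing in Section 2. The notation below is an assumption chosen to mirror the variable names in that listing (w, y, z, mu1, mu2, alpha, and the mask M); it is a reading of the code, not a quotation of the paper's equations.

```latex
% One iteration of the primal-dual splitting loop, as read from the HVA listing.
% X applies the demixing filters w to the (whitened, normalized) spectrogram,
% X^H is its Hermitian transpose, and M(z) is the time-frequency mask.
\begin{align*}
\tilde{w} &= \operatorname{prox}_{\mu_1 \mathcal{I}}\!\left[\, w^{[k]} - \mu_1 \mu_2\, X^{\mathsf{H}} y^{[k]} \,\right], \\
z &= y^{[k]} + X\!\left( 2\tilde{w} - w^{[k]} \right), \\
\tilde{y} &= \bigl( 1 - \mathcal{M}(z) \bigr) \odot z, \\
y^{[k+1]} &= \alpha\, \tilde{y} + (1-\alpha)\, y^{[k]}, \\
w^{[k+1]} &= \alpha\, \tilde{w} + (1-\alpha)\, w^{[k]},
\end{align*}
% with \mathcal{I}(w) = -\sum_f \log\lvert \det W[f] \rvert. Its proximity operator
% acts on the singular values \sigma of each W[f] (function proxLogDet):
\[
\sigma \;\mapsto\; \frac{\sigma + \sqrt{\sigma^{2} + 4\mu_1}}{2}.
\]
```

The third line is where the mask enters: in a standard primal-dual scheme this step would contain the proximity operator of a source-model penalty, and here it is replaced by element-wise multiplication with the mask.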

See Section 4 for the full paper.

📚2 Results

Partial code:

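function [result,w] = HVA(signal,param)
% HVA  Determined BSS by harmonic vector analysis.
%   signal : time-domain mixture, samples x channels
%   param  : settings struct (STFT sizes, iterNum, sourceNum, lambda,
%            kappa, mu1, mu2, alpha, refMicIndex)
%   result : separated time-domain signals
%   w      : demixing matrices for the whitened, normalized spectrogram

[spec,param] = STFT(signal,param);  % channels x frames x frequencies
param.signalSpec = spec;            % keep the unwhitened spectrogram for back-projection
spec = whitening(spec);

[w,y,X,Xt,M,mu1,mu2,alpha,param] = initialization(spec,param);
for k = 1:param.iterNum             % primal-dual splitting iterations
    wOld = w;
    yOld = y;
    w = proxLogDet(w - mu1*mu2*Xt(y), mu1); % primal step: prox of -log|det|
    z = y + X(2*w - wOld);                  % dual argument using the extrapolated w
    y = (1 - M(z)) .* z;                    % dual step: the mask stands in for a prox
    y = alpha*y + (1-alpha)*yOld;           % relaxation with parameter alpha
    w = alpha*w + (1-alpha)*wOld;
end

sep = backProjection(X(w),param);   % fix the scale using the reference microphone
result = iSTFT(sep,param);
result(size(signal,1)+1:end,:) = []; % drop samples beyond the input length
end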

%%% Local Functions %%%

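function w = proxLogDet(w,mu)
% Proximity operator of -mu*log|det| applied to each frequency's matrix.
for f = 1:size(w,3)
    [U,S,V] = svd(w(:,:,f));
    s = diag(S);
    s = (s + sqrt(s.^2 + 4*mu))/2;  % update the singular values only
    w(:,:,f) = U*diag(s)*V';
end
end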

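function M = HVAmask(x,lambda,kappa,fftn)
% Time-frequency mask that enhances harmonic structure via cepstrum sparsity.
% x is sources x frames x frequencies.
y = log(abs(x) + 1e-3);             % log-amplitude spectrogram
meanY = mean(y,3);                  % mean over frequency, restored later
y = y - meanY;
z = fft(y,fftn,3)/size(y,3);        % cepstrum-like coefficients along frequency
M = min(1,abs(z)/lambda);           % gate: coefficients below lambda are attenuated
for n = 1:kappa
    M = (1-cos(pi*M))/2;            % sharpen the gate with a cosine curve, kappa times
end
z = M .* z;
z = ifft(z,[],3,'symmetric')*size(y,3);
y = z(:,:,1:size(y,3));             % back to the original number of frequency bins
y = y + meanY;
y = exp(2*y);                       % log-amplitude to power
M = (y ./ sum(y)).^(1/size(y,1));   % power ratio across sources, raised to 1/(number of sources)
end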

function y = filt3D(w,x,y)
for f = 1:size(w,3)
    y(:,:,f) = w(:,:,f) * x(:,:,f);
end
end

function w = filt3Dtranspose(y,x,w)
for m = 1:size(x,1)
    w(:,m,:) = sum(conj(x(m,:,:)).*y,2);
end
end

%%% Initialization %%%

function [w1,y1,X,Xt,mask,mu1,mu2,alpha,param] = initialization(x,param)
w1 = [eye(param.sourceNum) zeros(param.sourceNum,size(x,1)-param.sourceNum)];
w1 = repmat(w1,1,1,size(x,3)); % initial w

y1 = zeros(size(x)); % initial y (=0)

zeroW = zeros(size(w1)); zeroY = zeros(size(w1,1),size(x,2),size(x,3));

x = x/operatorNorm(x,w1,zeroW,zeroY); % normalized X

X  = @(w) filt3D(w,x,zeroY);          % multiplication of X
Xt = @(y) filt3Dtranspose(y,x,zeroW); % multiplication of Hermite transpose of X

mask = @(z) HVAmask(z,2*param.lambda,param.kappa,2^nextpow2(size(x,3)));

mu1 = param.mu1;
mu2 = param.mu2;
alpha = param.alpha;
end

function opNorm = operatorNorm(x,w,zw,zy)
opt.issym = true; opt.isreal = true;
XtX = @(z) reshape(filt3Dtranspose(filt3D(reshape(z,size(w)),x,zy),x,zw),[],1);
opNorm = sqrt(eigs(XtX,numel(w),1,'lm',opt));
end

%%% Audio BSS Tools %%%

function [X,param] = STFT(sig,param)
wLen = param.STFTwindowLength;
skip = param.STFTshiftSize;

x = [zeros(wLen,size(sig,2)); sig; zeros(wLen,size(sig,2))];
win = hann(wLen,'periodic');
param.window = win;

idx = (1:wLen)' + (0:skip:length(x));
x = [x; zeros(max(idx(:))-length(x),size(x,2))];
idx = idx + length(x)*reshape((0:size(x,2)-1),1,1,[]);

X = fft(win.*x(idx));
X = X(1:floor(wLen/2)+1,:,:);
X = permute(X,[3 2 1]);
end

function x = iSTFT(X,param)
win = param.window;
skip = param.STFTshiftSize;

X = ipermute(X,[3 2 1]);
win = calcCanonicalDualWindow(win,skip);
X = win.*ifft([X;zeros(size(X)-[2 0 0])],'symmetric'); [X1,X2,X3] = size(X);

vec = @(x) x(:);
X = sparse( ...
    vec((1:X1)' + skip*(0:X2-1) + (skip*X2+X1-skip)*reshape(0:X3-1,1,1,[])), ...
    vec(repmat(1:X2,X1,1,X3)), X(:) );
x = reshape(full(sum(X,2)),[],X3);
x(1:length(win),:) = [];
end

function dualWin = calcCanonicalDualWindow(win,skip)
dualWin = [win; zeros(lcm(skip,length(win))-length(win),1)];
dualWin = reshape(dualWin,skip,[]);
dualWin = dualWin./sum(abs(dualWin).^2,2);
dualWin = reshape(dualWin(1:length(win)),[],1);
end

function X = whitening(X)
for f = 1:size(X,3)
    [U,S,~] = svd(X(:,:,f),'econ');
    X(:,:,f) = U*((1./diag(S)).*(U'*X(:,:,f)));
end
end

function y = backProjection(y,param)
BP = @(y,s) ((s*y')/(y*y')).'.*y;
for f = 1:size(y,3)
    y(:,:,f) = BP(y(:,:,f),param.signalSpec(param.refMicIndex,:,f));
end
end
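The listing above does not show how HVA is called, so here is a minimal usage sketch. It assumes the listing is saved as HVA.m on the Matlab path and that the Signal Processing Toolbox is available (the listing calls hann). The field names come from the listing itself; every numeric value, the sampling rate, and the toy mixing matrix A are illustrative assumptions rather than settings taken from the paper. The toy mixture is instantaneous, whereas real recordings are convolutive.

```matlab
% Usage sketch for the HVA function listed above (assumes HVA.m is on the path).
% All numeric settings below are illustrative guesses, not the authors' values.
rng(0);                                    % reproducible toy data
fs = 16000;                                % assumed sampling rate [Hz]
t  = (0:2*fs-1)'/fs;                       % two seconds of samples
src = [sin(2*pi*220*t) + 0.5*sin(2*pi*440*t), ...  % harmonic tone
       sign(sin(2*pi*330*t))];                     % square wave
A   = [1 0.6; 0.5 1];                      % hypothetical instantaneous mixing matrix
mix = src*A.' + 1e-3*randn(size(src));     % samples x channels, plus a little noise

param.STFTwindowLength = 1024;             % window length in samples
param.STFTshiftSize    = 512;              % hop size in samples
param.sourceNum   = 2;                     % number of sources (equals channels here)
param.refMicIndex = 1;                     % reference microphone for back-projection
param.iterNum = 100;                       % number of primal-dual iterations
param.lambda  = 0.2;                       % cepstrum gate level (guess)
param.kappa   = 1;                         % number of cosine sharpening passes (guess)
param.mu1 = 1; param.mu2 = 1;              % step sizes (guess)
param.alpha = 1.75;                        % relaxation parameter (guess)

[result,w] = HVA(mix,param);               % result: samples x sources
disp(size(result));
```

The returned result should have one column per source and as many rows as the input. Which column holds which source is arbitrary, as is usual for BSS.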

🎉3 References

Some of the theory is drawn from online sources; please get in touch if anything infringes and it will be removed.

[1] K. Yatabe and D. Kitamura, "Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1609-1625, 2021, doi: 10.1109/TASLP.2021.3073863.

🌈4 Matlab Code and Paper
