[Speech Separation] Unsupervised Single-Channel Music Source Separation Based on Average Harmonic Structure Modeling (Matlab Implementation)

wlz249

Published 2024-05-24 11:29:34


Tags: matlab, programming languages

Article link: https://blog.csdn.net/weixin_66436111/article/details/139170540



Contents

💥1 Overview

📚2 Results

🎉3 References

🌈4 Matlab Implementation

💥1 Overview

Source: Z. Duan, Y. Zhang, C. Zhang, and Z. Shi, "Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling" (full citation in the References section).

Abstract of the original paper:

Source separation of musical signals is an appealing but difficult problem, especially in the single-channel case. In this paper, an unsupervised single-channel music source separation algorithm based on average harmonic structure modeling is proposed. Under the assumption of playing in narrow pitch ranges, different harmonic instrumental sources in a piece of music often have different but stable harmonic structures; thus, sources can be characterized uniquely by harmonic structure models. Given the number of instrumental sources, the proposed algorithm learns these models directly from the mixed signal by clustering the harmonic structures extracted from different frames. The corresponding sources are then extracted from the mixed signal using the models. Experiments on several mixed signals, including synthesized instrumental sources, real instrumental sources, and singing voices, show that this algorithm outperforms the general nonnegative matrix factorization (NMF)-based source separation algorithm, and yields good subjective listening quality. As a side effect, this algorithm estimates the pitches of the harmonic instrumental sources. The number of concurrent sounds in each frame is also computed, which is a difficult task for general multipitch estimation (MPE) algorithms.

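In real music signals, several sound sources, such as a singing voice and instruments, are mixed together. The task of separating the individual sources from a mixed signal is called sound source separation. It interests researchers working on other applications, such as information retrieval, automatic transcription, and structured coding, because well-separated sources can simplify their problem domains.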

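Sound source separation problems can be classified by the number of sources and sensors. The over-determined and determined cases are those in which the number of sensors is, respectively, greater than or equal to the number of sources. In these cases, independent component analysis (ICA) [1]-[3] and some methods that use source statistics [4], [8] can achieve good results. However, they run into difficulties in under-determined cases, where there are fewer sensors than sources. Here, some state-of-the-art methods exploit source sparsity [5], [6] or auditory cues [7]. Single-channel source separation is the extreme case of the under-determined problem. Section II of the paper reviews some methods that address it. (Bracketed citation numbers in this overview follow the reference list of the original paper.)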
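
As a notational aid, the cases above can be written with a linear instantaneous mixing model. This model and its symbols ($M$ sensors, $N$ sources, mixing matrix $\mathbf{A}$) are an assumption made here for illustration and are not taken from the text above:

```latex
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\mathbf{x}(t) \in \mathbb{R}^{M},\quad
\mathbf{s}(t) \in \mathbb{R}^{N},\quad
\mathbf{A} \in \mathbb{R}^{M \times N}

% over-determined:  M > N
% determined:       M = N
% under-determined: M < N
% single-channel:   M = 1, so  x(t) = \sum_{j=1}^{N} a_j \, s_j(t)
```

With $M = 1$ there is a single equation per time sample for $N$ unknowns, so extra structure, such as the harmonic structure models discussed here, is needed to recover the sources.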

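According to the information they use, sound source separation methods can be divided into supervised and unsupervised ones. Supervised methods usually need solo excerpts of the sources to train individual source models [8]-[17], or overall separation model parameters [18], and then use these models to separate the mixed signal. Unsupervised methods [19]-[23] have less information to work with and rely on computational auditory scene analysis (CASA) [24], [25] cues, such as harmonicity and common onset and offset times, to tackle the separation problem. In addition, statistical properties of the source and mixed signals, such as nonnegativity [26], sparseness [4]-[6], or both [27], are also used by some unsupervised methods.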

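In this paper, we deal with the single-channel music source separation problem in an unsupervised fashion. Here each source is a monophonic signal, that is, it contains at most one sound at a time. It is found that, in music signals, the harmonic structure is an approximately invariant feature of a harmonic instrument within a narrow pitch range. The harmonic structures of these instruments are therefore extracted from the spectrum of each frame of the mixed signal. Given the number of instrumental sources, we then learn average harmonic structure (AHS) models, which are the typical harmonic structures of the individual instruments, by clustering the extracted structures. Using these models, the corresponding sources are extracted from the mixed signal. Note that this separation algorithm does not need to know the pitches of the sources. Instead, it yields multipitch estimation (MPE) results as a side effect. The algorithm has been tested on several mixed signals of synthesized and real musical instruments as well as singing voices, and the results are promising. The idea was first presented in [29]. This paper gives different formulations for estimating F0s and extracting harmonic structures, along with more detailed analysis, experiments, and discussion.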
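
The clustering step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the harmonic structures are assumed to be already extracted as the rows of a matrix `H` (a hypothetical name), the data are synthetic and come from two made-up instrument templates, and a plain k-means loop stands in for whatever clustering procedure the paper actually uses.

```matlab
% Minimal sketch: learn average harmonic structures (AHS) by clustering.
% All names and data here are hypothetical and for illustration only.
rng(0);                                   % reproducible synthetic data
numberSources = 2;                        % number of instrument sources (given)
maxHarm = 10;                             % harmonics per structure (feature dimension)
k = 1:maxHarm;

% Two made-up "instrument" templates: harmonic amplitudes in dB
templateA = 60 - 3*(k-1);                 % smoothly decaying harmonics
templateB = 60 - 20*mod(k-1, 2);          % weak even harmonics

% Simulate noisy harmonic structures extracted from 50 frames per instrument
framesPerSource = 50;
H = [repmat(templateA, framesPerSource, 1); ...
     repmat(templateB, framesPerSource, 1)];
H = H + randn(size(H));                   % additive noise, 1 dB standard deviation

% Plain k-means (a stand-in, not necessarily the paper's clustering method)
centers = H([1 end], :);                  % deterministic initialization
for iter = 1:20
    % squared Euclidean distance from every structure to every center
    d = zeros(size(H,1), numberSources);
    for c = 1:numberSources
        diffc = H - repmat(centers(c,:), size(H,1), 1);
        d(:,c) = sum(diffc.^2, 2);
    end
    [~, labels] = min(d, [], 2);          % assign each structure to its nearest center
    % update each center as the mean of its assigned structures
    for c = 1:numberSources
        centers(c,:) = mean(H(labels == c, :), 1);
    end
end

AHS = centers;                            % one average harmonic structure per source
disp(round(AHS));
```

Each row of `AHS` then plays the role of one instrument's model. In this toy setting the rows should land close to `templateA` and `templateB`, because the two templates differ far more than the added noise.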

📚2 Results

Code excerpt:

%% Parameters for peak extraction
peakThreshold = 50; % global amplitude threshold (dB)
peakThreshold_rel = 8; % local amplitude threshold (dB)
peakThreshold_freq_min = 60; % minimum frequency to look for a peak (Hz) 60Hz is approx. the first harmonic of 'C2'
peakThreshold_freq_max = 20000; % maximum frequency to look for a peak (Hz)
%movL = round(0.01*nfft); % moving average width (bins)
movL = 9;
typeSmoothing = 'Normal Moving Average'; % type of smoothing function used. Can either be 'Normal Moving Average' or 'Gaussian Moving Average'
%typeSmoothing = 'Gaussian Moving Average'; % type of smoothing function used. Can either be 'Normal Moving Average' or 'Gaussian Moving Average'
%sigma = (0.1*movL);
sigma = 5;
%% Parameters for F0 estimation
maxf0Num = numberSources; %-- maximum number of F0s per frame, i.e. the highest possible number of concurrent sources (numberSources must be defined earlier in the script)
f0min_midi = note2midinum('C2'); %-- lowest possible frequency of F0 (midi number)
f0max_midi = note2midinum('B7'); %-- highest possible frequency of F0 (midi number)
searchRadius_midi = 0.5; % radius around each peak to search for f0s (midi)
f0step_midi = 0.1; % F0 search step (midi)

%% Parameters for calculating Harmonic Structures
maxHarm = 20; %-- The number of harmonics, i.e. harmonic structure feature dimensionality (20 default used)
Thresh_PeakF0Belong = 0.03; %-- Tolerance on the frequency ratio fpeak/(k*f0) used to decide whether a peak is the k-th harmonic of f0 (0.03 is about half a semitone)
normEnergy_dB = 100; %-- The total energy to normalize the harmonic structure of each F0 (dB)
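
To show how some of these parameters might fit together, the sketch below generates F0 candidates around a peak on the MIDI scale, converts a MIDI-number F0 to Hz, and assigns spectral peaks to harmonics using `Thresh_PeakF0Belong` to build a `maxHarm`-dimensional harmonic structure. The peak list, the names `peakFreq`, `peakAmp_dB`, and `harmStruct`, the A4 = 440 Hz tuning, and the matching rule itself are assumptions for illustration; the original code may work differently. Normalization with `normEnergy_dB` is left out because the excerpt does not show how it is applied.

```matlab
% Minimal sketch (hypothetical data and names): from peaks to a harmonic structure.
maxHarm = 20;                 % harmonic structure dimensionality (from the excerpt)
Thresh_PeakF0Belong = 0.03;   % relative frequency tolerance (from the excerpt)
searchRadius_midi = 0.5;      % search radius around a peak (from the excerpt)
f0step_midi = 0.1;            % F0 search step (from the excerpt)

% Hypothetical spectral peaks of one frame: frequency (Hz) and amplitude (dB)
peakFreq   = [220.5 441 500 662 1101];
peakAmp_dB = [ 62    55  40  48   30 ];

% F0 candidates around the first peak, on the MIDI scale (A4 = 440 Hz assumed)
peak_midi   = 69 + 12*log2(peakFreq(1)/440);
f0cand_midi = peak_midi + (-searchRadius_midi:f0step_midi:searchRadius_midi);
f0cand_Hz   = 440 * 2.^((f0cand_midi - 69)/12);

% For the rest of the sketch, fix one F0: MIDI number 57 (A3)
f0_midi = 57;
f0 = 440 * 2^((f0_midi - 69)/12);             % 220 Hz

% Assign each peak to its nearest harmonic if it lies within the tolerance
harmStruct = zeros(1, maxHarm);               % 0 marks a missing harmonic (assumption)
for p = 1:numel(peakFreq)
    h = round(peakFreq(p) / f0);              % nearest harmonic number
    if h < 1 || h > maxHarm
        continue;                             % outside the modeled harmonics
    end
    relDev = abs(peakFreq(p) / (h * f0) - 1); % relative deviation from h*f0
    if relDev < Thresh_PeakF0Belong
        harmStruct(h) = max(harmStruct(h), peakAmp_dB(p));
    end
end

disp(harmStruct(1:5));
```

In this toy frame the 500 Hz peak is rejected: its nearest harmonic is the second one (440 Hz), and its relative deviation of about 0.14 exceeds the 0.03 tolerance. The remaining peaks fill harmonics 1, 2, 3, and 5.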

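🎉3 References

Some of the theory in this post comes from online sources. If anything infringes your rights, please get in touch and it will be removed.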

[1]Z. Duan, Y. Zhang, C. Zhang and Z. Shi, "Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 4, pp. 766-778, May 2008, doi: 10.1109/TASL.2008.919073.

🌈4 Matlab Implementation
