Paper: https://ieeexplore.ieee.org/abstract/document/9414851
Venue: ICASSP 2021
Abstract
The TFAN module used in CycleGAN-VC3 greatly increases the computational cost. As an alternative, the paper proposes MaskCycleGAN-VC, another extension of CycleGAN-VC2 that is trained with a novel auxiliary task called filling in frames (FIF). With FIF, a temporal mask is applied to the input mel-spectrogram and the converter is encouraged to fill in the missing frames based on the surrounding frames. This task lets the converter learn time-frequency structures in a self-supervised manner and eliminates the need for an additional module such as TFAN.
1. Introduction
MaskCycleGAN-VC is an extension of CycleGAN-VC2 that is trained with filling in frames (FIF): a temporal mask is applied to the input mel-spectrogram, and the converter is encouraged to fill in the missing frames based on the surrounding frames.
FIF allows the converter to learn the time-frequency feature structure in a self-supervised manner through this complementation (filling-in) process.
Problems with prior work: CycleGAN-VC2 performs conversion on MCEPs and then reconstructs the signal, which loses time-frequency information during conversion and rules out the use of a neural vocoder. CycleGAN-VC3 can compensate for the lost time-frequency information with TFAN, but its computational cost is too high.
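The masking step of FIF can be illustrated with a minimal sketch. It only shows how a temporal mask might zero out a contiguous run of frames in a mel-spectrogram before it is passed to the converter; it is not the paper's implementation. The function names `make_temporal_mask` and `apply_temporal_mask`, the `max_mask_len` parameter, the single-run mask layout, and the toy data are all assumptions made for illustration.

```python
import random


def make_temporal_mask(num_frames, max_mask_len, rng=random):
    """Return a 0/1 list of length num_frames containing one contiguous zero run.

    Assumption of this sketch: the run length and start position are drawn
    uniformly at random, and at most max_mask_len frames are masked.
    """
    mask_len = rng.randint(0, min(max_mask_len, num_frames))
    start = rng.randint(0, num_frames - mask_len)
    return [0.0 if start <= t < start + mask_len else 1.0
            for t in range(num_frames)]


def apply_temporal_mask(mel, mask):
    """Zero out whole frames of mel, given as n_mels rows of num_frames values."""
    return [[value * m for value, m in zip(row, mask)] for row in mel]


# Toy "mel-spectrogram": 3 mel bins x 8 frames of ones (illustrative data only).
mel = [[1.0] * 8 for _ in range(3)]
mask = make_temporal_mask(num_frames=8, max_mask_len=4, rng=random.Random(0))
masked_mel = apply_temporal_mask(mel, mask)
```

Because the mask depends only on the frame index, every mel bin of a masked frame is removed at once, so the converter has to rely on the neighbouring frames to reconstruct it.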
2. Conventional CycleGAN-VC2
Adversarial loss: makes the converted feature $G_{X \rightarrow Y}(x)$ indistinguishable from the target $y$.
Cycle-consistency loss (