Paper notes: "MASKCYCLEGAN-VC: LEARNING NON-PARALLEL VOICE CONVERSION WITH FILLING IN FRAMES"

Paper: https://ieeexplore.ieee.org/abstract/document/9414851
Conference: ICASSP 2021

Abstract

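The TFAN module used in CycleGAN-VC3 greatly increases the computational cost. As an alternative, this paper proposes MaskCycleGAN-VC, an extension of CycleGAN-VC2 that is trained with an auxiliary task called FIF (filling in frames). With FIF, a temporal mask is applied to the input mel-spectrogram, and the converter is encouraged to fill in the missing frames based on the surrounding frames. FIF lets the converter learn time-frequency structures in a self-supervised manner, without any additional module.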

As an alternative, we propose MaskCycleGAN-VC, which is another extension of CycleGAN-VC2 and is trained using a novel auxiliary task called filling in frames (FIF). With FIF, we apply a temporal mask to the input mel-spectrogram and encourage the converter to fill in missing frames based on surrounding frames. This task allows the converter to learn time-frequency structures in a self-supervised manner and eliminates the need for an additional module such as TFAN.
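
To make the temporal mask concrete, here is a minimal sketch in plain Python. It only illustrates the masking step described above: a contiguous run of frames is zeroed out across all mel bins. The function names, the toy spectrogram, and the `max_mask_frames` parameter are assumptions made for illustration and are not taken from the paper or its code.

```python
import random


def make_temporal_mask(num_frames, max_mask_frames, rng):
    """Return one value per frame: 0.0 for hidden frames, 1.0 for visible ones.

    The hidden frames form a single contiguous span whose length and position
    are drawn at random (an assumption for this sketch).
    """
    mask_len = rng.randint(0, min(max_mask_frames, num_frames))
    start = rng.randint(0, num_frames - mask_len)
    return [0.0 if start <= t < start + mask_len else 1.0
            for t in range(num_frames)]


def apply_mask(mel, mask):
    """Zero out the hidden frames of a mel-spectrogram stored as [n_mels][num_frames]."""
    return [[value * m for value, m in zip(row, mask)] for row in mel]


# Toy "mel-spectrogram": 4 mel bins x 12 frames of made-up positive values.
rng = random.Random(0)
n_mels, num_frames = 4, 12
mel = [[float(b * num_frames + t + 1) for t in range(num_frames)]
       for b in range(n_mels)]

mask = make_temporal_mask(num_frames, max_mask_frames=5, rng=rng)
masked_mel = apply_mask(mel, mask)

print("mask:", mask)
print("first mel bin after masking:", masked_mel[0])
```

The masked spectrogram is what the converter would receive. Because the mask is the same for every mel bin, whole frames disappear, so the missing content can only be recovered from neighbouring frames.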

1. Introduction

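MaskCycleGAN-VC is an extension of CycleGAN-VC2 trained with filling in frames (FIF). A temporal mask is applied to the input mel-spectrogram, and the converter is encouraged to fill in the missing frames based on the surrounding frames.
FIF allows the conversion network to learn the time-frequency feature structure in a self-supervised manner through this completion process.
Existing problems: CycleGAN-VC2 operates on MCEPs (convert, then reconstruct), which loses time-frequency information during conversion and rules out the use of a neural vocoder. CycleGAN-VC3 compensates for the lost time-frequency structure with TFAN, but at an excessive computational cost.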

As an alternative, we propose MaskCycleGAN-VC, which is another extension of CycleGAN-VC2 and is trained using a novel auxiliary task called filling in frames (FIF). With FIF, we apply a temporal mask to the input mel-spectrogram and encourage the converter to fill in the missing frames based on the surrounding frames.
Similarly, FIF allows the converter to learn the time-frequency feature structure in a self-supervised manner through a complementation process.
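
The "self-supervised" part can be sketched as follows: the training target is simply the original, unmasked spectrogram, so no extra labels are needed. In this toy example the converter is replaced by a hypothetical stand-in (`fill_by_interpolation`, plain linear interpolation across the gap), and the cycle structure of the real model is omitted entirely. It only shows that a filling error against the original frames can be measured without any annotation. All names and values are assumptions for illustration.

```python
def mask_frames(mel, start, length):
    """Zero out frames [start, start + length) in every mel bin."""
    return [[0.0 if start <= t < start + length else v
             for t, v in enumerate(row)] for row in mel]


def fill_by_interpolation(masked_mel, start, length):
    """Hypothetical stand-in for the learned converter.

    Fills the gap in each mel bin by linear interpolation between the frame
    just before and the frame just after the gap. Assumes start >= 1 and
    start + length < number of frames.
    """
    filled = []
    for row in masked_mel:
        left, right = row[start - 1], row[start + length]
        new_row = list(row)
        for k in range(length):
            w = (k + 1) / (length + 1)
            new_row[start + k] = (1 - w) * left + w * right
        filled.append(new_row)
    return filled


def l1_distance(a, b):
    """Mean absolute difference between two equally sized spectrograms."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / sum(len(row) for row in a)


# Toy spectrogram: one linear and one quadratic "mel bin" over 8 frames.
mel = [[float(t) for t in range(8)],
       [float(t * t) for t in range(8)]]
start, length = 3, 2

masked = mask_frames(mel, start, length)
filled = fill_by_interpolation(masked, start, length)

# The original spectrogram itself is the target.
masked_error = l1_distance(masked, mel)
filled_error = l1_distance(filled, mel)
print("error before filling:", masked_error)
print("error after filling:", filled_error)
```

A learned converter would be trained to drive this kind of error down, which forces it to model how neighbouring frames relate to each other in time and frequency.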

2. Conventional CycleGAN-VC2

Adversarial loss: makes the converted feature $G_{X \rightarrow Y}(x)$ indistinguishable from the target $y$.
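
For reference, the adversarial loss in the CycleGAN-VC family is usually written in the standard GAN form below. This is my understanding of the formulation rather than something stated in these notes; $D_Y$ (the discriminator for the target domain) and the data distributions $P_X$, $P_Y$ are symbols not defined above.

```latex
\mathcal{L}_{adv}^{X \rightarrow Y}
  = \mathbb{E}_{y \sim P_Y}\left[\log D_Y(y)\right]
  + \mathbb{E}_{x \sim P_X}\left[\log\left(1 - D_Y\left(G_{X \rightarrow Y}(x)\right)\right)\right]
```

The discriminator $D_Y$ tries to maximize this quantity by telling real targets $y$ apart from converted features, while the converter $G_{X \rightarrow Y}$ tries to minimize it by producing features that $D_Y$ accepts as real.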
Cycle-consistency loss (
