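[Voice Conversion: How Is the Objective Metric MCD Computed?]

Computing the MCD value

Preface: thanks to GitHub author Lukelluke. For a more detailed reference, see: Lukelluke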

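  1. Prepare mcd and merlin-master.

  2. Prepare the source speech and the converted speech. Create two folders, one for the source speech and one for the converted speech. The source and converted files must correspond one to one and have identical file names; otherwise the MCD cannot be computed.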

    mkdir org
    mkdir convert
    
  3. Extract the mgc, bap, and lf0 files.

     cd merlin-master/egs/voice_conversion/s1/
     ./01_setup.sh speakera speakerb
    

    The speakera and speakerb folders are created under the database folder. Copy the source speech files from org into speakera and the converted speech files from convert into speakerb, then run:

    ./02_....sh database/speakera database/speakera_extract
    ./02_....sh database/speakerb database/speakerb_extract
    

    This extracts the .mgc, .bap, and .lf0 files into speakera_extract and speakerb_extract.
    After extraction:
    (1) Copy the .mgc files of the source speech (in speakera_extract) to mcd/test_data/ref-examples.
    (2) Copy the .mgc, .bap, and .lf0 files of the converted speech (in speakerb_extract) to mcd/test_data/synth-examples.

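  4. Compute the MCD.
    Write the file names shared by the source and converted speech into mcd/test_data/corpus.lst, then run the following from the mcd directory:

    cat test_data/corpus.lst | xargs bin/dtw_synth test_data/ref-examples test_data/synth-examples out
    

    This prints the MCD for each utterance.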
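
For orientation, mel-cepstral distortion is conventionally defined per frame as (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), in dB, with the 0th coefficient excluded. The following is a minimal Python sketch of that standard definition, not the code of bin/dtw_synth. The names frame_mcd and mean_mcd are made up for illustration, and the alignment path is assumed to be given as (reference index, converted index) pairs.

```python
import math

import numpy as np

# Constant that converts a Euclidean cepstral distance into decibels.
MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def frame_mcd(ref_frame, syn_frame):
    """Mel-cepstral distortion in dB between two cepstral vectors."""
    diff = np.asarray(ref_frame, dtype=float) - np.asarray(syn_frame, dtype=float)
    return MCD_CONST * math.sqrt(float(np.dot(diff, diff)))


def mean_mcd(ref, syn, path):
    """Average frame_mcd over aligned (ref_index, syn_index) pairs.

    Column 0 (the 0th cepstral coefficient) is dropped before comparing.
    """
    ref = np.asarray(ref, dtype=float)[:, 1:]
    syn = np.asarray(syn, dtype=float)[:, 1:]
    costs = [frame_mcd(ref[i], syn[j]) for i, j in path]
    return sum(costs) / len(costs)
```

The script itself reports minCost / frames, that is, the total DTW cost divided by the number of reference frames, so its normalization may differ slightly from the plain average used in this sketch.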

Example corpus.lst:

	p229_p362_081
	p260_p343_386

List file names only, without extensions, and make sure the source and converted files share the same names.
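
Since the list must contain exactly the names shared by both folders, it can be generated rather than typed by hand. The following is a small Python 3 sketch under the assumption that both folders hold .wav files; the function names matching_utterance_ids and write_corpus_list are hypothetical helpers, not part of mcd or Merlin.

```python
import os


def matching_utterance_ids(org_dir, convert_dir, ext=".wav"):
    """Return sorted utterance IDs present in both directories.

    Raises ValueError if a file exists in only one of the two directories.
    """
    def stems(directory):
        return {
            os.path.splitext(name)[0]
            for name in os.listdir(directory)
            if name.endswith(ext)
        }

    org, conv = stems(org_dir), stems(convert_dir)
    unmatched = sorted(org ^ conv)  # names found on one side only
    if unmatched:
        raise ValueError("files without a counterpart: %s" % ", ".join(unmatched))
    return sorted(org)


def write_corpus_list(ids, list_path):
    """Write one utterance ID per line, without extension."""
    with open(list_path, "w") as f:
        for utt_id in ids:
            f.write(utt_id + "\n")
```

With the folders from step 2, a call such as write_corpus_list(matching_utterance_ids("org", "convert"), "mcd/test_data/corpus.lst") would produce the list, and a mismatch in file names is reported before dtw_synth is run.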

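Note: if an error is raised, try modifying vecseq, the module that mcd/bin/dtw_synth imports with import htk_io.vecseq as vsio. In an IDE, Ctrl-click the import to open the file, then change readFile as follows: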
def readFile(self, vecSeqFile):
    """Reads a raw vector sequence file.

    If the number of values is not a multiple of self.vecSize, the data is
    padded with its mean value so that it can be reshaped.
    The returned numpy array always has the default float dtype.
    """
    vec = np.fromfile(vecSeqFile, dtype=self.dtypeFile)
    # Number of values needed to reach the next multiple of vecSize.
    # (The remainder itself, len(vec) % vecSize, is not the right count.)
    padLen = (-len(vec)) % self.vecSize
    if padLen:
        vec = np.append(vec, np.full(padLen, np.mean(vec), dtype=vec.dtype))

    return np.reshape(vec, (-1, self.vecSize)).astype(float)

    # Original implementation:
    # return np.reshape(
    #     np.fromfile(vecSeqFile, dtype=self.dtypeFile),
    #     (-1, self.vecSize)
    # ).astype(np.float)
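
The padding idea in readFile can be tried in isolation. Below is a minimal Python 3 sketch; pad_to_multiple is a hypothetical standalone function, not part of htk_io. For a length n, the number of values to append is (-n) % vec_size, which is 0 when n is already a multiple.

```python
import numpy as np


def pad_to_multiple(values, vec_size):
    """Pad a flat array with its mean so that its length divides by vec_size."""
    values = np.asarray(values, dtype=float)
    pad = (-len(values)) % vec_size  # 0 when the length is already a multiple
    if pad:
        values = np.concatenate([values, np.full(pad, values.mean())])
    return values.reshape(-1, vec_size)
```

Padding only makes the reshape succeed. A length that is not a multiple of the vector size may indicate that the file was written with a different feature dimension than the reader expects, which is worth checking before trusting the resulting MCD.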
Based on the converted files, make copies of the corresponding source files so that the file names match:
# Written for Python 2; the single-argument print calls also run on Python 3.
import os
import shutil


def mycopy3():
    org_path = "/mnt/hgfs/VmwareShare/mcd/org"
    opt4_path = "/mnt/hgfs/VmwareShare/mcd/test"
    opt4_outpath = "/mnt/hgfs/VmwareShare/mcd/test_output"

    for wav in os.listdir(org_path):
        for con_name in os.listdir(opt4_path):
            # Fields 1 and 2 of the converted file name identify the source file.
            parts = con_name.split('_')
            print(parts)
            src_name = parts[1].strip("C") + "_" + parts[2] + ".wav"
            print(src_name)
            if src_name == wav:
                # Copy the source file under the converted file's name.
                shutil.copy(os.path.join(org_path, wav), os.path.join(opt4_outpath, con_name))

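List the file names without the .wav suffix:
# Written for Python 2; also runs on Python 3.
import os


def list_filename2():
    org_path = "/home/ubuntu/Downloads/merlin-master/egs/voice_conversion/s1/database/speakerb"
    for filename in os.listdir(org_path):
        # splitext removes only the extension; str.strip(".wav") would also
        # strip any leading or trailing '.', 'w', 'a', 'v' characters.
        print(os.path.splitext(filename)[0])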

Update:

An error I ran into earlier, which others may hit as well:

IndexError: index 301 is out of bounds for axis 0 with size 301

The full output is shown below. The problem has been resolved.

processing source_p231_p231_220
MCD = 10.077967 (600 frames)
warping 452 frames -> 600 frames (148 repeated, 0 dropped)

Traceback (most recent call last):
  File "bin/dtw_synth", line 128, in <module>
    main(sys.argv)
  File "bin/dtw_synth", line 120, in main
    synthFullWarped = dtw.warpGeneral(synthFull, synthIndexSeq)
  File "/home/zhai/anaconda3/envs/mcd3/lib/python2.7/site-packages/mcd/dtw.py", line 160, in warpGeneral
    ysWarped = ys[yIndexSeq]
IndexError: index 301 is out of bounds for axis 0 with size 301

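The problem description and a detailed answer are in the corresponding GitHub issue.
Alternatively, refer to the following:
1. My audio is sampled at 16 kHz.
2. I had already modified vecseq.py; make sure you have modified it as described.
3. The error is raised in bin/dtw_synth, so I wrapped the body of the for uttId in args.uttIds: loop in bin/dtw_synth in a try/except block. I personally believe that catching the error this way does not affect the computed MCD results.
Here is my for uttId in args.uttIds: loop from bin/dtw_synth: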

for uttId in args.uttIds:
    try:
        print 'processing', uttId
        nat = getNatVecSeq(uttId)
        synth = getSynthVecSeq(uttId)
        # ignore 0th cepstral component
        nat = nat[:, 1:]
        synth = synth[:, 1:]

        minCost, path = dtw.dtw(nat, synth, costFn)
        frames = len(nat)

        minCostTot += minCost
        framesTot += frames

        mcdLine = 'MCD = %f (%d frames)' % (minCost / frames, frames)
        print mcdLine

        pathCosts = [costFn(nat[i], synth[j]) for i, j in path]
        synthIndexSeq = dtw.projectPathBestCost(path, pathCosts)
        assert len(synthIndexSeq) == len(nat)

        uniqueFrames = len(set(synthIndexSeq))
        repeatedFrames = len(synthIndexSeq) - uniqueFrames
        droppedFrames = len(synth) - uniqueFrames
        assert len(synth) - droppedFrames + repeatedFrames == len(nat)
        print 'warping %s frames -> %s frames (%s repeated, %s dropped)' % (
            len(synth), len(nat), repeatedFrames, droppedFrames)
        # Append the per-utterance result to a log file.
        with open('information.txt', mode='a') as f:
            f.write('%s %s\n' % (uttId, mcdLine))
        print

        for paramOrder, ext in zip(paramOrders, exts):
            vecSeqIo = vsio.VecSeqIo(paramOrder)

            synthFullFile = os.path.join(args.synthDir, uttId + '.' + ext)
            synthFull = vecSeqIo.readFile(synthFullFile)

            synthFullWarped = dtw.warpGeneral(synthFull, synthIndexSeq)
            synthFullWarpedFile = os.path.join(args.outDir, uttId + '.' + ext)
            vecSeqIo.writeFile(synthFullWarpedFile, synthFullWarped)

    except Exception as e:
        # Report the failure and continue with the next utterance.
        # Unlike a bare except, this still lets Ctrl-C stop the loop.
        print 'Error', uttId, e

Adjust the indentation as needed to match your copy of the script.
