bert-vits2搭建与训练，实现训练个性化音色

小李学不会编程

已于 2023-10-17 15:15:05 修改

阅读量2w

点赞数 16

文章标签： python 人工智能 vscode linux

于 2023-09-27 17:47:09 首次发布

本文链接：https://blog.csdn.net/qq_51506262/article/details/133359555

版权

bert-vits2训练流程

对应的B站搭建流程视频教程：https://www.bilibili.com/video/BV1b34y1g7Ah/?spm_id_from=333.999.0.0&vd_source=2a400e9a86101d2bf4ac9d3ed71e65c9

python 版本 python 3.8 + cuda11.7.
下载bert-vits2 git clone https://github.com/fishaudio/Bert-VITS2.git
pytorch linux直接pip安装， windows去官网下载（nvcc -v查看cuda版本）

模型下载中文模型，日文模型，一般都是缺少大文件，去huggingface下载对应缺少的文件放在项目的bert下。

中：https://huggingface.co/hfl/chinese-roberta-wwm-ext-large
日：https://huggingface.co/cl-tohoku/bert-base-japanese-v3/tree/main
# 下载好放在bert里

测试

python ./text/chinese_bert.py 
python ./text/japanese_bert.py

下载数据集在raw文件夹下创建对应名称的文件夹

 https://www.bilibili.com/read/cv26659988/?from=articleDetail  
 # 放在./raw/{name}下

注：如果想要训练自己的音色，而项目代码并没给如何得到.wav.lab格式的文件形式，下面是我自己写的自动标注代码，只用把音频放在新建的data/name下，name要与下面文件中的对应运行这个文件就行：

import os
from pathlib import Path
import librosa
from scipy.io import wavfile
import numpy as np
import whisper

a="ll" # 请在这里修改说话人的名字，目前只支持中文语音,将音频放在data/ll下

def split_long_audio(model, filepaths, save_dir="data_dir", out_sr=44100):
    files=os.listdir(filepaths)
    filepaths=[os.path.join(filepaths,i)  for i in files]

    i=0
    for file_idx, filepath in enumerate(filepaths):

        save_path = Path(save_dir)
        save_path.mkdir(exist_ok=True, parents=True)

        print(f"Transcribing file {
     file_idx}: '{
     filepath}' to segments...")
        result = model.transcribe(filepath, word_timestamps=True, task="transcribe", beam_size=5

最低0.47元/天解锁文章