Tracing the TTS Pipeline Through a Chinese FastSpeech2 Project, Part 1: Data Preprocessing (preprocess.py)


BabelBook


Category: TTS in FastSpeech2. Tags: python

Article link: https://blog.csdn.net/weixin_42745601/article/details/120134899


1. Reference GitHub repository:

GitHub - roedoejet/FastSpeech2: An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech" (https://github.com/roedoejet/FastSpeech2)

2. Python command used for data preprocessing:

python3 preprocess.py config/AISHELL3/preprocess.yaml

3. Walkthrough of the preprocessing code:

The code that does the actual work is preprocessor.py in the preprocessor folder.

3.1 Code architecture overview:

The file mainly consists of a single class:

class Preprocessor:

It has 6 methods, whose roles are outlined here:

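def __init__(self, config):
    """Load the configs and read the data from the preset paths."""

def build_from_path(self):
    """
    Main routine. It:
        1. collects the information returned by process_utterance,
        2. normalizes that information,
        3. writes the output files to the configured paths
           (speakers.json, stats.json, train.txt, val.txt).
    """

def process_utterance(self, speaker, basename):
    """
    Called by build_from_path.
    It:
        1. reads the information in the TextGrid file via get_alignment,
        2. computes the fundamental frequency (pitch) of the wav file,
        3. converts the audio into a mel spectrogram via the STFT (short-time Fourier transform),
        4. computes the energy of the wav file,
        5. writes the pitch, energy, and mel spectrogram each to a .npy file.
    """

def get_alignment(self, tier):
    """
    Called by process_utterance.
    Extracts phone, duration, start_time, end_time, etc. from the TextGrid file.
    """

def remove_outlier(self, values):
    """Used for data normalization: removes outliers from values."""

def normalize(self, in_dir, mean, std):
    """Used for data normalization: normalizes the data in in_dir with the given mean and std."""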
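
To make the last two helpers more concrete, here is a minimal sketch of outlier removal followed by normalization. It is not the repository's code: it assumes an interquartile-range rule for outliers and works on an in-memory list instead of the .npy files in in_dir, and the sample pitch values are made up.

```python
import statistics


def remove_outlier(values):
    """Drop values further than 1.5 * IQR from the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lower = q1 - 1.5 * (q3 - q1)
    upper = q3 + 1.5 * (q3 - q1)
    return [v for v in values if lower <= v <= upper]


def normalize(values, mean, std):
    """Shift and scale values so they have zero mean and unit variance."""
    return [(v - mean) / std for v in values]


# Made-up pitch values in Hz; 900.0 plays the role of a tracking error.
pitch = [100.0, 110.0, 120.0, 130.0, 140.0, 900.0]
kept = remove_outlier(pitch)

# Statistics are computed after outlier removal, then used to normalize.
mean = statistics.fmean(kept)
std = statistics.pstdev(kept)
normalized = normalize(kept, mean, std)
```

Removing outliers before computing the mean and standard deviation keeps a single bad pitch estimate from distorting the scale of every other value.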

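3.2 What the preprocessing code does:

It combines the speech data with the corresponding TextGrid data and the .lab text data, and extracts the features that are needed: energy, pitch, mel-scale spectrogram, and so on.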
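
The link between the TextGrid data and the audio features is the phone alignment. A rough sketch of that step, assuming each TextGrid interval has already been read into a (start, end, label) tuple: phone boundaries in seconds are converted to mel-frame counts, and silence at both ends is trimmed. The silence labels, sampling rate, and hop length are illustrative assumptions, not values taken from the project's config.

```python
SILENCE = {"sil", "sp", "spn"}  # assumed silence labels


def get_alignment(intervals, sampling_rate=22050, hop_length=256):
    """Return (phones, durations_in_frames, start_time, end_time)."""
    phones, durations = [], []
    start_time, end_time, end_idx = 0.0, 0.0, 0
    for start, end, label in intervals:
        if not phones:
            if label in SILENCE:
                continue  # skip leading silence
            start_time = start
        phones.append(label)
        if label not in SILENCE:
            # Remember where the last real phone ends.
            end_time = end
            end_idx = len(phones)
        # Duration = number of hop-sized frames between the two boundaries.
        first_frame = round(start * sampling_rate / hop_length)
        last_frame = round(end * sampling_rate / hop_length)
        durations.append(last_frame - first_frame)
    # Cut off trailing silence.
    return phones[:end_idx], durations[:end_idx], start_time, end_time


# Hypothetical alignment for a short utterance.
intervals = [
    (0.0, 0.5, "sil"), (0.5, 0.6, "n"), (0.6, 0.8, "i3"), (0.8, 1.0, "sp"),
    (1.0, 1.2, "h"), (1.2, 1.5, "ao3"), (1.5, 2.0, "sil"),
]
phones, durations, start_time, end_time = get_alignment(intervals)
```

Rounding each boundary to a frame index, and then subtracting, makes the durations sum exactly to the number of frames between start_time and end_time, so they line up with a mel spectrogram cut to the same span.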

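4. Output of the preprocessing code:

4.1 Output files:

1. speakers.json: speaker information

2. stats.json: the range (min and max) of pitch and energy

3. train.txt, val.txt: basename, speaker, and phone transcription of the wav files

4. The pitch, duration, mel, and energy folders (containing .npy files):

the energy, pitch, mel-scale spectrogram, and related features of the wav files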
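
A small sketch of how the metadata files in this list could be written. The exact layouts are assumptions made for illustration: speakers.json as a name-to-id mapping, stats.json as [min, max] per feature, and one basename|speaker|{phones} line per utterance. The utterance names, speakers, and feature values are hypothetical, and write_metadata is not a function from the repository.

```python
import json
import os
import random
import tempfile


def write_metadata(out_dir, entries, pitch_values, energy_values, val_size=1, seed=0):
    """Write speakers.json, stats.json, train.txt and val.txt into out_dir."""
    # Map each speaker name to an integer id.
    speakers = sorted({speaker for _, speaker, _ in entries})
    with open(os.path.join(out_dir, "speakers.json"), "w", encoding="utf-8") as f:
        json.dump({name: idx for idx, name in enumerate(speakers)}, f)

    # Assumed layout: [min, max] for each feature.
    stats = {
        "pitch": [min(pitch_values), max(pitch_values)],
        "energy": [min(energy_values), max(energy_values)],
    }
    with open(os.path.join(out_dir, "stats.json"), "w", encoding="utf-8") as f:
        json.dump(stats, f)

    # Assumed line format: basename|speaker|{phone phone ...}
    lines = ["|".join([b, s, "{" + " ".join(p) + "}"]) for b, s, p in entries]
    random.Random(seed).shuffle(lines)
    splits = {"val.txt": lines[:val_size], "train.txt": lines[val_size:]}
    for filename, subset in splits.items():
        with open(os.path.join(out_dir, filename), "w", encoding="utf-8") as f:
            f.write("\n".join(subset) + "\n")


out_dir = tempfile.mkdtemp()
entries = [
    ("utt_0001", "spk_a", ["n", "i3", "h", "ao3"]),
    ("utt_0002", "spk_a", ["sh", "i4", "j", "ie4"]),
    ("utt_0003", "spk_b", ["z", "ai4", "j", "ian4"]),
]
write_metadata(out_dir, entries,
               pitch_values=[95.0, 180.5, 240.0],
               energy_values=[0.02, 0.7, 1.3])
```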

4.2 Output location: out_dir
