Tacotron-2-Chinese: Mandarin Chinese speech synthesis
Pre-trained model download
Only the Tacotron spectrogram-prediction network is provided; the WaveNet vocoder is still experimental and not included. Speech can instead be synthesized with Griffin-Lim (see below).
The model was trained on the Biaobei (BZNSYP) dataset. To avoid exhausting GPU memory, ffmpeg was used to downsample the corpus from 48 kHz to 36 kHz; the difference is barely audible.
Installing dependencies
Install Python 3 and TensorFlow 1.10 (WaveNet has a known bug on TensorFlow 1.14 but works correctly on 1.10).
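The 1.14 incompatibility is easy to trip over, so a small version gate at the top of a training script can fail fast. The helper below is a sketch invented for illustration (`tf_version_ok` is not part of the repo):

```python
def tf_version_ok(version: str) -> bool:
    """Return True only for the TensorFlow 1.10.x series, which this
    repo is known to work with (WaveNet is reported broken on 1.14)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) == (1, 10)

# Hypothetical usage at the top of a training script:
# import tensorflow as tf
# assert tf_version_ok(tf.__version__), "Use TensorFlow 1.10 for WaveNet"
```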
Install the system dependencies:
apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools
If libav-tools fails to install, install it manually:
wget http://launchpadlibrarian.net/339874908/libav-tools_3.3.4-2_all.deb
dpkg -i libav-tools_3.3.4-2_all.deb
Install the Python requirements:
pip install -r requirements.txt
Training the model
Download the Biaobei (BZNSYP) dataset and extract it into the root of the Tacotron-2-Chinese folder, giving the following directory layout:
Tacotron-2-Chinese
|- BZNSYP
|- PhoneLabeling
|- ProsodyLabeling
|- Wave
Use ffmpeg to downsample the wav files in /BZNSYP/Wave/ to 36 kHz:
ffmpeg -i input.wav -ar 36000 output.wav
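The command above converts one file at a time; the whole Wave folder can be batch-converted with a short script. The sketch below assumes ffmpeg is on the PATH, and `ffmpeg_resample_cmds` / `run_all` are helper names invented here, not part of the repo:

```python
import subprocess
from pathlib import Path

def ffmpeg_resample_cmds(wav_dir, out_dir, sr=36000):
    """Build one ffmpeg command per .wav file; nothing is executed here,
    so the command list can be inspected before running."""
    out_dir = Path(out_dir)
    cmds = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        cmds.append(["ffmpeg", "-y", "-i", str(wav),
                     "-ar", str(sr), str(out_dir / wav.name)])
    return cmds

def run_all(cmds):
    """Execute the prepared ffmpeg commands one by one."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Writing to a separate output directory avoids clobbering the 48 kHz originals in place.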
Preprocess the data:
python preprocess.py --dataset='Biaobei'
Train the model (training automatically resumes from the latest checkpoint):
python train.py --model='Tacotron-2'
Synthesizing speech
Synthesize speech from the text in sentences.txt in the repository root:
python synthesize.py --model='Tacotron-2' --text_list='sentences.txt'
If you only have the spectrogram-prediction model and no WaveNet model, audio is generated by Griffin-Lim alone and written to the /tacotron_output/logs-eval/wavs/ folder.
If a WaveNet model is available, the WaveNet-generated audio is written to /wavenet_output/wavs/.
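Griffin-Lim reconstructs a waveform from a magnitude spectrogram by alternating between the time and frequency domains: each round it keeps the predicted magnitudes and re-estimates only the phase. The NumPy-only sketch below illustrates the idea; it is not the repo's implementation, which takes its STFT parameters from hparams.py:

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Hann-windowed frames -> one-sided FFT, shape (n_fft//2 + 1, n_frames)
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.array([np.fft.rfft(frame) for frame in frames]).T

def istft(spec, n_fft=512, hop=128):
    # Overlap-add inverse with window-sum normalization
    window = np.hanning(n_fft)
    n = hop * (spec.shape[1] - 1) + n_fft
    x = np.zeros(n)
    wsum = np.zeros(n)
    for t, column in enumerate(spec.T):
        frame = np.fft.irfft(column, n_fft)
        x[t * hop:t * hop + n_fft] += frame * window
        wsum[t * hop:t * hop + n_fft] += window ** 2
    nonzero = wsum > 1e-8
    x[nonzero] /= wsum[nonzero]
    return x

def griffin_lim(magnitude, n_iter=40, n_fft=512, hop=128):
    # Start from random phase, then alternate ISTFT/STFT, keeping the
    # target magnitudes and updating only the phase estimate each round.
    angles = np.exp(2j * np.pi * np.random.rand(*magnitude.shape))
    for _ in range(n_iter):
        signal = istft(magnitude * angles, n_fft, hop)
        rebuilt = stft(signal, n_fft, hop)
        angles = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * angles, n_fft, hop)
```

Because the phase is estimated rather than predicted, Griffin-Lim output sounds noticeably more robotic than a trained WaveNet vocoder, which is why the repo treats it as a fallback.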
Tacotron-2:
Tensorflow implementation of Google's Tacotron-2, a deep neural network architecture described in the paper: Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
This repository contains additional improvements and experiments beyond the paper; we therefore provide a paper_hparams.py file that holds the exact hyperparameters needed to reproduce the paper's results without any extras.
The suggested hparams.py file, used by default, contains the hyperparameters with extras that have proved to give better results in most cases. Feel free to experiment with the parameters as needed.
DIFFERENCES WILL BE HIGHLIGHTED IN DOCUMENTATION SHORTLY.
Repository Structure:
Tacotron-2
├──