Deepvoice3_pytorch
PyTorch implementation of convolutional network-based text-to-speech synthesis models:
arXiv:1710.07654: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.
arXiv:1710.08969: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.
Folks
Online TTS demo
Notebooks intended to be run on https://colab.research.google.com are available:
Highlights
Convolutional sequence-to-sequence model with attention for text-to-speech synthesis
Multi-speaker and single speaker versions of DeepVoice3
Audio samples and pre-trained models
Preprocessor for LJSpeech (en), JSUT (jp) and VCTK datasets, as well as carpedm20/multi-speaker-tacotron-tensorflow compatible custom dataset (in JSON format)
Language-dependent frontend text processor for English and Japanese
Samples
Pretrained models
NOTE: pretrained models are not compatible with master. To be updated soon.
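| URL | Model | Data | Hyper parameters | Git commit | Steps |
|-----|-------|------|------------------|------------|-------|
|  | DeepVoice3 | LJSpeech |  |  | 640k |
|  | Nyanko | LJSpeech | builder=nyanko,preset=nyanko_ljspeech |  | 585k |
|  | Multi-speaker DeepVoice3 | VCTK | builder=deepvoice3_multispeaker,preset=deepvoice3_vctk |  | 300k + 300k |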
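The hyper parameter entries listed above, such as builder=nyanko,preset=nyanko_ljspeech, look like comma-separated key=value overrides. The following is a minimal sketch of how such a string could be split into a dictionary. It assumes that format only; `parse_overrides` is a hypothetical helper written for illustration and is not part of this repository.

```python
def parse_overrides(spec: str) -> dict:
    """Split a 'key=value,key=value' string into a dict of strings.

    Hypothetical helper for illustration; values are kept as plain strings.
    """
    overrides = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input or a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        overrides[key.strip()] = value.strip()
    return overrides


print(parse_overrides("builder=nyanko,preset=nyanko_ljspeech"))
# → {'builder': 'nyanko', 'preset': 'nyanko_ljspeech'}
```

Keeping the values as strings avoids guessing at types; a real hyper parameter loader would likely convert them to the types it expects.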
To use pre-trained models, it's highly recommended that you check out the specific git commit noted above, i.e.:
git checkout ${commit_hash}
Then follow the "Synthesize from a checkpoint" section in the README of that specific git commit. Please note that the latest development version of the repository may not work.
You could try for example:
# pretrained model (20180505_deepvoice3_checkpoint_step000640000.pth)
# hparams (20180505_deepvoice3_ljspeech.json)
git checkout 4357976
python synthesis.py --preset=20180505_deepvoice3_ljspeech.json \
20180505_deepvoice3_checkpoint_step000640000.pth \
sentences.txt \
output_dir
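If you prefer to drive the same command from Python, for example to synthesize with several checkpoints in a loop, the call above can be wrapped roughly as follows. This is only a sketch: `build_synthesis_command` and `run_synthesis` are hypothetical helpers, and the argument order simply mirrors the command shown above. It assumes the repository is already checked out at the matching commit in `repo_dir`.

```python
import subprocess
import sys


def build_synthesis_command(preset, checkpoint, text_list, output_dir):
    """Return the argv list mirroring the synthesis.py call shown above."""
    return [
        sys.executable,          # the Python interpreter currently in use
        "synthesis.py",
        f"--preset={preset}",    # hyper parameter JSON for the checkpoint
        checkpoint,              # pretrained model (.pth)
        text_list,               # text file with the sentences to synthesize
        output_dir,              # directory for the synthesis output
    ]


def run_synthesis(repo_dir, preset, checkpoint, text_list, output_dir):
    """Run synthesis.py inside repo_dir; raises CalledProcessError on failure."""
    cmd = build_synthesis_command(preset, checkpoint, text_list, output_dir)
    subprocess.run(cmd, cwd=repo_dir, check=True)


cmd = build_synthesis_command(
    "20180505_deepvoice3_ljspeech.json",
    "20180505_deepvoice3_checkpoint_step000640000.pth",
    "sentences.txt",
    "output_dir",
)
print(" ".join(cmd[1:]))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.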
Notes on hyper parameters
Default hyper parameters, used during the preprocessing, training and synthesis stages, are tuned for English TTS using the LJSpeech dataset. You will have to change some of the parameters if you want to try other datasets. See hparams.py for details.
builder specifies which model you want to use. deepvoice3, deepvoice3_multispeaker [1] and nyanko [2] are supported.
Hyper parameters described in DeepVoice3 paper for single speaker didn't work for LJ