Libraries needed for Python speech synthesis: implementing Deep Voice 3 speech synthesis with PyTorch

weixin_39921224

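This is a PyTorch implementation of text-to-speech synthesis models based on convolutional networks, including single-speaker and multi-speaker versions of Deep Voice 3. It provides pretrained models and audio samples, supports datasets such as LJSpeech and VCTK, and covers details of data preprocessing and guided attention.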

Deepvoice3_pytorch

PyTorch implementation of convolutional networks-based text-to-speech synthesis models:

arXiv:1710.07654: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.

arXiv:1710.08969: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.

Folks

Online TTS demo

Notebooks intended to be run on https://colab.research.google.com are available:

Highlights

Convolutional sequence-to-sequence model with attention for text-to-speech synthesis

Multi-speaker and single-speaker versions of DeepVoice3

Audio samples and pre-trained models

Preprocessor for LJSpeech (en), JSUT (jp) and VCTK datasets, as well as carpedm20/multi-speaker-tacotron-tensorflow compatible custom dataset (in JSON format)

Language-dependent frontend text processor for English and Japanese

Samples

Pretrained models

NOTE: pretrained models are not compatible with master. To be updated soon.

| URL | Model | Data | Hyper parameters | Git commit | Steps |
|---|---|---|---|---|---|
| | DeepVoice3 | LJSpeech | | | 640k |
| | Nyanko | LJSpeech | builder=nyanko,preset=nyanko_ljspeech | | 585k |
| | Multi-speaker DeepVoice3 | VCTK | builder=deepvoice3_multispeaker,preset=deepvoice3_vctk | | 300k + 300k |
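
The hyper parameter entries listed above, such as builder=nyanko,preset=nyanko_ljspeech, are comma-separated key=value overrides. As a rough illustration only, the sketch below splits such a string into a dictionary. The function name parse_overrides is a hypothetical helper introduced here, not part of the repository, and the repository's own parsing may behave differently (for example, it may convert value types).

```python
def parse_overrides(spec: str) -> dict:
    """Split a 'key=value,key=value' string into a dict of strings.

    Hypothetical helper for illustration; not the repository's parser.
    """
    overrides = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        overrides[key.strip()] = value.strip()
    return overrides


if __name__ == "__main__":
    # Values stay as plain strings; no type conversion is attempted.
    print(parse_overrides("builder=nyanko,preset=nyanko_ljspeech"))
```

Keeping every value as a string is a deliberate simplification: it avoids guessing the type of each hyper parameter.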

To use the pretrained models, it is highly recommended that you check out the specific git commit noted above, i.e.,

git checkout ${commit_hash}

Then follow the "Synthesize from a checkpoint" section in the README of that specific git commit. Please note that the latest development version of the repository may not work.

You could try for example:

# pretrained model (20180505_deepvoice3_checkpoint_step000640000.pth)
# hparams (20180505_deepvoice3_ljspeech.json)
git checkout 4357976
python synthesis.py --preset=20180505_deepvoice3_ljspeech.json \
  20180505_deepvoice3_checkpoint_step000640000.pth \
  sentences.txt \
  output_dir
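
If you prefer to launch the same synthesis step from a Python script, a minimal sketch could look like the following. It only assembles the arguments shown in the command above (the --preset flag followed by the checkpoint, the sentence list, and the output directory). The helper names build_synthesis_command and run_synthesis are hypothetical, and using sys.executable instead of a bare python is an assumption made here so that the same interpreter is reused.

```python
import shlex
import subprocess
import sys


def build_synthesis_command(preset, checkpoint, text_list, output_dir):
    """Return the argument list for the synthesis.py call shown above.

    Hypothetical helper; it adds no flags beyond those in the command above.
    """
    return [
        sys.executable,  # assumption: reuse the current Python interpreter
        "synthesis.py",
        f"--preset={preset}",
        checkpoint,
        text_list,
        output_dir,
    ]


def run_synthesis(preset, checkpoint, text_list, output_dir):
    """Run synthesis.py from the repository root at the checked-out commit."""
    cmd = build_synthesis_command(preset, checkpoint, text_list, output_dir)
    # check=True raises CalledProcessError if synthesis.py exits non-zero.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_synthesis_command(
        "20180505_deepvoice3_ljspeech.json",
        "20180505_deepvoice3_checkpoint_step000640000.pth",
        "sentences.txt",
        "output_dir",
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

Calling run_synthesis with the same four arguments would execute the command. It is expected to work only from the repository root, after checking out the matching commit and downloading the checkpoint and preset files.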

Notes on hyper parameters

The default hyper parameters, used during the preprocessing, training, and synthesis stages, are tuned for English TTS using the LJSpeech dataset. You will have to change some of the parameters if you want to try other datasets. See hparams.py for details.

builder specifies which model you want to use. deepvoice3, deepvoice3_multispeaker [1] and nyanko [2] are supported.

Hyper parameters described in DeepVoice3 paper for single speaker didn't work for LJ
