Python speech synthesis libraries: implementing Deep Voice 3 text-to-speech with PyTorch

This is a PyTorch implementation of convolutional-network-based text-to-speech models, including both the single-speaker and multi-speaker versions of Deep Voice 3. It provides pretrained models and audio samples, supports datasets such as LJSpeech and VCTK, and covers data preprocessing and guided attention.

Deepvoice3_pytorch

PyTorch implementation of convolutional networks-based text-to-speech synthesis models:

arXiv:1710.07654: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.

arXiv:1710.08969: Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention.


Online TTS demo

Notebooks intended to be run on https://colab.research.google.com are available.

Highlights

Convolutional sequence-to-sequence model with attention for text-to-speech synthesis

Multi-speaker and single speaker versions of DeepVoice3

Audio samples and pre-trained models

Preprocessor for the LJSpeech (en), JSUT (jp) and VCTK datasets, as well as for custom datasets in the carpedm20/multi-speaker-tacotron-tensorflow compatible JSON format

Language-dependent frontend text processor for English and Japanese
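
As an illustration of the text frontend, the sketch below converts a raw sentence into the integer symbol sequence the model consumes. The module path and function names (text_to_sequence, n_vocab) are assumptions based on the repository's Tacotron-derived frontend; check deepvoice3_pytorch/frontend for the actual API.

# Hypothetical frontend usage sketch: raw text -> sequence of symbol IDs.
# Module path and function names are assumptions, not verified API.
from deepvoice3_pytorch.frontend import en as frontend

text = "Hello world, this is a synthesized sentence."
sequence = frontend.text_to_sequence(text)

print("Number of symbols:", len(sequence))
print("Vocabulary size:", frontend.n_vocab)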

Samples

Pretrained models

NOTE: the pretrained models are not compatible with master. To be updated soon.

| URL | Model | Data | Hyper parameters | Git commit | Steps |
| --- | ----- | ---- | ---------------- | ---------- | ----- |
|     | DeepVoice3 | LJSpeech |  |  | 640k |
|     | Nyanko | LJSpeech | builder=nyanko,preset=nyanko_ljspeech |  | 585k |
|     | Multi-speaker DeepVoice3 | VCTK | builder=deepvoice3_multispeaker,preset=deepvoice3_vctk |  | 300k + 300k |

To use a pre-trained model, it is highly recommended that you check out the specific git commit noted above, i.e.,

git checkout ${commit_hash}

Then follow the "Synthesize from a checkpoint" section in the README at that git commit. Please note that the latest development version of the repository may not work with the pretrained checkpoints.

For example, you could try:

# pretrained model (20180505_deepvoice3_checkpoint_step000640000.pth)
# hparams (20180505_deepvoice3_ljspeech.json)
git checkout 4357976
python synthesis.py --preset=20180505_deepvoice3_ljspeech.json \
    20180505_deepvoice3_checkpoint_step000640000.pth \
    sentences.txt \
    output_dir

Notes on hyper parameters

Default hyper parameters, used during the preprocessing, training and synthesis stages, are tuned for English TTS on the LJSpeech dataset. You will have to change some of the parameters if you want to try other datasets. See hparams.py for details.
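
A minimal sketch of a preset file follows, assuming the flat key-to-value layout used by hparams.py. The keys shown (builder, sample_rate, num_mels) appear among the default hyper parameters, but the file name and values are illustrative only, not a tested configuration.

# Hypothetical helper that writes a preset JSON of hyper parameter overrides.
# Every key must match a name defined in hparams.py.
import json

overrides = {
    "builder": "deepvoice3",  # model architecture to build
    "sample_rate": 22050,     # sample rate of the target dataset's audio
    "num_mels": 80,           # number of mel-spectrogram channels
}

with open("my_preset.json", "w") as f:
    json.dump(overrides, f, indent=2)

The resulting file can then be passed with --preset=my_preset.json, as in the synthesis example above; whether the preprocessing and training scripts accept the same flag should be checked against their usage strings.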

builder specifies which model you want to use. deepvoice3, deepvoice3_multispeaker [1] and nyanko [2] are supported.
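
For example, builder can be overridden on the command line together with a preset, as listed in the "Hyper parameters" column of the table above. The sketch below is a hedged training invocation; the --hparams override string comes from that table, while --data-root and the data path are assumptions that should be checked against `python train.py --help`.

# Hypothetical training command selecting the Nyanko architecture.
python train.py --data-root=./data/ljspeech \
    --hparams="builder=nyanko,preset=nyanko_ljspeech"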

Hyper parameters described in the Deep Voice 3 paper for the single-speaker model didn't work for LJSpeech, so the defaults in hparams.py deviate from the paper.
