TensorFlow2.0 Study Notes 8: Preparing the TensorFlowTTS Environment

shaynerain

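Preface

An earlier note showed how to use the GPU build of TensorFlow directly through Docker for training and similar work. Building on that setup, this note covers how to prepare TensorFlowTTS from the TensorSpeech project.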

Tutorial on using the GPU through Docker:

TensorFlow2.0 Study Notes 4: Using the GPU Version Directly with Docker (shaynerain's blog, CSDN)

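Project repository

GitHub - TensorSpeech/TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)

One thing to say up front: the project recommends the TensorFlow 2.6 GPU build. Since the official Docker image already ships the latest GPU build ready to use, I was reluctant to go back to an older version. After several attempts I also found the 2.6 GPU build very hard to deploy, so the rest of this note records my attempts to get the project working on the latest version instead.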

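Deployment

For how to set up the mounted directory, see TensorFlow2.0 Study Notes 7: Mounting a Directory Directly in Docker (shaynerain's blog, CSDN).

1 Clone the code into that directory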

git clone https://github.com/TensorSpeech/TensorFlowTTS.git

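2 Modify the setup.py file in the code

Because I want to use the latest TensorFlow, I removed the original version requirement at the top of the list. Only the part of the file where the change was made is shown here: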

requirements = {
    "install": [
        "tensorflow>=2.7.0",
        "tensorflow-addons>=0.10.0",
        "setuptools>=38.5.1",
        "huggingface_hub>=0.0.8",
        "librosa>=0.7.0",
        "soundfile>=0.10.2",
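To make the effect of the relaxed "tensorflow>=2.7.0" entry concrete, here is a minimal sketch of how a lower-bound check behaves. The helper names parse_release and satisfies_minimum are hypothetical and not part of setup.py or pip; the comparison is a simplification that ignores pre-release suffixes, so it only approximates what pip really does.

```python
import re


def parse_release(version: str) -> tuple:
    """Extract the leading numeric part, e.g. '2.13.0rc1' -> (2, 13, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release found in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies_minimum(version: str, minimum: str = "2.7.0") -> bool:
    """Return True if `version` is at least `minimum` (simplified comparison)."""
    have, need = parse_release(version), parse_release(minimum)
    width = max(len(have), len(need))
    # Pad the shorter tuple with zeros so that '2.7' compares equal to '2.7.0'.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(satisfies_minimum("2.13.0"))  # True: a newer build is accepted
print(satisfies_minimum("2.6.0"))   # False: older than the lower bound
```

With a lower bound instead of a fixed version, whatever recent TensorFlow the container already provides passes the check, so pip has no reason to replace it.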

3 Set the pip mirror, then enter the cloned directory and install

This step must be done inside the container.

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install .
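After the install finishes, it can help to confirm what actually ended up in the container. The following is a small sketch using only the standard library; the distribution names "tensorflow" and "TensorFlowTTS" are assumptions about how the packages are registered in your environment, and installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; adjust them if your image registers them differently.
for name in ("tensorflow", "TensorFlowTTS"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If TensorFlowTTS shows up as not installed, the pip install step was probably run outside the cloned directory or outside the container.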

4 Run the project's example

import numpy as np
import soundfile as sf
import yaml

import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# initialize fastspeech2 model.
fastspeech2 = TFAutoModel.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")


# initialize mb_melgan model
mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-ljspeech-en")


# inference
processor = AutoProcessor.from_pretrained("tensorspeech/tts-fastspeech2-ljspeech-en")

input_ids = processor.text_to_sequence("Recent research at Harvard has shown meditating for as little as 8 weeks, can actually increase the grey matter in the parts of the brain responsible for emotional regulation, and learning.")
# fastspeech inference

mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)

# melgan inference
audio_before = mb_melgan.inference(mel_before)[0, :, 0]
audio_after = mb_melgan.inference(mel_after)[0, :, 0]

# save to file
sf.write('./audio_before.wav', audio_before, 22050, "PCM_16")
sf.write('./audio_after.wav', audio_after, 22050, "PCM_16")

Fixing the errors

1 NLTK error

LookupError:
**********************************************************************
Resource averaged_perceptron_tagger not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('averaged_perceptron_tagger')

For more information see: https://www.nltk.org/data.html

.....

ParseError: unclosed token: line 302, column 6

Run what the message suggests:

import nltk
nltk.download('averaged_perceptron_tagger')

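In my case this still did not work: the download itself failed with the XML parse error shown above. So I decided to fetch the missing data directly.

The missing data is available on GitHub: GitHub - nltk/nltk_data: NLTK Data

Clone it under /root/, the location where NLTK looks for its data: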

git clone https://github.com/nltk/nltk_data.git

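If the clone fails, you can download the repository directly from its GitHub page instead.

Then copy everything under its packages directory into /root/nltk_data.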
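The copy step can also be scripted. This is a minimal sketch, assuming Python 3.8 or newer (for dirs_exist_ok) and that the repository was cloned to /root/nltk_data as described; copy_packages is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_packages(repo_dir: str, target_dir: str) -> list:
    """Copy everything under <repo_dir>/packages into <target_dir>.

    Returns the names of the top-level entries that were copied.
    """
    source = Path(repo_dir) / "packages"
    if not source.is_dir():
        raise FileNotFoundError(f"{source} does not exist")
    # dirs_exist_ok lets the copy merge into a directory that already exists.
    shutil.copytree(source, target_dir, dirs_exist_ok=True)
    return sorted(entry.name for entry in source.iterdir())


# Example call for the layout described in this note (run inside the container):
# copy_packages("/root/nltk_data", "/root/nltk_data")
```

Because the clone and the target are the same directory here, the call simply lifts the contents of packages up one level, which is the layout NLTK searches.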

Running the example again now succeeds.
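If the lookup still fails, a quick check of whether the tagger data sits where it is expected can narrow things down. This is only a sketch: tagger_present is a hypothetical helper, and it assumes the resource is stored under a taggers subdirectory either unpacked or as a .zip archive.

```python
from pathlib import Path


def tagger_present(data_dir: str = "/root/nltk_data") -> bool:
    """Check for the averaged_perceptron_tagger resource under <data_dir>/taggers."""
    base = Path(data_dir) / "taggers"
    unpacked = base / "averaged_perceptron_tagger"
    zipped = base / "averaged_perceptron_tagger.zip"
    return unpacked.is_dir() or zipped.is_file()


print(tagger_present())
```

A result of False after the copy step usually means the files landed one level too deep, for example still inside the packages directory.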

After that, simply run the script and it generates the wav files.
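To confirm the generated files look right, they can be inspected with Python's standard wave module. This is a minimal sketch; wav_info is a hypothetical helper, and the expected values (22050 Hz, 16-bit) come from the sf.write calls in the example script.

```python
import wave


def wav_info(path: str) -> dict:
    """Read basic header information from a PCM wav file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bits": wav.getsampwidth() * 8,
            "seconds": frames / rate,
        }


# Example call after the script has run:
# print(wav_info("./audio_after.wav"))
```

For the example above, sample_rate should be 22050 and bits should be 16; a duration of a few seconds indicates the whole sentence was synthesized.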

from: https://blog.csdn.net/shaynerain/article/details/133689181