Tacotron2 (NVIDIA version), fixing the pause problem: Biao-Bei data with PhonePrssCrystal

Not yet complete!!!

Preprocessing the data

Bash scripts

'\r' (CRLF) errors

sed -i 's/\r$//' filename
For a whole folder:
for i in *; do if [[ -f "$i" ]]; then sed -i 's/\r$//' "$i"; fi; done
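The same CRLF stripping can also be sketched in Python (a hypothetical helper, not part of the repo):

```python
# Strip Windows-style CRLF line endings, mirroring `sed -i 's/\r$//'`.
# Hypothetical helper; works on bytes to avoid newline translation on read.

def strip_crlf(data: bytes) -> bytes:
    """Replace CRLF (b"\r\n") with LF (b"\n")."""
    return data.replace(b"\r\n", b"\n")

def fix_file(path: str) -> None:
    """Rewrite a file in place with CRLF endings converted to LF."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(strip_crlf(data))
```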

LJS-style preprocessing

Download, unpack, resample, then call the Python script to build the XXX|abc metadata file.
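A minimal sketch of what the metadata-building step might look like, assuming each ProsodyLabeling line pairs a file ID with its prosody-annotated text (the label format and helper names here are assumptions, not the repo's actual script):

```python
# Hypothetical sketch: turn Biao-Bei prosody labels into "id|text" metadata lines.
# Assumes a label line looks like "000001\t卡尔普#2陪外孙#1玩滑梯#4" (an assumption).

def label_to_metadata(line: str) -> str:
    """Split an ID/text label line and emit an 'id|text' metadata entry."""
    file_id, text = line.strip().split(maxsplit=1)
    return f"{file_id}|{text}"

def build_metadata(label_lines, out_path):
    """Write one metadata entry per label line, UTF-8, LF-terminated."""
    with open(out_path, "w", encoding="utf-8") as out:
        for line in label_lines:
            out.write(label_to_metadata(line) + "\n")
```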

#!/usr/bin/env bash

set -e

DATADIR="Biao-Bei"
RARARCHIVE="BZNSYP.rar"
ENDPOINT="https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar"

if [ ! -d "$DATADIR" ]; then
  echo "dataset is missing, solving ..."
  if [ ! -f "$RARARCHIVE" ]; then
    echo "no source!! ..."
    wget "$ENDPOINT"
  fi
  if [ ! -d "Wave" ]; then
    echo "dataset is missing, unpacking ..."
    rar x "$RARARCHIVE"
  fi
  mkdir -p "$DATADIR"
  mv  "Wave" "$DATADIR"
  mv  "ProsodyLabeling" "$DATADIR"
  mv  "PhoneLabeling" "$DATADIR"
  cd "$DATADIR/Wave"
  pwd
  for x in ./*.wav
  do
    b=${x##*/}
    sox "$b" -r 22050 "tmp_$b"
    rm -f "$b"
    mv "tmp_$b" "$b"
  done
  cd ..
  cd ..
  python scripts/__Biao-Bei_prepare_dataset.py
fi

Extracting mels

Modify the code

Start Docker

NV_GPU='7' nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --ipc=host -v $PWD:/workspace/tacotron2/ tacotron2 bash
sed -i 's/\r$//' filename
bash

Or:

CUDA_VISIBLE_DEVICES="0,1"  bash scripts/docker/interactive.sh

Training

Train on just one sample:
bash train

Synthesis

Model 1

Borrowing the WaveGlow trained on Biao-Bei pinyin (multi-GPU, as far as I remember, and not very good); no warmup.

python inference.py --tacotron2 oneCore_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow Biao-Bei_output/checkpoint_WaveGlow_650 -o Biao-Bei_output/ -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run
Model 2

Borrowing the single-GPU WaveGlow trained on Biao-Bei pinyin; it hit loss = NaN at step 450 and stopped there; no warmup.

python inference.py --tacotron2 oneCore_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow oneCore_Biao-Bei_output/checkpoint_WaveGlow_450 -o oneCore_Biao-Bei_output/ -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run
Model 3

Borrowing the single-GPU WaveGlow trained on LJSpeech 1.1. The English and Chinese distributions may diverge, but it tests very well; since I don't yet know how to tune hyperparameters for Biao-Bei, no warmup.

python inference.py --tacotron2 oneCore_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow oneCore_output/checkpoint_WaveGlow_1000 -o oneCore_Biao-Bei_PhonePrssCrystal_output/  -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run
python inference.py --tacotron2 oneCore_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow oneCore_output/checkpoint_WaveGlow_1000 -o oneCore_output/ -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run
Model 4

For synthesizing English:

python inference.py --tacotron2 output/checkpoint_Tacotron2_750 --waveglow oneCore_output/checkpoint_WaveGlow_1000 -o output/  -i phrases/phrase.txt --amp-run

Borrowing the LJSpeech WaveGlow; no warmup.

python inference.py --tacotron2 oneCore_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow oneCore_output/checkpoint_WaveGlow_1000 -o oneCore_Biao-Bei_PhonePrssCrystal_output/  -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run

Mels, train.sh, and the Docker commands: to write up tomorrow.
NV_GPU='0,1,2,3' nvidia-docker run --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --ipc=host -v $PWD:/workspace/tacotron2/ tacotron2 bash

mkdir -p Biao-Bei_output
python -m multiproc train.py -m WaveGlow -o ./Biao-Bei_output/ -lr 1e-4 --epochs 1001 -bs 10 --segment-length  8000 --weight-decay 0 --grad-clip-thresh 65504.0 --cudnn-enabled --cudnn-benchmark --log-file ./Biao-Bei_output/waveGlow_nvlog.json --amp-run

Key point 1: whether to use the on-disk mels, and the fact that the test set was never fed in.
Key point 2: whether it is distributed. How fast is it?
Key point 3: how to resume training from a checkpoint.
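For key point 3, resuming usually means pointing the trainer at the newest checkpoint. A stdlib-only sketch for picking the latest one by its step suffix (the file-name pattern matches the checkpoints used throughout these notes; the helper itself is hypothetical):

```python
import re

# Hypothetical helper: among names like "checkpoint_Tacotron2_1500",
# pick the one with the largest trailing step number.

def latest_checkpoint(names, model="Tacotron2"):
    pattern = re.compile(rf"checkpoint_{model}_(\d+)$")
    best, best_step = None, -1
    for name in names:
        m = pattern.search(name)
        if m and int(m.group(1)) > best_step:
            best, best_step = name, int(m.group(1))
    return best
```

The chosen path can then be passed to the training script's resume flag.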

python inference.py --tacotron2 Biao-Bei_output/checkpoint_Tacotron2_650 --waveglow Biao-Bei_output/checkpoint_WaveGlow_250 -o Biao-Bei_output/ --include-warmup -i phrases/Biao-Bei_phrase.txt --amp-run

python inference.py --tacotron2 Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_250 --waveglow Biao-Bei_output/checkpoint_WaveGlow_250 -o Biao-Bei_PhonePrssCrystal_output/  -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run

CUDA_VISIBLE_DEVICES="0,1"

path1 = os.path.abspath('.')

First experiment failed; corrections

Diagnose the problems first

Problem 1

Used the wrong cleaner, which expanded 1 => one and garbled the text.

Problem 2

The current symbol string wastes some length; the design needs refinement.

Problem 3

The code logic differs slightly between train and infer; both paths need to be walked through.

Suspicion: the sequence differs between train and infer

Inspect the training sequence values using a subset alone:
At training time:
[8, 11, 81, 47, 70, 14, 47, 66, 16, 81, 47, 43, 0, 53, 47, 66, 24, 53, 47, 5, 37, 47, 21, 58, 4, 47, 43, 0, 53, 47, 22, 53, 47, 1, 69, 53, 47, 84, 78, 4, 47, 43, 0, 53, 47, 32, 47, 0]
At inference time:
[ 8, 11, 81, 47, 70, 14, 47, 66, 16, 81, 47, 43, 0, 53, 47, 66, 24, 53, 47, 5, 37, 47, 21, 58, 4, 47, 43, 0, 53, 47, 22, 53, 47, 1, 69, 53, 47, 84, 78, 4, 47, 43, 0, 53, 47, 32, 47, 0]
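To check the suspicion mechanically, the two sequences can be diffed element by element (a throwaway helper, not repo code; only prefixes of the logged sequences are shown):

```python
# Throwaway helper: report the first index where two ID sequences differ.

def first_mismatch(a, b):
    """Return (index, a_val, b_val) of the first difference, or None if equal."""
    if len(a) != len(b):
        return (min(len(a), len(b)), None, None)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return (i, x, y)
    return None

# Prefixes of the train/infer sequences logged above: they match,
# so the text front end is not the culprit here.
train_seq = [8, 11, 81, 47, 70, 14, 47, 66, 16, 81, 47, 43, 0, 53, 47]
infer_seq = [8, 11, 81, 47, 70, 14, 47, 66, 16, 81, 47, 43, 0, 53, 47]
```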

Suspicion: the mel spectrogram and WaveGlow

PyTorch exp_dims:

        # const_mel = torch.load('Biao-Bei/mels/000001.pt').unsqueeze(0)
        # print(const_mel.shape)
        # print(mel)
        # print(const_mel)
        # print(const_mel.dtype)
        # mel = const_mel.type_as(mel)
        # print(mel.dtype)

WaveGlow is correct.

Suspicion: preprocessing strips the phoneme meaning

Experiment 1: subset_1, the 16 sentences starting with 卡尔普

bash train:

bash scripts/Biao-Bei_PhonePrssCrystal_train_subset_1_tacotron2.sh

bash infer:

python inference.py --tacotron2 subset_1_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow Biao-Bei_output/checkpoint_WaveGlow_650 -o subset_1_Biao-Bei_PhonePrssCrystal_output/  -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run

WaveGlow train:

bash scripts/Biao-Bei_PhonePrssCrystal_train_subset_1_waveglow.sh
Observations

The inferred mel sequence is wrong.

The ground-truth mel sequence is:
tensor([[[ -8.8281, -7.6094, -6.8555, …, -7.5352, -7.8359, -8.2031],
[ -9.6719, -8.8047, -7.8008, …, -7.4727, -7.7617, -8.4141],
[ -8.5625, -8.7031, -8.4688, …, -8.5625, -8.2500, -8.0781],
…,
[-10.5312, -10.5000, -10.5703, …, -11.1094, -11.1328, -11.1328],
[-10.1250, -10.1953, -10.3594, …, -10.4062, -10.7031, -10.9609],
[-10.0938, -9.9688, -9.6719, …, -10.6484, -10.8125, -10.7734]]],
device='cuda:0', dtype=torch.float16)

The inferred mel values change slightly on every run:
tensor([[[ -7.2305, -7.2383, -7.0508, …, -6.5039, -6.9297, -6.0430],
[ -7.6562, -7.6719, -7.5039, …, -7.1367, -7.2695, -6.4180],
[ -7.6523, -7.7500, -7.7578, …, -8.1016, -8.2031, -7.3633],
…,
[-10.2891, -10.0938, -10.1562, …, -10.3047, -10.1953, -9.2422],
[-10.2656, -10.2891, -10.3359, …, -10.2969, -10.8047, -9.4297],
[-10.3516, -10.1016, -10.1719, …, -10.3594, -10.3281, -9.3828]]],
device='cuda:0', dtype=torch.float16)

tensor([[[ -7.2305, -7.3516, -7.2188, …, -6.2305, -6.6719, -5.6406],
[ -7.6484, -7.6914, -7.5547, …, -7.0117, -7.1562, -6.1367],
[ -7.6484, -7.7383, -7.7344, …, -8.0625, -8.0938, -7.0703],
…,
[-10.2734, -10.0625, -10.1719, …, -10.2656, -10.1641, -9.0469],
[-10.2500, -10.2500, -10.3594, …, -10.2031, -10.6875, -9.2031],
[-10.3359, -10.0312, -10.1406, …, -10.3516, -10.2344, -9.1562]]],
device='cuda:0', dtype=torch.float16)

tensor([[[ -7.1562, -7.1992, -7.2227, …, -6.3516, -6.8867, -6.0273],
[ -7.6133, -7.6172, -7.5820, …, -7.0508, -7.3125, -6.5117],
[ -7.6328, -7.6250, -7.7891, …, -8.1328, -8.3359, -7.5039],
…,
[-10.2891, -10.1719, -10.0938, …, -10.3750, -10.4141, -9.5234],
[-10.2656, -10.2734, -10.1953, …, -10.2812, -10.8984, -9.6094],
[-10.3516, -10.1484, -10.0703, …, -10.3438, -10.4062, -9.5312]]],
device='cuda:0', dtype=torch.float16)

tensor([[[ -7.2070, -7.3867, -7.3242, …, -6.7734, -7.2500, -6.5430],
[ -7.6406, -7.7109, -7.5938, …, -7.3164, -7.6133, -7.0000],
[ -7.6602, -7.7617, -7.7812, …, -8.1250, -8.4219, -7.9062],
…,
[-10.2812, -10.0312, -9.9297, …, -10.6719, -10.6953, -9.8047],
[-10.1953, -10.0547, -10.0234, …, -10.5312, -11.1797, -9.8594],
[-10.3281, -9.9922, -9.8594, …, -10.6016, -10.6484, -9.7656]]],
device='cuda:0', dtype=torch.float16)

tensor([[[ -7.1250, -7.1680, -7.0977, …, -6.6133, -7.2578, -6.3672],
[ -7.5859, -7.5977, -7.4844, …, -7.2695, -7.5938, -6.8516],
[ -7.6055, -7.6211, -7.6680, …, -8.3203, -8.5781, -7.8672],
…,
[-10.2969, -10.2891, -10.1328, …, -10.5547, -10.6875, -9.8672],
[-10.3125, -10.4375, -10.2031, …, -10.4531, -11.2031, -9.9688],
[-10.3672, -10.2656, -10.1250, …, -10.5312, -10.6328, -9.8828]]],
device='cuda:0', dtype=torch.float16)
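Run-to-run variation like this is expected: Tacotron2 keeps prenet dropout active at inference time, so the decoded mels are stochastic. To see how big the drift actually is, two runs can be compared numerically (stdlib sketch; the sample rows below are just the first corner values printed above):

```python
# Quantify run-to-run drift between two decoded mel rows (stdlib only).

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length rows."""
    return max(abs(x - y) for x, y in zip(a, b))

# First few values of the first row from two of the inference runs above.
run1 = [-7.2305, -7.2383, -7.0508]
run2 = [-7.2305, -7.3516, -7.2188]
# The differences are a few tenths in log-mel units: dropout jitter,
# not a sign or scale error.
```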

Checkpoint restore

https://pytorch-cn.readthedocs.io/zh/latest/package_references/torch-nn/
Error: Missing key(s) in state_dict: "module.… (the "module." prefix added by DataParallel)
https://discuss.pytorch.org/t/missing-keys-unexpected-keys-in-state-dict-when-loading-self-trained-model/22379/2
Inference does not use multiple GPUs by default; the code flatly refuses to consider it.
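The usual fix for the "module." mismatch, per the linked thread, is renaming the keys before load_state_dict. A sketch on plain dicts (the helper name is mine):

```python
# Standard workaround: DataParallel saves parameters as "module.<name>",
# so strip that prefix before loading into a single-GPU model.

def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with the DataParallel prefix removed."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Usage (assuming a torch checkpoint dict):
#   model.load_state_dict(strip_module_prefix(checkpoint["state_dict"]))
```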
x.size()
x, y, num_items = batch_to_gpu(batch)
The x obtained is a tuple:

def batch_to_gpu(batch):
    text_padded, input_lengths, mel_padded, gate_padded, \
        output_lengths, len_x = batch
    text_padded = to_gpu(text_padded).long()
    input_lengths = to_gpu(input_lengths).long()
    max_len = torch.max(input_lengths.data).item()
    mel_padded = to_gpu(mel_padded).float()
    gate_padded = to_gpu(gate_padded).float()
    output_lengths = to_gpu(output_lengths).long()
    x = (text_padded, input_lengths, mel_padded, max_len, output_lengths)
    y = (mel_padded, gate_padded)
    len_x = torch.sum(output_lengths)
    return (x, y, len_x)
Compare against the training-time mel

Experiment 2: read the code from start to finish and test boldly

Fix save_sample

Experiment 3: recombine the phoneme string

Found that the code does not randomize the sample order
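If random ordering is wanted, a seeded per-epoch shuffle of the sample indices is the simplest addition (a hypothetical sketch, not the repo's loader):

```python
import random

# Hypothetical sketch: shuffle sample indices once per epoch, reproducibly.

def epoch_order(num_samples, epoch, seed=1234):
    """Return a permutation of range(num_samples), deterministic per (seed, epoch)."""
    order = list(range(num_samples))
    random.Random(seed + epoch).shuffle(order)
    return order
```

Seeding with seed + epoch keeps runs reproducible while still varying the order across epochs.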

It inexplicably works again; leaving it at that.

Refined training at low loss (next topic)

Found the PhonePrssCrystal loss is large

Later found it is not actually large: 0.3 is even smaller than pinyin's 0.31, yet the pronunciation is still sometimes wrong.
There is one very special entry among the symbols.
"p_attention_dropout": 0.1, but the code has 0.5; it can be ignored, but how is the JSON generated?
"p_decoder_dropout": 0.1,
There is an attention RNN; what about a decoder RNN? Answer: yes, there is. But does Tacotron2 share a single structure between them?

Continue training

Problem words: 头发, 搂腰.
At the endings.

Full restore

Model parameters

Done, tested.

Learning rate

Done, tested.

Number of epochs

Done.

OPT

Not figured out yet.

Command to run (the better version)

python inference.py --tacotron2 Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_750 --waveglow Biao-Bei_output/checkpoint_WaveGlow_650 -o subset_1_Biao-Bei_PhonePrssCrystal_output/  -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run

Continue from subset_1

Biao-Bei_PhonePrssCrystal
https://zhuanlan.zhihu.com/p/79887894

Run the LJS WaveGlow

python inference.py --tacotron2 subset_1_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow output/checkpoint_WaveGlow_300 -o subset_1_Biao-Bei_PhonePrssCrystal_output/  -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run

The best version from Biao-Bei itself:

python inference.py --tacotron2 subset_1_Biao-Bei_PhonePrssCrystal_output/checkpoint_Tacotron2_1500 --waveglow subset_1_Biao-Bei_PhonePrssCrystal_output/checkpoint_WaveGlow_650 -o subset_1_Biao-Bei_PhonePrssCrystal_output/  -i phrases/Biao-Bei_PhonePrssCrystal_phrase.txt --amp-run

Chinese text-to-speech script

Biao-Bei_demo_2_hanzi.sh

Miscellaneous

Chinese to pinyin

https://github.com/mozillazg/python-pinyin
https://pypinyin.readthedocs.io/zh_CN/master/

tf.py_func and shape

import numpy as np
import tensorflow as tf  # TF 1.x API

def my_func(x):
    return np.sinh(x).astype('float32')

inp = tf.convert_to_tensor(np.arange(5))
y = tf.py_func(my_func, [inp], tf.float32, False)

with tf.Session() as sess:
    with sess.as_default():
        print(inp.shape)
        print(inp.eval())
        print(y.shape)   # unknown: py_func cannot infer the output shape
        print(y.eval())
y.set_shape(inp.get_shape())  # declare the shape manually