Fine-tuning PaddleSpeech on the demo dataset

Latest recommended article published on 2024-06-12 21:30:30

wengad

Article link: https://blog.csdn.net/wengad/article/details/138238726

Fine-tuning PaddleSpeech on the demo dataset and the "This dataset has no examples" error

Background
Installation
Running fine-tuning
FAQ
- Problem
- Solution

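Background

On CentOS 7 (CentOS Linux release 7.6.1810 (Core)), I git-cloned the paddlespeech project, checked out r1.4.1, set up the environment for Chinese fine-tuning, and ran the fine-tuning.

This post covers the "This dataset has no examples" error I ran into while fine-tuning PaddleSpeech, and how I resolved it.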

Installation

conda environment

# create the conda environment (named "speech") and activate it
conda create -n speech python=3.10
conda activate speech

# install packages
pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple/ --trusted-host mirror.baidu.com

requirements.txt

numpy==1.23.5
paddlespeech_ctcdecoders
paddlepaddle==2.4.2
pytest-runner
paddlespeech
ipykernel
transformers
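
Before moving on, it can be worth confirming what pip actually installed, since requirements.txt pins numpy to 1.23.5 and paddlepaddle to 2.4.2. The following is a minimal sketch rather than part of the original setup. It uses only the standard-library `importlib.metadata`, and the helper name `installed_versions` is made up for illustration.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as listed in requirements.txt.
    wanted = ["numpy", "paddlepaddle", "paddlespeech", "paddlespeech_ctcdecoders"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated speech environment; a version that differs from the pin is a hint that another package pulled in a newer release.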

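Fine-tuning environment

Set up the environment by following paddlespeech/examples/other/tts_finetune/tts3/README.md in the cloned project. In detail:
Note: all of the steps below are run from paddlespeech/examples/other/tts_finetune/tts3/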

Download the pretrained models (the ones below are for Chinese synthesis)

mkdir -p pretrained_models && cd pretrained_models

Pretrained fastspeech2 model

wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_aishell3_ckpt_1.1.0.zip

unzip fastspeech2_aishell3_ckpt_1.1.0.zip

Pretrained hifigan model

wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_aishell3_ckpt_0.2.0.zip
unzip hifigan_aishell3_ckpt_0.2.0.zip
cd ../

Prepare the data (placed under input here; this uses a 200-utterance subset of CSMSC)

After downloading and unzipping, there are 200 wav files and a labels.txt file.
The label file format is: utt_id|pronunciation
For example:
000001|ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1

mkdir -p input && cd input

wget https://paddlespeech.bj.bcebos.com/datasets/csmsc_mini.zip

unzip csmsc_mini.zip

cd ../
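
To make the label format concrete, here is a small sketch of a parser for `utt_id|pronunciation` lines. It is an illustration, not the code PaddleSpeech uses; the function name `parse_labels` is hypothetical.

```python
def parse_labels(text):
    """Parse 'utt_id|pronunciation' lines into {utt_id: [syllable, ...]}."""
    labels = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        utt_id, sep, pronunciation = line.partition("|")
        if not sep or not utt_id or not pronunciation.strip():
            raise ValueError(
                f"line {line_no}: expected 'utt_id|pronunciation', got {raw!r}")
        labels[utt_id] = pronunciation.split()
    return labels


# The sample line from the text above.
sample = "000001|ka2 er2 pu3 pei2 wai4 sun1 wan2 hua2 ti1\n"
for utt_id, syllables in parse_labels(sample).items():
    print(utt_id, len(syllables), syllables)
```

Feeding the whole labels.txt through a parser like this is a cheap way to spot malformed lines before alignment starts.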

Download MFA (a speech forced-alignment tool) and its model

Tool

mkdir -p tools && cd tools

wget https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/releases/download/v1.0.1/montreal-forced-aligner_linux.tar.gz

tar xvf montreal-forced-aligner_linux.tar.gz

cp montreal-forced-aligner/lib/libpython3.6m.so.1.0 montreal-forced-aligner/lib/libpython3.6m.so

Model

mkdir -p aligner && cd aligner

wget https://paddlespeech.bj.bcebos.com/MFA/ernie_sat/aishell3_model.zip

unzip aishell3_model.zip

wget https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/simple.lexicon

cd ../../
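
The run log further down starts with a "check oov" step, i.e. a check for label syllables that the lexicon does not know. The sketch below shows roughly what such a check amounts to. It is not the project's own script: it assumes the lexicon is plain text whose first whitespace-separated token on each line is the entry, and the toy lexicon lines are invented for the example.

```python
def lexicon_entries(lexicon_text):
    """Collect the first token of every non-empty lexicon line."""
    entries = set()
    for line in lexicon_text.splitlines():
        parts = line.split()
        if parts:
            entries.add(parts[0])
    return entries


def find_oov(labels_text, lexicon_text):
    """Return {utt_id: [syllables missing from the lexicon]} for affected lines."""
    known = lexicon_entries(lexicon_text)
    oov = {}
    for line in labels_text.splitlines():
        utt_id, sep, pronunciation = line.strip().partition("|")
        if not sep:
            continue  # skip lines that are not 'utt_id|pronunciation'
        missing = [s for s in pronunciation.split() if s not in known]
        if missing:
            oov[utt_id] = missing
    return oov


# Toy data for illustration only; these lexicon lines are made up.
toy_lexicon = "ka2 k a2\ner2 er2\n"
toy_labels = "000001|ka2 er2 pu3\n"
print(find_oov(toy_labels, toy_lexicon))
```

With real data, the two arguments would be the contents of labels.txt and tools/aligner/simple.lexicon.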

Once all of the above is in place, the directory looks like this:
[Image: directory layout]
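
One way to confirm the layout matches the steps above is a small existence check run from the working directory. This is only a sketch: the tools/ entries follow directly from the commands above, while the names of the unzipped folders are assumptions (that each archive unpacks into a folder named after the zip), and `missing_paths` is a hypothetical helper.

```python
from pathlib import Path

# Paths relative to the working directory. The tools/ entries come straight
# from the commands above; the unzipped folder names are assumptions.
EXPECTED = [
    "pretrained_models/fastspeech2_aishell3_ckpt_1.1.0",  # assumed unzip target
    "pretrained_models/hifigan_aishell3_ckpt_0.2.0",      # assumed unzip target
    "input/csmsc_mini",                                   # assumed unzip target
    "tools/montreal-forced-aligner/lib/libpython3.6m.so",
    "tools/aligner/simple.lexicon",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    for rel in missing:
        print("missing:", rel)
    if not missing:
        print("all expected paths are present")
```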

Running fine-tuning

Adjust conf/finetune.yaml as needed

./run.sh

FAQ

Problem

Running run.sh as described above fails with the error "This dataset has no examples".

(speech) [datatech@join71 tts3]$ ./run.sh
check oov
get mfa result
align.py:60: YAMLLoadWarning: calling yaml.load() without Loader=… is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Setting up corpus information…
Number of speakers in corpus: 1, average number of utterances per speaker: 198.0
/home/datatech/proj/paddlespeech/examples/other/tts_finetune/tts3/tools/montreal-forced-aligner/lib/aligner/models.py:87: YAMLLoadWarning: calling yaml.load() without Loader=… is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
Creating dictionary information…
Using previous MFCCs
Number of speakers in corpus: 1, average number of utterances per speaker: 198.0
Done with setup.
100%|###########################################################################################################################| 2/2 [00:01<00:00, 1.05it/s]
Done! Everything took 2.914034843444824 seconds
generate durations.txt
extract feature
/home/datatech/anaconda3/envs/speech/lib/python3.10/site-packages/setuptools/sandbox.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
/home/datatech/anaconda3/envs/speech/lib/python3.10/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('mpl_toolkits').
Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/home/datatech/anaconda3/envs/speech/lib/python3.10/site-packages/pkg_resources/__init__.py:2871: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('google').
Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
/home/datatech/anaconda3/envs/speech/lib/python3.10/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: np.complex is a deprecated alias for the builtin complex. To silence this warning, use complex by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.complex128 here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
196 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 196/196 [00:00<00:00, 25790.86it/s]
Done
Traceback (most recent call last):
  File "/home/datatech/proj/paddlespeech/examples/other/tts_finetune/tts3/local/extract_feature.py", line 346, in <module>
    extract_feature(
  File "/home/datatech/proj/paddlespeech/examples/other/tts_finetune/tts3/local/extract_feature.py", line 266, in extract_feature
    normalize(speech_scaler, pitch_scaler, energy_scaler, vocab_phones,
  File "/home/datatech/proj/paddlespeech/examples/other/tts_finetune/tts3/local/extract_feature.py", line 155, in normalize
    dataset = DataTable(
  File "/home/datatech/proj/paddlespeech/paddlespeech/t2s/datasets/data_table.py", line 45, in __init__
    assert len(data) > 0, "This dataset has no examples"
AssertionError: This dataset has no examples
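
The assertion itself only says that the list handed to DataTable was empty, so a first diagnostic step is to find out which intermediate file has no records. The sketch below counts records in JSON-lines files under an output directory. It rests on assumptions not confirmed by the log: that the feature-extraction step writes its metadata as one JSON object per line in `*.jsonl` files, and that the output directory is called `dump`. Both the directory name and the helper `count_jsonl_records` are hypothetical.

```python
import json
from pathlib import Path


def count_jsonl_records(dump_dir, pattern="*.jsonl"):
    """Return {relative path: number of non-empty JSON lines} under dump_dir."""
    dump_dir = Path(dump_dir)
    counts = {}
    for path in sorted(dump_dir.rglob(pattern)):
        n = 0
        with path.open(encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    json.loads(line)  # raises if a record is malformed
                    n += 1
        counts[path.relative_to(dump_dir).as_posix()] = n
    return counts


if __name__ == "__main__":
    # "dump" is a hypothetical directory name; point this at the real output dir.
    for name, n in count_jsonl_records("dump").items():
        print(f"{n:6d}  {name}" + ("   <-- empty" if n == 0 else ""))
```

A file reported with a count of 0 would be the split that trips `assert len(data) > 0`.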