Real-Time-Voice-Cloning Usage Tutorial

Real-time voice cloning in practice

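License: CC 4.0 BY-SA

Copyright belongs to the author. Please notify the author before reposting; otherwise legal liability will be pursued.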

Contents:

1 Setting up the environment
2 How to use Real-Time-Voice-Cloning
3 Common errors
- 3.1 Error 1: `OSError: PortAudio library not found`
4 Training on datasets in other languages

1 Setting up the environment

The versions I installed (in the virtual environment tf1):
tensorflow-gpu==1.15.0
torch==1.4.0
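
Before going further, it can help to confirm that the active environment really has these versions. The following is a minimal sketch, not part of the project: `PINNED`, `installed_version`, and `find_mismatches` are illustrative names, and the fallback import assumes the `importlib_metadata` backport is installed on older interpreters such as Python 3.6.

```python
# Sketch: compare installed package versions against the versions listed above.
# All names in this block are illustrative helpers, not part of Real-Time-Voice-Cloning.
try:
    from importlib import metadata  # available in Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is installed on older Pythons.
    import importlib_metadata as metadata

PINNED = {"tensorflow-gpu": "1.15.0", "torch": "1.4.0"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Prints one line per package that is missing or has a different version.
    for package, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{package}: wanted {wanted}, found {found}")
```

An empty result means both packages match the versions above; a `found` value of `None` means the package is not installed in the current environment.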

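2 How to use Real-Time-Voice-Cloning

2.1 Download the pretrained models

1. The official download link for the pretrained models.

2. Download the packaged pretrained models, pretrained.zip, and simply unzip the archive: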

unzip pretrained.zip

After extraction (run from the repository root), the files are placed at the following paths:

encoder\saved_models\pretrained.pt
synthesizer\saved_models\logs-pretrained\taco_pretrained\checkpoint
synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000.data-00000-of-00001
synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000.index
synthesizer\saved_models\logs-pretrained\taco_pretrained\tacotron_model.ckpt-278000.meta
vocoder\saved_models\pretrained\pretrained.pt
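
To check that the archive was unpacked in the right place, the paths above can be tested one by one. This is a small sketch under the assumption that it is given the repository root; `EXPECTED_FILES` and `missing_model_files` are hypothetical names, and the paths are written with forward slashes so that `pathlib` resolves them on both Linux and Windows.

```python
# Sketch: report which of the pretrained model files listed above are missing.
# `EXPECTED_FILES` and `missing_model_files` are illustrative names, not project code.
from pathlib import Path

TACO_DIR = "synthesizer/saved_models/logs-pretrained/taco_pretrained"

EXPECTED_FILES = [
    "encoder/saved_models/pretrained.pt",
    f"{TACO_DIR}/checkpoint",
    f"{TACO_DIR}/tacotron_model.ckpt-278000.data-00000-of-00001",
    f"{TACO_DIR}/tacotron_model.ckpt-278000.index",
    f"{TACO_DIR}/tacotron_model.ckpt-278000.meta",
    "vocoder/saved_models/pretrained/pretrained.pt",
]


def missing_model_files(repo_root):
    """Return the expected model files that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [relative for relative in EXPECTED_FILES if not (root / relative).is_file()]
```

Calling `missing_model_files(".")` from the repository root should return an empty list once pretrained.zip has been extracted there.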

2.2 Test the environment first (optional)

You can first test the setup with the following command:

python demo_cli.py

If the test finishes without errors, the environment is working. This step is optional, so you can skip it.

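2.3 Download a dataset (optional)

If you only plan to use the toolbox, downloading just LibriSpeech/train-clean-100 is recommended. Extract the contents as <datasets_root>/LibriSpeech/train-clean-100, where <datasets_root> is a directory of your choice. The toolbox supports other datasets as well (the demo_toolbox.py help text points to toolbox/__init__.py for the list). You are free not to download any dataset, but then you will need your own data as audio files, or you will have to record it with the toolbox.

LibriSpeech/train-clean-100: https://www.openslr.org/resources/12/train-clean-100.tar.gz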
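
The extraction step can be sketched in Python as follows. This assumes the archive has already been downloaded and that it unpacks into a top-level LibriSpeech/ directory, which is how I understand the OpenSLR archive to be laid out; `extract_librispeech` is a hypothetical helper name, not something the project provides.

```python
# Sketch: unpack a downloaded train-clean-100.tar.gz into <datasets_root>.
# Assumption: the archive contains a top-level "LibriSpeech/" directory.
# `extract_librispeech` is an illustrative name, not part of the project.
import tarfile
from pathlib import Path


def extract_librispeech(archive_path, datasets_root):
    """Extract the archive and return <datasets_root>/LibriSpeech/train-clean-100."""
    datasets_root = Path(datasets_root)
    datasets_root.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(datasets_root)

    # Fail loudly if the layout is not the one the toolbox expects.
    target = datasets_root / "LibriSpeech" / "train-clean-100"
    if not target.is_dir():
        raise FileNotFoundError(f"expected {target} after extraction")
    return target
```

The directory passed as `datasets_root` is the same one that is later given to the toolbox with `-d`.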

2.4 Run the toolbox

If you have downloaded a dataset, start the toolbox with the following command:

python demo_toolbox.py -d <datasets_root>

If you have not downloaded a dataset, run it directly:

python demo_toolbox.py
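
If you launch the toolbox from a script, the two cases above can be folded into one helper. This is only a sketch: `toolbox_command` is a hypothetical function name, and the only flag it uses is `-d`, taken from the commands above.

```python
# Sketch: build the toolbox command line for both cases described above.
# `toolbox_command` is an illustrative helper, not part of the project.
import sys


def toolbox_command(datasets_root=None):
    """Return the argument list for demo_toolbox.py, adding -d only when a dataset root is given."""
    command = [sys.executable, "demo_toolbox.py"]
    if datasets_root is not None:
        command += ["-d", str(datasets_root)]
    return command
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root, so the toolbox runs with the same Python interpreter as the calling script.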

3 Common errors

3.1 Error 1: `OSError: PortAudio library not found`

1. Error: running python demo_toolbox.py fails immediately with OSError: PortAudio library not found

(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$ python demo_toolbox.py -h
/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/encoder/audio.py:13: UserWarning: Unable to import 'webrtcvad'. This package enables noise removal and is recommended.
  warn("Unable to import 'webrtcvad'. This package enables noise removal and is recommended.")
Traceback (most recent call last):
  File "demo_toolbox.py", line 2, in <module>
    from toolbox import Toolbox
  File "/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/toolbox/__init__.py", line 1, in <module>
    from toolbox.ui import UI
  File "/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/toolbox/ui.py", line 10, in <module>
    import sounddevice as sd
  File "/home/shl/anaconda3/envs/tf1/lib/python3.6/site-packages/sounddevice.py", line 71, in <module>
    raise OSError('PortAudio library not found')
OSError: PortAudio library not found
(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$

2. Fix (reference):

sudo apt-get install libportaudio2

(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$ sudo apt-get install libportaudio2 
[sudo] password for shl:       
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  libportaudio2
0 upgraded, 1 newly installed, 0 to remove and 394 not upgraded.
Need to get 64.6 kB of archives.
After this operation, 215 kB of additional disk space will be used.
Get:1 http://mirrors.aliyun.com/ubuntu bionic/universe amd64 libportaudio2 amd64 19.6.0-1 [64.6 kB]
Fetched 64.6 kB in 0s (341 kB/s)
Selecting previously unselected package libportaudio2:amd64.
(Reading database ... 356909 files and directories currently installed.)
Preparing to unpack .../libportaudio2_19.6.0-1_amd64.deb ...
Unpacking libportaudio2:amd64 (19.6.0-1) ...
Setting up libportaudio2:amd64 (19.6.0-1) ...
Processing triggers for libc-bin (2.27-3ubuntu1.3) ...
/sbin/ldconfig.real: /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7 is not a symbolic link

(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$
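
To confirm that the library is now visible without starting the toolbox, you can ask Python's standard library to locate it. This is a minimal sketch: `portaudio_library` and `portaudio_hint` are hypothetical helper names, and I am assuming that a lookup through `ctypes.util.find_library` is close enough to what sounddevice does at import time to be a useful check.

```python
# Sketch: check whether the system loader can find a PortAudio shared library.
# `portaudio_library` and `portaudio_hint` are illustrative names, not project code.
import ctypes.util


def portaudio_library():
    """Return the PortAudio library name or path if it can be located, else None."""
    return ctypes.util.find_library("portaudio")


def portaudio_hint(found):
    """Turn the lookup result into a short human-readable message."""
    if found is None:
        return "PortAudio not found; on Debian/Ubuntu try: sudo apt-get install libportaudio2"
    return f"PortAudio found: {found}"


if __name__ == "__main__":
    print(portaudio_hint(portaudio_library()))
```

If the lookup still returns `None` after installing the package, the import of sounddevice is likely to fail in the same way as in the traceback above.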

3. Run the command again; the error is resolved:

(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$ python demo_toolbox.py -h
/home/shl/shl_res/1_project/Real-Time-Voice-Cloning/encoder/audio.py:13: UserWarning: Unable to import 'webrtcvad'. This package enables noise removal and is recommended.
  warn("Unable to import 'webrtcvad'. This package enables noise removal and is recommended.")
/home/shl/anaconda3/envs/tf1/lib/python3.6/site-packages/umap/__init__.py:9: UserWarning: Tensorflow not installed; ParametricUMAP will be unavailable
  warn("Tensorflow not installed; ParametricUMAP will be unavailable")
usage: demo_toolbox.py [-h] [-d DATASETS_ROOT] [-e ENC_MODELS_DIR]
                       [-s SYN_MODELS_DIR] [-v VOC_MODELS_DIR] [--low_mem]
                       [--seed SEED] [--no_mp3_support]

Runs the toolbox

optional arguments:
  -h, --help            show this help message and exit
  -d DATASETS_ROOT, --datasets_root DATASETS_ROOT
                        Path to the directory containing your datasets. See
                        toolbox/__init__.py for a list of supported datasets.
                        (default: None)
  -e ENC_MODELS_DIR, --enc_models_dir ENC_MODELS_DIR
                        Directory containing saved encoder models (default:
                        encoder/saved_models)
  -s SYN_MODELS_DIR, --syn_models_dir SYN_MODELS_DIR
                        Directory containing saved synthesizer models
                        (default: synthesizer/saved_models)
  -v VOC_MODELS_DIR, --voc_models_dir VOC_MODELS_DIR
                        Directory containing saved vocoder models (default:
                        vocoder/saved_models)
  --low_mem             If True, the memory used by the synthesizer will be
                        freed after each use. Adds large overhead but allows
                        to save some GPU memory for lower-end GPUs. (default:
                        False)
  --seed SEED           Optional random number seed value to make toolbox
                        deterministic. (default: None)
  --no_mp3_support      If True, no mp3 files are allowed. (default: False)
(tf1) shl@zhihui-mint:~/shl_res/1_project/Real-Time-Voice-Cloning$