Reproducing an open-source project | DreamTalk: making a static portrait talk and sing

Original article: https://blog.csdn.net/qq_34438629/article/details/135413622

DreamTalk

Project summary: make a static portrait talk and sing.

Project repository: https://github.com/ali-vilab/dreamtalk

I. Installation

conda create -n dreamtalk python=3.7.0
conda activate dreamtalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg

pip install urllib3==1.26.6
pip install transformers==4.28.1
pip install dlib
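
Because the commands above pin several versions, it can help to confirm what actually ended up in the environment before running inference. The following is a minimal sketch, not part of the DreamTalk repository: the helper names `module_version` and `find_mismatches` are hypothetical, and it assumes each package exposes a `__version__` attribute.

```python
import importlib
from typing import Callable, Dict, Optional

# Version pins copied from the install commands above.
EXPECTED = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "torchaudio": "0.8.0",
    "urllib3": "1.26.6",
    "transformers": "4.28.1",
}


def module_version(name: str) -> Optional[str]:
    """Import `name` and return its __version__, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Drop local build tags such as "+cu111" before comparing.
    return str(getattr(module, "__version__", "unknown")).split("+")[0]


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = module_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose version differs from its pin to the version found."""
    found = {name: lookup(name) for name in expected}
    return {name: ver for name, ver in found.items() if ver != expected[name]}


if __name__ == "__main__":
    for name, ver in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {ver}")
```

Run it inside the activated dreamtalk environment; no output means every pinned package matches.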

II. Download the checkpoints

1. Go to https://modelscope.cn/models/damo/dreamtalk/files
2. Open the checkpoints folder. It contains two files; click each one to open its detail page, then download it to your machine.

Put both files into the project's checkpoints folder.
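
A quick check that both files landed in the right place can save a failed run later. This is a sketch under assumptions: the post only says "two files", so the names in `ASSUMED_CHECKPOINTS` are my guess at what the download contains, and `missing_checkpoints` is a hypothetical helper. Adjust the names to whatever you actually downloaded.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed file names: the post does not name the two files, so edit as needed.
ASSUMED_CHECKPOINTS = ("denoising_network.pth", "renderer.pt")


def missing_checkpoints(folder, names: Iterable[str] = ASSUMED_CHECKPOINTS) -> List[str]:
    """Return the names from `names` that are absent or empty in `folder`."""
    root = Path(folder)
    missing = []
    for name in names:
        path = root / name
        # A zero-byte file usually means an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_checkpoints("checkpoints")
    print("all checkpoints present" if not missing else f"missing: {missing}")
```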

III. Run inference

Official example (English)

python inference_for_demo_video.py --wav_path data/audio/acknowledgement_english.m4a --style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat --image_path data/src_img/uncropped/male_face.png --cfg_scale 1.0 --max_gen_len 30 --output_name acknowledgement_english@M030_front_neutral_level1_001@male_face

Official example (Chinese)

python inference_for_demo_video.py --wav_path data/audio/acknowledgement_chinese.m4a --style_clip_path data/style_clip/3DMM/M030_front_surprised_level3_001.mat --pose_path data/pose/RichardShelby_front_neutral_level1_001.mat --image_path data/src_img/cropped/zp1.png --disable_img_crop --cfg_scale 1.0 --max_gen_len 30 --output_name acknowledgement_chinese@M030_front_surprised_level3_001@zp1
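
The two commands differ only in their argument values and in whether `--disable_img_crop` is passed, so when trying several inputs it may be easier to assemble the command in a script. The sketch below uses only the flags that appear in the commands above; `build_command` is a hypothetical helper, not something the repository provides.

```python
import shlex
import sys
from typing import List


def build_command(
    wav_path: str,
    style_clip_path: str,
    pose_path: str,
    image_path: str,
    output_name: str,
    cfg_scale: float = 1.0,
    max_gen_len: int = 30,
    disable_img_crop: bool = False,
) -> List[str]:
    """Build the argument list for inference_for_demo_video.py."""
    cmd = [
        sys.executable, "inference_for_demo_video.py",
        "--wav_path", wav_path,
        "--style_clip_path", style_clip_path,
        "--pose_path", pose_path,
        "--image_path", image_path,
        "--cfg_scale", str(cfg_scale),
        "--max_gen_len", str(max_gen_len),
        "--output_name", output_name,
    ]
    if disable_img_crop:
        # Used in the example above whose source image is already cropped.
        cmd.append("--disable_img_crop")
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        wav_path="data/audio/acknowledgement_chinese.m4a",
        style_clip_path="data/style_clip/3DMM/M030_front_surprised_level3_001.mat",
        pose_path="data/pose/RichardShelby_front_neutral_level1_001.mat",
        image_path="data/src_img/cropped/zp1.png",
        output_name="acknowledgement_chinese@M030_front_surprised_level3_001@zp1",
        disable_img_crop=True,
    )
    # Only print the command; nothing is executed here.
    print(" ".join(shlex.quote(part) for part in cmd))
```

To actually launch it, pass the list to `subprocess.run(cmd, check=True)` from the project root.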

IV. Errors you may hit when running the inference code

1. Cannot connect to huggingface

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like jonatasgrosman/wav2vec2-large-xlsr-53-english is not the path to a directory containing a file named preprocessor_config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

Solution:
Connecting through a VPN fixed it.

2. No audio I/O backend is available

RuntimeError: No audio I/O backend is available.

Solution:
torchaudio cannot find an audio I/O backend, so install one.
On Windows, installing this package is enough: pip install soundfile

Reference: https://github.com/ali-vilab/dreamtalk/issues/2
pip install soundfile (Windows)
pip install sox (Linux)
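
The per-platform advice above can be turned into a small check that runs before inference. This is only a sketch: the mapping is copied from the issue cited above, while `is_importable` and `install_hint` are hypothetical helper names.

```python
import importlib.util
import sys
from typing import Optional

# Platform prefix -> install command, as listed in the issue cited above.
INSTALL_HINTS = {
    "win32": "pip install soundfile",
    "linux": "pip install sox",
}


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def install_hint(platform: str = sys.platform) -> Optional[str]:
    """Return the suggested install command for `platform`, or None if unlisted."""
    for prefix, command in INSTALL_HINTS.items():
        if platform.startswith(prefix):
            return command
    return None


if __name__ == "__main__":
    print("soundfile importable:", is_importable("soundfile"))
    print("suggested command:", install_hint())
```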

3. Files for jonatasgrosman/wav2vec2-large-xlsr-53-english not found

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like jonatasgrosman/wav2vec2-large-xlsr-53-english is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

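Solution:
1) Go to https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english/tree/main
2) Download the four files shown in the original post's screenshot (not reproduced here). The error messages above ask for config.json and preprocessor_config.json, so those two are needed at minimum.
3) In the project root, create the folder "jonatasgrosman/wav2vec2-large-xlsr-53-english" and put the four files in it.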
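
This workaround relies on the behavior the error message itself hints at: when the model name is also the path of an existing directory, the files are read from that directory instead of being downloaded. The path is relative, so the script has to be launched from the project root. Below is a minimal sketch for checking the folder; `missing_model_files` is a hypothetical helper, and it checks only the two file names that the error messages mention, since the full list of four is not given in the text.

```python
from pathlib import Path
from typing import Iterable, List

MODEL_DIR = "jonatasgrosman/wav2vec2-large-xlsr-53-english"

# Only the files named in the error messages; the post's screenshot listed four.
NAMED_IN_ERRORS = ("config.json", "preprocessor_config.json")


def missing_model_files(project_root=".", names: Iterable[str] = NAMED_IN_ERRORS) -> List[str]:
    """Return the names from `names` not found in the local model folder."""
    folder = Path(project_root) / MODEL_DIR
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print(f"missing from {MODEL_DIR}: {missing}")
    else:
        print("local model folder contains the files named in the errors")
```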

V. Check the results

If the last lines of the output look like this, the run succeeded:

...
video:183kB audio:142kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 3.590172%
[libx264 @ 000002851b21c840] frame I:2     Avg QP:18.42  size:  6248
[libx264 @ 000002851b21c840] frame P:111   Avg QP:22.18  size:  1185
[libx264 @ 000002851b21c840] frame B:301   Avg QP:25.02  size:   143
[libx264 @ 000002851b21c840] consecutive B-frames:  0.5%  7.2%  1.4% 90.8%
[libx264 @ 000002851b21c840] mb I  I16..4:  6.1% 67.0% 27.0%
[libx264 @ 000002851b21c840] mb P  I16..4:  0.0%  0.3%  0.0%  P16..4: 38.7% 20.8%  9.5%  0.0%  0.0%    skip:30.7%
[libx264 @ 000002851b21c840] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8: 30.0%  1.6%  0.2%  direct: 0.2%  skip:68.0%  L0:40.6% L1:55.1% BI: 4.3%
[libx264 @ 000002851b21c840] 8x8 transform intra:68.3% inter:67.6%
[libx264 @ 000002851b21c840] coded y,uvDC,uvAC intra: 77.6% 78.6% 48.3% inter: 6.8% 4.4% 0.1%
[libx264 @ 000002851b21c840] i16 v,h,dc,p: 44% 11% 31% 13%
[libx264 @ 000002851b21c840] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 21% 18%  4%  7%  8%  6%  7%  6%
[libx264 @ 000002851b21c840] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 26% 21% 10%  6%  8%  7%  8%  6%  8%
[libx264 @ 000002851b21c840] i8c dc,h,v,p: 47% 20% 23% 10%
[libx264 @ 000002851b21c840] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 000002851b21c840] ref P L0: 59.0% 17.7% 16.2%  7.1%
[libx264 @ 000002851b21c840] ref B L0: 88.6%  8.1%  3.3%
[libx264 @ 000002851b21c840] ref B L1: 97.3%  2.7%
[libx264 @ 000002851b21c840] kb/s:90.41
[aac @ 000002851b2b4380] Qavg: 52903.660

Look in the output_video folder under the project root for the generated video.
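
After several runs the folder fills up, so listing the most recent files first makes the latest result easier to find. This is a sketch with one assumption stated up front: it treats the generated videos as `.mp4` files, which the post does not say explicitly. `newest_videos` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def newest_videos(folder="output_video", limit: int = 5) -> List[Path]:
    """Return up to `limit` .mp4 files from `folder`, newest first."""
    root = Path(folder)
    if not root.is_dir():
        return []
    # Assumption: the generated videos use the .mp4 extension.
    videos = [p for p in root.iterdir() if p.suffix.lower() == ".mp4"]
    videos.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return videos[:limit]


if __name__ == "__main__":
    for video in newest_videos():
        print(video)
```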

VI. Initial test results

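I tried several inputs, including the official examples, and the results were not great. The input is expected to be a 256x256 frontal portrait; other sizes also work, because the script crops the image for you. A half-body photo failed with a face-not-found error, which is tolerable: just avoid half-body shots. But with a profile photo, and even with a frontal head shot, the face comes out distorted and blurred and no longer looks like the photo; with my test photos the result looked like a completely different person. Lip sync is sometimes right and sometimes off. There is still a long way to go; keep it up.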
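
To see in advance whether a PNG already has the 256x256 size mentioned above, its dimensions can be read straight from the file header with the standard library. This is a minimal sketch that handles PNG only; `png_size` and `is_expected_size` are hypothetical helpers, and it says nothing about whether a face will be detected.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path) -> Tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    with open(path, "rb") as handle:
        header = handle.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def is_expected_size(path, side: int = 256) -> bool:
    """Return True if the PNG at `path` is exactly side x side pixels."""
    return png_size(path) == (side, side)
```

For example, `is_expected_size("data/src_img/cropped/zp1.png")` checks the cropped image used in the Chinese example.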