Wav2Lip Environment Setup


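Preface

Every tall building starts from the ground, so the first step is to set up the environment and test how Wav2Lip performs.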


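I. Introduction to Wav2Lip

Wav2Lip is a deep-learning algorithm for speech-driven facial animation. Its core idea is to map the information in a speech signal onto facial animation parameters in order to generate realistic facial animation. The algorithm has two main stages: feature extraction and animation generation. In the feature extraction stage, it extracts a speech-related feature representation from the input audio. In the animation generation stage, it uses that representation to predict the facial animation parameters and then generates the facial animation.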

II. Environment Setup

1. Download the source code

Wav2Lip source repository:

git clone https://github.com/Rudrabha/Wav2Lip 
Cloning into 'Wav2Lip'...
remote: Enumerating objects: 381, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 381 (delta 0), reused 0 (delta 0), pack-reused 378
Receiving objects: 100% (381/381), 538.67 KiB | 941.00 KiB/s, done.
Resolving deltas: 100% (209/209), done.

2. Install the environment

conda create -n wav2lib python=3.7
conda activate wav2lib

# Install PyTorch
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

cd Wav2Lip
pip install -r requirements.txt  # first comment out the torch and torchvision version pins in this file

# Output:
Collecting cffi>=1.0
  Using cached cffi-1.15.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (427 kB)
Collecting pycparser
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Building wheels for collected packages: librosa
  Building wheel for librosa (setup.py) ... done
  Created wheel for librosa: filename=librosa-0.7.0-py3-none-any.whl size=1598349 sha256=a92ac1ebb2dac233b3fea89810abf827ff483be9980b3bfe426b9cc33c5f9fa8
  Stored in directory: /home/ps/.cache/pip/wheels/78/fc/20/f0576a7fe176fa34e400f46fd92ae9663cc65c2d01cddb85aa
Successfully built librosa
Installing collected packages: llvmlite, tqdm, threadpoolctl, six, pycparser, numpy, joblib, decorator, audioread, scipy, opencv-python, opencv-contrib-python, numba, cffi, soundfile, scikit-learn, resampy, librosa
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.6
    Uninstalling numpy-1.21.6:
      Successfully uninstalled numpy-1.21.6
Successfully installed audioread-3.0.1 cffi-1.15.1 decorator-5.1.1 joblib-1.3.2 librosa-0.7.0 llvmlite-0.31.0 numba-0.48.0 numpy-1.17.1 opencv-contrib-python-4.9.0.80 opencv-python-4.1.0.25 pycparser-2.21 resampy-0.3.1 scikit-learn-1.0.2 scipy-1.7.3 six-1.16.0 soundfile-0.12.1 threadpoolctl-3.1.0 tqdm-4.45.0
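
The manual step of commenting out the torch and torchvision pins in requirements.txt before installing can also be scripted. The sketch below is only one way to do it: the function names `comment_out_pins` and `patch_requirements` are made up for this example, and it assumes each requirement sits on its own line in the usual `name==version` form.

```python
import re
from pathlib import Path

# Packages that were already installed by hand in the previous step.
SKIP = ("torch", "torchvision")


def comment_out_pins(text, skip=SKIP):
    """Prefix the requirement lines of the given packages with '# '."""
    out = []
    for line in text.splitlines():
        # The package name is whatever comes before the first version
        # operator, extras bracket, marker, or whitespace.
        name = re.split(r"[=<>!~\s\[;]", line.strip(), maxsplit=1)[0].lower()
        out.append("# " + line if name in skip else line)
    return "\n".join(out) + "\n"


def patch_requirements(path="requirements.txt"):
    """Rewrite the requirements file in place (hypothetical helper)."""
    req = Path(path)
    req.write_text(comment_out_pins(req.read_text()))
```

Calling `patch_requirements()` from inside the cloned repository, before `pip install -r requirements.txt`, leaves every other pin untouched. Lines that are already commented out are not changed again.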

3. Test the results

python inference.py --checkpoint_path <ckpt> --face <video.mp4> --audio <an-audio-source> 

# View the result file
ffplay -autoexit filename.mp4  # play the mp4 file on Ubuntu
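
When the inference command is run repeatedly with different inputs, a small wrapper can catch a mistyped path before the model starts loading. This is a minimal sketch, not part of the Wav2Lip repository: `build_inference_cmd` and `run_inference` are hypothetical helper names, and only the three flags shown in the command above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_cmd(checkpoint, face, audio, script="inference.py"):
    """Return the argument list for the inference command shown above."""
    return [
        sys.executable, script,
        "--checkpoint_path", str(checkpoint),
        "--face", str(face),
        "--audio", str(audio),
    ]


def run_inference(checkpoint, face, audio):
    """Fail early on a missing input, then run inference.py from the current directory."""
    for path in (checkpoint, face, audio):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input not found: {path}")
    # check=True raises CalledProcessError if inference.py exits with an error.
    subprocess.run(build_inference_cmd(checkpoint, face, audio), check=True)
```

The wrapper is meant to be called from inside the cloned repository, with the conda environment active so that `sys.executable` points at the right interpreter.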

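Overall result:
Overall, the quality is mediocre.
Points to improve: the mouth region is fairly blurry and the teeth are not clear.

Actual generated video: result_voice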

4. Errors

4.1 AttributeError: module 'cv2' has no attribute '_registerMatType'

Traceback (most recent call last):
  File "inference.py", line 3, in <module>
    import scipy, cv2, os, sys, argparse, audio
  File "/devdata/anaconda3/envs/wav2lib/lib/python3.7/site-packages/cv2/__init__.py", line 181, in <module>
    bootstrap()
  File "/devdata/anaconda3/envs/wav2lib/lib/python3.7/site-packages/cv2/__init__.py", line 175, in bootstrap
    if __load_extra_py_code_for_module("cv2", submodule, DEBUG):
  File "/devdata/anaconda3/envs/wav2lib/lib/python3.7/site-packages/cv2/__init__.py", line 28, in __load_extra_py_code_for_module
    py_module = importlib.import_module(module_name)
  File "/devdata/anaconda3/envs/wav2lib/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/devdata/anaconda3/envs/wav2lib/lib/python3.7/site-packages/cv2/mat_wrapper/__init__.py", line 39, in <module>
    cv._registerMatType(Mat)
AttributeError: module 'cv2' has no attribute '_registerMatType'


# Fix:
pip install --upgrade opencv-python

pip install --upgrade opencv-contrib-python

pip install --upgrade opencv-python-headless

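opencv-python must be version 4.5.4 or later. The install log in section 2 shows opencv-python 4.1.0.25 installed next to opencv-contrib-python 4.9.0.80, so a version mismatch between the two packages is the likely cause.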
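
The version rule above can be checked before rerunning inference. The sketch below is illustrative: `check_opencv` is a made-up helper, the versions are typed in by hand (for example from `pip list`), and the check that all OpenCV packages share one version is an assumption based on the mismatch seen in the install log, not a documented requirement.

```python
MIN_VERSION = (4, 5, 4)  # minimum opencv-python version stated above


def parse_version(version):
    """Turn '4.5.4.60' into (4, 5, 4, 60); non-numeric parts are dropped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def check_opencv(installed):
    """Return a list of problems for a {package name: version string} mapping."""
    problems = []
    for name, version in sorted(installed.items()):
        if parse_version(version) < MIN_VERSION:
            problems.append(f"{name} {version} is older than 4.5.4")
    # Assumption: mixed versions across the OpenCV wheels are worth flagging.
    if len(set(installed.values())) > 1:
        problems.append("OpenCV packages are installed at different versions")
    return problems


# Versions taken from the install log in section 2.
for problem in check_opencv({
    "opencv-python": "4.1.0.25",
    "opencv-contrib-python": "4.9.0.80",
}):
    print(problem)
```

With the versions from the install log, both checks report a problem. After the three `pip install --upgrade` commands, the same call with the new versions should return an empty list.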

4.2 EOFError: Ran out of input

Traceback (most recent call last):
  File "inference.py", line 283, in <module>
    main()
  File "inference.py", line 255, in main
    model = load_model(args.checkpoint_path)
  File "inference.py", line 174, in load_model
    checkpoint = _load(path)
  File "inference.py", line 165, in _load
    checkpoint = torch.load(checkpoint_path)
  File "/devdata/anaconda3/envs/wav2lib/lib/python3.7/site-packages/torch/serialization.py", line 595, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/devdata/anaconda3/envs/wav2lib/lib/python3.7/site-packages/

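Fix:
This error usually means the pickle module reached the end of the file (EOF) before it finished reading. Possible fixes:

1. Check that the file path and name are correct and that the download is complete. (In this case the cause turned out to be an incompletely downloaded weights file.)
2. Make sure the file exists and that you have permission to read it.
3. If you are reading from a data stream, make sure it is open and not empty. You can call data_stream.readable() to check whether the stream is readable.
4. Try regenerating the pickle file with a different Python version or pickle protocol so that it is compatible with the Python version you are running.
5. A very large pickle file can also cause memory problems. Note that pickle.load() has no mmap_mode parameter (mmap_mode belongs to numpy.load and joblib.load), so memory mapping is not available through pickle.load() itself.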
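
The first two checks in the list above are easy to automate before calling inference.py. The following is a rough sketch using only the standard library: `check_checkpoint` is a hypothetical helper, and `expected_size` is a value you would have to read off the download page yourself, since the source does not give one.

```python
import os


def check_checkpoint(path, expected_size=None):
    """Return a list of problems found with a checkpoint file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist or is not a regular file"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by the current user")
    size = os.path.getsize(path)
    if size == 0:
        # An empty file is the classic trigger for 'EOFError: Ran out of input'.
        problems.append(f"{path} is empty")
    elif expected_size is not None and size != expected_size:
        problems.append(
            f"{path} is {size} bytes, expected {expected_size}: download may be incomplete"
        )
    return problems
```

An empty result does not prove the file is intact, only that it passes these basic checks. If the size does not match, downloading the weights again is the simplest fix.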
                       


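III. Preliminary Summary

Wav2Lip can generate a video from an image and a speech clip. At present the sharpness of the mouth region needs improvement, and the teeth are partly missing or distorted.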
