EchoMimic: a recent open-source digital human project

Original post: https://blog.csdn.net/m0_45267220/article/details/142562803

GitHub repository: BadToBest/EchoMimic, "Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning" (https://github.com/BadToBest/EchoMimic)

Paper: EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions (arXiv:2407.08136)

Running the project:

1. Set up the environment

# Clone the repository
git clone https://github.com/BadToBest/EchoMimic
cd EchoMimic

# Create and activate a conda environment
conda create -n echomimic python=3.8
conda activate echomimic

# Install dependencies (via the Tsinghua PyPI mirror)
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

2. Download and extract ffmpeg-static

# Point FFMPEG_PATH at the extracted ffmpeg-4.4-amd64-static folder
export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
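
Before moving on, it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch, not part of EchoMimic: `find_ffmpeg` is a hypothetical helper, and it assumes the static build keeps the `ffmpeg` executable at the top level of the extracted folder.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_ffmpeg(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the ffmpeg binary inside FFMPEG_PATH, or None if it cannot be found."""
    env = os.environ if env is None else env
    root = env.get("FFMPEG_PATH")
    if not root:
        return None  # the variable was not exported in this shell
    # Assumption: the executable sits directly inside the extracted folder.
    candidate = Path(root) / "ffmpeg"
    return candidate if candidate.is_file() else None
```

Calling `find_ffmpeg()` in the activated environment should return a path; `None` suggests the export was run in a different shell or points at the wrong folder.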

3. Download the models

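The weights can be fetched from Hugging Face with git clone (Git LFS must be installed so that the large weight files are actually downloaded), or downloaded manually from: https://huggingface.co/BadToBest/EchoMimic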

git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights
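
If the clone runs without Git LFS, large files are left as small text stubs that begin with a `version https://git-lfs.github.com/spec/v1` line, and model loading fails later. The sketch below is one possible check for that case; `find_lfs_pointers` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path
from typing import List

# First line of every Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(weights_dir: str) -> List[Path]:
    """List files that are still Git LFS pointer stubs rather than real weights."""
    root = Path(weights_dir)
    stubs = []
    for path in sorted(root.rglob("*")):
        # Skip directories and git's own metadata.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        with path.open("rb") as fh:
            if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                stubs.append(path)
    return stubs
```

An empty list from `find_lfs_pointers("pretrained_weights")` means no stubs were found; any listed file would need to be fetched again with Git LFS or downloaded manually.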

The models are also available on Quark Drive (open the link in the Quark app to save them):

Link: https://pan.quark.cn/s/0b8ccdfda5f9
Extraction code: T8dy

The environment is now ready and inference can be run.

4. Inference from audio plus an image

To run inference on your own image and audio, add their paths under test_cases in ./configs/prompts/animation.yaml:

test_cases:
  "path/to/your/image":
    - "path/to/your/audio"
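
Because inference is slow, a mistyped path is cheaper to catch before launching it. The sketch below assumes the `test_cases` mapping has already been loaded into a Python dict with the same shape as the YAML above (for example with a YAML parser such as PyYAML's `yaml.safe_load`, not shown here); `missing_inputs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List


def missing_inputs(test_cases: Dict[str, List[str]]) -> List[str]:
    """Return every image or audio path in test_cases that does not exist on disk."""
    missing = []
    for image, audios in test_cases.items():
        # Check the image first, then each audio clip paired with it.
        for path in [image, *audios]:
            if not Path(path).is_file():
                missing.append(path)
    return missing
```

For instance, `missing_inputs({"path/to/your/image": ["path/to/your/audio"]})` lists whichever of the two placeholder paths is not a real file.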

Then run:

python -u infer_audio2vid.py

The path of the saved result is printed once inference finishes.

5. Notes

Resource usage: it runs with 8 GB of GPU memory.

Inference is fairly slow.
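
To compare a machine against the 8 GB figure, the total memory of each GPU can be read from `nvidia-smi`. This is a rough sketch under the assumption that an NVIDIA driver and `nvidia-smi` are installed; `parse_total_mib` and `query_total_mib` are hypothetical helper names.

```python
import subprocess
from typing import List


def parse_total_mib(smi_output: str) -> List[int]:
    """Parse one memory.total value (in MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_total_mib() -> List[int]:
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_total_mib(result.stdout)
```

`query_total_mib()` returns one number per GPU; a card sold as 8 GB typically reports a value close to 8192. It raises an error if `nvidia-smi` is missing or fails.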