How to Generate High-Quality Video with MuseTalk (Usage Tips)

玩人工智能的辣条哥

Last modified 2024-05-06 11:59:27



First published 2024-04-24 21:44:38

Article link: https://blog.csdn.net/weixin_42672685/article/details/137765836


Environment:

MuseTalk 2024.4.2

GPU: NVIDIA 4070, 12 GB

Problem description:

How can MuseTalk be used to generate high-quality video (usage tips)?


Solution:

MuseTalk was trained in latent space, where the images were encoded by a frozen VAE. The audio was encoded by a frozen whisper-tiny model. The architecture of the generation network was borrowed from the UNet of stable-diffusion-v1-4, where the audio embeddings were fused with the image embeddings by cross-attention.
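
To make the cross-attention fusion described above more concrete, here is a minimal single-head sketch in plain Python. It is not MuseTalk's code: the function names, the toy dimensions, and the hand-written projection matrices (`w_q`, `w_k`, `w_v`) are all assumptions made for illustration, and a real implementation would use learned weights inside PyTorch modules. The idea it shows is only that image latent tokens act as queries while audio feature tokens supply the keys and values.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(image_tokens, audio_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image tokens attend to audio tokens.

    image_tokens: n_img x d_img, audio_tokens: n_aud x d_aud
    w_q: d_img x d, w_k: d_aud x d, w_v: d_aud x d (hypothetical toy weights)
    Returns (fused, weights) with shapes n_img x d and n_img x n_aud.
    """
    q = matmul(image_tokens, w_q)   # queries come from the image side
    k = matmul(audio_tokens, w_k)   # keys come from the audio side
    v = matmul(audio_tokens, w_v)   # values come from the audio side
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    fused = matmul(weights, v)      # each image token becomes a mix of audio values
    return fused, weights

# Toy inputs (made-up numbers): 3 image latent positions, 2 audio feature frames.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_tokens = [[0.5, 0.1, 0.0], [0.0, 0.2, 0.9]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
w_v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

fused, weights = cross_attention(image_tokens, audio_tokens, w_q, w_k, w_v)
```

Each row of `weights` sums to 1 and says how strongly one image position draws on each audio frame; `fused` has one audio-conditioned vector per image position. In a UNet of the Stable Diffusion kind, a result like this is typically projected again and added back to the image features.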

Note that although we use an architecture very similar to that of Stable Diffusion, MuseTalk is distinct in that it is NOT a diffusion model. Instead, MuseTalk operates by inpainting in the latent space with a single
