1 ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit
(Uses choreographic action units (CAUs) to generate dance from music (music-to-dance); not open-source; Sep 2020, ACM MM; paper: [2009.07637] ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit (arxiv.org);
references: 基于舞蹈单元的音乐驱动舞蹈:《ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit》 - 知乎 (zhihu.com) and 【论文分享】ChoreoNet: 利用舞蹈动作单元为音乐生成舞蹈(music to dance) - 知乎 (zhihu.com))
2 Soundini: Sound-Guided Diffusion for Natural Video Editing
(Sound-guided diffusion for natural video editing; not open-source; Aug 2023, CVPR; results: Soundini: Sound-Guided Diffusion for Natural Video Editing (kuai-lab.github.io))
3 Dancing to Music
(GAN-based; open-source; an analysis-and-synthesis learning framework for generating dance from music; Nov 2019, NeurIPS)
4 DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models
(High-quality audio-visual separation based on generative diffusion models; not open-source; Jul 2023, CVPR)
5 AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion
(Audio-aligned video synthesis based on text-to-image diffusion; not open-source; May 2023, CVPR)
6 Long-Term Rhythmic Video Soundtracker
(LORIS, a long-term rhythmic video soundtrack generation model; open-source; May 2023)
7 Prompt-to-Prompt Image Editing with Cross Attention Control
(Prompt-to-prompt image editing with cross-attention control; open-source)
8 DreamPose
9 Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows
(Style-controllable speech-driven gesture synthesis using normalising flows; Computer Graphics Forum, 2020)
The following papers all do diffusion-based Talking Head Generation:
10 Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
paper:diffused_heads.pdf (mstypulkowski.github.io)
11 Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
paper:[2301.04474] Speech Driven Video Editing via an Audio-Conditioned Diffusion Model (arxiv.org)
12 DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder
paper:arxiv.org/pdf/2303.17550.pdf
Open-source; code: DAE-Talker (daetalker.github.io)
13 DAE (Diffusion Autoencoder), the predecessor work to DAE-Talker
Encodes DDIM's conditioning information into a StyleGAN-like latent space, thereby achieving controllable image generation similar to StyleGAN. Unlike StyleGAN, it splits the latent code into two parts: a semantically meaningful linear latent code, and a "noise" part that captures stochastic details.
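A minimal numpy sketch of why this two-part latent is decodable: with eta = 0, each DDIM step is deterministic, so a fixed (x_T, z_sem) pair always decodes to the same image. The `eps_model` below is an illustrative stand-in (not the paper's conditioned U-Net), and the schedule and dimensions are toy values chosen only so the loop runs:

```python
import numpy as np

# Toy sketch of DAE's two-part latent: a semantic code z_sem conditions the
# noise predictor, while the noise latent x_T carries stochastic detail.
# eps_model is a hypothetical stand-in for the paper's conditioned network.

def eps_model(x_t, t, z_sem):
    # Illustrative noise predictor: mixes the current sample with z_sem.
    return 0.1 * x_t + 0.05 * z_sem

def ddim_step(x_t, t, t_prev, z_sem, alphas):
    """One deterministic DDIM step (eta = 0): no fresh noise is injected."""
    a_t, a_prev = alphas[t], alphas[t_prev]
    eps = eps_model(x_t, t, z_sem)
    x0_pred = (x_t - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1 - a_prev) * eps

rng = np.random.default_rng(0)
alphas = np.linspace(0.999, 0.1, 10)     # toy alpha-bar schedule, t = 0..9
z_sem = rng.standard_normal(8)           # semantic latent code
x_T = rng.standard_normal(8)             # "noise" latent

def decode(x_T, z_sem):
    x_t = x_T
    for t in range(9, 0, -1):            # t = 9 .. 1
        x_t = ddim_step(x_t, t, t - 1, z_sem, alphas)
    return x_t

# Determinism: decoding the same (x_T, z_sem) twice gives identical output.
assert np.allclose(decode(x_T, z_sem), decode(x_T, z_sem))
```

Because the reverse process injects no randomness, editing z_sem while keeping x_T fixed changes semantics but preserves the stochastic details, which is the property DAE-Talker builds on.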
14 MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
Submitted on 15 Dec 2022 (v1), last revised 27 Mar 2023 (this version, v3)
15 Diffusion Autoencoders: Toward a Meaningful and Decodable Representation
paper:[2111.15640] Diffusion Autoencoders: Toward a Meaningful and Decodable Representation (arxiv.org)
16 GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior
paper:[2101.04061] Towards Real-World Blind Face Restoration with Generative Facial Prior (arxiv.org)
Focuses on restoring high-definition facial detail. A GAN-based face image restoration model: by introducing a generative facial prior as guidance, GFP-GAN gains better control over the quality and diversity of the generated images. Many talking-head generation pipelines simply append GFP-GAN afterwards as a super-resolution step.
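The "append GFP-GAN afterwards" pattern can be sketched as a per-frame post-processing loop. `restore_frame` here is a hypothetical placeholder (a crude nearest-neighbour 2x upsample) standing in for a real call to the `gfpgan` package's `GFPGANer.enhance()`, just to show where restoration slots into the pipeline:

```python
import numpy as np

# Sketch: run a talking-head generator at low resolution, then restore and
# upscale each frame with a face-restoration model such as GFP-GAN.
# restore_frame is a placeholder (nearest-neighbour upsample); in practice
# you would call GFPGANer.enhance() from the gfpgan package here instead.

def restore_frame(frame, upscale=2):
    # Placeholder for GFP-GAN: repeat pixels along H and W to upscale.
    return frame.repeat(upscale, axis=0).repeat(upscale, axis=1)

def postprocess_video(frames, upscale=2):
    return [restore_frame(f, upscale) for f in frames]

# Fake 4-frame "talking head" clip at 64x64 RGB.
frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(4)]
restored = postprocess_video(frames)
assert restored[0].shape == (128, 128, 3)
```

The design point is that restoration is applied independently per frame, so it can be bolted onto any generator without retraining (at the cost of possible frame-to-frame flicker).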
17 Real-Time Intermediate Flow Estimation for Video Frame Interpolation
paper:[2011.06294] Real-Time Intermediate Flow Estimation for Video Frame Interpolation (arxiv.org)
RIFE, a video frame-interpolation method that can be used to improve talking-head generation results.
Video super-resolution techniques such as [Learning trajectory-aware transformer for video super-resolution] can be applied on top of our solution to obtain high-resolution samples.
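Where interpolation fits can be sketched as below. RIFE actually predicts intermediate optical flow and warps both neighbouring frames; the linear blend here is only a crude stand-in for that, showing how inserting one midpoint frame between each pair raises the frame rate of a generated clip:

```python
import numpy as np

# Minimal stand-in for RIFE-style frame interpolation: RIFE estimates
# intermediate flow to warp both neighbours; here we just average adjacent
# frames (a crude placeholder) to show where extra frames are inserted.

def interpolate_midpoints(frames):
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append((prev.astype(np.float32) + nxt) / 2)  # blended midpoint
        out.append(nxt)
    return out

clip = [np.full((8, 8), float(i)) for i in range(4)]  # 4 toy frames: 0,1,2,3
doubled = interpolate_midpoints(clip)
# N input frames become 2N - 1 output frames.
assert len(doubled) == 2 * len(clip) - 1
assert np.allclose(doubled[1], 0.5)  # midpoint of frames 0 and 1
```

Like the restoration step above, this is pure post-processing on the decoded frames, so it composes freely with super-resolution in the same pipeline.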