CVPR-2024 Video Generation Papers


A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

Paper analysis: http://www.studyai.com/xueshu/paper/detail/070dab0e1a

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Wang_A_Recipe_for_Scaling_up_Text-to-Video_Generation_with_Text-free_Videos_CVPR_2024_paper.html

360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model

Paper analysis: http://www.studyai.com/xueshu/paper/detail/0f48413a14

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Wang_360DVD_Controllable_Panorama_Video_Generation_with_360-Degree_Video_Diffusion_Model_CVPR_2024_paper.html

GenTron: Diffusion Transformers for Image and Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/183a728a73

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Chen_GenTron_Diffusion_Transformers_for_Image_and_Video_Generation_CVPR_2024_paper.html

Grid Diffusion Models for Text-to-Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/2747e621ef

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Lee_Grid_Diffusion_Models_for_Text-to-Video_Generation_CVPR_2024_paper.html

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/3dfb22d3bb

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Skorokhodov_Hierarchical_Patch_Diffusion_Models_for_High-Resolution_Video_Generation_CVPR_2024_paper.html

DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/556fcfdb50

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Wang_DiffPerformer_Iterative_Learning_of_Consistent_Latent_Guidance_for_Diffusion-based_Human_CVPR_2024_paper.html

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

Paper analysis: http://www.studyai.com/xueshu/paper/detail/70ff96d9a4

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Wen_Panacea_Panoramic_and_Controllable_Video_Generation_for_Autonomous_Driving_CVPR_2024_paper.html

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/85a87dc591

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Wang_MicroCinema_A_Divide-and-Conquer_Approach_for_Text-to-Video_Generation_CVPR_2024_paper.html

Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/8c2a58ed68

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Qing_Hierarchical_Spatio-temporal_Decoupling_for_Text-to-Video_Generation_CVPR_2024_paper.html

VideoBooth: Diffusion-based Video Generation with Image Prompts

Paper analysis: http://www.studyai.com/xueshu/paper/detail/8f77a78d58

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Jiang_VideoBooth_Diffusion-based_Video_Generation_with_Image_Prompts_CVPR_2024_paper.html

Make Pixels Dance: High-Dynamic Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/8fabe1ebc1

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Zeng_Make_Pixels_Dance_High-Dynamic_Video_Generation_CVPR_2024_paper.html

SimDA: Simple Diffusion Adapter for Efficient Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/a6a98f9c07

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Xing_SimDA_Simple_Diffusion_Adapter_for_Efficient_Video_Generation_CVPR_2024_paper.html

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Paper analysis: http://www.studyai.com/xueshu/paper/detail/c2c282b646

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/He_Co-Speech_Gesture_Video_Generation_via_Motion-Decoupled_Diffusion_Model_CVPR_2024_paper.html

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Paper analysis: http://www.studyai.com/xueshu/paper/detail/c855174eb6

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Cai_Generative_Rendering_Controllable_4D-Guided_Video_Generation_with_2D_Diffusion_Models_CVPR_2024_paper.html

LAMP: Learn A Motion Pattern for Few-Shot Video Generation

Paper analysis: http://www.studyai.com/xueshu/paper/detail/ca4f57a43a

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Wu_LAMP_Learn_A_Motion_Pattern_for_Few-Shot_Video_Generation_CVPR_2024_paper.html

PEEKABOO: Interactive Video Generation via Masked-Diffusion

Paper analysis: http://www.studyai.com/xueshu/paper/detail/d7cb1ba6b5

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Jain_PEEKABOO_Interactive_Video_Generation_via_Masked-Diffusion_CVPR_2024_paper.html

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

Paper analysis: http://www.studyai.com/xueshu/paper/detail/f6d2ec054b

Paper link: https://openaccess.thecvf.com/content/CVPR2024/html/Liu_EvalCrafter_Benchmarking_and_Evaluating_Large_Video_Generation_Models_CVPR_2024_paper.html

### Related Research on Video Generation Techniques

#### Design and Implementation of VideoGPT

Inspired by DALL·E, UC Berkeley introduced VideoGPT in 2021, a video generation model built on VQ-VAE and the GPT architecture. Its core idea is discrete representation learning: a vector-quantized variational autoencoder (VQ-VAE) maps the continuous pixel space into a discrete latent space, allowing a Transformer to efficiently model the serialized frame tokens and generate coherent video clips[^1].

To further improve generation quality, researchers introduced a pose-alignment mechanism. It enhances the features of a reference frame, preserving the original appearance information while injecting the topology of the human skeleton. Concretely, the encoded skeleton map is added as a residual onto the reference frame, after which a reference network performs the final feature extraction[^2].

#### Contributions of Animate Anyone

Animate Anyone, proposed by Alibaba, is a framework for character animation synthesis, particularly suited to generating dynamic video from a static image. Experiments on the UBC fashion video dataset show that, compared with traditional methods, Animate Anyone not only achieves better quantitative metrics but also preserves the consistency of clothing texture details more faithfully[^3]. These properties matter for applications such as virtual try-on and film visual effects.

#### Challenges Posed by Diverse Datasets

Although unconditional generative adversarial networks (GANs) have reached a high standard on single-category image generation, they still struggle with multi-category, highly diverse content. For example, building a generative model for a large comprehensive dataset such as ImageNet often requires additional control signals, or retraining the entire architecture for a specific application, to reach the expected quality[^4].

```python
import torch
from torchvision import transforms, datasets

def load_dataset(data_dir):
    # Resize every image to 64x64 and convert it to a float tensor in [0, 1]
    transform = transforms.Compose([
        transforms.Resize((64, 64)),
        transforms.ToTensor(),
    ])
    # ImageFolder expects the layout data_dir/<class_name>/<image files>
    dataset = datasets.ImageFolder(root=data_dir, transform=transform)
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
    return dataloader
```

The code above shows a simple PyTorch data-loading pipeline for preparing a dataset to train a GAN or another deep learning model.
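The discrete-representation step described for VideoGPT above boils down to a nearest-codebook lookup: each continuous encoder output is replaced by its closest codebook vector, and the index of that vector becomes the discrete token the Transformer models. The sketch below is a minimal illustration only, with arbitrary codebook size and dimensions, not VideoGPT's actual implementation.

```python
import torch

def quantize(z, codebook):
    """Map continuous vectors z (N, D) to their nearest entries in codebook (K, D)."""
    # Pairwise Euclidean distances between encoder outputs and codebook vectors
    distances = torch.cdist(z, codebook)   # shape (N, K)
    indices = distances.argmin(dim=1)      # discrete token ids, shape (N,)
    z_q = codebook[indices]                # quantized vectors, shape (N, D)
    return indices, z_q

torch.manual_seed(0)
codebook = torch.randn(512, 64)   # K = 512 tokens of dimension 64 (arbitrary choice)
z = torch.randn(8, 64)            # 8 continuous encoder outputs
indices, z_q = quantize(z, codebook)
print(indices.shape, z_q.shape)   # torch.Size([8]) torch.Size([8, 64])
```

In a full VQ-VAE the hard argmin is made trainable with a straight-through gradient estimator and a commitment loss, but the lookup itself is exactly this operation.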