VideoGPT: Video Generation using VQ-VAE and Transformers

umbrellazg

Article tags: algorithms, computer vision

Article link: https://blog.csdn.net/m0_51576139/article/details/135850561

1 Title

VideoGPT: Video Generation using VQ-VAE and Transformers (Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas)

2 Conclusion

This paper presents VideoGPT, a conceptually simple architecture for scaling likelihood-based generative modeling to natural videos. VideoGPT uses a VQ-VAE that learns downsampled discrete latent representations of a raw video by employing 3D convolutions and axial self-attention. A simple GPT-like architecture is then used to autoregressively model the discrete latents using spatio-temporal position encodings.
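To make the two-stage pipeline more concrete, here is a toy sketch in plain Python. It is an illustration under our own assumptions, not the authors' code. The 3D-convolutional encoder and axial attention are omitted: we start from an already-encoded T x H x W grid of latent vectors. The codebook is hand-picked, and all function names (`quantize`, `flatten_raster`, `causal_mask`, `pos_embedding`) are hypothetical. The spatio-temporal position encoding is modeled as a sum of per-axis tables, which is only one possible factorized scheme; VideoGPT's exact formulation may differ.

```python
from typing import List, Sequence, Tuple

Grid = List[List[List[Sequence[float]]]]  # T x H x W grid of latent vectors

def nearest_code(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry with the smallest squared L2 distance to z."""
    dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
    return min(range(len(codebook)), key=dists.__getitem__)

def quantize(latents: Grid, codebook: Sequence[Sequence[float]]) -> List[List[List[int]]]:
    """VQ step: replace every latent vector in the T x H x W grid by its nearest code index."""
    return [[[nearest_code(z, codebook) for z in row] for row in frame] for frame in latents]

def flatten_raster(codes: List[List[List[int]]]) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Flatten indices in (t, h, w) raster order and keep each token's 3D position."""
    seq, pos = [], []
    for t, frame in enumerate(codes):
        for h, row in enumerate(frame):
            for w, idx in enumerate(row):
                seq.append(idx)
                pos.append((t, h, w))
    return seq, pos

def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j (j <= i), as in a GPT-style prior."""
    return [[j <= i for j in range(n)] for i in range(n)]

def pos_embedding(p: Tuple[int, int, int], tables) -> List[float]:
    """Assumed factorized spatio-temporal encoding: sum of the t, h and w table rows."""
    t_tab, h_tab, w_tab = tables
    t, h, w = p
    return [a + b + c for a, b, c in zip(t_tab[t], h_tab[h], w_tab[w])]

# Toy setup: T=2 frames, a 2x2 latent grid per frame, 3-dim latents, 4 codebook entries.
codebook = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
true_codes = [[[1, 2], [3, 0]], [[0, 0], [2, 1]]]
# Each latent is its code vector plus a small offset, so quantization should recover true_codes.
latents = [[[[x + 0.1 for x in codebook[k]] for k in row] for row in frame] for frame in true_codes]

codes = quantize(latents, codebook)
seq, pos = flatten_raster(codes)
mask = causal_mask(len(seq))
# Hypothetical per-axis tables (2-dim embeddings) chosen so the sum is easy to read.
tables = ([[0.0, 0.0], [100.0, 100.0]], [[0.0, 0.0], [10.0, 10.0]], [[0.0, 0.0], [1.0, 1.0]])

print(seq)                            # [1, 2, 3, 0, 0, 0, 2, 1]
print(pos[6])                         # (1, 1, 0)
print(sum(mask[3]))                   # 4: token 3 sees tokens 0..3
print(pos_embedding(pos[6], tables))  # [110.0, 110.0]
```

Raster flattening turns the latent video into a 1D token sequence, so a standard decoder-only Transformer with a causal mask can model p(x) = ∏ p(x_i | x_<i) over the discrete codes. The position vectors carry each token's place in time and space, which the flattened sequence alone would otherwise lose.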

3 Good Sentences

1、High-fidelity natural videos is one notable modality that has not seen the same level of progress in generative modeling as compared to images, audio, and text. This is reasonable since the complexity of natural videos requires modeling correlations across both space and time with much higher input dimensions. Video modelin