Paper Reading [CVPR-2022] Autoregressive Image Generation using Residual Quantization
Abstract
For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes.
A short sequence length is important for an AR model because it reduces the computational cost of modeling long-range interactions among codes.
However, we postulate that, in terms of the rate-distortion trade-off, previous VQ methods cannot both shorten the code sequence and generate high-fidelity images.
In this study, we propose a two-stage framework, consisting of the Residual-Quantized VAE (RQ-VAE) and the RQ-Transformer, to effectively generate high-resolution images.
Given a fixed codebook size, RQ-VAE can precisely approximate a feature map of an image and represent the image as a stacked map of discrete codes.
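To make the idea of a "stacked map of discrete codes" concrete, here is a minimal NumPy sketch of greedy residual quantization with one shared codebook. It is only an illustration, not the authors' implementation: the function name `residual_quantize`, the toy sizes, and the random (unlearned) codebook are assumptions, and the all-zero codebook entry is added purely so that the toy reconstruction error cannot grow as depth increases.

```python
import numpy as np

def residual_quantize(z, codebook, depth):
    """Greedily quantize vectors z (N, C) into `depth` codes each.

    A single shared codebook (K, C) is reused at every depth level.
    Returns the codes (N, depth) and the quantized vectors (N, C),
    where each quantized vector is the sum of its selected entries.
    """
    residual = z.copy()
    quantized = np.zeros_like(z)
    codes = np.empty((z.shape[0], depth), dtype=np.int64)
    for d in range(depth):
        # Squared distance from every residual to every codebook entry.
        dists = ((residual[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
        codes[:, d] = dists.argmin(axis=1)
        chosen = codebook[codes[:, d]]
        quantized += chosen   # accumulate the approximation
        residual -= chosen    # what is still left to explain
    return codes, quantized

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 8))          # 64 feature vectors, 8 channels (toy sizes)
codebook = rng.normal(size=(32, 8))   # random stand-in for a learned codebook
codebook[0] = 0.0                     # assumption: zero entry keeps the toy error non-increasing

for depth in (1, 2, 4):
    codes, z_hat = residual_quantize(z, codebook, depth)
    print(depth, float(((z - z_hat) ** 2).mean()))
```

With the codebook size held fixed, each extra depth level refines the previous approximation instead of requiring a larger codebook, which is the property the abstract relies on.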
Then, the RQ-Transformer learns to predict the quantized feature vector at the next position by predicting the next stack of codes.
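The relation between a stack of codes and the quantized feature vector can be sketched as follows. This assumes, as in residual quantization generally, that the feature vector at a position is the sum of the codebook entries selected by that position's codes, and that positions are visited in raster-scan order with the codes of one stack filled in level by level. The helper names `decode_code_stack` and `generation_order` are hypothetical, and no Transformer is modeled here.

```python
import numpy as np

def decode_code_stack(codes, codebook):
    """Map codes (H, W, D) to quantized features (H, W, C).

    Each position's feature vector is the sum of the D codebook
    entries selected by its stack of codes.
    """
    return codebook[codes].sum(axis=2)

def generation_order(height, width, depth):
    """Yield (row, col, level) in the assumed autoregressive order:
    raster scan over positions, then level by level within a stack."""
    for row in range(height):
        for col in range(width):
            for level in range(depth):
                yield row, col, level

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))           # toy codebook: 16 entries, 4 channels
codes = rng.integers(0, 16, size=(8, 8, 4))   # 8x8 code map with depth 4
features = decode_code_stack(codes, codebook)
order = list(generation_order(8, 8, 4))
print(features.shape, len(order), order[:5])
```

Predicting one whole stack therefore fixes one feature vector, so the model advances one spatial position at a time rather than one code at a time.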
Thanks to the precise approximation of RQ-VAE, we can represent a 256×256 image as a feature map of 8×8 resolution, and the RQ-Transformer can efficiently reduce the computational costs.
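A rough back-of-the-envelope comparison shows why the smaller feature map matters. The abstract gives only the 8×8 resolution; the 16×16 single-code baseline, the depth of 4, and the codebook size of 16384 below are illustrative assumptions chosen so that both layouts spend the same number of bits. The helper `code_map_stats` is hypothetical, and the count of position pairs is only a proxy for the quadratic cost of self-attention over spatial positions; it ignores the extra work of predicting the codes within each stack.

```python
import math

def code_map_stats(height, width, depth, codebook_size):
    """Summarize a code map: positions, total codes, bits, and position pairs."""
    positions = height * width
    return {
        "positions": positions,
        "codes": positions * depth,
        "bits": positions * depth * math.log2(codebook_size),
        # Self-attention over positions scales with the square of their count.
        "position_pairs": positions ** 2,
    }

vq = code_map_stats(16, 16, 1, 16384)  # assumed baseline: one code per position
rq = code_map_stats(8, 8, 4, 16384)    # assumed RQ layout: a stack of 4 codes per position

print(vq)
print(rq)
print(vq["position_pairs"] / rq["position_pairs"])
```

Under these assumptions both layouts use 3584 bits per image, while the number of spatial positions drops from 256 to 64 and the number of position pairs drops by a factor of 16.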
Consequently, our framework outperforms the existing AR models on various benchmarks of unconditional and conditional image generation.
Our approach also samples significantly faster than previous AR models when generating high-quality images.