Paper Reading [CVPR-2022] Autoregressive Image Generation using Residual Quantization

Search for the paper: [Autoregressive Image Generation using Residual Quantization](http://www.studyai.com/search/whole-site/?q=Autoregressive+Image+Generation+using+Residual+Quantization)

Abstract

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes.

A short sequence length is important for an AR model to reduce its computational costs to consider long-range interactions of codes.

However, we postulate that previous VQ cannot shorten the code sequence and generate high-fidelity images together in terms of the rate-distortion trade-off.

In this study, we propose the two-stage framework, which consists of Residual-Quantized VAE (RQ-VAE) and RQ-Transformer, to effectively generate high-resolution images.

Given a fixed codebook size, RQ-VAE can precisely approximate a feature map of an image and represent the image as a stacked map of discrete codes.
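To make the idea of a "stack" of codes concrete, here is a minimal NumPy sketch of greedy residual quantization for a single feature vector. It is an illustration under assumptions, not the authors' implementation: the function name `residual_quantize`, the random codebook, the codebook size (16), and the feature dimension (4) are all hypothetical. In a trained RQ-VAE the codebook is learned so that deeper codes refine the approximation; with the random codebook used here the error is not guaranteed to shrink at every depth.

```python
import numpy as np

def residual_quantize(z, codebook, depth):
    """Greedily quantize vector z into `depth` codes from one shared codebook.

    Returns the list of code indices (the "stack") and the approximation,
    which is the sum of the selected codewords.
    """
    residual = z.astype(np.float64).copy()
    approx = np.zeros_like(residual)
    codes = []
    for _ in range(depth):
        # Pick the codeword nearest to the current residual (Euclidean distance).
        dists = np.linalg.norm(codebook - residual, axis=1)
        k = int(np.argmin(dists))
        codes.append(k)
        approx += codebook[k]
        # The next level quantizes whatever is still unexplained.
        residual -= codebook[k]
    return codes, approx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # hypothetical: 16 codewords, 4-dim features
z = rng.normal(size=4)               # hypothetical feature vector at one position

codes1, approx1 = residual_quantize(z, codebook, depth=1)  # depth 1 is plain VQ
codes4, approx4 = residual_quantize(z, codebook, depth=4)  # a stack of 4 codes

print("depth 1:", codes1, "error:", np.linalg.norm(z - approx1))
print("depth 4:", codes4, "error:", np.linalg.norm(z - approx4))
```

Note that the first code of the depth-4 stack is the same code plain VQ would choose, and that the codebook size stays fixed while the number of representable combinations grows with depth.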

Then, RQ-Transformer learns to predict the quantized feature vector at the next position by predicting the next stack of codes.

Thanks to the precise approximation of RQ-VAE, we can represent a 256×256 image as 8×8 resolution of the feature map, and RQ-Transformer can efficiently reduce the computational costs.
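The arithmetic behind this claim can be sketched in a few lines of Python. Only the 256×256 image and the 8×8 feature map come from the abstract; the 16×16 baseline map, the stack depth of 4, and the helper name `sequence_stats` are assumptions chosen for illustration. The sketch also assumes full self-attention across spatial positions, so the number of position pairs grows quadratically, and it ignores the per-position cost of predicting the codes within a stack.

```python
def sequence_stats(image_size, downsample, depth=1):
    """Count spatial positions and codes for a square image and feature map."""
    side = image_size // downsample   # feature map is side x side
    positions = side * side           # sequence length over spatial positions
    return {
        "side": side,
        "positions": positions,
        "codes": positions * depth,          # total discrete codes per image
        "attention_pairs": positions ** 2,   # pairs under full self-attention
    }

# 8x8 map from the abstract (256 / 32 = 8); depth=4 is a hypothetical stack depth.
rq = sequence_stats(256, downsample=32, depth=4)
# Hypothetical single-code VQ baseline at 16x16 resolution.
vq = sequence_stats(256, downsample=16, depth=1)

print("RQ:", rq)
print("VQ:", vq)
print("attention pair ratio (VQ / RQ):", vq["attention_pairs"] // rq["attention_pairs"])
```

Under these assumptions both representations carry the same number of codes per image, but the 8×8 map has a quarter of the spatial positions and therefore one sixteenth of the position pairs.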

Consequently, our framework outperforms the existing AR models on various benchmarks of unconditional and conditional image generation.

Our approach also has a significantly faster sampling speed than previous AR models to generate high-quality images.
