This paper proposed the Text2Scene model. Instead of using a GAN, the researchers used a combination of an encoder-decoder model with a semi-parametric retrieval-based approach.
Synthesizing images from text also requires a degree of language and visual understanding, which has applications in image retrieval through natural-language queries, in text representation learning, and in automatic computer graphics and image-editing applications.
Text2Scene is a model that interprets the important semantics in visually descriptive language in order to generate compositional scene representations.
The Text2Scene model turns the important semantics of descriptive language into compositional scene representations, focusing in particular on generating representations that contain a sequence of objects together with their attributes (location, size, aspect ratio, pose, appearance features).
The paper adapts and trains the model to generate three types of scene representations:
(1) cartoon-like scenes from the Abstract Scenes dataset [41], where the objects include locations, sizes, aspect ratios, orientations, and poses; (2) object layouts for scenes in the COCO dataset [22], where the objects include locations, sizes, and aspect ratios; and (3) synthetic image composites for scenes in the COCO dataset [22], where the objects include locations.
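The idea of a compositional scene representation can be sketched as a data structure: a scene is an ordered sequence of objects, each carrying the spatial and appearance attributes listed above. The following Python sketch is purely illustrative; the class and field names are assumptions, not Text2Scene's actual internals.

```python
# Illustrative sketch (not the paper's code): a scene as an ordered
# sequence of objects, each with the attributes Text2Scene predicts.
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    category: str              # e.g. "boy", "dog", "tree"
    location: tuple            # (x, y) position in the scene
    size: float                # relative scale
    aspect_ratio: float        # width / height
    pose: int = 0              # discrete pose index (Abstract Scenes only)


@dataclass
class Scene:
    objects: list = field(default_factory=list)

    def add(self, obj: SceneObject) -> None:
        # Objects are generated sequentially, one per decoding step.
        self.objects.append(obj)


scene = Scene()
scene.add(SceneObject("boy", location=(120, 200), size=1.0,
                      aspect_ratio=0.6, pose=3))
scene.add(SceneObject("dog", location=(300, 250), size=0.8,
                      aspect_ratio=1.4))
print(len(scene.objects))  # 2
```

A decoder in this framing would emit one `SceneObject` per step until it predicts an end-of-scene token, which is what makes the representation compositional rather than a single rasterized image.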