I. Background
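Text-to-image generation is a machine-learning task built on deep learning. Its goal is to learn a mapping from text to images, so that a model can produce an image matching a natural-language description. The technique rests on linking linguistic concepts to visual ones, which lets the model interpret a text description and render it as an image.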
Three text-to-image models stand out: Stable Diffusion, Midjourney, and DALL·E 2.
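If you are looking for an open-source image generator, Stable Diffusion is currently the only option among the three. You can run Stable Diffusion locally on your own computer, which gives you more control, better customization, and even the ability to build your own AI tools on top of its deep-learning text-to-image model.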
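As a rough illustration of what running Stable Diffusion locally can look like, here is a minimal Python sketch. It assumes the Hugging Face `diffusers` and `torch` packages are installed, which the text above does not mention. The checkpoint ID, the helper names `pick_device_and_dtype` and `generate`, and the output path are illustrative choices, not part of any official recipe.

```python
# Minimal sketch: run Stable Diffusion locally with Hugging Face `diffusers`.
# Assumptions (not from the original text): `diffusers` and `torch` are
# installed, and the checkpoint ID below is one you can download.
from typing import Tuple

MODEL_ID = "CompVis/stable-diffusion-v1-4"  # assumed checkpoint; swap in your own


def pick_device_and_dtype(cuda_available: bool) -> Tuple[str, str]:
    """Use half precision on a GPU and full precision on the CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def generate(prompt: str, out_path: str = "output.png", model_id: str = MODEL_ID) -> str:
    """Generate one image for `prompt`, save it to `out_path`, return the path."""
    # Heavy imports stay local so this module loads even without them.
    import torch
    from diffusers import StableDiffusionPipeline

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, dtype_name)
    )
    pipe = pipe.to(device)
    image = pipe(prompt).images[0]  # a PIL image
    image.save(out_path)
    return out_path


# Example call (the first run downloads the model weights):
# generate("a watercolor painting of a lighthouse at sunrise")
```

Keeping the pipeline behind a small function like this is one way to start building a custom tool: the prompt, checkpoint, and output location become ordinary parameters that a script or web service can set.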
II. Sources
1. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Paper: https://arxiv.org/abs/1502.03044
2. DenseCap: Fully Convolutional Localization Networks for Dense Captioning. Paper: https://arxiv.org/abs/1511.07571
3. Neural Baby Talk. Paper: https://arxiv.org/abs/1803.09845
4. Generative Adversarial Text-to-Image Synthesis. Paper: https://arxiv.org/abs/1605.05396
5. Generating Images from Captions with Attention. Paper: https://arxiv.org/abs/1511.02793
6. Text to Image Synthesis Using Thought Vectors. Paper: https://arxiv.org/abs/1605.05396
7. StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks. Paper: https://arxiv.org/abs/1612.03242
8. Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. Paper: https://arxiv.org/abs/1811.10652
9. Image Captioning with Semantic Attention. Paper: https://arxiv.org/abs/1603.03925
10. Generative Adversarial Text-to-Image Synthesis. Paper: htt