Pareidolia: Teaching Art to AI

At Alien Intelligence, we explore our ability to teach art to AI: we have it generate some evidence of its understanding, and then analyse and interpret its response. We start with a simple “lesson” and plan to gradually develop its content and complexity in an iterative process. The quality of the interactions and their outcomes will depend both on the technical capabilities of the AI and on our own human ingenuity and limitations in communicating with it.

Pareidolia: it’s human to see humans

Pareidolia is the tendency for incorrect perception of a stimulus as an object, pattern or meaning known to the observer, such as seeing shapes in clouds, seeing faces in inanimate objects or abstract patterns, or hearing hidden messages in music. (Source: Wikipedia)

In this project, our goal is to communicate to the AI that certain works of art are perceptions or interpretations of objects and ideas from the real world. As the first class of such objects, we chose the most human object out there: the human face.

More specifically, we start by showing the AI “real” human faces captured in photos. Next, we show it artistic depictions of human faces in portrait paintings, ranging from realistic all the way to abstract.

We then probe the AI’s “understanding”: we show it new portrait paintings that it hasn’t seen before, and ask it to generate a realistic photo that captures the essence of the face in that art piece (yes, the opposite direction). We were curious to see what it would produce. We have started with realistic paintings, but our intention is to further expand to abstract, cubist, surrealist, as well as 3D works. Finally, in true “Pareidolia” fashion, we will give it photos of objects that are not human faces, and explore how it projects them as a realistic photo of a human face.

We aim to explore a junction that tests not only the AI’s capability to understand and express art, but also our own human limitations in communicating our goals to the AI and collaborating with it.

All magic comes with a price

What makes teaching AI art magical, at this introductory level, is that there is no need to provide it with precise definitions and complex explanations of what a face is, what a portrait painting is, and how they relate to each other. Instead, we just give it (many) examples of both, and it somehow learns. Sounds exciting? Well, beware! All magic comes with a price.

In our case, this price originates from the AI’s lack of any prior knowledge about faces, portraits, or art. In fact, it lacks almost any prior knowledge about us, our histories, and our aspirations, or about the world we live in. The only information it has is whatever is stored in the images it is shown. In particular, it does not have access to the many concepts and facts we take for granted when we look at portraits and photos.

Consider, for example, the fact that faces are part of the human body, and that there are certain universal commonalities such as the general shape of the head and the existence and positions of the eyes, ears, nose, and mouth. Or the fact that humans come in different genders, races, and ages, as well as in a spectrum of genetic variability, and that all of these are visible attributes of the face. There are also more nuanced facts, like the variability of hair, facial hair, and facial expressions. Then there is the knowledge of what the “natural” orientation of a human face is, and how it looks from above or in profile. It is this prior knowledge that allows us humans to effortlessly identify and analyse a human face, as well as to distinguish between a real face and something that just looks like one.

[Figure: The ability to distinguish between a real face and something that just looks like it]

Similarly, there are concepts and facts relating to portraits. For example, there is the nuanced understanding that a portrait attempts to capture a face, but not necessarily in a direct and accurate manner, as a mirror does. Rather, there are built-in constraints as well as intended adaptations: the technique used, the artistic statement and agenda, the time and location of the execution, and the composition and setup of the artwork.

These are all pieces of knowledge we take for granted, and which are critical to the task of learning the relationship between photos and portraits. Pieces of knowledge to which the AI has no access.

Admittedly, one could think of elaborate ways of communicating these to the AI. For example, provide labels for the images and portraits — explicitly detailing gender, race, age, expressions, and other facial attributes (bald, with beard, moustache, blonde hair, long nose, thick eyebrows, …). However, we made a conscious decision not to use these, and see how far we can get with merely the unlabelled photos and portraits. We wanted to keep our dialogue with the AI simple.

Portraits from Mars and photos from Venus

By now, the importance of the images we use as examples for training must be obvious, as they encapsulate all the information the AI has access to. Let’s take a closer look at that then.

In an ideal world, we would provide the AI with pairs of images: a photo of a person’s face, and a matching portrait of that same person. Unfortunately, such datasets don’t exist. Most of the portraits we have are from before the camera was invented, and most of the photos we have are of people who didn’t feel a need to get their portrait painted.

Moreover, the publicly available datasets of photos of human faces are often based on images of celebrities from across the web. These are heavily biased towards young, white, good-looking, fashionable, smiling faces, captured from an optimally aligned frontal position. This is in great contrast to the distribution of ages, expressions, positions, and textures that we find in portraits (except that these, too, are mostly of white subjects). This point is best illustrated by examples:

This seemingly simple difference introduces a HUGE challenge for our AI, since the two datasets (photos vs. portraits) actually represent two very different views of the human population. This indeed had a very clear (and expected) effect on the learning, understanding, and output of the AI.

Again, there are ways to try to address this issue. They range from using a more representative dataset of photos (easier said than done, and often at the expense of quality) all the way to creating “synthetic portraits”: algorithmically manufacturing “artistic” portraits from photos, and using these as pairs.

However, as before, we decided to stick with simplicity at this stage, and not to use synthetic portraits.

Now that we are finally done with this long introduction, let’s have a look at our project, and what we actually produced.

Project Pareidolia

We thought it would be insightful and fun to reverse the artistic process, and have our outputs be synthesised “realistic” photos based on a portrait, rather than the other way around. Moreover, we didn’t want the AI to “simply” apply a stylistic filter over the original portrait and make it look more like a photo. Instead, we wanted to capture the semantics of the portrait, and recreate an “artistic projection” of it into what looks like a realistic photo.

In order to achieve that, we set out to train the AI to separate the semantics of the photos and portraits from their style, and to map these semantics to a shared “face” space.

We then probed how well the AI succeeded in this task, by requesting it to generate a new synthetic photo, based on a portrait it has never seen before.

We thought this would be an insightful demonstration of “understanding” of both the portrait and the human face.

Light technical interlude (read at your own risk)

While, as noted above, we wanted to keep the inputs as simple and as authentic as possible, we still had to apply some simple modifications, specifically face cropping. As humans, we are naturally drawn to the face in a portrait. In reality, however, the face occupies only a small portion of the portrait. The rest is filled with the body, the mise-en-scène, and, mostly, background. While we may not be bothered by it, it provides an abundance of distracting information for an AI that lacks any context and prior knowledge. So, in order to focus on what’s important, we cropped the faces in both collections.
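The cropping step can be sketched roughly as follows. This is a minimal illustration and not the project's actual preprocessing: it assumes that some face detector (not shown) has already returned a bounding box, and the names `Box` and `square_face_crop`, as well as the 30% margin, are hypothetical choices made for this example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel rectangle; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def square_face_crop(face: Box, image_width: int, image_height: int,
                     margin: float = 0.3) -> Box:
    """Return a square crop centred on `face`, padded by `margin` on each
    side and shifted (or shrunk) so that it stays inside the image.

    `face` is assumed to come from an external face detector.
    """
    face_w = face.right - face.left
    face_h = face.bottom - face.top

    # Pad the longer side of the face box, then cap at the image size.
    side = int(round(max(face_w, face_h) * (1.0 + 2.0 * margin)))
    side = min(side, image_width, image_height)

    # Centre the square on the face.
    centre_x = (face.left + face.right) / 2
    centre_y = (face.top + face.bottom) / 2
    left = int(round(centre_x - side / 2))
    top = int(round(centre_y - side / 2))

    # Shift the square back inside the image if it spills over an edge.
    left = max(0, min(left, image_width - side))
    top = max(0, min(top, image_height - side))
    return Box(left, top, left + side, top + side)
```

With an imaging library such as Pillow, the resulting box could be passed as a `(left, top, right, bottom)` tuple to `Image.crop`, and the crop then resized to whatever input resolution the model expects. Using the same square crop for both photos and portraits keeps the two collections comparable.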

Now that we had our input examples sorted, we had to choose the AI method that matched our goals.

Since we didn’t want to apply a “simple” filter, we decided not to go with a style transfer approach. Nor did we want to use a machine learning model that looks at the pixel-level similarity between the source portrait and the target synthetic photo. So we looked for GAN models that operate on the semantics of the image (GAN stands for Generative Adversarial Network, a machine learning technique that builds on ideas from game theory to generate outputs that match desired criteria). Since we had the additional constraint of having two independent distributions (meaning we didn’t have photo-portrait pairs to train on, but rather two separate collections), we looked at the broader Cycle-GAN family. We experimented with different members, options, and modifications, and eventually landed on a slightly modified version of MUNIT (10 epochs * 100k iterations).
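To make the idea of unpaired translation a little more concrete, here is a deliberately tiny toy sketch of the content/style split that MUNIT-like models rely on. It is not the model used in the project and involves no learning at all: an “image” is just a list of numbers, the “style” is assumed to be its mean and spread, and the “content” is whatever remains after normalising them away (loosely analogous to instance normalisation). All function names are hypothetical.

```python
import random
import statistics
from typing import List, Tuple

Style = Tuple[float, float]  # toy style code: (mean, spread) of the pixels


def encode(image: List[float]) -> Tuple[List[float], Style]:
    """Split a toy image into a style-free content code and a style code."""
    mean = statistics.fmean(image)
    spread = statistics.pstdev(image) or 1.0  # avoid dividing by zero
    content = [(pixel - mean) / spread for pixel in image]
    return content, (mean, spread)


def decode(content: List[float], style: Style) -> List[float]:
    """Render a content code in the given style."""
    mean, spread = style
    return [value * spread + mean for value in content]


def portrait_to_photo(portrait: List[float],
                      photo_examples: List[List[float]],
                      rng: random.Random) -> List[float]:
    """Keep the portrait's content, but borrow a style from the photo domain.

    No portrait/photo pairs are needed: the two collections only meet
    through the shared content space.
    """
    content, _portrait_style = encode(portrait)
    _, photo_style = encode(rng.choice(photo_examples))
    return decode(content, photo_style)


if __name__ == "__main__":
    rng = random.Random(0)
    portrait = [0.2, 0.4, 0.9, 0.5]
    photos = [[0.5, 0.6, 0.7, 0.6], [0.1, 0.3, 0.2, 0.4]]
    print(portrait_to_photo(portrait, photos, rng))
```

In a real model the encoders and decoders are neural networks trained adversarially, and the content code is far richer than a normalised pixel list, but the shape of the computation is the same: encode the portrait, discard its style, and decode the content with a style drawn from the photo collection.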

Drum roll: results and concluding remarks

Here are the results! These are 3 palettes, each containing 24 pairs (4 in each row x 6 rows). Each pair is made of the original portrait on the left, and the synthesised photo on the right.

Some photos are surprisingly good, and some are awfully bad. One thing is clear: the AI is a victim of the bias we imposed on it through our “celebrity” photo training set. In its universe of photos, people are 20–30 years old, looking straight at the camera, smiling, with perfect skin and straight hair. Faces that are too old or too young, facial hair, curly hair, or slightly unexpected angles don’t pan out well. Still, it is thought-provoking to consider that this was done with no context or explanations.

Does the AI understand art? Ours definitely doesn’t, at least not in the way humans do. However, what it produced is definitely interesting and encouraging. Some of the results seem to point at fundamental gaps in understanding, yet others are surprisingly good and exciting. Moreover, as we experimented with the project, we floated many ideas on how to make it even better. The key question, though, is whether this is a worthwhile journey. Can we gain new insights about our world, and about our perception of it, through this dialogue of trying to teach art to the AI? We plan to continue exploring this question.

Original article: https://towardsdatascience.com/pareidolia-teaching-art-to-ai-d78889406bd1