MachineRay: Using AI to Create Abstract Art

For the past three months, I have been exploring the latest techniques in Artificial Intelligence (AI) and Machine Learning (ML) to create abstract art. During my investigation, I learned that three things are needed to create abstract paintings: (A) source images, (B) an ML model, and (C) a lot of time to train the model on a high-end GPU. Before I discuss my work, let’s take a look at some prior research.

Background

Artificial Neural Networks

Warren McCulloch and Walter Pitts created a computational model for Neural Networks (NNs) back in 1943 [1]. Their work led to research into both the biological processing in brains and the use of NNs for AI. Richard Nagyfi discusses the differences between Artificial Neural Networks (ANNs) and biological brains in this post. He describes an apt analogy that I will summarize here: ANNs are to brains as planes are to birds. Although the development of these technologies was inspired by biology, the actual implementations are very different!

Visual Analogy: Neural Network chip artwork by mikemacmarketin CC BY 2.0, Brain model by biologycorner CC BY-NC 2.0, Plane photo by Moto@Club4AG CC BY 2.0, Bird photo by ksblack99 CC PDM 1.0

Both ANNs and biological brains learn from external stimuli to understand things and predict outcomes. One of the key differences is that ANNs work with floating-point numbers and not just binary firing of neurons. With ANNs it’s numbers in and numbers out.

The diagram below shows the structure of a typical ANN. The inputs on the left are the numerical values that contain the incoming stimuli. The input layer is connected to one or more hidden layers that contain the memory of prior learning. The output layer, in this case just one number, is connected to each of the nodes in the hidden layer.

Diagram of a Typical ANN

Each of the internal arrows represents numerical weights that are used as multipliers to modify the numbers in the layers as they get processed in the network from left to right. The system is trained with a dataset of input values and expected output values. The weights are initially set to random values. For the training process, the system runs through the training set multiple times, adjusting the weights to achieve the expected outputs. Eventually, the system will not only predict the outputs correctly from the training set, but it will also be able to predict outputs for unseen input values. This is the essence of Machine Learning (ML). The intelligence is in the weights. A more detailed discussion of the training process for ANNs can be found in Conor McDonald’s post, here.

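To make “the intelligence is in the weights” concrete, here is a minimal sketch, separate from the MachineRay code, of a one-hidden-layer network written in plain NumPy. Its randomly initialized weights are nudged, pass after pass, until the numbers coming out match the numbers expected, and the trained weights then generalize to inputs the network has never seen.

```python
import numpy as np

# Toy dataset: learn y = x^2 on [-1, 1] from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(256, 1))
y = x ** 2 + 0.05 * rng.normal(size=(256, 1))

# One hidden layer of 16 tanh units; weights start as random values.
w1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros((1, 16))
w2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros((1, 1))
lr = 0.1

for epoch in range(2000):
    # Forward pass: numbers in, numbers out.
    h = np.tanh(x @ w1 + b1)
    y_hat = h @ w2 + b2

    # Backward pass: nudge the weights toward the expected outputs.
    grad_y = 2 * (y_hat - y) / len(x)
    grad_w2 = h.T @ grad_y
    grad_b2 = grad_y.sum(axis=0, keepdims=True)
    grad_h = grad_y @ w2.T * (1 - h ** 2)
    grad_w1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0, keepdims=True)

    w1 -= lr * grad_w1
    b1 -= lr * grad_b1
    w2 -= lr * grad_w2
    b2 -= lr * grad_b2

# The trained network predicts outputs for unseen input values.
x_test = np.array([[0.3], [-0.7]])
print(np.tanh(x_test @ w1 + b1) @ w2 + b2)  # roughly [[0.09], [0.49]]
```

After training, everything the network knows about the x-squared curve lives in w1, b1, w2, and b2; there is no explicit rule anywhere in the code.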

Generative Adversarial Networks

In 2014, Ian Goodfellow and seven coauthors at the Université de Montréal presented a paper on Generative Adversarial Networks (GANs)[2]. They came up with a way to train two ANNs that effectively compete with each other to create content like photos, songs, prose, and yes, paintings. The first ANN is called the Generator and the second is called the Discriminator. The Generator is trying to create realistic output, in this case, a color painting. The Discriminator is trying to discern real paintings from the training set as opposed to fake paintings from the generator. Here’s what a GAN architecture looks like.

Generative Adversarial Network

A vector of random noise is fed into the Generator, which then uses its trained weights to generate the resultant output, in this case, a color image. The Discriminator is trained by alternating between processing real paintings, with an expected output of 1, and fake paintings, with an expected output of -1. After each painting is sent to the Discriminator, it sends back detailed feedback about why the painting is not real, and the Generator adjusts its weights with this new knowledge to try and do better the next time. The two networks in the GAN are effectively trained together in an adversarial fashion. The Generator gets better at trying to pass off a fake image as real, and the Discriminator gets better at determining which input is real and which is fake. Eventually, the Generator gets pretty good at generating realistic-looking images. You can read more about GANs, and the math they use, in Shweta Goyal’s post here.

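As an illustration of that adversarial loop, here is a minimal, self-contained sketch in PyTorch. It is a toy example, not the StyleGAN2 code used later in this post: the “paintings” are 2-D points on a ring, and it uses the common 0/1 real/fake labels with a cross-entropy loss instead of the +1/-1 targets described above, but the alternating updates have the same structure.

```python
import math
import torch
from torch import nn

# Toy stand-in for paintings: "real" samples are 2-D points on a ring.
def real_batch(n=128):
    angle = torch.rand(n, 1) * 2 * math.pi
    return torch.cat([torch.cos(angle), torch.sin(angle)], dim=1)

# Generator maps 8-D random noise to a 2-D point; Discriminator scores points.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
real_label, fake_label = torch.ones(128, 1), torch.zeros(128, 1)

for step in range(5000):
    # Discriminator step: push real samples toward 1, fakes toward 0.
    fake = G(torch.randn(128, 8)).detach()
    d_loss = loss_fn(D(real_batch()), real_label) + loss_fn(D(fake), fake_label)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: adjust weights so the Discriminator calls fakes real.
    fake = G(torch.randn(128, 8))
    g_loss = loss_fn(D(fake), real_label)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# After training, generated points should lie near the unit circle.
print(G(torch.randn(4, 8)))
```

Scaling this same loop up from 2-D points to 1024x1024 paintings is what makes the architectures described next necessary.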

Improved GANs for Large Images

Although the basic GAN described above works well with small images (i.e. 64x64 pixels), there are issues with larger images (i.e. 1024x1024 pixels). The basic GAN architecture has difficulty converging on good results for large images due to the unstructured nature of the pixels. It can’t see the forest for the trees. Researchers at NVIDIA developed a series of improved methods that allow for the training of GANs with larger images. The first is called “Progressive Growing of GANs” [3].

The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, we add new layers that model increasingly fine details as training progresses. This both speeds the training up and greatly stabilizes it, allowing us to produce images of unprecedented quality. — Tero Karras et al., NVIDIA

The team at NVIDIA continued their work on using GANs to generate large, realistic images, naming their architecture StyleGAN [4]. They started with their Progressive Growing of GANs as a base model and added a Style Mapping Network, which injects style information at various resolutions into the Generator Network.

StyleGAN Component Diagram

The team further improved the image creation results with StyleGAN2, allowing the GAN to efficiently create high-quality images with fewer unwanted artifacts [5]. You can read more about these developments in Akria’s post, “From GAN basic to StyleGAN2”.

Prior Work to Create Art with GANs

Researchers have been looking to use GANs to create art since the GAN was introduced in 2014. A description of a system called ArtGAN was published in 2017 by Wei Ren Tan et al. from Shinshu University, Nagano, Japan [6]. Their paper proposes to extend GANs…

… to synthetically generate more challenging and complex images such as artwork that have abstract characteristics. This is in contrast to most of the current solutions that focused on generating natural images such as room interiors, birds, flowers and faces. — Wei Ren Tan et al., Shinshu University

A broader survey of using GANs to create art was conducted by Drew Flaherty for his master’s thesis at the Queensland University of Technology in Brisbane, Australia [7]. He experimented with various GANs, including basic GANs, CycleGAN [8], BigGAN [9], Pix2Pix, and StyleGAN. Of everything he tried, he liked StyleGAN the best.

The best visual result from the research came from StyleGAN. … Visual quality of the outputs were relatively high considering the model was only partially trained, with progressive improvements from earlier iterations showing more defined lines, textures and forms, sharper detail, and more developed compositions overall. — Drew Flaherty, Queensland University of Technology

For his experiments, Flaherty used a large library of artwork gleaned from various sources, including WikiArt.org, the Google Arts Project, Saatchi Art, and Tumblr blogs. He noted that not all of the source images are in the public domain, but he discusses the doctrine of fair use and its implications on ML and AI.

MachineRay

Overview

For my experiment, named MachineRay, I gathered images of abstract paintings from WikiArt.org, processed them, and fed them into StyleGAN2 at the size of 1024x1024. I trained the GAN for three weeks on a GPU using Google Colab. I then processed the output images by adjusting the aspect ratio and running them through another ANN for a super-resolution resize. The resultant images are 4096 pixels wide or tall, depending on the aspect ratio. Here’s a diagram of the components.

MachineRay Component Diagram

Gathering Source Images

To gather the source images, I wrote a Python script to scrape abstract paintings from WikiArt.org. Note that I filtered the images to only get paintings that were labeled in the “Abstract” genre, and only images that are labeled as being in the Public Domain. These include images that were published before 1925 or images that were created by artists who died before 1950. The top artists represented in the set are Wassily Kandinsky, Theo van Doesburg, Paul Klee, Kazimir Malevich, Janos Mattis-Teutsch, Giacomo Balla, and Piet Mondrian. The full source file is here; a simplified sketch of the approach is below.

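This sketch assumes a pre-collected list of painting-page URLs and pulls each image from the page’s og:image metadata; the real script is what walks the “Abstract” genre listing and applies the Public Domain filtering. The example URL and output folder are illustrative.

```python
import os
import requests
from bs4 import BeautifulSoup

# Hypothetical list of painting pages, already filtered to the "Abstract"
# genre and public-domain works by the full scraping script.
painting_pages = [
    "https://www.wikiart.org/en/wassily-kandinsky/composition-vii-1913",
]
out_dir = "abstract_paintings"
os.makedirs(out_dir, exist_ok=True)

for page_url in painting_pages:
    html = requests.get(page_url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # Grab the main image URL from the page's Open Graph metadata, if present.
    og_image = soup.find("meta", property="og:image")
    if og_image is None:
        continue
    image_url = og_image["content"]

    # Download the image and save it under the painting's page slug.
    image_bytes = requests.get(image_url, timeout=30).content
    file_name = os.path.join(out_dir, page_url.rstrip("/").split("/")[-1] + ".jpg")
    with open(file_name, "wb") as f:
        f.write(image_bytes)
    print("saved", file_name)
```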

I gathered about 900 images, but I removed images that had representational components or ones that were too small, cutting the number down to 850. Here is a random sampling of the source images.

Random Sample of Abstract Paintings from WikiArt.org in the Public Domain

Removing Frames

As you can see above, some of the paintings retain their wooden frames in the images, but some of them have the frames cropped out. For example, you can see the frame in Arthur Dove’s Storm Clouds. To make the source images consistent, and to allow the GAN to focus on the content of the paintings, I automatically removed the frames using a Python script. The full script is here; a simplified sketch of the approach is below.

The code opens each image and looks for square regions around the edges that have a different color from most of the painting. Once the edges are found, the image is cropped to omit the frame. Here are some pictures of source paintings before and after the frame removal.

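One simplified way to implement that idea is sketched below: sample the frame color from the outermost pixels, then scan inward from each edge until the pixels stop matching it, and crop there. The threshold and maximum crop fraction are illustrative values, not those from the full script.

```python
import numpy as np
from PIL import Image

def remove_frame(path, threshold=40, max_crop=0.25):
    """Crop away a border band whose color differs from the painting."""
    img = np.asarray(Image.open(path).convert("RGB")).astype(float)
    h, w, _ = img.shape

    # Estimate the frame color from the outermost rows of pixels.
    border_color = img[[0, -1], :, :].reshape(-1, 3).mean(axis=0)

    def first_inside(lines):
        # Index of the first row/column that no longer matches the frame color.
        for i, line in enumerate(lines):
            if np.abs(line.mean(axis=0) - border_color).mean() > threshold:
                return i
        return 0

    top = first_inside(img[: int(h * max_crop)])
    bottom = first_inside(img[::-1][: int(h * max_crop)])
    left = first_inside(img.transpose(1, 0, 2)[: int(w * max_crop)])
    right = first_inside(img.transpose(1, 0, 2)[::-1][: int(w * max_crop)])

    cropped = img[top : h - bottom, left : w - right]
    return Image.fromarray(cropped.astype(np.uint8))

# Example (hypothetical file names):
# remove_frame("storm_clouds.jpg").save("storm_clouds_cropped.jpg")
```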

Automatically Cropped Images from WikiArt.org in the Public Domain

Image Augmentation

Although 850 images may seem like a lot, it’s not really enough to properly train a GAN. If there isn’t enough variety of images, the GAN may overfit the model, which will yield poor results, or, worse yet, fall into the dreaded state of “mode collapse”, which will yield nearly identical images.

StyleGAN2 has a built-in feature to randomly mirror the source images left-to-right, which effectively doubles the number of sample images to 1,700. This is better, but still not great. I used a technique called Image Augmentation to increase the number of images by a factor of 7, making it 11,900 images. The full source file for the Image Augmentation I used is here; a simplified sketch is below.

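This sketch uses only Pillow and the standard library and produces six randomized variants per 1024x1024 source image, which together with the original gives the factor of 7. The rotation, zoom, and color-jitter ranges are illustrative.

```python
import random
from PIL import Image, ImageEnhance

def augment(img, n_variants=6, size=1024):
    """Yield rotated, scaled, cropped, and slightly color-shifted variants."""
    img = img.convert("RGB").resize((size, size), Image.LANCZOS)
    for _ in range(n_variants):
        # Small random rotation.
        out = img.rotate(random.uniform(-8, 8), resample=Image.BILINEAR)

        # Random zoom: crop a slightly smaller square, then scale back up.
        crop = int(size * random.uniform(0.85, 1.0))
        left = random.randint(0, size - crop)
        top = random.randint(0, size - crop)
        out = out.crop((left, top, left + crop, top + crop)).resize((size, size), Image.LANCZOS)

        # Mild color correction: jitter saturation and brightness a little.
        out = ImageEnhance.Color(out).enhance(random.uniform(0.9, 1.1))
        out = ImageEnhance.Brightness(out).enhance(random.uniform(0.95, 1.05))
        yield out

# Example (hypothetical file names):
# for i, variant in enumerate(augment(Image.open("painting_0001.jpg"))):
#     variant.save(f"painting_0001_aug{i}.jpg")
```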

The augmentation uses random rotation, scaling, cropping, and mild color correction to create more variety in the image samples. Note that I resize the images to 1024 by 1024 before applying the Image Augmentation. I will discuss the aspect ratio further down in this post. Here are some examples of Image Augmentation. The original is on the left, and there are six additional variations to the right.

Examples of Image Augmentation: Painting Images from WikiArt.org in the Public Domain

Training the GAN

I ran the training using Google Colab Pro. Using that service, I could run for up to 24 hours at a time on a high-end GPU, an NVIDIA Tesla P100 with 16 GB of memory. I also used Google Drive to retain the work in progress between runs. It took about 13 days to train the GAN, sending 5 million source images through the system. Here is a random sample of the results.

Sample Output from MachineRay

You can see from the sample of 28 images above that MachineRay produced paintings in a variety of styles, although there are some visual commonalities between them. There are hints to the styles in the source images, but no exact copies.

Adjusting the Aspect Ratio

Although the original source images had various aspect ratios, ranging from a thinner portrait shape to a wider landscape shape, I made them all dead square to help with the training of the GAN. In order to have a variety of aspect ratios for the output images, I imposed a new aspect ratio prior to the upscaling. Instead of just choosing a purely random aspect ratio, I created a function that chooses an aspect ratio that is based on the statistical distribution of aspect ratios in the source images. Here’s what the distribution looks like.

Aspect Ratio Distribution of Images from WikiArt.org in the Public Domain

The graph above plots the aspect ratio of all 850 source images. It ranges from about 0.5, which is a thin 1:2 ratio, to about 2.0, which is a wide 2:1 ratio. The chart shows four of the source images to indicate where they fall on the chart horizontally. My Python code maps a random number from 0 to 850 into an aspect ratio based on the distribution of the source images; a sketch of the idea is below.

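The sketch treats the sorted list of the 850 source ratios as an empirical distribution and indexes into it with the random number; the stand-in list below is illustrative, and the full project source is linked at the end of this post.

```python
import random

def random_aspect_ratio(source_ratios):
    """Pick an aspect ratio that follows the source images' distribution."""
    sorted_ratios = sorted(source_ratios)               # empirical distribution
    index = random.randint(0, len(sorted_ratios) - 1)   # 0..849 for 850 images
    return sorted_ratios[index]

# Stand-in list for illustration; the real list holds the ratios of all
# 850 source paintings.
ratios = [0.55, 0.8, 1.0, 1.0, 1.3, 1.6, 2.0]
print(random_aspect_ratio(ratios))
```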

I adjusted the MachineRay output from above to have varying aspect ratios in the pictures below. You can see that the images seem a bit more natural and less homogeneous with just this small change.

Sample Output from MachineRay with Varying Aspect Ratios

Super-Resolution Resizing

The images generated from MachineRay have a maximum height or width of 1024 pixels, which is OK for viewing on a computer, but not OK for printing. At 300 DPI, an image would only print at a size of about 3.5 inches. The images could be resized up, but the result would look very soft if printed at 12 inches. There is a technique called Image Super-Resolution (ISR) that uses ANNs to resize images while maintaining crisp features. For more information on Super-Resolution, check out Bharath Raj’s post here.

There is a nice open-source ISR system with pre-trained models available from Idealo, a German company. Their GAN model does a 4x resize using a GAN trained on photographs. I found that adding a little bit of random noise to the image prior to the ISR creates a painterly effect. The Python code I used to post-process the images is in the project repository linked at the end of this post; a sketch of the step is below.

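The sketch assumes Idealo’s image-super-resolution package and its pretrained “gans” weights for the RRDN model; the file names and the amount of noise are illustrative rather than exact settings.

```python
import numpy as np
from PIL import Image
from ISR.models import RRDN

# Load the GAN-trained 4x super-resolution model from Idealo's ISR package.
model = RRDN(weights="gans")

# Hypothetical file name for one MachineRay output image.
img = np.array(Image.open("machineray_output.png").convert("RGB"), dtype=np.float64)

# A touch of random noise before upscaling gives the result a painterly texture.
noisy = np.clip(img + np.random.normal(scale=3.0, size=img.shape), 0, 255).astype(np.uint8)

big = model.predict(noisy)  # 4x larger, e.g. roughly 1024 -> 4096 pixels
Image.fromarray(big).save("machineray_output_4x.png")
```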

You can see the results of adding noise and Image Super-Resolution resizing here. Note that the texture detail looks a bit like brushstrokes.

Left: Sample Image After Added Noise and ISR. Right: Close-up to Show Detail

Check out the gallery in Appendix A to see high-resolution output samples from MachineRay.

Next Steps

Additional work might include running the GAN at sizes greater than 1024x1024. Porting the code to run on Tensor Processing Units (TPUs) instead of GPUs would make the training run faster. Also, the ISR GAN from Idealo could be trained using paintings instead of photos. This may add a more realistic painterly effect to the images.

Acknowledgments

I would like to thank Jennifer Lim and Oliver Strimpel for their help and feedback on this project.

Source Code

All source code for this project is available on GitHub. A Google Colab for generating images is available here. The sources are released under the CC BY-NC-SA license.

Translated from: https://towardsdatascience.com/machineray-using-ai-to-create-abstract-art-39829438076a
