Neural Synesthesia: When Art Meets GANs


Neural Synesthesia is an AI art project that aims to create new and unique audiovisual experiences with artificial intelligence. It does this through collaborations between humans and generative networks. The results feel almost like organic art. Swirls of color and images blend together as faces, scenery, objects, and architecture transform to the music. The imagery swings between feeling utterly new and oddly familiar.



Neural Synesthesia was created by Xander Steenbrugge, an online content creator who got his start in data science while working on brain-computer interfaces. During his master's thesis, he helped build a system that classified imagined movement from brain signals. This system allowed patients suffering from locked-in syndrome to manipulate physical objects with their minds. The experience impressed upon Steenbrugge the importance of machine learning and the potential of AI technology to build amazing things.


Outside of Neural Synesthesia, Steenbrugge works with a startup using machine learning for drug discovery and runs a popular YouTube channel. He’s also working on wzrd.ai, a platform that augments audio with immersive, AI-generated video. In this interview, we talk about Neural Synesthesia’s inspiration and inner workings, and discuss AI and creativity.


What were the inspirations for Neural Synesthesia?

I’ve always had a fascination for aesthetics. Examples are mountain panoramas, indie game design, scuba diving in coral reefs, psychedelic experiences, and films by Tarkovsky. Beautiful visual scenes have the power to convey meaning without words. It’s almost like a primal, visual language we all speak intuitively.


When I saw the impressive advances in generative models (especially GANs), I started imagining where this could lead. Just like the camera and the projector brought about the film industry, I wondered what narratives could be built on top of the deep learning revolution. To get hands-on with this, my first idea was simply to tweak existing GAN codebases to allow for direct visualization of audio. This was how Neural Synesthesia was born.


How much work did you do for the first Neural Synesthesia piece? Did you face any unique challenges?

I think coding for the first rendered video took over six months because I was doing it in my spare time. The biggest challenge was how to manipulate the GAN’s latent input space using features extracted from the audio track. I wanted to create a satisfying match between visual and auditory perception for viewers.

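
A common building block for this kind of latent-space choreography is spherical interpolation (slerp) between latent keyframes, which keeps intermediate vectors at a plausible distance from the origin of the Gaussian latent space. This is a generic sketch of the technique, not the project's actual code:

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two GAN latent vectors, t in [0, 1]."""
    n0, n1 = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(n0, n1), -1.0, 1.0))
    if omega < 1e-8:  # (nearly) parallel vectors: fall back to linear blending
        return (1 - t) * z0 + t * z1
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

# 30 video frames morphing between two latent keyframes
rng = np.random.default_rng(42)
z0, z1 = rng.standard_normal(512), rng.standard_normal(512)
frames = [slerp(z0, z1, u) for u in np.linspace(0.0, 1.0, 30)]
```

Each interpolated vector would be fed to the generator to render one frame; audio features can then modulate the interpolation speed or push the trajectory along extra directions.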

Here’s a little insight into what I do: I apply a Fourier Transform to extract time-varying frequency components from the audio. I also perform harmonic/percussive decomposition, which basically separates the melody from the rhythmic components of the track. These three signals (instantaneous frequency content, melodic energy, and beats) are then combined to manipulate the GAN’s latent space, resulting in visuals that are directly controlled by the audio.

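
The pipeline described above can be sketched in a few lines. This is a minimal illustration, not Steenbrugge's actual code: the median-filtering harmonic/percussive split, the window sizes, and the random latent "directions" are all assumptions for demonstration.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft

sr = 22050
t = np.linspace(0, 2.0, 2 * sr, endpoint=False)
# Synthetic track: a steady 440 Hz tone (melody) plus a click every 0.25 s (rhythm).
audio = 0.5 * np.sin(2 * np.pi * 440 * t)
audio[:: sr // 4] += 1.0

# 1) Fourier transform: time-varying frequency content, one column per frame.
freqs, times, Z = stft(audio, fs=sr, nperseg=2048)
S = np.abs(Z)  # magnitude spectrogram, shape (freq_bins, frames)

# 2) Harmonic/percussive decomposition by median filtering:
#    harmonic content is smooth across time, percussive content across frequency.
H = median_filter(S, size=(1, 17))
P = median_filter(S, size=(17, 1))
harmonic_energy = (S * (H >= P)).sum(axis=0)   # melodic energy per frame
percussive_energy = (S * (P > H)).sum(axis=0)  # beat energy per frame

# 3) Combine the three signals to drive the GAN's latent input.
rng = np.random.default_rng(0)
directions = rng.standard_normal((3, 512))     # 3 hypothetical latent directions
signals = np.stack([S.mean(axis=0), harmonic_energy, percussive_energy])
signals = signals / (signals.max(axis=1, keepdims=True) + 1e-9)
latents = signals.T @ directions               # one 512-d latent per video frame
```

Each row of `latents` would then be decoded by the generator into one video frame; the real system presumably smooths and scales these trajectories far more carefully.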

Is every image dataset unique? How do you collect images for these datasets, and how many images do you need?

I spent a lot of time collecting large and diverse image training data to create interesting generative models. Unlike most GAN work, these datasets have aesthetics rather than realism as their primary goal. Experimenting with various blends of image collections is time consuming, since GAN training requires lots of compute and I don’t exactly have a data center at my disposal.


Most of the datasets I use are image sets I’ve encountered over the years. I saved them because I knew one day I’d have a use for them. I’ve always had an interest in aesthetics so when I discover something that triggers that sixth sense, I save it.


Most GAN papers use datasets of more than 50,000 images, but in practice you can get away with far fewer examples. The first trick is to start from a GAN model that has already been pre-trained on a large dataset. This means the convolutional filters in the model are already well-shaped and contain useful information about the visual world. Second, there’s data augmentation, which is basically flipping or rotating an image to effectively increase the amount of training data. Since I don’t really care about sample realism, I can actually afford to do very aggressive image augmentation. This results in many more training images than actual source images. For example, the model I used for a recent performance at Tate Modern had only 3,000 real images, aggressively augmented to a training set of around 70,000.

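
As a toy illustration of how aggressive augmentation multiplies a small dataset, here is a sketch using only flips and 90-degree rotations (8 variants per image). The actual pipeline is not public; reaching ratios like 3,000 → 70,000 would additionally need random crops, color shifts, and so on.

```python
import numpy as np

def augment(img):
    """All combinations of horizontal flip and 90-degree rotation.
    Fine when aesthetics, not realism, is the goal: orientation doesn't matter."""
    variants = []
    for flipped in (img, np.fliplr(img)):
        for k in range(4):
            variants.append(np.rot90(flipped, k))
    return variants  # 8 variants per source image

# a hypothetical tiny dataset of three 64x64 RGB images
dataset = [np.random.rand(64, 64, 3) for _ in range(3)]
augmented = [v for img in dataset for v in augment(img)]
```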

Recently, a lot of new research explicitly addresses the low-data regime for GANs (such as what you can find here, here, and here). My current codebase leverages these techniques to train GANs with only a few hundred images.


You talk about Neural Synesthesia as a collaboration between yourself and AI. What kind of potential do you see for the future of creative projects utilizing AI technology?

This is actually the most interesting part of the entire project. I usually set out with specific intentions as to what type of visual I want to create. I then curate my dataset, tune the parameters of the training script, and start training the model. A full training run usually requires a few days to converge. Very quickly though, the model starts returning samples that are often unexpected and surprising. This sets an intriguing feedback loop into motion: I change the code of the model, the model responds with different samples, I react, and the loop continues. The creative process is no longer fully under my control; I am effectively collaborating with an AI system to create these works.


I truly believe this is the biggest strength of this approach: you are not limited by your own imagination. There’s an entirely alien system that is also influencing the same space of ideas, often in unexpected and interesting ways. This leads you as a creator into areas you never would have wandered by yourself.


Looking at the tremendous pace of progress in the field of AI strongly motivates me to imagine what might be possible 10 years from now. After all, modern Deep Learning is only 8 years old! I expect that Moore’s law will continue to bring more powerful computing capabilities, that AI models will continue to scale with more compute, and that the possibilities of this medium will follow this exponential trend.


Neural Synesthesia in its current form is a prototype. It’s a version 0.1 of a grander idea to leverage deep learning as the core component of the advanced interactive media experiences of the future.


What kind of creative works do you have planned for the future of Neural Synesthesia? Do you have any goals or future plans?

I’ve always been fascinated by the overview effect, where astronauts describe how seeing the Earth in its entirety from space profoundly changes their worldview, kindling the awareness that we are all part of the same, fragile ecosystem, suspended in the blackness of space.


To me, this is great evidence that profound, alienating experiences can have spectacular effects on people’s choices and behaviors. And what we need is a shift in perception away from tribal feelings of us versus them. We need to move towards a global society with common goals and common challenges.


Our world is increasingly facing global issues that are deeply rooted in our locally-centered world views. These views are deeply rooted in our genes; we evolved in small tribes that only needed to attend to their local environments. However, the world is evolving towards a globally connected web of events, where the present can no longer be disconnected from the system as a whole. For example, look at climate change, and people fighting over artificially drawn borders of nationality, race, or even gender.


As such, my long-term vision is to create rich, immersive experiences with the power to shift perspectives. Cinema 2.0, if you will. I imagine an interactive experience, where a group of people can enter an AI-generated world (e.g. using Virtual Reality headsets) where the visual scenery is so utterly alien and breathtaking that it forces the mind to temporarily halt its usual narrative of describing what’s going on. This is essentially the goal of meditation: to experience the world as it is, emphasizing the experience of the present moment rather than the narrative we construct around it.


The goal then, is to mimic the perceptual shift one can experience from a positive psychedelic experience, meditative insight, or a trip to space. To realize that our ‘normal’ world view is just a tiny sliver of what it is possible to experience. I believe this perceptual shift is probably the most unique human characteristic. It allows the great wonder of imagination to power our world, and is the most powerful tool we have to tackle the world’s largest challenges.


From a technology standpoint, how far away are we from creating these basic “cinema 2.0” experiences?

I would say that from a technical point of view, we’re getting very close. The latest generative models (e.g. StyleGAN2 or BigGAN-deep) are able to create very realistic samples with very high diversity. What is lacking at present are creative tools that let non-coders put this technology to creative use. The main challenge, at least for me, is to create a compelling narrative.


You can see more of Steenbrugge’s Neural Synesthesia work at its dedicated homepage, and try out wzrd.ai here. He’s also active on YouTube and Twitter, and open to collaborating with other creatives who have similar ideas and aspirations. You can contact him at neuralsynesthesia@gmail.com.


Original article reposted with permission.


Translated from: https://medium.com/datadriveninvestor/neural-synesthesia-when-art-meets-gans-6453c7c0c5b8
