Understanding Attention in Recurrent Neural Networks

This article takes a deep look at recurrent neural networks (RNNs), with a particular focus on attention mechanisms in RNNs. Based on a translation of "Understanding Attention in Recurrent Neural Networks," it explains how this mechanism improves a model's performance when processing sequential data.

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

In 2017, the Google Brain team published the famous paper "Attention Is All You Need," which kicked off the transformer and pre-trained model revolution. Before that paper, Google had already been exploring attention-based models for a few years. Today, I would like to revisit an earlier Google paper from 2016, which was the first paper I read on the subject of attention.

Attention is a cognitive ability that we rely on all the time. Just reading this article is a complicated task from a neuroscientific standpoint. Right now you are probably being bombarded with emails, news, notifications on your phone, the usual annoying coworker interrupting, and other distractions that pull your brain in many directions. In order to read this short article, or to perform many other cognitive tasks, you need to focus; you need attention.

Attention is a cognitive skill that is pivotal to the formation of knowledge. However, the dynamics of attention remained a mystery to neuroscientists for centuries, and only recently have there been major breakthroughs that help explain how attention works. In the context of deep learning programs, building in attention dynamics seems like an obvious step toward improving what models can learn and adapting them to different scenarios. Building attention mechanisms into deep learning systems is a nascent and very active area of research. In 2016, researchers from the Google Brain team published a paper that detailed some of the key models that can be used to simulate attention in deep neural networks.

How Attention Works

In order to understand attention in deep learning systems, it might be useful to take a look at how this cognitive phenomenon takes place in the human brain. From the perspective of neuroscience, attention is the ability of the brain to selectively concentrate on one aspect of the environment while ignoring other things. Current research identifies two main types of attention, each related to different areas of the brain. Object-based attention refers to the ability of the brain to focus on specific objects, such as a picture in a section of this article. Spatial-based attention is mostly related to focus on specific locations. Both types of attention are relevant in deep learning models. While object-based attention can be used in systems such as image recognition or machine translation, spatial attention is relevant in deep reinforcement learning scenarios such as self-driving vehicles.

Attentional Interfaces in Deep Neural Networks

When it comes to deep learning systems, different techniques have been created to simulate different types of attention. The Google research paper focuses on four fundamental models that are relevant to recurrent neural networks (RNNs). Why RNNs specifically? Well, RNNs are a type of network mostly used to process sequential data and extract higher-level knowledge. As a result, RNNs are often used as a second step to refine the work of other neural network models such as convolutional neural networks (CNNs) or generative interfaces. Building attention mechanisms into RNNs can help improve the knowledge of different deep neural models. The Google Brain team identified the following four techniques for building attention into RNN models:

· Neural Turing Machines: One of the simplest attentional interfaces, Neural Turing Machines (NTMs) add a memory structure to traditional RNNs. Using a memory structure allows NTMs to produce an "attention distribution" that describes the area of memory the model should focus on. Implementations of NTMs can be found in many popular deep learning frameworks such as TensorFlow and PyTorch. A minimal sketch of the addressing mechanism follows the figure below.

Image source: https://distill.pub/2016/augmented-rnns/
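To make the idea concrete, here is a minimal NumPy sketch of the kind of content-based soft addressing NTMs use to read from memory. It is an illustration under my own assumptions rather than a faithful NTM implementation: the controller emits a key, similarity against each memory row becomes an attention distribution through a softmax, and the read is a weighted sum over all rows, which keeps the whole operation differentiable.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_addressing(memory, key, beta):
    """Soft NTM-style read: attend over memory rows by similarity to a key.

    memory: (N, M) array of N memory slots with M dimensions each
    key:    (M,) query vector emitted by the controller
    beta:   sharpness of the resulting attention distribution
    """
    # Cosine similarity between the key and every memory slot.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    # Attention distribution over memory locations.
    weights = softmax(beta * sims)
    # The read vector is a weighted sum of all slots, so gradients
    # flow through every location rather than a single hard pick.
    return weights @ memory, weights

memory = np.random.randn(8, 4)              # 8 slots, 4 dims each
key = memory[3] + 0.1 * np.random.randn(4)  # noisy query for slot 3
read_vec, w = content_addressing(memory, key, beta=5.0)
print(np.round(w, 2))                       # attention mass concentrates on slot 3
```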

· Adaptive Computation Time: This is a brand-new technique that allows RNNs to perform multiple steps of computation for each time step. How is this related to attention? Very simply: standard RNNs perform the same amount of computation at every step. Adaptive computation time applies an attention distribution over the number of computation steps to run at each time step, making it possible to put more emphasis on specific parts of the model. A toy sketch of the halting mechanism follows the figure below.

Image source: https://distill.pub/2016/augmented-rnns/
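As a rough illustration of the halting idea (a toy formulation of my own, not the exact mechanism from the adaptive computation time paper): at each input step the RNN keeps computing in an inner loop, emitting a halting probability per inner step, and the step's output is the probability-weighted mixture of the inner states, so the network effectively learns how long to "ponder" each input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def act_step(state, x, cell, halt_weights, eps=0.01, max_ponder=10):
    """One ACT-style time step: ponder until halting mass reaches 1 - eps."""
    total_p, outputs, n = 0.0, [], 0
    while total_p < 1.0 - eps and n < max_ponder:
        state = cell(state, x)             # one inner computation step
        p = sigmoid(state @ halt_weights)  # halting probability for this step
        p = min(p, 1.0 - total_p)          # last step takes the remainder
        outputs.append((p, state.copy()))
        total_p += p
        n += 1
    # The step's output is the halting-probability-weighted mixture of
    # inner states, keeping the whole loop differentiable.
    return sum(p * s for p, s in outputs), n

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
cell = lambda s, x: np.tanh(W @ s + x)     # toy recurrent cell
state0, x = np.zeros(8), rng.normal(size=8)
out, steps = act_step(state0, x, cell, halt_weights=rng.normal(size=8))
print(steps, "inner steps used")
```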

· Neural Programmer: A fascinating new area in the deep learning space, neural programmer models focus on learning to create programs in order to solve a specific task. In fact, they learn to generate such programs without needing examples of correct programs; they discover how to produce programs as a means to accomplishing some task. Conceptually, neural programmer techniques try to bridge the gap between neural networks and traditional programming techniques, and they can be used to develop attention mechanisms in deep learning models. A toy sketch of the soft operation selection behind this idea follows the figure below.

Image source: https://distill.pub/2016/augmented-rnns/
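One way to picture the core trick (a toy sketch under my own assumptions, not the actual Neural Programmer model): rather than hard-selecting one operation, apply every operation in a small library and blend the results with a softmax over operation scores, so the choice of "program step" stays differentiable and can be trained end to end.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# A tiny library of "program" operations over a numeric table column.
OPS = {
    "sum":   lambda col: col.sum(),
    "mean":  lambda col: col.mean(),
    "max":   lambda col: col.max(),
    "count": lambda col: float(len(col)),
}

def soft_op_select(column, op_logits):
    """Blend every op's result by a softmax over operation scores.

    In the real model the logits would come from an RNN reading the
    question; here they are simply given as inputs.
    """
    weights = softmax(op_logits)
    results = np.array([op(column) for op in OPS.values()])
    return weights @ results  # soft, differentiable "program" output

column = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
logits = np.array([0.1, 0.0, 3.0, 0.0])  # model leaning toward "max"
print(round(float(soft_op_select(column, logits)), 3))
```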

· Attentional Interfaces: Attentional interfaces use an RNN model to focus on specific sections of another neural network. A classic example of this technique can be found in image recognition models that use a CNN-RNN duplex. In this architecture, the RNN focuses on specific parts of the images processed by the CNN in order to refine them and improve the quality of the extracted knowledge. A minimal sketch of this attention step follows the figure below.

Image source: https://distill.pub/2016/augmented-rnns/
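A minimal sketch of such an attentional interface (the shapes and names here are illustrative assumptions, not a specific published architecture): an RNN decoder state queries a grid of CNN feature vectors with scaled dot-product attention and consumes the resulting context vector.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, query):
    """Dot-product attention of an RNN state over CNN feature vectors.

    features: (L, D) array -- L spatial locations, D channels each
    query:    (D,) array   -- the decoder's hidden state at this step
    """
    # Scaled dot-product scores between the state and every location.
    scores = features @ query / np.sqrt(features.shape[1])
    alpha = softmax(scores)       # attention over image locations
    context = alpha @ features    # weighted summary fed back to the RNN
    return context, alpha

feats = np.random.randn(49, 64)   # e.g. a 7x7 CNN feature map, flattened
h = np.random.randn(64)           # decoder state at one time step
ctx, alpha = attend(feats, h)
print(alpha.shape, ctx.shape)     # (49,) (64,)
```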

Attention is becoming one of the most important elements of modern neural network architectures, but, at the same time, we are just getting started in this area. The fascinating feature of attention is that it is not a brand-new neural network architecture but a way to augment existing architectures with new capabilities. Attention-based architectures like transformers have become one of the most important developments in recent years of deep learning, and we can't wait to see what's next.

Translated from: https://medium.com/dataseries/understanding-attention-in-recurrent-neural-networks-d8ae05f558f4
