Uber Thomas: Paper Compilation

Thomas Miconi

Working

Neural networks with differentiable structure


The impossibility of “fairness”: a generalized impossibility result for decisions

2020

Enabling Continual Learning with Differentiable Hebbian Plasticity.

Continual learning is the problem of sequentially learning new tasks or knowledge while protecting previously acquired knowledge. However, catastrophic forgetting poses a grand challenge for neural networks performing such learning processes. Thus, neural networks that are deployed in the real world often struggle in scenarios where the data distribution is non-stationary (concept drift), imbalanced, or not always fully available, i.e., rare edge cases. We propose a Differentiable Hebbian Consolidation model which is composed of a Differentiable Hebbian Plasticity (DHP) Softmax layer that adds a rapid-learning plastic component (compressed episodic memory) to the fixed (slow changing) parameters of the softmax output layer, enabling learned representations to be retained for a longer timescale. We demonstrate the flexibility of our method by integrating well-known task-specific synaptic consolidation methods to penalize changes in the slow weights that are important for each target task. We evaluate our approach on the Permuted MNIST, Split MNIST and Vision Datasets Mixture benchmarks, and introduce an imbalanced variant of Permuted MNIST, a dataset that combines the challenges of class imbalance and concept drift. Our proposed model requires no additional hyperparameters and outperforms comparable baselines by reducing forgetting.
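The core mechanism is an output layer that combines slow weights with a fast, Hebbian plastic component. Below is an illustrative PyTorch-style sketch of that idea, with hypothetical class and variable names; it is not the authors' code, and it omits the task-specific consolidation penalty on the slow weights described in the abstract.

```python
import torch
import torch.nn as nn

class PlasticSoftmaxLayer(nn.Module):
    """Sketch: logits come from slow weights plus a fast Hebbian trace scaled by
    learned coefficients alpha, so recent class-feature co-activations act as a
    compressed episodic memory on top of the slow weights."""
    def __init__(self, d, n_classes):
        super().__init__()
        self.w_slow = nn.Parameter(0.01 * torch.randn(d, n_classes))
        self.alpha = nn.Parameter(0.01 * torch.randn(d, n_classes))

    def forward(self, h, hebb):
        # h: (batch, d) features; hebb: (d, n_classes) fast Hebbian trace carried across steps.
        return h @ (self.w_slow + self.alpha * hebb)

def hebb_update(hebb, h, y_onehot, eta=0.1):
    """Decaying accumulation of feature/target co-activity; in the paper this update
    is itself differentiated through during training."""
    return (1 - eta) * hebb + eta * (h.t() @ y_onehot) / h.shape[0]
```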


Learning to continually learn

Continual lifelong learning requires an agent or model to learn many sequentially ordered tasks, building on previous knowledge without catastrophically forgetting it. Much work has gone towards preventing the default tendency of machine learning models to catastrophically forget, yet virtually all such work involves manually-designed solutions to the problem. We instead advocate meta-learning a solution to catastrophic forgetting, allowing AI to learn to continually learn. Inspired by neuromodulatory processes in the brain, we propose A Neuromodulated Meta-Learning Algorithm (ANML). It differentiates through a sequential learning process to meta-learn an activation-gating function that enables context-dependent selective activation within a deep neural network. Specifically, a neuromodulatory (NM) neural network gates the forward pass of another (otherwise normal) neural network called the prediction learning network (PLN). The NM network thus also indirectly controls selective plasticity (i.e. the backward pass) of the PLN. ANML enables continual learning without catastrophic forgetting at scale: it produces state-of-the-art continual learning performance, sequentially learning as many as 600 classes (over 9,000 SGD updates).
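A minimal sketch of the described gating (hypothetical layer sizes and names; the actual model uses convolutional networks and is meta-trained with an inner/outer loop, which is not shown here):

```python
import torch
import torch.nn as nn

class ANMLGatingSketch(nn.Module):
    """Sketch: a neuromodulatory (NM) network produces a per-feature gate that multiplies
    the prediction learning network's (PLN) activations. Because the gate scales the forward
    pass, it also scales the gradients flowing back into the PLN, giving context-dependent
    selective activation and, indirectly, selective plasticity."""
    def __init__(self, d_in, d_hidden, n_classes):
        super().__init__()
        self.pln = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.nm = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Sigmoid())
        self.out = nn.Linear(d_hidden, n_classes)

    def forward(self, x):
        gate = self.nm(x)                     # context-dependent gate in [0, 1]
        return self.out(self.pln(x) * gate)  # gated forward pass of the PLN
```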


2019

Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity

The impressive lifelong learning in animal brains is primarily enabled by plastic changes in synaptic connectivity. Importantly, these changes are not passive, but are actively controlled by neuromodulation, which is itself under the control of the brain. The resulting self-modifying abilities of the brain play an important role in learning and adaptation, and are a major basis for biological reinforcement learning. Here we show for the first time that artificial neural networks with such neuromodulated plasticity can be trained with gradient descent. Extending previous work on differentiable Hebbian plasticity, we propose a differentiable formulation for the neuromodulation of plasticity. We show that neuromodulated plasticity improves the performance of neural networks on both reinforcement learning and supervised learning tasks. In one task, neuromodulated plastic LSTMs with millions of parameters outperform standard LSTMs on a benchmark language modeling task (controlling for the number of parameters). We conclude that differentiable neuromodulation of plasticity offers a powerful new framework for training neural networks.
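The key formal change relative to plain differentiable Hebbian plasticity is that the plasticity rate is no longer a fixed learned constant: a neuromodulatory signal M(t), produced by the network itself, gates each update of the Hebbian trace. A minimal sketch of that update (assumed tensor shapes, not the released code) is below; the trace then enters the effective weights exactly as in the plastic-layer sketch under the 2018 entry further down.

```python
import torch

def neuromodulated_hebb_update(hebb, pre, post, m):
    """One 'simple neuromodulation' step (illustrative sketch with assumed shapes).
    hebb: (batch, n, n) plastic trace; pre, post: (batch, n) activities at successive steps;
    m: (batch, 1) neuromodulatory signal M(t) computed by the network itself."""
    delta = m.unsqueeze(-1) * torch.einsum('bi,bj->bij', pre, post)  # M(t) gates the Hebbian product
    return torch.clamp(hebb + delta, -1.0, 1.0)                      # keep the trace bounded
```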


2018

Differentiable plasticity: training plastic neural networks with backpropagation

How can we build agents that keep learning from experience, quickly and efficiently, after their initial training? Here we take inspiration from the main mechanism of learning in biological brains: synaptic plasticity, carefully tuned by evolution to produce efficient lifelong learning. We show that plasticity, just like connection weights, can be optimized by gradient descent in large (millions of parameters) recurrent networks with Hebbian plastic connections. First, recurrent plastic networks with more than two million parameters can be trained to memorize and reconstruct sets of novel, high-dimensional (1,000+ pixels) natural images not seen during training. Crucially, traditional non-plastic recurrent networks fail to solve this task. Furthermore, trained plastic networks can also solve generic meta-learning tasks such as the Omniglot task, with competitive results and little parameter overhead. Finally, in reinforcement learning settings, plastic networks outperform a non-plastic equivalent in a maze exploration task. We conclude that differentiable plasticity may provide a powerful novel approach to the learning-to-learn problem.
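In this formulation each connection has a fixed weight, a learned plasticity coefficient, and a Hebbian trace that evolves during the episode; all of it is differentiable, so gradient descent tunes how plastic every connection is. A minimal PyTorch-style sketch (hypothetical class and variable names, not the authors' released code):

```python
import torch
import torch.nn as nn

class PlasticRNNCell(nn.Module):
    """One step of a recurrent layer with differentiable Hebbian plasticity."""
    def __init__(self, n):
        super().__init__()
        self.w = nn.Parameter(0.01 * torch.randn(n, n))      # fixed (slow) weights
        self.alpha = nn.Parameter(0.01 * torch.randn(n, n))  # per-connection plasticity coefficients
        self.eta = nn.Parameter(torch.tensor(0.01))          # learned plasticity rate

    def forward(self, x_prev, hebb, inp):
        # Effective weight = fixed part + plastic part (alpha scales the Hebbian trace).
        w_eff = self.w + self.alpha * hebb                    # (batch, n, n)
        x = torch.tanh(torch.einsum('bi,bij->bj', x_prev, w_eff) + inp)
        # Hebbian trace: decaying running average of pre*post co-activity.
        hebb = (1 - self.eta) * hebb + self.eta * torch.einsum('bi,bj->bij', x_prev, x)
        return x, hebb

# Usage: unroll over a sequence and backpropagate through both x and hebb,
# so gradient descent shapes w, alpha and eta jointly.
cell = PlasticRNNCell(64)
x = torch.zeros(8, 64); hebb = torch.zeros(8, 64, 64)
for t in range(20):
    x, hebb = cell(x, hebb, torch.randn(8, 64))
```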


2017

Biologically plausible learning in recurrent neural networks reproduces neural dynamics observed during cognitive tasks

Neural activity during cognitive tasks exhibits complex dynamics that flexibly encode task-relevant variables. Recurrent neural networks operating in the near-chaotic regime, which spontaneously generate rich dynamics, have been proposed as a model of cortical computation during cognitive tasks. However, existing methods for training these networks are either biologically implausible, and/or require a continuous, real-time error signal to guide the learning process. Here we show that a biologically plausible learning rule can train such recurrent networks, guided solely by delayed, phasic rewards at the end of each trial. Networks operating under this learning rule successfully learn nontrivial tasks requiring flexible (context-dependent) associations, memory maintenance, nonlinear mixed selectivities, and coordination among multiple outputs. Furthermore, applying this method to learn various tasks from the experimental literature, we show that the resulting networks replicate complex dynamics previously observed in animal cortex, such as dynamic encoding of task features, switching from stimulus-specific to response-specific representations, and selective integration of sensory input streams. The rule also successfully trains networks with nonnegative responses and separate excitatory and inhibitory neurons observing Dale’s law. We conclude that recurrent neural networks offer a plausible model of cortical dynamics during both learning and performance of flexible behavior.


Learning: Neural networks subtract and conquer

How to learn from feedback

Two theoretical studies reveal how networks of neurons may behave during reward-based learning.

To thrive in their environments, animals must learn how to process lots of inputs and take appropriate actions (Figure 1A). This sort of learning is thought to involve changes in the ability of synapses (the junctions between neurons) to transmit signals, with these changes being facilitated by rewards such as food. However, reward-based learning is difficult because reward signals do not provide specific instructions for individual synapses on how they should change. Moreover, while the latest algorithms for reinforcement learning achieve human-level performance on many problems (see, for example, Mnih et al., 2015), we still do not fully understand how brains learn from rewards. Now, in eLife, two independent theoretical studies shed new light on the neural mechanisms of learning.

The studies address two complementary aspects of reward-based learning in recurrent neuronal networks – artificial networks of neurons that exhibit dynamic, temporally-varying activity. In both studies, actions are generated by a recurrent network (the “decision network”) that is composed of hundreds of interconnected neurons that continuously influence each other’s activity (Figure 1). The decision network integrates sensory information about the state of the environment and responds with an action that may or may not result in a reward. The network can also change the ability of individual synapses to transmit signals, referred to as synapse strength. Over a period of time, increasing the strength of synapses that promote an action associated with a reward leads to the network choosing actions that receive rewards more often, which results in learning.
At the core of both studies lies a classic algorithm for reinforcement learning known as REINFORCE, which aims to maximize the expected reward in such scenarios (Figure 1A; Williams, 1992). In this algorithm, the strength of the synapse that connects neuron j to neuron i, W_ij, changes to W_ij + α·E_ij(t)·(R(t) − R_b), where α is a constant, E_ij is a quantity called the eligibility, t is time, R is the reward and R_b is a quantity called the reward baseline. The eligibility E_ij(t) expresses how much a small change of W_ij affects the action taken by the decision network at time t.

The conceptual simplicity of REINFORCE and the fact that it can be applied to the tasks commonly studied in neuroscience labs make it an attractive starting point to study the neural mechanisms of reward-based learning. Yet, this algorithm raises two fundamental questions. Firstly, how can a synapse estimate its own eligibility, using only locally-available information? Indeed, in a recurrent network, a change in synapse strength can influence a third neuron, implying that the eligibility depends on the activity of that third neuron, which the synapse will have never seen. Perhaps more importantly, in scenarios where the reward arrives after the network has produced long sequences of actions, the synapse must search the stream of recently experienced electrical signals for those that significantly influenced the action choice, so that the corresponding synapses can be reinforced. Secondly, how can the network compute an adequate reward baseline R_b?

In one of the papers, Thomas Miconi of the Neurosciences Institute in La Jolla reports, somewhat surprisingly, that simply accumulating over time a superlinear function (such as f(x) = x³) of the product of the electrical signals on both sides of the synapse returns a substitute for the optimal synapse eligibility that works well in practice (Miconi, 2017). This form of eligibility turns REINFORCE into a rule for the ability of synapses to strengthen or weaken (a property known as synaptic plasticity) that is more biologically realistic than the original optimal REINFORCE algorithm (Figure 1B) and is similar in spirit to models of synaptic plasticity involving neuromodulators such as dopamine or acetylcholine (Frémaux and Gerstner, 2016).
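A toy sketch of this rule (illustrative only, with simplified network dynamics rather than the paper's exact model): over a trial each synapse accumulates the cube of the product of its pre- and post-synaptic signals, and at the end of the trial the weights are updated with the baseline-subtracted reward, as in the REINFORCE rule above.

```python
import numpy as np

def run_trial(W, inputs, rng=np.random.default_rng(0)):
    """Accumulate superlinear eligibilities over one trial.
    W: (n, n) recurrent weights; inputs: sequence of (n,) input vectors."""
    x = np.zeros(W.shape[0])
    eligibility = np.zeros_like(W)
    for inp in inputs:
        x_new = np.tanh(W @ x + inp + 0.1 * rng.standard_normal(len(x)))  # noisy exploration
        eligibility += np.outer(x_new, x) ** 3   # superlinear function of post * pre, accumulated
        x = x_new
    return x, eligibility

def apply_reward(W, eligibility, reward, baseline, alpha=0.01):
    """End-of-trial update: Delta W = alpha * E * (R - R_b), the REINFORCE rule given earlier."""
    return W + alpha * eligibility * (reward - baseline)
```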

Miconi’s practical use of a superlinear function seems key to successful learning in the presence of delayed rewards. This nonlinearity tends to discard small (and likely inconsequential) co-fluctuations in electrical activity on both sides of the synapse, while amplifying the larger ones. While a full understanding of the success of this rule will require more analysis, Miconi convincingly demonstrates successful training of recurrent networks on a variety of tasks known to rely on complex internal dynamics. Learning also promotes the emergence of collective dynamics similar to those observed in real neural circuits (for example, Stokes et al., 2013; Mante et al., 2013).

As predicted by the theory of REINFORCE (Peters and Schaal, 2008), Miconi found it essential to subtract a baseline reward (R_b) from the actual reward (R) obtained at the end of the trial. While Miconi simply assumes that such predictions are available, Francis Song, Guangyu Yang and Xiao-Jing Wang of New York University and NYU Shanghai wondered how the brain could explicitly learn such detailed, dynamic reward predictions (Song et al., 2017). Alongside the main decision network, they trained a second recurrent network, called the “value network”, to continuously predict the total future reward on the basis of past activity in the decision network (including past actions; Figure 1C). These reward predictions were then subtracted from the true reward to guide learning in the decision network. Song et al. were also able to train networks on an impressive array of diverse cognitive tasks, and found compelling similarities between the dynamics of their decision networks and neural recordings.
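A minimal sketch of this baseline scheme (hypothetical shapes and names, not Song et al.'s implementation): a small recurrent value network reads the decision network's activity, predicts the reward, and that prediction serves as R_b, while the value network itself is trained by regression toward the reward actually received.

```python
import torch
import torch.nn as nn

value_net = nn.GRU(input_size=64, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 1)

decision_activity = torch.randn(8, 50, 64)       # (batch, time, units) from the decision network
reward = torch.randn(8, 1)                       # reward delivered at the end of each trial

states, _ = value_net(decision_activity)
baseline = readout(states[:, -1])                # R_b predicted from past activity
advantage = reward - baseline.detach()           # (R - R_b) guides the decision network's updates
value_loss = ((reward - baseline) ** 2).mean()   # trains the value network itself
```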

Importantly, although Song et al. used synapse eligibilities (with a few other machine learning tricks) that are not biologically plausible to train both networks optimally, their setup now makes it possible to ask other questions related to how neurons represent uncertainty and value. For example, when it is only possible to observe part of the surrounding environment, optimal behavior often requires individuals to take their own internal uncertainty about the state of the world into account (e.g. allowing an animal to opt for lower, but more certain rewards). Networks trained in such contexts are indeed found to select actions on the basis of an internal sense of uncertainty on each trial. Song et al. tested their model in a simple economic decision-making task where in each trial the network is offered a choice of two alternatives carrying different amounts of rewards. They found that there are neurons in the value network that exhibit selectivity to offer value, choice and value, or choice alone. This is in agreement with recordings from the brains of monkeys performing the same task.

The complementary findings of these two studies could be combined into a unified model of reward-based learning in recurrent networks. To be able to build networks that not only behave, but also learn, like animals promises to bring us closer to understanding the neural basis of behavior. However, progress from there will rely critically on our ability to analyze the time-dependent strategies used by trained networks (Sussillo and Barak, 2013), and to identify neural signatures of such strategies.


The impossibility of “fairness”: a generalized impossibility result for decisions

Various measures can be used to estimate bias or unfairness in a predictor. Previous work has already established that some of these measures are incompatible with each other. Here we show that, when groups differ in prevalence of the predicted event, several intuitive, reasonable measures of fairness (probability of positive prediction given occurrence or non-occurrence; probability of occurrence given prediction or non-prediction; and ratio of predictions over occurrences for each group) are all mutually exclusive: if one of them is equal among groups, the other two must differ. The only exceptions are for perfect, or trivial (always-positive or always-negative) predictors. As a consequence, any non-perfect, non-trivial predictor must necessarily be “unfair” under two out of three reasonable sets of criteria. This result readily generalizes to a wide range of well-known statistical quantities (sensitivity, specificity, false positive rate, precision, etc.), all of which can be divided into three mutually exclusive groups. Importantly, the result applies to all predictors, whether algorithmic or human. We conclude with possible ways to handle this effect when assessing and designing prediction methods.
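A small numerical illustration of the result (toy numbers, not from the paper): if two groups differ in prevalence but the predictor has identical sensitivity and specificity in both, then precision and the ratio of predictions to occurrences necessarily differ between the groups.

```python
def group_stats(prevalence, sensitivity, specificity, n=100_000):
    """Confusion-matrix bookkeeping for one group with the given event prevalence."""
    pos = prevalence * n
    neg = n - pos
    tp, fp = sensitivity * pos, (1 - specificity) * neg
    precision = tp / (tp + fp)          # P(occurrence | positive prediction)
    pred_ratio = (tp + fp) / pos        # predictions made per actual occurrence
    return precision, pred_ratio

# Same sensitivity/specificity in both groups, different prevalence:
print(group_stats(prevalence=0.30, sensitivity=0.8, specificity=0.9))  # approx (0.774, 1.033)
print(group_stats(prevalence=0.10, sensitivity=0.8, specificity=0.9))  # approx (0.471, 1.700)
```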

2016

Backpropagation of Hebbian plasticity for Continual learning.

Spontaneous emergence of fast attractor dynamics in a model of developing primary visual cortex
Recent evidence suggests that neurons in primary sensory cortex arrange into competitive groups, representing stimuli by their joint activity rather than as independent feature analysers. A possible explanation for these results is that sensory cortex implements attractor dynamics, although this proposal remains controversial. Here we report that fast attractor dynamics emerge naturally in a computational model of a patch of primary visual cortex endowed with realistic plasticity (at both feedforward and lateral synapses) and mutual inhibition. When exposed to natural images (but not random pixels), the model spontaneously arranges into competitive groups of reciprocally connected, similarly tuned neurons, while developing realistic, orientation-selective receptive fields. Importantly, the same groups are observed in both stimulus-evoked and spontaneous (stimulus-absent) activity. The resulting network is inhibition-stabilized and exhibits fast, non-persistent attractor dynamics. Our results suggest that realistic plasticity, mutual inhibition and natural stimuli are jointly necessary and sufficient to generate attractor dynamics in primary sensory cortex.

A Feedback Model of Attention Explains the Diverse Effects of Attention on Neural Firing Rates and Receptive Field Structure


Visual attention has many effects on neural responses, producing complex changes in firing rates, as well as modifying the structure and size of receptive fields, both in topological and feature space. Several existing models of attention suggest that these effects arise from selective modulation of neural inputs. However, anatomical and physiological observations suggest that attentional modulation targets higher levels of the visual system (such as V4 or MT) rather than input areas (such as V1). Here we propose a simple mechanism that explains how a top-down attentional modulation, falling on higher visual areas, can produce the observed effects of attention on neural responses. Our model requires only the existence of modulatory feedback connections between areas, and short-range lateral inhibition within each area. Feedback connections redistribute the top-down modulation to lower areas, which in turn alters the inputs of other higher-area cells, including those that did not receive the initial modulation. This produces firing rate modulations and receptive field shifts. Simultaneously, short-range lateral inhibition between neighboring cells produce competitive effects that are automatically scaled to receptive field size in any given area. Our model reproduces the observed attentional effects on response rates (response gain, input gain, biased competition automatically scaled to receptive field size) and receptive field structure (shifts and resizing of receptive fields both spatially and in complex feature space), without modifying model parameters. Our model also makes the novel prediction that attentional effects on response curves should shift from response gain to contrast gain as the spatial focus of attention drifts away from the studied cell.


Defining and simulating open-ended novelty: requirements, guidelines, and challenges.

Neural networks with differentiable structure
While gradient descent has proven highly successful in learning connection weights for neural networks, the actual structure of these networks is usually determined by hand, or by other optimization algorithms. Here we describe a simple method to make network structure differentiable, and therefore accessible to gradient descent. We test this method on recurrent neural networks applied to simple sequence prediction problems. Starting with initial networks containing only one node, the method automatically builds networks that successfully solve the tasks. The number of nodes in the final network correlates with task difficulty. The method can dynamically increase network size in response to an abrupt complexification in the task; however, reduction in network size in response to task simplification is not evident for reasonable meta-parameters. The method does not penalize network performance for these test tasks: variable-size networks actually reach better performance than fixed-size networks of higher, lower or identical size. We conclude by discussing how this method could be applied to more complex networks, such as feedforward layered networks, or multiple-area networks of arbitrary shape.
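The abstract does not spell out the mechanism, so the sketch below is only an assumption in the same spirit: give each unit a learnable gain and penalize its L1 norm, so that the number of effectively active units becomes a quantity gradient descent can shrink or grow. This is not necessarily the paper's exact method.

```python
import torch
import torch.nn as nn

class GatedRecurrentLayer(nn.Module):
    """Illustrative sketch: each unit gets a learnable gain g_i; an L1 penalty on g pushes
    unused units toward zero, making the effective network size differentiable."""
    def __init__(self, n):
        super().__init__()
        self.w = nn.Parameter(0.01 * torch.randn(n, n))
        self.gain = nn.Parameter(torch.ones(n))

    def forward(self, x, inp):
        return self.gain * torch.tanh(x @ self.w + inp)

    def size_penalty(self):
        return self.gain.abs().sum()   # added to the loss to encourage a small effective network
```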

2015

There’s Waldo! A Normalization Model of Visual Search Predicts Single-Trial Human Fixations in an Object Search Task

When searching for an object in a scene, how does the brain decide where to look next? Theories of visual search suggest the existence of a global “priority map” that integrates bottom-up visual information with top-down, target-specific signals. We propose a mechanistic model of visual search that is consistent with recent neurophysiological evidence, can localize targets in cluttered images, and predicts single-trial behavior in a search task. The model posits that a high-level retinotopic area selective for shape features receives global, target-specific modulation and implements local normalization through divisive inhibition. The normalization step is critical to prevent salient bottom-up features from monopolizing attention. The resulting activity pattern constitutes a priority map that tracks the correlation between local input and target features. The maximum of this priority map is selected as the focus of attention. The visual input is then spatially enhanced around the selected location, allowing object-selective visual areas to determine whether the target is present at that location. This model can localize objects both in array images and when objects are pasted into natural scenes. It also predicts single-trial human fixations in search tasks involving complex objects, including error trials and target-absent trials.
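An illustrative sketch of the priority-map computation described above (toy shapes and a simplified normalization pool; an assumption, not the paper's implementation):

```python
import numpy as np

def next_fixation(feature_maps, target_weights, sigma=1e-3):
    """feature_maps: (n_features, H, W) responses; target_weights: (n_features,) top-down gains."""
    modulated = target_weights[:, None, None] * feature_maps        # target-specific modulation
    norm = sigma + modulated.sum(axis=0, keepdims=True)             # divisive normalization pool
    priority = (modulated / norm).sum(axis=0)                       # priority map over locations
    return np.unravel_index(np.argmax(priority), priority.shape)    # location to fixate next
```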

2011

A Feedback Model of Attentional Effects in the Visual Cortex.

2010
The Gamma slideshow: object-based perceptual cycles in a model of the visual cortex.

While recent studies have shed light on the mechanisms that generate gamma (>40 Hz) oscillations, the functional role of these oscillations is still debated. Here we suggest that the purported mechanism of gamma oscillations (feedback inhibition from local interneurons), coupled with lateral connections implementing “Gestalt” principles of object integration, naturally leads to a decomposition of the visual input into object-based “perceptual cycles,” in which neuron populations representing different objects within the scene will tend to fire at successive cycles of the local gamma oscillation. We describe a simple model of V1 in which such perceptual cycles emerge automatically from the interaction between lateral excitatory connections (linking oriented cells falling along a continuous contour) and fast feedback inhibition (implementing competitive firing and gamma oscillations). Despite its extreme simplicity, the model spontaneously gives rise to perceptual cycles even when faced with natural images. The robustness of the system to parameter variation and to image complexity, together with the paucity of assumptions built in the model, support the hypothesis that perceptual cycles occur in natural vision.

2009

Why Coevolution Doesn’t “Work”: Superiority and Progress in Coevolution.

2008

Evolution and Complexity: The Double-Edged Sword.
Fitness Transmission: A Genealogic Signature of Adaptive Evolution.
Evosphere: evolutionary dynamics in a population of fighting virtual creatures.
In Silicon No One Can Hear You Scream: Evolving Fighting Creatures.

2006

The N-Strikes-Out algorithm: A steady-state algorithm for coevolution.

An improved system for artificial creatures evolution.

2005

Analysing coevolution among artificial 3D creatures.

A virtual creatures model for studies in artificial evolution.

2003
When evolving populations is better than coevolving individuals: the Blind Mice problem.

2001

A collective genetic algorithm. In L. Spector et al. (Eds.): Proceedings of the Genetic and Evolutionary Computation Conference
