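I recently read the attentional pooling paper "Attentional Pooling for Action Recognition", which discusses pooling in terms of top-down and bottom-up attention. There is little material on this online, so I am writing my notes down here.
The earliest source I could find for the terms top-down and bottom-up attention is the neuroscience paper "Bottom-up and top-down attention: different processes and overlapping neural systems", which defines them as follows: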
Attention can be categorized into two distinct functions: bottom-up attention, referring to attentional guidance purely by externally driven factors to stimuli that are salient because of their inherent properties relative to the background; and top-down attention, referring to internal guidance of attention based on prior knowledge, willful plans, and current goals.
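In other words, top-down attention is driven by endogenous (internal) factors, while bottom-up attention is driven by exogenous (external) factors.
The CVPR 2018 paper "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" combines top-down and bottom-up attention and proposes a new model for visual question answering. Its description of the two kinds of attention matches the one above: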
In the human visual system, attention can be focused volitionally by top-down signals determined by the current task (e.g., looking for something), and automatically by bottom-up signals associated with unexpected, novel or salient stimuli [3, 6].
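To sum up, top-down attention is driven actively by the system itself. The drive may arise spontaneously inside the system, or it may be a high-level signal produced in response to some other task. Bottom-up attention is driven from outside: it is triggered passively by stimuli from external objects.
Coming back to attentional pooling, the authors use a low-rank matrix approximation to factorize second-order pooling: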
score_{attention}(X) = (Xw_{ak})^T (Xw_b)
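Here the first factor is class-specific and can be viewed as top-down attention, while the second factor is class-agnostic and can be viewed as bottom-up attention. Second-order pooling is thus factorized into the product (intersection) of the two kinds of attention. In the experimental results, the activation maps agree with both the theory and what one would expect.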
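The factorization can be checked numerically. The sketch below is only an illustration under my own assumptions, not the authors' code: I assume X is an n × f matrix of local features (n spatial locations, f channels), that the second-order pooling score has the bilinear form Tr(X^T X W^T), and that the weight matrix for class k is approximated by the rank-1 product w_{ak} w_b^T. The function name, shapes, and random data are all hypothetical.

```python
import numpy as np

def attentional_pooling_scores(X, W_a, w_b):
    """Rank-1 attentional pooling sketch (hypothetical helper).

    X   : (n, f) local features, n spatial locations, f channels (assumed layout)
    W_a : (f, K) one class-specific vector w_{ak} per column
    w_b : (f,)   class-agnostic vector shared by all classes
    """
    top_down = X @ W_a                  # (n, K): class-specific map per location
    bottom_up = X @ w_b                 # (n,):   class-agnostic saliency per location
    scores = top_down.T @ bottom_up     # (K,):   (X w_{ak})^T (X w_b) for each class k
    return scores, top_down, bottom_up

# Random data, only to exercise the shapes.
rng = np.random.default_rng(0)
n, f, K = 49, 16, 3
X = rng.standard_normal((n, f))
W_a = rng.standard_normal((f, K))
w_b = rng.standard_normal(f)

scores, top_down, bottom_up = attentional_pooling_scores(X, W_a, w_b)

# Compare against the full second-order form with the rank-1 weight matrix.
full_scores = np.array([
    np.trace(X.T @ X @ np.outer(W_a[:, k], w_b).T) for k in range(K)
])
print(np.allclose(scores, full_scores))
```

Under these assumptions the two computations agree, and the factorized version never forms the f × f matrix X^T X. The per-location products `top_down[:, k] * bottom_up` give one attention-weighted map per class, which is where the "intersection" reading of the two attentions comes from.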