Paper notes: "Learning Visual Knowledge Memory Networks for Visual Question Answering"

Contents

 

abstract

introduction


abstract

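In VQA, many questions cannot be answered directly or unambiguously from the visual content alone, so answering them requires reasoning over structured knowledge.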

This paper proposes the visual knowledge memory network (VKMN), which uses an end-to-end learning framework to seamlessly integrate structured human knowledge and deep visual features into memory networks.

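Compared with other methods that use external knowledge to support VQA, this paper emphasizes two mechanisms they lack: 1. Combining visual content with knowledge facts. VKMN jointly embeds knowledge triples (subject, relation, target) and visual features, and uses the result as visual knowledge features. 2. Handling multiple knowledge facts expanded from question-answer pairs. VKMN stores the joint embeddings in a key-value pair structure, which lets it handle multiple facts.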
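To make the key-value idea concrete, here is a minimal sketch of a soft read over a key-value memory: attention weights come from comparing a query with the keys, and the output is the weighted sum of the values. This is a generic illustration, not the paper's implementation. The function names, the dot-product scoring, and the toy vectors are all assumptions.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def read_memory(query, keys, values):
    """Soft key-value read: attend over keys, return weighted sum of values."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy memory with two slots (hypothetical vectors, for illustration only).
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
query = [2.0, 0.0]  # stands in for a joint question/visual embedding

weights, output = read_memory(query, keys, values)
```

Because the query is aligned with the first key, the first slot gets the larger weight and dominates the output. With several facts in memory, the same read blends all of them in proportion to their relevance.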

 

introduction

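Three kinds of question objectives in VQA: 1. Apparent objective: the answer can be obtained directly from recognition results on the query image (objects, attributes, captions ...). 2. Indiscernible objective: the answer target is usually too small or unclear in the query image, so supporting facts are needed to reach the correct answer. 3. Invisible objective: answering requires reasoning over commonsense, topic-specific, or even encyclopedic knowledge about the content of the image.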

Inspired by memory-network-based text QA methods, this paper proposes VKMN, which uses a pre-built visual knowledge base for accurate reasoning.

VKMN extracts multiple related knowledge facts from the question, jointly embeds the knowledge facts with visual attentive features, and stores the result in key-value pairs.
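The fact-retrieval step can be pictured with a small sketch. These notes do not say how related facts are matched or how a triple is split into key and value, so both choices below are assumptions made only for illustration: facts are selected by simple word overlap with the question, and each selected triple is stored with (subject, relation) as the key and the target as the value. The triples and the helper `related_facts` are hypothetical.

```python
def related_facts(question, triples):
    """Return triples whose subject or target appears as a word in the question."""
    words = set(question.lower().replace("?", " ").split())
    return [t for t in triples if t[0] in words or t[2] in words]

# Hypothetical knowledge base of (subject, relation, target) triples.
triples = [
    ("dog", "is", "animal"),
    ("cat", "chase", "mouse"),
    ("umbrella", "used_for", "rain"),
]

facts = related_facts("What is the dog holding?", triples)

# Assumed key-value layout: key = (subject, relation), value = target.
slots = [((s, r), t) for s, r, t in facts]
```

In a real system the symbolic keys and values would be replaced by learned embeddings combined with visual features. The sketch only shows how several facts related to one question end up as separate memory slots.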

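Main contributions: 1. VKMN addresses the inaccuracy limitation of current knowledge-based methods. 2. It builds a visual-question-specific knowledge base that, unlike general-purpose knowledge bases such as Freebase, does not contain irrelevant knowledge entities. 3. It achieves higher accuracy than previous methods on VQA v1.0 and VQA v2.0.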

The emergence of Data Science has placed knowledge discovery, machine learning, and data mining in multidimensional data at the forefront of a wide range of current research and application activities in computer science and in many domains far beyond it. Discovering patterns in multidimensional data using a combination of visual and analytical machine learning means is an attractive visual analytics opportunity. It allows the injection of unique human perceptual and cognitive abilities directly into the process of discovering multidimensional patterns. While this opportunity exists, the long-standing problem is that we cannot see n-D data with the naked eye. Our cognitive and perceptual abilities are perfected only in the 3-D physical world. We need enhanced visualization tools (“n-D glasses”) to represent n-D data in 2-D completely, without loss of information, which is important for knowledge discovery. While multiple visualization methods for n-D data have been developed and successfully used for many tasks, many of them are non-reversible and lossy. Such methods do not represent the n-D data fully and do not allow the n-D data to be restored completely from their 2-D representation. Accordingly, our ability to discover n-D data patterns from such incomplete 2-D representations is limited and potentially erroneous. The number of available approaches to overcome these limitations is itself quite limited. Parallel Coordinates and Radial/Star Coordinates are today the most powerful reversible and lossless n-D data visualization methods, although they suffer from occlusion. There is a need to extend the class of reversible and lossless n-D data visual representations for knowledge discovery in n-D data. A new class of such representations, called General Line Coordinates (GLC), and several of their specifications are the focus of this book.
This book describes the GLCs and their advantages, shown on tasks that include analyzing data on the Challenger disaster, world hunger, semantic shift in humorous texts, image processing, medical computer-aided diagnostics, the stock market, and currency exchange rate prediction. Reversible methods for visualizing n-D data have advantages as cognitive enhancers of the human ability to discover n-D data patterns. This book reviews the state of the art in this area, outlines the challenges, and describes solutions in the framework of General Line Coordinates. This book expands the methods of visual analytics for knowledge discovery by presenting visual and hybrid methods that combine analytical machine learning with visual means. New approaches are explored from both theoretical and experimental viewpoints, using modeled and real data. The inspiration for a new large class of coordinates is twofold. The first is the marvelous success of Parallel Coordinates, pioneered by Alfred Inselberg. The second is the absence of a “silver bullet” visualization that is perfect for pattern discovery in all possible n-D datasets. Multiple GLCs can serve as a collective “silver bullet.” This multiplicity of GLCs increases the chances that humans will reveal the hidden n-D patterns in these visualizations. The topic of this book is related to the prospects of both super-intelligent machines and super-intelligent humans, which can far surpass current human intelligence, significantly lifting human cognitive limitations. This book is about a technical way of reaching some aspects of super-intelligence that are beyond current human cognitive abilities.
The aim is to overcome the inability to analyze large amounts of abstract, numeric, high-dimensional data and to find complex patterns in these data with the naked eye, supported by the analytical means of machine learning. New algorithms are presented for reversible GLC visual representations of high-dimensional data and for knowledge discovery. The advantages of GLCs are shown both mathematically and on different datasets. These advantages form a basis for future studies in this super-intelligence area. This book is organized as follows. Chapter 1 presents the goal, motivation, and approach. Chapter 2 introduces the concept of General Line Coordinates, illustrated with multiple examples. Chapter 3 provides rigorous mathematical definitions of the GLC concepts along with mathematical statements of their properties. A reader interested only in the applied aspects of GLC can skip this chapter. A reader interested in implementing GLC algorithms may find Chap. 3 useful. Chapter 4 describes methods for simplifying visual patterns in GLCs for better human perception. Chapter 5 presents several GLC case studies on real data that show the GLC capabilities. Chapter 6 presents the results of experiments in which multiple participants discovered visual features in GLCs, with an analysis of human shape perception capabilities with over a hundred dimensions in these experiments. Chapter 7 presents linear GLCs combined with machine learning, including hybrid, automatic, interactive, and collaborative versions of linear GLC, with data classification applications from medicine to finance and image processing. Chapter 8 demonstrates a hybrid, visual, and analytical knowledge discovery and machine learning approach for investment strategy with GLCs.
Chapter 9 presents a hybrid, visual, and analytical machine learning approach in text mining for discovering incongruity in humor modeling. Chapter 10 describes the capabilities of GLC visual means to enhance the evaluation of accuracy and errors of machine learning algorithms. Chapter 11 shows how GLC visualization benefits the exploration of the multidimensional Pareto front in multi-objective optimization tasks. Chapter 12 outlines the vision of a virtual data scientist and of super-intelligence with visual means. Chapter 13 concludes this book with a comparison and fusion of methods and a discussion of future research. A final note concerns topics that are outside the scope of this book. These topics are “goal-free” visualizations, which are not related to the specific knowledge discovery tasks of supervised and unsupervised learning, and Pareto optimization in n-D data. The author’s Web site for this book is located at http://www.cwu.edu/*borisk/visualKD, where additional information and updates can be found.
Ellensburg, USA
Boris Kovalerchuk