[Translation] "Generalizing from a Few Examples: A Survey on Few-Shot"


Machine learning has been highly successful in data-intensive applications, but is often hampered when the data set is small. Recently, Few-Shot Learning (FSL) has been proposed to tackle this problem. Using prior knowledge, FSL can rapidly generalize to new tasks containing only a few samples with supervised information. In this paper, we conduct a thorough survey to fully understand FSL. Starting from a formal definition of FSL, we distinguish FSL from several relevant machine learning problems. We then point out that the core issue in FSL is that the empirical risk minimizer is unreliable. Based on how prior knowledge can be used to handle this core issue, we categorize FSL methods from three perspectives: (i) data, which uses prior knowledge to augment the supervised experience; (ii) model, which uses prior knowledge to reduce the size of the hypothesis space; and (iii) algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space. With this taxonomy, we review and discuss the pros and cons of each category. We also propose promising directions regarding FSL problem setups, techniques, applications and theories to provide insights for future research.


“Can machines think?” This is the question raised in Alan Turing’s seminal 1950 paper “Computing Machinery and Intelligence”. He stated that “The idea behind digital computers may be explained by saying that these machines are intended to carry out any operations which could be done by a human computer”. In other words, the ultimate goal of machines is to be as intelligent as humans. In recent years, driven by powerful computing devices (e.g., GPUs and distributed platforms), large data sets (e.g., the ImageNet data with 1000 classes), and advanced models and algorithms (e.g., convolutional neural networks (CNN) and long short-term memory (LSTM)), AI has moved rapidly toward human-level performance and has surpassed humans in many fields. To name a few, AlphaGo defeated human champions in the ancient game of Go, and the residual network (ResNet) achieved better classification performance than humans on ImageNet. AI also supports intelligent tools in many aspects of daily life, such as voice assistants, search engines, autonomous cars, and industrial robots.


Despite this success, current AI techniques cannot rapidly generalize from a few examples. The successful AI applications mentioned above rely on learning from large-scale data. In contrast, humans can learn new tasks rapidly by utilizing what they learned in the past. For example, a child who has learned how to add can rapidly transfer this knowledge to learn multiplication given a few examples (e.g., 2 × 3 = 2 + 2 + 2 and 1 × 3 = 1 + 1 + 1). Similarly, given a few photos of a stranger, a child can easily identify the same person among a large number of photos.


Bridging this gap between AI and humans is an important research direction. It can be tackled by machine learning, which is concerned with how to construct computer programs that automatically improve with experience. To learn from a limited number of examples with supervised information, a new machine learning paradigm called Few-Shot Learning (FSL) has been proposed. A typical example is character generation, in which computer programs are asked to parse and generate new handwritten characters given a few examples. To handle this task, one can decompose the characters into smaller parts that are transferable across characters, and then aggregate these parts into new characters. This resembles the way humans learn. Naturally, FSL can also advance robotics, which develops machines that can replicate human actions. Examples include one-shot imitation, multi-armed bandits, visual navigation, and continuous control.


Another classic FSL scenario arises when examples with supervised information are hard or impossible to acquire due to privacy, safety or ethical issues. A typical example is drug discovery, which tries to discover properties of new molecules so as to identify useful ones as new drugs. Due to possible toxicity, low activity, and low solubility, few real biological records on clinical candidates exist for new molecules. Hence, it is important to learn effectively from a small number of samples. Similar examples where the target tasks have few examples include few-shot translation and cold-start item recommendation. Through FSL, it becomes possible to learn suitable models for these rare cases.


FSL can also help relieve the burden of collecting large-scale supervised data. For example, although ResNet outperforms humans on ImageNet, each class requires sufficient labeled images, which can be laborious to collect. FSL can reduce the data-gathering effort for data-intensive applications such as image classification, image retrieval, object tracking, gesture recognition, image captioning, visual question answering, video event detection, language modeling, and neural architecture search.


Driven by the academic goal of bringing AI closer to humans and the industrial demand for inexpensive learning, FSL has recently drawn much attention and is now a hot topic. Many related machine learning approaches have been proposed, such as meta-learning, embedding learning and generative modeling. However, no existing work provides an organized taxonomy that connects these FSL methods, explains why some methods work while others fail, or discusses the pros and cons of different approaches. Therefore, in this paper, we conduct a survey on the FSL problem. In contrast, an earlier survey focuses only on concept learning and experience learning for small samples.


Contributions of this survey can be summarized as follows:
• We give a formal definition of FSL, which naturally connects to the classic machine learning definition in [92, 94]. The definition is general enough to include existing FSL works, yet specific enough to clarify what the goal of FSL is and how we can solve it. This definition is helpful for setting future research targets in the FSL area.
• We list the relevant learning problems for FSL with concrete examples, clarifying their relatedness and differences with respect to FSL. These discussions can help better discriminate and position FSL among various learning problems.
• We point out that the core issue of the supervised FSL problem is the unreliable empirical risk minimizer, which we analyze based on error decomposition in machine learning. This provides insights for improving FSL methods in a more organized and systematic way.
• We perform an extensive literature review and organize the reviewed works in a unified taxonomy from the perspectives of data, model and algorithm. We also summarize insights and discuss the pros and cons of each category. These can help establish a better understanding of FSL methods.
• We propose promising future directions for FSL in terms of problem setup, techniques, applications and theories. These directions are based on the weaknesses of current FSL developments, together with possible future improvements.


Organization of the Survey

The remainder of this survey is organized as follows. Section 2 provides an overview of FSL, including its formal definition, relevant learning problems, core issue, and a taxonomy of existing works in terms of data, model and algorithm. Section 3 covers methods that augment data to solve the FSL problem. Section 4 covers methods that reduce the size of the hypothesis space so as to make FSL feasible. Section 5 covers methods that alter the search strategy of the algorithm to deal with the FSL problem. In Section 6, we propose future directions for FSL in terms of problem setup, techniques, applications and theories. Finally, Section 7 concludes the survey.


Notation and Terminology

Consider a learning task $T$. FSL deals with a data set $D = \{D_{\text{train}}, D_{\text{test}}\}$ consisting of a training set $D_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{I}$, where $I$ is small, and a testing set $D_{\text{test}} = \{x^{\text{test}}\}$. Let $p(x, y)$ be the ground-truth joint probability distribution of input $x$ and output $y$, and let $\hat{h}$ be the optimal hypothesis from $x$ to $y$. FSL learns to discover $\hat{h}$ by fitting $D_{\text{train}}$ and testing on $D_{\text{test}}$. To approximate $\hat{h}$, the FSL model determines a hypothesis space $\mathcal{H}$ of hypotheses $h(\cdot; \theta)$, where $\theta$ denotes all the parameters used by $h$. Here, a parametric $h$ is used, as a nonparametric model often requires large data sets and is thus not suitable for FSL. An FSL algorithm is an optimization strategy that searches $\mathcal{H}$ in order to find the $\theta$ that parameterizes the best $h^* \in \mathcal{H}$. The FSL performance is measured by a loss function $\ell(\hat{y}, y)$ defined over the prediction $\hat{y} = h(x; \theta)$ and the observed output $y$.
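The notation can be made concrete with a small sketch. It assumes, purely for illustration, a one-dimensional input, a linear hypothesis space $h(x; \theta) = \theta_0 + \theta_1 x$, the squared loss $\ell(\hat{y}, y) = (\hat{y} - y)^2$, and plain gradient descent as the search strategy over $\mathcal{H}$; none of these choices come from the survey, and the data and function names are hypothetical. Minimizing the average loss over $D_{\text{train}}$, as done here, is empirical risk minimization.

```python
from typing import List, Tuple

Example = Tuple[float, float]  # one labeled pair (x_i, y_i)


def h(x: float, theta: Tuple[float, float]) -> float:
    """Parametric hypothesis h(x; theta); the linear form is an illustrative assumption."""
    return theta[0] + theta[1] * x


def loss(y_hat: float, y: float) -> float:
    """Squared loss l(y_hat, y), chosen here only for illustration."""
    return (y_hat - y) ** 2


def empirical_loss(theta: Tuple[float, float], d_train: List[Example]) -> float:
    """Average loss of h(.; theta) over the I training examples."""
    return sum(loss(h(x, theta), y) for x, y in d_train) / len(d_train)


def fit(d_train: List[Example], lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    """Search H by gradient descent on the average training loss."""
    t0, t1 = 0.0, 0.0
    n = len(d_train)
    for _ in range(steps):
        # Gradients of (1/I) * sum (t0 + t1 * x - y)^2 with respect to t0 and t1.
        g0 = sum(2 * (t0 + t1 * x - y) for x, y in d_train) / n
        g1 = sum(2 * (t0 + t1 * x - y) * x for x, y in d_train) / n
        t0 -= lr * g0
        t1 -= lr * g1
    return t0, t1


# Hypothetical D_train with I = 3 noise-free examples of y = 2x + 1.
d_train: List[Example] = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
d_test: List[float] = [3.0]  # unlabeled test inputs

theta_star = fit(d_train)
predictions = [h(x, theta_star) for x in d_test]
print(theta_star, predictions)  # theta_star is approximately (1.0, 2.0); prediction approximately 7.0
```

Here three examples suffice only because the labels are noise-free and the assumed $\mathcal{H}$ contains the generating function. With noisy labels and small $I$, the minimizer of the training loss can lie far from $\hat{h}$, which is the unreliability the survey identifies as the core issue of FSL.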



In this section, we first provide a formal definition of the FSL problem in Section 2.1 with concrete examples. To differentiate the FSL problem from relevant machine learning problems, we discuss their relatedness and differences in Section 2.2. In Section 2.3, we discuss the core issue that makes FSL difficult. Section 2.4 then presents a unified taxonomy according to how existing works handle the core issue.


Problem Definition

Since FSL is a sub-area of machine learning, before defining FSL, we recall how machine learning is defined in the literature.

Definition 2.1 (Machine Learning [92, 94]). A computer program is said to learn from experience E with respect to some classes of task T and performance measure P if its performance can improve with E on T measured by P.



For example, in an image classification task (T), a machine learning program can improve its classification accuracy (P) through experience E obtained by training on a large number of labeled images (e.g., the ImageNet data set). Another example is the recent computer program AlphaGo, which has defeated the human champion in the ancient game of Go (T). It improves its winning rate (P) against opponents by training on a database (E) of around 30 million recorded moves of human experts as well as by repeatedly playing against itself. These examples are summarized in Table 1.



Table 1. Examples of machine learning problems based on Definition 2.1.
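Definition 2.1 can also be illustrated with a toy experiment. This is a sketch under stated assumptions, not an example from the survey: the task T is two-class classification of hypothetical one-dimensional points, the learner is a nearest-centroid classifier, the experience E is the first k labeled examples of each class, and the performance measure P is accuracy on a fixed test set. The sketch measures P for a small and a larger E.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical 1-D data for a two-class task T (class 0 lies near 1, class 1 near 4).
TRAIN_POOL: Dict[int, List[float]] = {
    0: [3.5, -1.0, 0.5, 1.0],
    1: [4.0, 5.0, 3.0, 4.0],
}
TEST_SET: List[Tuple[float, int]] = [
    (0.0, 0), (1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1),
]


def learn(experience: Dict[int, List[float]]) -> Dict[int, float]:
    """Nearest-centroid learner: summarize each class by the mean of its examples."""
    return {label: mean(xs) for label, xs in experience.items()}


def predict(centroids: Dict[int, float], x: float) -> int:
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def accuracy(centroids: Dict[int, float], test_set: List[Tuple[float, int]]) -> float:
    """Performance measure P: fraction of test points classified correctly."""
    return sum(predict(centroids, x) == y for x, y in test_set) / len(test_set)


results: Dict[int, float] = {}
for k in (1, 4):
    # Experience E: the first k labeled examples of each class.
    experience = {label: xs[:k] for label, xs in TRAIN_POOL.items()}
    results[k] = accuracy(learn(experience), TEST_SET)
print(results)  # {1: 0.8333333333333334, 4: 1.0}
```

With one example per class, the unrepresentative point 3.5 places the class-0 centroid next to class 1, so the test point 3.0 is misclassified and accuracy is 5/6; with four examples per class, accuracy reaches 1.0. Performance on T, measured by P, improves with E, as Definition 2.1 requires.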


Typical machine learning applications, such as the examples above, require many examples with supervised information. However, as mentioned in the introduction, this may be difficult or even impossible. FSL is a special case of machine learning that aims to obtain good learning performance given limited supervised information in the training set $D_{\text{train}}$, which consists of examples of inputs $x_i$'s along with their corresponding outputs $y_i$'s. Formally, we define FSL in Definition 2.2.

Definition 2.2. Few-Shot Learning (FSL) is a type of machine learning problem (specified by E, T and P), where E contains only a limited number of examples with supervised information for the target T.
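A common way to instantiate Definition 2.2 for classification, sketched here as an assumption rather than as the survey's own protocol, is to sample a small task from a larger labeled pool: pick N classes, give the learner only K labeled examples per class as $D_{\text{train}}$ (so E is limited), and hold out further examples of the same classes as $D_{\text{test}}$. This setup is often called N-way K-shot classification. The pool, its identifiers, and the function `sample_few_shot_task` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def sample_few_shot_task(
    pool: Dict[str, List[str]],
    n_classes: int,
    k_shot: int,
    n_query: int,
    seed: int = 0,
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Build one few-shot task from a larger labeled pool (hypothetical helper).

    Returns D_train as (x, y) pairs, the D_test inputs, and the held-out D_test
    labels, which are used only to measure performance P, never for training.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(pool), n_classes)  # the N target classes of T
    d_train: List[Tuple[str, str]] = []
    test_inputs: List[str] = []
    test_labels: List[str] = []
    for label in classes:
        # Draw K + n_query distinct examples so that train and test never overlap.
        drawn = rng.sample(pool[label], k_shot + n_query)
        d_train += [(x, label) for x in drawn[:k_shot]]
        test_inputs += drawn[k_shot:]
        test_labels += [label] * n_query
    return d_train, test_inputs, test_labels


# Hypothetical pool: 10 classes with 20 example identifiers each.
pool = {f"class_{c}": [f"x_{c}_{j}" for j in range(20)] for c in range(10)}
d_train, test_inputs, test_labels = sample_few_shot_task(pool, n_classes=5, k_shot=1, n_query=3)
print(len(d_train), len(test_inputs))  # 5 15
```

Here E consists of only five labeled examples (one per class), which is the limited supervised experience that Definition 2.2 describes; a seeded `random.Random` keeps the sampled task reproducible.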










