1. Introduction
2. Concept Learning and the General-to-Specific Ordering
3. Decision Tree Learning
4. Artificial Neural Networks
5. Evaluating Hypotheses
6. Bayesian Learning
7. Computational Learning Theory
8. Instance-Based Learning
9. Genetic Algorithms
10. Learning Sets of Rules
11. Analytical Learning
12. Combining Inductive and Analytical Learning
13. Reinforcement Learning
7. Computational Learning Theory
This theory seeks to answer questions such as "Under what conditions is successful learning possible and impossible?" and "Under what conditions is a particular learning algorithm assured of learning successfully?" Two specific frameworks for analyzing learning algorithms are considered. Within the probably approximately correct (PAC) framework, we identify classes of hypotheses that can and cannot be learned from a polynomial number of training examples, and we define a natural measure of complexity for hypothesis spaces that allows bounding the number of training examples required for inductive learning. Within the mistake bound framework, we examine the number of training errors that will be made by a learner before it determines the correct hypothesis.
7.1 INTRODUCTION
Our goal is to answer questions such as:
Sample complexity. How many training examples are needed for a learner to converge (with high probability) to a successful hypothesis?
Computational complexity. How much computational effort is needed for a learner to converge (with high probability) to a successful hypothesis?
Mistake bound. How many training examples will the learner misclassify before converging to a successful hypothesis?
As we might expect, the answers to the above questions depend on the particular setting, or learning model, we have in mind.
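To make the mistake bound question concrete, the following sketch counts on-line prediction mistakes for a simple learner. The target concept (a monotone conjunction over boolean attributes) and the FIND-S-style learner are illustrative choices, not part of the text's formal development:

```python
import itertools

# Illustrative on-line setting: the learner predicts each label before
# seeing it, and we tally how many predictions are wrong. The target here
# is a hypothetical monotone conjunction over n boolean attributes.
n = 6
target = {0, 3, 5}                       # c(x) = x0 AND x3 AND x5

def c(x):
    return all(x[i] for i in target)

# Start with the most specific hypothesis: the conjunction of all n literals.
hyp = set(range(n))
mistakes = 0

for x in itertools.product([0, 1], repeat=n):   # a stream of examples
    pred = all(x[i] for i in hyp)
    if pred != c(x):
        mistakes += 1
    if c(x):                              # generalize on positive examples
        hyp &= {i for i in range(n) if x[i]}

print(mistakes, sorted(hyp))
```

Because the hypothesis always remains at least as specific as the target, mistakes occur only on positive examples, and each such mistake removes at least one extraneous literal, so the total number of mistakes is bounded by the number of attributes.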
7.2 PROBABLY LEARNING AN APPROXIMATELY CORRECT HYPOTHESIS
In this section we consider a particular setting for the learning problem, called the probably approximately correct (PAC) learning model. We begin by specifying the problem setting that defines the PAC learning model, then consider the questions of how many training examples and how much computation are required in order to learn various classes of target functions within this PAC model.
For the sake of simplicity, we restrict the discussion to the case of learning boolean-valued concepts from noise-free training data. However, many of the results can be extended to the more general scenario of learning real-valued target functions (see, for example, Natarajan 1991), and some can be extended to learning from certain types of noisy data (see, for example, Laird 1988; Kearns and Vazirani 1994).
7.2.1 The Problem Setting
Let X refer to the set of all possible instances over which target functions may be defined.
Let C refer to some set of target concepts that our learner might be called upon to learn. Each target concept c in C corresponds to some subset of X, or equivalently to some boolean-valued function c : X -> {0, 1}.
We assume instances are generated at random from X according to some probability distribution D. In general, D may be any stationary distribution (i.e., one that does not change over time), and it will not generally be known to the learner.
Training examples are generated by drawing an instance x at random according to D, then presenting x along with its target value, c(x), to the learner.
The learner L considers some set H of possible hypotheses when attempting to learn the target concept. For example, H might be the set of all hypotheses describable by conjunctions of the attributes age and height.
After observing a sequence of training examples of the target concept c, L must output some hypothesis h from H, which is its estimate of c. To be fair, we evaluate the success of L by the performance of h over new instances drawn randomly from X according to D, the same probability distribution used to generate the training data.
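The pieces of this problem setting can be sketched end to end: instances drawn i.i.d. from a fixed distribution D, labeled by a target concept c, and a learned hypothesis h evaluated on fresh draws from the same D. The instance space (age, height pairs), the particular concept, and the naive rectangle learner below are all hypothetical choices made for illustration:

```python
import random

random.seed(0)

# Instance space X: (age, height) pairs. Distribution D: uniform over
# the given ranges (an assumed, illustrative distribution).
def draw_instance():
    return (random.uniform(0, 100), random.uniform(100, 220))

# Target concept c: a boolean-valued function c : X -> {0, 1}.
def c(x):
    age, height = x
    return int(age >= 18 and height >= 150)

# Training examples: pairs <x, c(x)> drawn according to D.
train = [(x, c(x)) for x in (draw_instance() for _ in range(200))]

# A deliberately naive learner: the tightest conjunction of threshold
# tests consistent with the positive examples -- one hypothesis h from
# a space H of axis-aligned rectangles.
pos = [x for x, y in train if y == 1]
a_lo = min(a for a, b in pos); a_hi = max(a for a, b in pos)
h_lo = min(b for a, b in pos); h_hi = max(b for a, b in pos)

def h(x):
    age, height = x
    return int(a_lo <= age <= a_hi and h_lo <= height <= h_hi)

# Evaluate h on new instances drawn from the same distribution D that
# generated the training data.
test = [draw_instance() for _ in range(10000)]
error = sum(h(x) != c(x) for x in test) / len(test)
print(f"estimated error of h: {error:.3f}")
```

Note that the hypothesis is judged on fresh samples from D, not on the training set; this is the sense in which the PAC model ties a learner's success to the same distribution that produced its training data.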
7.2.2 Error of a Hypothesis