1. Introduction
2. Concept Learning and the General-to-Specific Ordering
3. Decision Tree Learning
4. Artificial Neural Networks
5. Evaluating Hypotheses
6. Bayesian Learning
7. Computational Learning Theory
8. Instance-Based Learning
9. Genetic Algorithms
10. Learning Sets of Rules
11. Analytical Learning
12. Combining Inductive and Analytical Learning
13. Reinforcement Learning
7. Computational Learning Theory
This theory seeks to answer questions such as "Under what conditions is successful learning possible and impossible?" and "Under what conditions is a particular learning algorithm assured of learning successfully?" Two specific frameworks for analyzing learning algorithms are considered. Within the probably approximately correct (PAC) framework, we identify classes of hypotheses that can and cannot be learned from a polynomial number of training examples, and we define a natural measure of complexity for hypothesis spaces that allows bounding the number of training examples required for inductive learning. Within the mistake bound framework, we examine the number of training errors that will be made by a learner before it determines the correct hypothesis.
7.1 INTRODUCTION
Our goal is to answer questions such as:
Sample complexity. How many training examples are needed for a learner to converge (with high probability) to a successful hypothesis?
Computational complexity. How much computational effort is needed for a learner to converge (with high probability) to a successful hypothesis?
Mistake bound. How many training examples will the learner misclassify before converging to a successful hypothesis?
As we might expect, the answers to the above questions depend on the particular setting, or learning model, we have in mind.
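To make the mistake bound question concrete, the following sketch counts on-line prediction mistakes for a simple learner. The target concept (a monotone conjunction over boolean attributes) and the FIND-S-style learner are illustrative choices, not part of the text's formal development:

```python
import itertools

# Illustrative on-line setting: the learner predicts each label before
# seeing it, and we tally how many predictions are wrong. The target here
# is a hypothetical monotone conjunction over n boolean attributes.
n = 6
target = {0, 3, 5}                       # c(x) = x0 AND x3 AND x5

def c(x):
    return all(x[i] for i in target)

# Start with the most specific hypothesis: the conjunction of all n literals.
hyp = set(range(n))
mistakes = 0

for x in itertools.product([0, 1], repeat=n):   # a stream of examples
    pred = all(x[i] for i in hyp)
    if pred != c(x):
        mistakes += 1
    if c(x):                              # generalize on positive examples
        hyp &= {i for i in range(n) if x[i]}

print(mistakes, sorted(hyp))
```

Because the hypothesis always remains at least as specific as the target, mistakes occur only on positive examples, and each such mistake removes at least one extraneous literal, so the total number of mistakes is bounded by the number of attributes.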
7.2 PROBABLY LEARNING AN APPROXIMATELY CORRECT HYPOTHESIS
In this section we consider a particular setting for the learning problem, called the probably approximately correct (PAC) learning model. We begin by specifying the problem setting that defines the PAC learning model, then consider the questions of how many training examples and how much computation are required in order to learn various classes of target functions within this PAC model.
For the sake of simplicity, we restrict the discussion to the case of learning boolean-valued concepts from noise-free training data. However, many of the results can be extended to the more general scenario of learning real-valued target functions (see, for example, Natarajan 1991), and some can be extended to learning from certain types of noisy data (see, for example, Laird 1988; Kearns and Vazirani 1994).
7.2.1 The Problem Setting
Let X refer to the set of all possible instances over which target functions may be defined.
Let C refer to some set of target concepts that our learner might be called upon to learn. Each target concept c in C corresponds to some subset of X, or equivalently to some boolean-valued function c : X -> {0, 1}.
We assume instances are generated at random from X according to some probability distribution D. In general, D may be any stationary distribution (i.e., one that does not change over time), and it will not generally be known to the learner.
Training examples are generated by drawing an instance x at random according to D, then presenting x along with its target value, c(x), to the learner.
The learner L considers some set H of possible hypotheses when attempting to learn the target concept. For example, H might be the set of all hypotheses describable by conjunctions of the attributes age and height.
After observing a sequence of training examples of the target concept c, L must output some hypothesis h from H, which is its estimate of c. To be fair, we evaluate the success of L by the performance of h over new instances drawn randomly from X according to D, the same probability distribution used to generate the training data.
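The pieces of this problem setting can be sketched end to end: instances drawn i.i.d. from a fixed distribution D, labeled by a target concept c, and a learned hypothesis h evaluated on fresh draws from the same D. The instance space (age, height pairs), the particular concept, and the naive rectangle learner below are all hypothetical choices made for illustration:

```python
import random

random.seed(0)

# Instance space X: (age, height) pairs. Distribution D: uniform over
# the given ranges (an assumed, illustrative distribution).
def draw_instance():
    return (random.uniform(0, 100), random.uniform(100, 220))

# Target concept c: a boolean-valued function c : X -> {0, 1}.
def c(x):
    age, height = x
    return int(age >= 18 and height >= 150)

# Training examples: pairs <x, c(x)> drawn according to D.
train = [(x, c(x)) for x in (draw_instance() for _ in range(200))]

# A deliberately naive learner: the tightest conjunction of threshold
# tests consistent with the positive examples -- one hypothesis h from
# a space H of axis-aligned rectangles.
pos = [x for x, y in train if y == 1]
a_lo = min(a for a, b in pos); a_hi = max(a for a, b in pos)
h_lo = min(b for a, b in pos); h_hi = max(b for a, b in pos)

def h(x):
    age, height = x
    return int(a_lo <= age <= a_hi and h_lo <= height <= h_hi)

# Evaluate h on new instances drawn from the same distribution D that
# generated the training data.
test = [draw_instance() for _ in range(10000)]
error = sum(h(x) != c(x) for x in test) / len(test)
print(f"estimated error of h: {error:.3f}")
```

Note that the hypothesis is judged on fresh samples from D, not on the training set; this is the sense in which the PAC model ties a learner's success to the same distribution that produced its training data.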
7.2.2 Error of a Hypothesis