Basic Concepts of Machine Learning (1)

Article link: https://blog.csdn.net/Gavin__Zhou/article/details/52337714

Learning algorithm

The algorithms used in ML are all learning algorithms. So what exactly is a learning algorithm?
Bengio, a leading figure in machine learning, explains it as:

A machine learning algorithm is an algorithm that is able to learn from data.

For the word "learn" here, Mitchell (1997) gives this definition:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

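From this we can see:

A learning algorithm must be able to learn, from the given data, features that effectively represent that data.

So the basic components of an ML system are: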

A learning algorithm
Tasks
Performance measure
Experience
Data
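Mitchell's definition can be made concrete with a toy sketch. Everything below is invented for illustration: the task T is deciding whether a number lies above a hidden boundary (5.0), the experience E is a short list of labeled numbers, and the performance measure P is accuracy on separate test examples. The learner is a simple midpoint rule, not a method from the quoted sources.

```python
# Toy illustration of T / P / E (all names and numbers are hypothetical).
# Task T: decide whether a number x is >= a hidden boundary (here 5.0).
# Experience E: labeled examples (x, label).
# Performance P: accuracy on examples that were not used for learning.

def learn_threshold(examples):
    """Place the boundary midway between the largest 0-example and the smallest 1-example."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(int(x >= threshold) == label for x, label in examples)
    return correct / len(examples)


experience = [(0.0, 0), (5.2, 1), (3.0, 0), (8.0, 1), (4.8, 0), (6.0, 1)]
test_set = [(i + 0.5, int(i + 0.5 >= 5.0)) for i in range(10)]

# Measure P after the learner has seen 2, 4, and 6 examples of E.
scores = [accuracy(learn_threshold(experience[:n]), test_set) for n in (2, 4, 6)]
print(scores)  # [0.8, 0.9, 1.0]
```

On this hand-picked data the accuracy rises as more examples are seen, which is exactly what "learning" means in the definition. With real data the improvement is usually noisier and not guaranteed at every step.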

Task

The basic need that gave rise to ML is this:
some tasks are too difficult to solve with a fixed, hand-written program.

Machine learning allows us to tackle tasks that are too difficult to solve with fixed programs written and designed by human beings.

So what is a task in ML?
First, it helps to understand what we usually call a feature in ML. Roughly speaking:

A feature is a mathematical representation, extracted from some object or event, that can be expressed and measured quantitatively.

Features are usually expressed as a vector or a matrix.
Now for the task itself. Bengio's explanation is:

Machine learning tasks are usually described in terms of how the machine learning system should process an example. An example is a collection of features that have been quantitatively measured from some object or event that we want the machine learning system to process.
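As a small sketch of what "a collection of features that have been quantitatively measured" might look like, the function below turns a string into a list of numbers. The function name `extract_features` and the three chosen measurements are assumptions made up for this illustration.

```python
def extract_features(text):
    """Turn a string (the 'object') into a list of measured numbers (the 'example')."""
    length = len(text)
    digit_count = sum(ch.isdigit() for ch in text)
    # Guard against division by zero for the empty string.
    uppercase_fraction = sum(ch.isupper() for ch in text) / length if length else 0.0
    return [float(length), float(digit_count), uppercase_fraction]


example = extract_features("Hello 42")
print(example)  # [8.0, 2.0, 0.125]
```

Each entry of the resulting list is one feature, and the whole list is one example that an ML system could process.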

That is a bit abstract. In practice, the task is simply the problem we need to solve, such as classifying images or clustering given data.
Common tasks include:

Classification
Regression
Transcription
Machine translation
Semantic segmentation
Object detection
Denoising
………….

There are many more, so I won't list them all.
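To make the first two tasks in the list concrete, here is a minimal sketch in which the same kind of input (a single numeric feature) is mapped to a category in classification and to a real number in regression. The data and the nearest-neighbor rule are assumptions chosen only to keep the example short.

```python
def nearest(train, x):
    """Return the output paired with the training input closest to x (1-nearest neighbor)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


# Classification: the outputs are category names.
size_to_label = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
# Regression: the outputs are real numbers.
size_to_price = [(1.0, 10.0), (2.0, 19.0), (8.0, 85.0), (9.0, 92.0)]

print(nearest(size_to_label, 7.0))  # large
print(nearest(size_to_price, 7.0))  # 85.0
```

The only difference between the two tasks here is the type of the output that the system is asked to produce.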

Performance Measure

Different learning algorithms have different abilities, so we need a quantitative measure to evaluate them.

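For classification, for example, the usual standard for judging an algorithm is its accuracy or error rate.
In ML we care more about the model's generalization, that is, how well it performs on examples it has not seen before.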

we care more about the performance of the model on new, previously unseen examples
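The difference between performance on seen and unseen examples can be shown with a deliberately extreme sketch. The two "models" below are hypothetical: one only memorizes its training examples, the other applies a rule that happens to match how the labels were generated.

```python
def accuracy(predict, examples):
    """Fraction of examples for which the prediction equals the label."""
    correct = sum(predict(x) == y for x, y in examples)
    return correct / len(examples)


# Made-up data: the label is 1 if x is odd, else 0.
train = [(1, 1), (2, 0), (3, 1), (4, 0)]
test = [(5, 1), (6, 0), (7, 1), (8, 0)]  # previously unseen examples

memory = dict(train)


def memorizer(x):
    """Look up the training label; guess 0 for anything not seen before."""
    return memory.get(x, 0)


def parity_rule(x):
    """A rule that captures the underlying pattern."""
    return x % 2


for name, model in [("memorizer", memorizer), ("parity_rule", parity_rule)]:
    train_acc = accuracy(model, train)
    test_acc = accuracy(model, test)
    print(name, "train accuracy:", train_acc, "test accuracy:", test_acc,
          "test error rate:", 1 - test_acc)
```

Both models are perfect on the training examples, but only the second one keeps its accuracy on the test examples. That gap is why the performance measure is normally computed on data the model has not seen.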

In a specific ML task, however, two kinds of difficulty sometimes arise:

It is difficult to choose a performance measure that corresponds well to the desired behavior of the system.
We know what quantity we would ideally like to measure, but measuring it is impractical.

In these difficult cases, the usual approach is to:

design an alternative criterion
design a good approximation
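One common illustration of an alternative criterion (my own choice of example, not one given in the quoted text) is to use log loss alongside the classification error rate. The error rate only counts mistakes, while log loss also reflects how confident the predicted probabilities are. The probabilities below are invented.

```python
import math


def zero_one_error(probs, labels):
    """Error rate when predicting class 1 whenever the probability is at least 0.5."""
    wrong = sum((p >= 0.5) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


def log_loss(probs, labels):
    """Average negative log-probability assigned to the true class."""
    total = sum(-math.log(p if y == 1 else 1.0 - p) for p, y in zip(probs, labels))
    return total / len(labels)


labels = [1, 1, 0]
hesitant = [0.6, 0.55, 0.45]   # predicted P(class 1), barely on the right side
confident = [0.95, 0.9, 0.1]   # predicted P(class 1), clearly on the right side

for name, probs in [("hesitant", hesitant), ("confident", confident)]:
    print(name, "error rate:", zero_one_error(probs, labels),
          "log loss:", round(log_loss(probs, labels), 3))
```

Both sets of predictions have an error rate of zero, yet the log loss is lower for the confident one, so the alternative criterion can separate models that the original measure cannot.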

Experience

Broadly speaking, learning algorithms in ML fall into two categories:

supervised
unsupervised

The boundary between the two is blurry. Most learning algorithms gain their experience from some dataset.
So what is a dataset?

A dataset is a collection of many examples.

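A dataset is simply a collection of examples, such as the MNIST dataset of handwritten digits (0-9) or the multi-purpose VOC dataset. In computation, a dataset is usually represented as one large matrix.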
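A minimal sketch of the matrix view, using plain Python lists: each row is one example and each column is one feature. The measurements and their meaning (height, weight, age) are hypothetical.

```python
# Hypothetical measurements: each row is one example, each column one feature
# (say height in cm, weight in kg, age in years).
dataset = [
    [170.0, 65.0, 30.0],
    [158.0, 52.0, 24.0],
    [181.0, 80.0, 41.0],
    [165.0, 58.0, 35.0],
]

num_examples = len(dataset)      # number of rows
num_features = len(dataset[0])   # number of columns
print(num_examples, num_features)  # 4 3

# Reading down one column gives the same feature across all examples.
weights = [row[1] for row in dataset]
print(weights)  # [65.0, 52.0, 80.0, 58.0]
```

In practice an array library would typically hold this matrix, but the row/column layout is the same.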

Unsupervised and supervised algorithms experience different kinds of datasets:

Unsupervised learning algorithms experience a dataset containing many features, then learn useful properties of the structure of this dataset.
Supervised learning algorithms experience a dataset containing features, but each example is also associated with a label or target.
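The two kinds of experience can be sketched side by side. In the toy code below (data, names, and the mean-based methods are all assumptions for illustration), the unsupervised part sees only features and summarizes the data, while the supervised part also sees a label per example and uses it to build a classifier.

```python
def mean(values):
    return sum(values) / len(values)


# Unsupervised: only features are available; learn a property of the data
# (here simply its mean).
features = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
overall_mean = mean(features)
print(overall_mean)  # 6.5

# Supervised: each example also has a label; learn one mean per class.
labels = ["low", "low", "low", "high", "high", "high"]
class_means = {}
for label in sorted(set(labels)):
    class_means[label] = mean([x for x, y in zip(features, labels) if y == label])
print(class_means)  # {'high': 11.0, 'low': 2.0}


def classify(x):
    """Predict the label whose class mean is closest to x."""
    return min(class_means, key=lambda label: abs(class_means[label] - x))


print(classify(4.0))  # low
print(classify(9.0))  # high
```

The features are identical in both halves; what changes is whether a label or target accompanies each example.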

That's all for now; to be continued in the next post.