Machine Learning introduction

Article link: https://blog.csdn.net/Alan_EE/article/details/88075296

Mathematics
- Maximum Likelihood & Least Squares Method
basic knowledge
Basic Algorithms

Mathematics

Maximum Likelihood & Least Squares Method

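I have always found the Chinese rendering "二乘" of "least squares" odd; the English name is easier to understand.
1. Understanding maximum likelihood & least squares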

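Another major difference between maximum likelihood estimation and the least squares method is that maximum likelihood estimation requires a distributional assumption and belongs to parametric statistics: if you do not even know the distribution function, how could you write down the likelihood function? The least squares method makes no such assumption.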
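To make the least squares side concrete, here is a minimal sketch that fits a straight line by minimising the sum of squared residuals. The data, the true line y = 2x + 1, and the noise level are all made up for illustration; note that the fit itself never uses a likelihood function.

```python
import numpy as np

# Hypothetical data: noisy points around y = 2x + 1 (values chosen for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix with a column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])

# Least squares: choose [slope, intercept] to minimise ||A @ coef - y||^2
coef, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = coef

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The only ingredient is the squared-error objective; nothing in the code states how the noise is distributed.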

2. Understanding maximum likelihood

Inferring parameters from the observed "facts" is called "likelihood";
finding the most probable parameters is called "maximum likelihood estimation".
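As a small illustration of "finding the most probable parameters", the sketch below estimates the heads probability of a coin from observed flips. The flips are hypothetical, and the Bernoulli model is the distributional assumption that makes the likelihood function writable in the first place.

```python
import numpy as np

# Hypothetical observations (the "facts"): 1 = heads, 0 = tails
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def log_likelihood(p, data):
    """Log-likelihood of the Bernoulli parameter p given 0/1 data."""
    heads = data.sum()
    tails = data.size - heads
    return heads * np.log(p) + tails * np.log(1.0 - p)

# Evaluate the log-likelihood on a grid of candidate parameters
grid = np.linspace(0.01, 0.99, 99)
p_hat = grid[np.argmax(log_likelihood(grid, flips))]

# Closed-form maximum likelihood estimate for comparison: the sample mean
p_closed = flips.mean()

print(p_hat, p_closed)
```

The grid search and the closed-form estimate agree here because the Bernoulli log-likelihood has a single maximum at the fraction of heads.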

basic knowledge

subcategories of Machine learning

Regression / classification / clustering

"supervised" ---- given input data X, predict a variable Y.
"unsupervised" ---- what can be discovered from the data X.
If the predicted value is continuous (e.g. share price) ---- regression.
If the predicted value is discrete (good/bad, male/female) ---- classification.
If the classes are not predefined ---- clustering.

bias & variance

1. Understanding the tradeoff between bias and variance
2. Understanding the bias-variance tradeoff

Roc & Confusion Matrix

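About ten years ago, one standard dominated the machine learning literature: classification accuracy. Recall and precision are commonly used in information retrieval (IR). In recent years, a new method for evaluating classifier performance has been introduced from the field of medical analysis: ROC analysis.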

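Evaluation metrics for machine learning classification problems: the concepts behind the confusion matrix, ROC, and AUC, and how to understand them.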

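A confusion matrix is a visual tool for evaluating how good a classification model is.
A confusion matrix reflects a model's quality in more detail than the accuracy metric does. When the numbers of positive and negative samples are imbalanced, accuracy can give misleading results.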
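The following sketch shows how accuracy can mislead on imbalanced data while the confusion matrix exposes the problem. The data set (95 negatives, 5 positives) and the always-negative classifier are hypothetical, and the `[[TN, FP], [FN, TP]]` layout is just one common convention.

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return np.array([[tn, fp], [fn, tp]])

# Hypothetical imbalanced data: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
# A useless classifier that always predicts "negative"
y_pred = np.zeros_like(y_true)

cm = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = cm
accuracy = (tp + tn) / cm.sum()   # high, because negatives dominate
recall = tp / (tp + fn)           # zero, because no positive is ever found

print(cm)
print(accuracy, recall)
```

Accuracy alone would rate this classifier highly, while the bottom row of the matrix shows that every positive sample was missed.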

ROC curve and AUC value

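For a given classifier, its performance on the test samples yields a pair of values: a True Positive Rate and a False Positive Rate. The classifier can therefore be mapped to a single point in the ROC plane. By varying the threshold the classifier uses, we obtain a curve passing through (0, 0) and (1, 1); this is the classifier's ROC curve.
AUC (Area Under Curve): the larger the area, the better the performance.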
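The threshold sweep described above can be sketched as follows. This is a minimal version that assumes binary 0/1 labels, no tied scores, and at least one sample of each class; the labels and scores are made up, and the area is computed with the trapezoidal rule.

```python
import numpy as np

def roc_curve(y_true, scores):
    """Sweep the threshold over the sorted scores; return FPR and TPR arrays."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores)        # highest score first
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)     # true positives at each cut
    fps = np.cumsum(y_sorted == 0)     # false positives at each cut
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Hypothetical labels and classifier scores (no tied scores)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_curve(y_true, scores)
area = auc(fpr, tpr)
print(fpr, tpr, area)
```

Each position in the returned arrays corresponds to one threshold setting, so the first point is (0, 0) (nothing predicted positive) and the last is (1, 1) (everything predicted positive).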

Discriminative model & generative model

1. Generative models and discriminative models
2. An accessible explanation, though it may not be accurate

Basic Algorithms

Binary Decision Trees

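The "guessing game" ---- captures the underlying idea.
Decision Node: a node at which the data continues to be tested on one of its attributes.
Branch: a subtree grown iteratively from a Decision Node; it corresponds to one outcome of the attribute test at its root node.
End Node: also called a Leaf Node; this is the node where the decision is actually made, and the testing of a sample's attributes ends at the Leaf Node.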

1. Binary Decision From Wikipedia

A binary decision is a choice between two alternatives.
2. decision tree learning

Decision tree learning uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item’s target value (represented in the leaves).

Decision trees include classification and regression trees:

Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels.
Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.
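To show what a single decision node in a classification tree does, the sketch below searches one feature for the threshold that best separates two classes. Using Gini impurity as the split criterion is an assumption made here (it is the usual CART-style choice), and the function names and data are hypothetical.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array (0 means the node is pure)."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))

def best_split(x, y):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best_t, best_score = None, np.inf
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values
    for t in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= t], y[x > t]
        score = (left.size * gini(left) + right.size * gini(right)) / y.size
        if score < best_score:
            best_t, best_score = float(t), score
    return best_t, best_score

# Hypothetical one-feature data set with two classes
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
y = np.array([0, 0, 0, 1, 1, 1])
threshold, impurity = best_split(x, y)
print(threshold, impurity)
```

A full tree would apply the same search recursively to the left and right subsets until the leaves are pure or a depth limit is reached.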

Boosting

As implemented in OpenCV, boosting is a two-class classifier, unlike random trees.
1. Boosting algorithms explained in detail: AdaBoost

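Boosting starts from a weak learner, trains repeatedly to obtain a series of weak learners, and then combines them into a strong learner.
Most boosting methods change the probability distribution of the data (by changing the sample weights). Specifically, they raise the weights of samples misclassified in the previous round and lower the weights of correctly classified samples, so that misclassified samples get more attention in the next round. The weak learning algorithm is then invoked on each distribution to obtain a series of weak learners, which are combined linearly: learners with a low error rate are given larger weights, and learners with a high error rate are given smaller weights.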

Adaboost

1. Classification with AdaBoost

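When making an important decision, most people would consider the opinions of several experts rather than just one person.
In the computing world, this everyday idea becomes the meta-algorithm, or ensemble method. The ensemble can combine different algorithms, the same algorithm under different settings, or classifiers trained on different parts of the data set. AdaBoost is one of the most popular meta-algorithms.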

2. The AdaBoost algorithm procedure

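The "adaptive" in Adaptive Boosting refers to the following: the weights of the samples misclassified by the previous weak classifier are increased, and the reweighted samples are then used to train the next weak classifier. In each round, a new weak classifier is trained on the whole sample set, producing new sample weights and that classifier's say (its weight in the final vote). The iteration continues until a preset error rate or the specified maximum number of iterations is reached.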

Boosting algorithms are used to train $N_w$ weak classifiers $h_w$. These classifiers are generally very simple individually. In most cases these classifiers are decision trees with only one split (called decision stumps) or at most a few levels of splits (perhaps up to three).
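The reweighting loop described above can be sketched with decision stumps as the weak classifiers. This is a minimal illustration of the standard discrete AdaBoost update, not OpenCV's implementation; the function names, the exhaustive stump search, and the ten-point data set are assumptions made for the example.

```python
import numpy as np

def stump_predict(X, j, t, s):
    """Predict s where feature j is <= t, and -s elsewhere."""
    return np.where(X[:, j] <= t, s, -s)

def fit_stump(X, y, w):
    """Pick the (feature, threshold, sign) stump with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                err = np.sum(w[stump_predict(X, j, t, s) != y])
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost_fit(X, y, n_rounds=3):
    """Train up to n_rounds weighted stumps; labels must be -1 or +1."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)                  # start from uniform sample weights
    model = []
    for _ in range(n_rounds):
        j, t, s, err = fit_stump(X, y, w)
        err = max(err, 1e-10)                # avoid division by zero
        if err >= 0.5:                       # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / err)   # this stump's say
        pred = stump_predict(X, j, t, s)
        w = w * np.exp(-alpha * y * pred)    # raise weights of misclassified samples
        w = w / w.sum()
        model.append((alpha, j, t, s))
    return model

def adaboost_predict(model, X):
    """Sign of the alpha-weighted vote of all stumps."""
    total = sum(a * stump_predict(X, j, t, s) for a, j, t, s in model)
    return np.sign(total)

# Hypothetical 1-D data that no single stump can classify perfectly
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

model = adaboost_fit(X, y, n_rounds=3)
train_error = np.mean(adaboost_predict(model, X) != y)
print(len(model), train_error)
```

On this data the best single stump still misclassifies three of the ten points, while the weighted vote of three stumps classifies all of them correctly.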

Random Trees & Random Forests

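A random forest is a classifier that uses multiple trees to train on the samples and make predictions. Put simply, a random forest consists of multiple CARTs (Classification And Regression Trees). The training set used by each tree is sampled with replacement from the full training set.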
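The sampling-with-replacement step can be sketched in a few lines. The helper names are hypothetical, and combining the trees by majority vote is an assumption made here (a common choice for classification); tree training itself is omitted.

```python
import numpy as np

def bootstrap_indices(n_samples, n_trees, seed=0):
    """One row of indices per tree, drawn with replacement from range(n_samples)."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_samples, size=(n_trees, n_samples))

def majority_vote(tree_predictions):
    """Combine per-tree class predictions (n_trees x n_samples) by majority vote."""
    votes = np.asarray(tree_predictions)
    return np.array([np.bincount(col).argmax() for col in votes.T])

idx = bootstrap_indices(n_samples=1000, n_trees=5)
# Fraction of distinct training rows each tree actually sees
unique_fraction = [np.unique(row).size / 1000 for row in idx]

# Hypothetical class predictions from three trees for four samples
combined = majority_vote([[0, 1, 1, 2], [0, 1, 0, 2], [1, 1, 0, 2]])
print(unique_fraction, combined)
```

Because the draws are with replacement, each tree sees only roughly 63% of the distinct training rows on average, which is part of what makes the trees differ from one another.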

Expectation-Maximization Algorithm

1. wikipedia:

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.

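Suppose a random sample is known to follow some probability distribution, but its specific parameters are not known. Parameter estimation means running a number of experiments, observing each result, and using those results to analyse and infer the approximate values of the parameters.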
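As one concrete instance of the E and M steps, here is a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture, where the latent variable is which component generated each point. The mixture model, the initialisation from the data's minimum and maximum, the fixed iteration count, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (a minimal sketch)."""
    # Crude initial guesses taken from the data itself
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()], dtype=float)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E step: responsibility of each component for each point
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
        weighted = pi * dens
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M step: re-estimate the parameters from the responsibilities
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / x.size
    return pi, mu, var

# Hypothetical data: two Gaussian clusters centred at 0 and 5
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
pi, mu, var = em_gmm_1d(x)
print(pi, mu, var)
```

A production version would normally monitor the log-likelihood for convergence and guard against a component's variance collapsing; both are left out to keep the two steps visible.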

K-Nearest Neighbor

1. A good introduction to KNN

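KNN, the nearest-neighbor rule, is mainly used to identify unknown items, that is, to infer which class an unknown item belongs to. The idea is to use Euclidean distance to determine which known class's features are closest to the features of the unknown item.
The approach is: if the majority of the k most similar samples (that is, the k nearest neighbors in feature space) of a sample belong to a certain class, then the sample also belongs to that class.
In the KNN algorithm, the selected neighbors are all objects that have already been correctly classified.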
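The rule above can be sketched directly: compute the Euclidean distance from the query to every labelled sample, keep the k closest, and take a majority vote. The function name, the training points, and the class labels "A" and "B" are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - query, axis=1)   # Euclidean distance
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already correctly labelled training samples
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

label = knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3)
print(label)
```

There is no training phase: all the work happens at query time, which is why the correctness of the stored labels matters so much.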

Multi-layer Perceptron

wikipedia

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activation distinguish MLP from a linear perceptron.
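To illustrate why the hidden layer and the nonlinear activation matter, the sketch below runs a forward pass through a tiny three-layer network that computes XOR, a function a single linear perceptron cannot represent. The weights are hand-picked for the example rather than learned, ReLU is an assumed choice of activation, and backpropagation is not shown.

```python
import numpy as np

def relu(z):
    """Nonlinear activation: max(z, 0) element-wise."""
    return np.maximum(z, 0.0)

def mlp_forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer (ReLU) -> output layer (linear)."""
    hidden = relu(x @ W1 + b1)
    return hidden @ W2 + b2

# Hand-picked (not trained) weights that make the network compute XOR
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0], [-2.0]])
b2 = np.array([0.0])

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = mlp_forward(inputs, W1, b1, W2, b2).ravel()
print(outputs)
```

Removing the `relu` call collapses the two layers into a single linear map, and no choice of weights can then reproduce the XOR outputs.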