A Rambling Discussion of Pattern Recognition (4)

                   ------------On Machine Learning

At the mention of machine learning, the first thing that comes to most people's minds is neural networks. In fact there are many machine-learning methods. To show what is out there, here is the summary table of machine-learning algorithms from Chapter 13 of "Learning OpenCV" (Gary Bradski and Adrian Kaehler); a small OpenCV sketch follows the table:

Mahalanobis: A distance measure that accounts for the stretchiness of the data space by dividing out the covariance of the data. If the covariance is the identity matrix (identical variance), then this measure is identical to the Euclidean distance measure.

K-means: An unsupervised clustering algorithm that represents a distribution of data using K centers, where K is chosen by the user. The difference between this algorithm and expectation maximization is that here the centers are not Gaussian, and the resulting clusters look more like soap bubbles, since the centers (in effect) compete to own the closest data points. These cluster regions are often used as sparse histogram bins to represent the data. Invented by Steinhaus [Steinhaus56], as used by Lloyd.

Normal/Naïve Bayes classifier: A generative classifier in which features are assumed to be Gaussian distributed and statistically independent from each other, a strong assumption that is generally not true. For this reason, it's often called a naïve Bayes classifier. However, this method often works surprisingly well.

Decision trees: A discriminative classifier. The tree finds one data feature and a threshold at the current node that best divides the data into separate classes. The data is split, and the procedure is repeated recursively down the left and right branches of the tree. Though not often the top performer, it's often the first thing you should try because it is fast and has high functionality.

Boosting: A discriminative group of classifiers. The overall classification decision is made from the combined weighted classification decisions of the group of classifiers. In training, we learn the group of classifiers one at a time. Each classifier in the group is a weak classifier (only just above chance performance). These weak classifiers are typically composed of single-variable decision trees called stumps. In training, the decision stump learns its classification decisions from the data and also learns a weight for its vote from its accuracy on the data. Between training each classifier one by one, the data points are re-weighted so that more attention is paid to data points where errors were made. This process continues until the total error over the data set, arising from the combined weighted vote of the decision trees, falls below a set threshold. This algorithm is often effective when a large amount of training data is available.

Random trees: A discriminative forest of many decision trees, each built down to a large or maximal splitting depth. During learning, each node of each tree is allowed to choose splitting variables only from a random subset of the data features. This helps ensure that each tree becomes a statistically independent decision maker. In run mode, each tree gets an unweighted vote. This algorithm is often very effective and can also perform regression by averaging the output numbers from each tree.

Face detector / Haar classifier: An object detection application based on a clever use of boosting. The OpenCV distribution comes with a trained frontal face detector that works remarkably well. You may train the algorithm on other objects with the software provided. It works well for rigid objects and characteristic views.

Expectation maximization (EM): A generative unsupervised algorithm that is used for clustering. It will fit N multidimensional Gaussians to the data, where N is chosen by the user. This can be an effective way to represent a more complex distribution with only a few parameters (means and variances). Often used in segmentation. Compare with K-means listed previously.

K-nearest neighbors: The simplest possible discriminative classifier. Training data are simply stored with labels. Thereafter, a test data point is classified according to the majority vote of its K nearest other data points (in a Euclidean sense of nearness). This is probably the simplest thing you can do. It is often effective, but it is slow and requires lots of memory.

Neural networks / Multilayer perceptron (MLP): A discriminative algorithm that (almost always) has hidden units between output and input nodes to better represent the input signal. It can be slow to train but is very fast to run. Still the top performer for things like letter recognition.

Support vector machine (SVM): A discriminative classifier that can also do regression. A distance function between any two data points in a higher-dimensional space is defined. (Projecting data into higher dimensions makes the data more likely to be linearly separable.) The algorithm learns separating hyperplanes that maximally separate the classes in the higher dimension. It tends to be among the best with limited data, losing out to boosting or random trees only when large data sets are available.
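To make the table concrete, here is a minimal sketch that trains three of the listed classifiers on the same toy 2-D data through OpenCV's common training interface. It assumes OpenCV 3.x/4.x with the cv::ml module, whose class names differ from the CvStatModel-era API the book itself uses; the data and the query point are made up for illustration.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/ml.hpp>
#include <cstdio>

int main()
{
    // Toy data: 40 samples with 2 features each; class 0 clusters near
    // (1,1) and class 1 near (5,5). Labels must be CV_32S for classification.
    cv::Mat samples(40, 2, CV_32F);
    cv::Mat labels(40, 1, CV_32S);
    cv::RNG rng(42);
    for (int i = 0; i < 40; ++i) {
        int cls = (i < 20) ? 0 : 1;
        float center = (cls == 0) ? 1.0f : 5.0f;
        samples.at<float>(i, 0) = center + (float)rng.gaussian(0.5);
        samples.at<float>(i, 1) = center + (float)rng.gaussian(0.5);
        labels.at<int>(i, 0) = cls;
    }
    cv::Ptr<cv::ml::TrainData> data =
        cv::ml::TrainData::create(samples, cv::ml::ROW_SAMPLE, labels);

    // Normal Bayes: assumes Gaussian, independent features (see table).
    cv::Ptr<cv::ml::NormalBayesClassifier> bayes =
        cv::ml::NormalBayesClassifier::create();
    bayes->train(data);

    // K-nearest neighbors: stores the data, votes among the 3 nearest.
    cv::Ptr<cv::ml::KNearest> knn = cv::ml::KNearest::create();
    knn->setDefaultK(3);
    knn->train(data);

    // SVM with an RBF kernel: learns a maximally separating boundary.
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);
    svm->train(data);

    // Classify one unseen point with each model; all should answer 1.
    cv::Mat query = (cv::Mat_<float>(1, 2) << 4.8f, 5.2f);
    std::printf("bayes=%.0f knn=%.0f svm=%.0f\n",
                bayes->predict(query), knn->predict(query),
                svm->predict(query));
    return 0;
}
```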

 

      

       With so many algorithms available, working through the OpenCV open-source code is an excellent way to learn them. Of course, before studying any particular algorithm you should first work through the relevant theory, or you may not be able to follow its implementation.
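For example, the K-means entry in the table reduces to a single call, cv::kmeans, in OpenCV's core module. A minimal sketch with made-up points (assuming the OpenCV 3.x/4.x header layout; the two clusters are obvious by construction):

```cpp
#include <opencv2/core.hpp>
#include <cstdio>

int main()
{
    // Six 2-D points forming two well-separated clusters; cv::kmeans
    // expects floating-point input, one sample per row.
    cv::Mat points = (cv::Mat_<float>(6, 2) <<
        1.0f, 1.1f,  0.9f, 1.0f,  1.2f, 0.8f,
        8.0f, 8.2f,  7.9f, 8.1f,  8.3f, 7.8f);

    cv::Mat bestLabels, centers;
    cv::kmeans(points, /*K=*/2, bestLabels,
               cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT,
                                10, 1.0),
               /*attempts=*/3, cv::KMEANS_PP_CENTERS, centers);

    // Each input row is assigned to cluster 0 or 1.
    for (int i = 0; i < points.rows; ++i)
        std::printf("point %d -> cluster %d\n", i, bestLabels.at<int>(i));
    return 0;
}
```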

 

       There is a "No Free Lunch" theorem here, which says there is no single best algorithm: every algorithm has its strengths and its weaknesses. In other words, do not put blind faith in the absolute superiority of any one algorithm, and be clear about what you gain and what you give up when you adopt one. A single algorithm rarely satisfies the needs of a real application, so several algorithms are often combined to improve recognition performance (a toy voting sketch follows). There is also Occam's razor: do not complicate the problem unnecessarily, and strip out the useless factors that would complicate it. The same holds for recognition algorithms: a more complex algorithm is not necessarily more useful, and sometimes a simple one achieves quite good performance.
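The "combine several algorithms" idea can be as simple as a majority vote over the labels returned by independent classifiers. A toy, self-contained sketch (the three hard-coded votes stand in for, say, an SVM, a decision tree, and a boosted classifier):

```cpp
#include <cstdio>
#include <map>
#include <vector>

// Return the label predicted by the most classifiers; on a tie, the
// smallest label wins because std::map iterates keys in ascending order.
int majorityVote(const std::vector<int>& predictions)
{
    std::map<int, int> counts;
    for (int label : predictions)
        ++counts[label];
    int best = -1, bestCount = 0;
    for (const auto& kv : counts) {
        if (kv.second > bestCount) {
            best = kv.first;
            bestCount = kv.second;
        }
    }
    return best;
}

int main()
{
    // e.g. SVM says 1, decision tree says 0, boosting says 1 -> fused 1.
    std::printf("fused label: %d\n", majorityVote({1, 0, 1}));
    return 0;
}
```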

 

       Most machine-learning algorithms are, at bottom, searching for (or fitting) a separating boundary. If the data are linearly separable, that boundary is a straight line (a hyperplane in a high-dimensional space); if they are only nonlinearly separable, it is a curve (a hypersurface in a high-dimensional space). Once you grasp this, it becomes easier to understand why the functions used at the nodes of a neural network are so often sinusoid-like: from Fourier theory we know that sums of sines and cosines are an excellent choice for fitting a curve.
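The Fourier intuition appealed to here can be written out: over one period T, a reasonably well-behaved curve f(x) is approximated by a weighted sum of sines and cosines, with the fit improving as more terms N are kept:

```latex
f(x) \approx \frac{a_0}{2}
  + \sum_{n=1}^{N}\left(a_n \cos\frac{2\pi n x}{T} + b_n \sin\frac{2\pi n x}{T}\right),
\quad
a_n = \frac{2}{T}\int_{0}^{T} f(x)\cos\frac{2\pi n x}{T}\,dx,
\quad
b_n = \frac{2}{T}\int_{0}^{T} f(x)\sin\frac{2\pi n x}{T}\,dx.
```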

 

       For pattern-recognition R&D, I personally believe creative thinking and solid, patiently accumulated technique matter most. For a given recognition task there may be many methods to choose from, or there may be none that fits; in that case you have to apply an existing method creatively, and perhaps even derive a new, distinctive method from it. Once the algorithmic direction is set, what remains is long-term accumulation: turning a good recognition algorithm into a usable product generally requires a long process of refinement, and an impatient, short-sighted mindset will not get there.

 
