
Machine learning in 10 pictures

I find myself coming back to the same few pictures when explaining basic machine learning concepts. Below is a list I find most illuminating.

1. Test and training error: Why lower training error is not always a good thing: ESL Figure 2.11. Test and training error as a function of model complexity. This is the underfitting/overfitting trade-off: as the model gets more complex, training error keeps decreasing, while test error first falls and then rises again (the complex end of the curve has high variance and low bias), so an intermediate model complexity is preferred. A rough numerical illustration follows.
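
A minimal sketch (not from the original post, synthetic data assumed): sweep the degree of a polynomial fit and compare training and test mean squared error, which typically reproduces the U-shaped test-error curve of ESL Figure 2.11.

```python
# Sketch: training error keeps falling with model complexity (polynomial degree),
# while test error typically falls and then rises again.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)                      # assumed "true" function
x_train = np.sort(rng.uniform(0, 1, 30))
x_test = np.sort(rng.uniform(0, 1, 200))
y_train = f(x_train) + rng.normal(0, 0.3, x_train.size)
y_test = f(x_test) + rng.normal(0, 0.3, x_test.size)

for degree in range(1, 12):
    coefs = np.polyfit(x_train, y_train, degree)          # least squares fit
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}  train MSE {train_mse:.3f}  test MSE {test_mse:.3f}")
```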

2. Under- and overfitting: PRML Figure 1.4. Plots of polynomials having various orders M, shown as red curves, fitted to the data set generated by the green curve. Low orders underfit the data; high orders overfit it.
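
A sketch in the spirit of PRML Figure 1.4 (the setup is assumed, not the book's code): fit polynomials of order M = 0, 1, 3, 9 to 10 noisy samples of sin(2πx). M = 0 and 1 underfit, M = 3 fits well, and M = 9 interpolates the noise with very large coefficients.

```python
# Sketch: polynomial fits of increasing order to 10 noisy samples of sin(2*pi*x).
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

for M in (0, 1, 3, 9):
    w = np.polyfit(x, t, M)                                # least squares coefficients
    rms = np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))    # training RMS error
    print(f"M = {M}: training RMS error {rms:.3f}, max |w| {np.abs(w).max():.1f}")
```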

3. Occam's razor: ITILA Figure 28.3. Why Bayesian inference embodies Occam's razor. This figure gives the basic intuition for why complex models can turn out to be less probable. The horizontal axis represents the space of possible data sets D. Bayes' theorem rewards models in proportion to how much they predicted the data that occurred. These predictions are quantified by a normalized probability distribution on D. This probability of the data given model Hi, P(D | Hi), is called the evidence for Hi. A simple model H1 makes only a limited range of predictions, shown by P(D | H1); a more powerful model H2, which has, for example, more free parameters than H1, is able to predict a greater variety of data sets. This means, however, that H2 does not predict the data sets in region C1 as strongly as H1. Suppose that equal prior probabilities have been assigned to the two models. Then, if the data set falls in region C1, the less powerful model H1 will be the more probable model. (Occam's razor, attributed to the 14th-century logician and Franciscan friar William of Ockham, c. 1285–1349, is the principle that entities should not be multiplied beyond necessity: as he put it in his commentary on the Sentences, Book 2, Question 15, it is futile to do with more what can be done with less.)
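
A tiny worked example (assumed, not from ITILA) of the same effect: H1 fixes a coin's bias at 0.5, while the more flexible H2 spreads its predictions over all biases with a uniform prior. Because H2 must spread its probability mass over many possible data sets, H1 has the higher evidence whenever the data look roughly balanced.

```python
# Sketch: Bayesian evidence automatically penalises the more flexible model.
from math import comb
from scipy.special import beta

def evidence_H1(heads, n):                 # P(D | H1): bias fixed at 0.5
    return comb(n, heads) * 0.5 ** n

def evidence_H2(heads, n):                 # P(D | H2): bias ~ Uniform(0, 1)
    return comb(n, heads) * beta(heads + 1, n - heads + 1)

n = 10
for heads in (5, 8, 10):
    print(f"{heads}/{n} heads: P(D|H1) = {evidence_H1(heads, n):.4f}, "
          f"P(D|H2) = {evidence_H2(heads, n):.4f}")
# 5/10 heads: the simple H1 wins; 10/10 heads: the flexible H2 wins.
```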

4. Feature combinations: (1) Why collectively relevant features may look individually irrelevant, and also (2) Why linear methods may fail. From Isabelle Guyon's feature extraction slides.
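
An assumed XOR-style example (not taken from Guyon's slides) makes both points: each feature is uninformative on its own, the two together determine the class, and a linear model still fails where a non-linear one succeeds.

```python
# Sketch: individually irrelevant but jointly relevant features, where a
# linear classifier fails and a non-linear one succeeds.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # XOR-like labels

print("P(y=1 | x1 > 0):", y[X[:, 0] > 0].mean())                          # ~0.5: x1 alone says nothing
print("linear accuracy:", LogisticRegression().fit(X, y).score(X, y))     # ~0.5
print("tree accuracy:  ", DecisionTreeClassifier().fit(X, y).score(X, y)) # ~1.0
```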

5. Irrelevant features: Why irrelevant features hurt kNN, clustering, and other similarity based methods. The figure on the left shows two classes well separated on the vertical axis. The figure on the right adds an irrelevant horizontal axis which destroys the grouping and makes many points nearest neighbors of the opposite class.
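
The same effect in a few lines, on assumed synthetic data: add one irrelevant, large-scale feature and watch 1-nearest-neighbour leave-one-out accuracy collapse.

```python
# Sketch: an irrelevant noisy dimension dominates the distance metric and hurts kNN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score, LeaveOneOut

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
relevant = np.where(y == 0, 0.0, 3.0) + rng.normal(0, 0.5, 100)   # separates the classes
irrelevant = rng.normal(0, 20.0, 100)                             # pure noise, large scale

knn = KNeighborsClassifier(n_neighbors=1)
for name, X in [("relevant only", relevant[:, None]),
                ("relevant + irrelevant", np.c_[relevant, irrelevant])]:
    acc = cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()
    print(f"{name}: leave-one-out accuracy {acc:.2f}")
```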

6. Basis functions: How non-linear basis functions turn a low dimensional classification problem without a linear boundary into a high dimensional problem with a linear boundary. From SVM tutorial slides by Andrew Moore: a one dimensional non-linear classification problem with input x is turned into a 2-D problem z=(x, x^2) that is linearly separable.
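
A minimal numerical version of that mapping (the numbers are assumed): points with small |x| in one class and large |x| in the other are not separable by a threshold on x, but after mapping x to z = (x, x²) a single threshold on z₂ separates them.

```python
# Sketch: a non-linear basis expansion makes a 1-D problem linearly separable in 2-D.
import numpy as np

x = np.array([-3.0, -2.5, 2.5, 3.0,    # class +1: large |x|
              -0.5,  0.0, 0.3, 0.6])   # class -1: small |x|
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

z = np.c_[x, x ** 2]                    # basis expansion z = (x, x^2)
threshold = 2.0                         # any value between 0.36 and 6.25 works
pred = np.where(z[:, 1] > threshold, 1, -1)
print("separable after mapping:", np.array_equal(pred, y))   # True
```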

7. Discriminative vs. Generative: Why discriminative learning may be easier than generative: PRML Figure 1.27. Example of the class-conditional densities for two classes having a single input variable x (left plot) together with the corresponding posterior probabilities (right plot). Note that the left-hand mode of the class-conditional density p(x|C1), shown in blue on the left plot, has no effect on the posterior probabilities. The vertical green line in the right plot shows the decision boundary in x that gives the minimum misclassification rate.
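
A rough numerical sketch of the same point, with assumed densities loosely in the spirit of PRML 1.27: the posterior depends only on the ratio of the class-conditional densities, so a secondary mode of p(x|C1) far from the decision boundary barely changes it.

```python
# Sketch: a bimodal class-conditional density whose left mode has almost no
# effect on the posterior p(C1 | x).
import numpy as np
from scipy.stats import norm

x = np.linspace(-5, 10, 7)
p_x_c1 = 0.3 * norm.pdf(x, -2, 1) + 0.7 * norm.pdf(x, 3, 1)   # bimodal p(x | C1)
p_x_c2 = norm.pdf(x, 6, 1)                                     # unimodal p(x | C2)
posterior_c1 = p_x_c1 / (p_x_c1 + p_x_c2)                      # equal priors assumed
for xi, p in zip(x, posterior_c1):
    print(f"x = {xi:5.1f}: p(C1 | x) = {p:.3f}")
# Near the left mode the posterior is ~1 regardless of that mode's shape;
# only the region near the boundary matters for classification.
```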

8. Loss functions: Learning algorithms can be viewed as optimizing different loss functions: PRML Figure 7.5. Plot of the ‘hinge’ error function used in support vector machines, shown in blue, along with the error function for logistic regression, rescaled by a factor of 1/ln(2) so that it passes through the point (0, 1), shown in red. Also shown are the misclassification error in black and the squared error in green.
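
A short sketch (formulas as described in the caption, plotting code assumed) that draws the four losses as functions of the margin z = y·f(x):

```python
# Sketch: hinge, rescaled logistic, 0/1 and squared losses versus the margin.
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-2, 2, 400)
hinge = np.maximum(0, 1 - z)
logistic = np.log(1 + np.exp(-z)) / np.log(2)   # rescaled so it passes through (0, 1)
misclass = (z < 0).astype(float)
squared = (1 - z) ** 2

for name, loss in [("hinge", hinge), ("logistic", logistic),
                   ("0/1", misclass), ("squared", squared)]:
    plt.plot(z, loss, label=name)
plt.xlabel("margin z = y*f(x)")
plt.ylabel("loss")
plt.legend()
plt.show()
```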

9. Geometry of least squares: ESL Figure 3.2. The N-dimensional geometry of least squares regression with two predictors. The outcome vector y is orthogonally projected onto the hyperplane spanned by the input vectors x1 and x2. The projection ŷ represents the vector of the least squares predictions.
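
A quick numerical check of that picture (random data assumed): the least squares fit ŷ = X(XᵀX)⁻¹Xᵀy lies in the column space of X, and the residual y − ŷ is orthogonal to both predictors.

```python
# Sketch: least squares as orthogonal projection onto the span of the predictors.
import numpy as np

rng = np.random.default_rng(0)
N = 50
X = rng.normal(size=(N, 2))                    # two predictors x1, x2
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=N)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares coefficients
y_hat = X @ beta                               # orthogonal projection of y
residual = y - y_hat
print("residual . x1 =", residual @ X[:, 0])   # ~0
print("residual . x2 =", residual @ X[:, 1])   # ~0
```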

10. Sparsity: Why Lasso (L1 regularization or Laplacian prior) gives sparse solutions (i.e. weight vectors with more zeros): ESL Figure 3.11. Estimation picture for the lasso (left) and ridge regression (right). Shown are contours of the error and constraint functions. The solid blue areas are the constraint regions |β₁| + |β₂| ≤ t and β₁² + β₂² ≤ t², respectively, while the red ellipses are the contours of the least squares error function.
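
The same contrast on assumed synthetic data: with many irrelevant predictors, the lasso drives most coefficients exactly to zero, while ridge regression only shrinks them.

```python
# Sketch: lasso (L1) yields exact zeros, ridge (L2) does not.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 100)   # only 2 of 20 features matter

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print("lasso zero coefficients:", np.sum(lasso.coef_ == 0), "of 20")
print("ridge zero coefficients:", np.sum(ridge.coef_ == 0), "of 20")
```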