Abstract: This post is the video transcript of Lecture 89, "Learning Curves", from Chapter 11, "Advice for Applying Machine Learning", of Andrew Ng's Machine Learning course. I recorded it while watching the videos and lightly edited it to make it more concise and readable, for my own future reference, and am now sharing it here. If you find any errors, corrections are welcome, and I sincerely thank you in advance. I also hope it helps with your studies.
————————————————
In this video, I'd like to tell you about learning curves. A learning curve is often a useful thing to plot, either to sanity check that your algorithm is working correctly, or to improve the performance of the algorithm. And learning curves are a tool that I actually use very often to try to diagnose whether a learning algorithm is suffering from a bias problem, a variance problem, or a bit of both.
Here's what a learning curve is. To plot a learning curve, what I usually do is plot J_train(θ), which is the average squared error on my training set, or J_cv(θ), which is the average squared error on my cross validation set. And I'm going to plot that as a function of m, that is, as a function of the number of training examples I have. And so m is usually a constant; maybe I just have 100 training examples. But what I am going to do is artificially reduce my training set size. So, I deliberately limit myself to using only, say, 10, 20, 30, or 40 training examples, and plot what the training error and the cross validation error are for these smaller training set sizes.

So let's see what these plots may look like. Suppose I have only one training example, like that shown in this first example here, and let's say I'm fitting a quadratic function. Well, if I have only one training example, I'm going to be able to fit it perfectly; I'm going to have zero error on the one training example. If I have two training examples, well, a quadratic function can also fit that very well. So, even if I am using regularization, I can probably fit this quite well, and if I am using no regularization, I'm going to fit this perfectly. And if I have three training examples, again I can fit a quadratic function perfectly. So, if m = 1, m = 2, or m = 3, my training error on my training set is going to be 0, assuming I'm not using regularization. Or it may be slightly larger than
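The procedure described above can be sketched in code: for each artificially reduced training-set size m, fit the model to only the first m examples, then record the average squared error both on those m examples and on the full cross-validation set. The sketch below is illustrative, not from the lecture: it assumes a quadratic (degree-2) polynomial hypothesis fit with NumPy, and the function names and dataset are my own choices.

```python
import numpy as np

def avg_squared_error(theta, X, y):
    """Average squared error, (1/2m) * sum of squared residuals,
    following the course's cost-function convention."""
    pred = np.polyval(theta, X)
    return np.mean((pred - y) ** 2) / 2.0

def learning_curve(X_train, y_train, X_cv, y_cv, degree=2):
    """For each training-set size m = 1..len(X_train), fit a polynomial
    to the first m examples and record the training and CV errors."""
    train_err, cv_err = [], []
    for m in range(1, len(X_train) + 1):
        # Cap the degree at m - 1 so the fit is never underdetermined;
        # for m <= degree + 1 the polynomial passes through every point,
        # which is why the training error starts at zero in the lecture.
        d = min(degree, m - 1)
        theta = np.polyfit(X_train[:m], y_train[:m], deg=d)
        train_err.append(avg_squared_error(theta, X_train[:m], y_train[:m]))
        # CV error is always measured on the full cross-validation set.
        cv_err.append(avg_squared_error(theta, X_cv, y_cv))
    return train_err, cv_err

if __name__ == "__main__":
    # Illustrative noiseless quadratic data (an assumption for the demo).
    X_tr = np.linspace(-3, 3, 10)
    y_tr = X_tr**2 + 2 * X_tr + 1
    X_cv = np.linspace(-2.5, 2.5, 8)
    y_cv = X_cv**2 + 2 * X_cv + 1
    tr, cv = learning_curve(X_tr, y_tr, X_cv, y_cv)
    print("train errors:", tr)
    print("cv errors:   ", cv)
```

In practice you would plot `train_err` and `cv_err` against m (e.g. with matplotlib) to get the curves discussed in the rest of the lecture; on noiseless quadratic data the training error stays at zero, as described above for m = 1, 2, and 3.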