Homework_Week6_Coursera [Machine Learning] Andrew Ng, Part 1: Advice for Applying Machine Learning
Question 1
Problem
1 You train a learning algorithm, and find that it has unacceptably high error on the test set. You plot the learning curve, and obtain the figure below. Is the algorithm suffering from high bias, high variance, or neither?
Answer:
C
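Explanation:
The figure makes the answer fairly clear, and I chose high variance. One caveat on the reasoning: if adding more data no longer helps, with training and test error both flattening out at a high value, that is normally the sign of high bias. High variance shows up as a large gap between a low training error and a high test error, and more data usually narrows that gap.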
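The way a learning curve like the one in this question is usually read can be written down as a small rule of thumb. The sketch below is only an illustration: the function name `diagnose`, the `target_error` argument, and the `gap_tolerance` threshold are assumptions made here, not values from the course.

```python
def diagnose(train_error, cv_error, target_error, gap_tolerance=0.05):
    """Rough reading of the right-hand end of a learning curve.

    train_error / cv_error: errors measured with the largest training set.
    target_error: the error level you would accept (an assumed input).
    gap_tolerance: how large a train/CV gap counts as "large" (assumed).
    """
    if cv_error <= target_error:
        return "neither"
    high_bias = train_error > target_error              # even the training fit is poor
    high_variance = (cv_error - train_error) > gap_tolerance  # big train/CV gap
    if high_bias and high_variance:
        return "both"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "borderline"


# Both errors high and close together: more data will not help much.
print(diagnose(train_error=0.30, cv_error=0.32, target_error=0.10))
# Low training error, much higher validation error: more data may help.
print(diagnose(train_error=0.02, cv_error=0.30, target_error=0.10))
```

The thresholds need to be chosen per problem; the point is only that the diagnosis depends on two things, the level of the training error and the size of the gap.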
Question 2
Problem
2 Suppose you have implemented regularized logistic regression
to classify what object is in an image (i.e., to do object
recognition). However, when you test your hypothesis on a new
set of images, you find that it makes unacceptably large
errors with its predictions on the new images. However, your
hypothesis performs well (has low error) on the
training set. Which of the following are promising steps to
take? Check all that apply.
Answer
AD
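Analysis
The hypothesis clearly does well on the training set but very poorly on the new test set, so this is an overfitting problem, i.e. high variance.
A. Use fewer features: correct. This reduces high variance.
B. Try adding polynomial features: wrong. The model is already overfitting, and this would make it worse.
C. Use fewer training examples: wrong. The model already does badly on the test set and needs more training data to generalize; training on fewer examples would make things worse.
D. Get more training examples: correct, for the reason given under C.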
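Why fewer features can help with high variance can be seen on a toy problem. The sketch below swaps the quiz's regularized logistic regression for plain least-squares polynomial regression, because that has a closed-form fit; the data (six points on the line y = x with a fixed alternating offset of 0.3 standing in for noise) and all function names are assumptions made for illustration.

```python
def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w


def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) w = X^T y."""
    X = [[x ** d for d in range(degree + 1)] for x in xs]
    n = degree + 1
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(n)]
    return solve(A, b)


def mse(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w."""
    preds = [sum(wd * x ** d for d, wd in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


# Six training points from y = x plus a fixed alternating offset of +/-0.3.
train_x = [-1 + 0.4 * i for i in range(6)]
train_y = [x + 0.3 * (-1) ** i for i, x in enumerate(train_x)]

# Offset-free test points from the same underlying line y = x.
test_x = [-1 + 0.01 * i for i in range(201)]
test_y = list(test_x)

many = fit_poly(train_x, train_y, degree=5)  # six parameters: can fit every point
few = fit_poly(train_x, train_y, degree=1)   # two parameters

for name, w in (("degree 5", many), ("degree 1", few)):
    print(name, "train:", mse(w, train_x, train_y), "test:", mse(w, test_x, test_y))
```

With six parameters the model passes through all six training points, so its training error is essentially zero while its test error is much larger; the two-parameter model has a higher training error but a far lower test error. This is the pattern described in the question, and it is why option A is a reasonable fix.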
Question 3
Problem
Suppose you have implemented regularized logistic regression
to predict what items customers will purchase on a web
shopping site. However, when you test your hypothesis on a new
set of customers, you find that it makes unacceptably large
errors in its predictions. Furthermore, the hypothesis
performs poorly on the training set. Which of the
following might be promising steps to take? Check all that
apply.
Answer
BC
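Analysis
Performance is just as poor on the training set, so the model is underfitting: high bias. The model itself may be too simple for the problem.
With that in mind, look at the options:
A. Use fewer training examples: not selected. Less data does not fix high bias.
B. Try more features: correct. The error rate is probably high because the model lacks the features it needs to capture the pattern, so give it more.
C. Use a smaller lambda: correct. Reducing the weight of the regularization term can fix high bias.
D. Evaluate the hypothesis on a cross-validation set instead of the test set: not selected. Changing the evaluation set does not change the model.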
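The effect of lambda in option C can be checked on the simplest possible case. The sketch below uses a one-weight regularized linear regression instead of the quiz's logistic regression, because its minimizer has a closed form; the data and the names `fit_ridge_1d` and `train_error` are assumptions made for illustration.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize sum((w*x - y)^2) + lam * w^2 over a single weight w.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def train_error(w, xs, ys):
    """Mean squared error on the training set (without the penalty term)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Data generated exactly by y = 2x, so the ideal weight is 2.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [2.0 * x for x in xs]

lambdas = [0.0, 1.0, 10.0, 100.0]
errors = [train_error(fit_ridge_1d(xs, ys, lam), xs, ys) for lam in lambdas]

for lam, err in zip(lambdas, errors):
    print("lambda =", lam, "weight =", fit_ridge_1d(xs, ys, lam), "train error =", err)
```

As lambda grows, the weight is pulled toward zero and the training error rises, which is the high-bias direction. Going the other way, a smaller lambda lets the model fit the training data more closely.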
Question 4
Problem
Which of the following statements are true? Check all that apply.
Answer
BD
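Analysis
Select the true statements.
A. Not necessarily. A low error rate during training says nothing about the test set.
B. The key word is "typically". No problem here: performance on the training set is generally better than on the test set.
C. A low test-set error rate does not mean the training-set error rate is low as well.
D. This one is about a low cross-validation error, so it is correct.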
Question 5
Problem
Which of the following statements are true? Check all that apply.
Answer
ACD
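Analysis
Again, select the true statements.
A. If a learning algorithm has high bias, adding training examples alone will not improve the test error: correct. The problem probably lies in the model itself or in the feature extraction.
B. This describes a neural network whose training error is much lower than its test error, which indicates overfitting. Adding more layers would only make that worse, so the statement is false and not selected.
C. A model with more parameters is more prone to overfitting and tends to have higher variance: correct.
D. When tuning a learning algorithm, plotting a learning curve helps show whether the model has a high bias or a high variance problem: correct. The plot makes it easy to see; Question 1 is an example.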
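Statement A can be illustrated with a small learning-curve computation. The sketch below is an assumed toy setup, not something from the quiz: the target is y = x^2 on [-1, 1], the model is a straight line (too simple, so high bias), and the training set is an evenly spaced grid that grows from 5 to 320 points.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mse(line, xs, ys):
    """Mean squared error of the line (a, b) on the given points."""
    a, b = line
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid(m):
    """m evenly spaced points on [-1, 1]."""
    return [-1 + 2 * i / (m - 1) for i in range(m)]


# Fixed test set drawn from the true curve y = x^2.
test_x = grid(1001)
test_y = [x * x for x in test_x]

curve = []  # rows of (training set size, training error, test error)
for m in (5, 20, 80, 320):
    train_x = grid(m)
    train_y = [x * x for x in train_x]
    line = fit_line(train_x, train_y)
    curve.append((m, mse(line, train_x, train_y), mse(line, test_x, test_y)))

for m, train_err, test_err in curve:
    print(m, round(train_err, 4), round(test_err, 4))
```

No straight line can follow the parabola, so the test error stays above a fixed floor however many training points are used, and the training error stays high as well. Reading the printed rows from top to bottom is the same as reading a learning curve from left to right, which is why the plot in statement D is a useful diagnostic.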