Machine Learning (Andrew Ng), Week 3
1: logistic regression:
hypothesis:
0 < h < 1; h represents the conditional probability P(y = 1 | x; theta) and is computed with the sigmoid function.
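A minimal sketch of the hypothesis in code (the function names sigmoid / hypothesis and the NumPy usage are illustrative, not from the original notes):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z}), maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta' * x), interpreted as P(y = 1 | x; theta)
    return sigmoid(np.dot(theta, x))
```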
decision boundary:
The boundary is the set of points where theta' * x = 0: predict y = 1 when theta' * x >= 0, otherwise predict y = 0.
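A small worked example (not from the original notes, just illustrating the idea):

```latex
\theta = \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix}, \quad
h_\theta(x) = g(-3 + x_1 + x_2), \quad
\text{predict } y = 1 \iff -3 + x_1 + x_2 \ge 0 \iff x_1 + x_2 \ge 3
```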
cost function:
If y = 1 and h → 0, the cost → ∞; if y = 0 and h → 1, the cost → ∞; if y = 1 and h = 1, the cost = 0; if y = 0 and h = 0, the cost = 0.
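Restating the standard per-example cost and overall cost function from the course for reference:

```latex
\mathrm{Cost}(h_\theta(x), y) =
\begin{cases}
-\log\big(h_\theta(x)\big)     & \text{if } y = 1 \\
-\log\big(1 - h_\theta(x)\big) & \text{if } y = 0
\end{cases}
\qquad
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
```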
gradient descent:
The update rule has the same form as for linear regression; only the cost function (and hence the hypothesis h) is different.
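The update rule, written out for reference:

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
\big( h_\theta(x^{(i)}) - y^{(i)} \big) \, x_j^{(i)}
\qquad (\text{simultaneously for all } j)
```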
advanced optimization:
Using fminunc is faster than plain gradient descent; you need to construct the cost function in advance, i.e. provide a routine that computes both the cost and its partial derivatives (the gradient).
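The course uses Octave's fminunc; as a rough Python analogue, scipy.optimize.minimize can play the same role. A minimal sketch, assuming SciPy is available (the helper name cost_and_grad and the tiny toy data are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grad(theta, X, y):
    # Returns J(theta) and its gradient, the two things the optimizer needs.
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(1.0 / m) * (y @ np.log(h) + (1 - y) @ np.log(1 - h))
    grad = (1.0 / m) * (X.T @ (h - y))
    return J, grad

# X: (m, n+1) design matrix with a leading column of ones; y: labels in {0, 1}
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])
res = minimize(cost_and_grad, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=True, method="BFGS")
theta_opt = res.x
```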
multiclass classification: one-vs-all, i.e. train one binary classifier per class and predict the class whose classifier outputs the highest probability (see the sketch below).
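A minimal sketch of the one-vs-all prediction step (the function name one_vs_all_predict and the array shapes are illustrative assumptions; each row of all_theta is assumed to have been trained as a separate "class k vs. rest" binary classifier, e.g. with the optimizer above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_all_predict(all_theta, X):
    # all_theta: (K, n+1), one row of parameters per class.
    # X: (m, n+1) design matrix with a leading column of ones.
    probs = sigmoid(X @ all_theta.T)   # (m, K) matrix of P(y = k | x)
    return np.argmax(probs, axis=1)    # pick the most confident classifier
```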
2: Regularization and the overfitting problem:
Too few features (too simple a hypothesis): underfit / high bias.
Too many features (too complex a hypothesis): overfit / high variance.
If we have too many features, the learned hypothesis may fit the training set very well (the cost function is approximately 0), but fail to generalize to new examples (e.g., fail to predict prices on new examples).
Intuitively, the more high-order terms there are, the better the hypothesis fits the training set, but its prediction performance on new data may get worse.
Regularization: shrink the magnitude of the theta values by adding the sum of squares of theta1 ... thetan to the cost function (scaled by lambda); note that theta0 is not included.
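Concretely, the regularized cost function for logistic regression (standard form; the regularization sum starts at j = 1, so theta0 is excluded):

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\Big[ y^{(i)} \log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big]
+ \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2
```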
With regularization, the normal equation no longer has the non-invertibility problem: the regularized matrix is invertible as long as lambda > 0.
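For regularized linear regression the closed-form solution becomes (L is the (n+1) x (n+1) identity matrix with its top-left entry set to 0, so theta0 is not penalized):

```latex
\theta = \big( X^T X + \lambda L \big)^{-1} X^T y,
\qquad
L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}
```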