Cost Function
Training set:
$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
m training examples
$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
How do we choose the parameters $\theta$?
Cost function
Linear regression:
$J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \dfrac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = \dfrac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Logistic regression:
$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
Note: y = 0 or 1 always
This is easier to understand together with the plots of the two branches.
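For intuition, here is a small Octave sketch (mine, not from the notes) that plots both branches of the cost over the possible values of $h_\theta(x)$:

```matlab
% Plot the two branches of the logistic cost for intuition.
h = linspace(0.001, 0.999, 200);     % possible values of h_theta(x)
plot(h, -log(h), 'b', h, -log(1 - h), 'r');
legend('y = 1: -log(h)', 'y = 0: -log(1 - h)');
xlabel('h_\theta(x)'); ylabel('Cost');
```

The $y = 1$ branch blows up as $h_\theta(x) \to 0$, so a confident wrong prediction is penalized heavily, and symmetrically for $y = 0$.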
Simplified cost function and gradient descent
$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
$J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -\dfrac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$
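This $J(\theta)$ vectorizes naturally in Octave. A minimal sketch (the name `logisticCost` and the shapes of `X` and `y` are my assumptions, not from the notes):

```matlab
function jVal = logisticCost(theta, X, y)
  % X: m x (n+1) design matrix, y: m x 1 vector of 0/1 labels
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));   % h_theta(x^(i)) for all examples at once
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));
end
```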
To fit the parameters $\theta$:
$\min_\theta J(\theta)$
To make a prediction for a new input x:
Output $h_\theta(x)$
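One common convention (an addition here, not in the notes) is to predict $y = 1$ when $h_\theta(x) \geq 0.5$, which is equivalent to $\theta^T x \geq 0$:

```matlab
prob = 1 / (1 + exp(-theta' * x));   % h_theta(x) for a single (n+1) x 1 input x
prediction = (prob >= 0.5);          % predict y = 1 when h_theta(x) >= 0.5
```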
Want $\min_\theta J(\theta)$:
Repeat {
    $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$
}
(simultaneously update all $\theta_j$)
$\dfrac{\partial}{\partial \theta_j} J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
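This update has the same form as linear regression's; only the definition of $h_\theta(x)$ has changed. One iteration vectorizes as follows (a sketch; `alpha`, `X`, `y` are assumed names):

```matlab
m = length(y);
h = 1 ./ (1 + exp(-X * theta));   % sigmoid hypothesis on all m examples
grad = (1/m) * X' * (h - y);      % all partial derivatives at once
theta = theta - alpha * grad;     % simultaneous update of every theta_j
```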
Advanced Optimization
Optimization algorithms:
Gradient descent
Conjugate gradient
BFGS (a variable metric method)
L-BFGS (limited-memory BFGS)
Advantages of the latter three algorithms:
No need to manually pick the learning rate α
Usually converge faster than gradient descent
Disadvantage: more complex
Example:
$\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$
$J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$
$\dfrac{\partial}{\partial \theta_1} J(\theta) = 2(\theta_1 - 5)$
$\dfrac{\partial}{\partial \theta_2} J(\theta) = 2(\theta_2 - 5)$
```matlab
% Save as costFunction.m: returns both J(theta) and its gradient.
function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % J(theta)
  gradient = zeros(2, 1);                       % the two partials derived above
  gradient(1) = 2*(theta(1) - 5);
  gradient(2) = 2*(theta(2) - 5);
end
```

```matlab
% 'GradObj','on' tells fminunc that costFunction also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
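For this $J(\theta)$, `optTheta` should converge to roughly $[5; 5]$ with `functionVal` near 0, and an `exitFlag` of 1 indicates fminunc reports convergence.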
Multiclass Classification: One-vs-all
One-vs-all (one-vs-rest)
$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$  $(i = 1, 2, 3)$
Given a new input x, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$:
$\max_i h_\theta^{(i)}(x)$
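A minimal prediction sketch in Octave, assuming `allTheta` stacks the K trained parameter vectors as its rows (my naming, not from the notes):

```matlab
% x: (n+1) x 1 new input, including the bias term
h = 1 ./ (1 + exp(-allTheta * x));    % h: K x 1, one value per class
[maxProb, predictedClass] = max(h);   % class i with the largest h_theta^(i)(x)
```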