Classification
y ∈ {0, 1}    0: "Negative Class"    1: "Positive Class"
Logistic Regression: 0 <= h(x) <= 1
Hypothesis (hypothesis function)
- Sigmoid function (logistic function): g(z) = 1 / (1 + e^(-z))
- Hypothesis: h(x) = g(θ'x) = 1 / (1 + e^(-θ'x))
h(x) = estimated probability that y = 1 on input x, i.e. h(x) = P(y = 1 | x; θ)
P(y = 0 | x; θ) + P(y = 1 | x; θ) = 1
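As a quick sketch (in Python/NumPy rather than the course's Octave, with made-up θ and x values), the hypothesis is just the sigmoid applied to θ'x:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and one input (with intercept term x0 = 1).
theta = np.array([-1.0, 2.0])
x = np.array([1.0, 0.5])
h = sigmoid(theta @ x)  # h(x) = P(y = 1 | x; theta), always in (0, 1)
# P(y = 0 | x; theta) is simply 1 - h, so the two probabilities sum to 1.
```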
Cost function
If the linear regression cost function were reused here, the resulting J(θ) would be non-convex, so gradient descent is not guaranteed to converge to the global minimum.
- Cost function (per example):
y = 1: Cost(h(x), y) = -log(h(x))
y = 0: Cost(h(x), y) = -log(1 - h(x))
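A small numeric check of this cost (a Python sketch; the probability 0.9 is made up): a confident, correct prediction is penalized lightly, a confident, wrong one heavily.

```python
import numpy as np

def cost(h, y):
    """Per-example logistic cost: -log(h) if y == 1, else -log(1 - h)."""
    return -np.log(h) if y == 1 else -np.log(1 - h)

# If the model says P(y = 1) = 0.9:
low = cost(0.9, 1)   # correct and confident -> small penalty
high = cost(0.9, 0)  # wrong and confident   -> large penalty
```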
Method 1: Gradient Descent
Taking the partial derivative of J(θ) with respect to θ yields an update rule of the same form as linear regression's; only h(x) differs.
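The update above can be sketched as follows (Python/NumPy rather than Octave; the toy data and learning rate are invented for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha, iters):
    """Batch gradient descent for logistic regression.
    The update theta -= alpha * (1/m) * X' * (h - y) has the same form
    as in linear regression; only h = sigmoid(X * theta) differs."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= (alpha / m) * (X.T @ (h - y))
    return theta

# Invented 1-D toy data with an intercept column of ones.
X = np.c_[np.ones(4), [0.0, 0.25, 0.75, 1.0]]
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y, alpha=0.5, iters=5000)
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
```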
Method 2: Advanced optimization
- Optimization algorithms:
Gradient descent
Conjugate gradient
BFGS
L-BFGS
- Example:
costFunction(): returns both the cost J(θ) and its gradient
optimset(): 'GradObj', 'on' tells the optimizer that a user-supplied gradient is available; 'MaxIter' sets the maximum number of iterations
fminunc(): unconstrained minimization function
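For comparison, SciPy's `minimize` plays the same role as `fminunc`: pass one function that returns both J(θ) and the gradient, and let a quasi-Newton method (BFGS here) choose the step sizes. A sketch with made-up, non-separable data:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """Return the logistic cost J(theta) and its gradient together."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m
    return J, grad

# Invented toy data: intercept column plus one feature.
X = np.c_[np.ones(4), [0.0, 0.4, 0.5, 1.0]]
y = np.array([0.0, 1.0, 0.0, 1.0])
# jac=True means cost_function returns (cost, gradient),
# analogous to setting 'GradObj' to 'on' for fminunc.
res = minimize(cost_function, np.zeros(2), args=(X, y),
               jac=True, method='BFGS', options={'maxiter': 400})
theta = res.x
```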
Multi-class classification: One-vs-all
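One-vs-all can be sketched as: train one binary logistic classifier per class (class c against everything else), then predict the class whose classifier reports the highest probability. A Python sketch with an invented three-class toy set:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_labels, alpha=0.5, iters=3000):
    """Fit one binary logistic regression per class via gradient descent."""
    m, n = X.shape
    all_theta = np.zeros((num_labels, n))
    for c in range(num_labels):
        yc = (y == c).astype(float)  # current class vs. all the others
        theta = np.zeros(n)
        for _ in range(iters):
            h = sigmoid(X @ theta)
            theta -= (alpha / m) * (X.T @ (h - yc))
        all_theta[c] = theta
    return all_theta

def predict_one_vs_all(all_theta, X):
    """Choose the class whose classifier outputs the highest probability."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)

# Invented 1-D, three-class toy data (intercept column plus one feature).
X = np.c_[np.ones(6), [0.0, 0.1, 0.5, 0.6, 1.0, 1.1]]
y = np.array([0, 0, 1, 1, 2, 2])
all_theta = train_one_vs_all(X, y, 3)
preds = predict_one_vs_all(all_theta, X)
```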
Programming Exercise
Decide whether to admit a student based on their scores on two exams.
Training set: (exam 1 score, exam 2 score, admitted (boolean))
- Visualize the data
data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
plotData(X, y);
function plotData(X, y)
pos = find(y == 1); % indices of examples with y == 1
neg = find(y == 0); % indices of examples with y == 0
plot(X(pos,1),X(pos,2),'k+','LineWidth',2,'MarkerSize',7);
hold on;
plot(X(neg,1),X(neg,2),'ko','MarkerFaceColor','y','MarkerSize',7);
end
- Implement costFunction
[m, n] = size(X); % m: number of examples (rows), n: number of features (columns)
X = [ones(m, 1) X]; % prepend a column of ones for the intercept term
initial_theta = zeros(n + 1, 1);
[cost, grad] = costFunction(initial_theta, X, y);
function g = sigmoid(z)
g = 1 ./ (1 + exp(-z)); % element-wise, so z may be a scalar, vector, or matrix
end
function [J, grad] = costFunction(theta, X, y)
m = length(y);
h = sigmoid(X * theta); % m x 1 vector of predictions (here 100 x 1)
J = (-1/m) * ((y' * log(h)) + (1 - y)' * log(1 - h));
grad = (1/m) * X' * (h - y); % each x_j in the formula is a column of X
end
- fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);
- Plot the decision boundary
function plotDecisionBoundary(theta, X, y)
plotData(X(:,2:3), y); % plot the training set
hold on
if size(X, 2) <= 3 % no more than two features (plus the intercept)
plot_x = [min(X(:,2))-2, max(X(:,2))+2]; % range of the first feature, widened by 2 on each side
plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1)); % on the boundary θ'x = 0, i.e. θ1 + θ2*x2 + θ3*x3 = 0, so x3 = (-1/θ3)*(θ2*x2 + θ1); note plot_y is the y-axis coordinate, not the label y
plot(plot_x, plot_y);
legend('Admitted', 'Not admitted', 'Decision Boundary')
axis([30, 100, 30, 100]) % x- and y-axis limits
else
u = linspace(-1, 1.5, 50); % 1x50; as before, fix the grid of x-axis values
v = linspace(-1, 1.5, 50); % 1x50; grid of y-axis values
z = zeros(length(u), length(v)); % 50x50; z(i,j) holds the classifier output at grid point (u(i), v(j))
for i = 1:length(u)
for j = 1:length(v)
z(i,j) = mapFeature(u(i), v(j))*theta; % when the boundary is clearly non-linear, mapFeature maps the point (u(i), v(j)) to a 1x28 row of polynomial terms (1, x1, x2, x1^2, x1*x2, x2^2, ..., x1*x2^5, x2^6)
end
end
z = z'; % transpose z to match the x/y-axis ordering that contour expects
contour(u, v, z, [0, 0], 'LineWidth', 2) % draw the level-0 contour, i.e. the decision boundary; [0, 0] selects the single level z = 0
end
end
- Prediction and accuracy
prob = sigmoid([1 45 85] * theta);
p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100); % mean of the 0/1 correctness vector gives the accuracy
function p = predict(theta, X)
m = size(X, 1);
p = zeros(m, 1);
p(sigmoid(X * theta) >= 0.5) = 1; % predict y = 1 when the estimated probability is at least 0.5
end