Part 1. Load the data into matrices
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt'); % load the data; the file has just two columns, x and y
X = data(:, 1); y = data(:, 2); % split into matrices
m = length(y); % number of training examples
% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y); % call the plotData() function
fprintf('Program paused. Press enter to continue.\n');
pause;
The plotData(x, y) function:
function plotData(x, y)
figure; % open a new figure window
plot(x, y, 'rx', 'MarkerSize', 10); % plot all the data as red x markers of size 10
ylabel('Profit in $10,000s'); % y-axis label
xlabel('Population of City in 10,000s'); % x-axis label
end
The resulting scatter plot:
Because the features start at x1, we take x0 = 1 by convention, so a column of ones must be prepended to X:
X = [ones(m, 1), data(:,1)]; % add a column of ones
theta = zeros(2, 1); % initialize the θ parameters; here there are only two
iterations = 1500; % number of gradient-descent iterations
alpha = 0.01; % the learning rate
fprintf('\nTesting the cost function ...\n')
Computing the cost function J:
J = computeCost(X, y, theta); % call computeCost(X, y, theta) to compute J
fprintf('With theta = [0 ; 0]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 32.07\n');
The computeCost function:
function J = computeCost(X, y, theta)
m = length(y); % number of training examples
J = 0; % the smaller J is, the better; gradient descent will later find a θ that reduces it
value = 0;
for i = 1:m
    value = value + (X(i,:)*theta - y(i))^2;
end
J = 1/(2*m)*value;
end
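The loop above can also be collapsed into one vectorized expression, J = sum((X*theta - y).^2) / (2*m). As a quick cross-check, here is the same computation in NumPy on a tiny made-up dataset (the numbers are invented purely for illustration, not taken from ex1data1.txt):

```python
import numpy as np

# Tiny made-up dataset, just to illustrate the computation.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # first column is the x0 = 1 intercept term
y = np.array([[1.0], [2.0], [3.0]])
theta = np.zeros((2, 1))
m = len(y)

# Vectorized equivalent of computeCost's loop:
errors = X @ theta - y
J = (errors ** 2).sum() / (2 * m)
print(J)  # (1 + 4 + 9) / 6 ≈ 2.3333
```

With theta = [0; 0] the "errors" are just -y, so J is the mean of the squared targets divided by two.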
Use gradient descent to find a suitable θ:
theta = gradientDescent(X, y, theta, alpha, iterations); % call the function
The gradient descent function:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
m = length(y);
J_history = zeros(num_iters, 1); % record the value of J after every iteration
theta_len = size(theta, 1); % number of parameters; note size(theta) alone returns a vector, which breaks zeros() and the loop below
for iter = 1:num_iters
    theta_temp = zeros(theta_len, 1); % all parameters must be updated simultaneously, so accumulate into a temp and assign back after the sweep
    for j = 1:theta_len
        value = 0.0;
        for i = 1:m
            value = value + X(i,j)*(X(i,:)*theta - y(i));
        end
        theta_temp(j) = theta(j) - alpha / m * value;
    end
    theta = theta_temp;
    J_history(iter) = computeCost(X, y, theta); % record the cost after this iteration
end
end
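The two inner loops implement the update θ_j := θ_j − (α/m) Σ_i (x⁽ⁱ⁾·θ − y⁽ⁱ⁾) x_j⁽ⁱ⁾, which in vector form is theta = theta - alpha/m * X' * (X*theta - y). A NumPy sketch of that vectorized update, run on a small made-up dataset where y equals x exactly (so the true parameters are θ = [0; 1]):

```python
import numpy as np

# Small made-up dataset where y = x exactly, so the best fit is theta = [0, 1].
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([[1.0], [2.0], [3.0]])
theta = np.zeros((2, 1))
alpha, num_iters = 0.1, 2000
m = len(y)

J_history = []
for _ in range(num_iters):
    # All parameters are updated simultaneously, as in the temp-variable loop.
    theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    J_history.append(((X @ theta - y) ** 2).sum() / (2 * m))

print(theta.ravel())  # approaches [0, 1] as the cost shrinks
```

Recording J after every step, as J_history does here, is what lets you plot the convergence curve later.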
After finding the best θ, use the hypothesis to plot the fitted line:
hold on; % draw on top of the existing figure
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression') % label the line
hold off % don't overlay any more plots on this figure
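With θ in hand, a prediction for a new population is just [1, x] * theta. The parameter values below are roughly what gradient descent finds on ex1data1.txt (quoted from memory of the exercise, so treat them as illustrative placeholders rather than exact results):

```python
import numpy as np

# Roughly the parameters gradient descent finds on ex1data1.txt (illustrative).
theta = np.array([[-3.6303], [1.1664]])

# Populations are in units of 10,000 and profits in units of $10,000.
predict1 = (np.array([1.0, 3.5]) @ theta).item()   # population of 35,000
predict2 = (np.array([1.0, 7.0]) @ theta).item()   # population of 70,000
print(predict1 * 10000)  # predicted profit in dollars
print(predict2 * 10000)
```

The prepended 1 plays the same role as the column of ones added to X: it multiplies the intercept θ0.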
Visualizing J(θ):
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));
% fill in the values of J
for i = 1:length(theta0_vals)
for j = 1:length(theta1_vals)
t = [theta0_vals(i); theta1_vals(j)];
J_vals(i,j) = computeCost(X, y, t);
end
end
J_vals = J_vals'; % transpose before calling surf, or the theta axes end up flipped
figure;
surf(theta0_vals, theta1_vals, J_vals) % draw the 3-D surface
xlabel('\theta_0'); ylabel('\theta_1');
Plotting the contour map:
figure;
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
For the multivariate case, add feature normalization to preprocess the loaded data:
function [X_norm, mu, sigma] = featureNormalize(X)
X_norm = X;
mu = mean(X); % column-wise mean, returned as a row vector
sigma = std(X); % column-wise standard deviation, returned as a row vector
for i = 1:size(X_norm, 1) % loop over rows; length() would return the largest dimension, not the row count
    X_norm(i,:) = (X_norm(i,:) - mu) ./ sigma;
end
end
mean(X) computes the mean of each column and returns them in a row vector.
std(X) computes the standard deviation of each column, also returned as a row vector.
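The normalization itself is (x − μ) / σ applied column by column. In NumPy the explicit row loop is unnecessary because broadcasting applies the row vectors mu and sigma to every row; note ddof=1 so that the standard deviation matches MATLAB's std, which divides by n − 1 (the data values below are made up):

```python
import numpy as np

# Made-up two-feature data (e.g. house size and bedroom count).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0],
              [1416.0, 2.0]])

mu = X.mean(axis=0)             # column-wise mean, like MATLAB's mean(X)
sigma = X.std(axis=0, ddof=1)   # sample std; ddof=1 matches MATLAB's std
X_norm = (X - mu) / sigma       # broadcasting replaces the explicit row loop

print(X_norm.mean(axis=0))           # ≈ [0, 0]
print(X_norm.std(axis=0, ddof=1))    # ≈ [1, 1]
```

After normalization every feature has zero mean and unit standard deviation, which keeps gradient descent from zig-zagging when features have very different scales.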
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');
This plots J(θ) against the iteration number, showing how the cost falls as gradient descent runs.
numel returns the number of elements in the matrix J_history.