Dataset: the Iris dataset.
Attributes:
- 1. Sepal Length
- 2. Sepal Width
- 3. Petal Length
- 4. Petal Width
- 5. Species: Setosa, Versicolour, Virginica
The goal of linear regression is to minimize the cost function:
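Matching the `computeCost` implementation below, for $m$ training examples the cost is the (halved) mean squared error; the $1/2$ factor simplifies the gradient:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2, \qquad h_\theta(x) = \theta_0 + \theta_1 x.$$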
```matlab
%% Initialization
clear all;
close all;
clc;

%% ======================== Part 1: Plotting ===========================
fprintf('Plotting Data ...\n');
data = csvread('Iris.txt');     % load the Iris data
x = data(:, 1); y = data(:, 3); % sepal length & petal length
m = length(y);                  % number of training examples

% Plot Data
plotData(x, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ======================== Part 2: Gradient descent ===========================
fprintf('Running Gradient Descent ...\n');
X = [ones(m, 1), data(:, 1)];  % add a column of ones for the intercept term
theta = zeros(2, 1);           % initialize fitting parameters

% Gradient descent settings
iterations = 1500;
alpha = 0.01;

% Compute the initial cost
% computeCost(X, y, theta);

% Run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

fprintf('Theta found by gradient descent: ');
fprintf('%f %f \n', theta(1), theta(2));

% Plot the fitted line over the training data
hold on;
plot(X(:,2), X*theta, 'r-');
legend('Training data', 'Linear regression')
hold off;

% Predict petal length for two new sepal lengths
predict1 = [1, 8] * theta;
fprintf('For a sepal length of 8 cm, the predicted petal length is %f cm\n', ...
    predict1);
predict2 = [1, 5.5] * theta;
fprintf('For a sepal length of 5.5 cm, the predicted petal length is %f cm\n', ...
    predict2);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% ======================== Part 3: Visualizing J(theta_0, theta_1) ===========================
fprintf('Visualizing J(theta_0, theta_1) ...\n')

% Evaluate the cost over a grid of (theta_0, theta_1) values
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i, j) = computeCost(X, y, t);
    end
end

% Transpose J_vals so surf/contour orient the axes correctly
J_vals = J_vals';

% Surface plot of the cost
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1');

% Contour plot, with the minimum found by gradient descent marked
figure;
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20))
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);
```
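The script calls a `plotData` helper that is not listed here; a minimal sketch of what it presumably does (the marker style and axis labels are assumptions, not taken from the original):

```matlab
function plotData(x, y)
% PLOTDATA scatter-plots the training data.
% Assumed implementation: the helper is not shown in the original listing.
figure;                              % open a new figure window
plot(x, y, 'rx', 'MarkerSize', 10);  % plot the data points as red crosses
xlabel('Sepal length (cm)');         % axis labels inferred from context
ylabel('Petal length (cm)');
end
```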
Computing gradient descent:
```matlab
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
% GRADIENTDESCENT updates theta for num_iters steps with learning rate alpha.
m = length(y);                    % number of training examples
J_history = zeros(num_iters, 1);  % cost at each iteration, for diagnostics
for iter = 1:num_iters
    H = X * theta;                % hypothesis for all examples
    T = zeros(size(theta));       % accumulated gradient
    for i = 1:m
        T = T + (H(i) - y(i)) * X(i, :)';
    end
    theta = theta - (alpha / m) * T;  % simultaneous update of all parameters
    J_history(iter) = computeCost(X, y, theta);
end
end
```
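The inner per-example loop is equivalent to a single matrix product, since $\sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\,x^{(i)} = X^\top(X\theta - y)$. A vectorized sketch of the same update (the name `gradientDescentVec` is hypothetical):

```matlab
function [theta, J_history] = gradientDescentVec(X, y, theta, alpha, num_iters)
% Vectorized variant of gradientDescent: identical update, no inner loop.
m = length(y);
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    grad = X' * (X * theta - y) / m;  % gradient of J(theta)
    theta = theta - alpha * grad;
    J_history(iter) = computeCost(X, y, theta);
end
end
```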
```matlab
function J = computeCost(X, y, theta)
% COMPUTECOST returns the linear regression cost J(theta).
m = length(y);                          % number of training examples
J = sum((X * theta - y).^2) / (2 * m);  % 1/(2m) * sum of squared residuals
end
```
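As a sanity check (not part of the original script), the closed-form least-squares solution should land close to the theta found by gradient descent:

```matlab
% Normal-equation solution for comparison; assumes X and y are already
% loaded as in Part 2 of the script above.
theta_exact = (X' * X) \ (X' * y);
fprintf('Normal equation: %f %f\n', theta_exact(1), theta_exact(2));
```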
Results: