Machine Learning: A Small Linear Regression Project

小威威__

Category: machine-learning. Tags: machine learning, linear regression, algorithms, gradient descent, normal equation

Article link: https://blog.csdn.net/linwh8/article/details/77756524

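This article presents a small machine learning project focused on linear regression. It covers single-variable and multivariate linear regression and discusses when gradient descent or the normal equation is the better fit. The implementation is shown through code, including feature scaling, the cost function, and vectorized computation. The author also shares pitfalls encountered in Octave along with their fixes, and points out the limitation that gradient descent may converge to a local optimum.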

All of the code for this small project is on my GitHub. Anyone interested is welcome to discuss it with me.
Location: Machine-Learning/machine-learning-ex1/

1. Introduction

Wikipedia defines linear regression as follows:

In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.

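Here is my own understanding.
From an application standpoint, linear regression is a data-fitting tool: it finds the straight line that best fits a set of data points, so that an output value can be predicted from an input value.

Why "linear"? Because the fitted model is linear; in the single-variable case it is a straight line.
Why "regression"? Because the model predicts a numeric output value from previously observed data.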

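Linear regression comes in two flavors:

Linear Regression with one variable (single-variable linear regression)
Linear Regression with multiple variables (multivariate linear regression)
Note: "feature" is sometimes used in place of "variable".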

Two methods are commonly used to solve a linear regression problem:

Gradient Descent
Normal Equation

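A quick word on when to use gradient descent and when to use the normal equation.

First, when the number of features n is small (roughly below 10,000), the normal equation is more efficient than gradient descent; beyond that, use gradient descent.
Second, the normal equation needs no learning rate, so there is nothing to tune, and it needs no iterations.
Finally, gradient descent has time complexity O(kn^2), where k is the number of iterations, while the normal equation has time complexity O(n^3), because it has to invert the n-by-n matrix X'X.

In short, with fewer than about 10,000 features prefer the normal equation; otherwise use gradient descent.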
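
To make the trade-off concrete, here is a minimal Octave sketch that fits the same line with both methods. The toy data, the learning rate, and the iteration count are assumptions chosen for illustration; they are not taken from the project.

```octave
% Toy data generated from y = 1 + 2*x (hypothetical, not the project's dataset)
x = (0:4)';
y = 1 + 2 * x;
m = length(y);
X = [ones(m, 1), x];            % prepend the intercept column

% Normal equation: closed form, no learning rate, no iterations
theta_ne = pinv(X' * X) * X' * y;

% Batch gradient descent: needs a learning rate and an iteration count
alpha = 0.05;                   % assumed learning rate
num_iters = 5000;               % assumed iteration count
theta_gd = zeros(2, 1);
for iter = 1:num_iters
  grad = (X' * (X * theta_gd - y)) / m;   % gradient of the squared-error cost
  theta_gd = theta_gd - alpha * grad;     % simultaneous update of all parameters
end

disp(theta_ne');
disp(theta_gd');
```

Both estimates should come out close to [1; 2]. The normal equation gets there in one step, while gradient descent depends on a workable choice of alpha and enough iterations.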

Multivariate linear regression also involves:

Feature scaling
Mean normalization
Note: "feature normalization" is sometimes mentioned; it refers to mean normalization.
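
As a rough sketch of what these two steps look like together, the Octave snippet below subtracts each column's mean and divides by its standard deviation. The feature matrix is made up for illustration, and dividing by the standard deviation (rather than by the range) is an assumption about the scaling choice.

```octave
% Hypothetical feature matrix: one row per example, one column per feature
X = [2104 3; 1600 3; 2400 4; 1416 2];

mu = mean(X);                          % per-column mean (row vector)
sigma = std(X);                        % per-column standard deviation (row vector)

% bsxfun applies the row vectors to every row of X
X_centered = bsxfun(@minus, X, mu);    % mean normalization
X_norm = bsxfun(@rdivide, X_centered, sigma);  % feature scaling

disp(X_norm);
```

After this step every column has zero mean and unit standard deviation. Keep mu and sigma around, because any new input must be normalized with the same values before a prediction is made.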

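Then there is vectorization:
Many complex computations can be rewritten as matrix or vector operations, which makes them considerably faster and the code much more concise. The project code shows this in practice.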
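
A small illustration of the idea, using a made-up design matrix and parameter vector: the nested loops and the single matrix product below compute the same predictions.

```octave
X = [1 1; 1 2; 1 3];            % hypothetical design matrix with an intercept column
theta = [0.5; 2];               % hypothetical parameters
m = size(X, 1);

% Loop version: accumulate each prediction term by term
h_loop = zeros(m, 1);
for i = 1:m
  for j = 1:length(theta)
    h_loop(i) = h_loop(i) + X(i, j) * theta(j);
  end
end

% Vectorized version: one matrix-vector product
h_vec = X * theta;

disp([h_loop, h_vec]);
```

The two columns printed are identical; the vectorized form is shorter and hands the work to Octave's optimized linear algebra routines.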

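Finally, you need to understand the cost function (also called the loss function). The better the model fits the data, the smaller the cost; its lower bound is 0.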
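
The sketch below illustrates that behavior on made-up data, assuming the common squared-error cost with a 1/(2m) factor. The name computeCostSketch is hypothetical and is not the project's computeCost.m.

```octave
% Squared-error cost with a 1/(2m) factor (assumed form)
computeCostSketch = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

X = [1 1; 1 2; 1 3];            % hypothetical design matrix with an intercept column
y = [1; 2; 3];                  % targets that lie exactly on the line y = x

J_perfect = computeCostSketch(X, y, [0; 1]);   % parameters that fit every point
J_zero = computeCostSketch(X, y, [0; 0]);      % parameters that predict 0 everywhere

printf('Cost with a perfect fit: %f\n', J_perfect);
printf('Cost with theta = [0; 0]: %f\n', J_zero);
```

With a perfect fit the cost reaches its lower bound of 0; the all-zero parameters miss every point, so their cost is strictly larger.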

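Formulas to master:
1. The linear regression hypothesis (scalar and vectorized forms);
2. The cost function;
3. The gradient descent update rule (scalar and vectorized forms);
4. The vectorized closed-form solution for the parameters in the normal equation method.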
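
For reference, the four formulas can be written as follows. The notation is an assumption, since the article does not spell it out: m is the number of training examples, n the number of features, x_0 = 1 the intercept term, X the m-by-(n+1) design matrix, and α the learning rate.

```latex
% 1. Hypothesis (scalar and vectorized forms)
h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^{T} x

% 2. Cost function
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
          = \frac{1}{2m} (X\theta - y)^{T} (X\theta - y)

% 3. Gradient descent update (all \theta_j updated simultaneously)
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
\qquad\Longleftrightarrow\qquad
\theta := \theta - \frac{\alpha}{m} X^{T} (X\theta - y)

% 4. Normal equation
\theta = (X^{T} X)^{-1} X^{T} y
```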

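Note: the project code is written in Octave (its syntax is similar to MATLAB's). The code comments are detailed, so I won't explain the code separately.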

2. Linear Regression with one variable

Main script:

%% Initialization
% clear removes all variables from the workspace
% close all closes all open figure windows
% clc clears the command window
clear ; close all; clc

%% ==================== Part 1: Basic Function ====================
% Complete warmUpExercise.m
fprintf('Running warmUpExercise ... \n');
fprintf('5x5 Identity Matrix: \n');
warmUpExercise()

fprintf('Program paused. Press enter to continue.\n');
pause;


%% ======================= Part 2: Plotting =======================
fprintf('Plotting Data ...\n')
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y); % number of training examples

% Plot Data
% Note: You have to complete the code in plotData.m
plotData(X, y);

fprintf('Program paused. Press enter to continue.\n');
pause;

%% =================== Part 3: Cost and Gradient descent ===================

X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters

% Some gradient descent settings
iterations = 1500;
alpha = 0.01;

fprintf('\nTesting the cost function ...\n')
% compute and display initial cost
J = computeCost(X, y, theta);
fprintf('With theta = [0 ; 0]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 32.07\n');

% further testing of the cost function
J = computeCost(X, y, [-1 ; 2]);
fprintf('\nWith theta = [-1 ; 2]\nCost computed = %f\n', J);
fprintf('Expected cost value (approx) 54.24\n');

fprintf('Program paused. Press enter to continue.\n');
pause;

fprintf('\nRunning Gradient Descent ...\n')
% run gradient descent
theta = gradientDescent(X, y, theta, alpha, iterations);

% print theta to screen
fprintf('Theta found by gradient descent:\n');
fprintf('%f\n', theta);
fprintf('Expected theta values (approx)\n');
fprintf(' -3.6303\n  1.1664\n\n');

% Plot the linear fit
hold on; % keep previous plot visible
plot(X(:,2), X*theta, '-')
legend('Training data', 'Linear regression')
hold off % don't overlay any more plots on this figure

% Predict values for population sizes of 35,000 and 70,000
predict1 = [1, 3.5] *theta;
fprintf('For population = 35,000, we predict a profit of %f\n',...
    predict1*10000);
predict2 = [1, 7] * theta;
fprintf('For population = 70,000, we predict a profit of %f\n<