machine-learning-ex1-answer


I recently started Andrew Ng's Machine Learning course on Coursera. I have now reached the Week 2 material and finished the first programming assignment, so I am writing down my understanding of it here to consolidate what I have learned. I plan to keep writing these posts as a way of keeping myself on track.

Warm up exercise

Description
function A = warmUpExercise()
%WARMUPEXERCISE Example function in octave
%   A = WARMUPEXERCISE() is an example function that returns the 5x5 identity matrix
A = [];
% ============= YOUR CODE HERE ==============
Solution

Simply return a 5x5 identity matrix.

Code
A = eye(5);
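Calling the function without a semicolon is a quick way to confirm the result (this call is purely illustrative):

A = warmUpExercise()   % should display the 5x5 identity matrix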

Computing Cost (for One Variable)

Description
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
Solution

From the cost function formula
$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
translate it directly into the corresponding code.

Code
J = sum((X * theta - y) .^ 2)/(2 * m);
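A quick way to sanity-check the implementation is to evaluate it on a tiny, hand-computable data set; the values below are made up for illustration and are not part of the assignment data:

X = [1 1; 1 2; 1 3];        % first column is the intercept term x0 = 1
y = [1; 2; 3];
computeCost(X, y, [0; 0])   % (1 + 4 + 9) / (2*3) = 2.3333
computeCost(X, y, [0; 1])   % the data lie exactly on h(x) = x, so the cost is 0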

Gradient Descent (for One Variable)

Description
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
Solution

The single-variable linear regression model is
$$h_\theta(x)=\theta_0+\theta_1x$$
and the corresponding gradient descent algorithm is:

repeat until convergence {
$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,\theta_1)$$
}

Expanding the partial derivative gives the update actually implemented below,
$$\theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\qquad(x_0^{(i)}=1),$$
which translates directly into the corresponding code.

Code
H = X * theta - y;   % error vector h_theta(x) - y, computed from the old theta
theta(1) = theta(1) - alpha * (1/m) * sum(H .* X(:,1));
theta(2) = theta(2) - alpha * (1/m) * sum(H .* X(:,2));
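The two indexed updates above can also be written as a single vectorized update that works for any number of features, and J_history can be filled with the cost at every iteration so convergence can be inspected later. A minimal sketch of the full loop, assuming computeCost from the previous section:

for iter = 1:num_iters
    % Both components of theta are updated from the same (old) theta,
    % i.e. the simultaneous update described in footnote [2].
    theta = theta - (alpha / m) * X' * (X * theta - y);
    J_history(iter) = computeCost(X, y, theta);   % save the cost of this iteration
end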

Feature Normalization

Description
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
Solution

The function asks us to implement feature scaling so that each feature has mean 0 and standard deviation 1.

To do this, first compute the mean and standard deviation of each column of the original matrix.[1]

The mean can be computed with the mean() function:

mean (X) = SUM_i X(i) / N

The standard deviation can be computed with the std() function:

std (X) = sqrt ( 1/(N-1) SUM_i (X(i) - mean(X))^2 )

With the mean and standard deviation known, the feature-scaled result follows directly.

Code
mu = mean(X);
sigma = std(X);
X_norm = (X - mu) ./ sigma;
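Two practical notes. The expressions X - mu and ./ sigma rely on automatic broadcasting between an m×n matrix and a 1×n row vector, which Octave supports (and MATLAB from R2016b onward); on older MATLAB versions, bsxfun(@minus, X, mu) and bsxfun(@rdivide, ..., sigma) are the portable spellings. Also, any new example must be normalized with the same mu and sigma computed from the training set before being fed to the learned theta. A small sketch, assuming theta has already been learned on the normalized features and using made-up values for a 1650 square-foot, 3-bedroom house:

x_new  = [1650 3];                 % raw features: living area, number of bedrooms
x_norm = (x_new - mu) ./ sigma;    % reuse the training-set mu and sigma
price  = [1 x_norm] * theta;       % prepend the intercept term x0 = 1 before predicting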

Computing Cost (for Multiple Variables)

Description
function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
%   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
% ====================== YOUR CODE HERE ======================
Solution

Same as Computing Cost (for One Variable): from the multivariable cost function
$$J(\theta_0,\theta_1,\ldots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
the code follows directly.

Code
J = sum((X * theta - y) .^ 2)/(2 * m);
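An equivalent, fully vectorized way to write the same cost is as an inner product of the error vector with itself; which form to use is purely a matter of style:

err = X * theta - y;            % error vector over all training examples
J   = (err' * err) / (2 * m);   % sum of squared errors divided by 2m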

Gradient Descent (for Multiple Variables)

Description
function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
%GRADIENTDESCENTMULTI Performs gradient descent to learn theta
%   theta = GRADIENTDESCENTMULTI(x, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
% ====================== YOUR CODE HERE ======================
Solution

With multiple variables, the hypothesis is
$$h_\theta(x)=\theta_0x_0+\theta_1x_1+\ldots+\theta_nx_n\qquad(x_0=1)$$
or, in vector form,
$$h_\theta(x)=\theta^Tx$$
The cost function is
$$J(\theta_0,\theta_1,\ldots,\theta_n)=\frac{1}{2m}\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$$
In multivariable gradient descent we take the partial derivative with respect to every $\theta_j$. The update has the form:

Repeat until convergence {
$$\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)$$
}[2]

Expanding the partial derivative and stacking all parameters into one vector gives
$$\theta:=\theta-\frac{\alpha}{m}X^T(X\theta-y),$$
which is what the code below implements.

Code
H = X * theta;                 % hypothesis for every training example
J = H - y;                     % error vector (h_theta(x) - y)
j = J' * X;                    % row vector of summed errors weighted by each feature
theta = theta - alpha * (1/m) * j';
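Assuming J_history is filled with the cost of each iteration (as in the single-variable case), plotting it is the usual way to confirm that the chosen learning rate actually makes the cost decrease. A small sketch with an illustrative alpha and iteration count:

[theta, J_history] = gradientDescentMulti(X, y, zeros(size(X, 2), 1), 0.01, 400);
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');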

Normal Equations

Description
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
%   NORMALEQN(X,y) computes the closed-form solution to linear
%   regression using the normal equations.
theta = zeros(size(X, 2), 1);
% ====================== YOUR CODE HERE ======================
Solution

From the normal equation
$$\theta=(X^TX)^{-1}X^Ty$$
the answer follows directly.

Code
theta = pinv(X' * X) * X' * y;
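pinv is used rather than inv so the computation still yields a usable result even when $X^TX$ is singular (for example, redundant features or more features than examples). A convenience of the closed-form solution is that no feature scaling or learning rate is needed, so predictions can be made with the raw features directly; a small sketch using the same made-up house as above:

theta = normalEqn(X, y);      % X here holds the raw (un-normalized) features plus x0 = 1
price = [1 1650 3] * theta;   % predicted price for a hypothetical 1650 sq-ft, 3-bedroom house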

  1. When the data represent the entire population, the standard deviation is $\sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N}}$; when the data are a sample, the standard deviation is $\sqrt{\frac{\sum_{i=1}^N(x_i-\overline{x})^2}{N-1}}$.

  2. Within a single iteration, every $\theta_j$ must be updated simultaneously. For example, after updating $\theta_1$ you must not use the new $\theta_1$ to update $\theta_2$ in the same iteration; $\theta_2$ should be updated with the $\theta_1$ from the previous iteration.
