1. The normal equations
Gradient descent gives one way of minimizing J. Let's discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In this method, we will minimize J by explicitly taking its derivatives with respect to the θj's and setting them to zero.
2. Matrix derivatives
For a function f : R^{m×n} → R mapping from m-by-n matrices to the real numbers, we define the derivative of f with respect to A to be:

$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix}$$
Thus, the gradient ∇Af(A) is itself an m-by-n matrix, whose (i, j)-element is ∂f/∂Aij .
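To make the definition concrete, here is a minimal NumPy sketch that approximates each entry of ∇Af(A) by a central finite difference; the particular function f and the test matrix are illustrative assumptions, not from the text:

```python
import numpy as np

def f(A):
    # Illustrative scalar-valued function of a matrix: f(A) = A11 + A12^2
    return A[0, 0] + A[0, 1] ** 2

def numerical_gradient(f, A, eps=1e-6):
    # Approximate the (i, j) entry of grad_A f(A), i.e. df/dA_ij,
    # by a central difference in the (i, j) coordinate direction.
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(numerical_gradient(f, A))  # approx [[1., 4.], [0., 0.]]: df/dA11 = 1, df/dA12 = 2*A12
```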
We also introduce the trace operator, written "tr". For an n-by-n (square) matrix A, the trace of A is defined to be the sum of its diagonal entries:

$$\operatorname{tr} A = \sum_{i=1}^{n} A_{ii}$$
If a is a real number (i.e., a 1-by-1 matrix), then tr a = a.
More properties of the trace operator (assuming the matrix dimensions are such that each product is well-defined):
a. tr AB = tr BA;
b. tr ABC = tr CAB = tr BCA;
c. tr ABCD = tr DABC = tr CDAB = tr BCDA;
d. tr A = tr A^T;
e. tr(A + B) = tr A + tr B;
f. tr(aA) = a(tr A).
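These identities are easy to sanity-check numerically; below is a minimal sketch using random matrices, with shapes chosen so that every product is defined (all names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 3))

# a. tr AB = tr BA
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# b. cyclic invariance: tr ABC = tr CAB = tr BCA
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
# d. tr A = tr A^T (square matrix)
assert np.isclose(np.trace(C), np.trace(C.T))
# e., f. linearity of the trace
assert np.isclose(np.trace(C + D), np.trace(C) + np.trace(D))
assert np.isclose(np.trace(2.5 * C), 2.5 * np.trace(C))
print("all trace identities hold")
```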
We now state without proof some facts about matrix derivatives:
a. ∇A tr AB = B^T;
b. ∇A^T f(A) = (∇A f(A))^T, where ∇A^T denotes the gradient taken with respect to A^T;
c. ∇A tr ABA^T C = CAB + C^T AB^T;
d. ∇A |A| = |A|(A^{-1})^T (for a non-singular square matrix A).
Proof of (d): Define A′ to be the matrix whose (i, j) element is (−1)^{i+j} times the determinant of the square matrix resulting from deleting row i and column j from A (this is the cofactor matrix of A). It can be shown that A^{-1} = (A′)^T / |A|.
The determinant of A can be expanded along any row i as |A| = Σj Aij (A′)ij. Since (A′)ij does not depend on Aij (as can be seen from its definition), this implies that (∂/∂Aij)|A| = (A′)ij, and therefore ∇A|A| = A′ = |A|(A^{-1})^T.
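Facts (a) and (d) can likewise be verified with the finite-difference gradient from the earlier sketch (repeated here so the snippet stands alone; the random test matrices are illustrative assumptions):

```python
import numpy as np

def numerical_gradient(f, A, eps=1e-6):
    # Entrywise central-difference approximation of grad_A f(A)
    G = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# a. grad_A tr(AB) = B^T
Ga = numerical_gradient(lambda M: np.trace(M @ B), A)
assert np.allclose(Ga, B.T, atol=1e-4)

# d. grad_A |A| = |A| (A^{-1})^T  (A non-singular)
Gd = numerical_gradient(lambda M: np.linalg.det(M), A)
assert np.allclose(Gd, np.linalg.det(A) * np.linalg.inv(A).T, atol=1e-4)
print("matrix-derivative facts verified")
```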
3. Least squares revisited
Define the design matrix X to be the m-by-(n+1) matrix that contains the training examples' input values in its rows:

$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$
Also, let $\vec{y}$ be the m-dimensional vector containing all the target values from the training set:

$$\vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
Now, since h(x^{(i)}) = (x^{(i)})^T θ, we can easily verify that:

$$X\theta - \vec{y} = \begin{bmatrix} (x^{(1)})^T\theta \\ \vdots \\ (x^{(m)})^T\theta \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix} = \begin{bmatrix} h(x^{(1)}) - y^{(1)} \\ \vdots \\ h(x^{(m)}) - y^{(m)} \end{bmatrix}$$
Hence, using the fact that z^T z = Σi zi^2 for a vector z,

$$\frac{1}{2}(X\theta - \vec{y})^T (X\theta - \vec{y}) = \frac{1}{2}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2 = J(\theta)$$
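A quick numerical check of this identity on synthetic data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, n))])  # design matrix with intercept column
y = rng.standard_normal(m)
theta = rng.standard_normal(n + 1)

J_matrix = 0.5 * (X @ theta - y) @ (X @ theta - y)             # (1/2)(X theta - y)^T (X theta - y)
J_sum = 0.5 * sum((x @ theta - yi) ** 2 for x, yi in zip(X, y))  # (1/2) sum of squared residuals
print(np.isclose(J_matrix, J_sum))  # True
```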
To minimize J, we set its derivatives to zero. Using the matrix-derivative facts above, the gradient works out to ∇θ J(θ) = X^T Xθ − X^T $\vec{y}$; setting it to zero, we obtain the normal equations:

$$X^T X\theta = X^T \vec{y}$$
Thus, the value of θ that minimizes J(θ) is given in closed form by the equation:

$$\theta = (X^T X)^{-1} X^T \vec{y}$$
Here, if X^T X is singular (for example, when the input features are linearly dependent), the inverse should be replaced by the pseudoinverse.
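Putting the pieces together, here is a minimal sketch that solves the normal equations on synthetic data (the data and names are illustrative); np.linalg.pinv gives the pseudoinverse variant mentioned above:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 3
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, n))])  # m-by-(n+1) design matrix
true_theta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_theta + 0.1 * rng.standard_normal(m)              # noisy linear targets

# Closed form: theta = (X^T X)^{-1} X^T y, solved without forming the inverse explicitly
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Pseudoinverse variant, which also covers the case where X^T X is singular
theta_pinv = np.linalg.pinv(X) @ y

print(np.allclose(theta_hat, theta_pinv))  # True
print(theta_hat)                           # close to true_theta
```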