The normal equations

1. Guide

    Gradient descent gives one way of minimizing J. Let's discuss a second way of doing so, this time performing the minimization explicitly and without resorting to an iterative algorithm. In this method, we will minimize J by explicitly taking its derivatives with respect to the θ_j's and setting them to zero.

 

2. Matrix derivatives

    For a function f : R^{m×n} → R mapping from m-by-n matrices to the real numbers, we define the derivative of f with respect to A to be:

        $$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \cdots & \frac{\partial f}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{m1}} & \cdots & \frac{\partial f}{\partial A_{mn}} \end{bmatrix}$$

    Thus, the gradient ∇_A f(A) is itself an m-by-n matrix, whose (i, j) element is ∂f/∂A_{ij}.
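
    To make this definition concrete, here is a minimal sketch that checks an analytic matrix gradient against central finite differences. The function f below is an assumed example chosen for illustration; it is not defined anywhere in these notes.

```python
import numpy as np

# Illustrative scalar function of a 2x2 matrix (an assumed example):
# f(A) = 1.5*A_11 + 5*A_12^2 + A_21*A_22.
def f(A):
    return 1.5 * A[0, 0] + 5 * A[0, 1] ** 2 + A[1, 0] * A[1, 1]

# Analytic gradient: entry (i, j) is df/dA_ij.
def grad_f(A):
    return np.array([[1.5,     10 * A[0, 1]],
                     [A[1, 1], A[1, 0]]])

# Central-difference approximation of each entry of the gradient.
def numeric_grad(f, A, eps=1e-6):
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
assert np.allclose(grad_f(A), numeric_grad(f, A), atol=1e-5)
```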

    We also introduce the trace operator, written "tr". For an n-by-n (square) matrix A, the trace is the sum of its diagonal entries:

        $$\operatorname{tr} A = \sum_{i=1}^{n} A_{ii}$$

    If a is a real number (i.e., a 1-by-1 matrix), then tr a = a.

    Some more properties of the trace operator (a numerical check follows this list):

    a. tr AB = tr BA;

    b. tr ABC = tr CAB = tr BCA;

    c. tr ABCD = tr DABC = tr CDAB = tr BCDA;

    d. tr A = tr A^T;

    e. tr(A + B) = tr A + tr B;

    f. tr(aA) = a tr A.
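
    These identities are easy to check numerically. The sketch below verifies them on random square matrices; using square matrices throughout is a simplifying assumption (property (a), for instance, only requires AB to be square).

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = (rng.normal(size=(3, 3)) for _ in range(4))
a = 2.5
tr = np.trace

assert np.isclose(tr(A @ B), tr(B @ A))                  # (a)
assert np.isclose(tr(A @ B @ C), tr(C @ A @ B))          # (b)
assert np.isclose(tr(A @ B @ C), tr(B @ C @ A))          # (b)
assert np.isclose(tr(A @ B @ C @ D), tr(D @ A @ B @ C))  # (c)
assert np.isclose(tr(A), tr(A.T))                        # (d)
assert np.isclose(tr(A + B), tr(A) + tr(B))              # (e)
assert np.isclose(tr(a * A), a * tr(A))                  # (f)
```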

    We now state some facts about matrix derivatives (fact d is proved below; a numerical sketch follows the proof):

    a. ∇_A tr AB = B^T;

    b. ∇_{A^T} f(A) = (∇_A f(A))^T;

    c. ∇_A tr ABA^T C = CAB + C^T AB^T;

    d. ∇_A |A| = |A|(A^{-1})^T (for a non-singular square matrix A).

  Proof of (d): Define A′ to be the matrix whose (i, j) element is (−1)^{i+j} times the determinant of the square matrix obtained by deleting row i and column j from A (the cofactor matrix). It can then be shown that A^{-1} = (A′)^T / |A|.

    Expanding along row i, the determinant can be written |A| = Σ_j A_{ij}(A′)_{ij}. Since (A′)_{ij} does not depend on A_{ij} (row i and column j are deleted in its definition), this implies (∂/∂A_{ij})|A| = (A′)_{ij}. Hence ∇_A |A| = A′ = |A|(A^{-1})^T.
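
    As with the trace identities, facts (a) and (d) can be checked with finite differences. This is a sketch under assumed random test matrices; numeric_grad is the same hypothetical helper as in the earlier gradient check.

```python
import numpy as np

def numeric_grad(f, A, eps=1e-5):
    # Central differences over every entry of A.
    G = np.zeros_like(A)
    for idx in np.ndindex(*A.shape):
        E = np.zeros_like(A)
        E[idx] = eps
        G[idx] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

rng = np.random.default_rng(1)

# Fact (a): grad_A tr(AB) = B^T, with A 2x3 and B 3x2 so that AB is square.
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 2))
assert np.allclose(numeric_grad(lambda M: np.trace(M @ B), A), B.T, atol=1e-4)

# Fact (d): grad_A |A| = |A| (A^{-1})^T, for a non-singular square A.
S = rng.normal(size=(3, 3))
expected = np.linalg.det(S) * np.linalg.inv(S).T
assert np.allclose(numeric_grad(np.linalg.det, S), expected, atol=1e-4)
```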

 

3. Least squares revisited

    Define the design matrix X to be the m-by-(n + 1) matrix that contains the training examples' input values in its rows (row i is (x^{(i)})^T, including the intercept term x_0 = 1):

        $$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$$

    Also, let ~y be the m-dimensional vector containing all the target values from the training set:

        $$\vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$
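
    A minimal sketch of building X and ~y in NumPy. The training set here is made up for illustration (m = 4 examples, n = 2 features); note the intercept column of ones prepended to the inputs.

```python
import numpy as np

# Made-up inputs (m = 4 examples, n = 2 features) and targets.
inputs = np.array([[2.0, 1.0],
                   [1.5, 3.0],
                   [3.0, 2.5],
                   [0.5, 1.0]])
y = np.array([5.0, 7.5, 9.0, 2.5])          # m-dimensional target vector

# Design matrix: prepend the intercept feature x_0 = 1 to every example,
# so row i of X is (x^(i))^T and X has shape m-by-(n + 1).
X = np.hstack([np.ones((len(inputs), 1)), inputs])   # shape (4, 3)
```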

    Now, since h_θ(x^{(i)}) = (x^{(i)})^T θ, we can easily verify that:

        $$X\theta - \vec{y} = \begin{bmatrix} (x^{(1)})^T \theta \\ \vdots \\ (x^{(m)})^T \theta \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ \vdots \\ y^{(m)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}$$

    Hence, using the fact that z^T z = Σ_i z_i^2 for a vector z,

        $$\frac{1}{2}(X\theta - \vec{y})^T (X\theta - \vec{y}) = \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 = J(\theta)$$
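
    The sketch below confirms that the vectorized expression equals the familiar sum of squared errors, reusing the illustrative X and y from the design-matrix sketch above; theta is an arbitrary test point.

```python
import numpy as np

# Same illustrative X and y as in the design-matrix sketch above.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.5, 3.0],
              [1.0, 3.0, 2.5],
              [1.0, 0.5, 1.0]])
y = np.array([5.0, 7.5, 9.0, 2.5])
theta = np.array([0.5, 1.0, 2.0])            # arbitrary test point

residual = X @ theta - y                     # the vector X*theta - y
J_vec = 0.5 * residual @ residual            # (1/2)(X*theta - y)^T (X*theta - y)
J_sum = 0.5 * sum((x_i @ theta - y_i) ** 2 for x_i, y_i in zip(X, y))
assert np.isclose(J_vec, J_sum)
```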

    To minimize J, we set its gradient with respect to θ to zero. Using the matrix-derivative facts above, one finds ∇_θ J(θ) = X^T Xθ − X^T ~y, and setting this to zero yields the normal equations:

        $$X^T X \theta = X^T \vec{y}$$
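
    The gradient expression itself can be verified numerically. A sketch, again on the illustrative data above:

```python
import numpy as np

# Same illustrative X, y, and theta as above.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.5, 3.0],
              [1.0, 3.0, 2.5],
              [1.0, 0.5, 1.0]])
y = np.array([5.0, 7.5, 9.0, 2.5])
theta = np.array([0.5, 1.0, 2.0])

def J(t):
    r = X @ t - y
    return 0.5 * r @ r

# Central-difference gradient of J at theta, one coordinate at a time.
eps = 1e-6
num_grad = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                     for e in np.eye(len(theta))])
assert np.allclose(num_grad, X.T @ X @ theta - X.T @ y, atol=1e-5)
```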

    

    Thus, the value of θ that minimizes J(θ) is given in closed form by the equation:

        $$\theta = (X^T X)^{-1} X^T \vec{y}$$

    Here, if X^T X is singular (for example, when some features are linearly dependent), the inverse is replaced by the pseudoinverse.
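
    A sketch of the closed-form solution on the illustrative data, comparing three equivalent routes; in practice a linear solve or least-squares routine is preferred over forming (X^T X)^{-1} explicitly.

```python
import numpy as np

# Same illustrative X and y as above.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.5, 3.0],
              [1.0, 3.0, 2.5],
              [1.0, 0.5, 1.0]])
y = np.array([5.0, 7.5, 9.0, 2.5])

theta_solve = np.linalg.solve(X.T @ X, X.T @ y)      # normal equations
theta_pinv = np.linalg.pinv(X) @ y                   # pseudoinverse route
theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]   # least-squares solver
assert np.allclose(theta_solve, theta_pinv)
assert np.allclose(theta_solve, theta_lstsq)
```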

    

Reposted from: https://www.cnblogs.com/ustccjw/archive/2013/04/12/3016231.html
