Computing Parameters Analytically

Normal Equation

Find the optimum $\theta$ without iteration.

  • Minimize $J$ by explicitly taking its derivatives with respect to the $\theta_j$'s and setting them to zero.
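Here is the derivation written out. It assumes the usual squared-error cost for linear regression, $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(\theta^T x^{(i)} - y^{(i)})^2$, with $X$ and $y$ stacked as in the design matrix section. Setting the gradient to zero gives the formula directly:

$$
\begin{aligned}
J(\theta) &= \frac{1}{2m}(X\theta - y)^T(X\theta - y) \\
\nabla_\theta J(\theta) &= \frac{1}{m}X^T(X\theta - y) = 0 \\
X^TX\,\theta &= X^Ty \\
\theta &= (X^TX)^{-1}X^Ty \qquad \text{(when } X^TX \text{ is invertible)}
\end{aligned}
$$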

Formula:

$\theta = (X^TX)^{-1}X^Ty$

Octave: pinv(X'*X)*X'*y
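The following Python sketch implements the formula with plain lists. It builds $X^TX$ and $X^Ty$, then solves $X^TX\theta = X^Ty$ by Gaussian elimination rather than forming the inverse explicitly. The helper names (`transpose`, `matmul`, `solve`, `normal_equation`) are hypothetical. The sketch assumes $X^TX$ is invertible and raises an error otherwise. Octave's `pinv`, by contrast, still returns a pseudo-inverse result in that case.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply a (p x q) matrix A by a (q x r) matrix B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b] in floating point.
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for k in range(n):
        # Use the row with the largest pivot to limit round-off error.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if abs(M[p][k]) < 1e-12:  # crude absolute threshold for this sketch
            raise ValueError("X^T X is singular (noninvertible)")
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    # Back substitution.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x


def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed by solving X^T X theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Illustrative data from y = 1 + 2 * x1 (first column is x0 = 1).
X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [3, 5, 7, 9]
print(normal_equation(X, y))  # approximately [1.0, 2.0]
```

Solving the linear system gives the same $\theta$ as multiplying by the inverse whenever the inverse exists, and it avoids computing a full inverse.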

Design matrix (X)

$m$ examples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$; $n$ features.

$x^{(i)} = \begin{bmatrix} x_0^{(i)} \\ x_1^{(i)} \\ \vdots \\ x_n^{(i)} \end{bmatrix} \in \mathbb{R}^{n+1}$

$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}$ ($m \times (n+1)$-dimensional)
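This sketch builds $X$ from raw examples. It follows the course convention that each example gets an intercept feature $x_0^{(i)} = 1$, which explains why $x^{(i)} \in \mathbb{R}^{n+1}$. The function name `design_matrix` and the feature values are hypothetical.

```python
def design_matrix(examples):
    """Stack examples as rows of X, prepending the intercept feature x0 = 1.

    examples: list of m feature lists, each of length n.
    Returns an m x (n + 1) matrix as a list of rows.
    """
    return [[1.0] + [float(v) for v in x] for x in examples]


# m = 4 examples with n = 2 features each (illustrative values).
examples = [[2.0, 3.0], [1.0, 5.0], [4.0, 0.5], [3.0, 3.0]]
X = design_matrix(examples)
print(len(X), len(X[0]))  # 4 3, i.e. m x (n + 1)
print(X[0])               # [1.0, 2.0, 3.0]
```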

When using the normal equation, there is no need to do feature scaling.
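To show why scaling is unnecessary, this sketch fits the same hypothetical data twice: once with a feature in raw units and once with that feature divided by 1000. `fit_one_feature` is the normal equation written out by hand for one feature plus intercept, using the 2×2 inverse of $X^TX$. The name and the data are illustrative. Only $\theta_1$ changes, by exactly the scale factor, and the predictions agree up to round-off.

```python
def fit_one_feature(xs, ys):
    """Normal equation for h(x) = theta0 + theta1 * x, written out for the 2x2 case."""
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[m, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = m * sxx - sx * sx
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1


sizes = [1000, 2000, 3000, 4000]  # raw units (illustrative)
prices = [3, 5, 7, 9]
raw = fit_one_feature(sizes, prices)
scaled = fit_one_feature([s / 1000 for s in sizes], prices)
print(raw)     # theta1 is about 0.002
print(scaled)  # theta1 is about 2.0; theta0 is unchanged
```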

Comparison of gradient descent and normal equation:

| Gradient Descent | Normal Equation |
|---|---|
| Need to choose alpha | No need to choose alpha |
| Needs many iterations | No need to iterate |
| $\mathcal{O}(kn^2)$ | $\mathcal{O}(n^3)$, need to calculate inverse of $X^TX$ |
| Works well when n is large | Slow if n is very large |

With the normal equation, computing the inversion has complexity $\mathcal{O}(n^3)$. So if we have a very large number of features, the normal equation will be slow.
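Here is the table's contrast on a toy problem. Batch gradient descent needs a learning rate `alpha` and many iterations to approach the same $\theta$ that the normal equation gives in one step. The data are hypothetical, and `alpha = 0.1` and `iters = 5000` are illustrative rather than tuned values. The 2×2 closed form used here handles only one feature plus intercept.

```python
def normal_equation_2(X, y):
    """Normal equation for a two-column X (intercept + one feature) via the 2x2 inverse."""
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    u = sum(r[0] * yi for r, yi in zip(X, y))
    v = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b  # X^T X = [[a, b], [b, d]]
    return [(d * u - b * v) / det, (a * v - b * u) / det]


def gradient_descent(X, y, alpha, iters):
    """Batch gradient descent on J(theta) = (1/2m) * sum((X theta - y)^2)."""
    m, n_cols = len(X), len(X[0])
    theta = [0.0] * n_cols
    for _ in range(iters):
        errors = [sum(t * xj for t, xj in zip(theta, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(errors, X)) / m for j in range(n_cols)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


X = [[1, 1], [1, 2], [1, 3], [1, 4]]
y = [3, 5, 7, 9]
print(normal_equation_2(X, y))            # one step, no alpha
print(gradient_descent(X, y, 0.1, 5000))  # many iterations, close to the same values
```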

Normal Equation Noninvertibility

If $X^TX$ is noninvertible, the common causes are:

  • Redundant features, where two features are very closely related (i.e. they are linearly dependent)
  • Too many features (e.g. m ≤ n). In this case, delete some features or use “regularization” (to be explained in a later lesson).
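Both causes can be checked on small examples. When one feature is a multiple of another, or when there are no more examples than features, $\det(X^TX) = 0$. This sketch uses exact integer arithmetic so round-off cannot blur the zero. The matrices and helper names (`gram`, `det3`) are hypothetical.

```python
def gram(X):
    """Return X^T X for X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Redundant features: x2 = 2 * x1 for every example.
X_redundant = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]]
# Too many features: m = 2 examples, n = 2 features (m <= n).
X_few = [[1, 1, 5], [1, 2, 3]]
# Independent features, for contrast.
X_ok = [[1, 1, 2], [1, 2, 1], [1, 3, 5], [1, 4, 3]]

print(det3(gram(X_redundant)))  # 0
print(det3(gram(X_few)))        # 0
print(det3(gram(X_ok)))         # 126 (nonzero)
```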

Solutions

  • Delete a feature that is linearly dependent on another (redundant features).
  • Delete one or more features, or use regularization, when there are too many features (e.g. $m \leq n$).
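This applies the first fix to a redundant example. Dropping the duplicated column leaves an $X^TX$ with a nonzero determinant, so the normal equation then has a unique solution. This is a minimal sketch, and the helper names and data are hypothetical. At 2×2 size, using the explicit inverse is harmless.

```python
def gram(X):
    """Return X^T X for X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def drop_column(X, j):
    """Return a copy of X without column j."""
    return [row[:j] + row[j + 1:] for row in X]


# x2 = 2 * x1, so the x2 column is redundant.
X = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]]
y = [3, 5, 7, 9]

X_reduced = drop_column(X, 2)     # keep x0 = 1 and x1
(a, b), (_, d) = gram(X_reduced)  # X^T X = [[a, b], [b, d]]
det = a * d - b * b
print(det)                        # 20, nonzero, so X^T X is invertible

# X^T y, then theta = (X^T X)^{-1} X^T y using the 2x2 inverse.
u = sum(r[0] * yi for r, yi in zip(X_reduced, y))
v = sum(r[1] * yi for r, yi in zip(X_reduced, y))
theta = [(d * u - b * v) / det, (a * v - b * u) / det]
print(theta)                      # [1.0, 2.0]
```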