Computing Parameters Analytically

NZOGGY_

Last modified: 2022-01-25 22:50:49

Tags: linear algebra, machine learning

First published: 2022-01-25 22:49:38

Article link: https://blog.csdn.net/NZOGGY_/article/details/122693412

Normal Equation

Find the optimum $\theta$ without iteration

Minimize $J$ by explicitly taking its derivatives with respect to each $\theta_j$ and setting them to zero.

Formula:

$\theta={(X^TX)}^{-1}X^Ty$
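
For reference, the formula above can be derived in a few lines. This sketch assumes the usual least-squares cost for linear regression; the notes do not restate $J$, so its exact form (including the $\frac{1}{2m}$ factor) is an assumption here.

$J(\theta)=\frac{1}{2m}(X\theta-y)^T(X\theta-y)$

$\nabla_\theta J(\theta)=\frac{1}{m}X^T(X\theta-y)=0$

$X^TX\theta=X^Ty$

$\theta={(X^TX)}^{-1}X^Ty$

The last step requires $X^TX$ to be invertible. The constant factor in $J$ drops out when the gradient is set to zero, so it does not affect the result.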

Octave: `pinv(X'*X)*X'*y`
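
The same one-liner can be sketched in Python with NumPy. This is a minimal illustration, not part of the original notes: the function name `normal_equation` and the toy data (generated from $y = 1 + 2x$) are made up for the example.

```python
import numpy as np

def normal_equation(X, y):
    """Return theta = pinv(X^T X) X^T y for design matrix X and targets y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # pinv mirrors the Octave call and also copes with a singular X^T X.
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical toy data: first column is the intercept term x_0 = 1,
# second column is a single feature; targets follow y = 1 + 2 * x exactly.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta = normal_equation(X, y)
print(theta)  # approximately [1. 2.]
```

Using `pinv` rather than a plain inverse follows the Octave snippet: it still returns a result when $X^TX$ is singular.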

Design matrix (X)

$m$ examples $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$; $n$ features
$x^{(i)}=\begin{bmatrix} x_0^{(i)}\\ x_1^{(i)}\\ \vdots\\ x_n^{(i)} \end{bmatrix}\in\mathbb{R}^{n+1}$

$X=\begin{bmatrix} -(x^{(1)})^T-\\ -(x^{(2)})^T-\\ \vdots\\ -(x^{(m)})^T- \end{bmatrix}\in\mathbb{R}^{m\times(n+1)}$
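
As a rough sketch of how such a matrix might be assembled in NumPy: each row is one example, and a leading column of ones is prepended. The ones column assumes the usual convention $x_0^{(i)}=1$, which these notes do not state explicitly. The helper name `design_matrix` and the numbers are hypothetical.

```python
import numpy as np

def design_matrix(features):
    """Prepend the intercept column x_0 = 1 to an (m, n) feature array."""
    features = np.asarray(features, dtype=float)
    if features.ndim == 1:
        # A single feature given as a flat array becomes an (m, 1) column.
        features = features.reshape(-1, 1)
    ones = np.ones((features.shape[0], 1))
    return np.hstack([ones, features])

# Hypothetical data: m = 3 examples, n = 2 features.
raw = np.array([[2104.0, 3.0],
                [1600.0, 3.0],
                [2400.0, 4.0]])

X = design_matrix(raw)
print(X.shape)  # (3, 3), i.e. m x (n + 1)
```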

With the normal equation, there is no need to do feature scaling.
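
One way to see this is to fit the same data with and without scaling and compare the fitted values. The sketch below uses made-up numbers and hypothetical helper names; it only illustrates that, with an intercept column present, rescaling a feature changes $\theta$ but not the predictions.

```python
import numpy as np

def normal_equation(X, y):
    return np.linalg.pinv(X.T @ X) @ X.T @ y

def add_intercept(features):
    # Stack a column of ones (x_0 = 1) in front of the feature column(s).
    return np.column_stack([np.ones(len(features)), features])

# Hypothetical data: one feature on a fairly large scale.
x = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
y = np.array([12.0, 19.0, 33.0, 41.0, 48.0])

# Fit on the raw feature.
X_raw = add_intercept(x)
pred_raw = X_raw @ normal_equation(X_raw, y)

# Fit on the mean-normalized, standardized feature.
x_scaled = (x - x.mean()) / x.std()
X_scaled = add_intercept(x_scaled)
pred_scaled = X_scaled @ normal_equation(X_scaled, y)

print(np.allclose(pred_raw, pred_scaled))  # True: same fitted values
```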

Comparison of gradient descent and normal equation:

$\begin{array}{|l|l|} \hline \text{Gradient Descent} & \text{Normal Equation}\\ \hline \text{Need to choose } \alpha & \text{No need to choose } \alpha\\ \hline \text{Needs many iterations} & \text{No need to iterate}\\ \hline \mathcal{O}(kn^2) & \mathcal{O}(n^3)\text{, need to calculate the inverse of } X^TX\\ \hline \text{Works well when } n \text{ is large} & \text{Slow if } n \text{ is very large}\\ \hline \end{array}$

With the normal equation, computing the inverse has complexity $\mathcal{O}(n^3)$. So if we have a very large number of features, the normal equation will be slow.
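
To make the comparison concrete, here is a small sketch that runs both methods on the same toy data. It assumes the batch update $\theta := \theta - \frac{\alpha}{m}X^T(X\theta-y)$ for gradient descent; the values `alpha=0.1` and `num_iters=2000` are hand-picked for this data set and are not recommendations.

```python
import numpy as np

def gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for least-squares linear regression."""
    m, num_params = X.shape
    theta = np.zeros(num_params)
    for _ in range(num_iters):
        # Gradient of the (assumed) cost J = 1/(2m) * ||X theta - y||^2.
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
    return theta

def normal_equation(X, y):
    """Closed-form solution: no learning rate, no iterations."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Hypothetical toy data following y = 1 + 2 * x.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

theta_gd = gradient_descent(X, y, alpha=0.1, num_iters=2000)
theta_ne = normal_equation(X, y)

print(np.allclose(theta_gd, theta_ne, atol=1e-6))  # True
```

Gradient descent needs a learning rate and many passes over the data, while the normal equation pays once for forming and inverting the $(n+1)\times(n+1)$ matrix $X^TX$.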

Normal Equation Noninvertibility

If $X^TX$ is noninvertible, common causes include:

Redundant features, where two features are very closely related (i.e. they are linearly dependent).
Too many features (e.g. $m\leq n$). In this case, delete some features or use "regularization" (to be explained in a later lesson).
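
Both causes can be checked numerically by looking at the rank of $X^TX$. The matrices below are invented for illustration; `np.linalg.matrix_rank` is just one convenient way to spot the problem.

```python
import numpy as np

# Redundant feature: the third column is exactly 2x the second column.
X_redundant = np.array([[1.0, 1.0, 2.0],
                        [1.0, 2.0, 4.0],
                        [1.0, 3.0, 6.0],
                        [1.0, 4.0, 8.0]])
gram_redundant = X_redundant.T @ X_redundant
rank_redundant = np.linalg.matrix_rank(gram_redundant)
print(rank_redundant, "of", gram_redundant.shape[0])  # rank below 3: singular

# Too many features: m = 2 examples but n = 3 features (plus intercept).
X_wide = np.array([[1.0, 2.0, 5.0, 1.0],
                   [1.0, 3.0, 1.0, 4.0]])
gram_wide = X_wide.T @ X_wide
rank_wide = np.linalg.matrix_rank(gram_wide)
print(rank_wide, "of", gram_wide.shape[0])  # rank at most m = 2: singular
```

In both cases the rank is smaller than $n+1$, so $X^TX$ has no ordinary inverse.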

Solutions

Delete a feature that is linearly dependent on another (redundant features).
Delete one or more features, or use regularization, when there are too many features (e.g. $m\leq n$).
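
A minimal sketch of both remedies on a made-up data set with a duplicated feature. The regularized form used here, $(X^TX+\lambda L)^{-1}X^Ty$ with $L$ the identity matrix except for a zero in the top-left entry, is an assumption taken from the standard treatment of regularized linear regression, since these notes defer regularization to a later lesson. The value `lam = 0.1` is arbitrary.

```python
import numpy as np

# Hypothetical data: column 2 is exactly 2x column 1, so X^T X is singular.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0],
              [1.0, 4.0, 8.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # follows y = 1 + 2 * x_1

# Remedy 1: delete the redundant column, then solve the normal equation.
X_reduced = np.delete(X, 2, axis=1)
theta_reduced = np.linalg.solve(X_reduced.T @ X_reduced, X_reduced.T @ y)
print(theta_reduced)  # approximately [1. 2.]

# Remedy 2 (assumed form): add lam * L to X^T X, leaving the intercept
# term unpenalized, which makes the matrix invertible.
lam = 0.1
L = np.eye(X.shape[1])
L[0, 0] = 0.0
gram_reg = X.T @ X + lam * L
rank_reg = np.linalg.matrix_rank(gram_reg)
theta_reg = np.linalg.solve(gram_reg, X.T @ y)
print(rank_reg)  # 3: full rank, so the system has a unique solution
```

Deleting the column recovers the exact coefficients here because the data is noise-free; the regularized solution instead spreads weight across the two dependent columns.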