Class 2: Gradient Descent
For $n\times n$ matrices $A$ and $B$:
$\mathrm{tr}(AB)=\mathrm{tr}(BA)$
$\mathrm{tr}(ABC)=\mathrm{tr}(CAB)=\mathrm{tr}(BCA)$
$\mathrm{tr}(A)=\mathrm{tr}(A^T)$
$\mathrm{tr}(\cdot)$ denotes the trace of a matrix, equal to the sum of its diagonal elements.
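As a quick sanity check (not part of the original notes), the trace identities above can be verified numerically with NumPy on random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))

# tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
# tr(ABC) = tr(CAB) = tr(BCA): cyclic permutations leave the trace unchanged
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
# tr(A) = tr(A^T): transposing does not change the diagonal
assert np.isclose(np.trace(A), np.trace(A.T))
```

Note that only cyclic permutations are allowed: in general $\mathrm{tr}(ABC)\neq\mathrm{tr}(ACB)$.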
For $A\in\mathbb{R}^{m\times n}$ and $f(A)\in\mathbb{R}$:
$\nabla_A f(A)=\left[\frac{\partial f(A)}{\partial A_{ij}}\right]_{m\times n}$
$\nabla_A\,\mathrm{tr}(ABA^TC)=CAB+C^TAB^T$
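The gradient identity $\nabla_A\,\mathrm{tr}(ABA^TC)=CAB+C^TAB^T$ can be checked numerically by comparing the closed form against a central finite-difference approximation (a sketch, with arbitrary random dimensions):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((m, m))

def f(A):
    # f(A) = tr(A B A^T C), a scalar function of the matrix A
    return np.trace(A @ B @ A.T @ C)

# closed-form gradient from the identity: CAB + C^T A B^T
grad = C @ A @ B + C.T @ A @ B.T

# central finite-difference estimate of each entry of the gradient
eps = 1e-6
num = np.zeros_like(A)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps
        num[i, j] = (f(A + E) - f(A - E)) / (2 * eps)

assert np.allclose(grad, num, atol=1e-5)
```

This kind of finite-difference check is a standard way to validate any hand-derived matrix gradient before using it in an optimizer.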
Least-squares closed-form solution
We use $X\theta$ to predict $y$, where
$X=\begin{bmatrix}1 & x_{11} & x_{12} & \cdots & x_{1n}\\ 1 & x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & x_{m1} & x_{m2} & \cdots & x_{mn}\end{bmatrix}$
Here $m$ is the number of observations, $n$ is the number of features, and the leading column of ones accounts for the intercept term $\theta_0$.
$\theta=[\theta_0,\theta_1,\theta_2,\dots,\theta_n]^T$ is the parameter vector.
To obtain the least-squares solution, set the gradient of the squared-error cost to zero, which yields the normal equation:
$X^TX\theta=X^Ty$
$\theta=(X^TX)^{-1}X^Ty$
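A minimal NumPy sketch of the normal equation on synthetic data (the data-generating values here are arbitrary assumptions for illustration). In practice one solves the linear system rather than forming the explicit inverse, since `np.linalg.solve` is cheaper and numerically safer than `np.linalg.inv`:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 50, 3
# design matrix X with a leading column of ones for the intercept
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, n))])
true_theta = np.array([2.0, -1.0, 0.5, 3.0])   # assumed ground-truth parameters
y = X @ true_theta + 0.01 * rng.standard_normal(m)  # targets with small noise

# normal equation: solve X^T X theta = X^T y for theta
theta = np.linalg.solve(X.T @ X, X.T @ y)

# cross-check against NumPy's built-in least-squares solver
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(theta, theta_lstsq)
```

With little noise, `theta` recovers `true_theta` closely; for ill-conditioned or rank-deficient $X^TX$, prefer `np.linalg.lstsq`, which handles those cases via an SVD-based method.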