Machine learning review week03

Introduction to Linear Algebra is finally almost finished; only the QR algorithm and PCA remain....


faster matrix multiplication?


matrix properties:

  • matrix multiplication is used in
  1. multiple linear regression:

    \(y_n = f(\mathbf{x_n}):= \mathbf{\widetilde{X}_n^T} \mathbf{\beta}\)

    $\mathbf{\beta} = \begin{bmatrix}\beta_0 \\ \beta_1 \\ \vdots \\ \beta_D \end{bmatrix}$

    $\mathbf{\widetilde{X}_n^T} = \begin{bmatrix} 1 & \mathbf{x_n^T}\end{bmatrix}$

  2. cost function -- gradient descent and least squares
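
A minimal NumPy sketch of these two uses (the synthetic data, noise level, and variable names are illustrative, not from the original notes):

```python
import numpy as np

# synthetic data: N samples with D features (values are illustrative)
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
true_beta = np.array([2.0, -1.0, 0.5, 3.0])          # [beta_0, beta_1, ..., beta_D]

# augmented design matrix: each row is x_tilde_n^T = [1, x_n^T]
X_tilde = np.hstack([np.ones((N, 1)), X])
y = X_tilde @ true_beta + 0.1 * rng.normal(size=N)   # y_n = x_tilde_n^T beta + noise

# least-squares solution and the corresponding cost
beta_hat, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
cost = np.sum((X_tilde @ beta_hat - y) ** 2) / (2 * N)
print(beta_hat)   # close to true_beta
print(cost)       # small residual cost
```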

rank deficiency, ill-conditioning, non-invertibility, singularity, condition number of a matrix

  • singular matrix:
    not invertible
    = have dependent rows
    = \(rank(\mathbf{A})<\) # of rows
    = not full rank
    = cols are linearly dependent
    = some rows of the matrix do not have pivots; this can happen when there are more rows than columns, or when some of the rows are duplicated
    ? are the eigenvalues of an invertible matrix all nonzero?
    why do we need the matrix to be invertible?
    how to make a singular matrix non-singular?
    fact: many matrices arising in practice are singular or nearly singular; adding a small amount of noise can make them non-singular -- this is called regularization (a minimal sketch follows below)
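
A minimal NumPy sketch of this regularization idea (the matrix and the value of lambda are illustrative):

```python
import numpy as np

# a singular (rank-deficient) matrix: the second row duplicates the first
A = np.array([[1.0, 2.0],
              [1.0, 2.0]])
print(np.linalg.matrix_rank(A))        # 1 -> singular, not invertible

# regularize: add a small multiple of the identity (as in ridge regression)
lam = 1e-3
A_reg = A + lam * np.eye(2)
print(np.linalg.matrix_rank(A_reg))    # 2 -> invertible
print(np.linalg.cond(A_reg))           # large but finite condition number
```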

condition number
  • the condition number measures how hard it is to invert a matrix accurately (how close it is to being singular)
    • condition number of \(\mathbf{A}\): \(\kappa(\mathbf{A}) = \parallel \mathbf{A^{-1}} \parallel \parallel \mathbf{A} \parallel = \frac{\sigma_{max} (\mathbf{A})}{\sigma_{min} (\mathbf{A})}\) (link)
    • matrix norms: a way of measuring the size of a matrix; for an ill-conditioned matrix, a small change to the matrix can change its inverse by a large amount
    • large condition number of \(\mathbf{A}\)
      = ill-conditioned/ill-posed problem
      = inaccurate result
      = hard to invert
    • view the condition number as an error multiplier for the solution of the linear system \(\mathbf{A}x = b\)
  • for an invertible matrix:
    \(\mathbf{x^*}\) is the solution of the exact system \(\mathbf{A}x = b\)
    \(\Delta b\) is the error in the collected data, so the perturbed system is \(\mathbf{A}x = b + \Delta b\)
    \(\mathbf{x} = \mathbf{A^{-1}} (\mathbf{b} + \Delta \mathbf{b}) = \mathbf{A^{-1}} \mathbf{b} + \mathbf{A^{-1}} \Delta \mathbf{b} = \mathbf{x^*} + \mathbf{A^{-1}} \Delta \mathbf{b}\)
    intuitively, if the smallest singular value of \(A\) is small then \(\parallel A^{-1} \parallel\) is large, so the data error \(\Delta b\) is amplified (a small demo follows below)
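
A small NumPy/SciPy demo of this error amplification, using the Hilbert matrix as a standard ill-conditioned example (the perturbation size is illustrative):

```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(8)                         # a classic ill-conditioned (but invertible) matrix
print(np.linalg.cond(A))               # roughly 1e10

x_star = np.ones(8)
b = A @ x_star

# perturb the right-hand side slightly and solve the perturbed system
db = 1e-8 * np.random.default_rng(0).normal(size=8)
x = np.linalg.solve(A, b + db)

print(np.linalg.norm(db) / np.linalg.norm(b))                 # relative error in b: ~1e-8
print(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))    # relative error in x: much larger
```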

norm
  • The 1-norm of a square matrix is the maximum of the absolute column sums
    \(\parallel \mathbf{A} \parallel_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|\)
  • The infinity-norm of a square matrix is the maximum of the absolute row sums.
  • The Euclidean norm (Frobenius norm) of a square matrix is the square root of the sum of the squares of all the elements (a small check follows below).
  • link: norm and condition number of matrix
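
A small NumPy check of these norm definitions (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 1))        # 1-norm: max absolute column sum -> 6
print(np.linalg.norm(A, np.inf))   # infinity-norm: max absolute row sum -> 7
print(np.linalg.norm(A, 'fro'))    # Euclidean/Frobenius norm: sqrt(1 + 4 + 9 + 16)
print(np.linalg.cond(A))           # condition number sigma_max / sigma_min (2-norm)
```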

diagonalizing a matrix \(A_{n \times n}\)

  • by using the eigenvectors; it makes computing \(A^n\) easy (a sketch follows after this list)
  • if the eigenvectors are linearly independent then \(A = S \Lambda S^{-1}\)
  • in this way, A is similar to the diagonal matrix \(\Lambda\) made of its eigenvalues
  • remark:
    • any A without repeated eigenvalues \(\lambda\) is diagonalizable
    • to diagonalize A we must use its eigenvectors
    • invertibility -- whether any eigenvalue is zero
    • diagonalizability -- whether there are enough (n independent) eigenvectors
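
A minimal NumPy sketch of diagonalization and the resulting cheap computation of \(A^n\) (the matrix is an arbitrary example with distinct eigenvalues):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])           # eigenvalues 5 and 2

eigvals, S = np.linalg.eig(A)        # columns of S are eigenvectors
Lam = np.diag(eigvals)

# A = S Lambda S^{-1}
print(np.allclose(A, S @ Lam @ np.linalg.inv(S)))             # True

# A^10 via diagonalization: only the eigenvalues get raised to the power
A10 = S @ np.diag(eigvals ** 10) @ np.linalg.inv(S)
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))        # True
```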

symmetric matrices:

  • spectral theorem: every symmetric matrix A has a complete set of orthogonal eigenvectors, so \(A = S \Lambda S^{-1}\) becomes \(A = Q \Lambda Q^{T}\)
  • A is positive definite iff: all eigenvalues are positive; all upper-left determinants are positive; all pivots are positive; \(x^TAx\) is positive for every \(x \neq 0\) (a check follows below)
  • \(x^TAx = 1\) --> an ellipse (for a positive definite \(2 \times 2\) A)
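
A small NumPy check of the eigenvalue test for positive definiteness (the example matrices are illustrative):

```python
import numpy as np

def is_positive_definite(A):
    """A symmetric A is positive definite iff all of its eigenvalues are positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

print(is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]])))   # True  (eigenvalues 1 and 3)
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))   # False (eigenvalues -1 and 3)
```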

decomposition: eigendecomposition and SVD

singular value
  • the smallest singular value is the distance between a matrix and the set of singular matrices (from link)
  • singular value \(\sigma\) and singular vector \(v,u\) of \(\mathbf{A}\)
  • \(\mathbf{A}v = \sigma u\)
  • \(\mathbf{A^H}u = \sigma v\)
  • singular values: used when the matrix transforms one vector space into a different vector space
eigenvalue:
  • used when the matrix is a transformation from a vector space to itself; the eigenvalues correspond to stability parameters
  • The product of the n eigenvalues of A equals the determinant of A: \(det(A) = \lambda_1 \lambda_2 \cdots \lambda_n\)
  • An n × n matrix A and its transpose A^T have the same eigenvalues.
  • \(\lambda^2\) is an eigenvalue of \(A^2\): if \(Av = \lambda v\) then \(A^2 v = \lambda A v = \lambda^2 v\) (proof here)
  • product of pivots = determinant = product of eigenvalues (a quick check follows below)
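
A quick NumPy check of these eigenvalue facts (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

lam = np.linalg.eigvals(A)
print(np.isclose(np.prod(lam), np.linalg.det(A)))                         # product of eigenvalues = det(A)
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvals(A.T))))         # A and A^T share eigenvalues
print(np.allclose(np.sort(np.linalg.eigvals(A @ A)), np.sort(lam ** 2)))  # eigenvalues of A^2 are lambda^2
```
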
eigenvalue decomposition
  • the degree of the characteristic polynomial of a square matrix = the size of the square matrix
  • \(\mathbf{AX} = \mathbf{X\Lambda}\)
  • notice that: \(\begin{bmatrix} \lambda_1 \mathbf{v_1} & \lambda_2 \mathbf{v_2} & \lambda_3 \mathbf{v_3}\end{bmatrix} = \begin{bmatrix} \mathbf{v_1} &\mathbf{v_2} & \mathbf{v_3} \end{bmatrix} \begin{bmatrix}\lambda_1 & 0 & 0\\0&\lambda_2 & 0\\0&0&\lambda_3\end{bmatrix}\)
  • if the eigenvectors of A are independent then A can be eigendecomposed
    \(\mathbf{A} = \mathbf{X \Lambda X^{-1}}\)
    in this case, A is similar to a diagonal matrix, i.e. A is diagonalizable
  • note that all symmetric matrices have the above property (with an orthogonal eigenvector matrix)
  • only diagonalizable matrices can be eigendecomposed
Singular Value Decomposition
  • optimal low rank approximation of matrix A
  • idea: want \(v_1 \bot v_2\) and also \(Av_1 \bot Av_2\), then
    \(A \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} Av_1 & Av_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0&\sigma_2 \end{bmatrix}\)
    so that we have:

  • \(\Sigma\) is a diagonal matrix with the singular values as entries,
  • U and V: orthogonal matrix
  • cols of U: left singular vectors (gene coefficient vectors)
    \(AV = U\Sigma\)
    \(A^H U = V \Sigma^H\)
    hence
    \(A = U \Sigma V^H\)

  • to calculate \(\sigma_{1,2}\):
  • \(A = U \Sigma V^{T}\)
  • \(A^T = V\Sigma^T U^T\)
  • $A A^T = U\Sigma V^{T} V\Sigma^T U^T $
  • $A A^T = U \Sigma \Sigma^T U^T $
  • $A A^T = U \begin{bmatrix} \sigma_{1}^2 & 0 \\ 0&\sigma_{2}^2 \end{bmatrix} U^T $
  • NOTICE: \(\Sigma \Sigma^T\) is a diagonal matrix whose entries are the eigenvalues of \(AA^T\), and the columns of U are the eigenvectors of \(AA^T\)

  • ? the singular vectors can be chosen to be perpendicular to each other, so U and V are orthogonal and the matrix can be singular-value decomposed
  • $AA^{T}$ is always invertible?

  • economy (thin) SVD decomposition (a sketch follows below)
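
A NumPy sketch of the SVD: reconstructing A, checking \(\sigma_i^2\) against the eigenvalues of \(AA^T\), and showing the economy (thin) form (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])

# full SVD: A = U Sigma V^T
U, s, Vt = np.linalg.svd(A)                   # s holds the singular values
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))         # True

# sigma_i^2 are the eigenvalues of A A^T
print(np.allclose(np.sort(s ** 2), np.sort(np.linalg.eigvalsh(A @ A.T))))   # True

# economy (thin) SVD: only the first r columns of U and rows of V^T are kept
U_e, s_e, Vt_e = np.linalg.svd(A, full_matrices=False)
print(U_e.shape, s_e.shape, Vt_e.shape)       # (2, 2) (2,) (2, 3)
```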

remark
  • singular value = length of a semi-axis of the ellipse that is the image of the unit circle (to be proved)
  • if A is symmetric, the eigenvectors of a \(2 \times 2\) A lie along the (long and short) axes of the ellipse, see eigshow in MATLAB; the eigenvalue magnitudes equal the semi-axis lengths, \(|\lambda| = \sigma\)
  • eigenvalue decomposition does not apply to every matrix (a defective matrix has a non-invertible eigenvector matrix) -- the nonexistence difficulty; and even if the matrix can be eigendecomposed, it might not provide a basis for robust computation (the eigenvector matrix may be invertible but ill-conditioned) -- the robustness difficulty
    why are we looking for robust computation?
    in numerical calculation, how to calculate the inverse of a matrix if it is not invertible? how to approximate it?
    why do we want the eigendecomposition and try to overcome these difficulties? can the SVD replace it? does the SVD apply to all matrices?
  • defective matrix: a matrix with at least one repeated eigenvalue that does not have a full set of linearly independent eigenvectors; an example is a matrix for which zero is an eigenvalue of multiplicity five but which has only one eigenvector (a small demo follows after this list)
  • Jordan canonical form (JCF) decomposition uses generalized eigenvectors to fill in for the eigenvectors that a defective matrix is missing. if A is non-defective, the JCF equals the eigenvalue decomposition; otherwise \(A = XJX^{-1}\) where J is a triangular (Jordan) matrix
  • is the SVD decomposition accurate?
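
A small NumPy illustration of a defective matrix, using a Jordan block as the standard example (chosen here for illustration):

```python
import numpy as np

# a 3x3 Jordan block with eigenvalue 5: the eigenvalue is repeated three times,
# but there is only one linearly independent eigenvector
J = np.array([[5.0, 1.0, 0.0],
              [0.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])

eigvals, X = np.linalg.eig(J)
print(eigvals)               # [5. 5. 5.]
print(np.abs(X))             # every column is (numerically) the same vector e1
print(np.linalg.cond(X))     # enormous: the eigenvector matrix is numerically singular
```
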
sensitivity of eigenvalues
  • The sensitivity of the eigenvalues \(\Lambda\) is estimated by the condition number
    of the matrix of eigenvectors. from link:
    assume A has an eigenvalue decomposition; then
    \(\parallel \delta \Lambda \parallel \le \mathbf{ \parallel X^{-1} \parallel \parallel X \parallel \parallel \delta A \parallel} = \kappa (X) \parallel \delta A \parallel\)
    which gives a rough idea that the sensitivity of \(\lambda\) is related to the condition number of the eigenvector matrix
  • The condition number of the eigenvector matrix \(\kappa (X)\) is an upper bound for the individual
    eigenvalue condition numbers \(\kappa (\lambda,A)\).
  • The eigenvalues of symmetric and Hermitian matrices are perfectly well conditioned \(\kappa (\lambda,A) = 1\)
  • sensitivity of the eigenvalues leads to inaccuracy in the computed value of \(\lambda\), caused by round-off error (a small demo follows below)
    how to decide whether a matrix is invertible or not?
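
A small NumPy demo of this sensitivity: a non-normal matrix with an ill-conditioned eigenvector matrix, whose eigenvalues move far more than the perturbation of A (the matrix and perturbation size are illustrative):

```python
import numpy as np

A = np.array([[1.0, 1e4],
              [0.0, 1.001]])           # eigenvalues 1 and 1.001

lam, X = np.linalg.eig(A)
print(np.linalg.cond(X))               # large kappa(X): the eigenvalues are sensitive

# perturb A by roughly 1e-8 and look at how far the eigenvalues move
dA = 1e-8 * np.random.default_rng(0).normal(size=(2, 2))
lam_p = np.linalg.eigvals(A + dA)
print(np.abs(np.sort(lam_p) - np.sort(lam)))   # far larger than the 1e-8 perturbation
```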

Jordan form

  • for matrices that are not diagonalizable, we want to find \(M^{-1}AM\) with M chosen so that the result is as nearly diagonal as possible: triangular
  • jordan matrix:
    \(\begin{bmatrix} 5&1&0\\0&5&1\\0&0&5 \end{bmatrix}\) with \(\lambda = 5,5,5\)
  • Jordan theory: \(J^T\) is similar to \(J\), via the 'reverse identity matrix'
  • key fact: the matrix J above is similar to every matrix A with eigenvalues 5, 5, 5 and one line of eigenvectors
  • for \(A_{n \times n}\) with s independent eigenvectors, it is similar to a matrix that has s Jordan blocks, i.e.
    \(A = MJM^{-1} = \begin{bmatrix} v_1&v_2\cdots & v_s \end{bmatrix} \begin{bmatrix} J_1 &\cdots &\cdots \\ \cdots &\ddots&\cdots \\ \cdots & \cdots &J_s \end{bmatrix} M^{-1}\)
  • the idea of the Jordan form: within the big family of similar matrices, pick the matrix closest to diagonal as J; then if A and B share the same Jordan form, A is similar to B (a small symbolic check follows below)
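
A small symbolic check using SymPy's `Matrix.jordan_form` (assuming the convention that it returns P and J with A = P J P^{-1}; the example matrix is illustrative):

```python
import sympy as sp

# a defective 2x2 matrix: eigenvalue 5 has multiplicity 2 but only one eigenvector
A = sp.Matrix([[4, 1],
               [-1, 6]])

P, J = A.jordan_form()       # A = P * J * P**-1
sp.pprint(J)                 # the Jordan block [[5, 1], [0, 5]]
sp.pprint(P * J * P.inv())   # reproduces A
```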

decomposition

discrete fourier transform

  • "inreo to linear alge p335"
  • when we multiply the eigenvector-matrix of the fourier matrix, we split the sinal into pure frequences
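
A minimal NumPy illustration of this idea using the FFT (the test signal, built from two pure frequencies, is illustrative):

```python
import numpy as np

# a signal made of two pure frequencies (3 and 7 cycles over N samples)
N = 64
t = np.arange(N)
x = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.sin(2 * np.pi * 7 * t / N)

# the DFT (a multiplication by the Fourier matrix, computed fast by the FFT)
X = np.fft.fft(x)
peaks = np.argsort(np.abs(X))[-4:]     # the 4 largest coefficients
print(np.sort(peaks))                  # [ 3  7 57 61] -> the two frequencies and their mirrors
```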

similarity transform

  • not changed: eigenvalues, trace, determinant, rank, number of independent eigenvectors, Jordan form
  • changed: eigenvectors, nullspace, column space, row space, left nullspace, singular values (a quick check follows below)
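
A quick NumPy check that eigenvalues, trace, determinant, and rank survive a similarity transform while singular values generally do not (the random matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
M = rng.normal(size=(3, 3))                 # almost surely invertible
B = np.linalg.inv(M) @ A @ M                # B = M^{-1} A M is similar to A

print(np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(B))))   # True
print(np.isclose(np.trace(A), np.trace(B)))                                        # True
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))                              # True
print(np.allclose(np.linalg.svd(A, compute_uv=False),
                  np.linalg.svd(B, compute_uv=False)))                             # generally False
```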

Fourier series: linear algebra for functions

  • for a vector in infinite dimensions -- it can be a vector with infinitely many entries, or a function
  • we have infinite vector \(\mathbf{v} = \begin{bmatrix} v_1 & v_2 & \cdots \end{bmatrix}\) and \(\mathbf{u} = \begin{bmatrix} u_1 & u_2 & \cdots \end{bmatrix}\)
  • DEF: inner product of vectors \(<u,v> = u_1v_1 + u_2v_2 + u_3v_3 + \cdots\)
  • DEF: only vectors with finite length are in our infinite-dimensional "Hilbert space", therefore \(\sum_{n = 1,2,\cdots} v_n^2\) must be a finite number, i.e. the series converges
  • therefore, the inner product \(<\mathbf{v},\mathbf{u}>\) is a finite number even though they are vectors in infinite dimensions
  • DEF: for a function f, define the inner product:
    \(<f(x),g(x)> = \int_{0}^{2\pi} f(x)g(x)\, dx\)
    require: \(\parallel f \parallel ^2 = \int_{0}^{2\pi} f^2(x)\, dx\) (= a finite number)
  • we now have a list of functions \(\begin{bmatrix} 1 & f_1(x) & g_1(x) & f_2(x) & g_2(x) & \cdots \end{bmatrix} = \begin{bmatrix} 1 & cos(x) & sin(x) & cos(2x) & sin(2x) & \cdots \end{bmatrix}\); these vectors (functions) can be viewed as basis vectors, so that any vector (function) can be written as a linear combination of the basis vectors:
    \(f(x) = \begin{bmatrix} 1 & f_1(x) & g_1(x) & f_2(x) & g_2(x) & \cdots \end{bmatrix} \begin{bmatrix} a_0\\a_1\\a_2\\ \vdots \end{bmatrix} = a_0 + a_1 cos(x) + a_2 sin(x) + a_3 cos(2x) + a_4 sin(2x) + \cdots\)
  • we can find that when f(x) = 1, \(\|f\|^2 = 2\pi\), so to make the basis vectors orthonormal (they are already perpendicular/orthogonal to each other), we divide them by their lengths, therefore:
    \(\begin{bmatrix} 1/ \sqrt{2\pi} & f_1(x)/ \sqrt{\pi} & g_1(x)/ \sqrt{\pi} & f_2(x)/ \sqrt{\pi} & g_2(x)/ \sqrt{\pi} & \cdots \end{bmatrix} = \begin{bmatrix} 1/ \sqrt{2\pi} & cos(x)/ \sqrt{\pi} & sin(x)/ \sqrt{\pi} & cos(2x)/ \sqrt{\pi} & sin(2x)/ \sqrt{\pi} & \cdots \end{bmatrix}\)
  • in this way, \(\|f\|^2 = a_{0}^2 + a_{1}^2 + a_{2}^2 + \cdots\) (a numerical check follows below)
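
A numerical check of these coefficients and of the length formula, done with a simple Riemann sum on \([0, 2\pi]\) (the test function f(x) = 1 + 2cos(x) + 3sin(2x) is illustrative):

```python
import numpy as np

N = 200000
x = (np.arange(N) + 0.5) * 2 * np.pi / N          # midpoint grid on [0, 2*pi]
dx = 2 * np.pi / N
f = 1 + 2 * np.cos(x) + 3 * np.sin(2 * x)

def inner(g, h):
    """<g, h> = integral of g*h over [0, 2*pi], approximated by a Riemann sum."""
    return np.sum(g * h) * dx

# coefficients w.r.t. the orthonormal basis 1/sqrt(2*pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi)
c = np.array([inner(f, np.ones_like(x) / np.sqrt(2 * np.pi)),
              inner(f, np.cos(x)      / np.sqrt(np.pi)),
              inner(f, np.sin(2 * x)  / np.sqrt(np.pi))])
print(np.round(c, 4))                            # [sqrt(2*pi), 2*sqrt(pi), 3*sqrt(pi)]
print(np.isclose(inner(f, f), np.sum(c ** 2)))   # ||f||^2 equals the sum of squared coefficients
```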

Reposted from: https://www.cnblogs.com/xhblog/p/4854116.html
