introduction to linear algebra: finally almost done reading it, still left: QR ALGORITHM AND PCA....
faster matrix multiplication?
matrix properties:
- matrix multiplication is used in
multiple linear regression:
\(y_n = f(\mathbf{x_n}):= \mathbf{\widetilde{X}_n^T} \mathbf{\beta}\)
\(\mathbf{\beta} = \begin{bmatrix}\beta_0 \\ \beta_1 \\ \vdots \\ \beta_D \end{bmatrix}\), \(\mathbf{\widetilde{X}_n^T} = \begin{bmatrix} 1 & \mathbf{x_n^T}\end{bmatrix}\)
cost function -- gradient descent and least squares (see the sketch below)
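A minimal sketch of the regression above, assuming NumPy and made-up data: the design matrix \(\mathbf{\widetilde{X}}\) gets a column of ones for the intercept \(\beta_0\), and the least-squares \(\mathbf{\beta}\) comes from `np.linalg.lstsq`.

```python
import numpy as np

# made-up data: N samples, D features (illustration only)
rng = np.random.default_rng(0)
N, D = 50, 3
X = rng.normal(size=(N, D))
beta_true = np.array([2.0, -1.0, 0.5, 3.0])   # [beta_0, beta_1, ..., beta_D]
y = beta_true[0] + X @ beta_true[1:] + 0.1 * rng.normal(size=N)

# design matrix: prepend a column of ones so beta_0 acts as the intercept
X_tilde = np.hstack([np.ones((N, 1)), X])

# least-squares solution of X_tilde @ beta = y
beta_hat, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
print(beta_hat)   # close to beta_true
```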
rank deficiency, ill-conditioned, non-invertible, singular, condition number of a matrix
singular matrix
:
not invertible
= have dependent rows
= \(rank(\mathbf{A})<\) # of rows
= not full rank
= cols are linearly dependent
= some rows of the matrix do not have pivots; this can happen when there are more columns than rows, or when some rows are duplicated
? the eigenvalues of an invertible matrix are all different from zero? why do we need the matrix to be invertible?
how to make a singular matrix non-singular
fact: many matrices are singular or nearly so; adding noise can make them non-singular -- called regularization (see the sketch below)
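A minimal sketch of this trick, assuming the common ridge-style fix of adding \(\lambda \mathbf{I}\) to a Gram matrix; the matrix and \(\lambda\) here are made up:

```python
import numpy as np

X = np.array([[1.0, 2.0, 2.0],
              [3.0, 4.0, 4.0],
              [5.0, 6.0, 6.0]])      # last two columns are duplicates
G = X.T @ X                          # Gram matrix, singular by construction

print(np.linalg.matrix_rank(G))      # 2 < 3: rank deficient, not invertible

lam = 1e-3                           # small made-up regularization strength
G_reg = G + lam * np.eye(3)          # "adding noise" on the diagonal
print(np.linalg.matrix_rank(G_reg))  # 3: full rank, invertible now
print(np.linalg.cond(G_reg))         # large but finite condition number
```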
condition number
measure of how easy it is to invert a matrix - condition number of \(\mathbf{A}\): \(\kappa(\mathbf{A}) = \parallel \mathbf{A^{-1}} \parallel \parallel \mathbf{A} \parallel = \frac{\sigma_{max} (\mathbf{A})}{\sigma_{min} (\mathbf{A})}\)
- matrix norms: a way of sizing up a matrix; the norm bounds how much the matrix can scale up a small change in the input
- large condition number of \(\mathbf{A}\)
= ill-conditioned/ill-posed problem
= inaccurate result
= hard to invert - view the condition number as an error multiplier for the solution of the linear system \(\mathbf{A}x = b\)
for an invertible matrix:
\(\Delta b\) is the error in the collected data
\(\mathbf{x^*} = \mathbf{A^{-1}}\mathbf{b}\) is the solution of the clean system \(\mathbf{A}x = b\); solving with the noisy data
\(\mathbf{A}x = b + \Delta b\) gives
\(\mathbf{x} = \mathbf{A^{-1}} (\mathbf{b} + \Delta \mathbf{b}) = \mathbf{A^{-1}} \mathbf{b} + \mathbf{A^{-1}} \Delta \mathbf{b} = \mathbf{x^*} + \mathbf{A^{-1}} \Delta \mathbf{b}\)
intuitively, if \(\sigma_{min}(\mathbf{A})\) is small then \(\parallel \mathbf{A^{-1}} \parallel\) is large, and the error term \(\mathbf{A^{-1}} \Delta \mathbf{b}\) gets amplified (see the numeric check below)
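A minimal numeric check of the "error multiplier" view, with a made-up nearly singular \(\mathbf{A}\): a tiny perturbation of \(b\) moves the solution by roughly \(\kappa(\mathbf{A})\) times the relative input error.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])        # nearly dependent rows: ill-conditioned
b = np.array([2.0, 2.0001])

print(np.linalg.cond(A))             # roughly 4e4

x_star = np.linalg.solve(A, b)       # solution of the clean system
db = np.array([0.0, 1e-4])           # tiny error in the collected data
x = np.linalg.solve(A, b + db)       # solution with the noisy data

rel_in = np.linalg.norm(db) / np.linalg.norm(b)
rel_out = np.linalg.norm(x - x_star) / np.linalg.norm(x_star)
print(rel_in, rel_out)               # output error >> input error
```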
norm
- The 1-norm of a square matrix is the maximum of the absolute column sums
\(\parallel \mathbf{A} \parallel_1 = \max_{1 \le j \le n} \left( \sum_{i=1}^{n} |a_{ij}| \right)\)
- The infinity-norm of a square matrix is the maximum of the absolute row sums.
- The Euclidean norm of a square matrix is the square root of the sum of all the squares of the elements.
- link: norm and condition number of matrix
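A quick check of the three norms above against NumPy's built-ins, on a small made-up matrix:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

# 1-norm: maximum absolute column sum
print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # 6.0 6.0

# infinity-norm: maximum absolute row sum
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # 7.0 7.0

# Euclidean (Frobenius) norm: sqrt of the sum of all squared entries
print(np.linalg.norm(A, 'fro'), np.sqrt((A**2).sum()))         # 5.47... twice
```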
diagonalizing a matrix \(A_{n \times n}\)
- by using the eigenvectors properly; makes computing \(A^n\) easy
- if the eigenvectors are linearly independent then \(A = S \Lambda S^{-1}\)
- in this way, A is similar to a diagonal matrix made of the eigenvalues
- remark:
- any A without repeated \(\lambda\) is diagonalizable
- to diagonalize A we must use its eigenvectors
- invertibility -- whether any eigenvalue is zero or not
- diagonalizability -- whether the eigenvectors are too few or enough (see the sketch below)
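A minimal sketch of why diagonalization makes \(A^n\) easy, on a made-up matrix with distinct eigenvalues: \(A^n = S \Lambda^n S^{-1}\), and \(\Lambda^n\) is just entrywise powers.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])           # made-up matrix, eigenvalues 5 and 2

lam, S = np.linalg.eig(A)            # columns of S are the eigenvectors

# A^5 via diagonalization: only the eigenvalues get raised to the power
A5 = S @ np.diag(lam**5) @ np.linalg.inv(S)

print(np.allclose(A5, np.linalg.matrix_power(A, 5)))   # True
```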
symmetric matrix:
- spectral theorem: every symmetric matrix A has a complete set of orthogonal eigenvectors, therefore \(A = S \Lambda S^{-1}\) becomes \(A = Q \Lambda Q^{T}\)
- are positive definite if: the eigenvalues are all positive; the upper-left determinants are positive; all pivots are positive; \(x^TAx\) is positive except at \(x = 0\) (see the checks below)
- \(x^TAx = 1\) --> ellipse
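A minimal check of the equivalent tests above on a made-up symmetric matrix: all eigenvalues positive, all upper-left determinants positive, and a Cholesky factorization exists exactly when the matrix is positive definite.

```python
import numpy as np

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])         # made-up symmetric matrix

# test 1: eigenvalues all positive
print(np.all(np.linalg.eigvalsh(A) > 0))                   # True

# test 2: upper-left determinants all positive
print(all(np.linalg.det(A[:k, :k]) > 0 for k in (1, 2)))   # True

# test 3: Cholesky succeeds iff A is positive definite
try:
    np.linalg.cholesky(A)
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")
```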
decompositions: eigendecomposition and SVD
singular value
- the smallest singular value is related to the distance between a matrix and the set of singular matrices, from link
- singular value \(\sigma\) and singular vector \(v,u\) of \(\mathbf{A}\)
- \(\mathbf{A}v = \sigma u\)
- \(\mathbf{A^H}u = \sigma v\)
singular value
: used when the matrix is a transformation from one vector space to a different vector space
eigenvalue:
- used when the matrix is a transformation from a vector space to itself: corresponds to stability parameters
- The product of the n eigenvalues of A is the same as the determinant of A (\(\det(\mathbf{A}) = \lambda_1 \lambda_2 \cdots \lambda_n\))
- n × n matrix A and its transpose A^T have the same eigenvalues.
- \(\lambda^2\) is an eigenvalue of \(A^2\), prove here
- product of pivots = determinant = product of eigenvalues
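A quick numeric check of the facts in this list, on a made-up matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])           # made-up matrix

lam = np.linalg.eigvals(A)

# product of the eigenvalues = determinant
print(np.isclose(np.prod(lam), np.linalg.det(A)))                   # True

# A and A^T have the same eigenvalues
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvals(A.T))))   # True

# lambda^2 is an eigenvalue of A^2
print(np.allclose(np.sort(lam**2), np.sort(np.linalg.eigvals(A @ A))))  # True
```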
eigenvalue decomposition
- degree of the characteristic polynomial of a square matrix = size of the square matrix
- \(\mathbf{AX} = \mathbf{X\Lambda}\)
- notice that: \(\begin{bmatrix} \lambda_1 \mathbf{v_1} & \lambda_2 \mathbf{v_2} & \lambda_3 \mathbf{v_3}\end{bmatrix} = \begin{bmatrix} \mathbf{v_1} &\mathbf{v_2} & \mathbf{v_3} \end{bmatrix} \begin{bmatrix}\lambda_1 & 0 & 0\\0&\lambda_2 & 0\\0&0&\lambda_3\end{bmatrix}\)
- if the eigenvectors of A are independent then A can be eigen-decomposed:
\(\mathbf{A} = \mathbf{X \Lambda X^{-1}}\)
in this case, A is similar to a diagonal matrix, i.e. diagonalizable
- notice that all symmetric matrices have the above property
- only diagonalizable matrices can be eigen-decomposed
Singular Value Decomposition
- optimal low rank approximation of matrix A
idea: want \(v_1 \bot v_2\) and also \(Av_1 \bot Av_2\), then
\(A \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} Av_1 & Av_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0&\sigma_2 \end{bmatrix}\)
so that we have:
- \(\Sigma\) is a diagonal matrix with the singular values as entries
- U and V: orthogonal matrices
cols of U: left singular vectors (gene coefficient vectors)
\(AV = U\Sigma\)
\(A^H U = V \Sigma^H\)
from here:
\(A = U \Sigma V^H\)
- to calculate \(\sigma_{1,2}\):
- \(A = U \Sigma V^{T}\)
- \(A^T = V\Sigma^T U^T\)
- \(A A^T = U\Sigma V^{T} V\Sigma^T U^T\)
- \(A A^T = U \Sigma \Sigma^T U^T\)
- \(A A^T = U \begin{bmatrix} \sigma_{1}^2 & 0 \\ 0&\sigma_{2}^2 \end{bmatrix} U^T\)
NOTICE: \(\Sigma \Sigma^T\) is a diagonal matrix whose entries are the eigenvalues of \(AA^T\), and the columns of U are the corresponding eigenvectors
- ? the singular vectors can be chosen to be perpendicular to each other, so U and V are orthogonal and the matrix can be singular-value decomposed
? is \(AA^{T}\) always invertible? ECONOMY SVD decomposition (see the sketch below)
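A minimal sketch with NumPy on a made-up matrix: `np.linalg.svd` returns \(U\), the singular values, and \(V^T\); the squared singular values match the nonzero eigenvalues of \(AA^T\), and `full_matrices=False` gives the economy SVD.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])                    # made-up 3x2 matrix

U, s, Vt = np.linalg.svd(A)                   # full SVD: U is 3x3
print(np.allclose(A, U[:, :2] @ np.diag(s) @ Vt))          # True

# sigma^2 are the nonzero eigenvalues of A A^T
print(np.sort(s**2), np.sort(np.linalg.eigvalsh(A @ A.T)))

# economy SVD: U keeps only as many columns as singular values
U_e, s_e, Vt_e = np.linalg.svd(A, full_matrices=False)
print(U_e.shape)                              # (3, 2)
```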
remark
- singular value = length of a radius (semi-axis) of the ellipse (to be proved)
- if A is symmetric, the eigenvectors of \(A_{2 \times 2}\) lie on the radii (long and short) of the ellipse, see eigshow in MATLAB; the eigenvalues equal the lengths of the radii, \(\lambda = \sigma\)
- eigenvalue decomposition does not apply to all matrices (some have a non-invertible eigenvector matrix) -- nonexistence difficulty
- and even if the matrix can be eigen-decomposed, it might not provide a basis for robust computation (an invertible but ill-conditioned eigenvector matrix) -- robustness difficulty
why are we looking for robust computation?
? in numerical calculation, how to calculate the inverse of a matrix if it is not invertible? how to approximate it?
? why do we want the eigendecomposition and try to overcome the difficulties? can SVD replace it? is SVD applicable to all matrices?
defective matrix:
: a matrix with at least one repeated eigenvalue that does not have a full set of linearly independent eigenvectors; an example is a matrix where zero is an eigenvalue of multiplicity five that has only one eigenvector

Jordan Canonical Form (JCF) decomposition
: uses generalized eigenvectors to make up the places left by the missing eigenvectors of the defective matrix. if A is non-defective, JCF == eigenvalue decomposition; otherwise \(A = XJX^{-1}\) where J is a triangular matrix (see the numeric illustration below)
- ? is the SVD decomposition accurate?
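A minimal numeric illustration of the nonexistence difficulty, with a made-up Jordan block: `np.linalg.eig` still returns an eigenvector matrix, but it is numerically singular, so \(X \Lambda X^{-1}\) is unusable.

```python
import numpy as np

# a defective matrix: eigenvalue 5 repeated, only one true eigenvector
J = np.array([[5.0, 1.0],
              [0.0, 5.0]])

lam, X = np.linalg.eig(J)
print(lam)                    # [5. 5.]
print(X)                      # the two columns are (numerically) parallel
print(np.linalg.cond(X))      # enormous: X cannot be safely inverted
```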
sensitivity of eigenvalues
- The sensitivity of the eigenvalues \(\Lambda\) is estimated by the condition number
of the matrix of eigenvectors. from link:
assume A has an eigenvalue decomposition; then
\(\parallel \delta \Lambda \parallel \le \parallel \mathbf{X^{-1}} \parallel \parallel \mathbf{X} \parallel \parallel \delta \mathbf{A} \parallel = \kappa (X) \parallel \delta \mathbf{A} \parallel\)
which gives a rough idea that the sensitivity of \(\lambda\) is related to the condition number of the eigenvector matrix
- The condition number of the eigenvector matrix \(\kappa (X)\) is an upper bound for the individual eigenvalue condition numbers \(\kappa (\lambda,A)\).
- The eigenvalues of symmetric and Hermitian matrices are perfectly well conditioned: \(\kappa (\lambda,A) = 1\)
- sensitive eigenvalues lead to inaccuracy in the computed value of \(\lambda\), caused by round-off error (see the comparison below)
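A minimal numeric comparison, using a made-up nearly defective matrix against a symmetric one: the same tiny perturbation \(\delta A\) moves the eigenvalues of the matrix with large \(\kappa(X)\) far more.

```python
import numpy as np

rng = np.random.default_rng(1)
dA = 1e-8 * rng.normal(size=(2, 2))          # tiny made-up perturbation

def eig_shift(A):
    lam = np.sort(np.linalg.eigvals(A))
    lam_pert = np.sort(np.linalg.eigvals(A + dA))
    return np.abs(lam_pert - lam).max()

A_bad = np.array([[1.0, 1e6],
                  [0.0, 1.0 + 1e-6]])        # nearly defective: huge kappa(X)
A_sym = np.array([[1.0, 2.0],
                  [2.0, 1.0]])               # symmetric: kappa(lambda, A) = 1

_, X = np.linalg.eig(A_bad)
print(np.linalg.cond(X))                     # very large
print(eig_shift(A_bad), eig_shift(A_sym))    # A_bad shifts far more
```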
how to decide whether a matrix is invertible or not
Jordan form
- for matrices that are not diagonalizable, we want to find M such that \(M^{-1}AM\) is as nearly diagonal as possible: triangular
- jordan matrix:
\(\begin{bmatrix} 5&1&0\\0&5&1\\0&0&5 \end{bmatrix}\) with \(\lambda = 5,5,5\)
- jordan theory: \(J^T\) is similar to \(J\), via the 'reverse identity matrix'
- key fact: matrix J is similar to every matrix A with eigenvalues 5,5,5 and one line of eigenvectors
- for \(A_{n \times n}\) with s independent eigenvectors, it is similar to a matrix that has s Jordan blocks, i.e.
\(A = MJM^{-1} = \begin{bmatrix} v_1 & v_2 & \cdots & v_s \end{bmatrix} \begin{bmatrix} J_1 & & \\ & \ddots & \\ & & J_s \end{bmatrix} M^{-1}\)
- the idea of the Jordan form: within the big family of similar matrices, pick the matrix closest to diagonal as J; then if A and B share the same Jordan form, A is similar to B (and not otherwise)
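A minimal symbolic sketch with SymPy, assuming its `Matrix.jordan_form` API (it returns P and J with \(A = PJP^{-1}\)):

```python
from sympy import Matrix

# made-up non-diagonalizable matrix: eigenvalue 2 repeated, one eigenvector
A = Matrix([[2, 1],
            [0, 2]])

P, J = A.jordan_form()        # J: Jordan form, P: (generalized) eigenvectors
print(J)                      # Matrix([[2, 1], [0, 2]])
print(A == P * J * P.inv())   # True
```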
decomposition
discrete fourier transform
- "inreo to linear alge p335"
- when we multiply the eigenvector-matrix of the fourier matrix, we split the sinal into pure frequences
similar transform
- not changed: eigenvalues, trace, determinant, rank, number of independent eigenvectors, Jordan form
- changed: eigenvectors, nullspace, column space, row space, left nullspace, singular values (see the check below)
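A quick numeric check of these invariants, with a made-up A and a random invertible M: \(B = M^{-1}AM\) keeps the eigenvalues, trace, and determinant but not the singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])                   # made-up matrix
M = rng.normal(size=(2, 2))                  # almost surely invertible
B = np.linalg.inv(M) @ A @ M                 # similar to A

print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))              # True
print(np.isclose(np.trace(A), np.trace(B)),
      np.isclose(np.linalg.det(A), np.linalg.det(B)))          # True True

# singular values are NOT preserved by similarity
print(np.linalg.svd(A, compute_uv=False),
      np.linalg.svd(B, compute_uv=False))
```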
Fourier series: linear algebra for function
- for a vector in infinite dimensions - it can be a vector with infinitely many entries, or a function
- we have infinite vector \(\mathbf{v} = \begin{bmatrix} v_1 & v_2 & \cdots \end{bmatrix}\) and \(\mathbf{u} = \begin{bmatrix} u_1 & u_2 & \cdots \end{bmatrix}\)
- DEF: inner product of vectors \(\langle \mathbf{u},\mathbf{v}\rangle = u_1v_1 + u_2v_2 + u_3v_3 + \cdots\)
- DEF: only vectors with finite length are in our infinite-dimensional "Hilbert space", therefore \(\sum_{n=1}^{\infty} v_n^2\) must be a finite number, i.e. the series converges
- therefore, \(\langle \mathbf{v},\mathbf{u} \rangle\) is a finite number even though they are vectors in infinite dimensions
- DEF: for function f , define the inner product:
\(\langle f(x),g(x)\rangle = \int_{0}^{2\pi} f(x)g(x)\, dx\)
require: \(\parallel f \parallel ^2 = \int_{0}^{2\pi} f^2(x)\, dx\) (a finite number)
- we now have a list of functions \(\begin{bmatrix} 1 & f_1(x) & g_1(x) & f_2(x) & g_2(x) & \cdots \end{bmatrix} = \begin{bmatrix} 1 & \cos(x) & \sin(x) & \cos(2x) & \sin(2x) & \cdots \end{bmatrix}\); these vectors (functions) can be viewed as basis vectors, so that any vector (function) can be written as a linear combination of the basis vectors:
\(f(x) = \begin{bmatrix} 1 & f_1(x) & g_1(x) & f_2(x) & g_2(x) & \cdots \end{bmatrix} \begin{bmatrix} a_0\\a_1\\a_2\\ \vdots \end{bmatrix} = a_0 + a_1\cos(x) + a_2\sin(x) + a_3 \cos(2x) + a_4\sin(2x) + \cdots\)
- we can find that for f(x) = 1, \(\|f\|^2 = 2\pi\); so to make the basis vectors orthonormal (they are already perpendicular/orthogonal to each other), we divide each by its length, therefore:
\(\begin{bmatrix} 1/\sqrt{2\pi} & f_1(x)/\sqrt{\pi} & g_1(x)/\sqrt{\pi} & f_2(x)/\sqrt{\pi} & g_2(x)/\sqrt{\pi} & \cdots \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2\pi} & \cos(x)/\sqrt{\pi} & \sin(x)/\sqrt{\pi} & \cos(2x)/\sqrt{\pi} & \sin(2x)/\sqrt{\pi} & \cdots \end{bmatrix}\)
- in this way, \(\|f\|^2 = a_{0}^2 + a_{1}^2 + a_{2}^2 + \cdots\)
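A minimal numeric sketch of these inner products, using a made-up square wave on \([0, 2\pi)\): the coefficient of \(\sin(kx)\) is \(\langle f, \sin(kx)\rangle / \pi\), and the computed values match the known \(4/(k\pi)\) pattern for odd k.

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
dx = x[1] - x[0]
f = np.where(x < np.pi, 1.0, -1.0)            # made-up square wave

# coefficient of sin(kx): <f, sin(kx)> / ||sin(kx)||^2, where ||sin(kx)||^2 = pi
for k in (1, 2, 3):
    b_k = np.sum(f * np.sin(k * x)) * dx / np.pi
    print(k, b_k, 4.0 / (k * np.pi) if k % 2 else 0.0)
```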