Machine learning review week03

Introduction to Linear Algebra is finally almost finished; only the QR algorithm and PCA remain....


faster matrix multiplication?


matrix properties:

  • matrix multiplication is used in
  1. multiple linear regression:

    \(y_n = f(\mathbf{x_n}):= \mathbf{\widetilde{X}_n^T} \mathbf{\beta}\)

    $\mathbf{\beta} = \begin{bmatrix}\beta_0 \\ \beta_1 \\ \vdots \\ \beta_D \end{bmatrix}$

    $\mathbf{\widetilde{X}_n^T} = \begin{bmatrix} 1 & \mathbf{x_n^T}\end{bmatrix}$

  2. cost function -- gradient descent and least squares
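
A minimal NumPy sketch of these two uses (the synthetic data, noise level, and variable names are illustrative, not from the original notes):

```python
import numpy as np

# synthetic data: N samples with D features (values are illustrative)
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
true_beta = np.array([2.0, -1.0, 0.5, 3.0])          # [beta_0, beta_1, ..., beta_D]

# augmented design matrix: each row is x_tilde_n^T = [1, x_n^T]
X_tilde = np.hstack([np.ones((N, 1)), X])
y = X_tilde @ true_beta + 0.1 * rng.normal(size=N)   # y_n = x_tilde_n^T beta + noise

# least-squares solution and the corresponding cost
beta_hat, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)
cost = np.sum((X_tilde @ beta_hat - y) ** 2) / (2 * N)
print(beta_hat)   # close to true_beta
print(cost)       # small residual cost
```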

rank deficiency, ill-conditioning, non-invertibility, singularity, condition number of a matrix

  • singular matrix:
    not invertible
    = have dependent rows
    = \(rank(\mathbf{A})<\) # of rows
    = not full rank
    = cols are linearly dependent
    = some rows of the matrix do not have pivots; this can happen when there are more rows than columns, or when some of the rows are duplicated
    ? are the eigenvalues of an invertible matrix all nonzero?
    why do we need the matrix to be invertible?
    how to make a singular matrix non-singular?
    fact: many matrices arising in practice are singular or nearly singular; adding a small amount of noise can make them non-singular -- this is called regularization (a minimal sketch follows below)
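
A minimal NumPy sketch of this regularization idea (the matrix and the value of lambda are illustrative):

```python
import numpy as np

# a singular (rank-deficient) matrix: the second row duplicates the first
A = np.array([[1.0, 2.0],
              [1.0, 2.0]])
print(np.linalg.matrix_rank(A))        # 1 -> singular, not invertible

# regularize: add a small multiple of the identity (as in ridge regression)
lam = 1e-3
A_reg = A + lam * np.eye(2)
print(np.linalg.matrix_rank(A_reg))    # 2 -> invertible
print(np.linalg.cond(A_reg))           # large but finite condition number
```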

condition number
  • the condition number measures how hard it is to invert a matrix accurately (how close it is to being singular)
    • condition number of \(\mathbf{A}\): \(\kappa(\mathbf{A}) = \parallel \mathbf{A^{-1}} \parallel \parallel \mathbf{A} \parallel = \frac{\sigma_{max} (\mathbf{A})}{\sigma_{min} (\mathbf{A})}\) (link)
    • matrix norms: a way of measuring the size of a matrix; for an ill-conditioned matrix, a small change to the matrix can change its inverse by a large amount
    • large condition number of \(\mathbf{A}\)
      = ill-conditioned/ill-posed problem
      = inaccurate result
      = hard to invert
    • view the condition number as an error multiplier for the solution of the linear system \(\mathbf{A}x = b\)
  • for an invertible matrix:
    \(\mathbf{x^*}\) is the solution of the exact system \(\mathbf{A}x = b\)
    \(\Delta b\) is the error in the collected data, so the perturbed system is \(\mathbf{A}x = b + \Delta b\)
    \(\mathbf{x} = \mathbf{A^{-1}} (\mathbf{b} + \Delta \mathbf{b}) = \mathbf{A^{-1}} \mathbf{b} + \mathbf{A^{-1}} \Delta \mathbf{b} = \mathbf{x^*} + \mathbf{A^{-1}} \Delta \mathbf{b}\)
    intuitively, if the smallest singular value of \(A\) is small then \(\parallel A^{-1} \parallel\) is large, so the data error \(\Delta b\) is amplified (a small demo follows below)
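
A small NumPy/SciPy demo of this error amplification, using the Hilbert matrix as a standard ill-conditioned example (the perturbation size is illustrative):

```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(8)                         # a classic ill-conditioned (but invertible) matrix
print(np.linalg.cond(A))               # roughly 1e10

x_star = np.ones(8)
b = A @ x_star

# perturb the right-hand side slightly and solve the perturbed system
db = 1e-8 * np.random.default_rng(0).normal(size=8)
x = np.linalg.solve(A, b + db)

print(np.linalg.norm(db) / np.linalg.norm(b))                 # relative error in b: ~1e-8
print(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))    # relative error in x: much larger
```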

norm
  • The 1-norm of a square matrix is the maximum of the absolute column sums
    \(\parallel \mathbf{A} \parallel_1 = \max_{1 \le j \le n} \sum_{i=1}^{n} |a_{ij}|\)
  • The infinity-norm of a square matrix is the maximum of the absolute row sums.
  • The Euclidean norm (Frobenius norm) of a square matrix is the square root of the sum of the squares of all the elements (a small check follows below).
  • link: norm and condition number of matrix
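
A small NumPy check of these norm definitions (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

print(np.linalg.norm(A, 1))        # 1-norm: max absolute column sum -> 6
print(np.linalg.norm(A, np.inf))   # infinity-norm: max absolute row sum -> 7
print(np.linalg.norm(A, 'fro'))    # Euclidean/Frobenius norm: sqrt(1 + 4 + 9 + 16)
print(np.linalg.cond(A))           # condition number sigma_max / sigma_min (2-norm)
```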

diagonalizing a matrix \(A_{n \times n}\)

  • by using the eigenvectors; it makes computing \(A^n\) easy (a sketch follows after this list)
  • if the eigenvectors are linearly independent then \(A = S \Lambda S^{-1}\)
  • in this way, A is similar to the diagonal matrix \(\Lambda\) made of its eigenvalues
  • remark:
    • any A without repeated eigenvalues \(\lambda\) is diagonalizable
    • to diagonalize A we must use its eigenvectors
    • invertibility -- whether any eigenvalue is zero
    • diagonalizability -- whether there are enough (n independent) eigenvectors
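
A minimal NumPy sketch of diagonalization and the resulting cheap computation of \(A^n\) (the matrix is an arbitrary example with distinct eigenvalues):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])           # eigenvalues 5 and 2

eigvals, S = np.linalg.eig(A)        # columns of S are eigenvectors
Lam = np.diag(eigvals)

# A = S Lambda S^{-1}
print(np.allclose(A, S @ Lam @ np.linalg.inv(S)))             # True

# A^10 via diagonalization: only the eigenvalues get raised to the power
A10 = S @ np.diag(eigvals ** 10) @ np.linalg.inv(S)
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))        # True
```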

symmetric matrices:

  • spectral theorem: every symmetric matrix A has a complete set of orthogonal eigenvectors, so \(A = S \Lambda S^{-1}\) becomes \(A = Q \Lambda Q^{T}\)
  • A is positive definite iff: all eigenvalues are positive; all upper-left determinants are positive; all pivots are positive; \(x^TAx\) is positive for every \(x \neq 0\) (a check follows below)
  • \(x^TAx = 1\) --> an ellipse (for a positive definite \(2 \times 2\) A)
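
A small NumPy check of the eigenvalue test for positive definiteness (the example matrices are illustrative):

```python
import numpy as np

def is_positive_definite(A):
    """A symmetric A is positive definite iff all of its eigenvalues are positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

print(is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]])))   # True  (eigenvalues 1 and 3)
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))   # False (eigenvalues -1 and 3)
```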

decomposition: eigendecomposition and SVD

singular value
  • the smallest singular value is the distance between a matrix and the set of singular matrices (from link)
  • singular value \(\sigma\) and singular vector \(v,u\) of \(\mathbf{A}\)
  • \(\mathbf{A}v = \sigma u\)
  • \(\mathbf{A^H}u = \sigma v\)
  • singular values: used when the matrix transforms one vector space into a different vector space
eigenvalue:
  • used when the matrix is a transformation from a vector space to itself; the eigenvalues correspond to stability parameters
  • The product of the n eigenvalues of A equals the determinant of A: \(det(A) = \lambda_1 \lambda_2 \cdots \lambda_n\)
  • An n × n matrix A and its transpose A^T have the same eigenvalues.
  • \(\lambda^2\) is an eigenvalue of \(A^2\): if \(Av = \lambda v\) then \(A^2 v = \lambda A v = \lambda^2 v\) (proof here)
  • product of pivots = determinant = product of eigenvalues (a quick check follows below)
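
A quick NumPy check of these eigenvalue facts (the matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

lam = np.linalg.eigvals(A)
print(np.isclose(np.prod(lam), np.linalg.det(A)))                         # product of eigenvalues = det(A)
print(np.allclose(np.sort(lam), np.sort(np.linalg.eigvals(A.T))))         # A and A^T share eigenvalues
print(np.allclose(np.sort(np.linalg.eigvals(A @ A)), np.sort(lam ** 2)))  # eigenvalues of A^2 are lambda^2
```
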
eigenvalue decomposition
  • the degree of the characteristic polynomial of a square matrix = the size of the square matrix
  • \(\mathbf{AX} = \mathbf{X\Lambda}\)
  • notice that: \(\begin{bmatrix} \lambda_1 \mathbf{v_1} & \lambda_2 \mathbf{v_2} & \lambda_3 \mathbf{v_3}\end{bmatrix} = \begin{bmatrix} \mathbf{v_1} &\mathbf{v_2} & \mathbf{v_3} \end{bmatrix} \begin{bmatrix}\lambda_1 & 0 & 0\\0&\lambda_2 & 0\\0&0&\lambda_3\end{bmatrix}\)
  • if the eigenvectors of A are independent then A can be eigendecomposed
    \(\mathbf{A} = \mathbf{X \Lambda X^{-1}}\)
    in this case, A is similar to a diagonal matrix, i.e. A is diagonalizable
  • note that all symmetric matrices have the above property (with an orthogonal eigenvector matrix)
  • only diagonalizable matrices can be eigendecomposed
Singular Value Decomposition
  • optimal low rank approximation of matrix A
  • idea: want \(v_1 \bot v_2\) and also \(Av_1 \bot Av_2\), then
    \(A \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} Av_1 & Av_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0&\sigma_2 \end{bmatrix}\)
    so that we have:

  • \(\Sigma\) is a diagonal matrix with the singular values as entries,
  • U and V: orthogonal matrix
  • cols of U: left singular vectors (gene coefficient vectors)
    \(AV = U\Sigma\)
    \(A^H U = V \Sigma^H\)
    hence
    \(A = U \Sigma V^H\)

  • to calculate \(\sigma_{1,2}\):
  • \(A = U \Sigma V^{T}\)
  • \(A^T = V\Sigma^T U^T\)
  • $A A^T = U\Sigma V^{T} V\Sigma^T U^T $
  • $A A^T = U \Sigma \Sigma^T U^T $
  • $A A^T = U \begin{bmatrix} \sigma_{1}^2 & 0 \\ 0&\sigma_{2}^2 \end{bmatrix} U^T $
  • NOTICE: \(\Sigma \Sigma^T\) is a diagonal matrix whose entries are the eigenvalues of \(AA^T\), and the columns of U are the eigenvectors of \(AA^T\)

  • ? the singular vectors can be chosen to be perpendicular to each other, so U and V are orthogonal and the matrix can be singular-value decomposed
  • $AA^{T}$ is always invertible?

  • economy (thin) SVD decomposition (a sketch follows below)
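
A NumPy sketch of the SVD: reconstructing A, checking \(\sigma_i^2\) against the eigenvalues of \(AA^T\), and showing the economy (thin) form (the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])

# full SVD: A = U Sigma V^T
U, s, Vt = np.linalg.svd(A)                   # s holds the singular values
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))         # True

# sigma_i^2 are the eigenvalues of A A^T
print(np.allclose(np.sort(s ** 2), np.sort(np.linalg.eigvalsh(A @ A.T))))   # True

# economy (thin) SVD: only the first r columns of U and rows of V^T are kept
U_e, s_e, Vt_e = np.linalg.svd(A, full_matrices=False)
print(U_e.shape, s_e.shape, Vt_e.shape)       # (2, 2) (2,) (2, 3)
```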

remark
  • singular value = length of a semi-axis of the ellipse that is the image of the unit circle (to be proved)
  • if A is symmetric, the eigenvectors of a \(2 \times 2\) A lie along the (long and short) axes of the ellipse, see eigshow in MATLAB; the eigenvalue magnitudes equal the semi-axis lengths, \(|\lambda| = \sigma\)
  • eigenvalue decomposition does not apply to every matrix (a defective matrix has a non-invertible eigenvector matrix) -- the nonexistence difficulty; and even if the matrix can be eigendecomposed, it might not provide a basis for robust computation (the eigenvector matrix may be invertible but ill-conditioned) -- the robustness difficulty
    why are we looking for robust computation?
    in numerical calculation, how to calculate the inverse of a matrix if it is not invertible? how to approximate it?
    why do we want the eigendecomposition and try to overcome these difficulties? can the SVD replace it? does the SVD apply to all matrices?
  • defective matrix: a matrix with at least one repeated eigenvalue that does not have a full set of linearly independent eigenvectors; an example is a matrix for which zero is an eigenvalue of multiplicity five but which has only one eigenvector (a small demo follows after this list)
  • Jordan canonical form (JCF) decomposition uses generalized eigenvectors to fill in for the eigenvectors that a defective matrix is missing. if A is non-defective, the JCF equals the eigenvalue decomposition; otherwise \(A = XJX^{-1}\) where J is a triangular (Jordan) matrix
  • is the SVD decomposition accurate?
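
A small NumPy illustration of a defective matrix, using a Jordan block as the standard example (chosen here for illustration):

```python
import numpy as np

# a 3x3 Jordan block with eigenvalue 5: the eigenvalue is repeated three times,
# but there is only one linearly independent eigenvector
J = np.array([[5.0, 1.0, 0.0],
              [0.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])

eigvals, X = np.linalg.eig(J)
print(eigvals)               # [5. 5. 5.]
print(np.abs(X))             # every column is (numerically) the same vector e1
print(np.linalg.cond(X))     # enormous: the eigenvector matrix is numerically singular
```
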
sensitivity of eigenvalues
  • The sensitivity of the eigenvalues \(\Lambda\) is estimated by the condition number
    of the matrix of eigenvectors. from link:
    assume A has an eigenvalue decomposition; then
    \(\parallel \delta \Lambda \parallel \le \mathbf{ \parallel X^{-1} \parallel \parallel X \parallel \parallel \delta A \parallel} = \kappa (X) \parallel \delta A \parallel\)
    which gives a rough idea that the sensitivity of \(\lambda\) is related to the condition number of the eigenvector matrix
  • The condition number of the eigenvector matrix \(\kappa (X)\) is an upper bound for the individual
    eigenvalue condition numbers \(\kappa (\lambda,A)\).
  • The eigenvalues of symmetric and Hermitian matrices are perfectly well conditioned \(\kappa (\lambda,A) = 1\)
  • sensitivity of the eigenvalues leads to inaccuracy in the computed value of \(\lambda\), caused by round-off error (a small demo follows below)
    how to decide whether a matrix is invertible or not?
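
A small NumPy demo of this sensitivity: a non-normal matrix with an ill-conditioned eigenvector matrix, whose eigenvalues move far more than the perturbation of A (the matrix and perturbation size are illustrative):

```python
import numpy as np

A = np.array([[1.0, 1e4],
              [0.0, 1.001]])           # eigenvalues 1 and 1.001

lam, X = np.linalg.eig(A)
print(np.linalg.cond(X))               # large kappa(X): the eigenvalues are sensitive

# perturb A by roughly 1e-8 and look at how far the eigenvalues move
dA = 1e-8 * np.random.default_rng(0).normal(size=(2, 2))
lam_p = np.linalg.eigvals(A + dA)
print(np.abs(np.sort(lam_p) - np.sort(lam)))   # far larger than the 1e-8 perturbation
```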

Jordan form

  • for matrices that are not diagonalizable, we want to find \(M^{-1}AM\) with M chosen so that the result is as nearly diagonal as possible: triangular
  • jordan matrix:
    \(\begin{bmatrix} 5&1&0\\0&5&1\\0&0&5 \end{bmatrix}\) with \(\lambda = 5,5,5\)
  • Jordan theory: \(J^T\) is similar to \(J\), via the 'reverse identity matrix'
  • key fact: the matrix J above is similar to every matrix A with eigenvalues 5, 5, 5 and one line of eigenvectors
  • for \(A_{n \times n}\) with s independent eigenvectors, it is similar to a matrix that has s Jordan blocks, i.e.
    \(A = MJM^{-1} = \begin{bmatrix} v_1&v_2\cdots & v_s \end{bmatrix} \begin{bmatrix} J_1 &\cdots &\cdots \\ \cdots &\ddots&\cdots \\ \cdots & \cdots &J_s \end{bmatrix} M^{-1}\)
  • the idea of the Jordan form: within the big family of similar matrices, pick the matrix closest to diagonal as J; then if A and B share the same Jordan form, A is similar to B (a small symbolic check follows below)
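
A small symbolic check using SymPy's `Matrix.jordan_form` (assuming the convention that it returns P and J with A = P J P^{-1}; the example matrix is illustrative):

```python
import sympy as sp

# a defective 2x2 matrix: eigenvalue 5 has multiplicity 2 but only one eigenvector
A = sp.Matrix([[4, 1],
               [-1, 6]])

P, J = A.jordan_form()       # A = P * J * P**-1
sp.pprint(J)                 # the Jordan block [[5, 1], [0, 5]]
sp.pprint(P * J * P.inv())   # reproduces A
```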

decomposition

discrete fourier transform

  • "inreo to linear alge p335"
  • when we multiply the eigenvector-matrix of the fourier matrix, we split the sinal into pure frequences
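
A minimal NumPy illustration of this idea using the FFT (the test signal, built from two pure frequencies, is illustrative):

```python
import numpy as np

# a signal made of two pure frequencies (3 and 7 cycles over N samples)
N = 64
t = np.arange(N)
x = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.sin(2 * np.pi * 7 * t / N)

# the DFT (a multiplication by the Fourier matrix, computed fast by the FFT)
X = np.fft.fft(x)
peaks = np.argsort(np.abs(X))[-4:]     # the 4 largest coefficients
print(np.sort(peaks))                  # [ 3  7 57 61] -> the two frequencies and their mirrors
```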

similarity transform

  • not changed: eigenvalues, trace, determinant, rank, number of independent eigenvectors, Jordan form
  • changed: eigenvectors, nullspace, column space, row space, left nullspace, singular values (a quick check follows below)
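
A quick NumPy check that eigenvalues, trace, determinant, and rank survive a similarity transform while singular values generally do not (the random matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
M = rng.normal(size=(3, 3))                 # almost surely invertible
B = np.linalg.inv(M) @ A @ M                # B = M^{-1} A M is similar to A

print(np.allclose(np.sort(np.linalg.eigvals(A)), np.sort(np.linalg.eigvals(B))))   # True
print(np.isclose(np.trace(A), np.trace(B)))                                        # True
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))                              # True
print(np.allclose(np.linalg.svd(A, compute_uv=False),
                  np.linalg.svd(B, compute_uv=False)))                             # generally False
```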

Fourier series: linear algebra for functions

  • for a vector in infinite dimensions -- it can be a vector with infinitely many entries, or a function
  • we have infinite vector \(\mathbf{v} = \begin{bmatrix} v_1 & v_2 & \cdots \end{bmatrix}\) and \(\mathbf{u} = \begin{bmatrix} u_1 & u_2 & \cdots \end{bmatrix}\)
  • DEF: inner product of vectors \(<u,v> = u_1v_1 + u_2v_2 + u_3v_3 + \cdots\)
  • DEF: only vectors with finite length are in our infinite-dimensional "Hilbert space", therefore \(\sum_{n = 1,2,\cdots} v_n^2\) must be a finite number, i.e. the series converges
  • therefore, the inner product \(<\mathbf{v},\mathbf{u}>\) is a finite number even though they are vectors in infinite dimensions
  • DEF: for a function f, define the inner product:
    \(<f(x),g(x)> = \int_{0}^{2\pi} f(x)g(x)\, dx\)
    require: \(\parallel f \parallel ^2 = \int_{0}^{2\pi} f^2(x)\, dx\) (= a finite number)
  • we now have a list of functions \(\begin{bmatrix} 1 & f_1(x) & g_1(x) & f_2(x) & g_2(x) & \cdots \end{bmatrix} = \begin{bmatrix} 1 & cos(x) & sin(x) & cos(2x) & sin(2x) & \cdots \end{bmatrix}\); these vectors (functions) can be viewed as basis vectors, so that any vector (function) can be written as a linear combination of the basis vectors:
    \(f(x) = \begin{bmatrix} 1 & f_1(x) & g_1(x) & f_2(x) & g_2(x) & \cdots \end{bmatrix} \begin{bmatrix} a_0\\a_1\\a_2\\ \vdots \end{bmatrix} = a_0 + a_1 cos(x) + a_2 sin(x) + a_3 cos(2x) + a_4 sin(2x) + \cdots\)
  • we can find that when f(x) = 1, \(\|f\|^2 = 2\pi\), so to make the basis vectors orthonormal (they are already perpendicular/orthogonal to each other), we divide them by their lengths, therefore:
    \(\begin{bmatrix} 1/ \sqrt{2\pi} & f_1(x)/ \sqrt{\pi} & g_1(x)/ \sqrt{\pi} & f_2(x)/ \sqrt{\pi} & g_2(x)/ \sqrt{\pi} & \cdots \end{bmatrix} = \begin{bmatrix} 1/ \sqrt{2\pi} & cos(x)/ \sqrt{\pi} & sin(x)/ \sqrt{\pi} & cos(2x)/ \sqrt{\pi} & sin(2x)/ \sqrt{\pi} & \cdots \end{bmatrix}\)
  • in this way, \(\|f\|^2 = a_{0}^2 + a_{1}^2 + a_{2}^2 + \cdots\) (a numerical check follows below)
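
A numerical check of these coefficients and of the length formula, done with a simple Riemann sum on \([0, 2\pi]\) (the test function f(x) = 1 + 2cos(x) + 3sin(2x) is illustrative):

```python
import numpy as np

N = 200000
x = (np.arange(N) + 0.5) * 2 * np.pi / N          # midpoint grid on [0, 2*pi]
dx = 2 * np.pi / N
f = 1 + 2 * np.cos(x) + 3 * np.sin(2 * x)

def inner(g, h):
    """<g, h> = integral of g*h over [0, 2*pi], approximated by a Riemann sum."""
    return np.sum(g * h) * dx

# coefficients w.r.t. the orthonormal basis 1/sqrt(2*pi), cos(kx)/sqrt(pi), sin(kx)/sqrt(pi)
c = np.array([inner(f, np.ones_like(x) / np.sqrt(2 * np.pi)),
              inner(f, np.cos(x)      / np.sqrt(np.pi)),
              inner(f, np.sin(2 * x)  / np.sqrt(np.pi))])
print(np.round(c, 4))                            # [sqrt(2*pi), 2*sqrt(pi), 3*sqrt(pi)]
print(np.isclose(inner(f, f), np.sum(c ** 2)))   # ||f||^2 equals the sum of squared coefficients
```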

Reposted from: https://www.cnblogs.com/xhblog/p/4854116.html
