Concepts
- Linear algebra is a form of continuous rather than discrete mathematics
- detailed information --> The Matrix Cookbook (Petersen and Pedersen, 2006)
- Scalars:–>(point)
- A scalar is just a single number
- a scalar equals its own transpose: $a = a^T$
- Vectors:–>(line)
- A vector is an array of numbers arranged in order; the standard form is a column vector.
- Matrices:–>(plane)
- A matrix is a 2-D array of numbers, so each element is identified by two indices.
- Note: transpose of a matrix: $(\mathbf{A}^T)_{i,j}=A_{j,i}$
- $\mathbf{C} = \mathbf{A}+\mathbf{B}$, where $C_{i,j}=A_{i,j}+B_{i,j}$
- Tensors:–>(volume)
- A tensor is an array of numbers arranged on a regular grid with a variable number of axes.
- NOTE: $\mathbf{D}=a\cdot \mathbf{B}+c$, where $D_{i,j} = a\cdot B_{i,j}+c$
- Broadcasting: $\mathbf{C} = \mathbf{A}+\mathbf{b}$, where $C_{i,j} = A_{i,j}+b_{j}$; the vector $\mathbf{b}$ is added to each row of the matrix (see the sketch below).
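A minimal NumPy sketch of this broadcasting rule; the numbers are made-up examples:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])

# b is broadcast across the rows of A: C[i, j] = A[i, j] + b[j]
C = A + b
print(C)  # [[11. 22.]
          #  [13. 24.]]
```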
- matrix product:
- $\mathbf{C} = \mathbf{AB}$, where $C_{i,j}=\sum\limits_k A_{i,k}B_{k,j}$
- Note: this is different from the element-wise product or Hadamard product, denoted $\mathbf{A}\odot \mathbf{B}$, which multiplies corresponding elements.
- distributivity: $\mathbf{A(B+C)=AB+AC}$
- associativity: $\mathbf{A(BC)=(AB)C}$
- non-commutativity: $\mathbf{AB\neq BA}$ in general
- transpose: $(\mathbf{AB})^T=\mathbf{B}^T\mathbf{A}^T$
- dot product (for two vectors $\mathbf{x}$ and $\mathbf{y}$):
- can be computed as the matrix product $\mathbf{x}^T\mathbf{y}$
- commutativity: $\mathbf{x}^T\mathbf{y}=\mathbf{y}^T\mathbf{x}$ (these properties are demonstrated in the sketch below)
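A short NumPy sketch of the product properties above, using random matrices with arbitrarily chosen shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Matrix product vs. element-wise (Hadamard) product
print((A @ B).shape)  # (2, 2): true matrix product
print((A * A).shape)  # (2, 3): Hadamard product A ⊙ A

# Transpose of a product reverses the order: (AB)^T = B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))  # True

# Dot product commutativity: x^T y = y^T x
print(np.isclose(x @ y, y @ x))           # True
```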
- linear equations:
- $\mathbf{Ax=b}$, where $\mathbf{A}\in\mathbb{R}^{m\times n}$ is a known matrix, $\mathbf{b}\in\mathbb{R}^m$ is a known vector, and $\mathbf{x}\in\mathbb{R}^n$ is a vector of unknown variables to be solved for.
- matrix inversion
- denoted by $\mathbf{A}^{-1}$ and defined by $\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_n$
- identity matrix
- $\mathbf{I}_n\in\mathbb{R}^{n\times n}$, and we have $\forall \mathbf{x}\in\mathbb{R}^n,\ \mathbf{I}_n\mathbf{x}=\mathbf{x}$ (see the solve sketch below)
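A minimal sketch of solving $\mathbf{Ax=b}$ with NumPy, assuming a small invertible matrix made up for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # known matrix
b = np.array([3.0, 5.0])                # known vector

# np.linalg.solve avoids forming A^{-1} explicitly, but both
# recover the same x when A is invertible.
x = np.linalg.solve(A, b)
print(x)                                     # [0.8 1.4]
print(np.allclose(x, np.linalg.inv(A) @ b))  # True
print(np.allclose(A @ x, b))                 # x solves the system
```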
Linear Dependence
- The geometric meaning of linear equations: view $\mathbf{A}$ through its columns,
$$\mathbf{A}=\begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,N}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,N}\\ \vdots&\vdots&\ddots&\vdots\\ a_{N,1}&a_{N,2}&\cdots&a_{N,N} \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{:,1}&\mathbf{A}_{:,2}&\cdots&\mathbf{A}_{:,N} \end{bmatrix}$$
- linear combination
- a linear combination of $\{\mathbf{v}^{(1)},\cdots,\mathbf{v}^{(n)}\}$ is given by $\sum\limits_i c_i\mathbf{v}^{(i)}$
- Span
- the span of a set of vectors is the set of all points obtainable by linear combination of the original vectors.
- if $\mathbf{Ax=b}$ has a solution, $\mathbf{b}$ is in the span of the columns of $\mathbf{A}$ --> the column space or the range of $\mathbf{A}$.
- $n\geq m$ is a necessary condition for $\mathbf{Ax=b}$ to have a solution for every $\mathbf{b}\in\mathbb{R}^m$
- A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors.
- A square matrix with linearly dependent columns is known as singular (see the rank check below).
- For square matrices, the left inverse and the right inverse are equal.
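A quick NumPy check of singularity for a made-up matrix whose second column is twice the first:

```python
import numpy as np

S = np.array([[1.0, 2.0],
              [3.0, 6.0]])  # linearly dependent columns

print(np.linalg.matrix_rank(S))  # 1 (< 2, so the columns are dependent)

# Inverting a singular matrix raises LinAlgError.
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("singular:", err)
```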
- Norms
- measure the size of vectors
- the $L^p$ norm is given by $\|\mathbf{x}\|_p=\left(\sum\limits_i|x_i|^p\right)^{\frac{1}{p}}$ for $p\in\mathbb{R},p\geq1$
- functions mapping vectors to non-negative values.
- satisfies the properties:
- $f(\mathbf{x})=0 \Rightarrow \mathbf{x}=0$
- $f(\mathbf{x+y})\leq f(\mathbf{x})+f(\mathbf{y})$ (the triangle inequality)
- $\forall\alpha\in\mathbb{R},\ f(\alpha\mathbf{x})=|\alpha|f(\mathbf{x})$
- $L^2$ is the Euclidean norm: the distance from the origin to the point $\mathbf{x}$.
- denoted $\|\mathbf{x}\|$; the squared $L^2$ norm is simply $\mathbf{x}^T\mathbf{x}$.
- the squared $L^2$ norm increases very slowly near the origin.
- $L^1$ grows at the same rate in all locations.
- $\|\mathbf{x}\|_1=\sum\limits_i|x_i|$
- "$L^0$" measures the size of a vector by counting its number of nonzero elements.
- not actually a norm; $L^1$ is often used as a substitute.
- $L^{\infty}$, known as the max norm
- $\|\mathbf{x}\|_{\infty}=\max\limits_i|x_i|$
- the Frobenius norm is the matrix analogue of the $L^2$ norm
- used to measure the size of a matrix
- $\|\mathbf{A}\|_F=\sqrt{\sum\limits_{i,j}A^2_{i,j}}$
- Representing the dot product with norms: $\mathbf{x}^T\mathbf{y}=\|\mathbf{x}\|_2\|\mathbf{y}\|_2\cos\theta$, where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$ (see the sketch below).
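The norms above, computed with NumPy on a made-up vector and matrix:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x, 1))       # L1 norm: 7.0
print(np.linalg.norm(x, 2))       # L2 (Euclidean) norm: 5.0
print(np.linalg.norm(x, np.inf))  # max norm: 4.0
print(np.count_nonzero(x))        # "L0" count of nonzero entries: 2

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.norm(A, "fro"))   # Frobenius norm
print(np.sqrt((A ** 2).sum()))    # same value from the definition
```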
- Special Kinds of Matrices and Vectors
- Diagonal matrices:
- $\mathbf{D}$ is diagonal if and only if $D_{i,j}=0$ for all $i \neq j$.
- diag($\mathbf{v}$) denotes the square diagonal matrix whose diagonal entries are given by the vector $\mathbf{v}$.
- diag($\mathbf{v}$)$\mathbf{x}=\mathbf{v}\odot\mathbf{x}$
- if every diagonal entry is nonzero, diag($\mathbf{v}$)$^{-1}$ = diag($[1/v_1,\cdots,1/v_n]^T$).
- for a non-square diagonal matrix $\mathbf{D}$: if $\mathbf{D}$ is taller than it is wide, computing $\mathbf{Dx}$ concatenates some zeros to the result; if $\mathbf{D}$ is wider than it is tall, it discards some of the last elements of the vector.
- symmetric matrix:
- $\mathbf{A}=\mathbf{A}^T$
- unit vector
- unit norm: $\|\mathbf{x}\|_2=1$
- orthogonal: $\mathbf{x}^T\mathbf{y}=0$
- orthonormal: orthogonal and of unit norm
- orthogonal matrix:
- $\mathbf{A}^T\mathbf{A}=\mathbf{A}\mathbf{A}^T=\mathbf{I}$
- implies: $\mathbf{A}^{-1}=\mathbf{A}^T$ (see the sketch below)
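A sketch of the diagonal and orthogonal matrix identities above, using a rotation matrix as a made-up orthogonal example:

```python
import numpy as np

v = np.array([2.0, 3.0, 5.0])
x = np.array([1.0, 1.0, 1.0])

D = np.diag(v)                                          # diag(v)
print(np.allclose(D @ x, v * x))                        # diag(v) x = v ⊙ x
print(np.allclose(np.linalg.inv(D), np.diag(1.0 / v)))  # easy inverse

theta = 0.3                                             # rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])         # orthogonal matrix
print(np.allclose(Q.T @ Q, np.eye(2)))                  # Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))               # Q^{-1} = Q^T
```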
- Eigendecomposition
- decompose a matrix into a set of eigenvectors and eigenvalues.
- eigenvectors and eigenvalues
- for a square matrix $\mathbf{A}$, an eigenvector is a non-zero vector $\mathbf{v}$ satisfying $\mathbf{Av}=\lambda\mathbf{v}$.
- $\lambda$ is the eigenvalue corresponding to the eigenvector $\mathbf{v}$. There are also left eigenvectors: $\mathbf{v}^T\mathbf{A}=\lambda\mathbf{v}^T$
- for $s\in\mathbb{R},s\neq0$: if $\mathbf{v}$ is an eigenvector of $\mathbf{A}$, then $s\mathbf{v}$ is also an eigenvector with the same eigenvalue.
- let $\mathbf{V}=[\mathbf{v}^{(1)},\cdots,\mathbf{v}^{(n)}]$ and $\boldsymbol{\lambda}=[\lambda_1,\cdots,\lambda_n]^T$; then $\mathbf{V}\,\text{diag}(\boldsymbol{\lambda})=\mathbf{AV}$, namely $\mathbf{A}=\mathbf{V}\,\text{diag}(\boldsymbol{\lambda})\mathbf{V}^{-1}$.
- every real symmetric matrix can be decomposed
- $\mathbf{A=Q\Lambda Q}^T$, where $\mathbf{Q}$ is an orthogonal matrix composed of eigenvectors of $\mathbf{A}$ and $\mathbf{\Lambda}$ is a diagonal matrix.
- the eigenvalue $\Lambda_{i,i}$ is associated with the eigenvector $\mathbf{Q}_{:,i}$
- the decomposition is not unique.
- if any eigenvalue is zero, the matrix is singular (see the sketch below)
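A minimal eigendecomposition sketch for a made-up real symmetric matrix; `np.linalg.eigh` is NumPy's routine for symmetric input:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # real symmetric

eigenvalues, Q = np.linalg.eigh(A)    # A = Q Λ Q^T
Lam = np.diag(eigenvalues)

print(eigenvalues)                             # [1. 3.]
print(np.allclose(Q @ Lam @ Q.T, A))           # reconstruct A
print(np.allclose(Q.T @ Q, np.eye(2)))         # Q is orthogonal
v = Q[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # A v = λ v
```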
- quadratic forms and the eigendecomposition:
- $f(\mathbf{x})=\mathbf{x}^T\mathbf{Ax}$, subject to $\|\mathbf{x}\|_2=1$.
- if $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $f$ equals the corresponding eigenvalue.
- the maximum/minimum value of f f f is the maximum/minimum eigenvalue
- positive definite: a matrix with all positive eigenvalues ==> $\mathbf{x}^T\mathbf{Ax}=0 \Rightarrow \mathbf{x}=0$
- positive semidefinite: a matrix with all eigenvalues positive or zero ==> $\forall \mathbf{x},\ \mathbf{x}^T\mathbf{Ax}\geq 0$
- negative definite: all eigenvalues are negative
- negative semidefinite: all eigenvalues are negative or zero (a quadratic-form sketch follows)
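A sketch of the quadratic-form bounds: on the unit sphere, $f(\mathbf{x})=\mathbf{x}^T\mathbf{Ax}$ attains its extremes at the eigenvectors (same made-up symmetric matrix as above):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 1 and 3
eigenvalues, Q = np.linalg.eigh(A)      # eigenvalues in ascending order

v_min, v_max = Q[:, 0], Q[:, -1]        # unit-norm eigenvectors
print(v_min @ A @ v_min)                # 1.0: minimum of f on the sphere
print(v_max @ A @ v_max)                # 3.0: maximum of f on the sphere

print(np.all(eigenvalues > 0))          # True => A is positive definite
```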
- Singular Value Decomposition:
- every real matrix has a singular value decomposition
- $\mathbf{A=UDV}^T$, with $\mathbf{A}\in\mathbb{R}^{m\times n}$, $\mathbf{U}\in\mathbb{R}^{m\times m}$, $\mathbf{D}\in\mathbb{R}^{m\times n}$, $\mathbf{V}\in\mathbb{R}^{n\times n}$.
- $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices
- $\mathbf{D}$ is a (rectangular) diagonal matrix whose diagonal entries are the singular values of $\mathbf{A}$
- the columns of $\mathbf{U}$ are the left-singular vectors.
- the columns of $\mathbf{V}$ are the right-singular vectors (see the sketch below).
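An SVD sketch in NumPy for an arbitrary random real matrix; `D` is rebuilt as the $m\times n$ rectangular diagonal matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))           # any real matrix has an SVD

U, s, Vt = np.linalg.svd(A, full_matrices=True)

D = np.zeros_like(A)                      # rectangular diagonal (3x2)
D[:2, :2] = np.diag(s)                    # singular values on the diagonal

print(np.allclose(U @ D @ Vt, A))         # A = U D V^T
print(np.allclose(U.T @ U, np.eye(3)))    # U orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # V orthogonal
```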
- The Moore-Penrose Pseudoinverse
- define: $\mathbf{A}^+=\lim\limits_{\alpha\searrow 0}(\mathbf{A}^T\mathbf{A}+\alpha\mathbf{I})^{-1}\mathbf{A}^T$
- $\mathbf{AA}^+\mathbf{A=A}$
- $\mathbf{A}^+\mathbf{AA}^+=\mathbf{A}^+$
- $\mathbf{AA}^+$ and $\mathbf{A}^+\mathbf{A}$ are symmetric
- Compute: $\mathbf{A}^+=\mathbf{VD}^+\mathbf{U}^T$, where $\mathbf{U,D,V}$ come from the singular value decomposition of $\mathbf{A}$, and $\mathbf{D}^+$ is obtained by taking the reciprocal of the nonzero diagonal entries of $\mathbf{D}$ and transposing the result.
- if $\mathbf{A}$ is a wide matrix, $\mathbf{x=A}^+\mathbf{y}$ is the solution of $\mathbf{Ax=y}$ with minimal Euclidean norm $\|\mathbf{x}\|_2$ among all possible solutions.
- if $\mathbf{A}$ is a tall matrix, there may be no exact solution; $\mathbf{x=A}^+\mathbf{y}$ makes $\mathbf{Ax}$ as close as possible to $\mathbf{y}$ in Euclidean norm (see the sketch below).
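A pseudoinverse sketch with a made-up tall matrix, where $\mathbf{A}^+\mathbf{y}$ gives the least-squares solution:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # tall: 3 equations, 2 unknowns
y = np.array([1.0, 2.0, 0.0])

A_pinv = np.linalg.pinv(A)  # computed via the SVD internally
x = A_pinv @ y              # minimizes ||Ax - y||_2
print(x)

# Defining identities of the Moore-Penrose pseudoinverse:
print(np.allclose(A @ A_pinv @ A, A))            # A A^+ A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))  # A^+ A A^+ = A^+
```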
- Trace Operator
- Define: $\text{Tr}(\mathbf{A})=\sum\limits_i A_{i,i}$
- Frobenius norm of a matrix: $\|\mathbf{A}\|_F=\sqrt{\text{Tr}(\mathbf{AA}^T)}$
- $\text{Tr}(\mathbf{A})=\text{Tr}(\mathbf{A}^T)$
- cyclic property: $\text{Tr}(\mathbf{ABC})=\text{Tr}(\mathbf{CAB})=\text{Tr}(\mathbf{BCA})$; more generally, $\text{Tr}\left(\prod\limits_{i=1}^n \mathbf{F}^{(i)}\right)=\text{Tr}\left(\mathbf{F}^{(n)}\prod\limits_{i=1}^{n-1}\mathbf{F}^{(i)}\right)$ (see the sketch below)
- a scalar is its own trace: $a=\text{Tr}(a)$
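A numerical check of the cyclic property with random matrices whose shapes make all three orderings valid:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Tr(ABC) = Tr(CAB) = Tr(BCA), even though the three products
# have shapes (2,2), (4,4) and (3,3) respectively.
t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
print(np.isclose(t1, t2) and np.isclose(t2, t3))  # True
```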
- The Determinant
- defined for a square matrix; equal to the product of all the eigenvalues: $\det(\mathbf{A})=\prod\limits_{i=1}^n\lambda_i$
- The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space.
- If the determinant is 0, then space is contracted completely along at least one dimension, causing it to lose all of its volume.
- If the determinant is 1, then the transformation preserves volume (see the sketch below).
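A determinant sketch: the product-of-eigenvalues identity and the volume interpretation, on made-up matrices:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])            # scales areas by |det| = 6

print(np.prod(np.linalg.eigvals(A)))  # 6.0: product of eigenvalues
print(np.linalg.det(A))               # 6.0: det(A) matches

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])            # linearly dependent columns
print(np.linalg.det(S))               # 0.0: space collapses, volume lost
```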