Concepts
- Linear algebra is a form of continuous rather than discrete mathematics
- detailed information --> The Matrix Cookbook (Petersen and Pedersen, 2006)
- Scalars:–>(point)
- A scalar is just a single number
- a scalar equals its own transpose: $a = a^T$
- Vectors:–>(line)
- A vector is an array of numbers arranged in order; the standard form is a column vector.
- Matrices:–>(plane)
- A matrix is a 2-D array of numbers, so each element is identified by two indices.
- Note: transpose of a matrix: $(\mathbf{A}^T)_{i,j}=A_{j,i}$
- $\mathbf{C} = \mathbf{A}+\mathbf{B}$, where $C_{i,j}=A_{i,j}+B_{i,j}$
- Tensors:–>(volume)
- A tensor is an array of numbers arranged on a regular grid with a variable number of axes.
- NOTE: $\mathbf{D}=a\cdot \mathbf{B}+c$, where $D_{i,j} = a\cdot B_{i,j}+c$
- Broadcasting: $\mathbf{C} = \mathbf{A}+\mathbf{b}$, where $C_{i,j} = A_{i,j}+b_{j}$; the vector $\mathbf{b}$ is added to each row of the matrix (see the sketch below).
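A minimal NumPy sketch of this broadcasting rule; the numbers are made-up examples:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])

# b is broadcast across the rows of A: C[i, j] = A[i, j] + b[j]
C = A + b
print(C)  # [[11. 22.]
          #  [13. 24.]]
```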
- matrix product:
- $\mathbf{C} = \mathbf{AB}$, where $C_{i,j}=\sum\limits_k A_{i,k}B_{k,j}$
- Note: this is different from the element-wise product or Hadamard product, denoted $\mathbf{A}\odot \mathbf{B}$, which multiplies corresponding elements.
- distributivity: $\mathbf{A(B+C)=AB+AC}$
- associativity: $\mathbf{A(BC)=(AB)C}$
- non-commutativity: $\mathbf{AB\neq BA}$ in general
- transpose: $(\mathbf{AB})^T=\mathbf{B}^T\mathbf{A}^T$
- dot product (for two vectors $\mathbf{x}$ and $\mathbf{y}$):
- can be computed as the matrix product $\mathbf{x}^T\mathbf{y}$
- commutativity: $\mathbf{x}^T\mathbf{y}=\mathbf{y}^T\mathbf{x}$ (these properties are demonstrated in the sketch below)
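A short NumPy sketch of the product properties above, using random matrices with arbitrarily chosen shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 2))
x = rng.standard_normal(3)
y = rng.standard_normal(3)

# Matrix product vs. element-wise (Hadamard) product
print((A @ B).shape)  # (2, 2): true matrix product
print((A * A).shape)  # (2, 3): Hadamard product A ⊙ A

# Transpose of a product reverses the order: (AB)^T = B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))  # True

# Dot product commutativity: x^T y = y^T x
print(np.isclose(x @ y, y @ x))           # True
```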
- linear equations:
- $\mathbf{Ax=b}$, where $\mathbf{A}\in\mathbb{R}^{m\times n}$ is a known matrix, $\mathbf{b}\in\mathbb{R}^m$ is a known vector, and $\mathbf{x}\in\mathbb{R}^n$ is a vector of unknown variables to be solved for.
- matrix inversion
- denoted by $\mathbf{A}^{-1}$ and defined by $\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_n$
- identity matrix
- $\mathbf{I}_n\in\mathbb{R}^{n\times n}$, and we have $\forall \mathbf{x}\in\mathbb{R}^n,\ \mathbf{I}_n\mathbf{x}=\mathbf{x}$ (see the solve sketch below)
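A minimal sketch of solving $\mathbf{Ax=b}$ with NumPy, assuming a small invertible matrix made up for illustration:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])  # known matrix
b = np.array([3.0, 5.0])                # known vector

# np.linalg.solve avoids forming A^{-1} explicitly, but both
# recover the same x when A is invertible.
x = np.linalg.solve(A, b)
print(x)                                     # [0.8 1.4]
print(np.allclose(x, np.linalg.inv(A) @ b))  # True
print(np.allclose(A @ x, b))                 # x solves the system
```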
Linear Dependence
- The geometric meaning of linear equations: view $\mathbf{A}$ through its columns,
$$\mathbf{A}=\begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,N}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,N}\\ \vdots&\vdots&\ddots&\vdots\\ a_{N,1}&a_{N,2}&\cdots&a_{N,N} \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{:,1}&\mathbf{A}_{:,2}&\cdots&\mathbf{A}_{:,N} \end{bmatrix}$$
- linear combination
- a linear combination of $\{\mathbf{v}^{(1)},\cdots,\mathbf{v}^{(n)}\}$ is given by $\sum\limits_i c_i\mathbf{v}^{(i)}$
- Span
- the span of a set of vectors is the set of all points obtainable by linear combination of the original vectors.
- if $\mathbf{Ax=b}$ has a solution, $\mathbf{b}$ is in the span of the columns of $\mathbf{A}$ --> the column space or the range of $\mathbf{A}$.
- $n\geq m$ is a necessary condition for $\mathbf{Ax=b}$ to have a solution for every $\mathbf{b}\in\mathbb{R}^m$
- A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors.
- A square matrix with linearly dependent columns is known as singular (see the rank check below).
- For square matrices, the left inverse and the right inverse are equal.
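A quick NumPy check of singularity for a made-up matrix whose second column is twice the first:

```python
import numpy as np

S = np.array([[1.0, 2.0],
              [3.0, 6.0]])  # linearly dependent columns

print(np.linalg.matrix_rank(S))  # 1 (< 2, so the columns are dependent)

# Inverting a singular matrix raises LinAlgError.
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("singular:", err)
```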
- Norms
- measure the size of vectors
- the $L^p$ norm is given by $\|\mathbf{x}\|_p=\left(\sum\limits_i|x_i|^p\right)^{\frac{1}{p}}$ for $p\in\mathbb{R},p\geq1$
- functions mapping vectors to non-negative values.
- satisfies the properties:
- $f(\mathbf{x})=0 \Rightarrow \mathbf{x}=0$
- $f(\mathbf{x+y})\leq f(\mathbf{x})+f(\mathbf{y})$ (the triangle inequality)
- $\forall\alpha\in\mathbb{R},\ f(\alpha\mathbf{x})=|\alpha|f(\mathbf{x})$
- $L^2$ is the Euclidean norm: the distance from the origin to the point $\mathbf{x}$.
- denoted $\|\mathbf{x}\|$; the squared $L^2$ norm is simply $\mathbf{x}^T\mathbf{x}$.
- the squared $L^2$ norm increases very slowly near the origin.
- $L^1$ grows at the same rate in all locations.
- $\|\mathbf{x}\|_1=\sum\limits_i|x_i|$
- "$L^0$" measures the size of a vector by counting its number of nonzero elements.
- not actually a norm; $L^1$ is often used as a substitute.
- $L^{\infty}$, known as the max norm
- $\|\mathbf{x}\|_{\infty}=\max\limits_i|x_i|$
- the Frobenius norm is the matrix analogue of the $L^2$ norm
- used to measure the size of a matrix
- $\|\mathbf{A}\|_F=\sqrt{\sum\limits_{i,j}A^2_{i,j}}$
- Representing the dot product with norms: $\mathbf{x}^T\mathbf{y}=\|\mathbf{x}\|_2\|\mathbf{y}\|_2\cos\theta$, where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$ (see the sketch below).
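The norms above, computed with NumPy on a made-up vector and matrix:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

print(np.linalg.norm(x, 1))       # L1 norm: 7.0
print(np.linalg.norm(x, 2))       # L2 (Euclidean) norm: 5.0
print(np.linalg.norm(x, np.inf))  # max norm: 4.0
print(np.count_nonzero(x))        # "L0" count of nonzero entries: 2

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.linalg.norm(A, "fro"))   # Frobenius norm
print(np.sqrt((A ** 2).sum()))    # same value from the definition
```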
- Special Kinds of Matrices and Vectors
- Diagonal matrices:
- $\mathbf{D}$ is diagonal if and only if $D_{i,j}=0$ for all $i \neq j$.
- diag($\mathbf{v}$) denotes the square diagonal matrix whose diagonal entries are given by the vector $\mathbf{v}$.
- diag($\mathbf{v}$)$\mathbf{x}=\mathbf{v}\odot\mathbf{x}$
- if every diagonal entry is nonzero, diag($\mathbf{v}$)$^{-1}$ = diag($[1/v_1,\cdots,1/v_n]^T$).
- for a non-square diagonal matrix $\mathbf{D}$: if $\mathbf{D}$ is taller than it is wide, computing $\mathbf{Dx}$ concatenates some zeros to the result; if $\mathbf{D}$ is wider than it is tall, it discards some of the last elements of the vector.
- symmetric matrix:
- $\mathbf{A}=\mathbf{A}^T$
- unit vector
- unit norm: $\|\mathbf{x}\|_2=1$
- orthogonal: $\mathbf{x}^T\mathbf{y}=0$
- orthonormal: orthogonal and of unit norm
- orthogonal matrix:
- $\mathbf{A}^T\mathbf{A}=\mathbf{A}\mathbf{A}^T=\mathbf{I}$
- implies: $\mathbf{A}^{-1}=\mathbf{A}^T$ (see the sketch below)
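A sketch of the diagonal and orthogonal matrix identities above, using a rotation matrix as a made-up orthogonal example:

```python
import numpy as np

v = np.array([2.0, 3.0, 5.0])
x = np.array([1.0, 1.0, 1.0])

D = np.diag(v)                                          # diag(v)
print(np.allclose(D @ x, v * x))                        # diag(v) x = v ⊙ x
print(np.allclose(np.linalg.inv(D), np.diag(1.0 / v)))  # easy inverse

theta = 0.3                                             # rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])         # orthogonal matrix
print(np.allclose(Q.T @ Q, np.eye(2)))                  # Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))               # Q^{-1} = Q^T
```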
- Eigendecomposition
- decompose a matrix into a set of eigenvectors and eigenvalues.
- eigenvectors and eigenvalues
- for a square matrix $\mathbf{A}$, an eigenvector is a non-zero vector $\mathbf{v}$ satisfying $\mathbf{Av}=\lambda\mathbf{v}$.
- $\lambda$ is the eigenvalue corresponding to the eigenvector $\mathbf{v}$. There are also left eigenvectors: $\mathbf{v}^T\mathbf{A}=\lambda\mathbf{v}^T$
- for $s\in\mathbb{R},s\neq0$: if $\mathbf{v}$ is an eigenvector of $\mathbf{A}$, then $s\mathbf{v}$ is also an eigenvector with the same eigenvalue.
- let $\mathbf{V}=[\mathbf{v}^{(1)},\cdots,\mathbf{v}^{(n)}]$ and $\boldsymbol{\lambda}=[\lambda_1,\cdots,\lambda_n]^T$; then $\mathbf{V}\,\text{diag}(\boldsymbol{\lambda})=\mathbf{AV}$, namely $\mathbf{A}=\mathbf{V}\,\text{diag}(\boldsymbol{\lambda})\mathbf{V}^{-1}$.
- every real symmetric matrix can be decomposed
- $\mathbf{A=Q\Lambda Q}^T$, where $\mathbf{Q}$ is an orthogonal matrix composed of eigenvectors of $\mathbf{A}$ and $\mathbf{\Lambda}$ is a diagonal matrix.
- the eigenvalue $\Lambda_{i,i}$ is associated with the eigenvector $\mathbf{Q}_{:,i}$
- the decomposition is not unique.
- if any eigenvalue is zero, the matrix is singular (see the sketch below)
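A minimal eigendecomposition sketch for a made-up real symmetric matrix; `np.linalg.eigh` is NumPy's routine for symmetric input:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # real symmetric

eigenvalues, Q = np.linalg.eigh(A)    # A = Q Λ Q^T
Lam = np.diag(eigenvalues)

print(eigenvalues)                             # [1. 3.]
print(np.allclose(Q @ Lam @ Q.T, A))           # reconstruct A
print(np.allclose(Q.T @ Q, np.eye(2)))         # Q is orthogonal
v = Q[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))  # A v = λ v
```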
- quadratic forms and the eigendecomposition:
- $f(\mathbf{x})=\mathbf{x}^T\mathbf{Ax}$, subject to $\|\mathbf{x}\|_2=1$.
- if $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $f$ equals the corresponding eigenvalue.
- the maximum/minimum value of f f f is the maximum/minimum eigenvalue
- positive definite: a matrix with all positive eigenvalues ==> $\mathbf{x}^T\mathbf{Ax}=0 \Rightarrow \mathbf{x}=0$
- positive semidefinite: a matrix with all eigenvalues positive or zero ==> $\forall \mathbf{x},\ \mathbf{x}^T\mathbf{Ax}\geq 0$
- negative definite: all eigenvalues are negative
- negative semidefinite: all eigenvalues are negative or zero (a quadratic-form sketch follows)
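A sketch of the quadratic-form bounds: on the unit sphere, $f(\mathbf{x})=\mathbf{x}^T\mathbf{Ax}$ attains its extremes at the eigenvectors (same made-up symmetric matrix as above):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # eigenvalues 1 and 3
eigenvalues, Q = np.linalg.eigh(A)      # eigenvalues in ascending order

v_min, v_max = Q[:, 0], Q[:, -1]        # unit-norm eigenvectors
print(v_min @ A @ v_min)                # 1.0: minimum of f on the sphere
print(v_max @ A @ v_max)                # 3.0: maximum of f on the sphere

print(np.all(eigenvalues > 0))          # True => A is positive definite
```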
- Singular Value Decomposition:
- every real matrix has a singular value decomposition
- $\mathbf{A=UDV}^T$, with $\mathbf{A}\in\mathbb{R}^{m\times n}$, $\mathbf{U}\in\mathbb{R}^{m\times m}$, $\mathbf{D}\in\mathbb{R}^{m\times n}$, $\mathbf{V}\in\mathbb{R}^{n\times n}$.
- $\mathbf{U}$ and $\mathbf{V}$ are orthogonal matrices
- $\mathbf{D}$ is a (rectangular) diagonal matrix whose diagonal entries are the singular values of $\mathbf{A}$
- the columns of $\mathbf{U}$ are the left-singular vectors.
- the columns of $\mathbf{V}$ are the right-singular vectors (see the sketch below).
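An SVD sketch in NumPy for an arbitrary random real matrix; `D` is rebuilt as the $m\times n$ rectangular diagonal matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))           # any real matrix has an SVD

U, s, Vt = np.linalg.svd(A, full_matrices=True)

D = np.zeros_like(A)                      # rectangular diagonal (3x2)
D[:2, :2] = np.diag(s)                    # singular values on the diagonal

print(np.allclose(U @ D @ Vt, A))         # A = U D V^T
print(np.allclose(U.T @ U, np.eye(3)))    # U orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))  # V orthogonal
```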
- The Moore-Penrose Pseudoinverse
- define: $\mathbf{A}^+=\lim\limits_{\alpha\searrow 0}(\mathbf{A}^T\mathbf{A}+\alpha\mathbf{I})^{-1}\mathbf{A}^T$
- $\mathbf{AA}^+\mathbf{A=A}$
- $\mathbf{A}^+\mathbf{AA}^+=\mathbf{A}^+$
- $\mathbf{AA}^+$ and $\mathbf{A}^+\mathbf{A}$ are symmetric
- Compute: $\mathbf{A}^+=\mathbf{VD}^+\mathbf{U}^T$, where $\mathbf{U,D,V}$ come from the singular value decomposition of $\mathbf{A}$, and $\mathbf{D}^+$ is obtained by taking the reciprocal of the nonzero diagonal entries of $\mathbf{D}$ and transposing the result.
- if $\mathbf{A}$ is a wide matrix, $\mathbf{x=A}^+\mathbf{y}$ is the solution of $\mathbf{Ax=y}$ with minimal Euclidean norm $\|\mathbf{x}\|_2$ among all possible solutions.
- if $\mathbf{A}$ is a tall matrix, there may be no exact solution; $\mathbf{x=A}^+\mathbf{y}$ makes $\mathbf{Ax}$ as close as possible to $\mathbf{y}$ in Euclidean norm (see the sketch below).
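A pseudoinverse sketch with a made-up tall matrix, where $\mathbf{A}^+\mathbf{y}$ gives the least-squares solution:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # tall: 3 equations, 2 unknowns
y = np.array([1.0, 2.0, 0.0])

A_pinv = np.linalg.pinv(A)  # computed via the SVD internally
x = A_pinv @ y              # minimizes ||Ax - y||_2
print(x)

# Defining identities of the Moore-Penrose pseudoinverse:
print(np.allclose(A @ A_pinv @ A, A))            # A A^+ A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))  # A^+ A A^+ = A^+
```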
- Trace Operator
- Define: $\text{Tr}(\mathbf{A})=\sum\limits_i A_{i,i}$
- Frobenius norm of a matrix: $\|\mathbf{A}\|_F=\sqrt{\text{Tr}(\mathbf{AA}^T)}$
- $\text{Tr}(\mathbf{A})=\text{Tr}(\mathbf{A}^T)$
- cyclic property: $\text{Tr}(\mathbf{ABC})=\text{Tr}(\mathbf{CAB})=\text{Tr}(\mathbf{BCA})$; more generally, $\text{Tr}\left(\prod\limits_{i=1}^n \mathbf{F}^{(i)}\right)=\text{Tr}\left(\mathbf{F}^{(n)}\prod\limits_{i=1}^{n-1}\mathbf{F}^{(i)}\right)$ (see the sketch below)
- a scalar is its own trace: $a=\text{Tr}(a)$
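A numerical check of the cyclic property with random matrices whose shapes make all three orderings valid:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Tr(ABC) = Tr(CAB) = Tr(BCA), even though the three products
# have shapes (2,2), (4,4) and (3,3) respectively.
t1 = np.trace(A @ B @ C)
t2 = np.trace(C @ A @ B)
t3 = np.trace(B @ C @ A)
print(np.isclose(t1, t2) and np.isclose(t2, t3))  # True
```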
- The Determinant
- defined for a square matrix; equal to the product of all the eigenvalues: $\det(\mathbf{A})=\prod\limits_{i=1}^n\lambda_i$
- The absolute value of the determinant can be thought of as a measure of how much multiplication by the matrix expands or contracts space.
- If the determinant is 0, then space is contracted completely along at least one dimension, causing it to lose all of its volume.
- If the determinant is 1, then the transformation preserves volume (see the sketch below).
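A determinant sketch: the product-of-eigenvalues identity and the volume interpretation, on made-up matrices:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])            # scales areas by |det| = 6

print(np.prod(np.linalg.eigvals(A)))  # 6.0: product of eigenvalues
print(np.linalg.det(A))               # 6.0: det(A) matches

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])            # linearly dependent columns
print(np.linalg.det(S))               # 0.0: space collapses, volume lost
```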