MIT | Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
This series contains study notes for MIT Professor Gilbert Strang's course "Matrix Methods in Data Analysis, Signal Processing, and Machine Learning".
- Gilbert Strang & Sarah Hansen | Spring 2018
- 18.065: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
- Course videos: https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/
- Follow the WeChat official account and reply "矩阵方法" to get the complete PDF notes for this series.
Content is updated simultaneously on CSDN, Zhihu, and the WeChat official account.
- The Markdown source files are not yet open-source; contact me by email if you need them.
- These notes inevitably contain errors; corrections by email are welcome.
Lecture 0: Course Introduction
Lecture 1: The Column Space of $A$ Contains All Vectors $Ax$
Lecture 2: Multiplying and Factoring Matrices
Lecture 3: Orthonormal Columns in $Q$ Give $Q'Q = I$
Lecture 4: Eigenvalues and Eigenvectors
Lecture 5: Positive Definite and Semidefinite Matrices
Lecture 6: Singular Value Decomposition (SVD)
Lecture 7: Eckart-Young: The Closest Rank $k$ Matrix to $A$
Lecture 8: Norms of Vectors and Matrices
Lecture 9: Four Ways to Solve Least Squares Problems
Lecture 10: Survey of Difficulties with $Ax = b$
Lecture 11: Minimizing $\|x\|$ Subject to $Ax = b$
Lecture 12: Computing Eigenvalues and Singular Values
Lecture 13: Randomized Matrix Multiplication
Lecture 14: Low Rank Changes in $A$ and Its Inverse
Lecture 15: Matrices $A(t)$ Depending on $t$, Derivative $= dA/dt$
Lecture 16: Derivatives of Inverse and Singular Values
Lecture 17: Rapidly Decreasing Singular Values
Lecture 18: Counting Parameters in SVD, LU, QR, Saddle Points
Lecture 19: Saddle Points Continued, Maxmin Principle
Lecture 20: Definitions and Inequalities
Lecture 21: Minimizing a Function Step by Step
Lecture 22: Gradient Descent: Downhill to a Minimum
Lecture 23: Accelerating Gradient Descent (Use Momentum)
Lecture 24: Linear Programming and Two-Person Games
Lecture 25: Stochastic Gradient Descent
Lecture 26: Structure of Neural Nets for Deep Learning
Lecture 27: Backpropagation: Find Partial Derivatives
Lecture 28: Computing in Class [No video available]
Lecture 29: Computing in Class (cont.) [No video available]
Lecture 30: Completing a Rank-One Matrix, Circulants!
Lecture 31: Eigenvectors of Circulant Matrices: Fourier Matrix
Lecture 32: ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule
Lecture 33: Neural Nets and the Learning Function
Lecture 34: Distance Matrices, Procrustes Problem
Lecture 35: Finding Clusters in Graphs
Lecture 36: Alan Edelman and Julia Language
Lecture 2: Multiplying and Factoring Matrices
2.1 Five Matrix Factorizations
- The five key factorizations:
  - $A = LU$
  - $A = QR$
  - $S = Q\Lambda Q^T$
  - $A = X\Lambda X^{-1}$
  - $A = U\Sigma V^T$
- $A = LU$:
  - Meaning:
    - $L$: lower triangular matrix
    - $U$: upper triangular matrix
  - Role: expresses elimination, used for solving linear systems
- More about elimination:
  - When first learning linear algebra, we solve $Ax = b$ by:
    - applying row operations
    - the elementary operations on a linear system:
      - (1) swap two equations
      - (2) multiply an equation by a nonzero number
      - (3) add a multiple of one equation to another equation
    - In essence, solving a linear system is just repeatedly applying these elementary operations.
- "elimination"的含义:
- A = L U A = LU A=LU perfectly express those row operations!
- 刚开始学线性代数时,求解
A
x
=
b
Ax=b
Ax=b的方法:
  - Why is it called elimination?
  - First, how does elimination produce the LU factorization?
    - $A = \begin{bmatrix} 2 & 3\\ 4 & 7\\ \end{bmatrix}$
    - One elimination step (subtract $2 \times$ row 1 from row 2): $\begin{bmatrix} 2 & 3\\ 4 & 7\\ \end{bmatrix} \rightarrow \begin{bmatrix} 2 & 3\\ 0 & 1\\ \end{bmatrix}$
    - $U = \begin{bmatrix} 2 & 3\\ 0 & 1\\ \end{bmatrix}$
    - Recording the multiplier $2$ gives $L$:
      - $L = \begin{bmatrix} 1 & 0\\ 2 & 1\\ \end{bmatrix}$
    - This completes the LU factorization: $A = LU = \begin{bmatrix} 1 & 0\\ 2 & 1\\ \end{bmatrix}\begin{bmatrix} 2 & 3\\ 0 & 1\\ \end{bmatrix}$
  - The meaning of elimination:
    - We start with several equations that cannot be solved directly.
    - After the LU factorization, the last row of $U$ leaves only one equation to solve.
  - In other words, how do we split $A$ as follows?
    - $A = \begin{bmatrix} 2 & 3\\ 4 & 7\\ \end{bmatrix} = \begin{bmatrix} 2 & 3\\ 4 & ?\\ \end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & ??\\ \end{bmatrix}$
    - where both pieces have rank 1
    - Gaussian elimination essentially does exactly this.
    - The LU factorization gives it directly:
      - $A = LU = \begin{bmatrix} 1 & 0\\ 2 & 1\\ \end{bmatrix}\begin{bmatrix} 2 & 3\\ 0 & 1\\ \end{bmatrix} = \begin{bmatrix} 2 & 3\\ 4 & 6\\ \end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & 1\\ \end{bmatrix}$
  - Gaussian elimination $\rightarrow$ "elimination": it peels off one rank-one piece at a time.
  - Condition for the factorization:
    - If the first $k$ leading principal minors of a rank-$k$ matrix $A$ are nonzero, then $A$ has an LU factorization.
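As a quick check of the $2 \times 2$ example above, here is a minimal Python sketch (assuming numpy and scipy are installed) that verifies $A = LU$ and the rank-one splitting:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2., 3.],
              [4., 7.]])

# Hand elimination without pivoting: subtract 2 x row 1 from row 2.
# The multiplier 2 is recorded in L; the reduced matrix is U.
L = np.array([[1., 0.],
              [2., 1.]])
U = np.array([[2., 3.],
              [0., 1.]])
assert np.allclose(L @ U, A)

# A = LU as a sum of rank-1 pieces: (column i of L)(row i of U)
piece1 = np.outer(L[:, 0], U[0])   # [[2, 3], [4, 6]]
piece2 = np.outer(L[:, 1], U[1])   # [[0, 0], [0, 1]]
assert np.allclose(piece1 + piece2, A)

# Note: scipy's lu uses partial pivoting (A = P L U), so its
# factors may differ from the hand computation above.
P, L2, U2 = lu(A)
assert np.allclose(P @ L2 @ U2, A)
```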
- $A = QR$:
  - computed by Gram-Schmidt
  - least squares is the big application
  - $Q$: orthogonal
    - the columns are orthogonal,
      - meaning they are perpendicular to each other
    - often orthonormal,
      - meaning they are orthogonal unit vectors
  - $R$: upper triangular matrix
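A minimal numpy sketch (my own illustration, not from the lecture) of these properties of $Q$ and $R$:

```python
import numpy as np

A = np.random.rand(4, 3)
Q, R = np.linalg.qr(A)   # reduced QR: Q is 4x3, R is 3x3

assert np.allclose(Q.T @ Q, np.eye(3))  # orthonormal columns: Q^T Q = I
assert np.allclose(R, np.triu(R))       # R is upper triangular
assert np.allclose(Q @ R, A)            # A = QR
```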
- $S = Q\Lambda Q^T$:
  - really a central one in math!
    - in pure math, applied math, etc.
  - Meaning:
    - $S$: symmetric matrix
      - $S = Q\Lambda Q^T$ is a special factorization for symmetric matrices
    - $\Lambda$: the diagonal eigenvalue matrix
    - $Q$: also an orthonormal matrix, but built from eigenvectors
      - it takes extra work to compute the eigenvectors,
      - unlike the QR factorization, where $Q$ comes directly from Gram-Schmidt
  - Properties of symmetric matrices, visible in the expanded form:
    - $S = Q\Lambda Q^T = \begin{bmatrix} q_1 & q_2 & \cdots & q_n\\ \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\\ \end{bmatrix}\begin{bmatrix} q_1^T\\ q_2^T\\ \vdots\\ q_n^T \end{bmatrix}$
    - where the $q_i$ are the eigenvectors (column vectors)
    - Conclusion 1: the eigenvectors are orthogonal (for symmetric matrices)
      - all $n$ of them
    - Conclusion 2: the eigenvalues are real (for symmetric matrices)
  - View 1: columns-times-rows matrix multiplication
    - $Q\Lambda Q^T = (Q\Lambda)Q^T$
    - = columns of $(Q\Lambda)$ $\times$ rows of $Q^T$
    - Note: every piece in this product (one column times one row) has rank 1!
      - all of its columns are multiples of that one column
      - all of its rows are multiples of that one row
      - e.g., $\begin{bmatrix} 1\\ 2\\ \end{bmatrix}\begin{bmatrix} 3 & 4\\ \end{bmatrix} = \begin{bmatrix} 3 & 4\\ 6 & 8\\ \end{bmatrix}$
    - Therefore $(Q\Lambda)Q^T$ is a sum of rank-1 matrices
      - the first rank-1 matrix:
        - the first column of $(Q\Lambda)$: $\lambda_1 q_1$
        - the first row of $Q^T$: $q_1^T$
        - their product: $\lambda_1 q_1 q_1^T$
    - Final result: $S = Q\Lambda Q^T = (Q\Lambda)Q^T = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T + \cdots + \lambda_n q_n q_n^T$
    - This splits $S$ into $n$ rank-1 pieces
      - and each piece here is special:
        - each is symmetric
        - $\lambda_i q_i q_i^T$ equals its own transpose
  - Check this result:
    - Look at $Sq_1$:
      - $Sq_1 = \lambda_1 q_1 q_1^T q_1 + \lambda_2 q_2 q_2^T q_1 + ... + \lambda_n q_n q_n^T q_1$
      - Because $Q$ is orthonormal, $q_i^T q_1 = 0$ for $i \neq 1$, so:
        - $Sq_1 = \lambda_1 q_1 \|q_1\|^2 = \lambda_1 q_1$
    - The result checks out: $Sq_1 = \lambda_1 q_1$
      - matching the definitions of eigenvalues and eigenvectors
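A small numpy sketch (my own check; it assumes `numpy.linalg.eigh` as the symmetric eigensolver) of the rank-one expansion above:

```python
import numpy as np

B = np.random.rand(4, 4)
S = B + B.T                      # a random symmetric matrix

lam, Q = np.linalg.eigh(S)       # eigh is for symmetric matrices
assert np.allclose(Q.T @ Q, np.eye(4))   # eigenvectors are orthonormal
assert np.all(np.isreal(lam))            # eigenvalues are real

# rebuild S as the sum of rank-1 pieces  lambda_i q_i q_i^T
S_sum = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4))
assert np.allclose(S_sum, S)
```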
- $A = U\Sigma V^T$:
  - the Singular Value Decomposition (SVD)
    - a foundational factorization for this course and for all of data science
  - Meaning:
    - $U$: an orthogonal matrix
    - $V$: an orthogonal matrix
    - $\Sigma$: a diagonal matrix
  - Significance:
    - It works for every matrix
      - rectangular or square
      - whether or not it has enough eigenvectors
    - Other factorizations, e.g. $A = X\Lambda X^{-1}$, do not work for every matrix, because:
      - the SVD's $U$ and $V$ are two different orthogonal matrices
        - two different sets of singular vectors
      - whereas in $A = X\Lambda X^{-1}$, $X$ provides only one set
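A minimal numpy sketch (my own illustration) showing that the SVD exists even for a rectangular matrix:

```python
import numpy as np

A = np.random.rand(3, 5)              # rectangular: no eigendecomposition
U, s, Vt = np.linalg.svd(A)           # A = U Sigma V^T

Sigma = np.zeros((3, 5))
Sigma[:3, :3] = np.diag(s)            # singular values on the diagonal

assert np.allclose(U @ Sigma @ Vt, A)
assert np.allclose(U.T @ U, np.eye(3))    # U orthogonal
assert np.allclose(Vt @ Vt.T, np.eye(5))  # V orthogonal
```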
2.2 Four Subspaces in Linear Algebra
- The four fundamental subspaces of a matrix $A$:
- 1 Column space $C(A)$
  - dim $= r$ $(r \leq n)$
- 2 Row space $C(A^T)$
  - dim $= r$ $(r \leq n)$
  - dim $C(A)$ = dim $C(A^T)$ is a key conclusion from Lecture 1
- 3 Null space $N(A)$
  - null space = all solutions $x$ to $Ax = 0$
  - the meaning of "null": reflecting the fact that there's a zero ($Ax = 0$)
  - How many independent vectors are in the null space? (what is the dimension of $N(A)$?)
    - dim $= n - r$
      - there are $n$ unknowns
      - and $r$ constraints
      - so the null space has dimension $n - r$
- 4 Null space of $A^T$: $N(A^T)$
- What is implied when I say a "space" of vectors?
  - "space" means I can do the most important operations of linear algebra in that space
    - In fact, vector spaces are conceptually closely related to groups, rings, and fields:
      - Given a field $F$, a vector space $V$ over $F$ is called an $F$-vector space. Its two binary operations are:
        - vector addition: $+: V \times V \rightarrow V$, written $v + w$, for all $v, w \in V$
        - scalar multiplication: $\cdot: F \times V \rightarrow V$, written $av$, for all $a \in F$ and $v \in V$
      - and they satisfy the following 8 axioms [per the encyclopedia]:
        - associativity of vector addition: $u + (v + w) = (u + v) + w$
        - identity element of vector addition: there is a zero vector $0 \in V$ with $v + 0 = v$ for all $v \in V$
        - inverse elements of vector addition: for every $v \in V$ there exists $w \in V$ such that $v + w = 0$
        - commutativity of vector addition: $v + w = w + v$
        - compatibility of scalar multiplication with field multiplication: $a(bv) = (ab)v$
        - identity element of scalar multiplication: $1v = v$, where $1$ is the multiplicative identity of $F$
        - distributivity of scalar multiplication over vector addition: $a(v + w) = av + aw$
        - distributivity of scalar multiplication over field addition: $(a + b)v = av + bv$
      - For example:
        - given $Ax = 0$ and $Ay = 0$, then $A(x + y) = 0$:
          - if $x, y \in N(A)$, then $x + y \in N(A)$
      - These are exactly the properties that make linear algebra possible:
        - we can take linear combinations
        - if I take combinations of two null space guys, I am still in the null space
      - Besides vector spaces, there is also the notion of a module:
        > A module over a ring is a generalization of the notion of vector space over a field, wherein the corresponding scalars are the elements of an arbitrary ring.
- The relationships among these four subspaces:
  - [Figure: the "big picture" of the four subspaces; image not reproduced here]
  - For a matrix $A$ ($m \times n$):
    - Two spaces live in $R^n$:
      - Row space $C(A^T)$
        - dim $= r$
      - Null space $N(A)$
        - dim $= n - r$
      - The two dimensions add up to exactly $n$
        - meaning: every vector has a piece in the row space and a piece in the null space,
        - and those two pieces give you back the vector
        - (explained under the subspace relationship below)
    - Two spaces live in $R^m$:
      - Column space $C(A)$
        - dim $= r$
      - Null space of $A^T$: $N(A^T)$
        - dim $= m - r$
- The next question: $C(A^T)$ is an $r$-dimensional plane in $R^n$, and $N(A)$ is an $(n-r)$-dimensional plane in $R^n$. How are those two planes connected?
  - Row space:
    - $A^T y$ for all $y$
    - dim $r$
  - Null space:
    - $x$ with $Ax = 0$
    - dim $n - r$
  - An example (verified in the sketch after this section):
    - $A = \begin{bmatrix} 1 & 2 & 4\\ 2 & 4 & 8\\ \end{bmatrix}$
    - $m = 2$; $n = 3$; $r = 1$
    - First, the null space:
      - $n - r = 2$: so we should be able to find two independent vectors $x_1$ and $x_2$ such that $Ax_1 = Ax_2 = 0$
      - e.g., $x_1 = [0, -2, 1]^T$; $x_2 = [4, 0, -1]^T$
    - Next, the row space:
      - e.g., the first row of $A$: $x_r = [1, 2, 4]$
    - What is the relationship between them?
      - Orthogonal!
      - Conclusion: the null space and the row space are orthogonal!
        - This is also what the earlier statement "Every vector has a piece in the row space, and a piece in the null space" was getting at.
  - Significance:
    - a completely general fact:
      - any $x$ satisfying $Ax = 0$ is orthogonal to the rows of $A$ and to all their combinations
    - When I look at $Ax = 0$, it is telling me that $x$ is orthogonal to each row
      - which also matches common sense
    - This is the fundamental theorem of linear algebra: the dimensions come out right, and the geometry comes out right.
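A minimal numpy/scipy sketch (my own check of the example above; `scipy.linalg.null_space` is one convenient way to get a null space basis):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 4.],
              [2., 4., 8.]])        # m = 2, n = 3, rank r = 1

# dim N(A) = n - r = 2: null_space returns an orthonormal basis
N = null_space(A)
assert N.shape == (3, 2)

# the hand-picked null space vectors from the example
x1 = np.array([0., -2., 1.])
x2 = np.array([4., 0., -1.])
assert np.allclose(A @ x1, 0) and np.allclose(A @ x2, 0)

# every null space vector is orthogonal to every row of A
row = A[0]                           # x_r = [1, 2, 4]
assert np.isclose(row @ x1, 0) and np.isclose(row @ x2, 0)
```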
Next lecture:
- move on quickly to eigenvalues and positive definite matrices