MIT | Matrix Methods in Data Analysis, Signal Processing, and Machine Learning | Notes Series | Lecture 2: Multiplying and Factoring Matrices

MIT | Matrix Methods in Data Analysis, Signal Processing, and Machine Learning

This series contains lecture notes for Professor Gilbert Strang's MIT course "Matrix Methods in Data Analysis, Signal Processing, and Machine Learning".

  • Gilbert Strang & Sarah Hansen | Spring 2018
  • 18.065: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
  • Course videos: https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/

  • The Markdown source files are not yet public; contact me by email if you need them
  • These notes inevitably contain mistakes; corrections by email are welcome

Lecture 0: Course Introduction

Lecture 1 The Column Space of $A$ Contains All Vectors $Ax$

Lecture 2 Multiplying and Factoring Matrices

Lecture 3 Orthonormal Columns in $Q$ Give $Q'Q=I$

Lecture 4 Eigenvalues and Eigenvectors

Lecture 5 Positive Definite and Semidefinite Matrices

Lecture 6 Singular Value Decomposition (SVD)

Lecture 7 Eckart-Young: The Closest Rank $k$ Matrix to $A$

Lecture 8 Norms of Vectors and Matrices

Lecture 9 Four Ways to Solve Least Squares Problems

Lecture 10 Survey of Difficulties with $Ax=b$

Lecture 11 Minimizing $\|x\|$ Subject to $Ax=b$

Lecture 12 Computing Eigenvalues and Singular Values

Lecture 13 Randomized Matrix Multiplication

Lecture 14 Low Rank Changes in $A$ and Its Inverse

Lecture 15 Matrices $A(t)$ Depending on $t$, Derivative = $dA/dt$

Lecture 16 Derivatives of Inverse and Singular Values

Lecture 17 Rapidly Decreasing Singular Values

Lecture 18 Counting Parameters in SVD, LU, QR, Saddle Points

Lecture 19 Saddle Points Continued, Maxmin Principle

Lecture 20 Definitions and Inequalities

Lecture 21 Minimizing a Function Step by Step

Lecture 22 Gradient Descent: Downhill to a Minimum

Lecture 23 Accelerating Gradient Descent (Use Momentum)

Lecture 24 Linear Programming and Two-Person Games

Lecture 25 Stochastic Gradient Descent

Lecture 26 Structure of Neural Nets for Deep Learning

Lecture 27 Backpropagation: Find Partial Derivatives

Lecture 28 Computing in Class [No video available]

Lecture 29 Computing in Class (cont.) [No video available]

Lecture 30 Completing a Rank-One Matrix, Circulants!

Lecture 31 Eigenvectors of Circulant Matrices: Fourier Matrix

Lecture 32 ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule

Lecture 33 Neural Nets and the Learning Function

Lecture 34 Distance Matrices, Procrustes Problem

Lecture 35 Finding Clusters in Graphs

Lecture 36 Alan Edelman and Julia Language




Lecture 2: Multiplying and Factoring Matrices

2.1 Five Matrix Factorizations

  • The five key factorizations:

    • $A = LU$
    • $A = QR$
    • $S = Q\Lambda Q^T$
    • $A = X\Lambda X^{-1}$
    • $A = U\Sigma V^T$
  • $A = LU$:

    • Meaning:
      • L: lower triangular matrix
      • U: upper triangular matrix
    • Purpose:
      • about elimination, solving linear systems
      • More about elimination:
        • When first learning linear algebra, we solve $Ax=b$ by row operations:
          • The elementary operations on a linear system:
            • (1) swap two equations
            • (2) multiply an equation by a nonzero number
            • (3) add a multiple of one equation to another equation
          • In essence, solving a linear system is just repeatedly applying these elementary operations.
        • What "elimination" means here:
          • $A = LU$ perfectly expresses those row operations!
    • Why is it called elimination?
      • First, how does elimination produce the LU factorization?
        • $A = \begin{bmatrix} 2 & 3\\ 4 & 7 \end{bmatrix}$
        • $\begin{bmatrix} 2 & 3\\ 4 & 7 \end{bmatrix} \rightarrow \begin{bmatrix} 2 & 3\\ 0 & 1 \end{bmatrix}$ (subtract 2 times row 1 from row 2)
          • $U = \begin{bmatrix} 2 & 3\\ 0 & 1 \end{bmatrix}$
        • recording the multiplier 2 gives L:
          • $L = \begin{bmatrix} 1 & 0\\ 2 & 1 \end{bmatrix}$
        • this completes the factorization: $A = LU = \begin{bmatrix} 1 & 0\\ 2 & 1 \end{bmatrix}\begin{bmatrix} 2 & 3\\ 0 & 1 \end{bmatrix}$
      • The meaning of elimination:
        • we start with several equations that cannot be solved directly
        • after the LU factorization, the last row of U leaves only one equation with one unknown to solve
          • that is, how do we split A as follows?
            • $A = \begin{bmatrix} 2 & 3\\ 4 & 7 \end{bmatrix} = \begin{bmatrix} 2 & 3\\ 4 & ? \end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & ?? \end{bmatrix}$
            • with both pieces having rank 1
            • Gaussian elimination is essentially doing exactly this
          • the LU factorization gives the answer:
            • $A = LU = \begin{bmatrix} 1 & 0\\ 2 & 1 \end{bmatrix}\begin{bmatrix} 2 & 3\\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 3\\ 4 & 6 \end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & 1 \end{bmatrix}$
        • Gaussian elimination $\rightarrow$ "elimination": we peel off one rank-1 piece at a time
    • When does the factorization exist?
      • A matrix A of rank k admits an LU factorization if its first k leading principal minors are nonzero (a numerical sketch follows this list)
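
To make the rank-1 picture concrete, here is a minimal sketch of elimination without row exchanges in Python (assuming NumPy is available; `lu_no_pivot` is a name introduced here for illustration, not from the lecture):

```python
import numpy as np

def lu_no_pivot(A):
    """Plain Gaussian elimination without row exchanges: A = L @ U.
    Assumes all pivots are nonzero (leading principal minors != 0)."""
    A = A.astype(float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # store the multiplier in L
            U[i, k:] -= L[i, k] * U[k, k:]   # row operation on U
    return L, U

A = np.array([[2, 3],
              [4, 7]])
L, U = lu_no_pivot(A)
print(L)                      # [[1. 0.], [2. 1.]]
print(U)                      # [[2. 3.], [0. 1.]]
print(np.allclose(L @ U, A))  # True

# The rank-1 view: A = (col 1 of L)(row 1 of U) + (col 2 of L)(row 2 of U)
piece1 = np.outer(L[:, 0], U[0, :])      # [[2. 3.], [4. 6.]]
piece2 = np.outer(L[:, 1], U[1, :])      # [[0. 0.], [0. 1.]]
print(np.allclose(piece1 + piece2, A))   # True: peeled off rank-1 pieces
```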
  • $A = QR$

    • Gram-Schmidt
    • Least squares is the big application
    • Q: orthogonal
      • the columns are orthogonal
        • meaning they are perpendicular to each other
      • often orthonormal
        • meaning they are orthogonal unit vectors
    • R: upper triangular matrix (a short sketch follows this list)
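
A minimal sketch with NumPy's built-in QR, including the least-squares use mentioned above (the right-hand side `b` is an arbitrary choice for illustration):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 7.0]])

Q, R = np.linalg.qr(A)                   # Q has orthonormal columns, R is upper triangular
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q'Q = I
print(np.allclose(Q @ R, A))             # True: A = QR

# Least squares: minimize ||Ax - b|| by solving the triangular system R x = Q^T b
b = np.array([1.0, 2.0])
x = np.linalg.solve(R, Q.T @ b)
print(np.allclose(A @ x, b))             # True here, since this A is square and invertible
```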
  • $S = Q\Lambda Q^T$

    • really a central one in math!
      • in pure math, applied math, etc.
    • Meaning:
      • S: symmetric matrix
        • $S = Q\Lambda Q^T$ is a special factorization for symmetric matrices
      • $\Lambda$: the diagonal eigenvalue matrix
      • $Q$: also an orthonormal matrix, but built from eigenvectors
        • computing the eigenvectors takes extra work
        • unlike QR, where Q comes directly from Gram-Schmidt
    • Properties of symmetric matrices
      • Written out in full:
        • $S = Q\Lambda Q^T = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \begin{bmatrix} q_1^T\\ q_2^T\\ \vdots\\ q_n^T \end{bmatrix}$
        • where the $q_i$ are the eigenvectors (columns)
        • Fact 1: the eigenvectors of a symmetric matrix are orthogonal
          • all n of them
        • Fact 2: the eigenvalues of a symmetric matrix are real
    • Use 1: matrix multiplication, columns times rows
      • $Q\Lambda Q^T = (Q\Lambda)Q^T$
      • columns of $(Q\Lambda)$ $\times$ rows of $Q^T$
        • Note: in this view of matrix multiplication, every piece (one column times one row) has rank 1!
          • all its columns are multiples of that one column
          • all its rows are multiples of that one row
          • e.g., $\begin{bmatrix} 1\\ 2 \end{bmatrix}\begin{bmatrix} 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4\\ 6 & 8 \end{bmatrix}$
      • Therefore $(Q\Lambda)Q^T$ is a sum of rank-1 matrices
        • the first rank-1 matrix:
          • first column of $(Q\Lambda)$: $\lambda_1 q_1$
          • first row of $Q^T$: $q_1^T$
          • their product: $\lambda_1 q_1 q_1^T$
      • Final result: $S = Q\Lambda Q^T = (Q\Lambda)Q^T = \lambda_1 q_1 q_1^T + \lambda_2 q_2 q_2^T + \cdots + \lambda_n q_n q_n^T$
        • this breaks S into n rank-1 pieces
        • and each piece is special:
          • each is symmetric
          • $\lambda_i q_i q_i^T$ equals its own transpose
      • Checking the result:
        • look at $Sq_1$:
          • $Sq_1 = \lambda_1 q_1 q_1^T q_1 + \lambda_2 q_2 q_2^T q_1 + \cdots + \lambda_n q_n q_n^T q_1$
          • because $Q$ is orthonormal, $q_i^T q_1 = 0$ for $i \neq 1$:
            • $Sq_1 = \lambda_1 q_1 \|q_1\|^2 = \lambda_1 q_1$
      • The result is correct: $Sq_1 = \lambda_1 q_1$
        • which matches the definition of eigenvalues and eigenvectors (a numerical check follows this list)
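
The whole chain (real eigenvalues, orthonormal eigenvectors, sum of symmetric rank-1 pieces) can be checked numerically. A minimal sketch using `numpy.linalg.eigh`; the random test matrix is my own choice:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
S = B + B.T                              # build a symmetric test matrix

lam, Q = np.linalg.eigh(S)               # eigh is for symmetric matrices: eigenvalues come out real
print(np.allclose(Q.T @ Q, np.eye(4)))   # True: the eigenvectors q_i are orthonormal

# S = sum of rank-1 pieces  lambda_i * q_i q_i^T
pieces = [lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(4)]
print(np.allclose(sum(pieces), S))                # True
print(all(np.allclose(P, P.T) for P in pieces))   # True: each piece is symmetric

# Check S q_1 = lambda_1 q_1 (index 0 here)
print(np.allclose(S @ Q[:, 0], lam[0] * Q[:, 0])) # True
```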
  • $A = U\Sigma V^T$

    • the Singular Value Decomposition (SVD)
      • a foundational factorization for this course and for all of data science
    • Meaning:
      • U: an orthogonal matrix
      • V: an orthogonal matrix
      • $\Sigma$: a diagonal matrix
    • Significance:
      • It works for every matrix
        • rectangular or square
        • whether or not it has enough eigenvectors
        • other factorizations, such as $A = X\Lambda X^{-1}$, cannot do this, because:
          • the SVD uses two different orthogonal matrices U and V
            • two different sets of singular vectors
          • in $A = X\Lambda X^{-1}$, X provides only one set (see the sketch below)
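
A quick sketch showing the SVD handling a rectangular, rank-deficient matrix; it reuses the matrix from the subspace example later in these notes, and `numpy.linalg.svd` is the standard routine:

```python
import numpy as np

A = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])          # rectangular (2x3), rank 1

U, s, Vt = np.linalg.svd(A)              # U is 2x2 orthogonal, Vt is 3x3 orthogonal
print(s)                                 # one nonzero singular value -> rank 1

Sigma = np.zeros_like(A)                 # rebuild the 2x3 diagonal middle factor
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))    # True: A = U Sigma V^T
```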

2.2 Four subspaces in linear algebra

  • The four fundamental subspaces of a matrix A
    • 1 Column space $C(A)$
      • dim = r $(r \leq \min(m, n))$
    • 2 Row space $C(A^T)$
      • dim = r $(r \leq n)$
      • dim of $C(A)$ = dim of $C(A^T)$ was the key conclusion of Lecture 1
    • 3 Null space $N(A)$
      • null space = all solutions $x$ to $Ax = 0$
      • the word "null" reflects the fact that there is a zero ($Ax = 0$)
      • how many independent vectors are in the null space (what is the dimension of $N(A)$)?
        • $\dim = n - r$
          • there are n unknowns
          • and r independent constraints
          • so the null space has dimension $n - r$
    • 4 Null space of $A^T$: $N(A^T)$ (a numerical sketch of all four dimensions follows this list)
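
The four dimensions can be verified numerically; a minimal sketch, assuming SciPy is available for `null_space`:

```python
import numpy as np
from scipy.linalg import null_space      # orthonormal basis for N(A)

A = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])
m, n = A.shape                           # m = 2, n = 3
r = np.linalg.matrix_rank(A)             # r = 1

print(r, n - r, m - r)                   # dims: C(A) = C(A^T) = 1, N(A) = 2, N(A^T) = 1
print(null_space(A).shape[1] == n - r)   # True: dim N(A)   = n - r
print(null_space(A.T).shape[1] == m - r) # True: dim N(A^T) = m - r
```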

  • What is implied when I say a "space" of vectors?
    • "space" means I can do the most important operations of linear algebra in that space
      • in fact, the notion of a vector space is closely related to groups, rings, and fields:
        • given a field F, a vector space V over F is called an F-vector space; its two operations are:
        • vector addition: $+ : V \times V \to V$, written $v + w$ for $v, w \in V$
        • scalar multiplication: $\cdot : F \times V \to V$, written $av$ for $a \in F$ and $v \in V$
        • and they satisfy the following 8 axioms [from the encyclopedia entry]:
          • associativity of vector addition: $u + (v + w) = (u + v) + w$
          • additive identity: V contains a zero vector 0 with $v + 0 = v$ for all $v \in V$
          • additive inverses: for every $v \in V$ there exists $w \in V$ with $v + w = 0$
          • commutativity of vector addition: $v + w = w + v$
          • compatibility of scalar multiplication with field multiplication: $a(bv) = (ab)v$
          • identity for scalar multiplication: $1v = v$, where 1 is the multiplicative identity of F
          • distributivity of scalar multiplication over vector addition: $a(v + w) = av + aw$
          • distributivity of scalar multiplication over field addition: $(a + b)v = av + bv$
        • for example:
          • given $Ax = 0$ and $Ay = 0$, then $A(x+y) = 0$:
            • if $x, y \in N(A)$, then $x + y \in N(A)$ (a small check follows this list)
        • these properties are exactly what makes linear algebra possible
          • we can take linear combinations
          • if I take a combination of two null-space vectors, I am still in the null space
      • beyond vector spaces there is also the notion of a module:
        • A module over a ring is a generalization of the notion of vector space over a field, wherein the corresponding scalars are the elements of an arbitrary ring.
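
A tiny check of the closure property just stated, using the null-space vectors from the example in the next subsection; the coefficient pairs are arbitrary choices for illustration:

```python
import numpy as np

A = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])
x = np.array([0.0, -2.0, 1.0])   # Ax = 0, so x is in N(A)
y = np.array([4.0, 0.0, -1.0])   # Ay = 0, so y is in N(A)

# Any linear combination stays in the null space: A(ax + by) = a*Ax + b*Ay = 0
for a, b in [(1, 1), (2, -3), (0.5, 7)]:
    print(np.allclose(A @ (a * x + b * y), 0))   # True, True, True
```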


  • How these four subspaces relate to each other
    • as shown in the figure below
    • for a matrix $A$ ($m \times n$):
      • two spaces live in $R^n$:
        • row space $C(A^T)$
          • $\dim = r$
        • null space $N(A)$
          • $\dim = n - r$
        • the two dimensions add up to exactly n
          • meaning: every vector has a piece in the row space and a piece in the null space
      • two spaces live in $R^m$:
        • column space $C(A)$
          • $\dim = r$
        • null space of $A^T$: $N(A^T)$
          • $\dim = m - r$

(Figure: the four fundamental subspaces: row space and null space in $R^n$, column space and $N(A^T)$ in $R^m$)

  • Next question: $C(A^T)$ is an r-dimensional plane in $R^n$, and $N(A)$ is an (n-r)-dimensional plane in $R^n$. How are those two planes connected?
    • row space:
      • $A^T y$ for all $y$
      • dim r
    • null space:
      • all $x$ with $Ax = 0$
      • dim n - r
    • an example:
      • $A = \begin{bmatrix} 1 & 2 & 4\\ 2 & 4 & 8 \end{bmatrix}$
      • m = 2, n = 3, r = 1
      • first the null space:
        • n - r = 2, so we should be able to find two independent vectors $x_1$ and $x_2$ with $Ax_1 = Ax_2 = 0$
          • for example $x_1 = [0, -2, 1]^T$ and $x_2 = [4, 0, -1]^T$
      • then the row space:
        • e.g., the first row of A: $x_r = [1, 2, 4]$
      • what is the relationship between them?
        • Orthogonal!
    • Conclusion: the null space is orthogonal to the row space!
      • this is also the meaning of "every vector has a piece in the row space and a piece in the null space"
    • Significance:
      • a completely general fact
      • any $x$ satisfying $Ax = 0$ is orthogonal to the rows of $A$ and to all their combinations
        • when I look at $Ax = 0$, it is telling me that x is orthogonal to each row
        • which also matches intuition
      • This is the fundamental theorem of linear algebra: the dimensions come out right, and the geometry comes out right. (A numerical check follows below.)
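
A numerical check of this orthogonality with the example matrix; the coefficient vector `y` is an arbitrary choice to form a row combination:

```python
import numpy as np

A = np.array([[1.0, 2.0, 4.0],
              [2.0, 4.0, 8.0]])
x1 = np.array([0.0, -2.0, 1.0])          # in N(A)
x2 = np.array([4.0, 0.0, -1.0])          # in N(A)
xr = A[0]                                # a row-space vector, [1, 2, 4]

print(A @ x1, A @ x2)                    # [0. 0.] [0. 0.]
print(xr @ x1, xr @ x2)                  # 0.0 0.0 : null space is orthogonal to the row

# Every combination of the rows (A^T y) is also orthogonal to x1 and x2:
y = np.array([3.0, -5.0])
print((A.T @ y) @ x1, (A.T @ y) @ x2)    # 0.0 0.0
```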

Next lecture:

  • move on quickly to eigenvalues and positive definite matrices