MIT | Matrix Methods in Data Analysis, Signal Processing, and Machine Learning, Notes Series: Lecture 8 Norms of Vectors and Matrices

This series contains study notes for Professor Gilbert Strang's MIT course "Matrix Methods in Data Analysis, Signal Processing, and Machine Learning".

  • Gilbert Strang & Sarah Hansen | Spring 2018
  • 18.065: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning
  • Course videos: https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-analysis-signal-processing-and-machine-learning-spring-2018/
  • Follow the WeChat official account and reply "矩阵方法" to get the complete PDF version of these notes.

The content is updated simultaneously on CSDN, Zhihu, and the WeChat official account.


  • The Markdown source files have not been open-sourced yet; contact the author by email if you need them
  • These notes inevitably contain mistakes; corrections by email are welcome

Lecture 0: Course Introduction

Lecture 1 The Column Space of $A$ Contains All Vectors $Ax$

Lecture 2 Multiplying and Factoring Matrices

Lecture 3 Orthonormal Columns in $Q$ Give $Q'Q=I$

Lecture 4 Eigenvalues and Eigenvectors

Lecture 5 Positive Definite and Semidefinite Matrices

Lecture 6 Singular Value Decomposition (SVD)

Lecture 7 Eckart-Young: The Closest Rank $k$ Matrix to $A$

Lecture 8 Norms of Vectors and Matrices

Lecture 9 Four Ways to Solve Least Squares Problems

Lecture 10 Survey of Difficulties with $Ax=b$

Lecture 11 Minimizing $\|x\|$ Subject to $Ax=b$

Lecture 12 Computing Eigenvalues and Singular Values

Lecture 13 Randomized Matrix Multiplication

Lecture 14 Low Rank Changes in $A$ and Its Inverse

Lecture 15 Matrices $A(t)$ Depending on $t$, Derivative = $dA/dt$

Lecture 16 Derivatives of Inverse and Singular Values

Lecture 17 Rapidly Decreasing Singular Values

Lecture 18 Counting Parameters in SVD, LU, QR, Saddle Points

Lecture 19 Saddle Points Continued, Maxmin Principle

Lecture 20 Definitions and Inequalities

Lecture 21 Minimizing a Function Step by Step

Lecture 22 Gradient Descent: Downhill to a Minimum

Lecture 23 Accelerating Gradient Descent (Use Momentum)

Lecture 24 Linear Programming and Two-Person Games

Lecture 25 Stochastic Gradient Descent

Lecture 26 Structure of Neural Nets for Deep Learning

Lecture 27 Backpropagation: Find Partial Derivatives

Lecture 28 Computing in Class [No video available]

Lecture 29 Computing in Class (cont.) [No video available]

Lecture 30 Completing a Rank-One Matrix, Circulants!

Lecture 31 Eigenvectors of Circulant Matrices: Fourier Matrix

Lecture 32 ImageNet is a Convolutional Neural Network (CNN), The Convolution Rule

Lecture 33 Neural Nets and the Learning Function

Lecture 34 Distance Matrices, Procrustes Problem

Lecture 35 Finding Clusters in Graphs

Lecture 36 Alan Edelman and Julia Language



Lecture 8: Norms of Vectors and Matrices

  • A question (phenomenon) unrelated to this course: Probability Matching

    • Biased coin, but the participants do not know the bias (no prior):

      ▪ 75% likely to produce heads

      ▪ 25% likely to produce tails

    • Payoff: +1 if you guess right, -1 if you guess wrong

    • Optimal strategy: at first, with no prior information, you would guess as if the coin were 50/50, but as the trials accumulate you keep adjusting ⇒ eventually, guess heads all the time

    • What people actually do is guess heads three quarters of the time and tails one quarter of the time, i.e. they match the probabilities, which is suboptimal

    • The point of this example: there are good math questions everywhere; it is otherwise unrelated to this lecture

  • Now to the content of this lecture: Norms

8.1 Vector norms $\|v\|_p$

  • $\|v\|_p = (|v_1|^p + |v_2|^p + \dots + |v_n|^p)^{1/p}$

  • A norm is a way to measure the size of a vector / matrix / tensor

  • p = 2

    • $\|v\|_2 = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}$
  • p = 1

    • $\|v\|_1 = |v_1| + |v_2| + \dots + |v_n|$
    • some things really work best in the $L_1$ norm
  • p = $\infty$

    • $\|v\|_\infty = \max_i |v_i|$
    • as $p$ increases, whichever component is biggest takes over
  • p = 0

    • $\|v\|_0$ is the number of non-zero components

    • important in questions of sparsity

      🚩 you might want to minimize $\|v\|_0$ ⇒ to get sparse vectors ⇒ which speeds up computation

    • Note: $\|v\|_0$ is not a norm! ⇒ $\|2v\|_0 \neq 2\|v\|_0$ ⇒ it violates the scaling rule for a norm (see the numerical sketch after the figure below)

  • The geometry of a norm

    • plot the unit ball $\|v\| \leq 1$ of each norm in the plane $\mathbb{R}^2$

      ✅ see the figure below

    • Which norms are good for optimization? A true norm has a convex unit ball $\|v\| \leq 1$

      🚩 for $p < 1$ (and for $\|v\|_0$) the "unit ball" is not convex, so these are not true norms and are hard to optimize with

      🚩 $L_1$ and $L_2$: convex unit balls, which keeps optimization stable and fast

[Figure: unit balls $\|v\|_p \leq 1$ in $\mathbb{R}^2$ for several values of $p$]
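To make the definitions above concrete, here is a minimal numerical sketch (my own check with NumPy, not from the lecture) that evaluates $\|v\|_1$, $\|v\|_2$, $\|v\|_\infty$ and the $\ell_0$ count for a small vector, shows why $\|v\|_0$ fails the scaling rule, and shows why $p = 1/2$ fails the triangle inequality (its "unit ball" is not convex).

```python
import numpy as np

v = np.array([3.0, 0.0, -4.0])

# p-norms straight from the definition ||v||_p = (sum |v_i|^p)^(1/p)
l1   = np.sum(np.abs(v))         # |3| + |0| + |-4| = 7
l2   = np.sqrt(np.sum(v**2))     # sqrt(9 + 0 + 16) = 5
linf = np.max(np.abs(v))         # largest component in magnitude = 4
l0   = np.count_nonzero(v)       # number of nonzero entries = 2

# NumPy's built-in vector norms agree (np.inf gives the max norm)
assert np.isclose(l1,   np.linalg.norm(v, 1))
assert np.isclose(l2,   np.linalg.norm(v, 2))
assert np.isclose(linf, np.linalg.norm(v, np.inf))

# ||v||_0 violates the scaling rule ||2v|| = 2||v||:
# doubling v does not double the number of nonzeros.
assert np.count_nonzero(2 * v) == l0          # still 2, not 4

# For p < 1 the triangle inequality fails, so ||.||_p is not a norm either:
p = 0.5
norm_p = lambda w: np.sum(np.abs(w) ** p) ** (1 / p)
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(norm_p(x + y), norm_p(x) + norm_p(y))   # 4.0 > 2.0, inequality violated
```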

  • p = S
    • $S$: a symmetric positive definite matrix

    • $\|v\|_S = \sqrt{v^T S v}$

    • the point of the square root: it makes $\|2v\|_S = 2\,\|v\|_S$ ⇒ the norm grows linearly

    • What is the shape of $\|v\|_S = \sqrt{v^T S v} = 1$? ($S$: symmetric positive definite)

      ✅ let $S = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$

      ⇒ $v^T S v = 2 v_1^2 + 3 v_2^2 = 1$

      🚩 the meaning of the $S$-norm: we get a new picture, a norm that is adjustable ⇒ a weighted norm ⇒ pick weights appropriate to the particular problem

      🚩 the shape is an ellipse (a small numerical check follows the figure below)

[Figure: the ellipse $2 v_1^2 + 3 v_2^2 = 1$, the unit ball of the $S$-norm]
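A small sketch (again NumPy, my own check rather than anything from the lecture) of the weighted norm $\|v\|_S = \sqrt{v^T S v}$ for the example $S = \mathrm{diag}(2, 3)$: it confirms the linear scaling $\|2v\|_S = 2\|v\|_S$ and that vectors with $\|v\|_S = 1$ lie on the ellipse $2 v_1^2 + 3 v_2^2 = 1$.

```python
import numpy as np

S = np.array([[2.0, 0.0],
              [0.0, 3.0]])       # symmetric positive definite

def S_norm(v, S):
    """Energy norm ||v||_S = sqrt(v^T S v)."""
    return np.sqrt(v @ S @ v)

v = np.array([1.0, -2.0])
print(S_norm(v, S))              # sqrt(2*1 + 3*4) = sqrt(14)

# The square root makes the norm scale linearly: ||2v||_S = 2 ||v||_S
assert np.isclose(S_norm(2 * v, S), 2 * S_norm(v, S))

# Rescale v onto the unit ball of the S-norm and check it lies on the ellipse
u = v / S_norm(v, S)
assert np.isclose(2 * u[0]**2 + 3 * u[1]**2, 1.0)
```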

8.2 Optimization Problem

  • A simple example:
    • minimize $\|x\|_1$ or $\|x\|_2$

      🚩 subject to $Ax = b$ (e.g., $a_1 x_1 + a_2 x_2 = b$)

    • $L_1$: this problem has a famous name, basis pursuit

    • $L_2$: ridge regression

    • the figure below shows that the $L_1$ minimizer tends to be sparse (this can be explained geometrically); a numerical sketch follows the figure

[Figure: minimizing $\|x\|_1$ vs $\|x\|_2$ on the line $a_1 x_1 + a_2 x_2 = b$: the $L_1$ minimizer lies on a coordinate axis (sparse), the $L_2$ minimizer is the point on the line closest to the origin]
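Here is a minimal numerical illustration (an assumed toy setup, not the lecture's example) of minimizing $\|x\|_2$ or $\|x\|_1$ subject to one equation $3x_1 + 4x_2 = 1$: the $L_2$ minimizer comes from the pseudoinverse and has both components nonzero, while the $L_1$ minimizer (found here by brute force along the constraint line) lands on a coordinate axis, i.e. it is sparse.

```python
import numpy as np

# One equation in two unknowns: 3*x1 + 4*x2 = 1 (a whole line of solutions)
A = np.array([[3.0, 4.0]])
b = np.array([1.0])

# L2: the minimum-norm solution is the pseudoinverse solution
# x = A^T (A A^T)^{-1} b, the point on the line closest to the origin.
x_l2 = np.linalg.pinv(A) @ b
print("L2 minimizer:", x_l2)          # [0.12 0.16], both components nonzero

# L1: brute-force search along the constraint line x2 = (1 - 3*x1) / 4.
x1 = np.linspace(-1.0, 1.0, 200001)
x2 = (1.0 - 3.0 * x1) / 4.0
k = np.argmin(np.abs(x1) + np.abs(x2))
print("L1 minimizer:", x1[k], x2[k])  # ~ (0, 0.25): one component is zero, sparse
```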

8.3 Matrix Norms

  • $\|A\|_2 = \sigma_1$

    • the largest singular value
    • How do we connect it to the 2-norm of vectors?
    • Matrix norm from vector norm = the maximum blow-up
    • $\|A\|_2 = \max_{x \neq 0} \dfrac{\|Ax\|_2}{\|x\|_2}$
  • How to prove that $\|A\|_2 = \max_{x \neq 0} \dfrac{\|Ax\|_2}{\|x\|_2} = \sigma_1$?

    • the maximum is attained when $x$ is the first right singular vector $v_1$ (an eigenvector of $A^T A$); for any other $x$, expanding it in the orthonormal basis $v_1, \dots, v_n$ shows the ratio cannot exceed $\sigma_1$
    • $\|Av_1\| / \|v_1\| = \|Av_1\| / 1 = \|Av_1\| = \|\sigma_1 u_1\| = \sigma_1$ (a numerical check of all three matrix norms in this section follows the list below)
  • $\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} = \sqrt{\sigma_1^2 + \dots + \sigma_r^2}$

    • Frobenius norm
    • In the SVD $A = U \Sigma V^T$ ⇒ the orthogonal matrices $U, V$ do not change any of these norms ⇒ $\|\Sigma\|_F = \|A\|_F$
  • $\|A\|_N$

    • Nuclear norm: $\sigma_1 + \sigma_2 + \dots + \sigma_r$

    • Application in deep learning:

      🚩 conjecture: in a typical deep learning problem there are many more weights than samples, so there are many possible minima, and many different weight settings give the same minimum loss ⇒ too many parameters (not necessarily a bad thing, and maybe even part of the success)

      🚩 in a model situation, gradient descent picks out the weights that minimize the nuclear norm ⇒ 🚩 a norm on a large set of weights

    • The nuclear norm also shows up in compressed sensing
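To tie the three matrix norms back to the singular values, here is a short numerical check (my own sketch, not part of the lecture): for a random $A$, the spectral norm $\|A\|_2$ equals $\sigma_1$ and is the largest blow-up of $\|Ax\|_2 / \|x\|_2$, attained at $x = v_1$; the Frobenius norm equals $\sqrt{\sigma_1^2 + \dots + \sigma_r^2}$ and is unchanged by the orthogonal factors $U, V$; and the nuclear norm is $\sigma_1 + \dots + \sigma_r$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
U, sigma, Vt = np.linalg.svd(A)                 # singular values, largest first

# Spectral norm ||A||_2 = sigma_1 = max over x of ||Ax|| / ||x||.
ratios = [np.linalg.norm(A @ x) / np.linalg.norm(x)
          for x in rng.standard_normal((1000, 3))]
assert max(ratios) <= sigma[0] + 1e-12          # random x never beats sigma_1
v1 = Vt[0]                                      # first right singular vector
assert np.isclose(np.linalg.norm(A @ v1), sigma[0])   # x = v_1 attains the max
assert np.isclose(np.linalg.norm(A, 2), sigma[0])

# Frobenius norm: sqrt(sum of all |a_ij|^2) = sqrt(sum of sigma_i^2),
# unchanged by the orthogonal factors, so ||Sigma||_F = ||A||_F.
fro = np.sqrt(np.sum(A**2))
assert np.isclose(fro, np.sqrt(np.sum(sigma**2)))
assert np.isclose(fro, np.linalg.norm(np.diag(sigma), 'fro'))

# Nuclear norm: the sum of the singular values.
assert np.isclose(np.sum(sigma), np.linalg.norm(A, 'nuc'))
print(sigma, fro, np.sum(sigma))
```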

This lecture: about the norms

Next lecture: least squares problems

Summary of this lecture

  • Vector norms: the $L_1$ norm, the $L_2$ norm, etc.
  • The geometry of vector norms, what a norm means, and how the different norms behave in optimization
  • Matrix norms