Notes and summaries compiled while studying the Mathematics for Machine Learning Specialization on Coursera.
Table of Contents
- Part 1 Linear Algebra
- Part 2 Multivariate Calculus
- Part 3 PCA (Principal Component Analysis)
  - 1. 1-D datasets
  - 2. Definite symmetric matrix
  - 3. Higher-dimensional datasets
  - 4. Effect of linear transformations on mean and variance
  - 5. Dot product
  - 6. Inner product
  - 7. Projection
  - 8. PCA derivation
    - 8.1 Setting up ($x_n=\sum_{i=1}^D\beta_{in}b_i$, $\tilde{x}_n=\sum_{i=1}^M\beta_{in}b_i$, $J=\frac{1}{N}\sum_{n=1}^N\|x_n-\tilde{x}_n\|^2$, $S=\frac{1}{N}\sum_{n=1}^N x_nx_n^T$)
    - 8.2 Getting the coordinate/code $\beta_{in}$ ($\beta_{in}=x_n^Tb_i$)
    - 8.3 Rewriting the formula ($x_n-\tilde{x}_n=\sum_{i=M+1}^D (b_i^T x_n) b_i$)
    - 8.4 Redefining $J$ ($J = B'B'^TS$, $B' = (b'_{M+1},\cdots,b'_D)$, $b'_i \in \mathbb{R}^{D \times 1}$)
    - 8.5 Solving for $b_i$
  - 9. Key steps of the PCA algorithm
  - 10. PCA in high dimensions
Part 1 Linear Algebra
1. Vector operations
- Commutative: $r + s = s + r$
- $2r = r + r$
- $\|r\|^2 = \sum_{i} r_i^2$
1.1 Dot or inner product
The dot product is a particular case of an inner product.
$$r \cdot s = \sum_{i} r_i s_i$$
- Commutative: $r \cdot s = s \cdot r$
- Distributive: $r \cdot (s + t) = r \cdot s + r \cdot t$
- Associative over scalar multiplication: $r \cdot (a s) = a(r \cdot s)$
- $r \cdot r = \|r\|^2$
- $r \cdot s = \|r\| \|s\| \cos \theta$
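These identities can be checked numerically; a minimal NumPy sketch, with made-up example vectors:

```python
import numpy as np

# Hypothetical example vectors, chosen only for illustration.
r = np.array([3.0, 4.0])
s = np.array([1.0, 2.0])

dot = np.dot(r, s)                 # r . s = sum_i r_i s_i
norm_r = np.linalg.norm(r)         # ||r||

# r . r = ||r||^2
assert np.isclose(np.dot(r, r), norm_r ** 2)

# r . s = ||r|| ||s|| cos(theta)
cos_theta = dot / (norm_r * np.linalg.norm(s))
print(dot)  # 3*1 + 4*2 = 11.0
```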
1.2 Scalar and vector projection
- Scalar projection: e.g. the projection of vector $s$ onto vector $r$ is $\frac{r \cdot s}{\|r\|}$
- Vector projection: e.g. the projection of vector $s$ onto vector $r$ is $\frac{r \cdot s}{r \cdot r} r$
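Both projections translate directly into NumPy; a sketch with made-up vectors:

```python
import numpy as np

# Hypothetical vectors: project s onto r.
r = np.array([2.0, 0.0])
s = np.array([3.0, 4.0])

scalar_proj = np.dot(r, s) / np.linalg.norm(r)   # (r . s) / ||r||
vector_proj = np.dot(r, s) / np.dot(r, r) * r    # ((r . s) / (r . r)) r

print(scalar_proj)   # 3.0
print(vector_proj)   # [3. 0.]
```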
2. Basis
A basis is a set of $n$ vectors that:
- are not linear combinations of each other
- span the space
The space is then $n$-dimensional.
In a linear space $V$, if there exist $n$ elements $a_1,a_2,\dots,a_n$ such that:
- $a_1,a_2,\dots,a_n$ are linearly independent;
- every element $a$ of $V$ can be expressed as a linear combination of $a_1,a_2,\dots,a_n$,
then $a_1,a_2,\dots,a_n$ is called a basis of the linear space $V$, and $n$ is called its dimension. A linear space containing only the zero element has no basis; its dimension is defined to be 0.
A linear space of dimension $n$ is called an $n$-dimensional linear space, denoted $V_n$.
(Tongji University, Linear Algebra, 5th ed., Chapter 6, Section 2)
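Whether a set of vectors forms a basis can be checked via the rank of the matrix whose columns are those vectors; a small sketch with made-up vectors:

```python
import numpy as np

# Hypothetical candidate basis vectors for R^3, stacked as columns.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 1.0, 0.0])
v3 = np.array([1.0, 1.0, 1.0])
V = np.column_stack([v1, v2, v3])

# n vectors form a basis of R^n iff the matrix has full rank n:
# they are then linearly independent and span the space.
rank = np.linalg.matrix_rank(V)
print(rank)  # 3, so {v1, v2, v3} is a basis of R^3
```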
3. Matrices
An array of $m \times n$ numbers $a_{ij}$ ($i=1,2,\cdots,m$; $j=1,2,\dots,n$) arranged in $m$ rows and $n$ columns is called an $m$-by-$n$ matrix, or $m \times n$ matrix for short, written
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
(Tongji University, Linear Algebra, 5th ed., Chapter 2, Section 1)
Multiplying a matrix by a scalar
Together with matrix addition, this is known as the linear operations on matrices.
- Commutative: $\lambda A = A \lambda$
- Distributive: $(\lambda + \mu)A = \lambda A + \mu A$ and $\lambda(A + B) = \lambda A + \lambda B$
- Associative: $(\lambda \mu) A = \lambda (\mu A)$
Matrix–vector multiplication
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} ae + bf \\ ce + df \end{bmatrix}$$
- Multiplying a vector by a matrix can be understood as the matrix $A$ transforming the vector $r$ into $r'$: $Ar = r'$
- $A(nr) = n(Ar) = nr'$ (where $n$ is a scalar)
- Distributive: $A(r + s) = Ar + As$
- Identity matrix: $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
- Clockwise rotation by $\theta$: $\begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix}$
- Determinant of a 2×2 matrix: $|A| = \det A = \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$
- Inverse of a 2×2 matrix: $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$
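The 2×2 formulas above (rotation, determinant, inverse) can be verified numerically; a sketch with made-up values:

```python
import numpy as np

theta = np.pi / 2  # hypothetical angle: clockwise rotation by 90 degrees

# Clockwise rotation matrix from the text.
R = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# The x-axis unit vector rotates clockwise onto (0, -1).
assert np.allclose(R @ np.array([1.0, 0.0]), [0.0, -1.0])

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # ad - bc
A_inv = np.array([[ A[1, 1], -A[0, 1]],
                  [-A[1, 0],  A[0, 0]]]) / det

# The closed-form 2x2 inverse matches NumPy's.
assert np.allclose(A_inv, np.linalg.inv(A))
```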
Matrix–matrix multiplication
- Multiplying matrices $A$ and $B$, with $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times l}$:
$$AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1l} \\ b_{21} & b_{22} & \cdots & b_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nl} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1l} \\ c_{21} & c_{22} & \cdots & c_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \dots & c_{ml} \end{bmatrix} = C$$
$$c_{ik} = (AB)_{ik} = \sum_{j = 1}^n a_{ij}b_{jk}$$
- Einstein summation convention for multiplying matrices $A$ and $B$: writing $c_{ik} = a_{ij}b_{jk}$, summation over the repeated index $j$ is implied, giving $AB = C$.
- $AA^{-1} = A^{-1}A = I$ (where $A$ is an invertible square matrix)
- Matrix multiplication is not commutative
- Distributive: $A(B+C) = AB + AC$ and $(B+C)A = BA + CA$
- Associative: $(AB)C = A(BC)$ and $\lambda(AB) = (\lambda A)B = A(\lambda B)$ (where $\lambda$ is a scalar)
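The entrywise formula $c_{ik} = \sum_j a_{ij}b_{jk}$ and the Einstein-convention form map directly onto `np.einsum`; a sketch with made-up matrices:

```python
import numpy as np

# Hypothetical A in R^{2x3} and B in R^{3x2}.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

C = A @ B                                 # c_ik = sum_j a_ij b_jk
C_einsum = np.einsum('ij,jk->ik', A, B)   # Einstein convention: repeated j is summed

assert np.allclose(C, C_einsum)
# Non-commutativity shows up even in the shapes: B @ A is 3x3, A @ B is 2x2.
print(C)  # [[ 4.  5.]
          #  [10. 11.]]
```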
4. Change of basis
Change from an original basis to a new, primed basis. The columns of the transformation matrix $P$ are the new basis vectors expressed in the original coordinate system.
Let $\alpha = (a_1,a_2,\dots,a_n)$ denote the old basis and $\beta = (b_1,b_2,\dots,b_n)$ the new basis. Then
$$\beta = \alpha P$$
or equivalently,
$$\beta^T = P^T \alpha^T$$
Let $r'$ denote the coordinates of a vector in the new basis and $r$ its coordinates in the original basis. Since
$$\alpha r = \beta r' = \alpha P r'$$
it follows that
$$r' = P^{-1}r$$
(cf. Tongji University, Linear Algebra, 5th ed., Chapter 6, Section 3)
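In practice $r' = P^{-1}r$ is computed by solving $Pr' = r$ rather than forming $P^{-1}$ explicitly; a sketch with a made-up basis:

```python
import numpy as np

# Hypothetical new basis vectors, written as the columns of P
# in the original coordinate system.
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])

r = np.array([3.0, 2.0])        # coordinates in the original basis
r_new = np.linalg.solve(P, r)   # r' = P^{-1} r

# Reconstructing in the original basis recovers r: P r' = r.
assert np.allclose(P @ r_new, r)
print(r_new)  # [1. 2.]
```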
Orthonormal basis
If a matrix $A$ is orthonormal (all of its columns are of unit size and orthogonal to each other), then
$$A^T = A^{-1}$$
A matrix $A$ is orthogonal if and only if its column vectors are all unit vectors and pairwise orthogonal. Equivalently,
$$A^TA = E$$
that is,
$$\begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{bmatrix} (a_1,a_2,\dots,a_n) = E$$
i.e.
$$(a_i^Ta_j) = (\delta_{ij})$$
which amounts to $n^2$ relations:
$$a_i^Ta_j = \delta_{ij} = \begin{cases} 1 & \quad \text{when } i = j \\ 0 & \quad \text{when } i \neq j \end{cases}$$
Because $A^T = A^{-1}$, the same conclusions also hold for the row vectors of $A$.
(Further reading: Tongji University, Linear Algebra, 5th ed., Chapter 6, Section 3)
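The property $A^TA = E$ is easy to confirm numerically for a rotation matrix, which is orthogonal; a sketch with a made-up angle:

```python
import numpy as np

theta = 0.3  # hypothetical rotation angle
A = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# Columns are unit vectors and pairwise orthogonal, so A^T A = I ...
assert np.allclose(A.T @ A, np.eye(2))
# ... and therefore A^T = A^{-1}.
assert np.allclose(A.T, np.linalg.inv(A))
```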
5. Gram-Schmidt process for constructing an orthonormal basis
Start with $n$ linearly independent basis vectors $v = \{ v_1,v_2,\dots,v_n \}$. Then
$$e_1 = \frac{v_1}{\|v_1\|}$$
$$u_2 = v_2 - \frac{v_2 \cdot e_1}{e_1 \cdot e_1}e_1 = v_2 - (v_2 \cdot e_1)e_1, \quad \text{so} \quad e_2 = \frac{u_2}{\|u_2\|}$$
… and so on, with $u_3$ being the remnant part of $v_3$ not composed of the preceding unit vectors $e_1, e_2$, normalized to give $e_3$, and so forth.
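The full procedure generalizes the steps above: subtract from each $v_i$ its components along the already-built $e_1,\dots,e_{i-1}$, then normalize. A minimal sketch (input vectors are made up):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        u = v.astype(float)
        for e in basis:
            u = u - np.dot(v, e) * e   # remove the component of v along e
        basis.append(u / np.linalg.norm(u))
    return np.array(basis)

# Hypothetical linearly independent input vectors.
vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
E = gram_schmidt(vs)

# The rows of E form an orthonormal basis: E E^T = I.
assert np.allclose(E @ E.T, np.eye(3))
```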