Mathematics for Machine Learning Study Notes

This post collects study notes for the Coursera course Mathematics for Machine Learning. It covers the fundamentals of linear algebra (vector operations, matrices, change of basis, orthonormal bases, and eigendecomposition), the definition of the derivative and Taylor series from multivariate calculus, and finally the principles and steps of principal component analysis (PCA), including how linear transformations affect data variance, how eigenvectors and eigenvalues are computed, and the key steps of the PCA algorithm.

Notes and summary written after studying the Mathematics for Machine Learning Specialization on Coursera.


Part 1: Linear Algebra

1. Vector operations

  • commutative: $r + s = s + r$
  • $2r = r + r$
  • $\|r\|^2 = \sum_{i} r_i^2$

1.1 Dot product (inner product)

The dot product is a special case of an inner product:
$$r \cdot s = \sum_{i} r_i s_i$$

  • commutative: $r \cdot s = s \cdot r$
  • distributive: $r \cdot (s + t) = r \cdot s + r \cdot t$
  • associative over scalar multiplication: $r \cdot (as) = a(r \cdot s)$
  • $r \cdot r = \|r\|^2$
  • $r \cdot s = \|r\| \|s\| \cos \theta$
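As a quick numerical sanity check of these identities, here is a short numpy sketch; the vectors `r` and `s` are arbitrary example values, not from the course:

```python
import numpy as np

r = np.array([1.0, 2.0, 3.0])
s = np.array([4.0, -1.0, 2.0])

# r . s = sum_i r_i s_i
print(np.dot(r, s))                                    # 4 - 2 + 6 = 8.0
# commutative: r . s == s . r
print(np.isclose(np.dot(r, s), np.dot(s, r)))          # True
# r . r = ||r||^2
print(np.isclose(np.dot(r, r), np.linalg.norm(r)**2))  # True
# r . s = ||r|| ||s|| cos(theta), so the angle can be recovered
cos_theta = np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s))
print(np.degrees(np.arccos(cos_theta)))                # angle between r and s in degrees
```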

1.2 Scalar and vector projection

  • scalar projection
    e.g. the projection of vector $s$ onto vector $r$: $\frac{r \cdot s}{\|r\|}$
  • vector projection
    e.g. the projection of vector $s$ onto vector $r$: $\frac{r \cdot s}{r \cdot r} r$
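Both projections are one-liners in numpy. A minimal sketch with made-up vectors chosen so the answer is obvious:

```python
import numpy as np

r = np.array([2.0, 0.0])   # direction we project onto
s = np.array([3.0, 4.0])

# scalar projection of s onto r: (r . s) / ||r||
scalar_proj = np.dot(r, s) / np.linalg.norm(r)
print(scalar_proj)          # 3.0

# vector projection of s onto r: ((r . s) / (r . r)) r
vector_proj = (np.dot(r, s) / np.dot(r, r)) * r
print(vector_proj)          # [3. 0.]
```

Since `r` points along the x-axis, the projection of `s` is just its x-component, which matches the printed values.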

2. Basis

A basis is a set of $n$ vectors that:

  • are not linear combinations of each other (they are linearly independent)
  • span the space

The space is then $n$-dimensional.

In a linear space $V$, if there exist $n$ elements $a_1, a_2, \dots, a_n$ such that:

  • $a_1, a_2, \dots, a_n$ are linearly independent;
  • every element $a$ of $V$ can be expressed as a linear combination of $a_1, a_2, \dots, a_n$,

then $a_1, a_2, \dots, a_n$ is called a basis of the linear space $V$, and $n$ is called the dimension of $V$. A linear space containing only the zero element has no basis, and its dimension is defined to be 0.
A linear space of dimension $n$ is called an $n$-dimensional linear space, written $V_n$.
(Tongji University, Linear Algebra, 5th ed., Chapter 6, Section 2)
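Numerically, one can check whether a candidate set forms a basis of $\mathbb{R}^n$ by stacking the vectors as columns and testing for full rank (equivalently, a nonzero determinant). A minimal sketch with an arbitrary example:

```python
import numpy as np

# Candidate basis vectors for R^3, stacked as columns (arbitrary example).
V = np.column_stack([[1.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 1.0]])

print(np.linalg.matrix_rank(V) == 3)  # True: independent and spanning, so a basis
print(np.linalg.det(V))               # 2.0, nonzero, same conclusion
```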

3. Matrices

A table of $m \times n$ numbers $a_{ij}$ ($i = 1, 2, \cdots, m$; $j = 1, 2, \dots, n$) arranged in $m$ rows and $n$ columns is called an $m$-by-$n$ matrix, or simply an $m \times n$ matrix, written

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

(Tongji University, Linear Algebra, 5th ed., Chapter 2, Section 1)

Multiplying a matrix by a scalar

Together with matrix addition, this is collectively called the linear operations on matrices.

  • commutative
    $\lambda A = A \lambda$
  • distributive
    $(\lambda + \mu)A = \lambda A + \mu A$
    $\lambda(A + B) = \lambda A + \lambda B$
  • associative
    $(\lambda \mu) A = \lambda (\mu A)$
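These laws amount to ordinary scalar arithmetic applied elementwise, which numpy broadcasting makes easy to confirm (the matrices and scalars below are arbitrary):

```python
import numpy as np

lam, mu = 2.0, 3.0
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

print(np.allclose((lam + mu) * A, lam * A + mu * A))  # distributive over scalars
print(np.allclose(lam * (A + B), lam * A + lam * B))  # distributive over matrices
print(np.allclose((lam * mu) * A, lam * (mu * A)))    # associative
```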

Multiplying a matrix by a vector

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} ae + bf \\ ce + df \end{bmatrix}$$

  • Matrix-vector multiplication can be read as: the matrix $A$ transforms the vector $r$ into $r'$, i.e. $Ar = r'$
  • $A(nr) = n(Ar) = nr'$ (where $n$ is a scalar)
  • distributive: $A(r + s) = Ar + As$
  • identity matrix
    $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
  • clockwise rotation by $\theta$
    $\begin{bmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{bmatrix}$
  • determinant of a 2×2 matrix
    $|A| = \det A = \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$
  • inverse of a 2×2 matrix
    $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$
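The rotation, determinant, and inverse formulas above are easy to sanity-check numerically; the matrix `A` below is an arbitrary example:

```python
import numpy as np

# Clockwise rotation by theta, using the convention from the notes above.
theta = np.pi / 2
R = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
print(R @ np.array([1.0, 0.0]))   # ~[0., -1.]: (1, 0) rotated 90 degrees clockwise

# 2x2 determinant and inverse, checked against the closed-form expressions.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
a, b, c, d = A.ravel()
print(np.isclose(np.linalg.det(A), a * d - b * c))      # True (det = -2)
A_inv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)
print(np.allclose(A_inv, np.linalg.inv(A)))             # True
print(np.allclose(A @ A_inv, np.eye(2)))                # A A^{-1} = I
```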

Multiplying a matrix by a matrix

  • multiplying matrices $A$ and $B$, with $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times l}$:
    $$AB = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1l} \\ b_{21} & b_{22} & \cdots & b_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nl} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1l} \\ c_{21} & c_{22} & \cdots & c_{2l} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \cdots & c_{ml} \end{bmatrix} = C$$
    $$c_{ik} = (AB)_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}$$
    • In the Einstein summation convention, $AB = C$ is written $c_{ik} = a_{ij} b_{jk}$, with summation over the repeated index $j$ implied.
  • $AA^{-1} = A^{-1}A = I$ (valid when $A$ is a square, invertible matrix)
  • not commutative in general
  • distributive
    $A(B + C) = AB + AC$
    $(B + C)A = BA + CA$
  • associative
    $(AB)C = A(BC)$
    $\lambda(AB) = (\lambda A)B = A(\lambda B)$ (where $\lambda$ is a scalar)
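Since the notes mention the Einstein summation convention, `np.einsum` is a natural way to make it concrete: the index string `'ij,jk->ik'` spells out exactly $c_{ik} = a_{ij} b_{jk}$. A small sketch with arbitrary matrices, including a check that multiplication is not commutative:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],     # 2 x 3
              [4.0, 5.0, 6.0]])
B = np.array([[1.0, 0.0],          # 3 x 2
              [0.0, 1.0],
              [1.0, 1.0]])

C = A @ B                                            # c_ik = sum_j a_ij b_jk
print(np.allclose(C, np.einsum('ij,jk->ik', A, B)))  # True

# Matrix multiplication is not commutative in general:
X = np.array([[1.0, 1.0], [0.0, 1.0]])
Y = np.array([[1.0, 0.0], [1.0, 1.0]])
print(np.allclose(X @ Y, Y @ X))                     # False
```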

4. Change of basis

Change from an original basis to a new, primed basis. The columns of the transformation matrix $P$ are the new basis vectors written in the original coordinate system.
Let $\alpha = (a_1, a_2, \dots, a_n)$ denote the old basis and $\beta = (b_1, b_2, \dots, b_n)$ the new basis. Then
$$\beta = \alpha P$$
or, equivalently,
$$\beta^T = P^T \alpha^T$$
If $r'$ is the coordinate vector in the new basis and $r$ the coordinate vector in the original basis, then
$$\alpha r = \beta r' = \alpha P r'$$
so
$$r' = P^{-1} r$$
(See Tongji University, Linear Algebra, 5th ed., Chapter 6, Section 3.)
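A minimal numerical sketch of $r' = P^{-1} r$ with a made-up basis matrix $P$; it uses `np.linalg.solve` rather than forming $P^{-1}$ explicitly, which is the usual numerically safer choice:

```python
import numpy as np

# Columns of P are the new basis vectors, written in the original coordinates.
P = np.column_stack([[3.0, 1.0],
                     [1.0, 1.0]])

r = np.array([5.0, 2.0])            # a vector in the original basis

r_prime = np.linalg.solve(P, r)     # solves P r' = r, i.e. r' = P^{-1} r
print(r_prime)                      # [1.5 0.5]
print(np.allclose(P @ r_prime, r))  # mapping back with P recovers r
```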

Orthonormal basis

If a matrix $A$ is orthonormal (all of its columns are of unit size and orthogonal to each other), then
$$A^T = A^{-1}$$
A matrix $A$ is an orthogonal matrix if and only if its column vectors are all unit vectors and pairwise orthogonal. Equivalently,
$$A^T A = E$$
that is,
$$\begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{bmatrix} (a_1, a_2, \dots, a_n) = E$$
i.e.
$$(a_i^T a_j) = (\delta_{ij})$$
which amounts to $n^2$ relations:
$$a_i^T a_j = \delta_{ij} = \begin{cases} 1 & \text{when } i = j \\ 0 & \text{when } i \neq j \end{cases}$$
Since $A^T = A^{-1}$, the same conclusions hold for the row vectors of $A$.
(Further reading: Tongji University, Linear Algebra, 5th ed., Chapter 6, Section 3)
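A rotation matrix is a standard concrete example of an orthogonal matrix, so it can serve to verify $A^T = A^{-1}$ and $A^T A = E$ numerically (the angle below is arbitrary):

```python
import numpy as np

theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A.T, np.linalg.inv(A)))  # A^T == A^{-1}
print(np.allclose(A.T @ A, np.eye(2)))     # a_i^T a_j = delta_ij for all i, j
```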

5. Gram-Schmidt process for constructing an orthonormal basis

Start with $n$ linearly independent basis vectors $v = \{v_1, v_2, \dots, v_n\}$. Then
$$e_1 = \frac{v_1}{\|v_1\|}$$
$$u_2 = v_2 - \frac{v_2 \cdot e_1}{e_1 \cdot e_1} e_1 = v_2 - (v_2 \cdot e_1) e_1, \qquad e_2 = \frac{u_2}{\|u_2\|}$$
… and so on: $u_3$ is the remnant of $v_3$ not composed of the preceding vectors $e_1, e_2$, normalised to give $e_3$, and likewise for the rest.
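The process translates almost line-for-line into code. A minimal sketch, where the helper `gram_schmidt` and the example vectors are my own illustration rather than course material:

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-10):
    """Orthonormalise linearly independent vectors, one at a time."""
    basis = []
    for v in vectors:
        # u = v minus its components along the already-built e_i directions
        u = v - sum(np.dot(v, e) * e for e in basis)
        norm = np.linalg.norm(u)
        if norm < tol:
            raise ValueError("input vectors are not linearly independent")
        basis.append(u / norm)   # e_i = u / ||u||
    return np.array(basis)

# Arbitrary example vectors in R^3:
V = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
E = gram_schmidt(V)
print(np.allclose(E @ E.T, np.eye(3)))  # rows of E are orthonormal
```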
