24. Mathematical Foundations of Machine Learning: Linear Algebra

1. Vectors, Matrices, and Tensors

Scalar: $x$
Vector: $\vec{x}=(x_1,x_2,...,x_n)\in R^n$; $\vec{x}_{-i}$ denotes the vector with the $i$-th element removed.
Matrix: $A_{ij}$, $\left(\begin{array}{ccc} x_{11}&...&x_{1n}\\ ...&&...\\ x_{n1}&...&x_{nn} \end{array} \right)$
Tensor: $A_{ijkl}$

2. Vector and Matrix Operations

Norms, distance, length:
$L_2$: $||\vec x||_2=\sqrt{x_1^2+x_2^2+...+x_n^2}$

$L_p$: $||\vec x||_p=\left(\sum_{i=1}^n{|x_i|^p}\right)^{\frac{1}{p}}$

$L_1$: $||\vec x||_1=\sum_{i=1}^n{|x_i|}$

$L_\infty$: $||\vec x||_\infty=\lim\limits_{p \to \infty}||\vec x||_p=\max\limits_{1\le i\le n}|x_i|$

$L_0$: the number of nonzero elements, $||\vec x||_0=k=\sum_{i=1}^n|x_i|^0$, using the convention $|x_i|^0=\begin{cases}0&x_i=0 \\1&x_i\neq0 \end{cases}$
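The norm definitions above can be checked numerically. The following is a minimal sketch using NumPy; the helper name `vector_norms` and the default `p=3` are illustrative choices, not part of the original text.

```python
import numpy as np

def vector_norms(x, p=3):
    """Return the L0, L1, L2, Lp and L-infinity norms of a 1-D vector."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    return {
        "L0": int(np.count_nonzero(x)),                 # number of nonzero entries
        "L1": float(abs_x.sum()),                       # sum of absolute values
        "L2": float(np.sqrt((abs_x ** 2).sum())),       # Euclidean length
        "Lp": float((abs_x ** p).sum() ** (1.0 / p)),   # general p-norm
        "Linf": float(abs_x.max()),                     # largest absolute value
    }

norms = vector_norms([3.0, 0.0, -4.0])
print(norms)
```

For the vector (3, 0, -4) this gives an L0 value of 2, an L1 norm of 7, an L2 norm of 5, and an L-infinity norm of 4.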

Dot product: $<\vec x,\vec y>=\vec x\cdot \vec y=\vec x^T\vec y=\sum\limits_{i=1}^nx_iy_i\in R$

$$\cos(\vec{x},\vec{y})=\frac{\vec{x}\cdot \vec{y}}{||\vec x||_2||\vec y||_2}$$

When $\cos(\vec{x},\vec{y})$ equals 1, the two vectors point in the same direction: $\vec{x}=k\vec{y}$ with $k>0$, that is, $x_i=ky_i$.
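As a small illustration of the dot product and the cosine formula, here is a NumPy sketch. The function name `cosine_similarity` and the example vectors are assumptions made for this example.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between x and y: (x . y) / (||x||_2 ||y||_2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

x = np.array([1.0, 2.0, 3.0])
y = 2.5 * x                       # positive multiple of x
z = np.array([3.0, 0.0, -1.0])    # chosen so that x . z = 3 + 0 - 3 = 0

dot_xy = float(x @ y)             # the dot product is a scalar
cos_xy = cosine_similarity(x, y)  # same direction
cos_xz = cosine_similarity(x, z)  # perpendicular vectors
print(dot_xy, cos_xy, cos_xz)
```

The cosine is 1 (up to rounding) for the positive multiple and 0 for the perpendicular pair.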

Matrix multiplication $AB=C$: $C_{ij}=A_{i\cdot}B_{\cdot j}=\sum_k A_{ik}B_{kj}$, the dot product of the $i$-th row of $A$ with the $j$-th column of $B$.

Hadamard product: multiply the elements at corresponding positions, $A \odot B=C$; $A$, $B$, and $C$ have the same shape.

Kronecker product: each element of $A$ is multiplied by the whole matrix $B$, $A\otimes B=\left(\begin{array}{cc}a_{11}&a_{12}\\a_{21}&a_{22}\end{array}\right)\otimes\left(\begin{array}{cc}b_{11}&b_{12}\\b_{21}&b_{22}\end{array}\right)=\left(\begin{array}{cc}a_{11}B&a_{12}B\\a_{21}B&a_{22}B\end{array}\right)=\left(\begin{array}{cccc}a_{11}b_{11}&a_{11}b_{12}&a_{12}b_{11}&a_{12}b_{12}\\a_{11}b_{21}&a_{11}b_{22}&a_{12}b_{21}&a_{12}b_{22}\\a_{21}b_{11}&a_{21}b_{12}&a_{22}b_{11}&a_{22}b_{12}\\a_{21}b_{21}&a_{21}b_{22}&a_{22}b_{21}&a_{22}b_{22}\end{array}\right)$
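The three products can be compared side by side in NumPy. This is only a sketch with made-up 2x2 matrices; note that `np.kron` uses the block convention in which each element of the first matrix multiplies a full copy of the second.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

C_matmul = A @ B         # rows of A dotted with columns of B, shape (2, 2)
C_hadamard = A * B       # element-wise product, same shape as A and B
C_kron = np.kron(A, B)   # block matrix [[a11*B, a12*B], [a21*B, a22*B]], shape (4, 4)

print(C_matmul)
print(C_hadamard)
print(C_kron)
```

The Kronecker product of an m x n matrix with a p x q matrix has shape (mp) x (nq), which is why the result here is 4 x 4.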

3. Tensor Operations

A vector is a 1st-order tensor.
A matrix is a 2nd-order tensor.
In general, a tensor with $k$ indices is a $k$-th-order tensor. For example, the tensor (outer) product of three vectors is a 3rd-order tensor:
$\vec{a}\in R^k,\vec{b}\in R^n,\vec{c}\in R^m,\ \vec a\otimes \vec b\otimes \vec c\in R^{k\times n\times m}$

The tensor product of a $k$-th-order tensor and an $m$-th-order tensor is a tensor of order $k+m$:

$$A_{ij}\cdot b_k=C_{i,j,k}$$

$$A_{i_1,i_2,...,i_k}\cdot b_{i_{k+1},i_{k+2},...,i_{k+m}}=C_{i_1,...,i_{k+m}}$$
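The way tensor orders add up under the tensor (outer) product can be seen from array shapes. The sketch below uses `np.einsum` and `np.tensordot` with `axes=0`; the example sizes are arbitrary assumptions.

```python
import numpy as np

a = np.array([1.0, 2.0])             # 1st-order tensor, shape (2,)
b = np.array([1.0, 0.0, -1.0])       # 1st-order tensor, shape (3,)
c = np.array([2.0, 3.0, 4.0, 5.0])   # 1st-order tensor, shape (4,)

# Outer product of three vectors: T[i, j, k] = a[i] * b[j] * c[k]
T = np.einsum("i,j,k->ijk", a, b, c)

# Matrix (order 2) times vector (order 1) gives order 3: C[i, j, k] = A[i, j] * c[k]
A = np.arange(6.0).reshape(2, 3)
C = np.tensordot(A, c, axes=0)

print(T.shape, C.shape)
```

Both results have three indices, matching the rule that orders add under the tensor product.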

4. Matrix Inverse and Pseudoinverse

$A_{n\times n}b_{n\times 1}=c_{n\times 1}$ is a mapping from $R^n$ to $R^n$.

If a matrix $K$ satisfies $KA=AK=I$, then $K=A^{-1}$ is the inverse of $A$.
Moore-Penrose pseudoinverse: written $A^{+}$ here to distinguish it from the ordinary inverse, it is the matrix satisfying the four conditions $\begin{cases}A^{+}AA^{+}=A^{+}\\AA^{+}A=A\\(AA^{+})^T=AA^{+}\\(A^{+}A)^T=A^{+}A\end{cases}$
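The four Penrose conditions can be verified numerically for a non-square matrix, which has no ordinary inverse. This is a sketch using `np.linalg.pinv`; the matrix is an arbitrary example and the dictionary keys are just labels.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])       # 3 x 2, so an ordinary inverse does not exist
A_pinv = np.linalg.pinv(A)       # pseudoinverse, shape (2, 3)

conditions = {
    "P A P = P": np.allclose(A_pinv @ A @ A_pinv, A_pinv),
    "A P A = A": np.allclose(A @ A_pinv @ A, A),
    "(A P)^T = A P": np.allclose((A @ A_pinv).T, A @ A_pinv),
    "(P A)^T = P A": np.allclose((A_pinv @ A).T, A_pinv @ A),
}
for name, holds in conditions.items():
    print(name, holds)
```

Here `P` stands for the pseudoinverse of `A`. For an invertible square matrix the pseudoinverse coincides with the ordinary inverse.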

5. Determinant

The determinant is a mapping from matrices to real numbers. A matrix can be viewed as a collection of column vectors, $A=(a_1,...,a_n)$ with each $a_i$ a vector, so the determinant is a function $f(a_1,...,a_n)$ with values in $R$.

$A_{n\times n}b_{n\times 1}=c_{n\times 1}$ is a mapping from $R^n$ to $R^n$, and it is a linear transformation. For example, $\left(\begin{array}{ccc}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{array}\right)\left(\begin{array}{c}1\\0\\0\end{array}\right)=\left(\begin{array}{c}a_{11}\\a_{21}\\a_{31}\end{array}\right)$: the matrix maps $e_1=(1,0,0)^T$ to its first column $(a_{11},a_{21},a_{31})^T$, maps $e_2=(0,1,0)^T$ to its second column $(a_{12},a_{22},a_{32})^T$, and maps $e_3=(0,0,1)^T$ to its third column $(a_{13},a_{23},a_{33})^T$.

The determinant $f$ is characterized by three properties:
Property 1 (normalization): $f(e_1,e_2,...,e_n)=1$.
Property 2 (linearity in each argument): if $\alpha_1=k_1\beta_{11}+k_2\beta_{12}$, then $f(\alpha_1,\alpha_2,...,\alpha_n)=k_1f(\beta_{11},\alpha_2,...,\alpha_n)+k_2f(\beta_{12},\alpha_2,...,\alpha_n)$.
Property 3 (antisymmetry): $f$ vanishes whenever two arguments are equal, for example $f(\alpha_1,\alpha_2,\alpha_2,...,\alpha_n)=0$.
These three properties determine $f$ uniquely. The value of $f$, that is, the determinant, is the (signed) volume of the $n$-dimensional parallelepiped spanned by $a_1,...,a_n$.
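The three defining properties can be spot-checked with `np.linalg.det`. In this sketch the wrapper `f`, the random vectors, and the coefficients `k1`, `k2` are all illustrative assumptions.

```python
import numpy as np

def f(*columns):
    """Determinant of the matrix whose columns are the given vectors."""
    return float(np.linalg.det(np.column_stack(columns)))

rng = np.random.default_rng(0)
n = 3
a1, a2, a3 = rng.normal(size=(n, n))      # three arbitrary vectors in R^3
beta1, beta2 = rng.normal(size=(2, n))    # two more vectors for the linearity check
k1, k2 = 2.0, -0.5

# Property 1: the standard basis vectors give determinant 1
prop1 = f(*np.eye(n))

# Property 2: linearity in the first argument
lhs = f(k1 * beta1 + k2 * beta2, a2, a3)
rhs = k1 * f(beta1, a2, a3) + k2 * f(beta2, a2, a3)

# Property 3: a repeated column gives determinant 0
prop3 = f(a1, a2, a2)

print(prop1, lhs - rhs, prop3)
```

Up to floating-point rounding, `prop1` is 1 and both `lhs - rhs` and `prop3` are 0.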

The linear system

$$\left(\begin{array}{ccc}a_{11}&...&a_{1n}\\...&&...\\a_{n1}&...&a_{nn}\end{array}\right)\left(\begin{array}{c}x_1\\...\\x_n\end{array}\right)=\left(\begin{array}{c}c_1\\...\\c_n\end{array}\right)$$

is equivalent to

$$\begin{aligned}c_1&=a_{11}x_1+a_{12}x_2+...+a_{1n}x_n\\c_2&=a_{21}x_1+a_{22}x_2+...+a_{2n}x_n\\&...\\c_n&=a_{n1}x_1+a_{n2}x_2+...+a_{nn}x_n\end{aligned}$$

Let $a_i$ denote the $i$-th column of $A$ (for example, $a_2=(a_{12},a_{22},...,a_{n2})^T$), so that $c=\sum_{j=1}^nx_ja_j$, and let $f$ be the determinant. Because $f$ is zero whenever a column is repeated, a term such as $x_1f(a_1,...,a_{i-1},a_1,a_{i+1},...,a_n)=0$ (with $a_1$ placed in the $i$-th slot, $i\neq 1$) can be added without changing the value. Doing this for every $j\neq i$ and using linearity gives:

$$\begin{aligned}x_if(a_1,a_2,...,a_n)&=f(a_1,...,a_{i-1},x_ia_i,a_{i+1},...,a_n)\\&=x_1f(a_1,...,a_{i-1},a_1,a_{i+1},...,a_n)+f(a_1,...,a_{i-1},x_ia_i,a_{i+1},...,a_n)\\&=f(a_1,...,a_{i-1},x_ia_i+x_1a_1,a_{i+1},...,a_n)\\&=f(a_1,...,a_{i-1},\sum_{j=1}^nx_ja_j,a_{i+1},...,a_n)\\&=f(a_1,...,a_{i-1},c,a_{i+1},...,a_n)\end{aligned}$$

Hence, when $f(a_1,...,a_n)\neq 0$ (Cramer's rule):

$$x_i=\frac{f(a_1,...,a_{i-1},c,a_{i+1},...,a_n)}{f(a_1,...,a_i,...,a_n)}$$

Matrix inversion
Since $AA^{-1}=I$, write $A(x_1,...,x_n)=(e_1,...,e_n)$, where $x_i$ is the $i$-th column of $A^{-1}$; the method above gives the $x_i$ corresponding to each $e_i$.
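The determinant-ratio formula derived above (Cramer's rule) and its use for inversion can be sketched as follows. The function name `cramer_solve` and the example system are assumptions for illustration; in practice `np.linalg.solve` is far cheaper, and this version is only meant to mirror the derivation.

```python
import numpy as np

def cramer_solve(A, c):
    """Solve A x = c by replacing the i-th column of A with c and taking determinant ratios."""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    det_A = np.linalg.det(A)
    if np.isclose(det_A, 0.0):
        raise ValueError("matrix is singular (determinant is zero)")
    x = np.empty(len(c))
    for i in range(len(c)):
        A_i = A.copy()
        A_i[:, i] = c                      # i-th column replaced by the right-hand side
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
c = np.array([3.0, 5.0, 5.0])
x = cramer_solve(A, c)

# Inverse: solve A x_i = e_i for each standard basis vector and stack the solutions as columns
A_inv = np.column_stack([cramer_solve(A, e) for e in np.eye(3)])
print(x)
print(A @ A_inv)
```

The product `A @ A_inv` should be the identity matrix up to rounding.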

6. Quadratic Forms and Positive Definiteness

Quadratic form: a polynomial containing only second-degree terms in the $x_i$, $f(x_1,...,x_n)=a_1x_1^2+...+a_nx_n^2+b_{12}x_1x_2+...+b_{(n-1)n}x_{n-1}x_n$
How can we tell whether a quadratic form is always positive or always negative?
Assume the matrix is symmetric:
$$x^TAx=(x_1,x_2)\left(\begin{array}{cc}a_{11}&a_{12}\\a_{21}&a_{22}\end{array}\right)\left(\begin{array}{c}x_1\\x_2\end{array}\right)=a_{11}x_1^2+a_{12}x_1x_2+a_{21}x_1x_2+a_{22}x_2^2=a_{11}x_1^2+2a_{12}x_1x_2+a_{22}x_2^2$$

If $x^TAx=x^TV^TVx=(Vx)^TVx=y^Ty=\sum y_i^2\geq0$ with $y=Vx$, then $A$ is positive semidefinite.

Completing the square, $x_1^2+x_1x_2+x_2^2=(x_1+0.5x_2)^2+0.75x_2^2=0.75x_1^2+(0.5x_1+x_2)^2$, which can be viewed as a change of variables $y=Vx$ that turns the quadratic form into a diagonal one:

$$x^TAx=y^TBy$$

$$A=\left(\begin{array}{cc}1&0.5\\0.5&1\end{array}\right)=\left(\begin{array}{cc}1&0\\0.5&1\end{array}\right)\left(\begin{array}{cc}1&0\\0&0.75\end{array}\right)\left(\begin{array}{cc}1&0.5\\0&1\end{array}\right)=\left(\begin{array}{cc}1&0.5\\0&1\end{array}\right)\left(\begin{array}{cc}0.75&0\\0&1\end{array}\right)\left(\begin{array}{cc}1&0\\0.5&1\end{array}\right)=V^TBV$$

Law of inertia: $A$ and $B$ are congruent. Whenever $B$ is diagonal (nonzero only on the diagonal), the number of positive entries on its diagonal (the positive index of inertia) is fixed, and so is the number of negative entries (the negative index of inertia), no matter which $V$ is chosen.
If the Hessian is positive definite at a critical point (where the gradient is zero), that point is a local minimum.
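The congruence $A=V^TBV$ from the worked example can be verified numerically, together with the positivity of the quadratic form. This is a sketch: the random sampling is only a sanity check, and the eigenvalue test at the end is a standard alternative criterion for symmetric matrices that the text above does not itself use.

```python
import numpy as np

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
V = np.array([[1.0, 0.5],
              [0.0, 1.0]])      # y = V x = (x1 + 0.5 * x2, x2)
B = np.diag([1.0, 0.75])        # coefficients of y1^2 and y2^2

# Congruence from the completed square: A = V^T B V
congruent = np.allclose(V.T @ B @ V, A)

# Sanity check: x^T A x is positive for many random nonzero x
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 2))
q = np.einsum("ni,ij,nj->n", xs, A, xs)   # one quadratic-form value per row of xs
all_positive = bool((q > 0).all())

# Alternative criterion: a symmetric matrix is positive definite
# exactly when all of its eigenvalues are positive
eigenvalues = np.linalg.eigvalsh(A)
is_positive_definite = bool((eigenvalues > 0).all())

print(congruent, all_positive, eigenvalues)
```

Both diagonal entries of `B` are positive, so the positive index of inertia is 2, which agrees with both eigenvalues of `A` being positive.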

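7. Decompositions

If $A=V^TV$, then $A$ is positive semidefinite (and positive definite when $V$ has full column rank); conversely, if $A$ is positive definite, a $V$ satisfying this equation can always be found.
Cholesky decomposition: if $A$ is symmetric positive definite, there exists a lower triangular matrix $L$ such that $A=LL^T$.

LU decomposition: $A$ is an $m\times n$ matrix, $L$ is an $m\times k$ lower triangular matrix, and $U$ is a $k\times n$ upper triangular matrix, with $A=LU$. When $A$ is square and invertible, $A^{-1}=(LU)^{-1}=U^{-1}L^{-1}$; reducing the problem to inverting a lower triangular and an upper triangular matrix is convenient.
Writing the rows of $V$ as $a_1^T,...,a_n^T$ expresses $A=V^TV$ as a sum of rank-one terms: $A=a_1a_1^T+a_2a_2^T+...+a_na_n^T$
Each term contributes a nonnegative amount to the quadratic form, $(x^Ta_1)(a_1^Tx)=(a_1^Tx)^2\geq 0$, so $x^TAx\geq 0$.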
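To make the Cholesky factorization and the rank-one sum concrete, here is a short sketch. The function `cholesky_lower` is a hand-written textbook variant shown for illustration, compared against `np.linalg.cholesky`; the example matrix is an assumption.

```python
import numpy as np

def cholesky_lower(A):
    """Return lower triangular L with A = L L^T, for a symmetric positive definite A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        # Diagonal entry: what remains of A[j, j] after subtracting earlier columns
        d = A[j, j] - L[j, :j] @ L[j, :j]
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(d)
        # Entries below the diagonal in column j
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
L = cholesky_lower(A)

# With V = L^T, the rows of V are the columns of L, so A is a sum of rank-one terms
rank_one_sum = sum(np.outer(L[:, i], L[:, i]) for i in range(3))

print(L)
print(np.allclose(L @ L.T, A), np.allclose(rank_one_sum, A))
```

The failure branch (`d <= 0`) doubles as a practical test: the factorization only succeeds when the matrix is positive definite.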

Eigendecomposition: $A=VDV^{-1}$, where $D$ is a diagonal matrix of eigenvalues. When $A$ is symmetric, $V$ can be chosen as an orthonormal basis: $V=(v_1,v_2,...,v_n)$ with $v_i\cdot v_j=0\ (i\neq j)$, meaning the vectors are mutually perpendicular, and $v_i\cdot v_i=1$, meaning each has length 1. Then $VV^{-1}=I$ is the identity matrix and $V^{-1}=V^T$, so $A=VDV^{-1}=\sum_{i=1}^nd_iv_iv_i^T\approx D_1v_iv_i^T+D_2v_jv_j^T$, where each $v_i$ is an $n\times 1$ matrix, $D_1=\max\limits_i d_i$ is the largest eigenvalue, $D_2$ is the next largest, and $v_i$, $v_j$ are the corresponding eigenvectors. Keeping only the largest terms gives a low-rank approximation of $A$.
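For a symmetric matrix, the decomposition and the truncated sum can be sketched with `np.linalg.eigh`, which returns eigenvalues in ascending order and orthonormal eigenvectors as columns. The example matrix and the choice of keeping two terms are assumptions.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.1]])          # symmetric example matrix

d, V = np.linalg.eigh(A)                 # d: eigenvalues (ascending), V: eigenvectors as columns

orthonormal = np.allclose(V.T @ V, np.eye(3))   # V^{-1} = V^T
A_full = (V * d) @ V.T                          # V D V^T, the sum of all d_i v_i v_i^T

# Keep only the two largest eigenvalues for a rank-2 approximation
top = np.argsort(d)[::-1][:2]
A_rank2 = sum(d[i] * np.outer(V[:, i], V[:, i]) for i in top)
error = np.linalg.norm(A - A_rank2)      # Frobenius norm of the discarded part

print(d, orthonormal, error)
```

The approximation error equals the discarded eigenvalue in magnitude, which is why dropping the smallest terms loses the least.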

Singular value decomposition: $A=UDV^T=\sum\limits_{i=1}^{\min(m,n)}d_iu_iv_i^T$, where $U$ is an $m\times m$ orthogonal matrix, $D$ is an $m\times n$ diagonal matrix, and $V$ is an $n\times n$ orthogonal matrix.

$$AA^T=UDV^T(UDV^T)^T=UDV^TVD^TU^T=UDD^TU^T$$

$$A^TA=VD^TDV^T$$

Here $DD^T$ and $D^TD$ are square diagonal matrices holding the squared singular values $d_i^2$ (often written loosely as $D^2$), so the columns of $U$ are eigenvectors of $AA^T$ and the columns of $V$ are eigenvectors of $A^TA$.
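The shapes of the factors and the link to the eigenvalues of $AA^T$ can be checked with `np.linalg.svd`. This is a sketch with an arbitrary 2 x 3 matrix; NumPy returns the singular values as a 1-D array, so the rectangular diagonal matrix `D` is rebuilt by hand.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])                   # m = 2, n = 3

U, d, Vt = np.linalg.svd(A, full_matrices=True)    # U: 2x2, d: (2,), Vt = V^T: 3x3

# Rebuild the m x n "diagonal" matrix D from the singular values
D = np.zeros(A.shape)
D[:len(d), :len(d)] = np.diag(d)
reconstructed = U @ D @ Vt

# Equivalent sum of rank-one terms d_i u_i v_i^T
rank_one_sum = sum(d[i] * np.outer(U[:, i], Vt[i]) for i in range(len(d)))

# Squared singular values are the eigenvalues of A A^T
eig_AAt = np.linalg.eigvalsh(A @ A.T)              # ascending order

print(d ** 2, eig_AAt)
```

`np.linalg.svd` returns singular values in descending order while `eigvalsh` returns eigenvalues in ascending order, so the two arrays match after sorting.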

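QR decomposition: $A=QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. The decomposition can be iterated to approximate eigenvalues: $A_0=A=Q_0R_0$, $A_1=R_0Q_0=Q_1R_1$, ... After a few dozen iterations, the diagonal entries of $A_n$ are typically very close to the eigenvalues of $A$ (for example, when $A$ is symmetric and its eigenvalues have distinct absolute values).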
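The iteration described above can be sketched in a few lines with `np.linalg.qr`. This is the plain, unshifted variant under the assumption of a small symmetric matrix; the function name `qr_iteration` and the iteration count of 50 are illustrative, and practical eigenvalue solvers add shifts and other refinements.

```python
import numpy as np

def qr_iteration(A, num_iters=50):
    """Repeat A_k = Q_k R_k, A_{k+1} = R_k Q_k and return the final iterate."""
    A_k = np.array(A, dtype=float)
    for _ in range(num_iters):
        Q, R = np.linalg.qr(A_k)
        A_k = R @ Q        # equals Q^T A_k Q, so the eigenvalues are preserved
    return A_k

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

A_n = qr_iteration(A)
approx = np.sort(np.diag(A_n))      # diagonal of the final iterate
exact = np.linalg.eigvalsh(A)       # reference eigenvalues, ascending

print(approx)
print(exact)
```

Each step is a similarity transform, which is why the eigenvalues never change while the off-diagonal entries shrink.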
