Matrix Formulation of the Least Squares Method

1 Preliminaries

For convenience, we start with a few simple definitions.

Suppose we have a multivariate function, for example the linear function
$$f(x_1,x_2,\cdots,x_m)=\sum_{i=1}^m{a_ix_i}$$
Collect all of its independent variables into a column vector $x$:
$$x=[x_1,x_2,\cdots,x_m]^T$$
We then call $x$ a vector variable.

For example, $[x_1,x_2]^T$ is the vector variable of $f(x_1,x_2)=x_1+2x_2$.

If we now go through the entries of the vector variable in order and place in each position the partial derivative of the function with respect to that entry, the result is the derivative of $f(x_1,x_2,\cdots,x_m)$ with respect to the vector variable $x$:
$$\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x}=\left[\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x_1},\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x_2},\cdots,\frac{\partial f(x_1,x_2,\cdots,x_m)}{\partial x_m}\right]^T$$
This leads to the following definition of the derivative with respect to a vector.

Let $f(x)$ be a scalar function of $x$, where $x = [x_1, x_2,...,x_n]^T$ is a vector variable. Then

$$\frac{\partial f}{\partial x} = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n}\right]^T$$
This expression is also called the gradient vector form of the vector derivative:
$$\nabla_x f(x) = \frac{\partial f}{\partial x} = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, ..., \frac{\partial f}{\partial x_n}\right]^T$$
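The definition can be illustrated numerically. The sketch below approximates each partial derivative by a central difference and stacks the results into a gradient vector; the helper name `numerical_gradient` and the step size `h` are assumptions made for this illustration, not part of the original post.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h                      # perturb only the i-th variable
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# The example from the text: f(x1, x2) = x1 + 2*x2, evaluated at an arbitrary point.
f = lambda x: x[0] + 2 * x[1]
grad_example = numerical_gradient(f, [3.0, -1.0])
print(grad_example)  # close to [1, 2], the partial derivatives w.r.t. x1 and x2
```

Each entry of the result sits in the same position as the variable it was differentiated with respect to, which is exactly the ordering used in the definition.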
Next we prove several identities, all of which are used in the derivation of the matrix form of least squares.

Identity 1 (where $a$ is a constant that does not depend on $x$):
$$\frac{\partial a}{\partial x} = 0$$
Proof:
$$\frac{\partial a}{\partial x} = \left[\frac{\partial a}{\partial x_1}, \frac{\partial a}{\partial x_2}, ..., \frac{\partial a}{\partial x_n}\right]^T = [0,0,...,0]^T$$

Identity 2 (where $A$ is a constant column vector):
$$\frac{\partial(x^T \cdot A)}{\partial x} = \frac{\partial(A^T \cdot x)}{\partial x} = A$$
Proof:

Let $A = [a_1, a_2,...,a_n]^T$. Then:
$$\begin{aligned} \frac{\partial(x^T \cdot A)}{\partial x} & = \frac{\partial(A^T \cdot x)}{\partial x}\\ & = \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x}\\ & = \left [\begin{array}{c} \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x_1} \\ \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x_2} \\ \vdots \\ \frac{\partial(a_1 \cdot x_1 + a_2 \cdot x_2 +...+ a_n \cdot x_n)}{\partial x_n} \\ \end{array}\right] \\ & =\left [\begin{array}{c} a_1 \\ a_2 \\ \vdots \\ a_n \\ \end{array}\right] = A \end{aligned}$$

Identity 3:
$$\frac{\partial (x^T \cdot x)}{\partial x} = 2x$$
Proof:
$$\begin{aligned} \frac{\partial(x^T \cdot x)}{\partial x} & = \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x}\\ & = \left [\begin{array}{c} \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x_1} \\ \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x_2} \\ \vdots \\ \frac{\partial(x_1^2+x_2^2+...+x_n^2)}{\partial x_n} \\ \end{array}\right] \\ & =\left [\begin{array}{c} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \\ \end{array}\right] = 2x \end{aligned}$$

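Here $x^Tx$ is also called the cross-product of the vector (crossprod). It is the inner product of $x$ with itself, not the three-dimensional vector cross product.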
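Identities 1 to 3 can be sanity-checked numerically. The following is a minimal sketch that compares central-difference gradients with the closed forms above; the helper `fd_gradient`, the random test vectors, and the tolerance are all assumptions chosen for the illustration.

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    # Central differences along each coordinate direction (the rows of the identity matrix).
    return np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(x.size)])

rng = np.random.default_rng(0)
x = rng.normal(size=4)
a = rng.normal(size=4)   # plays the role of the constant vector A in Identity 2

grad_const = fd_gradient(lambda v: 3.0, x)      # Identity 1: expect the zero vector
grad_linear = fd_gradient(lambda v: a @ v, x)   # Identity 2: expect a
grad_square = fd_gradient(lambda v: v @ v, x)   # Identity 3: expect 2x

print(np.allclose(grad_const, 0.0))
print(np.allclose(grad_linear, a, atol=1e-6))
print(np.allclose(grad_square, 2 * x, atol=1e-6))
```

A numerical check like this does not replace the proofs, but it is a quick way to catch a sign or transpose mistake before using an identity in a longer derivation.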

Identity 4 (where $A$ is a constant $n \times n$ matrix):
$$\frac{\partial (x^T A x)}{\partial x} = Ax + A^Tx$$
Proof:

First, expand the quadratic form:
$$\begin{aligned} x^TAx &= [x_1, x_2,...,x_n] \cdot \left [\begin{array}{cccc} a_{11} &a_{12} &... &a_{1n}\\ a_{21} &a_{22} &... &a_{2n}\\ ... &... &... &... \\ a_{n1} &a_{n2} &... &a_{nn}\\ \end{array}\right] \cdot [x_1, x_2,...,x_n]^T \\ &=[x_1a_{11}+x_2a_{21}+...+x_na_{n1},\ x_1a_{12}+x_2a_{22}+...+x_na_{n2},\ ...,\ x_1a_{1n}+x_2a_{2n}+...+x_na_{nn}] \cdot \left [\begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \\ \end{array}\right] \\ &=x_1(x_1a_{11}+x_2a_{21}+...+x_na_{n1})+x_2(x_1a_{12}+x_2a_{22}+...+x_na_{n2})+...+x_n(x_1a_{1n}+x_2a_{2n}+...+x_na_{nn}) \end{aligned}$$
Let:
$$k(x) = x_1(x_1a_{11}+x_2a_{21}+...+x_na_{n1})+x_2(x_1a_{12}+x_2a_{22}+...+x_na_{n2})+...+x_n(x_1a_{1n}+x_2a_{2n}+...+x_na_{nn})$$
Then:
$$\frac{\partial k(x)}{\partial x_1} = (x_1a_{11}+x_2a_{21}+...+x_na_{n1})+ (x_1a_{11} + x_2a_{12}+...+x_na_{1n})$$
The first bracket is the first entry of $A^Tx$ and the second bracket is the first entry of $Ax$. The same pattern holds for every $x_i$, so stacking the partial derivatives gives:
$$\frac{\partial k(x)}{\partial x} = \frac{\partial (x^TAx)}{\partial x} = A^Tx + Ax$$
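Identity 4 is the one most prone to transpose mistakes, so a numerical check is useful. The sketch below compares a central-difference gradient of $x^TAx$ with $Ax + A^Tx$ for a random non-symmetric matrix; the matrix size, the seed, and the step `h` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.normal(size=(n, n))   # deliberately not symmetric
x = rng.normal(size=n)

k = lambda v: v @ A @ v       # the scalar x^T A x
h = 1e-6
numeric = np.array([(k(x + h * e) - k(x - h * e)) / (2 * h) for e in np.eye(n)])
analytic = A @ x + A.T @ x    # right-hand side of Identity 4

print(np.allclose(numeric, analytic, atol=1e-6))

# For a symmetric matrix such as B = A^T A the formula reduces to 2 B x.
B = A.T @ A
sym_gradient = B @ x + B.T @ x
print(np.allclose(sym_gradient, 2 * B @ x))
```

The symmetric special case matters here because the matrix that appears in the least squares derivation, $X^TX$, is always symmetric.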

2 Deriving the Matrix Form of Least Squares

Suppose we have a multivariate linear model:
$$f(x) = w_1x_1+w_2x_2+...+w_dx_d+b$$
Let $w = [w_1,w_2,...,w_d]^T$ and $x = [x_1,x_2,...,x_d]^T$. The model can then be written as:
$$f(x) = w^Tx+b$$
This is still not as compact as it could be, so we let:
$$\hat w = [w_1,w_2,...,w_d,b]^T\\ \hat x = [x_1,x_2,...,x_d,1]^T$$
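Suppose there are $m$ observations in total ($m>d$), where $x^{(i)} = [x_1^{(i)}, x_2^{(i)},...,x_d^{(i)}]$ and $y^{(i)}$ is the corresponding observed value. Substituting them into $f(x)$ gives $m$ equations:
$$w_1x_1^{(i)}+w_2x_2^{(i)}+...+w_dx_d^{(i)}+b = y^{(i)},\quad i=1,2,...,m$$
Further let:
$$\hat X = \left [\begin{array}{ccccc} x_1^{(1)} &x_2^{(1)} &... &x_d^{(1)} &1\\ x_1^{(2)} &x_2^{(2)} &... &x_d^{(2)} &1\\ ... &... &... &... &...\\ x_1^{(m)} &x_2^{(m)} &... &x_d^{(m)} &1\\ \end{array}\right],\quad \hat y = [y^{(1)},y^{(2)},...,y^{(m)}]^T$$
so the system of equations can be written as:
$$\hat X \cdot \hat w = \hat y$$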
The linear model can also be written as:
$$f(\hat x) = \hat w^T \cdot \hat x$$
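The effect of the augmented notation can be seen in a few lines of NumPy. This is a minimal sketch with made-up data: the feature matrix `X_raw`, the weights `w`, and the intercept `b` are hypothetical values chosen only to show that appending a column of ones absorbs the intercept into the weight vector.

```python
import numpy as np

# Hypothetical data: m = 4 observations of d = 2 features.
X_raw = np.array([[1.0, 2.0],
                  [0.5, -1.0],
                  [3.0, 0.0],
                  [2.0, 2.0]])
w = np.array([2.0, -1.0])   # assumed weights w_1, w_2
b = 0.5                     # assumed intercept

# Append a column of ones so the intercept becomes the last entry of w_hat.
X_hat = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])
w_hat = np.append(w, b)

pred_explicit = X_raw @ w + b      # f(x) = w^T x + b, one value per observation
pred_augmented = X_hat @ w_hat     # the same predictions as a single product
print(np.allclose(pred_explicit, pred_augmented))
```

Placing the ones in the last column matches the ordering $\hat w = [w_1,...,w_d,b]^T$; putting them first would work equally well as long as $b$ moves to the front of $\hat w$.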
We can then set up an optimization model that minimizes the sum of squared errors $SSE$ (from here on $X$ and $y$ are written without hats):
$$\min S(\hat w) = ||y - X\hat w||_2^2 = (y - X\hat w)^T(y - X\hat w)$$
Here $||y - X\hat w||_2$ is the 2-norm of the vector $y - X\hat w$. The 2-norm of a vector is the square root of the sum of the squares of its components. For example, if $a=[1, -1]^T$, then $||a||_2= \sqrt{1^2+(-1)^2}=\sqrt{2}$.
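The 2-norm example can be reproduced directly. The sketch below computes the norm by hand and with `np.linalg.norm`, and checks that the squared norm equals the inner product $r^Tr$, which is the step used to rewrite $S(\hat w)$; the vector `r` is an arbitrary stand-in for a residual.

```python
import numpy as np

a = np.array([1.0, -1.0])
norm_manual = np.sqrt(np.sum(a ** 2))   # square, sum, then take the square root
norm_numpy = np.linalg.norm(a)          # for a 1-D array the default is the 2-norm
print(norm_manual, norm_numpy)

# The squared 2-norm equals the inner product of the vector with itself.
r = np.array([0.5, -2.0, 1.5])          # arbitrary stand-in for y - X w_hat
squared_norm = np.linalg.norm(r) ** 2
inner_product = r @ r
print(np.isclose(squared_norm, inner_product))
```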

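We only need to find the point where the derivative with respect to $\hat w$ vanishes. Because $S(\hat w)$ is a convex quadratic function of $\hat w$, that stationary point is a minimizer, and it gives the optimal $\hat w$: the fitted parameters, and with them the fitted multivariate function.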

Before doing so, we need two rules for the matrix transpose:
$$(A-B)^T=A^T-B^T\\ (AB)^T=B^TA^T$$
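Both transpose rules are easy to confirm on random matrices. This is only an illustrative check under assumed shapes; the matrices `A`, `B`, and `C` below are arbitrary and have no connection to the data in this post.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(3, 4))   # same shape as A, so A - B is defined
C = rng.normal(size=(4, 2))   # compatible with A, so A @ C is defined

difference_rule = np.allclose((A - B).T, A.T - B.T)
product_rule = np.allclose((A @ C).T, C.T @ A.T)   # note the reversed order
print(difference_rule, product_rule)
```

Applied to the residual, the two rules together give $(y - X\hat w)^T = y^T - \hat w^TX^T$.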
Differentiate $S(\hat w)$ with respect to $\hat w$ and set the result to 0. Identity 1 handles $y^Ty$, Identity 2 handles the two linear terms, and Identity 4 handles the quadratic term, where $(X^TX)^T = X^TX$:
$$\begin{aligned} \frac{\partial S(\hat w)}{\partial \hat w} &= \frac{\partial ||y - X\hat w||_2^2}{\partial \hat w} \\ &= \frac{\partial (y - X\hat w)^T(y - X\hat w)}{\partial \hat w} \\ &= \frac{\partial (y^T - \hat w^TX^T)(y - X\hat w)}{\partial \hat w}\\ &= \frac{\partial (y^Ty - \hat w^TX^Ty - y^TX\hat w + \hat w^TX^TX\hat w)}{\partial \hat w}\\ &= 0 - X^Ty - X^Ty + X^TX\hat w + (X^TX)^T\hat w \\ &= 0 - X^Ty - X^Ty + 2X^TX\hat w\\ &= 2(X^TX\hat w - X^Ty) = 0 \end{aligned}$$
That is:
$$X^TX\hat w = X^Ty$$
If $X^TX$ is invertible, then:
$$\hat w = (X^TX)^{-1}X^Ty$$
This gives the fitted $\hat w$, which completes the derivation of the least squares method.
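The closed-form result can be checked on synthetic data. The following is a sketch under stated assumptions: the data are randomly generated, `true_w_hat` is a made-up parameter vector, and `np.linalg.lstsq` is used only as an independent reference solver. It solves the equation $X^TX\hat w = X^Ty$ with `np.linalg.solve` instead of forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(42)
m, d = 50, 3                                       # m > d observations
X_raw = rng.normal(size=(m, d))
true_w_hat = np.array([1.5, -2.0, 0.7, 4.0])       # made-up [w_1, w_2, w_3, b]
X = np.hstack([X_raw, np.ones((m, 1))])            # augmented matrix with a ones column
y = X @ true_w_hat + 0.1 * rng.normal(size=m)      # noisy observations

# Solve X^T X w_hat = X^T y without computing the inverse.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Independent reference: NumPy's least squares solver.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# The derivative 2 (X^T X w_hat - X^T y) should vanish at the solution.
gradient_at_solution = 2 * (X.T @ X @ w_normal - X.T @ y)

print(np.allclose(w_normal, w_lstsq))
print(np.allclose(gradient_at_solution, 0.0, atol=1e-8))
```

Solving the linear system is generally preferred over multiplying by an explicit inverse, since it avoids an extra matrix inversion and tends to be numerically better behaved when $X^TX$ is close to singular.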

3 Verification in Code

Suppose we have the following data:

| $x$ | $y$ |
| --- | --- |
| 1 | 2 |
| 3 | 4 |

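We want to use least squares to obtain a first-degree (linear) fit to these data. The procedure is as follows.

From the data we have:
$$X = \left [\begin{array}{cc} 1 &1 \\ 3 &1 \\ \end{array}\right]\\ y = \left [\begin{array}{c} 2 \\ 4 \\ \end{array}\right]$$
The parameters to fit are:
$$\hat w = [w,b]^T$$
Then:
$$\hat w = (X^TX)^{-1}X^Ty = \left [\begin{array}{cc} 10 &4 \\ 4 &2 \\ \end{array}\right]^{-1} \left [\begin{array}{c} 14 \\ 6 \\ \end{array}\right] = \left [\begin{array}{c} 1 \\ 1 \\ \end{array}\right]$$
So the fitted function is:
$$y=x+1$$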
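Python implementation:

import numpy as np  # NumPy provides the matrix operations used below
X = np.array([[1, 1], [3, 1]])  # matrix X: the x values plus a column of ones
y = np.array([2, 4]).reshape(2, 1)  # observed values as a column vector
result = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)  # w_hat = (X^T X)^{-1} X^T y
# In the result, the last value is b; the others, from top to bottom, are the coefficients of x1, x2, ...
print("Fitted parameters:", result)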
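As a cross-check on the same two data points, the sketch below solves $X^TX\hat w = X^Ty$ with `np.linalg.solve` and compares the answer with `np.linalg.lstsq`. This is an alternative formulation added for illustration, not the code from the original post; the variable names are assumptions.

```python
import numpy as np

X = np.array([[1.0, 1.0], [3.0, 1.0]])   # x values plus a column of ones
y = np.array([2.0, 4.0])                 # observed values

# Solve X^T X w_hat = X^T y directly instead of forming the inverse.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least squares routine.
w_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

slope, intercept = w_hat
print(slope, intercept)  # both close to 1, i.e. the fitted line y = x + 1
```

With only two points and two parameters the line passes through both observations exactly, so the residual $y - X\hat w$ is zero here; with more observations than parameters it generally is not.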

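Because CSDN's Markdown editor could not render some of the formulas correctly, a few of them were originally posted as images. The original md file is available at: https://gitee.com/image111111/image1/raw/master/%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%98%E6%B3%95%E7%9A%84%E7%9F%A9%E9%98%B5%E8%A1%A8%E8%BE%BE.md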
