The Best Approximation Problem for Functions: The Method of Least Squares

This article takes a close look at the fundamentals of the method of least squares, covering the best approximation problem, best square approximation, least squares learning, and applications to linear regression and the approximation of periodic functions, with detailed derivations and practical implementations.


1. The Best Approximation Problem

Simply put, the best approximation problem is to approximate a function $f(\boldsymbol x)$ by a linear combination of (basis) functions $\varphi_{i}(\boldsymbol x),\ i\in \{0,1,\cdots,M\}$, i.e. to define

$$\varphi(\boldsymbol x)=\sum_{n=0}^{M}a_{n}\varphi_{n}(\boldsymbol x)=a_{0}\varphi_{0}(\boldsymbol x)+a_{1}\varphi_{1}(\boldsymbol x)+\cdots+a_{M}\varphi_{M}(\boldsymbol x)$$

such that $f(\boldsymbol x)$ and $\varphi(\boldsymbol x)$ are as close as possible in some (metric) sense. Common metrics include the $\ell_{1}$ norm, the $\ell_{2}$ norm (best square approximation), and the $\ell_{\infty}$ norm (best uniform approximation).
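As a quick numerical illustration (a minimal sketch with an assumed pair $f$ and $\varphi$, not from the original text), the three metrics can be compared on a sampled error vector:

import numpy as np

# Assumed example: compare the l1, l2 and l-infinity distances between
# f(x) = exp(x) and a crude approximation phi(x) = 1 + x, sampled on [0, 1].
x = np.linspace(0.0, 1.0, 1001)
r = np.exp(x) - (1.0 + x)            # pointwise error f(x) - phi(x)

print(np.linalg.norm(r, 1))          # l1 norm: sum of absolute errors
print(np.linalg.norm(r, 2))          # l2 norm: the metric of best square approximation
print(np.linalg.norm(r, np.inf))     # l-infinity norm: the metric of best uniform approximation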

For example, for polynomial curve fitting the basis functions can be chosen as $\varphi_{n}(x)=x^{n}$, which gives

$$\varphi(x)=\sum_{n=0}^{M}a_{n}\varphi_{n}(x)=a_{0}+a_{1}x+a_{2}x^{2}+\cdots+a_{M}x^{M}$$

\qquad f ( x ) f(x) f(x) 具有周期性质,可以采用三角多项式来逼近,就有

$$\varphi(x)=a_{0}+a_{1}\cos x+b_{1}\sin x+\cdots+a_{M}\cos(Mx)+b_{M}\sin(Mx)$$

If the basis functions are orthogonal, the best approximation problem simplifies considerably: the coefficient matrix of the normal equations (derived below) becomes diagonal, so each coefficient decouples to $a_{n}=(f,\varphi_{n})/(\varphi_{n},\varphi_{n})$.
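A minimal numerical check of this decoupling (assuming the interval $[0,2\pi]$ and the basis $\{1,\cos x,\sin x\}$; this example is not from the original post):

import numpy as np

# On [0, 2*pi] the trigonometric basis {1, cos x, sin x} is orthogonal, so the
# Gram matrix (phi_n, phi_k) is diagonal and each a_n = (f, phi_n)/(phi_n, phi_n).
x = np.linspace(0, 2*np.pi, 10001)
dx = x[1] - x[0]
basis = [np.ones_like(x), np.cos(x), np.sin(x)]

def inner(u, v):
    # continuous inner product approximated by the trapezoidal rule
    w = u*v
    return dx*(w.sum() - 0.5*(w[0] + w[-1]))

G = np.array([[inner(p, q) for q in basis] for p in basis])
print(np.round(G, 6))                   # nearly diag(2*pi, pi, pi)

f = 2 + 3*np.cos(x) - np.sin(x)         # a test function in the span of the basis
print([inner(f, p)/inner(p, p) for p in basis])   # recovers [2, 3, -1] coefficient-wise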

2. Best Square (Least Squares) Approximation

Best square approximation uses the $\ell_{2}$ norm to measure how close $\varphi(\boldsymbol x)$ is to $f(\boldsymbol x)$.

The best approximation $\varphi^{*}(\boldsymbol x)$ of a function $f(\boldsymbol x)$ satisfies:

$$\|f(\boldsymbol x)-\varphi^{*}(\boldsymbol x)\|_{2}^{2}=\min_{\varphi}\|f(\boldsymbol x)-\varphi(\boldsymbol x)\|_{2}^{2}$$

Accordingly, define the error function $E(a_{0},\cdots,a_{M})$ as:

$$\begin{aligned} E(a_{0},\cdots,a_{M})&=\|f(\boldsymbol x)-\varphi(\boldsymbol x)\|_{2}^{2}\\ &=\Big\|f(\boldsymbol x)-\sum_{n=0}^{M}a_{n}\varphi_{n}(\boldsymbol x)\Big\|_{2}^{2}\\ &=\int \Big[f(\boldsymbol x)-\sum_{n=0}^{M}a_{n}\varphi_{n}(\boldsymbol x)\Big]^{2}dx \end{aligned}$$

The error function $E(a_{0},\cdots,a_{M})$ is then a quadratic function of the coefficients $(a_{0},\cdots,a_{M})$, and a necessary condition for it to attain an extremum is:

$$\dfrac{\partial E(a_{0},\cdots,a_{M})}{\partial a_{k}}=0\qquad(k=0,1,\cdots,M)$$

Therefore,

$$\dfrac{\partial E(a_{0},\cdots,a_{M})}{\partial a_{k}}=-2\int \Big[f(\boldsymbol x)-\sum_{n=0}^{M}a_{n}\varphi_{n}(\boldsymbol x)\Big]\varphi_{k}(\boldsymbol x)dx=0\qquad(k=0,1,\cdots,M)$$

Hence

$$\int f(\boldsymbol x)\varphi_{k}(\boldsymbol x)dx=\int\sum_{n=0}^{M}a_{n}\varphi_{n}(\boldsymbol x)\varphi_{k}(\boldsymbol x)dx=\sum_{n=0}^{M}a_{n}\int\varphi_{n}(\boldsymbol x)\varphi_{k}(\boldsymbol x)dx$$

Written in terms of function inner products:

$$(f,\varphi_{k})=\sum_{n=0}^{M}a_{n}(\varphi_{n},\varphi_{k})$$

This is in fact a linear system in $a_{0},\cdots,a_{M}$, known as the normal equations:

$$\begin{bmatrix}(\varphi_{0},\varphi_{0})&(\varphi_{1},\varphi_{0})&\cdots&(\varphi_{M},\varphi_{0})\\ (\varphi_{0},\varphi_{1})&(\varphi_{1},\varphi_{1})&\cdots&(\varphi_{M},\varphi_{1})\\ \vdots&\vdots&\ddots&\vdots\\ (\varphi_{0},\varphi_{M})&(\varphi_{1},\varphi_{M})&\cdots&(\varphi_{M},\varphi_{M})\end{bmatrix}\begin{bmatrix}a_{0}\\ a_{1}\\ \vdots\\ a_{M}\end{bmatrix}=\begin{bmatrix}(f,\varphi_{0})\\ (f,\varphi_{1})\\ \vdots\\ (f,\varphi_{M})\end{bmatrix}\qquad(1)$$

Solving this linear system yields the coefficients $(a_{0},\cdots,a_{M})$.

For continuous functions, the inner products are usually:

$$(f,\varphi_{k})=\int_a^b f(x)\varphi_{k}(x)dx$$

and

$$(\varphi_{n},\varphi_{k})=\int_a^b\varphi_{n}(x)\varphi_{k}(x)dx$$
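As a worked sketch of the continuous case (assumed example, not from the original post: $f(x)=e^x$ on $[a,b]=[0,1]$ with the monomial basis $\varphi_n(x)=x^n$ and $M=2$; the Gram matrix $(\varphi_n,\varphi_k)=\int_0^1 x^{n+k}dx=1/(n+k+1)$ is then the $3\times3$ Hilbert matrix):

import numpy as np

# Best square approximation of f(x) = exp(x) on [0, 1] by a quadratic,
# via the normal equations (1).
M = 2
G = np.array([[1.0/(n + k + 1) for n in range(M + 1)] for k in range(M + 1)])

x = np.linspace(0.0, 1.0, 100001)
dx = x[1] - x[0]
def integrate(w):
    # trapezoidal rule on [0, 1]
    return dx*(w.sum() - 0.5*(w[0] + w[-1]))

b = np.array([integrate(np.exp(x)*x**k) for k in range(M + 1)])
a = np.linalg.solve(G, b)               # solve the normal equations (1)
print(a)    # roughly [1.0130, 0.8511, 0.8392]: phi(x) = a0 + a1*x + a2*x^2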


The discrete case: linear regression as an example

In the discrete case, take one-dimensional linear regression as an example. Given an observed sample set $\{x_{i},y_{i}\}\big|_{0}^{N}$, we want an approximation of the function $f(x)$ of the form:

$$\varphi(x)=\sum_{n=0}^{1}a_{n}\varphi_{n}(x)=a_{0}+a_{1}x$$
[Figure: a fitted line over the observed points $(x_i,y_i)$]
In the figure above, the $\ell_{2}$ loss (squared error) of the linear regressor at each observation $(x_{i},y_{i})$ is $[\varphi(x_i)-y_i]^2$.

The error function is defined as the sum of the squared errors:

$$\begin{aligned}E(a_{0},a_{1})&=[\varphi(x_0)-y_0]^2+[\varphi(x_1)-y_1]^2+\cdots+[\varphi(x_N)-y_N]^2\\ &=\sum_{i=0}^{N}[\varphi(x_i)-y_i]^2\\ &=\sum_{i=0}^{N}(y_i-a_{0}-a_{1}x_i)^2\end{aligned}$$

Solving for the coefficients $(a_{0},a_{1})$ then gives $\varphi(x)\approx f(x)$.

• Partial derivative of the error function with respect to $a_{0}$:

$$\dfrac{\partial E(a_0,a_1)}{\partial a_0}=2\sum_{i=0}^{N}(y_i-a_{0}-a_{1}x_i)(-1)=0$$

which can be written as: $a_0\sum\limits_{i=0}^{N}1+a_1\sum\limits_{i=0}^{N}x_i=\sum\limits_{i=0}^{N}y_i$

$$\Longrightarrow\quad a_0=\dfrac{1}{N+1}\sum_{i=0}^{N}(y_i-a_1x_i)$$
(note that $\sum_{i=0}^{N}1=N+1$, since the sample set contains $N+1$ points)

• Partial derivative of the error function with respect to $a_{1}$:

$$\dfrac{\partial E(a_0,a_1)}{\partial a_1}=2\sum_{i=0}^{N}(y_i-a_{0}-a_{1}x_i)(-x_i)=0$$

which can be written as: $a_0\sum\limits_{i=0}^{N}x_i+a_1\sum\limits_{i=0}^{N}x_i^2=\sum\limits_{i=0}^{N}y_ix_i$

$$\Longrightarrow\quad a_1=\dfrac{\sum\limits_{i=0}^{N}(y_i-a_0)x_i}{\sum\limits_{i=0}^{N}x_i^2}$$

Substituting $a_0$ gives: $a_1=\dfrac{\sum\limits_{i=0}^{N}y_i\left(x_i-\frac{1}{N+1}\sum\limits_{i=0}^{N}x_i\right)}{\sum\limits_{i=0}^{N}x_i^2-\frac{1}{N+1}\left(\sum\limits_{i=0}^{N}x_i\right)^2}$

Writing the sample mean as $\bar x=\dfrac{1}{N+1}\sum\limits_{i=0}^{N}x_i$, this becomes $a_1=\dfrac{\sum\limits_{i=0}^{N}y_i\left(x_i-\bar x\right)}{\sum\limits_{i=0}^{N}x_i^2-(N+1)\bar x^2}$

This derivation is exactly the process of solving linear system (1), as done in Section 4.1 (a numerical sketch follows below).
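The closed-form expressions for $a_0$ and $a_1$ can be checked directly (assumed synthetic data, mirroring the script in Section 4.1; this check is not from the original post):

import numpy as np

# Fit a0 + a1*x by the closed-form means formulas derived above.
# The sums run over all N+1 sample points x_0, ..., x_N.
rng = np.random.default_rng(0)
x = np.linspace(0, 20, 200)
y = 7*x + 10 + rng.normal(scale=30, size=x.size)    # assumed true line: a1=7, a0=10

n = x.size                        # this is N + 1
x_bar = x.mean()
a1 = np.sum(y*(x - x_bar)) / (np.sum(x**2) - n*x_bar**2)
a0 = np.mean(y - a1*x)            # a0 = (1/(N+1)) * sum(y_i - a1*x_i)
print(a0, a1)                     # close to the true intercept 10 and slope 7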

3. Least Squares Learning (Another Description of the Discrete Case)

As shown in the figure below, given an observed sample set $\{\boldsymbol x_{i},y_{i}\}\big|_{0}^{N}$, we again adopt a linear model:

$$\begin{aligned}\varphi(\boldsymbol x)&=\sum_{n=0}^{M}a_{n}\varphi_{n}(\boldsymbol x)\\ &=a_{0}\varphi_{0}(\boldsymbol x)+a_{1}\varphi_{1}(\boldsymbol x)+\cdots+a_{M}\varphi_{M}(\boldsymbol x)\\ &=\boldsymbol\theta^T\boldsymbol\phi(\boldsymbol x)=\boldsymbol\phi^T(\boldsymbol x)\boldsymbol\theta\end{aligned}$$

where $\boldsymbol\theta=[a_0,a_1,\cdots,a_M]^T$ and $\boldsymbol\phi(\boldsymbol x)=[\varphi_0(\boldsymbol x),\varphi_1(\boldsymbol x),\cdots,\varphi_M(\boldsymbol x)]^T$.

[Figure: a linear model fitted over the observed points $(\boldsymbol x_i,y_i)$]
In the figure above, the $\ell_{2}$ loss (squared error) of the linear model at each observation $(\boldsymbol x_{i},y_{i})$ is $[\varphi(\boldsymbol x_i)-y_i]^2$.

Take the sum of the squared errors over all observations in the dataset as the loss function:

$$\begin{aligned}J(\boldsymbol\theta)&=[\varphi(\boldsymbol x_0)-y_0]^2+[\varphi(\boldsymbol x_1)-y_1]^2+\cdots+[\varphi(\boldsymbol x_N)-y_N]^2\\ &=\sum_{i=0}^{N}[\varphi(\boldsymbol x_i)-y_i]^2\\ &=\sum_{i=0}^{N}[\boldsymbol\theta^T\boldsymbol\phi(\boldsymbol x_i)-y_i]^2\\ &=\|\Phi\boldsymbol\theta-\boldsymbol y\|_2^2\end{aligned}$$

where $\boldsymbol y=[y_0,y_1,\cdots,y_N]^T$ and

$$\Phi=\begin{bmatrix}\boldsymbol\phi(\boldsymbol x_0)^T\\ \boldsymbol\phi(\boldsymbol x_1)^T\\ \vdots\\ \boldsymbol\phi(\boldsymbol x_N)^T\end{bmatrix}=\begin{bmatrix}\varphi_0(\boldsymbol x_0)&\varphi_1(\boldsymbol x_0)&\cdots&\varphi_M(\boldsymbol x_0)\\ \varphi_0(\boldsymbol x_1)&\varphi_1(\boldsymbol x_1)&\cdots&\varphi_M(\boldsymbol x_1)\\ \vdots&\vdots&\ddots&\vdots\\ \varphi_0(\boldsymbol x_N)&\varphi_1(\boldsymbol x_N)&\cdots&\varphi_M(\boldsymbol x_N)\end{bmatrix}$$

Here $\Phi$ is called the design matrix, with entries $\Phi_{nm}=\varphi_m(\boldsymbol x_n)$.
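A quick check (with an assumed monomial basis and random data, not from the post) that the summation and matrix forms of the loss agree:

import numpy as np

# Verify that sum_i (theta^T phi(x_i) - y_i)^2 == ||Phi theta - y||_2^2
# for phi(x) = [1, x, x^2]^T, so that Phi[n, m] = phi_m(x_n) = x_n**m.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 20)
y = rng.normal(size=20)
theta = np.array([0.5, -1.0, 2.0])

Phi = np.column_stack([x**m for m in range(3)])
J_sum = sum((theta @ Phi[n] - y[n])**2 for n in range(len(x)))
J_norm = np.linalg.norm(Phi @ theta - y)**2
print(J_sum, J_norm)              # identical up to floating-point rounding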
Taking the partial derivative of the loss function $J(\boldsymbol\theta)$ with respect to the coefficients $\boldsymbol\theta$:

$$\dfrac{\partial J(\boldsymbol\theta)}{\partial\boldsymbol\theta}=2\Phi^T(\Phi\boldsymbol\theta-\boldsymbol y)=\boldsymbol 0$$

we obtain:

$$\Phi^T\Phi\boldsymbol\theta=\Phi^T\boldsymbol y\qquad(2)$$

Clearly, linear systems (2) and (1) are equivalent, and solving gives:

$$\boldsymbol\theta=(\Phi^T\Phi)^{-1}\Phi^T\boldsymbol y\qquad(3)$$

The coefficient matrix $\Phi^T\Phi$ on the left-hand side of (1) and (2) may or may not be invertible; the least-squares solution (3) is the closed-form solution of (2).
When $\Phi^T\Phi$ is not invertible, or when the observed dataset is very large, computing the closed-form solution is impractical, and optimization algorithms such as gradient descent are needed to obtain an approximate solution (see Section 4.2).


Geometric Interpretation of the Least Squares Solution

The geometric meaning of the least squares solution is illustrated in the figure below:
[Figure: from *PRML*, Fig. 3.2]

Consider the linear system $\Phi\boldsymbol\theta-\boldsymbol y=\boldsymbol 0$, which can be written as:

$$[\boldsymbol\varphi_0,\boldsymbol\varphi_1,\boldsymbol\varphi_2,\cdots,\boldsymbol\varphi_M]\begin{bmatrix}a_{0}\\ a_{1}\\ a_{2}\\ \vdots\\ a_{M}\end{bmatrix}=\boldsymbol y$$

where $\boldsymbol\varphi_i=[\varphi_i(\boldsymbol x_0),\varphi_i(\boldsymbol x_1),\varphi_i(\boldsymbol x_2),\cdots,\varphi_i(\boldsymbol x_N)]^T$ and $\boldsymbol y=[y_0,y_1,y_2,\cdots,y_N]^T$.

In the figure, $\mathcal S=C(\Phi)$ denotes the column space of the matrix $\Phi=[\boldsymbol\varphi_0,\boldsymbol\varphi_1,\boldsymbol\varphi_2,\cdots,\boldsymbol\varphi_M]$.

• If $\boldsymbol y\in\mathcal S$, the linear system $\Phi\boldsymbol\theta=\boldsymbol y$ has an exact solution (unique when the columns of $\Phi$ are linearly independent).
• If $\boldsymbol y\notin\mathcal S$, the system $\Phi\boldsymbol\theta=\boldsymbol y$ has no solution, and the best we can do is find the element of $\mathcal S$ closest to $\boldsymbol y$. The least-squares solution corresponds to $\hat{\boldsymbol y}$ in the figure: the point $\hat{\boldsymbol y}\in\mathcal S$ that minimizes $\|\boldsymbol y-\hat{\boldsymbol y}\|_2^2$ in the $\ell_2$ sense (a numerical check follows below).
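The projection picture can be verified numerically (assumed random data; this check is not from the original post):

import numpy as np

# The least-squares fit y_hat = Phi @ theta is the orthogonal projection of y
# onto the column space S = C(Phi): the residual y - y_hat is orthogonal to
# every column of Phi.
rng = np.random.default_rng(1)
Phi = rng.normal(size=(10, 3))     # 10 observations, 3 basis functions
y = rng.normal(size=10)            # a generic y, almost surely not in S

theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)   # normal equations (2)
y_hat = Phi @ theta                               # projection of y onto S
print(Phi.T @ (y - y_hat))         # ~ zero vector: residual is orthogonal to S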

4. Curve Fitting with Least Squares

4.1 Linear Regression (Solving System (1))

Take a fairly simple curve-fitting problem as an example: suppose all we know about the function $f(x)$ is an observed sample set $\{(x_{i},y_{i})\}\big|_{0}^{N}$, shown as the green '+' markers in the figure below.

[Figure 1: linear function]

The goal of curve fitting is to estimate an expression for the true function $f(x)$ by best square approximation, based on the observations $\{x_{i},y_{i}\}\big|_{0}^{N}$, where the observed values satisfy $y_i=f(x_i)+\varepsilon_i$.

With basis functions $\varphi_{n}(x)=x^{n}$, the polynomial approximation is:

$$\varphi(x)=\sum_{n=0}^{M}a_{n}\varphi_{n}(x)=a_{0}+a_{1}x+a_{2}x^{2}+\cdots+a_{M}x^{M}$$

Solving for the coefficients $(a_{0},\cdots,a_{M})$ so that $\varphi(x)\approx f(x)$ amounts to evaluating $(f,\varphi_{k})=\sum\limits_{n=0}^{M}a_{n}(\varphi_{n},\varphi_{k})$.

Curve fitting describes the discrete case. For the linear fit in Figure 1, only two coefficients $a_0$ and $a_1$ are needed, with approximating function $\varphi(x)=a_{0}\varphi_{0}(x)+a_{1}\varphi_{1}(x)=a_{0}+a_{1}x$.

That is, we solve the following linear system for the unknowns $a_0$ and $a_1$:

$$\begin{bmatrix}(\varphi_{0},\varphi_{0})&(\varphi_{1},\varphi_{0})\\ (\varphi_{0},\varphi_{1})&(\varphi_{1},\varphi_{1})\end{bmatrix}\begin{bmatrix}a_{0}\\ a_{1}\end{bmatrix}=\begin{bmatrix}(f,\varphi_{0})\\ (f,\varphi_{1})\end{bmatrix}$$

where $\varphi_{0}(x)=1$ and $\varphi_{1}(x)=x$.

For the observed sample set $\{(x_{i},y_{i})\}\big|_{0}^{N}$, the discrete inner products are:

$$(f,\varphi_{k})=\sum_{i=0}^{N}y_i\varphi_{k}(x_i)=\begin{cases}\sum\limits_{i=0}^{N}y_i&(k=0)\\ \sum\limits_{i=0}^{N}y_ix_i&(k=1)\end{cases}$$

$$(\varphi_{n},\varphi_{k})=\sum_{i=0}^{N}\varphi_{n}(x_i)\varphi_{k}(x_i)=\begin{cases}\sum\limits_{i=0}^{N}1&(n=0,k=0)\\ \sum\limits_{i=0}^{N}x_i&(n=0,k=1)\\ \sum\limits_{i=0}^{N}x_i&(n=1,k=0)\\ \sum\limits_{i=0}^{N}x_i^2&(n=1,k=1)\end{cases}$$

The linear system is therefore (the same as the one derived in Section 2):

$$\begin{bmatrix}\sum\limits_{i=0}^{N}1&\sum\limits_{i=0}^{N}x_i\\ \sum\limits_{i=0}^{N}x_i&\sum\limits_{i=0}^{N}x_i^2\end{bmatrix}\begin{bmatrix}a_{0}\\ a_{1}\end{bmatrix}=\begin{bmatrix}\sum\limits_{i=0}^{N}y_i\\ \sum\limits_{i=0}^{N}y_ix_i\end{bmatrix}$$

Code implementation:

import numpy as np
import matplotlib.pyplot as plt

def gen_lineardata(a, b, x):
    # generate y = a*x + b and a noisy observation of it
    y = a*x + b
    y_noise = y + np.random.randn(len(x))*30
    return y, y_noise

def linear_regression(y_noise, x):
    # assemble and solve the 2x2 normal equations from Section 4.1
    a11 = len(x)                    # (phi0, phi0) = sum of 1 over all points
    a12 = np.sum(x)                 # (phi0, phi1) = sum of x_i
    a22 = np.sum(np.power(x, 2))    # (phi1, phi1) = sum of x_i^2
    f1 = np.sum(y_noise)            # (f, phi0) = sum of y_i
    f2 = np.sum(y_noise*x)          # (f, phi1) = sum of y_i * x_i
    coef = np.linalg.solve(np.array([[a11, a12], [a12, a22]]),
                           np.array([f1, f2]))
    return coef


if __name__ == '__main__':
    x = np.linspace(0,20,200)
    a = int(np.random.rand()*10)+1
    b = int(np.random.rand()*20)+1
    y, y_noise = gen_lineardata(a,b,x)
    plt.plot(x,y,'b')
    plt.plot(x,y_noise,'g+')
    
    coef = linear_regression(y_noise,x)
    a1 = coef[1]
    b1 = coef[0]
    print(coef)
    y1,y2 = gen_lineardata(a1,b1,x)
    plt.plot(x,y1,'r')
    plt.legend(labels=['original data','noise data','least-squares'],loc='upper left')
    plt.title('y='+str(a)+'x +'+str(b))
    plt.show()

Result of one run: b=11.38812346, a=6.59033571 (true values: b=10, a=7).
[Figure: least-squares line against the noisy data]
Linear regression can also be implemented by solving system (2) (here M = 1):

def linear_regression_approx(y_noise, x, M):
    # design matrix Phi with columns 1, x, x^2, ..., x^M (shape: (N+1) x (M+1))
    design_matrix = np.column_stack([np.power(x, i) for i in range(M + 1)])
    # normal equations (2): Phi^T Phi theta = Phi^T y
    coef = np.linalg.solve(design_matrix.T @ design_matrix,
                           design_matrix.T @ y_noise)
    return coef


4.2 Linear Regression via Gradient Descent

Gradient descent

1. Compute the gradient $\nabla_{\boldsymbol w}E(\boldsymbol w)$ of the error function $E(\boldsymbol w)$ with respect to the parameters $\boldsymbol w$.

Consider the linear regression problem on the training set $\{x_i, y_i\}_{i=0}^{N-1}$:

(1) Assume the regression function is $y=w_0x+w_1$, with $\boldsymbol w=[w_0,w_1]^T$.
(2) The target vector is $\boldsymbol Y=[y_0,y_1,\cdots,y_{N-1}]^T$ and the input matrix is $\boldsymbol X=\begin{bmatrix}x_0&x_1&\cdots&x_{N-1}\\1&1&\cdots&1\end{bmatrix}^T$.

(3) The error function is $E(\boldsymbol w)=\frac{1}{N}\|\boldsymbol Y-\boldsymbol X\boldsymbol w\|_2^2=\frac{1}{N}(\boldsymbol Y-\boldsymbol X\boldsymbol w)^T(\boldsymbol Y-\boldsymbol X\boldsymbol w)$.

This yields:
$$\nabla_{\boldsymbol w}E(\boldsymbol w)=\frac{1}{N}(-2\boldsymbol X^T\boldsymbol Y+2\boldsymbol X^T\boldsymbol X\boldsymbol w)=-\frac{2}{N}\boldsymbol X^T(\boldsymbol Y-\boldsymbol X\boldsymbol w)$$

2. Update the parameters: $\boldsymbol w^{(k+1)}=\boldsymbol w^{(k)}-\eta\nabla_{\boldsymbol w}E(\boldsymbol w)$, where $\eta$ is the learning rate.

The update rule for the weight vector is therefore:

$$\boldsymbol w^{(k+1)}=\boldsymbol w^{(k)}+\eta\frac{2}{N}\boldsymbol X^T(\boldsymbol Y-\boldsymbol X\boldsymbol w^{(k)})$$

Implementation:

import numpy as np
import matplotlib.pyplot as plt

def gen_lineardata(a, b, x):
    # generate y = a*x + b and a noisy observation of it
    y = a*x + b
    y_noise = y + np.random.randn(len(x))*30
    return y, y_noise

def linear_regression_approx(y_noise, x, M):
    # closed-form least squares, formula (3): theta = (Phi^T Phi)^{-1} Phi^T y
    design_matrix = np.column_stack([np.power(x, i) for i in range(M + 1)])
    coef = np.linalg.solve(design_matrix.T @ design_matrix,
                           design_matrix.T @ y_noise)
    return coef

def Gradient_descent(y_noise, x, eta=0.001):  # too large a learning rate keeps the iteration from converging
    w0 = np.random.randn(2, 1)
    num = len(x)
    # rows of xhat are [x_i, 1], matching w = [slope, intercept]^T
    xhat = np.concatenate((x.reshape(-1, 1), np.ones((num, 1))), axis=1)
    while True:
        err = 2*(y_noise.reshape(-1, 1) - xhat.dot(w0))/num   # 2*(Y - X w)/N
        w1 = w0 + eta*np.dot(xhat.T, err)                     # w <- w - eta * grad E(w)
        if np.sum(np.abs(w1 - w0)) < 0.00001:
            break
        w0 = w1.copy()
    return w1

if __name__ == '__main__':
    x = np.linspace(0, 20, 200)
    a = int(np.random.rand()*10) + 1
    b = int(np.random.rand()*20) + 1
    y, y_noise = gen_lineardata(a, b, x)
    plt.plot(x, y_noise, 'g.')
    coef1 = linear_regression_approx(y_noise, x, 1)   # closed-form solution (3)
    print(coef1)
    a1 = coef1[1]
    b1 = coef1[0]
    y1, y2 = gen_lineardata(a1, b1, x)
    plt.plot(x, y1, 'r')
    coef2 = Gradient_descent(y_noise, x)              # iterative approximation
    print(coef2)
    a1 = coef2[0]
    b1 = coef2[1]
    y1, y2 = gen_lineardata(a1, b1, x)
    plt.plot(x, y1, 'k')
    plt.legend(labels=['noise data', 'linear regressor', 'gradient descent'], loc='upper left')
    plt.title('y=' + str(a) + 'x +' + str(b))
    plt.show()

Output of one run (note that linear_regression_approx returns [intercept, slope], while Gradient_descent returns [slope, intercept]):

coef1: [11.01307632  7.66747285]      closed-form solution, formula (3)
coef2: [[ 7.66886398]
        [10.99451616]]                gradient descent

[Figure: the closed-form and gradient-descent fits essentially coincide]

4.3 Approximating Periodic Functions (Solving System (2))

For datasets with periodic structure, trigonometric basis functions are a better fit, i.e. we use:

$$\varphi(x)=a_{0}+a_{1}\cos x+b_{1}\sin x+\cdots+a_{M}\cos(Mx)+b_{M}\sin(Mx)$$

and obtain the coefficients from formula (3).

import numpy as np
import matplotlib.pyplot as plt

def gen_data(x):
    # periodic test function plus Gaussian noise
    y = 2*np.sin(2*np.pi*x) + 3*np.cos(3*np.pi*x)
    y_noise = y + np.random.randn(len(x))
    return y, y_noise

def least_squares_approx(y_noise, x, M):
    # design matrix with columns 1, sin(x), cos(x), ..., sin(Mx), cos(Mx)
    columns = [np.ones(len(x))]
    for i in range(1, M + 1):
        columns.append(np.sin(i*x))
        columns.append(np.cos(i*x))
    design_matrix = np.column_stack(columns)
    # formula (3): theta = (Phi^T Phi)^{-1} Phi^T y
    coef = np.linalg.solve(design_matrix.T @ design_matrix,
                           design_matrix.T @ y_noise)
    return coef

def approx_plot(coef, x, M):
    # evaluate phi(x) = a0 + sum_i (b_i*sin(ix) + a_i*cos(ix))
    y = np.ones(len(x))*coef[0]
    for i in range(1, M + 1):
        y = y + np.sin(i*x)*coef[2*i - 1] + np.cos(i*x)*coef[2*i]
    plt.plot(x, y, 'r')

if __name__ == '__main__':
    x = np.linspace(0, 4, 100)
    y, y_noise = gen_data(x)
    plt.plot(x, y, 'b')
    plt.plot(x, y_noise, 'g+')

    # the basis frequencies are integers while the data's are multiples of pi,
    # so a fairly large M is needed for a good fit on [0, 4]
    M = 8
    coef = least_squares_approx(y_noise, x, M)
    approx_plot(coef, x, M)
    plt.legend(labels=['original data', 'noise data', 'least square'], loc='upper left')
    plt.title(r'$y=2\sin(2\pi x)+3\cos(3\pi x)$')
    plt.show()

Run result:
[Figure: least-squares trigonometric fit against the noisy periodic data]
