The training data consist of $N$ inputs

$$\mathbf{X} = \begin{pmatrix}\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_N\end{pmatrix}$$
The corresponding target values are

$$\mathbf{t} = \begin{pmatrix}t_1, t_2, \cdots, t_N\end{pmatrix}$$
The model is the linear regression model

$$y(\mathbf{x}, \mathbf{w}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$$
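
To make the model concrete, here is a minimal sketch of evaluating $y(\mathbf{x}, \mathbf{w})$ for a scalar input. The polynomial basis $\phi_j(x) = x^j$ and the helper names `phi` and `y` are assumptions chosen for illustration; the derivation itself does not depend on any particular basis.

```python
import numpy as np

def phi(x, M):
    """Basis vector (phi_0(x), ..., phi_{M-1}(x)).

    Assumption: polynomial basis phi_j(x) = x**j, used only for illustration.
    """
    return np.array([x ** j for j in range(M)], dtype=float)

def y(x, w):
    """Linear-in-parameters model y(x, w) = w^T phi(x)."""
    w = np.asarray(w, dtype=float)
    return float(w @ phi(x, len(w)))

w = [1.0, 2.0, 3.0]   # illustrative weights w_0, w_1, w_2
print(y(2.0, w))      # 1 + 2*2 + 3*4 = 17.0
```

The model is linear in $\mathbf{w}$ even though it is nonlinear in $x$, which is what makes a closed-form solution possible.
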
Written as a column vector, the targets are

$$\mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{pmatrix}_{N \times 1}$$
The weight vector has $M$ components:

$$\mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{M-1} \end{pmatrix}_{M \times 1}$$
The design matrix stacks the basis vectors $\boldsymbol{\phi}^T(\mathbf{x}_n)$ as rows:

$$\mathbf{\Phi} = \begin{pmatrix} \boldsymbol{\phi}^T(\mathbf{x}_1) \\ \boldsymbol{\phi}^T(\mathbf{x}_2) \\ \vdots \\ \boldsymbol{\phi}^T(\mathbf{x}_N) \end{pmatrix} = \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \phi_2(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \phi_2(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \phi_2(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}_{N \times M}$$
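
As a rough illustration of the $N \times M$ layout, the sketch below builds a design matrix with NumPy. It assumes scalar inputs and a polynomial basis $\phi_j(x) = x^j$; the function name `design_matrix` is hypothetical.

```python
import numpy as np

def design_matrix(x, M):
    """Return the N x M matrix Phi with Phi[n, j] = phi_j(x_n).

    Assumption: polynomial basis phi_j(x) = x**j (columns 1, x, x^2, ...).
    """
    x = np.asarray(x, dtype=float)
    return np.vander(x, M, increasing=True)

x = np.array([0.0, 1.0, 2.0, 3.0])   # N = 4 scalar inputs
Phi = design_matrix(x, M=3)
print(Phi.shape)   # (4, 3): one row per data point, one column per basis function
print(Phi[2])      # row for x = 2: [1. 2. 4.]
```
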
The sum-of-squares error function is therefore

$$E_D(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left(t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\right)^2$$
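
The sum over data points can be computed in one step as $\frac{1}{2}\lVert \mathbf{t} - \mathbf{\Phi}\mathbf{w} \rVert^2$. The sketch below shows this on made-up toy data; the function name and the numbers are assumptions for illustration only.

```python
import numpy as np

def sum_of_squares_error(w, Phi, t):
    """E_D(w) = 0.5 * sum_n (t_n - w^T phi(x_n))^2, computed as 0.5 * ||t - Phi w||^2."""
    r = t - Phi @ w            # residual vector, one entry per data point
    return 0.5 * float(r @ r)

# Hypothetical toy data: N = 3 points, M = 2 basis functions (1, x)
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])
t = np.array([1.0, 3.0, 5.0])

print(sum_of_squares_error(np.array([1.0, 2.0]), Phi, t))  # exact fit: 0.0
print(sum_of_squares_error(np.array([0.0, 0.0]), Phi, t))  # 0.5 * (1 + 9 + 25) = 17.5
```
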
Differentiating with respect to $\mathbf{w}$, and writing the gradient as a row vector, gives

$$\nabla E_D(\mathbf{w}) = -\sum_{n=1}^{N} \left(t_n - \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n)\right) \boldsymbol{\phi}(\mathbf{x}_n)^T$$

Collecting the sum into matrix products,

$$\nabla E_D(\mathbf{w}) = -\left( \begin{pmatrix} t_1 & t_2 & \cdots & t_N \end{pmatrix} - \begin{pmatrix} w_0 & w_1 & \cdots & w_{M-1} \end{pmatrix} \begin{pmatrix} \boldsymbol{\phi}(\mathbf{x}_1) & \boldsymbol{\phi}(\mathbf{x}_2) & \cdots & \boldsymbol{\phi}(\mathbf{x}_N) \end{pmatrix} \right) \begin{pmatrix} \boldsymbol{\phi}^T(\mathbf{x}_1) \\ \boldsymbol{\phi}^T(\mathbf{x}_2) \\ \vdots \\ \boldsymbol{\phi}^T(\mathbf{x}_N) \end{pmatrix}$$

that is,

$$\nabla E_D(\mathbf{w}) = -\left( \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_N \end{pmatrix}^T - \begin{pmatrix} w_0 \\ w_1 \\ \vdots \\ w_{M-1} \end{pmatrix}^T \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \phi_2(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \phi_2(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \phi_2(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}^T \right) \begin{pmatrix} \phi_0(\mathbf{x}_1) & \phi_1(\mathbf{x}_1) & \phi_2(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\ \phi_0(\mathbf{x}_2) & \phi_1(\mathbf{x}_2) & \phi_2(\mathbf{x}_2) & \cdots & \phi_{M-1}(\mathbf{x}_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi_0(\mathbf{x}_N) & \phi_1(\mathbf{x}_N) & \phi_2(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N) \end{pmatrix}$$

or, compactly,

$$\nabla E_D(\mathbf{w}) = -\left(\mathbf{t}^T - \mathbf{w}^T \mathbf{\Phi}^T\right) \mathbf{\Phi}$$
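
A finite-difference check is a quick way to confirm the sign and shape of the gradient. The sketch below compares $-(\mathbf{t}^T - \mathbf{w}^T \mathbf{\Phi}^T)\mathbf{\Phi}$, which carries the leading minus sign produced by differentiating $E_D$, against central differences of $E_D$ on random data. All function names and the random test data are assumptions for illustration.

```python
import numpy as np

def error(w, Phi, t):
    """Sum-of-squares error E_D(w) = 0.5 * ||t - Phi w||^2."""
    r = t - Phi @ w
    return 0.5 * float(r @ r)

def gradient(w, Phi, t):
    """Analytic gradient -(t^T - w^T Phi^T) Phi, returned as a length-M array."""
    return -(t - Phi @ w) @ Phi

def numerical_gradient(w, Phi, t, eps=1e-6):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    g = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        g[j] = (error(w + step, Phi, t) - error(w - step, Phi, t)) / (2 * eps)
    return g

rng = np.random.default_rng(0)        # fixed seed so the check is repeatable
Phi = rng.normal(size=(5, 3))         # hypothetical N = 5, M = 3
t = rng.normal(size=5)
w = rng.normal(size=3)

matches = np.allclose(gradient(w, Phi, t), numerical_gradient(w, Phi, t), atol=1e-5)
print(matches)
```

The sign does not affect the stationary point, since setting either form to zero yields the same equation for $\mathbf{w}$.
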
Setting the gradient to zero gives

$$\mathbf{t}^T \mathbf{\Phi} = \mathbf{w}^T \mathbf{\Phi}^T \mathbf{\Phi}$$

Transposing both sides,

$$(\mathbf{t}^T \mathbf{\Phi})^T = (\mathbf{w}^T \mathbf{\Phi}^T \mathbf{\Phi})^T$$

$$\mathbf{\Phi}^T \mathbf{t} = \mathbf{\Phi}^T \mathbf{\Phi} \mathbf{w}$$

so, provided $\mathbf{\Phi}^T \mathbf{\Phi}$ is invertible,

$$\mathbf{w}_{ML} = (\mathbf{\Phi}^T \mathbf{\Phi})^{-1} \mathbf{\Phi}^T \mathbf{t}$$
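
The closed-form solution can be sketched numerically as follows. The synthetic data, the polynomial basis, and the variable names are assumptions for illustration. The code solves the normal equations $\mathbf{\Phi}^T \mathbf{\Phi} \mathbf{w} = \mathbf{\Phi}^T \mathbf{t}$ with `np.linalg.solve` rather than forming the inverse explicitly, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: quadratic trend plus small Gaussian noise
x = np.linspace(0.0, 1.0, 20)
Phi = np.vander(x, 3, increasing=True)          # columns: 1, x, x^2
w_true = np.array([0.5, -1.0, 2.0])
t = Phi @ w_true + 0.001 * rng.normal(size=x.size)

# Normal equations: (Phi^T Phi) w = Phi^T t
w_ml = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

# Cross-check with NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)

print(w_ml)
print(np.allclose(w_ml, w_lstsq))
```

When $\mathbf{\Phi}^T \mathbf{\Phi}$ is close to singular, a least-squares routine such as `np.linalg.lstsq` is generally the more numerically stable choice than solving the normal equations directly.
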