Machine Learning II: Least Squares Learning

十块钱什么的

Article link: https://blog.csdn.net/ShihEuan/article/details/80664347

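Least Squares Learning

0. Contents

This post covers the theoretical derivation of least squares for linear models, followed by a verification in Python.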

1. Derivation

In a regression problem, least squares learning is the method that learns the parameter $\theta$ minimizing the squared error between the model outputs $f_\theta(\vec x_i)$ and the training outputs $\{y_i\}_{i=1}^n$:

$J_{LS}(\theta)=\frac{1}{2}\sum_{i=1}^n(f_\theta(\vec x_i)-y_i)^2$

Stacking the residuals into the vector $\vec r=(f_\theta(\vec x_1)-y_1,\cdots,f_\theta(\vec x_n)-y_n)^T$, the sum of squares is exactly the squared $l_2$ norm of $\vec r$:

$\|\vec r\|_{2}^2 =(\sqrt{\vec r^T\vec r})^2 =\vec r^T\vec r =\sum_{i=1}^n(f_\theta(\vec x_i)-y_i)^2$

For this reason the method is also called $l_2$-loss minimization learning.
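The identity between the sum of squared errors and the squared $l_2$ norm can be checked numerically. The following is a minimal sketch; the function names and the sample values of `f` and `y` are made up for illustration and do not come from the original program.

```python
import numpy as np


def squared_error_sum(f, y):
    """J_LS written as an explicit sum over the samples."""
    return 0.5 * sum((fi - yi) ** 2 for fi, yi in zip(f, y))


def squared_error_norm(f, y):
    """J_LS written as half the squared l2 norm of the residual vector."""
    r = np.asarray(f) - np.asarray(y)
    return 0.5 * np.linalg.norm(r) ** 2


f = np.array([1.0, 2.0, 3.0])  # hypothetical model outputs
y = np.array([1.5, 1.0, 3.0])  # hypothetical training outputs

# Residuals are (-0.5, 1.0, 0.0), so both forms give 0.5 * 1.25 = 0.625.
print(squared_error_sum(f, y), squared_error_norm(f, y))
```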

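To find the minimum, we differentiate and look for the $\theta$ at which the derivative is $0$. The factor $\frac{1}{2}$ in the squared error is there only to cancel the $2$ produced by differentiation.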

For the linear model $f_\theta(\vec x)=\vec\theta^T\phi(\vec x)$, we state the result first:

$\nabla_\theta J_{LS}=(\frac{\partial J_{LS}}{\partial\theta_1},\frac{\partial J_{LS}}{\partial\theta_2},\cdots,\frac{\partial J_{LS}}{\partial\theta_b})^T=\Phi^T\Phi\vec\theta-\Phi^T\vec y$

Derivation. Let $\Phi$ be the $n\times b$ design matrix with entries $\Phi_{ij}=\phi_j(\vec x_i)$ and let $\vec y=(y_1,\cdots,y_n)^T$, so that the $n$ model outputs are $\Phi\vec\theta$. Then:

$J_{LS}(\vec\theta)=\frac{1}{2}\|\Phi\vec\theta-\vec y\|^2$

$=\frac{1}{2}\left\| \begin{pmatrix} \phi_1(\vec x_1) & \cdots & \phi_b(\vec x_1)\\ \vdots & \ddots & \vdots\\ \phi_1(\vec x_n) & \cdots & \phi_b(\vec x_n) \end{pmatrix} \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_b \end{pmatrix} - \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \right\|^2$

$=\frac{1}{2}\left\| \begin{pmatrix} \theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1\\ \vdots \\ \theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n \end{pmatrix} \right\|^2$

By the definition of the $l_2$ norm:

$=\frac{1}{2}\left(\sqrt{ \begin{pmatrix} \theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1\\ \vdots \\ \theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n \end{pmatrix} ^T \begin{pmatrix} \theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1\\ \vdots \\ \theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n \end{pmatrix} }\right)^2$

$=\frac{1}{2} \begin{pmatrix} \theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1\\ \vdots \\ \theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n \end{pmatrix} ^T \begin{pmatrix} \theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1\\ \vdots \\ \theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n \end{pmatrix}$

$=\frac{1}{2}[(\theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1)^2+\cdots+(\theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n)^2]$

Therefore, differentiating with respect to each $\theta_j$ by the chain rule,

$\nabla_\theta J_{LS}=\frac{1}{2} \begin{pmatrix} 2(\theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1)\phi_1(\vec x_1)+\cdots+2(\theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n)\phi_1(\vec x_n) \\ \vdots \\ 2(\theta_1\phi_1(\vec x_1) + \cdots + \theta_b\phi_b(\vec x_1) - y_1)\phi_b(\vec x_1)+\cdots+2(\theta_1\phi_1(\vec x_n) + \cdots + \theta_b\phi_b(\vec x_n) - y_n)\phi_b(\vec x_n) \end{pmatrix}$

Using matrix multiplication, the factors $\phi_j(\vec x_i)$ produced by differentiation can be pulled out into a matrix, which is exactly $\Phi^T$:

$\nabla_\theta J_{LS}= \begin{pmatrix} \phi_1(\vec x_1) & \cdots & \phi_1(\vec x_n)\\ \vdots & \ddots & \vdots \\ \phi_b(\vec x_1) & \cdots & \phi_b(\vec x_n) \end{pmatrix} \cdot (\Phi\vec\theta-\vec y) =\Phi^T\Phi\vec\theta-\Phi^T\vec y$

This is the stated result.
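The closed-form gradient can be sanity-checked against a finite-difference approximation. The sketch below is only an illustration: the random `Phi`, `y`, and `theta`, the helper names, and the step size `eps` are all assumptions introduced here, not part of the original post.

```python
import numpy as np


def J_LS(theta, Phi, y):
    """Squared-error objective 0.5 * ||Phi theta - y||^2."""
    r = Phi @ theta - y
    return 0.5 * (r @ r)


def grad_J_LS(theta, Phi, y):
    """Closed-form gradient Phi^T Phi theta - Phi^T y."""
    return Phi.T @ Phi @ theta - Phi.T @ y


def numerical_grad(theta, Phi, y, eps=1e-6):
    """Central-difference approximation of the gradient, one coordinate at a time."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        g[j] = (J_LS(theta + e, Phi, y) - J_LS(theta - e, Phi, y)) / (2 * eps)
    return g


rng = np.random.default_rng(0)
Phi = rng.normal(size=(20, 4))   # hypothetical design matrix, n = 20, b = 4
y = rng.normal(size=20)          # hypothetical targets
theta = rng.normal(size=4)       # arbitrary point at which to compare gradients

max_diff = np.max(np.abs(grad_J_LS(theta, Phi, y) - numerical_grad(theta, Phi, y)))
print(max_diff)  # should be tiny, limited only by floating-point rounding
```

Because $J_{LS}$ is quadratic in $\vec\theta$, the central difference is exact up to rounding, so any noticeable gap would point to a mistake in the formula.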

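Solving $\Phi^T\Phi\vec\theta-\Phi^T\vec y=0$ yields the parameter $\vec\theta$ learned by least squares. In most cases the number of samples is far larger than the number of parameters ($n \gg b$), so $\Phi$ is a tall $n\times b$ matrix rather than a square one, and it has no ordinary inverse. We therefore need a new concept, the pseudoinverse (generalized inverse) $\Phi^\dagger$, and write the solution as $\vec\theta_{LS}=\Phi^\dagger\vec y$. When $\Phi^T\Phi$ is invertible, $\Phi^\dagger=(\Phi^T\Phi)^{-1}\Phi^T$.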
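To make the role of the pseudoinverse concrete, here is a small sketch comparing three ways of obtaining the least squares parameters with NumPy: `np.linalg.pinv`, solving the normal equations with `np.linalg.solve`, and `np.linalg.lstsq`. The data (`Phi`, `theta_true`, the noise level) are made up for illustration, and the example assumes a well-conditioned problem where all three routes agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b = 50, 3                              # many more samples than parameters
Phi = rng.normal(size=(n, b))             # tall matrix: no ordinary inverse
theta_true = np.array([1.0, -2.0, 0.5])   # hypothetical ground-truth parameters
y = Phi @ theta_true + 0.01 * rng.normal(size=n)

# 1) Pseudoinverse applied directly to Phi.
theta_pinv = np.linalg.pinv(Phi) @ y

# 2) Normal equations Phi^T Phi theta = Phi^T y (valid when Phi^T Phi is invertible).
theta_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# 3) NumPy's least squares solver.
theta_lstsq, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# At the solution the gradient Phi^T Phi theta - Phi^T y should vanish.
grad = Phi.T @ Phi @ theta_pinv - Phi.T @ y

print(theta_pinv)
print(np.allclose(theta_pinv, theta_normal), np.allclose(theta_pinv, theta_lstsq))
```

When $\Phi^T\Phi$ is singular or badly conditioned, the normal-equation route can fail or lose accuracy, which is one reason to prefer `pinv` or `lstsq` in practice.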

2. Practice

Having gone through the theory of least squares, let's reinforce it with a Python program.
Following the linear-model derivation above, the program uses trigonometric polynomials as basis functions:

$\phi(x) = (1,\sin\frac{x}{2},\cos\frac{x}{2},\sin\frac{2x}{2},\cos\frac{2x}{2},\cdots,\sin\frac{15x}{2},\cos\frac{15x}{2})^T$
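As a sketch of how this basis turns into a design matrix, the helper below builds $\Phi$ row by row for a 1-D input array. The function name `trig_design_matrix` and its `order` parameter are illustrative choices made here; the column ordering (constant, then alternating sine and cosine) follows the formula above.

```python
import numpy as np


def trig_design_matrix(x, order=15):
    """Return the (len(x), 2*order + 1) matrix whose i-th row is phi(x_i)."""
    x = np.asarray(x, dtype=float).ravel()
    Phi = np.ones((x.size, 2 * order + 1))   # column 0 is the constant basis 1
    for k in range(1, order + 1):
        Phi[:, 2 * k - 1] = np.sin(k * x / 2)  # odd columns: sin(kx/2)
        Phi[:, 2 * k] = np.cos(k * x / 2)      # even columns: cos(kx/2)
    return Phi


Phi = trig_design_matrix(np.linspace(-3.0, 3.0, 50))
print(Phi.shape)  # (50, 31)
```

With `order=15` there are $2\cdot 15+1=31$ basis functions, so 50 samples give a $50\times 31$ matrix.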

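The data are generated from an approximately sinc-shaped function, $\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$ (the code also adds a linear term $0.1x$), plus random noise.

The program: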

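    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-

    __author__ = 'shiheuan'

    import numpy as np
    import matplotlib.pyplot as plt


    def LeastSquares():
        N = 1000  # number of points used to draw the fitted curve
        n = 50    # number of training samples, len(x)
        x = np.array([np.linspace(-3., 3., n)])  # training inputs, shape (1, n)
        X = np.array([np.linspace(-3., 3., N)])  # plotting inputs, shape (1, N)

        # underlying shape (sinc plus a linear trend) + random noise;
        # np.sinc(x) computes sin(pi*x)/(pi*x) and is safe at x = 0
        y = np.sinc(x) + 0.1*x + 0.05*np.random.randn(1, n)

        # design matrices, built transposed; row 0 stays 1 (constant basis)
        P = np.ones([31, N])
        p = np.ones([31, n])

        for i in range(15):
            P[2*i+1, :] = np.sin((i+1)/2*X)
            P[2*i+2, :] = np.cos((i+1)/2*X)
            p[2*i+1, :] = np.sin((i+1)/2*x)
            p[2*i+2, :] = np.cos((i+1)/2*x)

        p = p.T  # Phi for the training inputs, shape (n, 31)
        P = P.T  # Phi for the plotting inputs, shape (N, 31)

        # normal equations: theta = (Phi^T Phi)^+ Phi^T y
        pp = np.dot(p.T, p)
        # pinv vs. inv: pinv still returns a result when Phi^T Phi is
        # singular or ill-conditioned, where inv fails or is unstable
        ppi = np.linalg.pinv(pp)
        pp_ = np.dot(ppi, p.T)
        t = np.dot(pp_, y.T)  # learned parameters, shape (31, 1)

        F = np.dot(P, t).T  # fitted curve on the plotting grid, shape (1, N)

        plt.plot(x[0, :], y[0, :], '.')  # noisy training samples
        plt.plot(X[0, :], F[0, :])       # least squares fit
        plt.show()

    LeastSquares()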