The input and output satisfy a linear relationship, and the output takes continuous values.
Hypothesis function:

$$
h(\vec{x}) = \vec{\theta}^T\vec{x}
$$
where:

$$
\begin{aligned}
\vec{x} &= [x_0, x_1, \dots, x_n]^T \in \mathbb{R}^{(n+1)\times 1} \\
\vec{\theta} &= [\theta_0, \theta_1, \dots, \theta_n]^T \in \mathbb{R}^{(n+1)\times 1}
\end{aligned}
$$

with $x_0 = 1$ and $n$ the number of features.
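As a minimal sketch, the hypothesis is a single dot product in NumPy (the values here are illustrative, not from the original):

```python
import numpy as np

# Hypothetical example with n = 2 features plus the intercept entry x_0 = 1.
x = np.array([1.0, 2.0, 3.0])        # x = [x_0, x_1, x_2]^T
theta = np.array([0.5, 1.0, -2.0])   # theta = [theta_0, theta_1, theta_2]^T

h = theta @ x                        # h(x) = theta^T x
print(h)                             # 0.5*1 + 1.0*2 - 2.0*3 = -3.5
```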
Cost function:

$$
J(\vec{\theta}) = \frac{1}{2m}\sum_{i=1}^{m}\left(h(\vec{x}^{(i)})-y^{(i)}\right)^2
$$
where:

$$
\vec{y} = [y^{(1)}, y^{(2)}, \dots, y^{(m)}]^T \in \mathbb{R}^{m\times 1}
$$

with $m$ the number of training samples.
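A minimal vectorized sketch of this cost, assuming a design matrix `X` whose $i$-th row is $(\vec{x}^{(i)})^T$ (variable names are illustrative):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (theta^T x_i - y_i)^2, vectorized."""
    m = y.shape[0]
    residuals = X @ theta - y               # shape (m,)
    return residuals @ residuals / (2 * m)

# Tiny check: a perfect fit has zero cost.
X = np.array([[1.0, 1.0], [1.0, 2.0]])      # first column is x_0 = 1
y = np.array([3.0, 5.0])                    # y = 1 + 2*x_1 exactly
print(cost(np.array([1.0, 2.0]), X, y))     # 0.0
```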
Gradient descent updates $\theta$ along the negative gradient direction:

$$
\theta_j := \theta_j - \alpha\frac{\partial J(\vec{\theta})}{\partial \theta_j}
$$
Because $J(\vec{\theta})$ is convex, it has a global minimum. When $\partial J(\vec{\theta})/\partial \theta_j > 0$, $\theta_j$ lies to the right of the optimum and the update decreases it; when $\partial J(\vec{\theta})/\partial \theta_j < 0$, $\theta_j$ lies to the left of the optimum and the update increases it. Either way the update moves $\theta_j$ toward the minimum.
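The sign argument can be checked on a one-dimensional quadratic; this toy sketch (not from the original) starts on both sides of the minimum at $\theta = 2$ and converges from either direction:

```python
# Toy cost J(theta) = (theta - 2)^2, gradient dJ/dtheta = 2*(theta - 2).
alpha = 0.3
for theta in (5.0, -1.0):        # start right, then left, of the optimum
    for _ in range(20):
        theta -= alpha * 2 * (theta - 2)
    print(theta)                 # both runs end up close to 2.0
```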
$$
\begin{aligned}
\theta_j :&= \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(\vec{\theta}^T\vec{x}^{(i)}-y^{(i)}\right)x_j^{(i)} \\
&= \theta_j - \frac{\alpha}{m}
\begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} & \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} & \cdots & \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix} \\
&= \theta_j - \frac{\alpha}{m}
\begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} \\ \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} \\ \vdots \\ \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}^T
\begin{pmatrix} x_j^{(1)} \\ x_j^{(2)} \\ \vdots \\ x_j^{(m)} \end{pmatrix}
\end{aligned}
$$
Therefore, stacking all $n+1$ components:

$$
\begin{aligned}
\begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T
&:= \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T
- \frac{\alpha}{m}
\begin{pmatrix} \vec{\theta}^T\vec{x}^{(1)}-y^{(1)} \\ \vec{\theta}^T\vec{x}^{(2)}-y^{(2)} \\ \vdots \\ \vec{\theta}^T\vec{x}^{(m)}-y^{(m)} \end{pmatrix}^T
\begin{pmatrix}
x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\
x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\
\vdots & \vdots & \ddots & \vdots \\
x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)}
\end{pmatrix} \\
&:= \begin{pmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}^T
- \frac{\alpha}{m}
\begin{pmatrix} (\vec{x}^{(1)})^T\vec{\theta}-y^{(1)} \\ (\vec{x}^{(2)})^T\vec{\theta}-y^{(2)} \\ \vdots \\ (\vec{x}^{(m)})^T\vec{\theta}-y^{(m)} \end{pmatrix}^T
X
\end{aligned}
$$

where $X \in \mathbb{R}^{m\times(n+1)}$ is the design matrix whose $i$-th row is $(\vec{x}^{(i)})^T$.
This gives the vectorized update for $\vec{\theta}$:

$$
\begin{aligned}
\vec{\theta}^T &:= \vec{\theta}^T - \frac{\alpha}{m}(X\vec{\theta}-\vec{y})^T X \\
\vec{\theta} &:= \vec{\theta} - \frac{\alpha}{m}X^T(X\vec{\theta}-\vec{y})
\end{aligned}
$$
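The equivalence of the summation form and this matrix form can be sanity-checked numerically; the sketch below (illustrative names, random data) compares one component-wise update loop against the vectorized update:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, alpha = 5, 3, 0.1
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # column x_0 = 1
y = rng.normal(size=m)
theta = rng.normal(size=n + 1)

# Component form: theta_j -= alpha/m * sum_i (theta^T x_i - y_i) * x_j^(i)
residuals = X @ theta - y                  # computed once: simultaneous update
loop_theta = theta.copy()
for j in range(n + 1):
    loop_theta[j] -= alpha / m * sum(residuals[i] * X[i, j] for i in range(m))

# Vectorized form: theta := theta - alpha/m * X^T (X theta - y)
vec_theta = theta - alpha / m * X.T @ (X @ theta - y)

print(np.allclose(loop_theta, vec_theta))  # True
```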
The code implementation is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt


def linear_regression(x_in, y_in, alpha=0.01, epsilon=1e-5):
    sample_num = y_in.shape[0]
    return gradient_descent(sample_num, x_in, y_in, cost_function_lr, alpha, epsilon)


def cost_function_lr(sample_num, theta, x_in, y_in):
    diff = (x_in * theta) - y_in                    # residuals X*theta - y
    j_theta = diff.T * diff / (2 * sample_num)      # J(theta) = 1/(2m) * sum(diff^2)
    partial_theta = (x_in.T * diff) / sample_num    # gradient 1/m * X^T * diff
    return (j_theta, partial_theta)


def gradient_descent(sample_num, x_in, y_in, cost_function, alpha, epsilon):
    theta = np.mat(np.zeros((x_in.shape[1], 1)))
    pre_j_theta = np.inf
    count = 5000
    while count:
        (j_theta, partial_theta) = cost_function(sample_num, theta, x_in, y_in)
        # Stop when the cost, or every gradient component, is small enough.
        if j_theta < epsilon or (np.fabs(partial_theta) < epsilon).all():
            break
        theta -= alpha * partial_theta
        # If the cost increased, the step was too large: shrink it.
        if j_theta > pre_j_theta:
            alpha /= 10
        pre_j_theta = j_theta
        count -= 1
    if not count:
        print('get max count')
    return theta


if __name__ == '__main__':
    # Build m samples of y = 5 + 0.5*x1 plus Gaussian noise.
    m = 30
    x0 = np.ones((m, 1))                    # intercept column x_0 = 1
    x1 = np.arange(1, m + 1).reshape(m, 1)
    x_in = np.mat(np.hstack((x0, x1)))      # design matrix X, shape (m, 2)
    theta = np.mat([5, 0.5]).reshape(2, 1)
    y_in = x_in * theta + np.random.randn(m).reshape(m, 1)

    res_theta_gd = linear_regression(x_in, y_in, 0.001, 1e-5)

    plt.scatter(x1, np.array(y_in))
    plt.plot(x1, np.array(x_in * res_theta_gd), color='r')
    diff = x_in * res_theta_gd - y_in
    plt.title('cost: %f' % ((diff.T * diff)[0, 0] / (2 * m)))
    plt.show()
```
Result analysis: the script plots the sample points together with the fitted line, and reports the final cost in the figure title.
Normalization: rescaling each feature of $\vec{x}$ to a common range speeds up convergence.

$$
x_j = \frac{x_j - \min_i\left(x_j^{(i)}\right)}{\max_i\left(x_j^{(i)}\right) - \min_i\left(x_j^{(i)}\right)}
$$
Standardization:

$$
x_j = \frac{x_j - \mu}{\sigma}
$$
where:

$$
\begin{aligned}
&i = 1, 2, \dots, m && \text{($m$ is the number of samples)} \\
&j = 1, 2, \dots, n && \text{($n$ is the number of features; the intercept feature $x_0 = 1$ is left unscaled)} \\
&\mu \text{ is the feature mean, } \sigma \text{ the feature standard deviation}
\end{aligned}
$$
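A minimal sketch of both rescalings, applied column-wise to the non-intercept features (function names are illustrative, not from the original):

```python
import numpy as np

def min_max_scale(X):
    """Map each feature column to [0, 1]; skip the intercept column X[:, 0]."""
    out = X.copy().astype(float)
    cols = out[:, 1:]
    out[:, 1:] = (cols - cols.min(axis=0)) / (cols.max(axis=0) - cols.min(axis=0))
    return out

def standardize(X):
    """Shift each feature column to zero mean and unit variance; skip X[:, 0]."""
    out = X.copy().astype(float)
    cols = out[:, 1:]
    out[:, 1:] = (cols - cols.mean(axis=0)) / cols.std(axis=0)
    return out

X = np.array([[1.0, 10.0], [1.0, 20.0], [1.0, 40.0]])
print(min_max_scale(X))   # second column becomes [0, 1/3, 1]
print(standardize(X))     # second column has mean 0 and std 1
```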