Deriving the Parameter Vector Formulas for Linear Regression and Ridge Regression


Arsener_gong


Article link: https://blog.csdn.net/qq_38032064/article/details/90379935


Linear regression is a widely used regression algorithm in machine learning. It learns a regression function from inputs and outputs by determining the parameter vector $\mathbf{w}$ and the intercept $b$. For a new sample $\mathbf{x}$, the predicted value is $\hat{y}=\mathbf{x}\mathbf{w}+b$, where $\mathbf{x}=[x_1,x_2,...,x_{n-1}]$ is an $(n-1)$-dimensional row vector and $\mathbf{w}=[w_1,w_2,...,w_{n-1}]^\mathrm{T}$ is an $(n-1)$-dimensional column vector.
For convenience, $b$ is usually absorbed into $\mathbf{w}$ and a 1 is appended to the end of $\mathbf{x}$, so that $\mathbf{w}$, $\mathbf{x}$, and $\hat{y}$ become:
$\mathbf{w}=[w_1,w_2,...,w_{n-1},b]^\mathrm{T}$
$\mathbf{x}=[x_1,x_2,...,x_{n-1},1]$
$\hat{y}=\mathbf{x}\mathbf{w}$
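The equivalence of the two forms can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names `augment` and `predict` and all numeric values are made up for illustration.

```python
def augment(x):
    """Append a constant 1 to the feature row vector x."""
    return list(x) + [1.0]


def predict(x_aug, w_aug):
    """Inner product of the augmented row vector and the augmented parameter vector."""
    return sum(xi * wi for xi, wi in zip(x_aug, w_aug))


# Hypothetical values, chosen only for illustration.
w = [2.0, -1.0]   # weights w_1, w_2
b = 0.5           # intercept
x = [3.0, 4.0]    # one sample with n - 1 = 2 features

y_explicit = sum(xi * wi for xi, wi in zip(x, w)) + b   # x w + b
y_augmented = predict(augment(x), w + [b])              # [x, 1] [w; b]
print(y_explicit, y_augmented)
```

Both expressions compute the same sum, so absorbing $b$ into $\mathbf{w}$ changes only the notation, not the prediction.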
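The cost function of linear regression is:
$\frac{1}{2}\sum_{i=1}^m(y_i-\hat{y}_i)^2$
where $m$ is the number of samples. The cost function shows that linear regression seeks to minimize the sum of squared errors, that is, to find the line (a hyperplane when there is more than one feature) that fits the sample points as closely as possible. The underlying principle is not discussed further here; this article focuses on deriving the formula for $\mathbf{w}$.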
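As a rough illustration of the cost function, the sketch below evaluates it on a tiny data set. The function name `cost` and the toy data are assumptions made for this example; each row of `X` is taken to be already augmented with a trailing 1.

```python
def cost(X, Y, w):
    """Half the sum of squared errors: 1/2 * sum_i (y_i - x_i w)^2."""
    total = 0.0
    for x_i, y_i in zip(X, Y):
        y_hat = sum(a * c for a, c in zip(x_i, w))   # prediction x_i w
        total += (y_i - y_hat) ** 2
    return 0.5 * total


# Hypothetical toy data generated by y = 2x + 1 with no noise.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y = [1.0, 3.0, 5.0]

cost_exact = cost(X, Y, [2.0, 1.0])   # parameters that fit the data exactly
cost_off = cost(X, Y, [1.0, 1.0])     # a worse slope leaves nonzero residuals
print(cost_exact, cost_off)
```

The exact parameters give zero cost, while any other choice gives a strictly larger value on this data.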
Before the derivation, a brief review of the rules of matrix differentiation is needed.

Matrix differentiation rules

Derivative of a row vector with respect to a scalar
Let $\mathbf{y}=\begin{bmatrix}y_{1},y_{2},\dots,y_{n}\end{bmatrix}$ be an $n$-dimensional row vector and $x$ a scalar; then $\frac{\partial\mathbf{y}}{\partial x}=\begin{bmatrix}\frac{\partial y_{1}}{\partial x},\frac{\partial y_{2}}{\partial x},\dots,\frac{\partial y_{n}}{\partial x} \end{bmatrix}$
Derivative of a column vector with respect to a scalar
Let $\mathbf{y}=\begin{bmatrix}y_{1}\\ y_{2}\\ \vdots\\ y_{n} \end{bmatrix}$ be an $n$-dimensional column vector and $x$ a scalar; then $\frac{\partial\mathbf{y}}{\partial x}=\begin{bmatrix}\frac{\partial y_{1}}{\partial x}\\ \frac{\partial y_{2}}{\partial x}\\ \vdots\\ \frac{\partial y_{n}}{\partial x} \end{bmatrix}$
Derivative of a scalar with respect to a row vector
Let $y$ be a scalar and $\mathbf{x}=\begin{bmatrix}x_{1},x_{2},\dots,x_{n}\end{bmatrix}$ an $n$-dimensional row vector; then $\frac{\partial y}{\partial \mathbf{x}}=\begin{bmatrix}\frac{\partial y}{\partial x_{1}},\frac{\partial y}{\partial x_{2}},\dots,\frac{\partial y}{\partial x_{n}} \end{bmatrix}$
Derivative of a scalar with respect to a column vector
Let $y$ be a scalar and $\mathbf{x}=\begin{bmatrix}x_{1}\\ x_{2}\\ \vdots\\ x_{n} \end{bmatrix}$ an $n$-dimensional column vector; then $\frac{\partial y}{\partial \mathbf{x}}=\begin{bmatrix}\frac{\partial y}{\partial x_{1}}\\ \frac{\partial y}{\partial x_{2}}\\ \vdots\\ \frac{\partial y}{\partial x_{n}} \end{bmatrix}$
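As a small worked example of the last rule (the function is chosen only for illustration), take $y=x_1^2+3x_2$ with the column vector $\mathbf{x}=\begin{bmatrix}x_{1}\\ x_{2}\end{bmatrix}$. Differentiating with respect to each component in turn gives a column vector of the same shape as $\mathbf{x}$:
$\frac{\partial y}{\partial \mathbf{x}}=\begin{bmatrix}\frac{\partial y}{\partial x_{1}}\\ \frac{\partial y}{\partial x_{2}}\end{bmatrix}=\begin{bmatrix}2x_{1}\\ 3\end{bmatrix}$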

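The above covers only derivatives between scalars and vectors. For a more detailed treatment of matrix calculus, see the following two blog posts:

The images in the second post are taken from Wikipedia, so a proxy may be needed to view them.
Several important differentiation formulas are summarized below (results are in denominator layout; the numerator-layout result is the transpose of the denominator-layout one):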

$\frac{\partial Ax}{\partial x}= A^\mathrm{T}$
$\frac{\partial x^\mathrm TA}{\partial x}= A$
$\frac{\partial x^\mathrm TAx}{\partial x}= Ax+A^\mathrm{T}x$

Here $x$ is a column vector and $A$ is a matrix that does not depend on $x$ (square in the third formula).
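The third formula is the one the derivations below lean on most, and it can be sanity-checked numerically. The following is a minimal sketch in plain Python that compares $Ax+A^\mathrm{T}x$ with a central-difference estimate of the gradient of $x^\mathrm{T}Ax$; the helper names and the matrix entries are hypothetical, and $A$ is deliberately non-symmetric so that both terms matter.

```python
def matvec(A, x):
    """Matrix-vector product A x, with A as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def quad_form(A, x):
    """Scalar x^T A x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


def numeric_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx, one entry per component of x."""
    grad = []
    for k in range(len(x)):
        hi = x[:k] + [x[k] + eps] + x[k + 1:]
        lo = x[:k] + [x[k] - eps] + x[k + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


A = [[1.0, 2.0], [0.0, 3.0]]   # hypothetical, deliberately non-symmetric
x = [1.0, -2.0]

# Analytic gradient from the formula: A x + A^T x
analytic = [p + q for p, q in zip(matvec(A, x), matvec(transpose(A), x))]
numeric = numeric_grad(lambda v: quad_form(A, v), x)
print(analytic, numeric)
```

The two gradients should agree up to floating-point rounding. When $A$ is symmetric the formula reduces to $2Ax$, which is the case used for $\mathbf X^\mathrm T\mathbf X$ below.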

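Deriving the parameter vector for linear regression

The derivation uses denominator layout. Let $\mathbf X$ be the $m\times n$ matrix whose $i$-th row is the augmented sample $\mathbf x_i$, and let $\mathbf Y=[y_1,y_2,...,y_m]^\mathrm T$ be the column vector of target values. First, write the cost function in matrix form as a function of $\mathbf w$:
$\begin{aligned} J(\mathbf w) &=\frac{1}{2}\sum_{i=1}^m(y_i-\hat{y}_i)^2\\ &=\frac{1}{2}\sum_{i=1}^m(y_i-\mathbf x_i\mathbf{w})^2\\ &=\frac{1}{2}(\mathbf Y-\mathbf X\mathbf w)^\mathrm T(\mathbf Y-\mathbf X\mathbf w)\\ &=\frac{1}{2}(\mathbf Y^\mathrm T-\mathbf w^\mathrm T\mathbf X^\mathrm T)(\mathbf Y-\mathbf X\mathbf w)\\ &=\frac{1}{2}(\mathbf w^\mathrm T\mathbf X^\mathrm T\mathbf X\mathbf w-\mathbf w^\mathrm T\mathbf X^\mathrm T\mathbf Y - \mathbf Y^\mathrm T\mathbf X\mathbf w + \mathbf Y^\mathrm T\mathbf Y) \end{aligned}$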
$\begin{aligned} \frac{\partial J(\mathbf w)}{\partial\mathbf w}&=\frac{1}{2}[\mathbf X^\mathrm T\mathbf X\mathbf w + (\mathbf X^\mathrm T\mathbf X)^\mathrm T\mathbf w-\mathbf X^\mathrm T\mathbf Y-(\mathbf Y^\mathrm T\mathbf X)^\mathrm T]\\ &=\mathbf X^\mathrm T\mathbf X\mathbf w-\mathbf X^\mathrm T\mathbf Y \end{aligned}$
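For readers who want the intermediate steps, the gradient can be taken term by term with the three denominator-layout formulas from the previous section. The first term uses the quadratic-form rule with $A=\mathbf X^\mathrm T\mathbf X$, which is symmetric; the second uses the rule for $x^\mathrm TA$ with $A=\mathbf X^\mathrm T\mathbf Y$; the third uses the rule for $Ax$ with $A=\mathbf Y^\mathrm T\mathbf X$; and the last term does not depend on $\mathbf w$:
$\begin{aligned} \frac{\partial (\mathbf w^\mathrm T\mathbf X^\mathrm T\mathbf X\mathbf w)}{\partial\mathbf w}&=\mathbf X^\mathrm T\mathbf X\mathbf w+(\mathbf X^\mathrm T\mathbf X)^\mathrm T\mathbf w=2\mathbf X^\mathrm T\mathbf X\mathbf w\\ \frac{\partial (\mathbf w^\mathrm T\mathbf X^\mathrm T\mathbf Y)}{\partial\mathbf w}&=\mathbf X^\mathrm T\mathbf Y\\ \frac{\partial (\mathbf Y^\mathrm T\mathbf X\mathbf w)}{\partial\mathbf w}&=(\mathbf Y^\mathrm T\mathbf X)^\mathrm T=\mathbf X^\mathrm T\mathbf Y\\ \frac{\partial (\mathbf Y^\mathrm T\mathbf Y)}{\partial\mathbf w}&=\mathbf 0 \end{aligned}$
Combining these with the factor $\frac{1}{2}$ gives $\mathbf X^\mathrm T\mathbf X\mathbf w-\mathbf X^\mathrm T\mathbf Y$.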
Setting this to zero gives
$\mathbf X^\mathrm T\mathbf X\mathbf w-\mathbf X^\mathrm T\mathbf Y=0$
$\mathbf w=(\mathbf X^\mathrm T\mathbf X)^{-1}\mathbf X^\mathrm T\mathbf Y$
provided that $\mathbf X^\mathrm T\mathbf X$ is invertible.
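The closed-form solution can be tried out on a small example. The sketch below is dependency-free Python and is only one possible way to do this: the helpers `solve` and `fit_linear` and the toy data are assumptions for illustration. It solves the linear system $\mathbf X^\mathrm T\mathbf X\mathbf w=\mathbf X^\mathrm T\mathbf Y$ by Gaussian elimination instead of forming the inverse explicitly, which yields the same $\mathbf w$.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product A B, with matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * c for a, c in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):   # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def fit_linear(X, Y):
    """Solve the normal equations X^T X w = X^T Y for w."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(a * y for a, y in zip(row, Y)) for row in Xt]
    return solve(XtX, XtY)


# Hypothetical toy data generated by y = 2x + 1; the last column of X is the appended 1.
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
Y = [1.0, 3.0, 5.0, 7.0]
w = fit_linear(X, Y)
print(w)
```

On this noise-free data the recovered vector should be the slope followed by the intercept. In practice a numerical library routine such as NumPy's `numpy.linalg.solve` would normally replace the hand-written solver.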

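Deriving the parameter vector for ridge regression

Ridge regression is linear regression with L2 regularization: an L2 penalty on the parameter vector is added to the cost function, which becomes: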
$\begin{aligned} J(\mathbf w) &=\frac{1}{2}(\mathbf Y^\mathrm T-\mathbf w^\mathrm T\mathbf X^\mathrm T)(\mathbf Y-\mathbf X\mathbf w)+\frac{\lambda}{2}||\mathbf w||^2\\ &=\frac{1}{2}(\mathbf Y^\mathrm T-\mathbf w^\mathrm T\mathbf X^\mathrm T)(\mathbf Y-\mathbf X\mathbf w) + \frac{\lambda}{2}\mathbf w^\mathrm T\mathbf w \end{aligned}$
$\begin{aligned} \frac{\partial J(\mathbf w)}{\partial\mathbf w}&=\mathbf X^\mathrm T\mathbf X\mathbf w-\mathbf X^\mathrm T\mathbf Y+\frac{\lambda}{2}(\mathbf w+\mathbf w)\\ &=\mathbf X^\mathrm T\mathbf X\mathbf w-\mathbf X^\mathrm T\mathbf Y+\lambda\mathbf w\\ &=(\mathbf X^\mathrm T\mathbf X+\lambda\mathbf I)\mathbf w-\mathbf X^\mathrm T\mathbf Y \end{aligned}$
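Setting this to zero gives
$(\mathbf X^\mathrm T\mathbf X+\lambda\mathbf I)\mathbf w-\mathbf X^\mathrm T\mathbf Y=0$
$\mathbf w=(\mathbf X^\mathrm T\mathbf X+\lambda\mathbf I)^{-1}\mathbf X^\mathrm T\mathbf Y$
The $\lambda\mathbf I$ term in this formula is the "ridge" in ridge regression.
Besides regularizing the model to guard against overfitting, adding $\lambda\mathbf I$ also avoids the case where $\mathbf w$ cannot be solved for because $\mathbf X^\mathrm T\mathbf X$ is not invertible: for $\lambda>0$, the matrix $\mathbf X^\mathrm T\mathbf X+\lambda\mathbf I$ is positive definite and therefore always invertible.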
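The invertibility point can be illustrated with a small experiment. The sketch below is plain Python under stated assumptions: the helpers `solve` and `fit_ridge`, the toy data, and the value $\lambda=0.1$ are all made up for this example. The first two columns of `X` are identical, so $\mathbf X^\mathrm T\mathbf X$ is singular and the unregularized system has no unique solution, while the ridge system does.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):   # back substitution
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def fit_ridge(X, Y, lam):
    """Solve (X^T X + lam * I) w = X^T Y for w."""
    n = len(X[0])
    Xt = [list(col) for col in zip(*X)]
    A = [[sum(p * q for p, q in zip(Xt[i], Xt[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(p * y for p, y in zip(Xt[i], Y)) for i in range(n)]
    return solve(A, rhs)


# Hypothetical data: the first two feature columns are identical (collinear),
# and the last column is the appended 1.
X = [[1.0, 1.0, 1.0], [2.0, 2.0, 1.0], [3.0, 3.0, 1.0]]
Y = [3.0, 5.0, 7.0]

try:
    fit_ridge(X, Y, 0.0)   # lam = 0 is ordinary least squares
    singular = False
except ValueError:
    singular = True

w = fit_ridge(X, Y, 0.1)   # lam > 0 makes the system solvable
print(singular, w)
```

With the penalty in place, the two duplicated features receive equal weights, since the system is symmetric in them. Note that this sketch follows the derivation above and penalizes every entry of $\mathbf w$, including the absorbed intercept $b$.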
