Introduction
Linear regression is perhaps the most fundamental algorithm in machine learning. In this setting, given a dataset $D=\{(x^i, y^i) \mid x^i \in \mathbb{R}^n,\ y^i \in \mathbb{R}\}_{i=1}^m$ (where $x$ is the feature vector and $y$ is the label), we fit a model of the form $h_\theta(x) = \theta^T\phi(x)$, where $\theta$ is the parameter vector and $\phi(x)$ is a transformed feature vector (for example, $\phi(x) = [1, x_1, x_2, \ldots, x_1x_2, \ldots, x_nx_{n-1}]$). That is, the model is linear IN TERMS OF the parameters rather than the input vector $x$, since feature transformation is allowed.
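To make the feature transformation concrete, here is a minimal sketch in NumPy (the helper name `phi` is my own choice) of a map that produces the bias term, the raw features, and all pairwise products:

```python
import numpy as np

def phi(x):
    """Map x in R^n to [1, x_1, ..., x_n, x_i * x_j for i < j]."""
    n = len(x)
    cross = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, cross))

print(phi(np.array([2.0, 3.0])))  # [1. 2. 3. 6.] -> bias, x1, x2, x1*x2
```

The model $h_\theta(x) = \theta^T\phi(x)$ is then quadratic in $x$ but still linear in $\theta$.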
Our goal is to fit the model $h_\theta(x) = \theta^T\phi(x)$ as well as possible. That is, after tuning the parameters, given an unseen $x^*$, we should have $h_\theta(x^*) \to y^*$. In a nutshell: find the BEST $\theta$.
Sometimes, our model might fit the training dataset well, yet fail to generalize to unseen data. This is the problem of OVERFITTING. To address it, we can use robust linear regression, ridge regression, or lasso regression.
In what follows, I will derive the variants of linear regression (standard, robust, ridge, lasso) from two perspectives (deterministic and probabilistic). Generalized linear regression will also be discussed.
Deterministic perspective
Intuitively, we could let the cost function be $J(\theta)=\frac{1}{2}\sum_{i=1}^m (h_\theta(x^i)-y^i)^2$, also known as the residual sum of squares (RSS) or the sum of squared errors (SSE). Clearly, $J$ is a convex function of $\theta$: each term is the square of a function affine in $\theta$.
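As a direct translation (a sketch; here $X$ stacks the transformed vectors $\phi(x^i)$ as rows):

```python
import numpy as np

def rss(theta, X, y):
    """J(theta) = 0.5 * sum_i (theta^T x_i - y_i)^2, with x_i the rows of X."""
    r = X @ theta - y
    return 0.5 * (r @ r)
```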
Then, the (standard) linear regression is formulated as $\theta^* := \arg\min_\theta J(\theta)$. [How to solve it? 1. the gradient descent algorithm; 2. analytically setting $\partial J/\partial\theta = 0$. We have a particularly nice solution if $\bar{x} = [1, x]$ and $h_\theta(x)=\theta^T\bar{x}$: then $\partial J/\partial\theta = \sum_{i=1}^m(\bar{x}_i^T\theta - y_i)\bar{x}_i = X^TX\theta - X^Ty = 0 \Rightarrow \theta^* = (X^TX)^{-1}X^Ty$, where $X$ is the design matrix whose $i$-th row is $\bar{x}_i^T$ (assuming $X^TX$ is invertible).]
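Both routes are easy to sketch in NumPy (my own illustration on synthetic data, not code from the text; the learning rate and iteration count are hand-picked):

```python
import numpy as np

# Synthetic data: rows of X are bar{x}_i = [1, x_i] (bias term prepended).
rng = np.random.default_rng(0)
m, n = 100, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
true_theta = rng.normal(size=n + 1)
y = X @ true_theta + 0.1 * rng.normal(size=m)

# 1. Analytic solution of the normal equations X^T X theta = X^T y.
#    (np.linalg.solve is preferred over forming the inverse explicitly.)
theta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# 2. Gradient descent on J; the gradient is X^T (X theta - y).
#    Scaling it by 1/m (same minimizer) keeps the step size stable.
theta_gd = np.zeros(n + 1)
lr, steps = 0.1, 10_000  # hand-picked, not from the text
for _ in range(steps):
    theta_gd -= lr * X.T @ (X @ theta_gd - y) / m

print(np.allclose(theta_closed, theta_gd, atol=1e-6))  # True: both find theta*
```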