ML(1) Linear Regression

Linear regression is one of the most fundamental algorithms in machine learning. This post examines standard, robust, ridge, and lasso regression from two perspectives (deterministic and probabilistic), and also introduces generalized linear regression. By adjusting the loss function we can address problems such as overfitting: for example, the L1 norm can reduce the influence of outliers, while an L2 or L1 penalty can constrain the parameter weights to avoid overfitting.

Introduction

Linear regression is perhaps the most fundamental algorithm in machine learning. In this setting, given a dataset $D=\{(x^i,y^i)\mid x^i\in\mathbb{R}^n,\ y^i\in\mathbb{R}\}_{i=1}^m$ ($x$ is the feature vector, $y$ is the label), we fit a model of the form $h_\theta(x)=\theta^T\phi(x)$, where $\theta$ is the parameter vector and $\phi(x)$ is a transformed feature vector (for example, $\phi(x)=[1,x_1,x_2,\dots,x_1x_2,\dots,x_nx_{n-1}]$). That is, the model is linear IN TERMS OF the parameters instead of the input vector $x$, since feature transformations are allowed.
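To make the feature transformation concrete, here is a minimal Python sketch (the helper names `phi` and `h`, and the choice of including all pairwise interaction terms, are my own illustration rather than anything fixed by the text):

```python
import numpy as np

def phi(x):
    """Example feature map: bias term, raw features, and all pairwise interactions."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    interactions = [x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    return np.concatenate(([1.0], x, interactions))

def h(theta, x):
    """Linear-in-parameters model h_theta(x) = theta^T phi(x)."""
    return theta @ phi(x)

# For x = [x1, x2], phi(x) = [1, x1, x2, x1*x2]: the model can capture an
# interaction term while remaining linear in theta.
```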

Our goal is to fit the model $h_\theta(x)=\theta^T\phi(x)$ as well as possible. That is, after tuning our parameters, given an unseen $x^*$ we should be able to make $h_\theta(x^*)\to y^*$. In a nutshell: find the BEST $\theta$.

Sometimes, our model might fit the training dataset well, yet fail to generalize to unseen data. This is the problem of OVERFITTING. Ridge regression and lasso regression address it by penalizing the parameter weights, while robust linear regression tackles a related issue: sensitivity to outliers.

In what follows, I will derive the various linear regression models (standard, robust, ridge, lasso) from two perspectives: deterministic and probabilistic. Generalized linear regression will also be discussed.

Deterministic perspective

Intuitively, we can take the cost function to be $J(\theta)=\frac{1}{2}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)^2$, also known as the residual sum of squares (RSS) or the sum of squared errors (SSE). Clearly, $J$ is a convex function of $\theta$.
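As a quick sanity check of the formula, here is a minimal NumPy sketch of $J(\theta)$, assuming the transformed features $\phi(x^i)$ have already been stacked row-wise into a design matrix (the names `rss` and `Phi` are my own):

```python
import numpy as np

def rss(theta, Phi, y):
    """J(theta) = 1/2 * sum_i (h_theta(x^i) - y^i)^2,
    where Phi is the m x d matrix whose i-th row is phi(x^i)."""
    residuals = Phi @ theta - y
    return 0.5 * (residuals @ residuals)
```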

Then, (standard) linear regression is formulated as $\theta^*:=\arg\min_\theta J(\theta)$. [How to solve it? 1. The gradient descent algorithm; 2. Analytically, set $\partial J/\partial\theta=0$. We have a particularly nice solution if $\bar{x}=[1,x]$ and $h_\theta(x)=\theta^T\bar{x}$: stacking the $\bar{x}_i^T$ as the rows of $X$ gives $\partial J/\partial\theta=\sum_{i=1}^m(\theta^T\bar{x}_i-y_i)\bar{x}_i=X^TX\theta-X^Ty=0\ \Rightarrow\ \theta^*=(X^TX)^{-1}X^Ty$.]
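To make both solution routes concrete, here is a minimal NumPy sketch (the function names, learning rate, iteration count, and toy data are my own choices; the closed-form route calls `np.linalg.solve` instead of forming $(X^TX)^{-1}$ explicitly, which is numerically preferable but mathematically equivalent):

```python
import numpy as np

def fit_normal_equations(X_bar, y):
    """Closed-form solution: solve X^T X theta = X^T y, i.e. theta* = (X^T X)^{-1} X^T y."""
    return np.linalg.solve(X_bar.T @ X_bar, X_bar.T @ y)

def fit_gradient_descent(X_bar, y, lr=0.01, n_iters=5000):
    """Minimize J(theta) by following the gradient X^T (X theta - y)."""
    m, d = X_bar.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        grad = X_bar.T @ (X_bar @ theta - y)
        theta -= lr * grad / m  # averaging over m keeps the step size insensitive to dataset size
    return theta

# Toy usage: recover y ~ 2 + 3x from noisy samples.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=50)
X_bar = np.column_stack([np.ones_like(x), x])  # each row is x_bar = [1, x]
print(fit_normal_equations(X_bar, y))  # both results should be close to [2, 3]
print(fit_gradient_descent(X_bar, y))
```

On this toy problem the two routes agree up to the gradient-descent tolerance; the closed-form path is convenient when $X^TX$ is small and well-conditioned, while gradient descent scales better to many features.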
