Linear Regression (Theory)

初沏的茶

Category: Machine Learning

Article link: https://blog.csdn.net/ChuQiDeCha/article/details/80427613


Linear Model

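The linear model is one of the most widely used models in machine learning. It makes predictions from a linear combination of a sample's features. Given an n-dimensional sample $\textbf{x}=[x_{1},x_{2},\cdots,x_{n}]^{T}$, the linear combination function is: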

$h_{\theta}(\textbf{x})=\theta_{1}x_{1}+\theta_{2}x_{2}+\cdots+\theta_{n}x_{n}+b=(\theta;b)^{T}(\textbf{x};1)$
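The two ways of writing the linear combination can be checked against each other with a small sketch. The function names and the numeric values below are illustrative assumptions, not part of the original article; plain Python lists are used so that the example has no dependencies.

```python
def predict(theta, b, x):
    """Linear model h_theta(x) = theta_1*x_1 + ... + theta_n*x_n + b."""
    return sum(t * xi for t, xi in zip(theta, x)) + b


def predict_augmented(theta_b, x):
    """Same prediction written as (theta; b)^T (x; 1)."""
    x_aug = list(x) + [1.0]  # append the constant 1 that multiplies b
    return sum(t * xi for t, xi in zip(theta_b, x_aug))


theta = [2.0, -1.0, 0.5]  # illustrative weights (assumed values)
b = 3.0                   # illustrative bias (assumed value)
x = [1.0, 4.0, 2.0]       # one 3-dimensional sample

y_plain = predict(theta, b, x)
y_aug = predict_augmented(theta + [b], x)
print(y_plain, y_aug)  # both forms give the same prediction
```

Absorbing the bias into the parameter vector this way is what allows the later derivation to write the prediction simply as $\theta^{T}\textbf{x}$.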

Linear Regression

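Given a dataset $D=\{(\textbf{x}_{1},y_{1}),(\textbf{x}_{2},y_{2}),\cdots,(\textbf{x}_{N},y_{N})\}$, where each $\textbf{x}_{i}$ is an M-dimensional vector and $y_{i}\in\mathbb{R}$, linear regression tries to fit a linear model that predicts the real-valued output as accurately as possible.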

Maximum Likelihood Estimation

Let $\hat{y}_{i}$ denote the predicted value for the i-th sample. The estimation error is then:

$\varepsilon_{i}=\hat{y}_{i}-y_{i}$
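Motivated by the central limit theorem, the errors $\varepsilon_{i}$ are assumed to be independent and identically distributed, following a Gaussian distribution with mean 0 and variance $\sigma^2$. Then: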

$p(\varepsilon_{i})=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\varepsilon_{i}^{2}}{2\sigma^2}\right)$

Substituting the prediction $\hat{y}_{i}=\theta^{T}\textbf{x}_{i}$ (with the bias $b$ absorbed into $\theta$ by appending a constant 1 to $\textbf{x}_{i}$) gives:

$p(y_{i}|\textbf{x}_{i},\theta)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_{i}-\theta^{T}\textbf{x}_{i})^2}{2\sigma^2}\right)$

Under maximum likelihood estimation, the likelihood function is:

$L(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)=\prod_{i=1}^{N}p(y_{i}|\textbf{x}_{i},\theta)\\ =\prod_{i=1}^{N}\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_{i}-\theta^{T}\textbf{x}_{i})^2}{2\sigma^2}\right)$

The log-likelihood function is:

$l(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)=\log L(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)\\ =\sum_{i=1}^{N}\log\left(\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(y_{i}-\theta^{T}\textbf{x}_{i})^2}{2\sigma^2}\right)\right)\\ =N\log\frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^{N}(y_{i}-\theta^{T}\textbf{x}_{i})^2$
Let

$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2$
The first term of the log-likelihood does not depend on $\theta$, and $\sigma^2>0$, so maximizing $l(y_{1},y_{2},\cdots,y_{N}|\textbf{x}_{1},\textbf{x}_{2},\cdots,\textbf{x}_{N},\theta)$ is equivalent to minimizing $J(\theta)$.

$J(\theta)$ is called the objective function of linear regression.
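The relationship between the log-likelihood and $J(\theta)$ can be verified numerically. The following is a minimal sketch on made-up toy data: the data, the value of `sigma`, and the two candidate parameter vectors are all assumptions chosen for illustration. The last column of `X` is a constant 1, so the bias is part of `theta`.

```python
import math


def J(theta, X, Y):
    """Objective J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2."""
    return 0.5 * sum(
        (sum(t * v for t, v in zip(theta, x)) - y) ** 2 for x, y in zip(X, Y)
    )


def log_likelihood(theta, X, Y, sigma):
    """Sum over samples of the Gaussian log-density of y_i given x_i, theta."""
    total = 0.0
    for x, y in zip(X, Y):
        resid = y - sum(t * v for t, v in zip(theta, x))
        total += -math.log(math.sqrt(2 * math.pi) * sigma)
        total -= resid ** 2 / (2 * sigma ** 2)
    return total


# Toy data (assumed): roughly y = 2*x + 1, second column is the constant 1
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
Y = [1.1, 2.9, 5.2, 6.8]
sigma = 0.5

theta_a = [2.0, 1.0]  # close to the data-generating line
theta_b = [1.0, 2.0]  # a deliberately worse candidate

# Constant term N * log(1 / (sqrt(2*pi) * sigma)), independent of theta
const = len(Y) * math.log(1.0 / (math.sqrt(2 * math.pi) * sigma))

ll_a, ll_b = log_likelihood(theta_a, X, Y, sigma), log_likelihood(theta_b, X, Y, sigma)
J_a, J_b = J(theta_a, X, Y), J(theta_b, X, Y)

# l(theta) should equal const - J(theta) / sigma^2 for every theta
print(ll_a, const - J_a / sigma ** 2)
print(ll_b, const - J_b / sigma ** 2)
```

The candidate with the smaller $J(\theta)$ is also the one with the larger log-likelihood, which is the equivalence stated above.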

Closed-Form Solution for the Parameters

Taking the partial derivative of $J(\theta)$ with respect to each component $\theta_{j}$ gives:

$\frac{\partial J(\theta)}{\partial\theta_{j}}=\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})x_{ij} \\ =\begin{pmatrix} x_{1j} & x_{2j} & \cdots & x_{Nj} \end{pmatrix} \begin{pmatrix} \textbf{x}_{1}^{T}\theta \\ \textbf{x}_{2}^{T}\theta \\ \vdots \\ \textbf{x}_{N}^{T}\theta \end{pmatrix} - \begin{pmatrix} x_{1j} & x_{2j} & \cdots & x_{Nj} \end{pmatrix} \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix}$
where $j\in\{1,2,\cdots,M\}$ and $x_{ij}$ denotes the j-th dimension of the i-th sample. Setting the derivative to zero for every $j$ and stacking the resulting equations in matrix form gives:

$\begin{pmatrix} x_{11} & x_{21} & \cdots & x_{N1} \\ x_{12} & x_{22} & \cdots & x_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1M} & x_{2M} & \cdots & x_{NM} \end{pmatrix} \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{pmatrix} \begin{pmatrix} \theta_{1} \\ \theta_{2} \\ \vdots \\ \theta_{M} \end{pmatrix} - \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{N1} \\ x_{12} & x_{22} & \cdots & x_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1M} & x_{2M} & \cdots & x_{NM} \end{pmatrix} \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{N} \end{pmatrix} = \textbf{0}$
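That is, writing $X$ for the $N\times M$ matrix whose i-th row is $\textbf{x}_{i}^{T}$ and $Y=(y_{1},y_{2},\cdots,y_{N})^{T}$: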

$X^{T}X\theta-X^{T}Y=0$
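The left-hand side of this equation is the gradient of $J(\theta)$, and that can be checked against a finite-difference approximation. The sketch below uses assumed toy values for `X`, `Y`, and `theta`; the step size `eps` is also an arbitrary illustrative choice.

```python
def J(theta, X, Y):
    """Objective J(theta) = 1/2 * sum_i (theta^T x_i - y_i)^2."""
    return 0.5 * sum(
        (sum(t * v for t, v in zip(theta, x)) - y) ** 2 for x, y in zip(X, Y)
    )


def gradient(theta, X, Y):
    """Component j is sum_i (theta^T x_i - y_i) * x_ij, i.e. X^T (X theta - Y)."""
    residuals = [sum(t * v for t, v in zip(theta, x)) - y for x, y in zip(X, Y)]
    return [sum(r * x[j] for r, x in zip(residuals, X)) for j in range(len(theta))]


def numerical_gradient(theta, X, Y, eps=1e-6):
    """Central finite differences of J, one coordinate at a time."""
    grads = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grads.append((J(up, X, Y) - J(down, X, Y)) / (2 * eps))
    return grads


# Toy values (assumed for illustration): N = 3 samples, M = 2 features
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Y = [1.0, 2.0, 3.0]
theta = [0.5, -0.25]

analytic = gradient(theta, X, Y)
numeric = numerical_gradient(theta, X, Y)
print(analytic)
print(numeric)
```

The two gradients should agree up to floating-point rounding, since $J(\theta)$ is quadratic in $\theta$.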
By the method of least squares, when $X^{T}X$ is invertible, the closed-form solution is:

$\theta=(X^{T}X)^{-1}X^{T}Y$
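As a minimal sketch of this closed-form solution, the code below builds $X^{T}X$ and $X^{T}Y$ and solves the resulting linear system. It solves the system by Gaussian elimination rather than forming the inverse explicitly, which yields the same $\theta$. The helper names, the singularity tolerance, and the toy data are assumptions for illustration; in practice a numerical library would normally be used instead of a hand-written solver.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # pick the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # tolerance is an assumed choice
            raise ValueError("matrix is singular, X^T X is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # back substitution
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def fit_normal_equation(X, Y):
    """Return theta satisfying X^T X theta = X^T Y."""
    M = len(X[0])
    XtX = [[sum(x[j] * x[k] for x in X) for k in range(M)] for j in range(M)]
    XtY = [sum(x[j] * y for x, y in zip(X, Y)) for j in range(M)]
    return solve(XtX, XtY)


# Toy data generated exactly by y = 2*x + 1; second column is the constant 1
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
Y = [1.0, 3.0, 5.0, 7.0]

theta = fit_normal_equation(X, Y)
print(theta)  # slope and intercept, approximately 2 and 1
```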
To guard against overfitting, or to handle the case where $X^{T}X$ is not invertible, add a perturbation $\lambda$ to its diagonal:

$\theta=(X^{T}X+\lambda I)^{-1}X^{T}Y$
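The effect of the $\lambda$ perturbation can be seen on a deliberately degenerate example. In the sketch below the two feature columns are identical, so $X^{T}X$ is singular and the unperturbed system cannot be solved, while a small $\lambda$ makes it solvable. The helper names, the tolerance, the data, and $\lambda=0.1$ are all assumptions chosen for illustration.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # tolerance is an assumed choice
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def ridge_fit(X, Y, lam):
    """Return theta solving (X^T X + lam * I) theta = X^T Y."""
    M = len(X[0])
    # A = X^T X + lam * I, built entry by entry
    A = [
        [sum(x[j] * x[k] for x in X) + (lam if j == k else 0.0) for k in range(M)]
        for j in range(M)
    ]
    rhs = [sum(x[j] * y for x, y in zip(X, Y)) for j in range(M)]
    return solve(A, rhs)


# Degenerate toy data: the second feature duplicates the first
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
Y = [2.0, 4.0, 6.0]

try:
    ridge_fit(X, Y, 0.0)  # lam = 0 reduces to the plain normal equation
    singular_detected = False
except ValueError:
    singular_detected = True

theta_ridge = ridge_fit(X, Y, 0.1)
print(singular_detected, theta_ridge)
```

With the perturbation, the weight is shared equally between the two identical features instead of being undetermined.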

Complexity Penalty Terms for Linear Regression

The objective function with an L1 penalty (lasso) is:
$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2+\lambda\sum_{j=1}^{M}|\theta_{j}|$
The parameters obtained with L1 regularization are usually sparse.
The objective function with an L2 penalty (ridge) is:
$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2+\lambda\sum_{j=1}^{M}\theta_{j}^2$
The objective function that mixes the L1 and L2 penalties (ElasticNet) is:
$J(\theta)=\frac{1}{2}\sum_{i=1}^{N}(\theta^{T}\textbf{x}_{i}-y_{i})^2+\rho\sum_{j=1}^{M}|\theta_{j}|+(1-\rho)\sum_{j=1}^{M}\theta_{j}^2$
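The three penalized objectives differ only in the term added to the squared error, which a short sketch makes concrete. The function names, the toy data, and the values of `lam` and `rho` below are illustrative assumptions. The code only evaluates the objectives for a given $\theta$; it does not minimize them.

```python
def squared_error(theta, X, Y):
    """Data term 1/2 * sum_i (theta^T x_i - y_i)^2 shared by all objectives."""
    return 0.5 * sum(
        (sum(t * v for t, v in zip(theta, x)) - y) ** 2 for x, y in zip(X, Y)
    )


def lasso_objective(theta, X, Y, lam):
    """Squared error plus lam * sum_j |theta_j| (L1 penalty)."""
    return squared_error(theta, X, Y) + lam * sum(abs(t) for t in theta)


def ridge_objective(theta, X, Y, lam):
    """Squared error plus lam * sum_j theta_j^2 (L2 penalty)."""
    return squared_error(theta, X, Y) + lam * sum(t * t for t in theta)


def elastic_net_objective(theta, X, Y, rho):
    """Squared error plus rho * L1 term plus (1 - rho) * L2 term."""
    l1 = sum(abs(t) for t in theta)
    l2 = sum(t * t for t in theta)
    return squared_error(theta, X, Y) + rho * l1 + (1 - rho) * l2


# Toy data (assumed): second column is the constant 1
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
Y = [1.0, 3.0, 5.0, 7.0]
theta = [2.0, -1.0]

lasso_value = lasso_objective(theta, X, Y, lam=0.1)
ridge_value = ridge_objective(theta, X, Y, lam=0.1)
enet_value = elastic_net_objective(theta, X, Y, rho=0.5)
print(lasso_value, ridge_value, enet_value)
```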
