Overview
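1. Regression vs. classification:
- Regression: predicts a specific (continuous) value within a range
- Classification: predicts a class label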
2. The linear regression problem: find the line that best fits the data
3. Write the model in matrix form, which is efficient to compute: $h_\theta(x)=\theta^T x$ for one sample, or $X\theta$ for all samples at once
- Augment $X$ with a column of ones; multiplying that column by $\theta_0$ yields the bias term
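A minimal NumPy sketch of the matrix form and the column-of-ones trick, assuming `X` holds one sample per row. `add_bias_column` and `predict` are hypothetical helper names chosen for illustration, and the numbers are made up.

```python
import numpy as np

def add_bias_column(X: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so that theta[0] plays the role of the bias theta_0."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

def predict(X: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Vectorized hypothesis: element i of the result is theta^T x^(i)."""
    return add_bias_column(X) @ theta

# 3 samples, 2 features; theta = [theta_0, theta_1, theta_2]
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
theta = np.array([0.5, 2.0, -1.0])
print(predict(X, theta))  # [0.5 2.5 4.5]
```

One matrix product replaces a Python loop over samples, which is the efficiency gain the matrix form refers to.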
Deriving the objective function
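1. Error ($\varepsilon$): the difference between the true value and the predicted value
- The errors $\varepsilon^{(i)}$ of the samples $x^{(i)}$ are independent and identically distributed (i.i.d.), following a Gaussian distribution with mean 0 and variance $\sigma^2$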
2. Deriving the likelihood function
- Model for the true value: $y^{(i)}=\theta^T x^{(i)}+\varepsilon^{(i)}$  (1)
- Probability density of $\varepsilon^{(i)}$: $p(\varepsilon^{(i)})=\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(\varepsilon^{(i)})^2}{2\sigma^2}\right]$  (2)
- From (1), $\varepsilon^{(i)}=y^{(i)}-\theta^T x^{(i)}$; substituting this into (2) gives:
$$p(y^{(i)}\mid x^{(i)};\theta)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left[-\frac{(y^{(i)}-\theta^T x^{(i)})^2}{2\sigma^2}\right]$$
- Likelihood function: the probability of observing all the true values given $\theta$; maximum likelihood estimation chooses the $\theta$ that makes this probability as large as possible. Because the samples are independent, it is a product:
$$L(\theta)=\prod_{i=1}^{m}p(y^{(i)}\mid x^{(i)};\theta)$$
- Log-likelihood: taking the logarithm turns the product into a sum, which simplifies the computation
$$\log L(\theta)=m\log\frac{1}{\sqrt{2\pi}\sigma}-\frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^T x^{(i)})^2$$
- Goal: maximize $L(\theta)$, which is equivalent to maximizing $\log L(\theta)$ because the logarithm is monotonically increasing
- $m\log\frac{1}{\sqrt{2\pi}\sigma}$ is a constant that does not depend on $\theta$, $\frac{1}{\sigma^2}>0$, and $\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^T x^{(i)})^2\ge 0$. Therefore, the smaller $\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^T x^{(i)})^2$ is, the larger $\log L(\theta)$ is
- Objective function:
$$J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(y^{(i)}-\theta^T x^{(i)})^2$$
- This is the least-squares form
- Summary: the more likely the predictions are to match the true values, the larger $L(\theta)$ and the smaller $J(\theta)$
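The derivation can be checked numerically. The sketch below is an illustration on synthetic data, assuming NumPy; `log_likelihood` and `objective_J` are hypothetical helper names, and the true parameters, $\sigma$, and sample size are arbitrary choices. It generates data from model (1) with Gaussian noise, then confirms that $\log L(\theta)=m\log\frac{1}{\sqrt{2\pi}\sigma}-J(\theta)/\sigma^2$ for a candidate $\theta$.

```python
import numpy as np

def log_likelihood(X_b, y, theta, sigma):
    """Sum over samples of log N(y_i; theta^T x_i, sigma^2)."""
    residuals = y - X_b @ theta
    return np.sum(-np.log(np.sqrt(2 * np.pi) * sigma) - residuals**2 / (2 * sigma**2))

def objective_J(X_b, y, theta):
    """Least-squares objective J(theta) = 1/2 * sum of squared residuals."""
    residuals = y - X_b @ theta
    return 0.5 * np.sum(residuals**2)

rng = np.random.default_rng(0)
m, sigma = 200, 0.5
X_b = np.hstack([np.ones((m, 1)), rng.normal(size=(m, 2))])  # bias column + 2 features
true_theta = np.array([1.0, 2.0, -3.0])
y = X_b @ true_theta + rng.normal(0.0, sigma, size=m)  # i.i.d. epsilon ~ N(0, sigma^2)

theta = np.array([0.8, 2.1, -2.9])  # an arbitrary candidate
lhs = log_likelihood(X_b, y, theta, sigma)
rhs = m * np.log(1 / (np.sqrt(2 * np.pi) * sigma)) - objective_J(X_b, y, theta) / sigma**2
print(np.isclose(lhs, rhs))  # True
```

Since the first term of `rhs` is fixed once $\sigma$ and $m$ are fixed, any $\theta$ with a smaller $J$ has a larger log-likelihood.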
Solving for the parameters
1. Take the gradient (partial derivatives) of the objective function $J(\theta)=\frac{1}{2}(X\theta-y)^T(X\theta-y)$ with respect to $\theta$:
$$\nabla_\theta J(\theta)=\nabla_\theta\left[\frac{1}{2}(X\theta-y)^T(X\theta-y)\right]=X^TX\theta-X^Ty$$
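The gradient step above skips the expansion. One way to fill it in uses two standard identities, $\nabla_\theta(\theta^T A\theta)=2A\theta$ for a symmetric $A$ (here $A=X^TX$) and $\nabla_\theta(\theta^T b)=b$, together with the fact that the scalar $y^TX\theta$ equals its transpose $\theta^TX^Ty$:

$$
\begin{aligned}
J(\theta) &= \frac{1}{2}\left(\theta^T X^T X\theta - \theta^T X^T y - y^T X\theta + y^T y\right) \\
&= \frac{1}{2}\left(\theta^T X^T X\theta - 2\theta^T X^T y + y^T y\right) \\
\nabla_\theta J(\theta) &= \frac{1}{2}\left(2X^T X\theta - 2X^T y\right) = X^T X\theta - X^T y
\end{aligned}
$$

The term $y^Ty$ does not depend on $\theta$, so its gradient is zero.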
2. Find the minimum of $J(\theta)$:
- Set $\nabla_\theta J(\theta)=0$ and solve for $\theta$. Since $X^TX$ is positive semidefinite, $J(\theta)$ is convex, so this stationary point is a minimum
- $\theta=(X^TX)^{-1}X^Ty$, which requires $X^TX$ to be invertible
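3. In many problems $\theta$ cannot be solved for directly in closed form and has to be found by iterative optimization; linear regression is a special case where a closed-form solution exists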
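A minimal NumPy sketch of the closed-form solution from step 2, assuming `X` already contains the column of ones and $X^TX$ is invertible. `normal_equation` is a hypothetical helper name, and the data are randomly generated for illustration only.

```python
import numpy as np

def normal_equation(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Closed-form least squares: solve (X^T X) theta = X^T y.

    Assumes X already includes the ones column for theta_0 and that
    X^T X is invertible (X has full column rank).
    """
    # Solving the linear system avoids forming (X^T X)^{-1} explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(42)
m = 100
X = np.hstack([np.ones((m, 1)), rng.uniform(-1, 1, size=(m, 2))])
true_theta = np.array([3.0, -2.0, 0.5])
y = X @ true_theta + rng.normal(0.0, 0.1, size=m)  # small Gaussian noise

theta_hat = normal_equation(X, y)
print(theta_hat)  # close to [3.0, -2.0, 0.5]

# At the solution the gradient X^T X theta - X^T y should vanish.
grad = X.T @ X @ theta_hat - X.T @ y
print(np.allclose(grad, 0.0, atol=1e-8))  # True
```

Mathematically this is the same formula as $(X^TX)^{-1}X^Ty$; `np.linalg.solve` is used because it is generally more accurate than computing the inverse. When $X^TX$ may be singular, `np.linalg.lstsq` returns a least-squares solution instead of failing.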
Evaluation
The most commonly used metric:
$$R^2=1-\frac{\sum_{i=1}^{m}(\hat{y}_i-y_i)^2}{\sum_{i=1}^{m}(y_i-\bar{y})^2}$$
- $\sum_{i=1}^{m}(\hat{y}_i-y_i)^2$: the residual sum of squares, where $\hat{y}_i$ is the predicted value
- $\sum_{i=1}^{m}(y_i-\bar{y})^2$: the total sum of squares, where $\bar{y}$ is the mean of the true values; it is proportional to the variance of $y$
- The closer $R^2$ is to 1, the better the model fits the data
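A direct NumPy translation of the $R^2$ formula, as a sketch; the function name `r2_score` and the sample values are chosen for illustration.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

y = [3.0, -0.5, 2.0, 7.0]
print(r2_score(y, y))                      # 1.0: perfect fit
print(r2_score(y, [np.mean(y)] * len(y)))  # 0.0: always predicting the mean
print(r2_score(y, [2.5, 0.0, 2.0, 8.0]))   # ≈ 0.9486
```

Predicting the mean for every sample gives $R^2=0$, and a model that does worse than that gets a negative $R^2$. scikit-learn provides the same metric as `sklearn.metrics.r2_score`.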