1. Linear Regression
1.1 Linear Model
Given a dataset $D=\{(x_{1},y_{1}),(x_{2},y_{2}),(x_{3},y_{3}),\ldots,(x_{m},y_{m})\}$, where each $x_{i}$ has $d$ features, written $x_{i}=(x_{i1};x_{i2};\ldots;x_{id})$, a linear regression model fits the value of $y$ with a linear combination of the $d$ features, i.e. $f(x_{i})=w^{T}x_{i}+b$, such that $f(x_{i})\simeq y_{i}$.
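As a minimal sketch of the model $f(x_{i})=w^{T}x_{i}+b$, the snippet below evaluates the linear combination for a few samples in plain Python. The helper name `predict`, the toy samples in `X`, and the values of `w` and `b` are all assumptions chosen for illustration; nothing here is fitted to data.

```python
from typing import Sequence


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Return f(x) = w^T x + b for a single sample x with d features."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same number of features")
    # Linear combination of the d features, plus the bias term.
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


# Hypothetical toy data: m = 3 samples, d = 2 features each.
X = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
w = [2.0, -1.0]  # illustrative weights (assumed, not learned)
b = 0.5          # illustrative bias (assumed, not learned)

predictions = [predict(w, b, x_i) for x_i in X]
print(predictions)
```

Each prediction is a single dot product plus the bias, so the cost per sample grows linearly with the number of features $d$.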
1.2 Loss Function: Square Loss
- Definition: $L(f(x),y)=(f(x)-y)^2$
- Why square loss is used as the loss function:
(1) Let the error be $\varepsilon_{i}=y_{i}-\hat{y}_{i}$. Assume the errors are independent and identically distributed; by the central limit theorem they are approximately normally distributed, $\varepsilon_{i}\sim N(\mu,\sigma^2)$, with density $f(\varepsilon_{i})=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\varepsilon_{i}-\mu)^2}{2\sigma^2}\right)$. To find the maximum likelihood estimates of $\mu$ and $\sigma^2$, write the likelihood
$L(\mu,\sigma^2)=\prod_{i=1}^{m}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\varepsilon_{i}-\mu)^2}{2\sigma^2}\right)$. Taking the logarithm of both sides gives
$\log L(\mu,\sigma^2)=-\frac{m}{2}\log 2\pi-\frac{m}{2}\log\sigma^2-\frac{1}{2\sigma^2}\sum_{i=1}^{m}(\varepsilon_{i}-\mu)^2$. Taking partial derivatives with respect to $\mu$ and $\sigma^2$ gives
$\frac{\partial \log L}{\partial \mu}=\frac{1}{\sigma^2}\sum_{i=1}^{m}(\varepsilon_{i}-\mu)$,
$\frac{\partial \log L}{\partial \sigma^2}=-\frac{m}{2\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^{m}(\varepsilon_{i}-\mu)^2$
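The signs and sums in these partial derivatives are easy to get wrong, so a quick numerical check can help. The sketch below writes the Gaussian log-likelihood over all $m$ errors, with the sum over $i$ made explicit, and compares the analytic derivatives with respect to $\mu$ and $\sigma^2$ against central finite differences. The residual values in `eps`, the evaluation point (`mu`, `sigma2`), the step size `h`, and all function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def log_likelihood(eps: Sequence[float], mu: float, sigma2: float) -> float:
    """Gaussian log-likelihood of the errors eps under N(mu, sigma2)."""
    m = len(eps)
    sq = sum((e - mu) ** 2 for e in eps)
    return (-0.5 * m * math.log(2 * math.pi)
            - 0.5 * m * math.log(sigma2)
            - sq / (2 * sigma2))


def grad_mu(eps: Sequence[float], mu: float, sigma2: float) -> float:
    """Analytic d(log L)/d(mu) = (1 / sigma2) * sum(eps_i - mu)."""
    return sum(e - mu for e in eps) / sigma2


def grad_sigma2(eps: Sequence[float], mu: float, sigma2: float) -> float:
    """Analytic d(log L)/d(sigma2) = -m/(2 sigma2) + sum((eps_i - mu)^2)/(2 sigma2^2)."""
    m = len(eps)
    sq = sum((e - mu) ** 2 for e in eps)
    return -m / (2 * sigma2) + sq / (2 * sigma2 ** 2)


# Hypothetical residuals and evaluation point (assumed values).
eps = [0.3, -1.2, 0.8, 0.1, -0.5]
mu, sigma2, h = 0.2, 1.5, 1e-6

# Central finite differences of the log-likelihood.
numeric_mu = (log_likelihood(eps, mu + h, sigma2)
              - log_likelihood(eps, mu - h, sigma2)) / (2 * h)
numeric_sigma2 = (log_likelihood(eps, mu, sigma2 + h)
                  - log_likelihood(eps, mu, sigma2 - h)) / (2 * h)

print("d/dmu:     analytic", grad_mu(eps, mu, sigma2), "numeric", numeric_mu)
print("d/dsigma2: analytic", grad_sigma2(eps, mu, sigma2), "numeric", numeric_sigma2)
```

If the analytic and numeric values agree to several decimal places, the derivative formulas are consistent with the log-likelihood they were derived from.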