ML Note 1.1 - Regression

Learning problems over continuous random variables are called regression. The most common regression problem is linear regression:
$$y \mid x; \theta = h(x) + \epsilon$$

where $\epsilon \sim N(0, \sigma^2)$ is called the error term. Applying the GLM framework, the normal distribution corresponds to the hypothesis¹
$$h(x) = \theta^T x$$

Applying MLE then yields the log likelihood
$$
\begin{aligned}
l(\theta) &= \sum_{i=1}^m \log\frac{1}{\sqrt{2\pi}\sigma}\exp\Big(-\frac{(y^{(i)}-h(x^{(i)}))^2}{2\sigma^2}\Big)\\
&= -\frac{1}{2\sigma^2}\sum_{i=1}^m\big(y^{(i)} - h(x^{(i)})\big)^2 + C\\
&= -\frac{1}{\sigma^2}\cdot\frac{1}{2}\sum_{i=1}^m\epsilon_i^2 + C
\end{aligned}
$$

Define the cost function
$$J(\theta) = \frac{1}{2}\sum_{i=1}^m\epsilon_i^2$$

Minimizing the cost function is therefore equivalent to the original problem. The normal equation² gives the closed-form solution
$$\theta = (X^TX)^{-1}X^T\vec{y}$$

If $X^TX$ is not invertible, the possible causes are:

  • duplicate features in the training set
  • fewer samples than parameters

In either case, removing some of the features always works. If all features are known to be independent, one can instead collect more samples or use the regularized normal equation
$$\theta = \left(X^TX + \lambda\begin{bmatrix}
0 & 0 & 0 & \cdots & 0\\
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
 & \vdots & & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}_{(n+1)\times(n+1)}\right)^{-1}X^T\vec{y}$$
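Both forms can be computed directly with numpy; the data, $\lambda$, and variable names below are made up for illustration:

```python
import numpy as np

# Toy data: m = 5 samples, n = 2 features (values are illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
X = np.hstack([np.ones((5, 1)), x])        # design matrix with intercept column
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=5)

# Plain normal equation: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Regularized normal equation: the first diagonal entry is 0,
# matching the matrix in the text, so the intercept is not penalized
lam = 0.1
D = np.eye(X.shape[1])
D[0, 0] = 0.0
theta_reg = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```

`np.linalg.solve` avoids forming the explicit inverse; `np.linalg.lstsq(X, y)` computes the same unregularized solution more stably when $X^TX$ is ill-conditioned.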

Model Diagnostics

To evaluate the model's goodness of fit, define
$$
\begin{array}{rlcl}
\text{SSE} & \text{(sum of squares due to error)} &=& 2J(\theta)\\
\text{SSR} & \text{(sum of squares of the regression)} &=& (m-1)s_h^2\\
\text{SST} & \text{(total sum of squares)} &=& (m-1)s_y^2\\
r^2 & \text{(coefficient of determination)} &=& 1 - \text{SSE}/\text{SST}
\end{array}
$$

The sum-of-squares decomposition can be proved:
$$\text{SST} = \text{SSE} + \text{SSR}$$

Observe that SSE is subject to the $n+1$ constraints imposed by $\theta_0, \theta_1, \dots, \theta_n$³, so its degrees of freedom are
$$\text{DFE (degrees of freedom in the error)} = m - n - 1$$

The degrees of freedom of SSR and SST are $n$ and $m-1$ respectively⁴. From these, define
$$
\begin{array}{rlcl}
\text{MSE} & \text{(mean squared error)} &=& \text{SSE}/\text{DFE}\\
\text{RMSE} & \text{(root mean squared error)} &=& \sqrt{\text{MSE}}\\
\text{adj-}r^2 & \text{(df adjusted }r^2\text{)} &=& 1 - \text{MSE}/s_y^2
\end{array}
$$

We can test $H_0 : \vec\theta = 0$ with the statistic
$$F = \frac{\text{SSR}/n}{\text{MSE}} \sim F(n, \text{DFE})$$

If this test is significant, we then run a $t$ test on each parameter, $H_0 : \theta_j = 0$. Since MSE is an unbiased estimator of $\sigma^2$, write
$$\hat{\sigma}^2 = \text{MSE}$$

From the formula
$$\mathrm{Cov}(\hat\theta) = \hat{\sigma}^2\left(X^TX\right)^{-1}$$

we read off $\sigma_j^2 = \mathrm{Var}(\hat\theta_j)$ from the diagonal, and it follows that
$$\hat\theta_j \sim N(\theta_j, \sigma_j^2)$$

Construct the pivotal quantity
$$\frac{\hat\theta_j - \theta_j}{\sigma_j} \sim t(\text{DFE})$$

which measures how well each individual predictor explains the target. For confidence intervals on predictions, see 多元线性回归的预测 (prediction in multiple linear regression).
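The diagnostic pipeline above (SSE/SSR/SST, the $F$ statistic, and the per-coefficient $t$ statistics) can be sketched in numpy; the synthetic data and names below are illustrative only:

```python
import numpy as np

# Synthetic data: the third coefficient's true value is 0, so its
# t statistic should come out small under H0
rng = np.random.default_rng(1)
m, n = 50, 2
x = rng.normal(size=(m, n))
X = np.hstack([np.ones((m, 1)), x])
y = X @ np.array([0.5, 2.0, 0.0]) + rng.normal(size=m)

theta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ theta

SSE = resid @ resid                                # = 2 J(theta)
SSR = np.sum((X @ theta - y.mean()) ** 2)
SST = np.sum((y - y.mean()) ** 2)                  # SST = SSE + SSR
DFE = m - n - 1
MSE = SSE / DFE                                    # unbiased estimate of sigma^2

F = (SSR / n) / MSE                                # ~ F(n, DFE) under H0
cov = MSE * np.linalg.inv(X.T @ X)                 # Cov(theta_hat)
t = theta / np.sqrt(np.diag(cov))                  # ~ t(DFE) under H0: theta_j = 0
```

To turn `F` and `t` into p-values, compare them against the $F(n, \text{DFE})$ and $t(\text{DFE})$ distributions (e.g. with `scipy.stats`).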

Simple Linear Regression

For $X = \left[\begin{array}{cc} x & \mathbf{1} \end{array}\right]$ we can solve
$$
\begin{aligned}
\theta &= \begin{bmatrix}
\sum\limits_{i=1}^m x_i^2 & \sum\limits_{i=1}^m x_i\\
\sum\limits_{i=1}^m x_i & m
\end{bmatrix}^{-1}\begin{bmatrix} x^T\\ \mathbf{1}^T \end{bmatrix}y\\
&= \frac{1}{m\sum\limits_{i=1}^m x_i^2 - \Big(\sum\limits_{i=1}^m x_i\Big)^2}\begin{bmatrix}
m & -\sum\limits_{i=1}^m x_i\\
-\sum\limits_{i=1}^m x_i & \sum\limits_{i=1}^m x_i^2
\end{bmatrix}\begin{bmatrix}
\sum\limits_{i=1}^m x_iy_i\\ \sum\limits_{i=1}^m y_i
\end{bmatrix}
\end{aligned}
$$

Introduce the notation
$$
\begin{aligned}
l_x &= (m-1)s_x^2\\
l_{xy} &= (m-1)s_{xy}
\end{aligned}
$$

Substituting the identities
$$
\begin{aligned}
\sum_{i=1}^m x_i &= m\bar x\\
\sum_{i=1}^m y_i &= m\bar y\\
\sum_{i=1}^m x_i^2 &= l_x + m\bar{x}^2\\
\sum_{i=1}^m x_iy_i &= l_{xy} + m\bar x\bar y
\end{aligned}
$$

into the expression for $\theta$ gives
$$
\begin{aligned}
\theta &= \frac{1}{ml_x}\begin{bmatrix}
m & -m\bar x\\
-m\bar x & l_x + m\bar{x}^2
\end{bmatrix}\begin{bmatrix}
l_{xy} + m\bar x\bar y\\ m\bar y
\end{bmatrix}\\
&= \begin{bmatrix}
l_{xy}/l_x\\ \bar y - \bar x\theta_1
\end{bmatrix}
\end{aligned}
$$

One can show that
$$
\begin{aligned}
\text{SSR} &= \theta_1^2 l_x\\
r^2 &= \rho_{xy}^2
\end{aligned}
$$
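These closed-form identities are easy to verify numerically (the data values below are made up):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
m = len(x)

l_x  = np.sum((x - x.mean()) ** 2)               # (m-1) s_x^2
l_xy = np.sum((x - x.mean()) * (y - y.mean()))   # (m-1) s_xy

theta1 = l_xy / l_x                              # slope
theta0 = y.mean() - x.mean() * theta1            # intercept

# r^2 via the sum-of-squares definitions
resid = y - (theta0 + theta1 * x)
SSE = resid @ resid
SST = np.sum((y - y.mean()) ** 2)
r2 = 1 - SSE / SST

rho = np.corrcoef(x, y)[0, 1]                    # r^2 should equal rho^2
```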

LWR

Locally weighted regression is a non-parametric method. Unlike linear regression, the cost function used by LWR introduces a weight $w$, typically
$$w^{(i)} = \exp\Big(-\frac{1}{2}(x^{(i)} - x)^T\Sigma^{-1}(x^{(i)} - x)\Big)$$

where $\Sigma$ is called the bandwidth. The modified cost function is
$$J(\theta) = \sum_{i=1}^m w^{(i)}\big(y^{(i)} - h(x^{(i)})\big)^2$$

Correspondingly, the statistics defined under model diagnostics must also account for the weights $w$.
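A minimal sketch of LWR prediction, assuming a scalar bandwidth $\sigma^2$ in place of the full $\Sigma$ (the function name and data are made up); each query point solves its own weighted normal equation:

```python
import numpy as np

def lwr_predict(X, y, x_query, sigma2=1.0):
    """Predict y at x_query with Gaussian weights w^(i) (scalar bandwidth)."""
    d = X - x_query
    w = np.exp(-0.5 * np.sum(d * d, axis=1) / sigma2)    # weight per sample
    Xb = np.hstack([np.ones((len(X), 1)), X])            # add intercept column
    W = np.diag(w)
    # Weighted normal equation: (X^T W X) theta = X^T W y
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return np.concatenate(([1.0], x_query)) @ theta

# On exactly linear data, LWR reproduces the line regardless of the bandwidth
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0
pred = lwr_predict(X, y, np.array([0.5]))
```

Note that unlike linear regression there is no single global $\theta$: the weights, and hence the fit, are recomputed for every query point.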



  1. 指数分布族(Exponential Family)相关公式推导及在变分推断中的应用 (exponential-family derivations and their use in variational inference) ↩︎

  2. Define the design matrix
     $$X_{m\times(n+1)} = \begin{pmatrix} (x^{(1)})^T\\ (x^{(2)})^T\\ \vdots\\ (x^{(m)})^T \end{pmatrix}$$
     and let
     $$\vec{y} = \begin{pmatrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)} \end{pmatrix}$$
     Thus
     $$\begin{aligned} J(\theta) &= \tfrac{1}{2}(X\theta-\vec{y})^T(X\theta-\vec{y})\\ \nabla_\theta J &= \tfrac{1}{2}\nabla_\theta\big(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta + \vec{y}^T\vec{y}\big) \end{aligned}$$
     Since every term is simply a real number,
     $$\begin{aligned} \nabla_\theta J &= \tfrac{1}{2}\nabla_\theta\,\mathrm{tr}\big(\theta^TX^TX\theta - \theta^TX^T\vec{y} - \vec{y}^TX\theta\big)\\ &= \tfrac{1}{2}\big(\nabla_\theta\,\mathrm{tr}(\theta^TX^TX\theta) - 2\nabla_\theta\,\mathrm{tr}(\vec{y}^TX\theta)\big) \end{aligned}$$
     By the properties of matrix derivatives,
     $$\begin{aligned} \nabla_\theta\,\mathrm{tr}(\vec{y}^TX\theta) &= \nabla_{(\theta^T)^T}\,\mathrm{tr}(\theta^TX^T\vec{y}) = \big(\nabla_{\theta^T}\,\mathrm{tr}(\theta^TX^T\vec{y})\big)^T = X^T\vec{y}\\ \nabla_\theta\,\mathrm{tr}(\theta^TX^TX\theta) &= \nabla_\theta\,\mathrm{tr}(\theta\theta^TX^TX) = \nabla_\theta\,\mathrm{tr}(\theta I\theta^TX^TX) = 2X^TX\theta \end{aligned}$$
     Therefore
     $$\nabla_\theta J = X^TX\theta - X^T\vec{y}$$
     Setting $\nabla_\theta J = 0$ gives
     $$X^TX\theta = X^T\vec{y}$$ ↩︎

  3. 统计 | 自由度 (degree of freedom) ↩︎

  4. 详解方差分析表(ANOVA)(二)—— SST、SSE、SSR和它们的自由度 (ANOVA tables explained, part 2: SST, SSE, SSR and their degrees of freedom) ↩︎
