Linear Regression: Formula Derivation

Deriving the Loss Function

Start from the hypothesis function:
$$h_{\theta}(x) = \theta_{0}x_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} + \cdots$$
Written compactly in vector form:
$$h_{\theta}(x) = \sum_{i=0}^{n}\theta_{i}x_{i} = \begin{pmatrix}\theta_{0} & \theta_{1} & \cdots & \theta_{n}\end{pmatrix}\begin{pmatrix}x_{0}\\ x_{1}\\ \vdots\\ x_{n}\end{pmatrix} = \Theta^{T}x$$
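As a quick sketch (NumPy, with made-up numbers), the hypothesis is just a dot product once a constant feature $x_0 = 1$ is prepended to carry the intercept:

```python
import numpy as np

# Hypothetical parameters and one sample; x_0 = 1 carries the intercept theta_0.
theta = np.array([0.5, 2.0, -1.0])   # theta_0, theta_1, theta_2
x = np.array([1.0, 3.0, 4.0])        # x_0 = 1, then the two feature values

h = theta @ x                        # h_theta(x) = Theta^T x
print(h)                             # 0.5 + 2*3 - 1*4 = 2.5
```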
Assuming independent, identically distributed error terms $\varepsilon^{(i)}$, each observation satisfies:
$$y^{(i)} = \Theta^{T}x^{(i)} + \varepsilon^{(i)}$$
The error terms follow a normal distribution with mean 0 and variance $\sigma^{2}$ (if this is unfamiliar, briefly review the Gaussian density):
$$P(\varepsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(\varepsilon^{(i)})^{2}}{2\sigma^{2}}\right)$$
Substituting $\varepsilon^{(i)} = y^{(i)} - \Theta^{T}x^{(i)}$ into the density above gives:
$$P(y^{(i)} \mid x^{(i)};\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)} - \Theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)$$
Now introduce the likelihood function:
$$L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)};\theta)$$
We want the $\theta$ that maximizes the likelihood. Taking the logarithm of both sides turns the product into a sum and simplifies the function:
$$\ell(\theta) = \ln L(\theta) = \ln\prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)};\theta) = \ln\prod_{i=1}^{m}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)} - \Theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)$$
$$= \sum_{i=1}^{m}\ln\left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(y^{(i)} - \Theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)\right)$$
$$= \sum_{i=1}^{m}\left(\ln\frac{1}{\sqrt{2\pi}\,\sigma} + \ln\exp\left(-\frac{(y^{(i)} - \Theta^{T}x^{(i)})^{2}}{2\sigma^{2}}\right)\right)$$
$$= m\ln\frac{1}{\sqrt{2\pi}\,\sigma} - \sum_{i=1}^{m}\frac{(y^{(i)} - \Theta^{T}x^{(i)})^{2}}{2\sigma^{2}}$$
$$= m\ln\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^{2}}\cdot\frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)} - \Theta^{T}x^{(i)}\right)^{2}$$
The simplification above only uses the basic logarithm identities $\ln(ab) = \ln a + \ln b$ and $\ln e^{x} = x$.
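The algebra above can be sanity-checked numerically. A small sketch with made-up data and an arbitrary $\sigma$, comparing the direct sum of log-densities against the simplified closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma = 50, 1.5
X = np.c_[np.ones(m), rng.normal(size=(m, 2))]   # x_0 = 1 column plus two features
theta = np.array([1.0, 2.0, -0.5])
y = X @ theta + rng.normal(scale=sigma, size=m)

resid = y - X @ theta
# Direct log-likelihood: sum of logs of the Gaussian densities
ll_direct = np.sum(np.log(1/(np.sqrt(2*np.pi)*sigma) * np.exp(-resid**2/(2*sigma**2))))
# Simplified form from the derivation
ll_simplified = m*np.log(1/(np.sqrt(2*np.pi)*sigma)) - (1/sigma**2)*0.5*np.sum(resid**2)
print(np.isclose(ll_direct, ll_simplified))      # the two forms agree
```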
Since the first term is constant in $\theta$, maximizing the likelihood amounts to minimizing the squared-error term. Define the loss function $J(\theta)$; the likelihood is maximal exactly when the loss is minimal:
$$J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\left(y^{(i)} - \Theta^{T}x^{(i)}\right)^{2}$$
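In code the loss is a one-liner; a minimal sketch with placeholder data:

```python
import numpy as np

def loss(theta, X, y):
    """J(theta) = 1/2 * sum of squared residuals."""
    resid = y - X @ theta
    return 0.5 * np.sum(resid**2)

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is x_0 = 1
y = np.array([2.0, 3.0, 4.0])
print(loss(np.array([1.0, 1.0]), X, y))             # exact fit, so the loss is 0.0
```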

Minimizing the Loss Function

  1. Matrix method (the normal equation)
    Step 1: expand the loss function; Step 2: take the partial derivative.
    1. $$J(\theta) = \frac{1}{2}(X\Theta - Y)^{T}(X\Theta - Y)$$
    $$= \frac{1}{2}(\Theta^{T}X^{T} - Y^{T})(X\Theta - Y)$$
    $$= \frac{1}{2}(\Theta^{T}X^{T}X\Theta - \Theta^{T}X^{T}Y - Y^{T}X\Theta + Y^{T}Y)$$
    2. $$\frac{\partial J(\theta)}{\partial\theta} = \frac{1}{2}\left(2X^{T}X\Theta - X^{T}Y - (Y^{T}X)^{T} + 0\right) = X^{T}X\Theta - X^{T}Y = 0$$
    $$\Theta = (X^{T}X)^{-1}X^{T}Y$$
    This gives $\Theta$ in closed form, but it only works when $X^{T}X$ is invertible.
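The closed-form solution maps directly to NumPy. A sketch with made-up data (in practice `np.linalg.lstsq` or the pseudo-inverse is preferred when $X^{T}X$ is ill-conditioned):

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
X = np.c_[np.ones(m), rng.normal(size=(m, 2))]   # x_0 = 1 plus two features
true_theta = np.array([3.0, 1.5, -2.0])
y = X @ true_theta + 0.1 * rng.normal(size=m)

# Normal equation: solve X^T X Theta = X^T Y instead of forming the inverse
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)                                  # close to [3.0, 1.5, -2.0]
```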

  2. Gradient descent
    The basic update rule of gradient descent:
    $$\theta_{j} = \theta_{j} - \alpha\frac{\partial J(\theta)}{\partial\theta_{j}}$$

    Each parameter $\theta_{j}$ is updated in the direction opposite to the gradient, i.e. the partial derivative $\frac{\partial J(\theta)}{\partial\theta_{j}}$, scaled by the step size (learning rate) $\alpha$, until the optimum is reached.

Taking the averaged loss $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(y^{(i)} - h_{\theta}(x^{(i)})\right)^{2}$ (the extra factor $\frac{1}{m}$ does not change the minimizer):
$$\frac{\partial J(\theta)}{\partial\theta_{j}} = \frac{1}{2m}\sum_{i=1}^{m} 2\left(y^{(i)} - h_{\theta}(x^{(i)})\right)\frac{\partial\left(y^{(i)} - h_{\theta}(x^{(i)})\right)}{\partial\theta_{j}}$$
$$= -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - h_{\theta}(x^{(i)})\right)x_{j}^{(i)}$$
Substituting into the update rule yields the batch gradient descent step:
$$\theta_{j} = \theta_{j} - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x_{j}^{(i)}$$
By repeatedly applying this update, the parameters converge to the optimal $\theta$.
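The batch update can be sketched as follows, vectorized over all $j$ at once; the learning rate and iteration count here are arbitrary choices:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Minimize J(theta) = 1/(2m) * sum((h - y)^2) by batch gradient descent."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m   # (1/m) * sum (h_theta(x_i) - y_i) x_i
        theta -= alpha * grad
    return theta

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # x_0 = 1 column
y = np.array([2.0, 3.0, 4.0])
print(batch_gradient_descent(X, y))                  # approaches [1.0, 1.0]
```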
