Machine Learning Notes 03 - Differentiation Rules and the Derivation of Gradient Descent

Introduction to Derivatives

There are already plenty of explanations of the definition of the derivative online, so this post focuses on the derivatives of some common functions and a few derivations. I once read a Zhihu article that covers how to understand derivatives quite thoroughly; interested readers can have a look at it: 如何理解导数的概念.
The simplest and most intuitive description of the derivative is: the slope of the tangent line to a curve. What does that mean?
(1) First, the tangent line: the line through two points on a curve is a secant line, and when those two points are close enough together, the secant line becomes the tangent line.
(2) As point B slowly approaches point A, the slope of the secant line keeps changing. When B gets close enough to A (the idea of a "limit"), the slope approaches a single value, and that value is called the derivative.
(3) We keep saying that the derivative is the slope of the tangent line at a point, but in practice we cannot simply draw the tangent line and then read off its slope to obtain the derivative. Why not? Through a single point we can draw many lines, and we have no way of telling which one is the true tangent; put differently, we cannot pin down a second point that is "close enough" to the point in question, so the tangent line cannot be constructed directly.
(4) In other words, the tangent line is computed, not drawn; the short numerical sketch below illustrates this.
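To make the limit idea concrete, here is a minimal sketch (my own illustration, not from the original post) that computes secant slopes of $f(x)=x^3$ at $x=2$ for shrinking step sizes; they approach the exact tangent slope $3 \cdot 2^2 = 12$ given by the power rule.

```python
# Minimal sketch: secant slopes of f(x) = x^3 at x0 = 2 approach the tangent
# slope 12 as the second point moves closer (the "limit" idea behind the derivative).
def f(x):
    return x ** 3

x0 = 2.0
for h in [1.0, 0.1, 0.01, 0.001, 1e-6]:
    secant_slope = (f(x0 + h) - f(x0)) / h
    print(f"h = {h:<8g} secant slope = {secant_slope:.6f}")
# The printed slopes tend to 12.0, matching (x^3)' = 3x^2 at x = 2.
```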

Basic Derivative Formulas

Mathematical definition: $f^{\prime}(x)=\lim _{h \rightarrow 0} \frac{f(x+h)-f(x)}{h}$
Below are the derivatives of some common functions, where $c$ and $a$ denote constants:
$(c)^{\prime}=0$
$(c x)^{\prime}=c$
$\left(x^{a}\right)^{\prime}=a x^{a-1}$
$\left(\frac{c}{v}\right)^{\prime}=-\frac{c v^{\prime}}{v^{2}}$
$\left(\log _{a} x\right)^{\prime}=\frac{1}{x \ln a}$
$\left(a^{x}\right)^{\prime}=a^{x} \ln a$

Arithmetic Rules for Derivatives

If the functions $u(x)$ and $v(x)$ are differentiable at the point $x$, then:
$[u(x) \pm v(x)]^{\prime}=u^{\prime}(x) \pm v^{\prime}(x)$
$[c u(x)]^{\prime}=c u^{\prime}(x)$, where $c$ is a constant
$[u(x) \cdot v(x)]^{\prime}=u^{\prime}(x) \cdot v(x)+u(x) \cdot v^{\prime}(x)$
$\left[\frac{u(x)}{v(x)}\right]^{\prime}=\frac{u^{\prime}(x) \cdot v(x)-u(x) \cdot v^{\prime}(x)}{[v(x)]^{2}}$
A mnemonic my teacher used (to make these easier to remember):
For the third (product) rule: derivative of the first times the second, plus derivative of the second times the first.
For the fourth (quotient) rule: derivative of the top times the bottom, minus derivative of the bottom times the top, all divided by the square of the bottom. The short symbolic check below confirms both rules.
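As a quick sanity check, here is a minimal sketch (my own addition; it assumes SymPy is available, which the original post does not mention) that verifies the product and quotient rules symbolically for concrete choices of $u(x)$ and $v(x)$:

```python
# Symbolic check of the product and quotient rules for u(x) = sin(x), v(x) = x^2 + 1.
import sympy as sp

x = sp.symbols('x')
u = sp.sin(x)
v = x**2 + 1

# Product rule: (u*v)' = u'*v + u*v'
lhs = sp.diff(u * v, x)
rhs = sp.diff(u, x) * v + u * sp.diff(v, x)
print(sp.simplify(lhs - rhs))  # 0

# Quotient rule: (u/v)' = (u'*v - u*v') / v^2
lhs = sp.diff(u / v, x)
rhs = (sp.diff(u, x) * v - u * sp.diff(v, x)) / v**2
print(sp.simplify(lhs - rhs))  # 0
```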

Derivation of the Gradient Descent Algorithm

$$\theta_{j}:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J\left(\theta_{0}, \theta_{1}\right), \quad \text{for } j=0 \text{ and } j=1$$
Here $\frac{\partial}{\partial \theta_{j}} J\left(\theta_{0}, \theta_{1}\right)$ is the gradient we need to compute, $:=$ denotes assignment, and $\alpha$ is the learning rate (step size). Note that $\theta_{0}$ and $\theta_{1}$ must be updated simultaneously.
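A minimal sketch of what "updated simultaneously" looks like in code (my own illustration; `grad0` and `grad1` are hypothetical callables returning the two partial derivatives):

```python
# One gradient-descent step with a simultaneous update: both gradients are
# evaluated at the *old* (theta0, theta1) before either parameter is overwritten.
def gd_step(theta0, theta1, alpha, grad0, grad1):
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    return temp0, temp1  # assign only after both have been computed

# Tiny usage example with the made-up cost J(t0, t1) = t0^2 + t1^2:
t0, t1 = gd_step(1.0, -2.0, 0.1, lambda a, b: 2 * a, lambda a, b: 2 * b)
print(t0, t1)  # 0.8 -1.6
```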
(1) Substituting the cost function (the dot $\cdot$ used in the formulas below simply denotes multiplication; the notation may not be entirely rigorous):
$$\begin{aligned} \theta_{j} &:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J\left(\theta_{0}, \theta_{1}\right) \\ &:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} \cdot \frac{1}{2 m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^{2} \\ &:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} \cdot \frac{1}{2 m} \sum_{i=1}^{m}\left(\theta_{0}+\theta_{1} x^{(i)}-y^{(i)}\right)^{2} \end{aligned}$$
Therefore, setting $j=0$ gives the update for $\theta_{0}$, and setting $j=1$ gives the update for $\theta_{1}$.
(1) When $j=0$, $\theta_{j}=\theta_{0}$. Note that the derivative $\frac{\partial}{\partial \theta_{j}}$ becomes $\frac{\partial}{\partial \theta_{0}}$, i.e. we are differentiating with respect to $\theta_{0}$, and all other parameters are treated as constants.

Now expand the squared term (using the rules described above; remember that the variable is $\theta_{0}$ and all other parameters are constants):
$$\begin{aligned} \theta_{0} &:=\theta_{0}-\alpha \frac{\partial}{\partial \theta_{0}} \frac{1}{2 m} \sum_{i=1}^{m}\left[\theta_{0}^{2}+2\left(\theta_{1} x^{(i)}-y^{(i)}\right) \theta_{0}+\left(\theta_{1} x^{(i)}-y^{(i)}\right)^{2}\right] \\ &:=\theta_{0}-\alpha \frac{1}{2 m} \sum_{i=1}^{m}\left[2 \theta_{0}+2\left(\theta_{1} x^{(i)}-y^{(i)}\right)\right] \\ &:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(\theta_{0}+\theta_{1} x^{(i)}-y^{(i)}\right) \\ &:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \end{aligned}$$
(2) When $j=1$, $\theta_{j}=\theta_{1}$. Note that the derivative $\frac{\partial}{\partial \theta_{j}}$ becomes $\frac{\partial}{\partial \theta_{1}}$, i.e. we are differentiating with respect to $\theta_{1}$, and all other parameters are treated as constants.

As before, expand the squared term first (this time the variable is $\theta_{1}$ and all other parameters are constants):
$$\begin{aligned} \theta_{1} &:=\theta_{1}-\alpha \frac{\partial}{\partial \theta_{1}} \frac{1}{2 m} \sum_{i=1}^{m}\left[\left(\theta_{0}-y^{(i)}\right)^{2}+2 \theta_{1} x^{(i)}\left(\theta_{0}-y^{(i)}\right)+\theta_{1}^{2}\left(x^{(i)}\right)^{2}\right] \\ &:=\theta_{1}-\alpha \frac{1}{2 m} \sum_{i=1}^{m}\left[0+2 x^{(i)}\left(\theta_{0}-y^{(i)}\right)+2 \theta_{1}\left(x^{(i)}\right)^{2}\right] \\ &:=\theta_{1}-\alpha \frac{1}{2 m} \sum_{i=1}^{m} 2 x^{(i)}\left(\theta_{0}+\theta_{1} x^{(i)}-y^{(i)}\right) \\ &:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(\theta_{0}+\theta_{1} x^{(i)}-y^{(i)}\right) \cdot x^{(i)} \\ &:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \cdot x^{(i)} \end{aligned}$$
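Both partial derivatives can be sanity-checked numerically. The sketch below (my own check, not part of the original derivation) compares the closed-form gradients derived above with central finite differences of $J$ on a small synthetic data set:

```python
# Compare the derived gradients of J with central finite-difference approximations.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 1.5 + 2.0 * x + 0.1 * rng.normal(size=20)   # toy data
m = len(x)

def J(t0, t1):
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * m)

def grads(t0, t1):
    err = t0 + t1 * x - y
    return np.sum(err) / m, np.sum(err * x) / m  # dJ/dtheta0, dJ/dtheta1

t0, t1, eps = 0.3, -0.7, 1e-6
g0, g1 = grads(t0, t1)
g0_fd = (J(t0 + eps, t1) - J(t0 - eps, t1)) / (2 * eps)
g1_fd = (J(t0, t1 + eps) - J(t0, t1 - eps)) / (2 * eps)
print(g0 - g0_fd, g1 - g1_fd)  # both differences should be ~0
```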

So, in the end, we can derive the following result:
$$\begin{aligned} \theta_{0} &:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \\ \theta_{1} &:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right) \cdot x^{(i)} \end{aligned}$$
which is equivalent to:
$$\begin{aligned} \theta_{0} &:=\theta_{0}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(\theta_{0}+\theta_{1} x^{(i)}-y^{(i)}\right) \\ \theta_{1} &:=\theta_{1}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(\theta_{0}+\theta_{1} x^{(i)}-y^{(i)}\right) \cdot x^{(i)} \end{aligned}$$
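Putting the two update rules to work, here is a minimal NumPy sketch of batch gradient descent on synthetic data (the data, the learning rate $\alpha$, and the iteration count are arbitrary choices of mine, not from the original post):

```python
# Batch gradient descent for simple linear regression using the updates derived above.
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 5, size=100)
y = 3.0 + 2.0 * x + 0.5 * rng.normal(size=100)   # true theta0 = 3, theta1 = 2, plus noise
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 0.05

for _ in range(2000):
    h = theta0 + theta1 * x                       # hypothesis h_theta(x)
    grad0 = np.sum(h - y) / m                     # (1/m) * sum(h - y)
    grad1 = np.sum((h - y) * x) / m               # (1/m) * sum((h - y) * x)
    # simultaneous update: both gradients were computed from the current parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)  # should end up close to (3.0, 2.0)
```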
