Notes on deriving the parameter solution for logistic regression:
Loss function
The linear regression model is $f(x) = wx + b$. To absorb the bias term $b$, let $\theta = [w \quad b]$ and $x = [x \quad 1]^T$, so that $f(x) = \theta x$.
Turning this into a logistic regression model: $y = \sigma(f(x)) = \sigma(\theta x) = \frac{1}{1+e^{-\theta x}}$
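As a rough illustration of the model above, here is a minimal sketch in plain Python for a single scalar feature. The names `sigmoid` and `predict_proba` and the parameter values are assumptions made for this example, not something taken from the original notes.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(theta, x):
    """Return sigma(theta . [x, 1]) for a scalar feature x.

    theta = [w, b]; appending the constant 1 to x absorbs the bias b.
    """
    x_aug = [x, 1.0]
    z = sum(t * xi for t, xi in zip(theta, x_aug))
    return sigmoid(z)

theta = [2.0, -1.0]               # illustrative values: w = 2, b = -1
print(predict_proba(theta, 0.5))  # here theta . [x, 1] = 0, so this prints 0.5
```

Folding $b$ into $\theta$ means the rest of the derivation only has to track one parameter vector.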
If we treat a single sample as an event, the probability of that event occurring is:
$$P(y \mid x)=\left\{\begin{array}{ll} p, & y=1 \\ 1-p, & y=0 \end{array}\right.$$
This is equivalent to:
$$P(y_i \mid x_i) = p^{y_i}(1-p)^{1-y_i}$$
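The equivalence between the piecewise form and the compact exponent form can be checked directly for $y \in \{0, 1\}$. The sketch below uses hypothetical helper names and a few arbitrary values of $p$.

```python
def bernoulli_piecewise(y, p):
    """P(y | x) written case by case: p if y == 1, else 1 - p."""
    return p if y == 1 else 1.0 - p

def bernoulli_compact(y, p):
    """The same probability written as p^y * (1 - p)^(1 - y)."""
    return (p ** y) * ((1.0 - p) ** (1 - y))

# The two forms agree for both labels, because one exponent is always 0.
for p in (0.1, 0.5, 0.9):
    for y in (0, 1):
        assert abs(bernoulli_piecewise(y, p) - bernoulli_compact(y, p)) < 1e-12
```

The compact form is the convenient one because it turns into a sum of two terms once the logarithm is taken.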
Suppose we have collected a dataset of $N$ samples, $\left\{\left(x_{1}, y_{1}\right),\left(x_{2}, y_{2}\right),\left(x_{3}, y_{3}\right), \ldots, \left(x_{N}, y_{N}\right)\right\}$. Assuming the samples are independent, the total probability of the joint event formed by all of them is:
$$\begin{aligned} P_{total} &= P(y_1 \mid x_1)P(y_2 \mid x_2)P(y_3 \mid x_3) \ldots P(y_N \mid x_N) \\ &= \prod_{i=1}^{N} p^{y_{i}}(1-p)^{1-y_{i}} \\ F(\theta) &= \ln(P_{total}) = \sum_{i=1}^N \ln\left(p^{y_{i}}(1-p)^{1-y_{i}}\right) \\ &= \sum_{i=1}^N \left[ y_i \ln p + (1-y_i)\ln(1-p) \right] \end{aligned}$$
where $p = \frac{1}{1+e^{-\theta x}}$, evaluated at $x = x_i$ in the $i$-th term.
To match the usual meaning of a loss function (smaller is better), we define:
$$L(\theta) = -F(\theta)$$
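The log-likelihood $F(\theta)$ and the loss $L(\theta) = -F(\theta)$ can be sketched as follows for a scalar feature with $\theta = [w, b]$. The toy data, the parameter values, and the function names are assumptions chosen for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_probability(theta, xs, ys):
    """Product over samples of p^y * (1 - p)^(1 - y)."""
    prod = 1.0
    for x, y in zip(xs, ys):
        p = sigmoid(theta[0] * x + theta[1])
        prod *= (p ** y) * ((1.0 - p) ** (1 - y))
    return prod

def log_likelihood(theta, xs, ys):
    """F(theta) = sum_i [ y_i ln p + (1 - y_i) ln(1 - p) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(theta[0] * x + theta[1])
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def loss(theta, xs, ys):
    """L(theta) = -F(theta)."""
    return -log_likelihood(theta, xs, ys)

xs = [-2.0, -1.0, 1.0, 2.0]   # made-up toy data
ys = [0, 0, 1, 1]
theta = [1.0, 0.0]

# Taking the log of the product gives the same value as the sum of logs.
F = log_likelihood(theta, xs, ys)
assert abs(F - math.log(total_probability(theta, xs, ys))) < 1e-9
print(loss(theta, xs, ys))
```

Working with the sum of logs rather than the raw product also avoids numerical underflow when $N$ is large, since a product of many probabilities quickly becomes too small to represent.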
Derivation
By the chain rule:
$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial p} \times \frac{\partial p}{\partial \theta}$$
First compute $\frac{\partial p}{\partial \theta}$:
$$\begin{aligned} p' &= \left(\frac{1}{1+e^{-\theta x}}\right)' \\ &= \frac{-1}{(1+e^{-\theta x})^2} \cdot e^{-\theta x} \cdot (-x) \\ &= \frac{1}{1+e^{-\theta x}} \cdot \frac{e^{-\theta x}}{1+e^{-\theta x}} \cdot x \\ &= p(1-p)x \end{aligned}$$
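A quick way to sanity-check the result $p' = p(1-p)x$ is to compare it against a central finite difference. The sketch below does this for a scalar $\theta$ and scalar $x$; the test point and step size are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dp_dtheta(theta, x):
    """Analytic derivative of p = sigmoid(theta * x) w.r.t. theta."""
    p = sigmoid(theta * x)
    return p * (1.0 - p) * x

def dp_dtheta_numeric(theta, x, eps=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (sigmoid((theta + eps) * x) - sigmoid((theta - eps) * x)) / (2.0 * eps)

theta, x = 0.7, 1.5   # arbitrary test point
assert abs(dp_dtheta(theta, x) - dp_dtheta_numeric(theta, x)) < 1e-8
```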
Next compute $\frac{\partial F}{\partial \theta}$:
$$\begin{aligned} \nabla F(\theta) &= \nabla \left(\sum_{i=1}^N \left[ y_i \ln p + (1-y_i)\ln(1-p) \right]\right) \\ &= \frac{\partial F}{\partial p} \times \frac{\partial p}{\partial \theta} \\ &= \sum_{i=1}^N \left( y_i \frac{1}{p} + (1-y_i)\frac{-1}{1-p} \right) p' \\ &= \sum_{i=1}^N \left[ y_i(1-p)x_i - (1-y_i)px_i \right] \\ &= \sum_{i=1}^N (y_i-p) x_i \end{aligned}$$
Therefore:
$$\frac{\partial L}{\partial \theta} = \sum_{i=1}^N (p-y_i)x_i$$
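The closed-form gradient $\sum_i (p - y_i)x_i$ can be compared against a numerical gradient of the loss. This is only a sketch: it assumes a scalar feature augmented with a constant 1 (so $\theta = [w, b]$), and the data and function names are made up for the check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(theta, xs, ys):
    """L(theta) = -sum_i [ y_i ln p + (1 - y_i) ln(1 - p) ]."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(theta[0] * x + theta[1])
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def grad_loss(theta, xs, ys):
    """Analytic gradient: sum_i (p - y_i) * [x_i, 1]."""
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        p = sigmoid(theta[0] * x + theta[1])
        g[0] += (p - y) * x
        g[1] += (p - y) * 1.0
    return g

def grad_numeric(theta, xs, ys, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    g = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += eps
        down[j] -= eps
        g.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2.0 * eps))
    return g

xs = [-2.0, -1.0, 0.5, 2.0]   # made-up toy data
ys = [0, 1, 0, 1]
theta = [0.3, -0.2]
for analytic, numeric in zip(grad_loss(theta, xs, ys), grad_numeric(theta, xs, ys)):
    assert abs(analytic - numeric) < 1e-6
```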
Gradient update
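Applying gradient descent with learning rate $\alpha$ and substituting $p = \frac{1}{1+e^{-\theta x_i}}$ into $\frac{\partial L}{\partial \theta}$, $\theta$ is updated as follows:
$$\theta := \theta - \alpha \sum_{i=1}^N \left(\frac{1}{1+e^{-\theta x_i}} - y_i\right) x_i$$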
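Putting the pieces together, a batch gradient-descent loop for this update might look like the sketch below. The learning rate, step count, zero initialization, and toy data are assumptions picked so the example converges. The data is deliberately not linearly separable, so the optimum is finite.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, alpha=0.1, steps=1000):
    """Batch gradient descent on L(theta) for a scalar feature, theta = [w, b]."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        g = [0.0, 0.0]
        for x, y in zip(xs, ys):
            p = sigmoid(theta[0] * x + theta[1])
            g[0] += (p - y) * x   # gradient component for w
            g[1] += (p - y)       # gradient component for b (the appended 1)
        # theta := theta - alpha * sum_i (p - y_i) x_i
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta

xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]   # made-up toy data
ys = [0, 0, 1, 0, 1, 1]
theta = fit(xs, ys)
print(theta)
print(sigmoid(theta[0] * 3.0 + theta[1]))   # predicted probability at x = 3
```

Note that the update sums over all $N$ samples, so a suitable $\alpha$ depends on the dataset size. Dividing the gradient by $N$ is a common variation that makes the step size less sensitive to it.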