LR Algorithm
Logistic regression is mainly used for binary classification problems. Its logistic function is
f(x;\theta) = \frac{1}{1+e^{-\theta^T x}}
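As a quick illustration, here is a minimal NumPy sketch of this logistic function and the resulting prediction P(y=1|x;θ); the names `sigmoid` and `predict_proba` are my own, not from the original post:

```python
import numpy as np

def sigmoid(z):
    """Logistic function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, theta):
    """P(y = 1 | x; theta) for each row of the (N, d) matrix X."""
    return sigmoid(X @ theta)
```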
Deriving the loss function:
P(y=1|x;\theta) = f(x;\theta) = \frac{1}{1+e^{-\theta^T x}}
P(y=0|x;\theta) = 1 - P(y=1|x;\theta)
Since the samples are mutually independent, the likelihood function is:
L(\theta) = \prod_{i \in \{1,\ldots,N\},\, y^{(i)}=1} P(y=1|x^{(i)};\theta) \cdot \prod_{i \in \{1,\ldots,N\},\, y^{(i)}=0} P(y=0|x^{(i)};\theta)
Substituting P(y=0|x;\theta) = 1 - P(y=1|x;\theta):
L(\theta) = \prod_{i \in \{1,\ldots,N\},\, y^{(i)}=1} P(y=1|x^{(i)};\theta) \cdot \prod_{i \in \{1,\ldots,N\},\, y^{(i)}=0} \left(1 - P(y=1|x^{(i)};\theta)\right)
Equivalently, since the exponent y^{(i)} selects the correct factor for each sample, this can be written as a single product:
L(\theta) = \prod_{i=1}^{N} P(y=1|x^{(i)};\theta)^{y^{(i)}} \left(1 - P(y=1|x^{(i)};\theta)\right)^{1-y^{(i)}}
Taking the negative logarithm gives the loss:
J(\theta) = -\ln L(\theta) = -\sum_{i=1}^{N} \left[ y^{(i)} \ln\!\left(P(y=1|x^{(i)};\theta)\right) + (1-y^{(i)}) \ln\!\left(1 - P(y=1|x^{(i)};\theta)\right) \right]
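A small sketch of this loss in NumPy, reusing `sigmoid` and `predict_proba` from above; clipping the probabilities away from 0 and 1 (the `eps` below is an illustrative choice) keeps the logarithms finite:

```python
def log_loss(X, y, theta, eps=1e-12):
    """Negative log-likelihood J(theta); y holds labels in {0, 1}."""
    p = np.clip(predict_proba(X, theta), eps, 1.0 - eps)
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```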
Gradient descent:
For f(z) = \frac{1}{1+e^{-z}} (note the minus sign in the exponent, matching the logistic function above), the derivative is
f'(z) = f(z)\left(1 - f(z)\right)
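This identity takes one line to verify:
f'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}} = f(z)\left(1-f(z)\right)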
Applying the chain rule to J(\theta) gives the gradient \sum_{i=1}^{N}\left(f(x^{(i)};\theta) - y^{(i)}\right)x^{(i)}, so the per-sample gradient-descent update for the parameter \theta with learning rate \alpha is:
\theta = \theta - \alpha \cdot \left(f(x^{(i)};\theta) - y^{(i)}\right) x^{(i)}
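Putting the pieces together, a minimal full-batch gradient descent loop might look like the following; `alpha` and `n_iters` are illustrative defaults, and the step is divided by N so the learning rate does not depend on dataset size:

```python
def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Full-batch gradient descent for logistic regression."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Gradient of J(theta): X^T (f(X; theta) - y), summed over all samples.
        grad = X.T @ (predict_proba(X, theta) - y)
        theta -= alpha * grad / len(y)
    return theta
```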
Algorithm optimization: stochastic gradient descent
With a large dataset, the update above requires a pass over all samples for every step, which makes gradient descent slow.
Stochastic gradient descent instead performs each update using a single randomly chosen sample (or a small mini-batch), moving toward the optimum step by step; the iterates end up oscillating in a neighborhood of the optimum rather than converging to it exactly.
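A sketch of this stochastic variant under the same assumptions as the earlier code: each update uses one sample, so a step costs O(d) instead of O(Nd):

```python
def sgd(X, y, alpha=0.1, n_epochs=10, seed=0):
    """Stochastic gradient descent: one sample per update step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(len(y)):
            grad_i = (sigmoid(X[i] @ theta) - y[i]) * X[i]
            theta -= alpha * grad_i
    return theta
```

In practice the learning rate is usually decayed over the epochs so that the oscillation around the optimum shrinks over time.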