The Gradient Descent Solution Process
Linear Regression
Machine learning iterates according to an objective function, driving its value toward the minimum.
The objective function for gradient descent (mean squared error):

$$\displaystyle J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m\big(h_\theta(x_i)-y_i\big)^2$$
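As a quick numerical check of this cost, here is a minimal NumPy sketch; `X`, `y`, and the parameter vectors are made-up toy data, not from the text.

```python
import numpy as np

def cost(theta, X, y):
    """Mean squared error cost: J(theta) = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(y)
    residuals = X @ theta - y          # h_theta(x_i) - y_i for every sample
    return (residuals @ residuals) / (2 * m)

# Toy data (assumed): y = 2*x, with a bias column of ones
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(cost(np.array([0.0, 2.0]), X, y))  # exact fit -> 0.0
print(cost(np.array([0.0, 0.0]), X, y))  # (4+16+36)/6 ~ 9.33
```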
Batch gradient descent:

$$\displaystyle \frac{\partial{J(\theta)}}{\partial{\theta_j}}=-\frac{1}{m}\sum_{i=1}^m\big(y_i-h_\theta(x_i)\big)x_{ij}$$
$$\theta_j'=\theta_j+\frac{1}{m}\sum_{i=1}^m\big(y_i-h_\theta(x_i)\big)x_{ij}$$
Batch gradient descent uses all samples for each update: it is slow, but it reliably moves toward the optimum.
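A minimal sketch of the batch update on a linear-regression toy problem. The data and the learning rate `alpha` are assumptions; the update formula above has an implicit step size of 1, while the code adds an explicit `alpha` for stability.

```python
import numpy as np

def batch_gd_step(theta, X, y, alpha=0.1):
    """One batch update: theta_j' = theta_j + alpha/m * sum((y_i - h(x_i)) * x_ij)."""
    m = len(y)
    grad = X.T @ (y - X @ theta) / m   # averaged over ALL m samples
    return theta + alpha * grad

# Toy data (assumed): y = 2*x, with a bias column of ones
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.zeros(2)
for _ in range(2000):
    theta = batch_gd_step(theta, X, y)
print(theta)  # approaches [0, 2]
```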
Stochastic gradient descent:

$$\theta_j'=\theta_j+\big(y_i-h_\theta(x_i)\big)x_{ij}$$
Each update uses a single sample, so iterations are fast, but an individual step does not always move in the direction of convergence.
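The single-sample update can be sketched like this; the toy data, the random sampling scheme, and the damping factor `alpha` are assumptions (the formula above uses a step of 1).

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumed): y = 2*x, with a bias column of ones
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.zeros(2)
alpha = 0.05
for _ in range(5000):
    i = rng.integers(len(y))          # pick ONE random sample
    err = y[i] - X[i] @ theta         # y_i - h_theta(x_i)
    theta = theta + alpha * err * X[i]
print(theta)  # noisy path, but ends near [0, 2]
```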
Mini-batch gradient descent:

$$\theta_j'=\theta_j-\alpha\frac{1}{10}\sum_{k=i}^{i+9}\big(h_\theta(x_k)-y_k\big)x_{kj}$$
Each update uses a small batch of the data, combining the strengths of the two approaches above.
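A sketch of the mini-batch update using consecutive 10-sample windows, as in the formula above. The data, the window sampling, and `alpha` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumed): y = 3*x, with a bias column of ones
X = np.column_stack([np.ones(100), np.linspace(0, 1, 100)])
y = 3 * X[:, 1]
theta = np.zeros(2)
alpha, batch = 0.5, 10
for _ in range(3000):
    i = rng.integers(0, len(y) - batch)        # start of a 10-sample window
    Xb, yb = X[i:i + batch], y[i:i + batch]
    grad = Xb.T @ (Xb @ theta - yb) / batch    # average over the mini-batch only
    theta = theta - alpha * grad
print(theta)  # near [0, 3]
```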
Logistic Regression
Logistic regression: regression analysis for categorical (classification) data.
Sigmoid function:

$$g(z)=\frac{1}{1+e^{-z}}$$
Its argument can be any real number, and its range is (0, 1): it maps the whole real line into the unit interval, converting a raw value into a probability.
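The value-to-probability mapping can be checked directly (the sample inputs below are arbitrary):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + exp(-z)): maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5 -- the decision boundary
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```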
Prediction function: $h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$
where (taking $x_0=1$):

$$\theta_0+\theta_1x_1+\cdots+\theta_nx_n=\displaystyle \sum_{i=0}^n\theta_ix_i=\theta^Tx$$
Classification task:

$$\begin{aligned} p(y=1\mid x;\theta)&=h_\theta(x)&\cdots①\\p(y=0\mid x;\theta)&=1-h_\theta(x) &\cdots②\\p(y\mid x;\theta)&=\big(h_\theta(x)\big)^y\big(1-h_\theta(x)\big)^{1-y}&\cdots③\end{aligned}$$
For a binary classification task, ① and ② combine into ③.
Likelihood function: $L(\theta)=\prod_{i=1}^mp(y_i\mid x_i;\theta)=\prod_{i=1}^m\big(h_\theta(x_i)\big)^{y_i}\big(1-h_\theta(x_i)\big)^{1-y_i}$
Log-likelihood: $l(\theta)=\log L(\theta)=\sum_{i=1}^m\Big(y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big)$
Introduce the function $J(\theta)=-\frac{1}{m}l(\theta)$ and minimize $J(\theta)$.
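This negative average log-likelihood (cross-entropy cost) can be sketched numerically; the toy points and parameter vector below are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """J(theta) = -1/m * sum(y*log(h) + (1-y)*log(1-h))."""
    h = sigmoid(X @ theta)
    m = len(y)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

# Toy separable data (assumed), bias column included
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(logistic_cost(np.zeros(2), X, y))  # log(2) ~ 0.693 at theta = 0
```

At $\theta=0$ every prediction is 0.5, so the cost is exactly $\log 2$; any parameters that separate the classes drive it lower.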
Derivation (using the sigmoid derivative $g'(z)=g(z)\big(1-g(z)\big)$ in the third step):

$$\begin{aligned}l(\theta)=\log L(\theta)&=\sum_{i=1}^m\Big(y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big)\\ \frac{\partial J(\theta)}{\partial\theta_j}&=-\frac{1}{m}\sum_{i=1}^m\Big(y_i\frac{1}{h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i)-(1-y_i)\frac{1}{1-h_\theta(x_i)}\frac{\partial}{\partial\theta_j}h_\theta(x_i)\Big)\\ &=-\frac{1}{m}\sum_{i=1}^m\Big(y_i\frac{1}{g(\theta^Tx_i)}-(1-y_i)\frac{1}{1-g(\theta^Tx_i)}\Big)\frac{\partial}{\partial\theta_j}g(\theta^Tx_i)\\ &=-\frac{1}{m}\sum_{i=1}^m\Big(y_i\frac{1}{g(\theta^Tx_i)}-(1-y_i)\frac{1}{1-g(\theta^Tx_i)}\Big)g(\theta^Tx_i)\big(1-g(\theta^Tx_i)\big)\frac{\partial}{\partial\theta_j}\theta^Tx_i\\ &=-\frac{1}{m}\sum_{i=1}^m\Big(y_i\big(1-g(\theta^Tx_i)\big)-(1-y_i)g(\theta^Tx_i)\Big)x_{ij}\\ &=-\frac{1}{m}\sum_{i=1}^m\big(y_i-g(\theta^Tx_i)\big)x_{ij}\\ &=\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x_i)-y_i\big)x_{ij} \end{aligned}$$
Logistic regression parameter update:

$$\theta_j=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x_i)-y_i\big)x_{ij}$$
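Putting the update rule into a training loop gives a sketch like the following; the toy data, learning rate, and iteration count are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy separable data (assumed), bias column included
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.zeros(2)
alpha = 0.5
for _ in range(1000):
    # gradient: 1/m * sum((h_theta(x_i) - y_i) * x_ij), vectorized over j
    grad = X.T @ (sigmoid(X @ theta) - y) / len(y)
    theta = theta - alpha * grad
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)  # matches y on this separable toy set
```

Note the gradient has the same form as in linear regression; only the hypothesis $h_\theta$ changed from $\theta^Tx$ to $g(\theta^Tx)$.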