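Logistic Regression
Regression: both the inputs and the output are continuous variables.
Classification: the output is a discrete variable.
The model is fitted by maximum likelihood on the joint probability of the training data: the parameters (weights) are adjusted so that the probability of observing the training data is as large as possible.
Evaluating the regression function
Set the parameters w and write down the joint probability: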
$$
\begin{aligned}
& g(w^T x) = \frac{1}{1+e^{-z}} = \frac{1}{1+e^{-w^T x}}, \quad z = w^T x \\
& \begin{cases}
P(y=1) = g(w^T x) \\
P(y=0) = 1 - g(w^T x)
\end{cases} \\
\Rightarrow\ & P(\mathrm{True} \mid x_i) = \left(g(w^T x_i)\right)^{y_i} \left(1 - g(w^T x_i)\right)^{1-y_i} \\
\Rightarrow\ & L(\vec w) = \prod_{i=1}^{m} P(\mathrm{True} \mid x_i) \\
\Rightarrow\ & Loss(\vec w) = -\frac{1}{m} \log L(\vec w)
\end{aligned}
$$
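Here y is the true label. P is the probability the model assigns to each outcome under the current parameters, so it measures how well those parameters explain the data. The loss function therefore describes how the quality of the fit changes as the variable w changes.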
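The definitions above can be tried out numerically. The following is a minimal sketch using only the Python standard library; the function names (`sigmoid`, `predict_proba`, `loss`), the clipping constant `eps`, and the toy data are illustrative assumptions, not part of the original notes. The loss is the average negative log-likelihood.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    """P(y = 1 | x; w) = g(w^T x)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def loss(w, X, y, eps=1e-12):
    """Average negative log-likelihood: -(1/m) * log L(w)."""
    total = 0.0
    for xi, yi in zip(X, y):
        # Clip the probability so that log(0) can never occur (eps is an assumption).
        p = min(max(predict_proba(w, xi), eps), 1.0 - eps)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return -total / len(X)

# Hypothetical toy data: the first feature is a constant 1 acting as the bias term.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
y = [1, 1, 0, 0]

loss_zero = loss([0.0, 0.0], X, y)   # every prediction is 0.5
loss_good = loss([0.0, 2.0], X, y)   # weights that separate the toy data
print(loss_zero, loss_good)
```

With all weights at zero every prediction is 0.5, so the loss equals log 2; weights that separate the two classes give a smaller value, which is the sense in which the loss evaluates the parameters.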
Deriving the likelihood function L and the gradient of the loss function:
$$
\begin{aligned}
h_\theta(x) &= g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}} \\
L(\theta) &= \prod_{i=1}^{m} \left(h_\theta(x_i)\right)^{y_i} \left(1-h_\theta(x_i)\right)^{1-y_i} \\
\Rightarrow \log L(\theta) &= \sum_{i=1}^{m} \left[ y_i \log h_\theta(x_i) + (1-y_i) \log\left(1-h_\theta(x_i)\right) \right] \\
Loss(\theta) &= -\frac{1}{m} \log L(\theta) \\
\Rightarrow \frac{\partial}{\partial \theta_j} Loss(\theta)
&= -\frac{1}{m} \sum_{i=1}^{m} \left( y_i \frac{1}{h_\theta(x_i)} \frac{\partial}{\partial \theta_j} h_\theta(x_i) - (1-y_i) \frac{1}{1-h_\theta(x_i)} \frac{\partial}{\partial \theta_j} h_\theta(x_i) \right) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \frac{1}{h_\theta(x_i)} - (1-y_i) \frac{1}{1-h_\theta(x_i)} \right] \frac{\partial}{\partial \theta_j} h_\theta(x_i) \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \frac{1}{h_\theta(x_i)} - (1-y_i) \frac{1}{1-h_\theta(x_i)} \right] h_\theta(x_i) \left(1-h_\theta(x_i)\right) \frac{\partial}{\partial \theta_j} \theta^T x_i \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \left(1-h_\theta(x_i)\right) - (1-y_i) h_\theta(x_i) \right] \frac{\partial}{\partial \theta_j} \theta^T x_i \\
&= -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \left(1-h_\theta(x_i)\right) - (1-y_i) h_\theta(x_i) \right] {x_i}_j \\
&= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right) {x_i}_j
\end{aligned}
$$

Here ${x_i}_j$ is the j-th feature of the i-th sample, and the third step uses the sigmoid identity $g'(z) = g(z)(1-g(z))$.
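One way to gain confidence in the final gradient expression is to compare it against a finite-difference approximation of the loss. The sketch below does this with the standard library only; the helper names (`h`, `gradient`, `numeric_gradient`), the step size, and the toy data are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def loss(theta, X, y):
    """Average negative log-likelihood, -(1/m) * log L(theta)."""
    m = len(X)
    total = sum(yi * math.log(h(theta, xi)) + (1 - yi) * math.log(1 - h(theta, xi))
                for xi, yi in zip(X, y))
    return -total / m

def gradient(theta, X, y):
    """Analytic gradient: (1/m) * sum_i (h_theta(x_i) - y_i) * x_ij for each j."""
    m = len(X)
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        err = h(theta, xi) - yi
        for j, xij in enumerate(xi):
            grad[j] += err * xij / m
    return grad

def numeric_gradient(theta, X, y, step=1e-6):
    """Central finite differences of the loss, used only as a sanity check."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += step
        down[j] -= step
        grad.append((loss(up, X, y) - loss(down, X, y)) / (2 * step))
    return grad

# Hypothetical toy data; the first feature is a constant bias term.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
y = [1, 1, 0, 0]
theta = [0.3, -0.2]

analytic = gradient(theta, X, y)
numeric = numeric_gradient(theta, X, y)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(analytic, numeric, max_diff)
```

If the two gradients agree closely, the signs in the derivation (including the negative exponent in the sigmoid) are consistent with the loss being minimized.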
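Updating the parameters
The derivation above gives the **partial derivative** of the loss with respect to each parameter. When the parameters are updated, each one moves against its partial derivative, scaled by a learning rate α: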
$$
\theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x_i) - y_i \right) {x_i}_j
$$
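The update rule can be turned into a small batch gradient descent loop. This is a sketch under stated assumptions: the function name `gradient_descent`, the learning rate `alpha=0.1`, the iteration count, and the toy data are all illustrative choices rather than values from the notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Repeat theta_j = theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(n_iters):
        errors = [h(theta, xi) - yi for xi, yi in zip(X, y)]
        # Compute every partial derivative first so all theta_j update simultaneously.
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Hypothetical toy data; the first feature is a constant bias term.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0], [1.0, -2.5]]
y = [1, 1, 0, 0]

theta = gradient_descent(X, y)
predictions = [1 if h(theta, xi) >= 0.5 else 0 for xi in X]
print(theta, predictions)
```

The gradient is computed from the old parameters before any of them is changed, which matches the formula: every θ_j on the right-hand side refers to the value from the previous step.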
Softmax for multi-class classification
For k classes, the probability function is:
$$
h_\theta(x^{(i)}) =
\begin{bmatrix}
p(y^{(i)} = 1 \mid x^{(i)}; \theta) \\
p(y^{(i)} = 2 \mid x^{(i)}; \theta) \\
\vdots \\
p(y^{(i)} = k \mid x^{(i)}; \theta)
\end{bmatrix}
= \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}}
\begin{bmatrix}
e^{\theta_1^T x^{(i)}} \\
e^{\theta_2^T x^{(i)}} \\
\vdots \\
e^{\theta_k^T x^{(i)}}
\end{bmatrix}
$$
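The softmax vector can be computed directly from the k scores $\theta_j^T x$. The sketch below uses only the standard library; the names `softmax` and `predict_proba`, the parameter vectors in `thetas`, and the input `x` are hypothetical values chosen for illustration. Subtracting the largest score before exponentiating is a common numerical-stability step and does not change the result.

```python
import math

def softmax(scores):
    """Map k scores theta_j^T x to k probabilities that sum to 1."""
    shift = max(scores)  # shifting all scores by a constant leaves the ratios unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(thetas, x):
    """Return [p(y = 1 | x), ..., p(y = k | x)], one entry per parameter vector theta_j."""
    scores = [sum(t * xj for t, xj in zip(theta, x)) for theta in thetas]
    return softmax(scores)

# Hypothetical parameters for k = 3 classes and one input with a bias feature.
thetas = [[0.0, 1.0], [0.5, -1.0], [0.2, 0.1]]
x = [1.0, 2.0]

probs = predict_proba(thetas, x)
best_class = probs.index(max(probs)) + 1  # classes are numbered 1..k as in the formula
print(probs, sum(probs), best_class)
```

With k = 2 and the second score fixed at 0, the first softmax output equals the sigmoid of the first score, so binary logistic regression is a special case of this formulation.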