Dataset:
$$T=\{(\vec{x}_1,y_1),(\vec{x}_2,y_2),\cdots,(\vec{x}_N,y_N)\},\qquad \vec{x}_i\in\mathcal{X}\subseteq\mathbb{R}^{n},\; y_i\in\mathcal{Y}=\{0,1\}$$
Goal: for a given $\vec{x}$, estimate the corresponding $y$, i.e. $P(y\mid\vec{x})$, and in particular $P(y=1\mid\vec{x})$.
Logistic (log-odds) function:
$$P(y=1\mid\vec{x})=\frac{1}{1+e^{-(\vec{w}^{T}\cdot\vec{x}+b)}}$$
Denote:
$$\vec{\tilde{w}}=(\vec{w},b)^{T},\qquad \vec{\tilde{x}}=(\vec{x},1)^{T}$$
Then:
$$D(\vec{\tilde{x}})=P(y=1\mid\vec{\tilde{x}})=\frac{1}{1+e^{-\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}}}=\frac{e^{\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}}}{1+e^{\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}}}$$
Similarly:
$$1-D(\vec{\tilde{x}})=P(y=0\mid\vec{\tilde{x}})=\frac{1}{1+e^{\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}}}$$
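As a quick illustration of these two probabilities, here is a minimal NumPy sketch (the values and names such as w_tilde and x_tilde are made up for illustration): it builds the augmented vectors $\vec{\tilde{w}}=(\vec{w},b)^{T}$, $\vec{\tilde{x}}=(\vec{x},1)^{T}$ and evaluates $D(\vec{\tilde{x}})$ and $1-D(\vec{\tilde{x}})$.

import numpy as np

def sigmoid(z):
    # D(x~) = e^z / (1 + e^z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Made-up 2-dimensional example: weights w, bias b, input x
w = np.array([0.5, -1.2])
b = 0.3
x = np.array([1.0, 2.0])

# Augmented forms: w~ = (w, b), x~ = (x, 1)
w_tilde = np.append(w, b)
x_tilde = np.append(x, 1.0)

p1 = sigmoid(np.dot(w_tilde, x_tilde))  # P(y = 1 | x~)
p0 = 1.0 - p1                           # P(y = 0 | x~)
print(p1, p0)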
Likelihood function:
$$S(\vec{\tilde{w}}\mid X)=\prod_{i=1}^{N}\left[D(\vec{\tilde{x}}_i)\right]^{y_i}\left[1-D(\vec{\tilde{x}}_i)\right]^{1-y_i},\qquad y_i\in\{0,1\}$$
Log-likelihood function:
$$L(\vec{\tilde{w}})=\log S(\vec{\tilde{w}}\mid X)$$
$$\Rightarrow L(\vec{\tilde{w}})=\sum_{i=1}^{N}\left[y_i\log D(\vec{\tilde{x}}_i)+(1-y_i)\log\left(1-D(\vec{\tilde{x}}_i)\right)\right]$$
$$\Rightarrow L(\vec{\tilde{w}})=\sum_{i=1}^{N}\left[y_i\log\frac{D(\vec{\tilde{x}}_i)}{1-D(\vec{\tilde{x}}_i)}+\log\left(1-D(\vec{\tilde{x}}_i)\right)\right]$$
$$\Rightarrow L(\vec{\tilde{w}})=\sum_{i=1}^{N}\left[y_i\left(\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}_i\right)-\log\left(1+e^{\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}_i}\right)\right]$$
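As a sanity check, this simplified form agrees numerically with the cross-entropy form above. Below is a small NumPy sketch with made-up data (X_tilde, y and w_tilde are illustrative names only); np.logaddexp(0, z) computes $\log(1+e^{z})$ stably.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up augmented data matrix X~ (one row per x~_i), labels y, weights w~
X_tilde = np.array([[0.2, 1.5, 1.0],
                    [1.1, -0.3, 1.0],
                    [0.7, 0.9, 1.0]])
y = np.array([1.0, 0.0, 1.0])
w_tilde = np.array([0.4, -0.2, 0.1])

z = X_tilde.dot(w_tilde)  # z_i = w~^T . x~_i
Dz = sigmoid(z)

# Cross-entropy form of L(w~)
L1 = np.sum(y * np.log(Dz) + (1 - y) * np.log(1 - Dz))
# Simplified form: sum_i [ y_i z_i - log(1 + e^(z_i)) ]
L2 = np.sum(y * z - np.logaddexp(0.0, z))
print(L1, L2)  # the two values agree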
Maximum likelihood estimation:
$$\vec{\tilde{w}}^{*}=\arg\max_{\vec{\tilde{w}}}\sum_{i=1}^{N}\left[y_i\left(\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}_i\right)-\log\left(1+e^{\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}_i}\right)\right]$$
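One way to carry out this maximization directly is to hand the negative log-likelihood to a general-purpose optimizer. A minimal sketch, assuming SciPy is available (the toy data and the helper name neg_log_likelihood are made up for illustration):

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w_tilde, X_tilde, y):
    # -L(w~) = -sum_i [ y_i (w~^T . x~_i) - log(1 + e^(w~^T . x~_i)) ]
    z = X_tilde.dot(w_tilde)
    return -np.sum(y * z - np.logaddexp(0.0, z))

# Made-up, non-separable toy data: two features plus a bias column of ones
X_tilde = np.array([[0.5, 0.5, 1.0],
                    [0.6, 0.5, 1.0],
                    [1.5, 1.0, 1.0],
                    [0.1, 0.2, 1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

res = minimize(neg_log_likelihood, x0=np.zeros(3), args=(X_tilde, y), method='BFGS')
print(res.x)  # estimated w~* = (w, b)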
Gradient descent:

Maximize:
$$L(\vec{\tilde{w}})=\sum_{i=1}^{N}\left[y_i\log D(\vec{\tilde{x}}_i)+(1-y_i)\log\left(1-D(\vec{\tilde{x}}_i)\right)\right]$$

Let:
$$J(\vec{\tilde{w}})=-\frac{1}{N}L(\vec{\tilde{w}})$$

so we equivalently minimize:
$$J(\vec{\tilde{w}})=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log D(\vec{\tilde{x}}_i)+(1-y_i)\log\left(1-D(\vec{\tilde{x}}_i)\right)\right]$$

$$\frac{\partial J(\vec{\tilde{w}})}{\partial \tilde{w}_j}=-\frac{1}{N}\sum_{i=1}^{N}\left[y_i\frac{1}{D(\vec{\tilde{x}}_i)}\frac{\partial D(\vec{\tilde{x}}_i)}{\partial \tilde{w}_j}-(1-y_i)\frac{1}{1-D(\vec{\tilde{x}}_i)}\frac{\partial D(\vec{\tilde{x}}_i)}{\partial \tilde{w}_j}\right]$$

$$=-\frac{1}{N}\sum_{i=1}^{N}\left(y_i\frac{1}{D(\vec{\tilde{x}}_i)}-(1-y_i)\frac{1}{1-D(\vec{\tilde{x}}_i)}\right)\frac{\partial D(\vec{\tilde{x}}_i)}{\partial \tilde{w}_j}$$

$$=-\frac{1}{N}\sum_{i=1}^{N}\left(y_i\frac{1}{D(\vec{\tilde{x}}_i)}-(1-y_i)\frac{1}{1-D(\vec{\tilde{x}}_i)}\right)D(\vec{\tilde{x}}_i)\left(1-D(\vec{\tilde{x}}_i)\right)\frac{\partial(\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}_i)}{\partial \tilde{w}_j}$$

$$=-\frac{1}{N}\sum_{i=1}^{N}\left(y_i\left(1-D(\vec{\tilde{x}}_i)\right)-(1-y_i)D(\vec{\tilde{x}}_i)\right)\frac{\partial(\vec{\tilde{w}}^{T}\cdot\vec{\tilde{x}}_i)}{\partial \tilde{w}_j}$$

$$=-\frac{1}{N}\sum_{i=1}^{N}\left(y_i-D(\vec{\tilde{x}}_i)\right)\tilde{x}_i^{j}$$

$$=\frac{1}{N}\sum_{i=1}^{N}\left(D(\vec{\tilde{x}}_i)-y_i\right)\tilde{x}_i^{j}$$

$$\Rightarrow\frac{1}{N}\left(D(\tilde{X})-\vec{y}\right)^{T}\tilde{X}_{\cdot j}$$

Finally:
$$\tilde{w}_j=\tilde{w}_j-\frac{\eta}{N}\left(D(\tilde{X})-\vec{y}\right)^{T}\tilde{X}_{\cdot j}$$

where
$$D(\tilde{X})=\frac{e^{\tilde{X}\cdot\vec{\tilde{w}}}}{1+e^{\tilde{X}\cdot\vec{\tilde{w}}}}$$
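Before the full example below, here is a minimal vectorized sketch of this update rule (names and data made up for illustration). It updates all components of $\vec{\tilde{w}}$ at once as $\vec{\tilde{w}}\leftarrow\vec{\tilde{w}}-\frac{\eta}{N}\tilde{X}^{T}(D(\tilde{X})-\vec{y})$, whereas the example that follows loops over the components one at a time.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(X_tilde, y, w_tilde, eta):
    # w~ <- w~ - (eta / N) * X~^T (D(X~) - y), all components at once
    N = len(y)
    grad = X_tilde.T.dot(sigmoid(X_tilde.dot(w_tilde)) - y) / N
    return w_tilde - eta * grad

# Made-up toy data: two features plus a bias column of ones
X_tilde = np.array([[0.2, 1.5, 1.0],
                    [1.1, -0.3, 1.0],
                    [0.7, 0.9, 1.0]])
y = np.array([1.0, 0.0, 1.0])

w_tilde = np.zeros(3)
for _ in range(1000):
    w_tilde = gradient_step(X_tilde, y, w_tilde, eta=0.1)
print(w_tilde)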
Python example 1
import numpy as np
import matplotlib.pyplot as plt

# Squared-error cost, used here only to monitor convergence
# (the gradient below comes from the log-likelihood, not from this cost).
def computeCost(X, y, w_e):
    return 1.0 / (2 * len(y)) * np.dot((np.dot(X, w_e) - y).T, (np.dot(X, w_e) - y))

# D(X, w_e) = e^(X.w) / (1 + e^(X.w)), i.e. P(y = 1 | x~) for each row of X
def D(X, w_e):
    return np.exp(np.dot(X, w_e)) / (1 + np.exp(np.dot(X, w_e)))

# One gradient-descent pass, updating each component of w_e in turn
def gradientDescent(X, y, w_e, alpha):
    for j in range(len(w_e)):
        w_e[j] = w_e[j] - alpha * 1.0 / len(y) * np.dot((D(X, w_e) - y).T, X[:, j])
    return w_e

# Toy dataset: 10 samples, 5 random features plus a bias column of ones
X = np.random.rand(10, 5)
m = np.ones((10, 1))
X = np.concatenate((X, m), axis=1)
w = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [10.0]])
y = np.dot(X, w)
# Threshold at the mean to obtain binary labels
a = np.mean(y, axis=0)
for i in range(10):
    if y[i][0] > a:
        y[i][0] = 1
    else:
        y[i][0] = 0

w_e = np.zeros_like(w)
cost = []
for i in range(10000):
    w_e = gradientDescent(X, y, w_e, 0.001)
    cost.append(computeCost(X, y, w_e)[0, 0])
print(y)
print(D(X, w_e))

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax1.plot(range(10000), cost, label='loss')
ax1.set_yscale('log')
ax1.set_xlabel("Iteration")
ax1.set_ylabel("Loss")
ax1.legend(loc='best')
plt.show()
[[1.]
[0.]
[1.]
[0.]
[0.]
[1.]
[0.]
[1.]
[1.]
[1.]]
[[0.63103198]
[0.54273564]
[0.72331236]
[0.55213718]
[0.41564281]
[0.78018675]
[0.49371131]
[0.59834743]
[0.67121118]
[0.86004243]]
Loss curve:
Increase the number of iterations to 30,000:
import numpy as np
import matplotlib.pyplot as plt

# Same script as above, but trained for 30000 iterations.
def computeCost(X, y, w_e):
    return 1.0 / (2 * len(y)) * np.dot((np.dot(X, w_e) - y).T, (np.dot(X, w_e) - y))

def D(X, w_e):
    # D(X, w_e) = e^(X.w) / (1 + e^(X.w))
    return np.exp(np.dot(X, w_e)) / (1 + np.exp(np.dot(X, w_e)))

def gradientDescent(X, y, w_e, alpha):
    for j in range(len(w_e)):
        w_e[j] = w_e[j] - alpha * 1.0 / len(y) * np.dot((D(X, w_e) - y).T, X[:, j])
    return w_e

X = np.random.rand(10, 5)
m = np.ones((10, 1))
X = np.concatenate((X, m), axis=1)
w = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [10.0]])
y = np.dot(X, w)
a = np.mean(y, axis=0)
for i in range(10):
    if y[i][0] > a:
        y[i][0] = 1
    else:
        y[i][0] = 0

w_e = np.zeros_like(w)
cost = []
for i in range(30000):
    w_e = gradientDescent(X, y, w_e, 0.001)
    cost.append(computeCost(X, y, w_e)[0, 0])
print(y)
print(D(X, w_e))

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax1.plot(range(30000), cost, label='loss')
ax1.set_yscale('log')
ax1.set_xlabel("Iteration")
ax1.set_ylabel("Loss")
ax1.legend(loc='best')
plt.show()
The loss goes up a little!
We can further reduce the learning rate:
import numpy as np
import matplotlib.pyplot as plt

# Same script again: 30000 iterations with a smaller learning rate (0.0001).
def computeCost(X, y, w_e):
    return 1.0 / (2 * len(y)) * np.dot((np.dot(X, w_e) - y).T, (np.dot(X, w_e) - y))

def D(X, w_e):
    # D(X, w_e) = e^(X.w) / (1 + e^(X.w))
    return np.exp(np.dot(X, w_e)) / (1 + np.exp(np.dot(X, w_e)))

def gradientDescent(X, y, w_e, alpha):
    for j in range(len(w_e)):
        w_e[j] = w_e[j] - alpha * 1.0 / len(y) * np.dot((D(X, w_e) - y).T, X[:, j])
    return w_e

X = np.random.rand(10, 5)
m = np.ones((10, 1))
X = np.concatenate((X, m), axis=1)
w = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [10.0]])
y = np.dot(X, w)
a = np.mean(y, axis=0)
for i in range(10):
    if y[i][0] > a:
        y[i][0] = 1
    else:
        y[i][0] = 0

w_e = np.zeros_like(w)
cost = []
for i in range(30000):
    w_e = gradientDescent(X, y, w_e, 0.0001)
    cost.append(computeCost(X, y, w_e)[0, 0])
print(y)
print(D(X, w_e))

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
ax1.plot(range(30000), cost, label='loss')
ax1.set_yscale('log')
ax1.set_xlabel("Iteration")
ax1.set_ylabel("Loss")
ax1.legend(loc='best')
plt.show()