Algorithm Idea
Linear regression can fit data, but for classification problems it simply doesn't work. We can borrow the same idea and fit a different target instead, such as the probability of the positive class. A probability, however, lies in $[0,1]$, so here we fit the log-odds (logit) of the data:

$$\ln\frac{p'}{1-p'}=w^Tx+b$$
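Solving this log-odds relation for $p'$ recovers the familiar sigmoid form, which is exactly what the implementation below evaluates:

```latex
p' = \frac{\exp(w^Tx+b)}{1+\exp(w^Tx+b)} = \frac{1}{1+\exp(-(w^Tx+b))}
```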
This is logistic regression, also known as log-odds regression. We then use maximum likelihood estimation to estimate the values of $w$ and $b$.
Model Definition
Input: $X=\{x^{(1)}, x^{(2)}, \dots, x^{(K)}\}$, $y=\{y^{(1)}, y^{(2)}, \dots, y^{(K)}\}$, where $x^{(i)}\in R^D$, $y^{(i)}\in\{0,1\}$.
Output: $w$, $b$.
Model:

$$\prod_{k=1}^K {p'^{(k)}}^{y^{(k)}}\,(1-p'^{(k)})^{1-y^{(k)}}$$

where $p'^{(k)}$ is the model's predicted probability that $y^{(k)}=1$.
Model Derivation
$$\begin{aligned}
L(w,b|X,y) &= \prod_{k=1}^K {p'^{(k)}}^{y^{(k)}}\,(1-p'^{(k)})^{1-y^{(k)}}\\
\ln L(w,b|X,y) &= \sum_{k=1}^K y^{(k)}\ln p'^{(k)} + (1-y^{(k)})\ln(1-p'^{(k)})\\
&= \sum_{k=1}^K y^{(k)}\ln\frac{p'^{(k)}}{1-p'^{(k)}} + \ln(1-p'^{(k)})\\
&= \sum_{k=1}^K y^{(k)}(w^Tx^{(k)}+b) - \ln(1+\exp(w^Tx^{(k)}+b))
\end{aligned}$$
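The algebraic simplification of the log-likelihood can be checked numerically. A minimal sketch with random toy data (all array names are illustrative, not from the original code):

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 20, 3
X = rng.normal(size=(K, D))          # K samples, D features
y = rng.integers(0, 2, size=K)       # binary labels
w, b = rng.normal(size=D), 0.5       # arbitrary parameters

z = X @ w + b                        # w^T x^(k) + b for every sample
p = np.exp(z) / (1 + np.exp(z))      # p'^(k), the predicted probability

# Bernoulli form of the log-likelihood
ll_bernoulli = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
# Simplified form from the last line of the derivation
ll_simplified = np.sum(y * z - np.log(1 + np.exp(z)))

assert np.isclose(ll_bernoulli, ll_simplified)
```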
Taking partial derivatives with respect to $w$ and $b$:
$$\begin{aligned}
\frac{\partial \ln L(w,b|X,y)}{\partial w} &= \sum_{k=1}^K y^{(k)}x^{(k)} - \frac{\exp(w^Tx^{(k)}+b)\,x^{(k)}}{1+\exp(w^Tx^{(k)}+b)}\\
&= \sum_{k=1}^K (y^{(k)} - p'^{(k)})\,x^{(k)}
\end{aligned}$$
$$\begin{aligned}
\frac{\partial \ln L(w,b|X,y)}{\partial b} &= \sum_{k=1}^K y^{(k)} - \frac{\exp(w^Tx^{(k)}+b)}{1+\exp(w^Tx^{(k)}+b)} = \sum_{k=1}^K (y^{(k)} - p'^{(k)})
\end{aligned}$$
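The closed-form gradients can be verified against a finite-difference approximation of the log-likelihood. A sketch on random data (names like `log_likelihood` are chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
K, D = 30, 4
X = rng.normal(size=(K, D))
y = rng.integers(0, 2, size=K).astype(float)

def log_likelihood(w, b):
    z = X @ w + b
    return np.sum(y * z - np.log(1 + np.exp(z)))

w, b = rng.normal(size=D), -0.3
z = X @ w + b
p = np.exp(z) / (1 + np.exp(z))

# Closed-form gradients from the derivation above
grad_w = (y - p) @ X          # sum_k (y^(k) - p'^(k)) x^(k)
grad_b = np.sum(y - p)        # sum_k (y^(k) - p'^(k))

# Central-difference check of each component
eps = 1e-6
for j in range(D):
    e = np.zeros(D)
    e[j] = eps
    num = (log_likelihood(w + e, b) - log_likelihood(w - e, b)) / (2 * eps)
    assert np.isclose(num, grad_w[j], atol=1e-4)
num_b = (log_likelihood(w, b + eps) - log_likelihood(w, b - eps)) / (2 * eps)
assert np.isclose(num_b, grad_b, atol=1e-4)
```

The per-sample update in the training loop below is exactly one stochastic step along these gradients.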
Code Implementation
# acc: 99.16%
import numpy as np


def load_data(file_name):
    """Load MNIST CSV; digit 0 is the positive class (1), everything else is 0."""
    data, label = [], []
    with open(file_name, 'r') as fr:
        for line in fr.readlines():
            line = line.strip().split(',')
            if int(line[0]) == 0:
                label.append(1)
            else:
                label.append(0)
            # scale pixel values to [0, 1]
            data.append([int(num) / 255 for num in line[1:]])
    return data, label


def logistic(data, label, epochs=300, lr=0.01):
    """Fit w, b by stochastic gradient ascent on the log-likelihood."""
    data, label = np.array(data).T, np.array(label)
    dims, nums = data.shape
    weights, bias = np.zeros((dims,)), 0
    for epoch in range(epochs):
        for k in range(nums):
            weights_mul_x = np.dot(weights.T, data[:, k]) + bias
            p = np.exp(weights_mul_x) / (1 + np.exp(weights_mul_x))
            # gradient of the log-likelihood: (y - p') x for w, (y - p') for b
            weights += lr * (label[k] - p) * data[:, k]
            bias += lr * (label[k] - p)
    return weights, bias


def predict(data, weights, bias):
    """Predict 1 when the estimated probability exceeds 0.5."""
    data = np.array(data).T
    label = []
    for k in range(data.shape[1]):
        weights_mul_x = np.dot(weights.T, data[:, k]) + bias
        p = np.exp(weights_mul_x) / (1 + np.exp(weights_mul_x))
        label.append(1 if p > 0.5 else 0)
    return label


if __name__ == '__main__':
    train_data, train_label = load_data('../data/mnist_train.csv')
    test_data, test_label = load_data('../data/mnist_test.csv')
    weights, bias = logistic(train_data, train_label)
    y_hat = predict(test_data, weights, bias)
    error = 0
    for y1, y2 in zip(test_label, y_hat):
        if y1 != y2:
            error += 1
    print('acc: {}%'.format(100 - error / len(y_hat) * 100))
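One caveat about the sigmoid above: `np.exp(z) / (1 + np.exp(z))` can overflow for large positive `z`. A common numerically stable variant (a sketch, not part of the original code) branches on the sign so `exp` is never called on a large positive argument:

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow by only exponentiating non-positive values."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # for z >= 0: 1 / (1 + exp(-z)), where exp(-z) <= 1
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # for z < 0: exp(z) / (1 + exp(z)), where exp(z) < 1
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out
```

Dropping this in for the inline expression in `logistic` and `predict` leaves the results unchanged while silencing overflow warnings on extreme inputs.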