1. Introduction to Logistic Regression
Logistic regression, also known as logit regression (对数几率回归), has "regression" in its name, but unlike linear regression it is not used to predict numeric values. Instead, it is generally used for classification tasks, especially binary classification.
Essentially, it is a perceptron followed by a sigmoid activation function: f(x) = σ(w·x + b) = 1 / (1 + e^{-(w·x + b)}). The sigmoid squashes the linear output into (0, 1), so f(x) can be read as the probability P(y = 1 | x).
The loss function used by logistic regression is the cross-entropy: L(w) = -(1/N) Σₙ [ yₙ ln f(xₙ) + (1 - yₙ) ln(1 - f(xₙ)) ].
The full derivation is fairly long; see Hung-yi Lee's (李宏毅) lecture on logistic regression for the details. Summarized in one sentence: the loss function is defined by maximizing the likelihood of the training labels.
The likelihood L is the product, over all training samples, of the probability that each sample is predicted correctly. Each prediction f(x) lies between 0 and 1. If a sample's true label is 1, then f(x) itself is the probability of a correct prediction; if the true label is 0, then 1 - f(x) is the probability of a correct prediction, and in that case a smaller f(x) gives a larger probability.
We therefore want to maximize L, which is the same as minimizing -ln L, and expanding -ln L yields exactly the cross-entropy loss.
The third step is to find the best W, i.e. to derive an update rule by taking the partial derivative of the loss with respect to W. It turns out that this gradient has exactly the same form as the gradient in linear regression: ∂L/∂w = (1/N) Σₙ (f(xₙ) - yₙ) xₙ.
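The maximum-likelihood argument above can be written out compactly (with f(x) = σ(w·x) and labels yₙ ∈ {0, 1}):

```latex
L(w) = \prod_{n=1}^{N} f(x_n)^{y_n}\,\bigl(1 - f(x_n)\bigr)^{1 - y_n}

-\ln L(w) = \sum_{n=1}^{N} \Bigl[-y_n \ln f(x_n) - (1 - y_n)\ln\bigl(1 - f(x_n)\bigr)\Bigr]

\frac{\partial(-\ln L)}{\partial w} = \sum_{n=1}^{N} \bigl(f(x_n) - y_n\bigr)\, x_n
```

Dividing by N turns the summed forms into the averaged cross-entropy and gradient used in the code below.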
2. Code Implementation
from sklearn.datasets import load_svmlight_file
import numpy as np
import matplotlib.pyplot as plt


def sigmoid(x):
    """Sigmoid activation function."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s


def batch_generator(X, y, batch_size):
    """Yield shuffled mini-batches of (X, y)."""
    nsamples = X.shape[0]
    batch_num = int(nsamples / batch_size)
    indexes = np.random.permutation(nsamples)
    for i in range(batch_num):
        yield (X[indexes[i*batch_size:(i+1)*batch_size]],
               y[indexes[i*batch_size:(i+1)*batch_size]])


def cross_entropy_loss(y_true, y_pred):
    """Average cross-entropy; the 1e-5 guards against log(0)."""
    loss = 0
    y_true = np.squeeze(y_true)
    y_pred = np.squeeze(y_pred)
    for k in range(len(y_pred)):
        loss += - y_true[k] * np.log(y_pred[k] + 1e-5) - (
            1 - y_true[k]) * np.log(1 - y_pred[k] + 1e-5)
    return loss / len(y_pred)


def train(alpha, beta1, beta2, epoches, epsilon, batch_size):
    # Initialize parameters
    X_train, X_test, y_train, y_test = load_data()
    column = X_train.shape[1]
    W = np.random.uniform(low=1.0, high=10.0, size=(1, column))
    # Training state
    losses_val = []
    m = np.zeros([1, column])
    v = np.zeros([1, column])
    t = 1  # note: t counts epochs here, not individual Adam updates
    while t <= epoches:
        for x_batch, y_batch in batch_generator(X_train, y_train, batch_size):
            y = sigmoid(np.dot(W, x_batch.transpose()))
            deviation = y - y_batch.reshape(y.shape)
            gradient = 1 / x_batch.shape[0] * np.dot(deviation, x_batch)
            # Adam update
            m = beta1 * m + (1 - beta1) * gradient
            v = beta2 * v + (1 - beta2) * gradient ** 2
            m_hat = m / (1 - pow(beta1, t))
            v_hat = v / (1 - pow(beta2, t))
            W = W - alpha * m_hat / (v_hat ** 0.5 + epsilon)
        # Evaluate on the validation set
        y_pred = sigmoid(np.dot(W, X_test.T))
        losses_val.append(cross_entropy_loss(y_test, y_pred))
        print("Epoch {}: validation loss = {}".format(t, losses_val[t - 1]))
        t += 1
    return losses_val


losses_val = train(0.005, 0.9, 0.999, 200, 1e-6, 500)
Note: to keep the listing short, the data-loading function load_data() is not shown (it is quite long); the complete code is available here: 戳这里
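As a quick sanity check of the pieces above, here is a standalone sketch (not part of the original post): load_data() is replaced by a tiny synthetic dataset, the cross-entropy is vectorized instead of looped, and plain gradient descent stands in for the Adam update to keep things short. The loss should fall steadily on this linearly separable data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_entropy_vec(y_true, y_pred, eps=1e-5):
    # Vectorized equivalent of the cross_entropy_loss loop above
    return np.mean(-y_true * np.log(y_pred + eps)
                   - (1 - y_true) * np.log(1 - y_pred + eps))

rng = np.random.default_rng(0)
# Tiny linearly separable dataset: label is 1 when the feature sum is positive
X = rng.normal(size=(200, 3))
y = (X.sum(axis=1) > 0).astype(float)

W = np.zeros((1, 3))
alpha = 0.5
losses = []
for step in range(50):
    p = sigmoid(np.dot(W, X.T))                      # shape (1, 200)
    gradient = np.dot(p - y.reshape(p.shape), X) / X.shape[0]
    W = W - alpha * gradient                          # plain gradient descent
    losses.append(cross_entropy_vec(y, np.squeeze(p)))

print(losses[0], losses[-1])
```

With W initialized to zeros, the first loss is ln 2 ≈ 0.693 (every prediction is 0.5), and it drops as the weights align with the true separating direction.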
3. MXNet Implementation
import time
from mxnet import gluon, init, autograd

X_train, X_test, y_train, y_test = load_data()

net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(1))
net.add(gluon.nn.Activation('sigmoid'))
net.initialize(init.Normal())

loss_fn = gluon.loss.SigmoidBinaryCrossEntropyLoss(from_sigmoid=True)
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.05})

epoches = 15
batch_size = 100
dataset = gluon.data.ArrayDataset(X_train, y_train)
data_iterator = gluon.data.DataLoader(dataset, batch_size, shuffle=True)

start = time.time()
loss_train = []
loss_test = []
for epoch in range(epoches):
    print("[INFO] epoch %s is running..." % epoch)
    for batch_x, batch_y in data_iterator:
        with autograd.record():
            ls = loss_fn(net(batch_x), batch_y)
        ls.backward()
        # step() updates the parameters; call it after backward(), outside record()
        trainer.step(batch_size)
    l_test = loss_fn(net(X_test), y_test).mean().asscalar()
    l_train = loss_fn(net(X_train), y_train).mean().asscalar()
    loss_train.append(l_train)
    loss_test.append(l_test)
print('weight:{}'.format(net[0].weight.data()))
print('bias:{}'.format(net[0].bias.data()))
print('time cost:{:.2f}'.format(time.time() - start))
plot(loss_train, loss_test)
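One detail worth noting in the loop above: `ls.backward()` on the vector of per-sample losses accumulates the *summed* gradient over the batch, and `trainer.step(batch_size)` rescales it by 1/batch_size before applying the update. Conceptually, for the plain 'sgd' optimizer it behaves like this simplified NumPy sketch (not Gluon's actual implementation):

```python
import numpy as np

def sgd_step(weight, summed_grad, learning_rate, batch_size):
    # What trainer.step(batch_size) does for plain 'sgd':
    # rescale the batch-summed gradient by 1/batch_size, then update.
    return weight - learning_rate * summed_grad / batch_size

w = np.array([1.0, -2.0])
summed_grad = np.array([10.0, 20.0])   # gradient summed over 100 samples
w_new = sgd_step(w, summed_grad, learning_rate=0.05, batch_size=100)
print(w_new)
```

Passing the wrong batch size to step() would silently change the effective learning rate, which is a common source of confusion.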
Only the core idea is shown here; the load_data and plot functions are omitted. For the complete implementation, see 戳这里
4. References
Hung-yi Lee (李宏毅), Machine Learning course, 2019