Logistic Regression
Introduction to logistic regression
Logistic regression is a generalized linear model and has much in common with multiple linear regression. Both share the same basic model form wx + b, where w and b are the parameters to be learned. The difference lies in the dependent variable: multiple linear regression uses wx + b directly as the response, i.e. y = wx + b, whereas logistic regression maps wx + b through a function L to a latent value p = L(wx + b) and then decides the class label by comparing p with 1 - p. When L is the logistic function, the model is logistic regression; when L is a polynomial function, it is polynomial regression.
Put more plainly, logistic regression simply adds a logistic-function layer on top of linear regression.
Logistic regression is mainly used for binary classification. The Sigmoid function, which we covered in the section on activation functions, is the most common logistic function: its output is a probability value between 0 and 1, so when the probability is greater than 0.5 we predict class 1, and otherwise class 0.
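To make the thresholding concrete, here is a minimal sketch; the weights w and b are made-up illustrative values rather than fitted parameters:

import torch

w = torch.tensor([2.0, -1.0])      # made-up weights, for illustration only
b = torch.tensor(0.5)              # made-up bias
x = torch.tensor([1.0, 3.0])       # one input sample

p = torch.sigmoid(w @ x + b)       # p = L(wx + b), a value in (0, 1)
pred = 1 if p > 0.5 else 0         # threshold the probability at 0.5
print(p.item(), pred)              # ~0.38 -> predicted class 0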
import torch
import torch.nn as nn
import numpy as np
torch.__version__
'1.2.0'
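The example uses the numeric version of the UCI German Credit dataset. If the file is not already on disk, it can usually be fetched from the UCI Machine Learning Repository; the URL below is the commonly used repository path and is an assumption worth verifying:

import os
import urllib.request

# assumed UCI repository path for german.data-numeric; verify before relying on it
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "statlog/german/german.data-numeric")
path = "./data/UCI_German_Credit/german.data-numeric"
if not os.path.exists(path):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    urllib.request.urlretrieve(url, path)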
data = np.loadtxt("./data/UCI_German_Credit/german.data-numeric")
print(data)
print(data.shape)
[[ 1. 6. 4. ... 0. 1. 1.]
[ 2. 48. 2. ... 0. 1. 2.]
[ 4. 12. 4. ... 1. 0. 1.]
...
[ 4. 12. 2. ... 0. 1. 1.]
[ 1. 45. 2. ... 0. 1. 2.]
[ 2. 45. 4. ... 0. 1. 1.]]
(1000, 25)
Once the data is loaded, we standardize each feature column (transforming it to have mean 0 and standard deviation 1, i.e. x ← (x - mean) / std):
n, l = data.shape
for j in range(l-1):  # the last column is the label (1 or 2) and is not standardized
    meanVal = np.mean(data[:, j])
    stdVal = np.std(data[:, j])
    data[:, j] = (data[:, j] - meanVal) / stdVal
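As a quick sanity check on the loop above, every feature column should now have mean ≈ 0 and standard deviation ≈ 1:

# verify the standardization: feature columns only (label column excluded)
print(np.allclose(data[:, :l-1].mean(axis=0), 0))   # True
print(np.allclose(data[:, :l-1].std(axis=0), 1))    # True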
Shuffle the data (np.random.seed could be set beforehand if a reproducible split is wanted):
np.random.shuffle(data)
print(data)
train_data = data[:900, :l-1]
train_label = data[:900, l-1] - 1   # shift labels from {1, 2} to {0, 1}
test_data = data[900:, :l-1]
test_label = data[900:, l-1] - 1
print(test_label)
[[ 1.13205258 -0.48976238 1.34401408 ... -0.5 0.76635604
1. ]
[ 1.13205258 1.25257373 1.34401408 ... -0.5 0.76635604
1. ]
[ 1.13205258 -0.48976238 -0.50342796 ... -0.5 0.76635604
1. ]
...
[-1.25456565 -0.48976238 1.34401408 ... -0.5 0.76635604
1. ]
[ 1.13205258 1.00366857 1.34401408 ... -0.5 -1.30487651
1. ]
[ 1.13205258 -0.73866754 1.34401408 ... 2. -1.30487651
1. ]]
[0. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
0. 1. 1. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1.
0. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 0. 0. 0.
1. 0. 0. 0. 0. 1. 0. 0. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1. 1. 1. 1. 0. 0. 0.
1. 0. 0. 0.]
Above we split the data into a training set and a test set; since there is no validation set here, we use accuracy on the test set directly as the measure of model quality.
Split rule: 900 rows for training, 100 rows for testing.
The format of german.data-numeric is: the first 24 columns are the 24 features, and the last column is the label to predict (1 or 2 in the raw file, shifted to 0/1 above), so we separated the features and the labels together.
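As an aside, the same 900/100 split and the 1/2 → 0/1 label shift could also be written with scikit-learn, assuming it is installed; this is only an equivalent sketch, not what the rest of the section uses:

from sklearn.model_selection import train_test_split

X, y = data[:, :l-1], data[:, l-1] - 1           # features and 0/1 labels
train_data, test_data, train_label, test_label = train_test_split(
    X, y, test_size=100, shuffle=False)          # data was already shuffled above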
Define the model:
class LR(nn.Module):
    def __init__(self):
        super(LR, self).__init__()
        self.fc = nn.Linear(24, 2)   # 24 input features, 2 output classes
    def forward(self, x):
        x = self.fc(x)
        out = torch.sigmoid(x)       # squash each output into (0, 1)
        return out
net_LR = LR()
print(net_LR)
LR(
(fc): Linear(in_features=24, out_features=2, bias=True)
)
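One note on the model: nn.CrossEntropyLoss, defined below, already applies log-softmax internally, so the extra sigmoid in forward is not strictly necessary; it is kept here to match the original outputs. The more conventional single-logit formulation of logistic regression would pair one output unit with BCEWithLogitsLoss, sketched here as an alternative that the rest of the section does not use:

class LRBinary(nn.Module):
    def __init__(self):
        super(LRBinary, self).__init__()
        self.fc = nn.Linear(24, 1)    # one logit instead of two class scores
    def forward(self, x):
        return self.fc(x)             # raw logit; the sigmoid lives inside the loss

criterion_alt = nn.BCEWithLogitsLoss()   # sigmoid + binary cross-entropy in one call
# usage: loss = criterion_alt(model(x).squeeze(1), y.float())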
Define the accuracy calculation:
def test(predict_y, true_label):
    t = predict_y.max(-1)[1] == true_label   # argmax over the two outputs vs. true class
    return torch.mean(t.float())             # fraction of correct predictions
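A quick check with made-up tensors shows how this works: max(-1)[1] takes the index of the larger of the two outputs as the predicted class:

pred = torch.tensor([[0.2, 0.8],     # sample 1: predicted class 1
                     [0.9, 0.1]])    # sample 2: predicted class 0
truth = torch.tensor([1, 1])
print(test(pred, truth))             # tensor(0.5000): one of two correct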
Define the loss function and the optimizer:
criterion = nn.CrossEntropyLoss()   # cross-entropy loss (applies log-softmax internally)
optim = torch.optim.Adam(net_LR.parameters(), lr=0.001)
epochs = 2000
Start training:
for i in range(epochs):
    net_LR.train()                                # switch to training mode
    # convert the numpy arrays to tensors
    x = torch.from_numpy(train_data).float()
    y = torch.from_numpy(train_label).long()
    out = net_LR(x)
    loss = criterion(out, y)                      # compute the loss
    optim.zero_grad()                             # clear gradients from the previous step
    loss.backward()                               # back-propagate the loss
    optim.step()                                  # update the parameters
    if (i+1) % 100 == 0:                          # report every 100 epochs
        net_LR.eval()                             # switch to evaluation mode
        # convert the numpy test arrays to tensors
        test_x = torch.from_numpy(test_data).float()
        test_y = torch.from_numpy(test_label).long()
        # compute the predictions
        test_out = net_LR(test_x)
        # compute the accuracy
        acc = test(test_out, test_y)
        print("Epoch:{},Loss:{:.4f},Accuracy:{:.2f}".format(i+1, loss.item(), acc))
Epoch:100,Loss:0.6619,Accuracy:0.65
Epoch:200,Loss:0.6293,Accuracy:0.70
Epoch:300,Loss:0.6084,Accuracy:0.72
Epoch:400,Loss:0.5942,Accuracy:0.75
Epoch:500,Loss:0.5838,Accuracy:0.77
Epoch:600,Loss:0.5757,Accuracy:0.77
Epoch:700,Loss:0.5691,Accuracy:0.79
Epoch:800,Loss:0.5637,Accuracy:0.80
Epoch:900,Loss:0.5590,Accuracy:0.80
Epoch:1000,Loss:0.5551,Accuracy:0.80
Epoch:1100,Loss:0.5517,Accuracy:0.80
Epoch:1200,Loss:0.5487,Accuracy:0.80
Epoch:1300,Loss:0.5461,Accuracy:0.80
Epoch:1400,Loss:0.5438,Accuracy:0.80
Epoch:1500,Loss:0.5417,Accuracy:0.79
Epoch:1600,Loss:0.5399,Accuracy:0.79
Epoch:1700,Loss:0.5382,Accuracy:0.79
Epoch:1800,Loss:0.5367,Accuracy:0.80
Epoch:1900,Loss:0.5353,Accuracy:0.79
Epoch:2000,Loss:0.5340,Accuracy:0.79
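Once training has finished, the model can be used for prediction; a minimal sketch, reusing a few test rows (new data would first need the same per-column standardization as above):

net_LR.eval()                                    # evaluation mode for inference
with torch.no_grad():                            # no gradients needed at prediction time
    x_new = torch.from_numpy(test_data[:5]).float()
    out = net_LR(x_new)                          # one sigmoid-squashed score per class
    preds = out.max(-1)[1]                       # predicted class: 0 or 1
print(preds)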