Table of Contents
1. Framework Structure Diagram
2. Loading the Dataset
a = np.loadtxt('./数据集/test.csv',dtype=int,delimiter=',')
`./` means the path is relative to the project directory; `dtype` specifies the type the data is read as (for PyTorch you usually want `numpy.float32`); `delimiter` tells NumPy which character separates fields in the text file.
(1) Slicing
a = np.loadtxt('./数据集/test.csv',dtype=int,delimiter=',')
x = torch.from_numpy(a[:,:])
y = torch.from_numpy(a[:,-1])
z = torch.from_numpy(a[:,[-1]])
print(a)
print(x)
print(y)
print(z)
Be careful about the difference between a[:,-1] and a[:,[-1]]: indexing with a scalar gives a 1-D array of shape (N,), while indexing with a one-element list keeps the column dimension and gives a 2-D array of shape (N, 1).
(2) Loading the x and y Data
x_data = torch.from_numpy(data[:,:-1])
y_data = torch.from_numpy(data[:,[-1]])
The last column of the dataset is the label y (whether the patient develops the disease). x_data is an N×8 Tensor: N samples, each with 8 features (N patients, 8 clinical indicators per patient).
3. Defining the Model
(1) Initialization
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        self.activate = torch.nn.Sigmoid()
The activation chosen here is Sigmoid. Others such as ReLU() also work for the hidden layers, but using ReLU at the output is risky: the loss function we pick (binary cross-entropy) computes a logarithm, and ReLU can output exactly 0, which produces ln(0). The last layer's activation should therefore be sigmoid, which keeps the output strictly inside (0, 1).
(2) Defining the Forward Function
def forward(self, x):
    y1 = self.activate(self.linear1(x))
    y2 = self.activate(self.linear2(y1))
    y3 = self.activate(self.linear3(y2))
    return y3
y1/y2/y3 are named separately here only to make the order of the layers explicit; for simplicity you would normally reuse x throughout:
def forward(self, x):
    x = self.activate(self.linear1(x))
    x = self.activate(self.linear2(x))
    x = self.activate(self.linear3(x))
    return x
4. Defining the Loss Function and Optimizer
loss = torch.nn.BCELoss(reduction='mean')
optimize = torch.optim.SGD(model.parameters(),lr=0.01)
5. Defining the Training Loop
for times in range(100):
    y_pre = model(x_data)
    loss1 = loss(y_pre, y_data)
    print("Epoch %d" % times)
    print(loss1.item())
    optimize.zero_grad()
    loss1.backward()
    optimize.step()
Note the order: zero the gradients, backpropagate, then update the parameters.
6. Testing
x_test = torch.Tensor([0, 0, 0, 0, 0, 0, 0, 0])
y_test = model(x_test)
print("Test result", y_test.data)
This means that, given the input x_test, the predicted probability that the patient has the disease is 0.4538 (the exact number varies with the random initialization).
7. Plotting
times_list = []
loss_list = []
# inside the training loop:
times_list.append(times)
loss_list.append(loss1.item())
# after training:
plt.plot(times_list, loss_list)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()
Comparing the curves, the all-sigmoid network reaches a lower loss than the ReLU variant in this experiment.
Note: if plotting raises an error (a common OpenMP library conflict), add this snippet before the rest of the code:
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
Complete Code
import numpy as np
import torch
import matplotlib.pyplot as plt
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

data = np.loadtxt('diabetes.csv', dtype=np.float32, delimiter=',')
x_data = torch.from_numpy(data[:, :-1])
y_data = torch.from_numpy(data[:, [-1]])

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        self.activate = torch.nn.ReLU()

    def forward(self, x):
        x = self.activate(self.linear1(x))
        x = self.activate(self.linear2(x))
        x = torch.sigmoid(self.linear3(x))  # sigmoid on the last layer keeps the output in (0, 1)
        return x

model = Model()
loss = torch.nn.BCELoss(reduction='mean')
optimize = torch.optim.SGD(model.parameters(), lr=0.01)

times_list = []
loss_list = []
for times in range(1000):
    y_pre = model(x_data)
    loss1 = loss(y_pre, y_data)
    print("Epoch %d" % times)
    print(loss1.item())
    optimize.zero_grad()
    loss1.backward()
    optimize.step()
    times_list.append(times)
    loss_list.append(loss1.item())

x_test = torch.Tensor([0, 0, 0, 0, 0, 0, 0, 0])
y_test = model(x_test)
print("Test result", y_test.data)

plt.plot(times_list, loss_list)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()