A PyTorch Case Study on Handling Multi-Dimensional Feature Input


Foreword

This article walks through a classification case study on diabetes patient data, using PyTorch to build an artificial neural network [1]. (This is the fifth article in the series on the mathematical principles of deep learning.)


Case Analysis

In this example, the diabetes patient data live in the file diabetes.csv.gz; a few sample rows are shown in the figure below. Each row holds one patient's record: the first 8 columns are the values of eight physical examinations, and the last column indicates whether the patient is diabetic (0 = healthy, 1 = diabetic), which makes this a binary classification problem. The goal of the task is: given a patient's eight examination values, predict whether that patient has diabetes.

[Figure: sample rows from diabetes.csv.gz]
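
Before diving into the code, it helps to peek at the raw file. Below is a minimal sketch (not part of the original post), assuming the file sits at ./dataset/diabetes.csv.gz, the same path used by the training scripts later on; np.loadtxt decompresses .gz files transparently:

import numpy as np

# Load the compressed CSV; np.loadtxt handles .gz files transparently
xy = np.loadtxt('./dataset/diabetes.csv.gz', delimiter=',', dtype=np.float32)
print(xy.shape)   # (759, 9): 759 patients, 8 feature columns + 1 label column
print(xy[0])      # the first patient's record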

Prerequisites

Deep learning always involves a fair amount of data wrangling. To make this example easier to follow, let's first go over basic operations on Python lists, then extend them to matrices (created with NumPy and with PyTorch respectively). Talk is cheap, show me the code.

Working with list data

First, operations on data in Python list format:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2021/12/1 14:35
# @Author  : William Baker
# @FileName: demo_list.py
# @Software: PyCharm
# @Blog    : https://blog.csdn.net/weixin_43051346

nums = [1, 5, 6, 9, -5]   # avoid naming it `list`, which would shadow the built-in
print(type(nums))       # <class 'list'>

a = nums[0:-1]    # every element except the last: [1, 5, 6, 9]
b = nums[:-1]     # the same slice with the start index omitted: [1, 5, 6, 9]
c = nums[-1]      # the last element: -5
print(a)
print(b)
print(c)

Output:

<class 'list'>
[1, 5, 6, 9]
[1, 5, 6, 9]
-5

Selecting data by row and column from a NumPy matrix

Next we move on to matrix data, creating the matrices with NumPy and with PyTorch respectively so you can compare the two approaches. A small side note: consumer GPUs compute in float32 by default, and only fairly high-end cards offer serious double-precision (float64) throughput, which is why float32 is the standard dtype in deep learning.
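
To see these defaults in action, here is a quick check (my aside, not in the original post): NumPy builds float64 arrays from Python floats, while PyTorch's default dtype is float32.

import numpy as np
import torch

print(np.array([0.5]).dtype)      # float64 -- NumPy's default for Python floats
print(torch.Tensor([0.5]).dtype)  # torch.float32 -- PyTorch's default
print(torch.get_default_dtype())  # torch.float32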

First, creating a matrix with NumPy:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2021/12/1 14:59
# @Author  : William Baker
# @FileName: demo_tensor_matrix.py
# @Software: PyCharm
# @Blog    : https://blog.csdn.net/weixin_43051346

import torch
import numpy as np


# Create a 2x6 matrix with NumPy
matrix_numpy = np.array([[0.5, 0.6, 0.2, 0.3, 0.5, 0],
                         [0.1, 0.6, 0.9, 0.5, 0.8, 1]], dtype=np.float32)
print(matrix_numpy)
print(matrix_numpy.dtype)       # float32
print(matrix_numpy.shape)       # (2, 6)

# Take every column except the last, then the last column (kept 2-D by the [-1] list index)
print(matrix_numpy[:, :-1])
print(matrix_numpy[:, [-1]])

Output:

[[0.5 0.6 0.2 0.3 0.5 0. ]
 [0.1 0.6 0.9 0.5 0.8 1. ]]
float32
(2, 6)
[[0.5 0.6 0.2 0.3 0.5]
 [0.1 0.6 0.9 0.5 0.8]]
[[0.]
 [1.]]

Selecting data by row and column from a PyTorch matrix

Next, the same matrix created with PyTorch:

# Create a 2x6 matrix with PyTorch
matrix_torch = torch.Tensor([[0.5, 0.6, 0.2, 0.3, 0.5, 0],
                             [0.1, 0.6, 0.9, 0.5, 0.8, 1]])
print(matrix_torch)
print(matrix_torch.shape)       # torch.Size([2, 6])
print(matrix_torch.dtype)       # torch.float32

# Take every column except the last, then the last column (kept 2-D by the [-1] list index)
print(matrix_torch[:, :-1])
print(matrix_torch[:, [-1]])

Output:

tensor([[0.5000, 0.6000, 0.2000, 0.3000, 0.5000, 0.0000],
        [0.1000, 0.6000, 0.9000, 0.5000, 0.8000, 1.0000]])
torch.Size([2, 6])
torch.float32
tensor([[0.5000, 0.6000, 0.2000, 0.3000, 0.5000],
        [0.1000, 0.6000, 0.9000, 0.5000, 0.8000]])
tensor([[0.],
        [1.]])

The torch.where() function

Here we introduce a few PyTorch built-ins that we will need later when computing the accuracy metric (acc). First up is torch.where(); let's start from the explanation in the official PyTorch documentation:
[Figure: the torch.where() entry from the official PyTorch documentation]
In short, torch.where(condition, x, y) works like an element-wise ternary operator: wherever the condition holds it takes the value from x, otherwise the value from y. Note that the return value is a tensor.
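
As a minimal sketch of how torch.where() (together with torch.eq()) will be used to compute accuracy later in this post; the probability and label values here are made up for illustration:

import torch

probs = torch.tensor([[0.3], [0.7], [0.9], [0.2]])   # hypothetical predicted probabilities
labels = torch.tensor([[0.], [1.], [1.], [1.]])      # hypothetical ground-truth labels

# Element-wise ternary: take 1.0 where the condition holds, otherwise 0.0
pred_labels = torch.where(probs >= 0.5, torch.tensor([1.0]), torch.tensor([0.0]))
print(pred_labels)   # tensor([[0.], [1.], [1.], [0.]])

# torch.eq compares element-wise; the fraction of matches is the accuracy
acc = torch.eq(pred_labels, labels).sum().item() / labels.size(0)
print(acc)           # 0.75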

Getting the number of rows or columns of a PyTorch tensor

During data processing, what if we need the number of rows or columns of a matrix? Let's walk through a small example:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2021/12/1 16:49
# @Author  : William Baker
# @FileName: demo_torch_tensor.py
# @Software: PyCharm
# @Blog    : https://blog.csdn.net/weixin_43051346

import torch

a = torch.Tensor([[0.],
                  [1.],
                  [0.],
                  [1.0]])
print(a)
print(a.size)   # <built-in method size of Tensor object at 0x000002146F3C5368>
                # note: without parentheses this is the method object itself; call a.size() or use a.shape
print(a.size(0))     # 4, i.e. 4 rows
print(a.size(1))     # 1, i.e. 1 column
print(a.size(-1))    # 1, the last dimension is the column count

print('************************************************')
b = torch.Tensor([[0.5, 0.6, 0.2, 0.3, 0.5, 0],
                  [0.1, 0.6, 0.9, 0.5, 0.8, 1]])
print(b.size)   # <built-in method size of Tensor object at 0x000001D08459A3B8>
print(b.size(0))  # 2, the first dimension: the number of rows
print(b.size(1))  # 6, the second dimension: the number of columns
print(b.size(-1))  # 6, the last dimension: the number of columns

Output:

tensor([[0.],
        [1.],
        [0.],
        [1.]])
<built-in method size of Tensor object at 0x000001D0847359A8>
4
1
1
************************************************
<built-in method size of Tensor object at 0x000001D08459A3B8>
2
6
6

PyTorch implementation of the case study

With the data-handling groundwork above, we now know how to read the diabetes dataset for this case study. The detailed explanations are in the comments:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2021/11/30 19:50
# @Author  : William Baker
# @FileName: lesson7_multi_dim.py
# @Software: PyCharm
# @Blog    : https://blog.csdn.net/weixin_43051346
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
import numpy as np
import torch
import matplotlib.pyplot as plt

xy = np.loadtxt('./dataset/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, :-1])    # the first ':' selects every row;
                                         # ':-1' selects every column except the last (the 8 features)
y_data = torch.from_numpy(xy[:, [-1]])   # '[-1]' keeps only the last column, as a matrix (shape [759, 1])
# print(y_data)
# print(y_data.size(0))   # 759, i.e. 759 rows: data for 759 patients
# print(y_data.size)      # <built-in method size of Tensor object at 0x0000021ACF024B38>
# print(y_data.size(1))   # 1

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        return x

model = Model()

criterion = torch.nn.BCELoss(reduction='mean')              # binary cross-entropy, averaged over the samples
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)     # plain SGD with learning rate 0.1

epoch_list = []
loss_list = []
for epoch in range(100):
    # Forward
    epoch_list.append(epoch)

    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    print('Epoch:{}, Loss:{}'.format(epoch, loss.item()))

    loss_list.append(loss.item())

    # Backward
    optimizer.zero_grad()
    loss.backward()

    # Update
    optimizer.step()

plt.plot(epoch_list, loss_list)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()

# Inspecting parameters
# View the parameters of a given layer, taking the first layer of the network as an example
layer1_weight = model.linear1.weight.data
layer1_bias = model.linear1.bias.data
print("layer1_weight", layer1_weight)
print("layer1_weight.shape", layer1_weight.shape)
print("layer1_bias", layer1_bias)
print("layer1_bias.shape", layer1_bias.shape)

The output is:

Epoch:0, Loss:0.6519893407821655
Epoch:1, Loss:0.6513597965240479
Epoch:2, Loss:0.6507869362831116
Epoch:3, Loss:0.6502658128738403
Epoch:4, Loss:0.6497914791107178
Epoch:5, Loss:0.6493598818778992
Epoch:6, Loss:0.6489669680595398
Epoch:7, Loss:0.6486091613769531
Epoch:8, Loss:0.6482835412025452
Epoch:9, Loss:0.647986888885498
Epoch:10, Loss:0.6477167010307312
Epoch:11, Loss:0.6474705934524536
Epoch:12, Loss:0.6472463607788086
Epoch:13, Loss:0.647041916847229
Epoch:14, Loss:0.6468556523323059
Epoch:15, Loss:0.646685779094696
Epoch:16, Loss:0.6465309262275696
Epoch:17, Loss:0.6463896632194519
Epoch:18, Loss:0.6462607979774475
Epoch:19, Loss:0.6461433172225952
Epoch:20, Loss:0.6460360884666443
Epoch:21, Loss:0.6459380388259888
Epoch:22, Loss:0.6458486914634705
Epoch:23, Loss:0.6457669734954834
Epoch:24, Loss:0.6456923484802246
Epoch:25, Loss:0.6456241607666016
Epoch:26, Loss:0.6455617547035217
Epoch:27, Loss:0.6455047726631165
Epoch:28, Loss:0.6454525589942932
Epoch:29, Loss:0.6454048156738281
Epoch:30, Loss:0.645361065864563
Epoch:31, Loss:0.6453210115432739
Epoch:32, Loss:0.6452842950820923
Epoch:33, Loss:0.6452506184577942
Epoch:34, Loss:0.6452196836471558
Epoch:35, Loss:0.6451913118362427
Epoch:36, Loss:0.645165205001831
Epoch:37, Loss:0.6451411843299866
Epoch:38, Loss:0.6451191902160645
Epoch:39, Loss:0.6450988054275513
Epoch:40, Loss:0.6450800895690918
Epoch:41, Loss:0.6450628042221069
Epoch:42, Loss:0.6450467705726624
Epoch:43, Loss:0.6450319886207581
Epoch:44, Loss:0.6450183391571045
Epoch:45, Loss:0.6450056433677673
Epoch:46, Loss:0.6449939012527466
Epoch:47, Loss:0.6449829339981079
Epoch:48, Loss:0.6449727416038513
Epoch:49, Loss:0.6449632048606873
Epoch:50, Loss:0.6449543833732605
Epoch:51, Loss:0.6449460387229919
Epoch:52, Loss:0.6449382901191711
Epoch:53, Loss:0.6449309587478638
Epoch:54, Loss:0.6449241042137146
Epoch:55, Loss:0.6449176073074341
Epoch:56, Loss:0.6449114680290222
Epoch:57, Loss:0.6449057459831238
Epoch:58, Loss:0.6449002623558044
Epoch:59, Loss:0.644895076751709
Epoch:60, Loss:0.6448901295661926
Epoch:61, Loss:0.6448853611946106
Epoch:62, Loss:0.6448808312416077
Epoch:63, Loss:0.6448764801025391
Epoch:64, Loss:0.6448723077774048
Epoch:65, Loss:0.6448683738708496
Epoch:66, Loss:0.6448644995689392
Epoch:67, Loss:0.6448607444763184
Epoch:68, Loss:0.6448571681976318
Epoch:69, Loss:0.6448536515235901
Epoch:70, Loss:0.6448502540588379
Epoch:71, Loss:0.6448469758033752
Epoch:72, Loss:0.6448436975479126
Epoch:73, Loss:0.6448405385017395
Epoch:74, Loss:0.6448374390602112
Epoch:75, Loss:0.6448343992233276
Epoch:76, Loss:0.6448314785957336
Epoch:77, Loss:0.6448284983634949
Epoch:78, Loss:0.6448256373405457
Epoch:79, Loss:0.6448228359222412
Epoch:80, Loss:0.6448200345039368
Epoch:81, Loss:0.6448173522949219
Epoch:82, Loss:0.6448145508766174
Epoch:83, Loss:0.6448119282722473
Epoch:84, Loss:0.6448091864585876
Epoch:85, Loss:0.6448065638542175
Epoch:86, Loss:0.6448039412498474
Epoch:87, Loss:0.6448012590408325
Epoch:88, Loss:0.644798755645752
Epoch:89, Loss:0.6447961926460266
Epoch:90, Loss:0.6447936296463013
Epoch:91, Loss:0.6447911262512207
Epoch:92, Loss:0.6447885632514954
Epoch:93, Loss:0.6447860598564148
Epoch:94, Loss:0.644783616065979
Epoch:95, Loss:0.6447811126708984
Epoch:96, Loss:0.6447786092758179
Epoch:97, Loss:0.6447760462760925
Epoch:98, Loss:0.6447736024856567
Epoch:99, Loss:0.644771158695221
layer1_weight tensor([[-0.1826, -0.2763,  0.3404, -0.1967, -0.0764,  0.1359, -0.2059,  0.0878],
        [ 0.3425, -0.0640, -0.0543, -0.1270, -0.3324,  0.0529, -0.2476,  0.0894],
        [ 0.2963,  0.3316,  0.1952,  0.3370,  0.0224, -0.1574,  0.1895, -0.3503],
        [-0.2971,  0.2526,  0.0474, -0.3225, -0.1425, -0.2970, -0.0683, -0.1422],
        [-0.1315,  0.3150,  0.3251, -0.2655,  0.0755, -0.2877,  0.3362, -0.2323],
        [ 0.2835, -0.0293,  0.2572,  0.2179, -0.0380, -0.3153, -0.3236,  0.0947]])
layer1_weight.shape torch.Size([6, 8])
layer1_bias tensor([ 0.1976,  0.1445,  0.0606,  0.2860, -0.2966, -0.1678])
layer1_bias.shape torch.Size([6])
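
A quick aside on the shapes printed above (my note, not from the original post): nn.Linear(8, 6) stores its weight as (out_features, in_features), i.e. [6, 8], because the forward pass computes y = x @ W.T + b; the bias has one entry per output feature. A minimal check:

import torch

lin = torch.nn.Linear(8, 6)
print(lin.weight.shape)   # torch.Size([6, 8]) -- (out_features, in_features)
print(lin.bias.shape)     # torch.Size([6])    -- one bias per output feature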

[Figure: loss curve over the 100 training epochs]
Since we trained for only 100 epochs, the final loss still has plenty of room to drop. Next, let's train for 1,000,000 epochs, printing the loss and accuracy every 100,000 epochs, slightly modify the network layers, and add accuracy (acc) as an evaluation metric. The code is as follows [2]:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2021/12/1 11:17
# @Author  : William Baker
# @FileName: lesson7_multi_dim_acc.py
# @Software: PyCharm
# @Blog    : https://blog.csdn.net/weixin_43051346
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
import numpy as np
import torch
import matplotlib.pyplot as plt

# prepare dataset
xy = np.loadtxt('./dataset/diabetes.csv.gz', delimiter=',', dtype=np.float32)
# print(xy)
x_data = torch.from_numpy(xy[:, :-1])
# print("input data.shape", x_data.shape)   # input data.shape torch.Size([759, 8])
y_data = torch.from_numpy(xy[:, [-1]])
# print(y_data.size(0))  # 759, i.e. 759 rows: data for 759 patients

# print(x_data.shape)
# design model using class
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear1 = torch.nn.Linear(8, 6)
        self.linear2 = torch.nn.Linear(6, 4)
        self.linear3 = torch.nn.Linear(4, 2)
        self.linear4 = torch.nn.Linear(2, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.sigmoid(self.linear1(x))
        x = self.sigmoid(self.linear2(x))
        x = self.sigmoid(self.linear3(x))
        x = self.sigmoid(self.linear4(x))    # y_hat
        return x

model = Model()

criterion = torch.nn.BCELoss(reduction='mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_list = []
loss_list = []
for epoch in range(1000000):
    epoch_list.append(epoch)
    # Forward pass
    y_pred = model(x_data)   # feed the data through the model to get predictions
    loss = criterion(y_pred, y_data)    # loss between the prediction y_hat and the ground truth y
    # print('Epoch:{}, Loss:{}'.format(epoch, loss.item()))
    loss_list.append(loss.item())
    # Backward pass
    optimizer.zero_grad()   # zero the accumulated gradients first
    loss.backward()
    optimizer.step()    # update the weights

    if epoch % 100000 == 99999:     # print the loss and accuracy every 100,000 epochs
        # compute the accuracy (acc)
        y_pred_label = torch.where(y_pred >= 0.5, torch.tensor([1.0]), torch.tensor([0.0]))  # probability >= 0.5 -> class 1, otherwise class 0
        acc = torch.eq(y_pred_label, y_data).sum().item() / y_data.size(0)
        # print("loss=", loss.item(), "acc=", acc)
        print('loss={}, acc={}'.format(loss.item(), acc))

# Plot the loss curve
plt.plot(epoch_list, loss_list)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()

Output:

loss=0.4322426915168762, acc=0.7918313570487484
loss=0.3899582028388977, acc=0.8036890645586298
loss=0.3573305308818817, acc=0.8168642951251647
loss=0.3199097216129303, acc=0.8590250329380764
loss=0.29373297095298767, acc=0.8669301712779973
loss=0.28183719515800476, acc=0.8695652173913043
loss=0.2720341682434082, acc=0.8682476943346509
loss=0.2679903209209442, acc=0.8735177865612648
loss=0.26615986227989197, acc=0.8761528326745718
loss=0.26489314436912537, acc=0.8827404479578392

As the output shows, the model's accuracy keeps climbing. Note that this accuracy is computed on the training data itself (there is no train/test split here), so it measures fit rather than generalization. Interested readers can train for even more epochs and watch how the accuracy evolves.
[Figure: loss curve over the 1,000,000 training epochs]


That about wraps up this article. If you spot any mistakes, corrections are welcome. And if this article has helped you, I am glad: how far a person can go depends on who they travel with.


References


1. 《PyTorch深度学习实践》完结合集 - 07.处理多维特征的输入
2. PyTorch 深度学习实践 第7讲
