A Beginner's Guide to PyTorch from Scratch

What Is PyTorch

PyTorch is an open-source Python machine learning library that provides two high-level features:

  1. Tensor computation with strong GPU acceleration (like NumPy)

PyTorch is broadly compatible with NumPy, and many NumPy function interfaces work the same way in PyTorch.
For that reason, pay attention to version compatibility when installing PyTorch and NumPy.

  2. Deep neural networks built on an automatic differentiation system (a minimal sketch follows)
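A minimal sketch of that autograd system, with made-up values: tensors created with requires_grad=True record the operations applied to them, and backward() fills in their .grad attribute.

>>> import torch
>>> x = torch.tensor([2.0, 3.0], requires_grad=True)
>>> y = (x ** 2).sum()	# y = x1^2 + x2^2
>>> y.backward()	# compute dy/dx
>>> x.grad	# the gradient is 2*x
tensor([4., 6.])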

Building Neural Networks with PyTorch

>>> import torch  
>>> import numpy as np

Data Preparation

Prerequisites

tensor: a multidimensional array used for matrix operations, well suited to running on GPUs.
Constructing tensors

>>> a = torch.tensor([[1,2,3],[4,5,6]],dtype=float)	# create a tensor from the given elements
>>> a.size()	
torch.Size([2, 3])
>>> a	
tensor([[1., 2., 3.],
        [4., 5., 6.]], dtype=torch.float64)

The size of a tensor
In practice, common data mainly falls into the following types (see the sketch after this list):

  1. Scalar data
    1D tensor, e.g. a dataset's labels y, with shape [samples]
  2. Vector data
    2D tensor, with shape [samples, features]
  3. Time series or sequence data
    3D tensor, with shape [samples, timesteps, features]
  4. Images
    4D tensor, with shape [samples, height, width, channels] or [samples, channels, height, width]
  5. Video
    5D tensor, with shape [samples, frames, height, width, channels] or [samples, frames, channels, height, width]
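To make these shapes concrete, here is a small sketch; every size below is made up purely for illustration:

>>> labels = torch.zeros(100)	# 1D: [samples]
>>> table = torch.zeros(100, 8)	# 2D: [samples, features]
>>> series = torch.zeros(100, 24, 8)	# 3D: [samples, timesteps, features]
>>> images = torch.zeros(100, 3, 32, 32)	# 4D: [samples, channels, height, width]
>>> videos = torch.zeros(100, 16, 3, 32, 32)	# 5D: [samples, frames, channels, height, width]
>>> images.size()
torch.Size([100, 3, 32, 32])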

Some commonly used tensor constructors

>>> b = torch.full((2,3),3.14)	# create a tensor filled with a single value; handy as an initialization matrix
>>> b
tensor([[3.1400, 3.1400, 3.1400],
        [3.1400, 3.1400, 3.1400]])
>>> c = torch.empty(2,3) # return an uninitialized tensor of the given shape; its contents differ on every run
>>> c
tensor([[3.6893e+19, 2.1425e+00, 5.9824e-38],
        [1.7449e-39, 6.1124e-38, 1.4013e-45]])
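Besides full and empty, a few other standard constructors come up all the time:

>>> torch.zeros(2,3)	# all zeros
>>> torch.ones(2,3)	# all ones
>>> torch.rand(2,3)	# uniform random in [0,1)
>>> torch.randn(2,3)	# standard normal random
>>> torch.arange(0,10,2)	# like numpy's arange
tensor([0, 2, 4, 6, 8])
>>> torch.eye(3)	# 3x3 identity matrix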

Tensor Operations

>>> # addition
>>> a+b
tensor([[4.1400, 5.1400, 6.1400],
        [7.1400, 8.1400, 9.1400]])
        
>>> torch.add(a,b,out=c) # write the result into the given tensor; without out= the result is simply returned
>>> c
tensor([[4.1400, 5.1400, 6.1400],
        [7.1400, 8.1400, 9.1400]])
        
>>> a.add(b) # returns a new tensor; the original is unchanged
>>> a
tensor([[1., 2., 3.],
        [4., 5., 6.]], dtype=torch.float64)
        
>>> a.add_(b) # tensor ops suffixed with '_' modify the tensor in place
>>> a
tensor([[4.1400, 5.1400, 6.1400],
        [7.1400, 8.1400, 9.1400]], dtype=torch.float64)
# multiplication
>>> x = torch.rand(3,5)
>>> y = torch.rand(3,5)
>>> z = torch.rand(5,3)
>>> torch.mul(x,y) # element-wise multiplication
tensor([[0.5100, 0.2523, 0.3176, 0.7665, 0.7989],
        [0.3799, 0.0779, 0.1352, 0.5991, 0.0338],
        [0.0375, 0.1444, 0.6965, 0.0810, 0.1836]])
>>> torch.matmul(x,z) # matrix multiplication
tensor([[0.9843, 2.0296, 1.3304],
        [0.7903, 1.4374, 1.1705],
        [1.0530, 1.9607, 0.9520]])
# common reductions
>>> a = torch.tensor([[1,2,3],[4,5,6]])
>>> torch.max(a) # maximum over all elements
tensor(6)
>>> torch.max(a,0) # maximum along dim 0
torch.return_types.max(values=tensor([4, 5, 6]),indices=tensor([1, 1, 1]))
>>> torch.max(a,1) # maximum along dim 1
torch.return_types.max(values=tensor([3, 6]),indices=tensor([2, 2]))
>>> # min, mean, sum, prod work the same way
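One caveat: unlike max and sum, mean only works on floating-point tensors, so integer tensors need a cast first:

>>> torch.sum(a, 1)	# row sums
tensor([ 6, 15])
>>> torch.mean(a.float(), 1)	# mean requires a float dtype
tensor([2., 5.])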

The NumPy bridge
numpy.array and torch.tensor can be converted into each other

>>> a = np.ones(3)
>>> a
array([1., 1., 1.])
>>> aa = torch.from_numpy(a) # array -> tensor
>>> aa
tensor([1., 1., 1.], dtype=torch.float64)
>>> b = torch.ones(3)
>>> b
tensor([1., 1., 1.])
>>> bb = b.numpy() # tensor -> array
>>> bb
array([1., 1., 1.], dtype=float32)
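One thing to keep in mind: for CPU tensors both conversions share the underlying memory, so an in-place update on one side is visible on the other:

>>> aa.add_(1)	# in-place add on the tensor
>>> a	# the original numpy array changed as well
array([2., 2., 2.])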

Building a data_iterator

x = torch.linspace(1, 10, 10)
y = torch.linspace(10, 1, 10)
# wrap the data in a dataset
torch_dataset = torch.utils.data.TensorDataset(x, y)
loader = torch.utils.data.DataLoader(
    # draw batch_size samples from the dataset on each pass
    dataset=torch_dataset,
    batch_size=5,
    shuffle=True,
    num_workers=2,
)
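train_dataset and test_dataset below are placeholders for a real train/test split; a minimal sketch of building them the same way, assuming x_train, y_train, x_test, y_test already exist as tensors:

# assumed placeholders: x_train/y_train and x_test/y_test are pre-built tensors
train_dataset = torch.utils.data.TensorDataset(x_train, y_train)
test_dataset = torch.utils.data.TensorDataset(x_test, y_test)
batch_size = 64	# illustrative value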
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)
# test_loader works the same way
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

Calling the data_iterator

for epoch in range(3):
    for step, (batch_x, batch_y) in enumerate(loader):
        print("steop:{}, batch_x:{}, batch_y:{}".format(step, batch_x, batch_y))


Model Preparation

Prerequisites: neural network layers

torch.nn packages many predefined network layers that can be used directly to build a neural network.

  1. Embedding layer
import torch.nn as nn
'''
vocab_num: vocabulary size
embed_dim: output dimension
nn.Embedding is essentially a lookup table storing a fixed-size dictionary; given indices as input, it returns the corresponding word embedding vectors.
'''
embedding = nn.Embedding(10, 3)	# nn.Embedding(vocab_num, embed_dim)
embedding.weight	# the printed embedding.weight is exactly that dictionary
>>> tensor([[ 1.2402, -1.0914, -0.5382],
        [-1.1031, -1.2430, -0.2571],
        [ 1.6682, -0.8926,  1.4263],
        [ 0.8971,  1.4592,  0.6712],
        [-1.1625, -0.1598,  0.4034],
        [-0.2902, -0.0323, -2.2259],
        [ 0.8332, -0.2452, -1.1508],
        [ 0.3786,  1.7752, -0.0591],
        [-1.8527, -2.5141, -0.4990],
        [-0.6188,  0.5902, -0.0860]], requires_grad=True)
# input: 2 sentences of 4 words each, with each word represented by its dictionary index
data = torch.LongTensor([[1,2,4,5],[4,3,2,9]])
a = embedding(data)
print(a)	# the output is the matching embedding rows from the dictionary
>>> tensor([[[-1.1031, -1.2430, -0.2571],     
         [ 1.6682, -0.8926,  1.4263],    
         [-1.1625, -0.1598,  0.4034],
         [-0.2902, -0.0323, -2.2259]],
        [[-1.1625, -0.1598,  0.4034],
         [ 0.8971,  1.4592,  0.6712],
         [ 1.6682, -0.8926,  1.4263],
         [-0.6188,  0.5902, -0.0860]]], grad_fn=<EmbeddingBackward>)
  2. nn.Embedding.from_pretrained()
    In NLP networks, the input text usually has to be encoded into vectors by an embedding layer first. The nn.Embedding introduced above is a lookup table with random initialization, while from_pretrained loads a pretrained embedding model such as word2vec or GloVe.
    Data to prepare:
Data      Type    Element            Example
word2id   dict    word2id[word]=id   word2id['you']=3
id2word   dict    id2word[id]=word   id2word[3]='you'
id2emb    dict    id2emb[id]=emb     id2emb[3]=[1,3,2,6]
weight    tensor  weight[id]=emb     weight[3]=[1,3,2,6]
# weight: the pretrained embedding as a tensor; here we assume the corpus contains only 2 words:
weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
embedding = nn.Embedding.from_pretrained(weight)
# the input passed to the embedding is each word's id, i.e. word2id[word]
input = torch.LongTensor([1])
print(embedding(input))
>>> tensor([[4.0000, 5.1000, 6.3000]])
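By default from_pretrained freezes the embedding weights; if they should be fine-tuned during training, pass freeze=False (a standard argument of this API):

embedding = nn.Embedding.from_pretrained(weight, freeze=False)	# allow fine-tuning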
  3. Linear layer
    A fully connected layer that maps input_dim-dimensional vectors to output_dim dimensions.
# 2D input: (batch_size, input_dim)
input = torch.randn(128, 20)
m = nn.Linear(20, 40)	# nn.Linear(input_dim, output_dim)
output = m(input)
print(output.size())  
>>> torch.Size([128, 40])	# (batch_size, output_dim)
# 4D input: the last dimension is input_dim
input = torch.randn(512, 3, 128, 128)
m = nn.Linear(128, 64)	# nn.Linear(input_dim, output_dim)
output = m(input)
print(output.size())  
>>> torch.Size([512, 3, 128, 64])	# the last dimension becomes output_dim
  4. Dropout
    Dropout is usually placed after a Linear layer; the parameter p is the probability that each neuron is deactivated, turning the fully connected layer into a partially connected one.
    It should only be active during training, not at test time (see the sketch after this example).
class Net(nn.Module):
	def __init__(self):
		super(Net, self).__init__()
		self.linear = nn.Linear(20, 40)
		self.dropout = nn.Dropout(p = 0.3) 

	def forward(self, inputs):
		out = self.linear(inputs)
		out = self.dropout(out)
		return out

net = Net()
# switch modes with net.train() / net.eval(); dropout is only applied in training mode
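A minimal sketch of the mode switch (input sizes are made up): in training mode dropout randomly zeroes activations, while in evaluation mode it is a no-op.

net.train()	# training mode: dropout active
out_train = net(torch.randn(4, 20))
net.eval()	# evaluation mode: dropout disabled
out_eval = net(torch.randn(4, 20))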
  5. Conv2d layer
    Input: (N, C_in, H_in, W_in)
    Output: (N, C_out, H_out, W_out)
    where N: batch_size, C: number of channels, H: token count (think of it as the height of the matrix), W: embedding_dim (the width of the matrix)
import torch.nn as nn
'''
For textCNN, in_channels=1
out_channels: the number of output channels, i.e. the number of kernels
kernel_size: the (2D) size of each kernel; pass a tuple (h, w) for non-square kernels
padding: for text the convolution runs along H, so padding in the W direction is 0; (1,0) pads the text with one token above and one below.
'''
conv1 = nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=(1,0))
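A quick shape check under textCNN-style assumptions (2 sentences, 7 tokens, embedding dim 50; all numbers made up): a kernel spanning the full embedding width collapses W to 1.

x = torch.randn(2, 1, 7, 50)	# (N, C_in, H=tokens, W=embed_dim)
conv1 = nn.Conv2d(1, 16, kernel_size=(3, 50), stride=1, padding=(1, 0))
print(conv1(x).size())	# torch.Size([2, 16, 7, 1])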
  6. Module: the base class of a neural network, inherited by custom network classes; within it, fully connected layers, convolutional layers, and other layers can be called directly.
    A custom network class usually consists of two parts: __init__ defines the network layers, and the forward function defines the network computation.
# Example 1
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool  = nn.MaxPool2d(2,2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1   = nn.Linear(16*5*5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

NetModel = Net()
# Example 2
import random

class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h_relu = self.input_linear(x).clamp(min=0)
        # dynamic graph: a random number of middle layers on every forward pass
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
        y_pred = self.output_linear(h_relu)
        return y_pred

D_in, H, D_out = 1000, 100, 10	# example dimensions
DynamicModel = DynamicNet(D_in, H, D_out)

Model Training

The body of the training code consists of two loops, two definitions, and five statements:

# two definitions:
optimizer = ...
criterion = ...
# two loops:
for epoch in range(num_epochs):
    for inputs, labels in batch_iters:
        # five statements
        outputs = model(inputs)
        model.zero_grad()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
# Example 1
# define the loss function and optimizer
import torch.optim as optim
criterion = nn.CrossEntropyLoss() # use a classification cross-entropy loss
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# train for epoch x batches_per_epoch iterations
for epoch in range(2): 
    running_loss = 0.0
    # iterating over trainloader yields one batch at a time
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
   
        # zero the parameter gradients
        optimizer.zero_grad()
        
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()        
        optimizer.step()
        running_loss += loss.item()
        
        # print statistics
        if i % 2000 == 1999: 
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss / 2000))
            running_loss = 0.0
# Example 2 (simplified)
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

for t in range(500):
    y_pred = model(x)
    loss = criterion(y_pred, y)
    print(t, loss.item())
    # zero the gradients, backpropagate, update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# Example 3
optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)
model.train()
for epoch in range(config.num_epochs):
    print('Epoch [{}/{}]'.format(epoch + 1, config.num_epochs))
    for i, (trains, labels) in enumerate(train_iter):
        outputs = model(trains)
        model.zero_grad()
        loss = F.cross_entropy(outputs, labels.squeeze(1))
        loss.backward()
        optimizer.step()

Model Evaluation

One loop + two computation statements + n metric functions:

# one loop: over test_iter
for texts, labels in test_iter:
    # two computation statements: outputs = model(inputs)
    outputs = model(texts)
    _, predicts = torch.max(outputs.data, 1)
# n metric functions: compute the various metrics from the sklearn.metrics library using labels and predicts
# Example 1
correct = 0
total = 0
# one loop: over the test iterator
for data in testloader:
    images, labels = data
    # two computation statements: outputs = model(inputs)
    outputs = net(images)
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)
    correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100 * correct / total))
# Example 2
from sklearn import metrics
model.load_state_dict(torch.load(config.save_path))
model.eval()
loss_total = 0
predict_all = np.array([], dtype=int)
labels_all = np.array([], dtype=int)
with torch.no_grad():  # disable autograd tracking to save memory
    for texts, labels in data_iter:
        outputs = model(texts)
        predic = torch.max(outputs.data, 1)[1].cpu().numpy()
        loss = F.cross_entropy(outputs, labels.squeeze(1))
        loss_total += loss
        labels = labels.data.cpu().numpy()
        labels_all = np.append(labels_all, labels)
        predict_all = np.append(predict_all, predic)

acc = metrics.accuracy_score(labels_all, predict_all)
report = metrics.classification_report(labels_all, predict_all)
confusion = metrics.confusion_matrix(labels_all, predict_all)
print(acc, loss_total / len(data_iter), report, confusion)

Saving and Loading Models

torch.save(model, 'model.pt')
model1 = torch.load('model.pt')
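torch.save(model, ...) pickles the whole model, class definition included. A commonly recommended alternative is to save only the state_dict and rebuild the model class when loading; a sketch, reusing the Net class defined earlier:

torch.save(model.state_dict(), 'model_params.pt')	# weights only
model1 = Net()	# re-create the architecture first
model1.load_state_dict(torch.load('model_params.pt'))
model1.eval()	# switch to eval mode before testing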

Training Neural Networks on a GPU

# when training on a GPU, the model and tensors must be moved to the target device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BertClassifier(config.bert_path)
model = model.to(device)
tensor = torch.tensor(x_train).to(torch.float32)
tensor = tensor.to(device)
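Inside the training loop, every batch also has to be moved to the same device as the model; a minimal sketch using the train_loader from earlier:

for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)	# move the batch to the model's device
    outputs = model(inputs)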

Differences Between PyTorch and TensorFlow

  1. PyTorch builds dynamic computation graphs while TensorFlow builds static ones, so PyTorch is somewhat more flexible;
    (figure: computation graph example)
  2. TensorFlow delivers somewhat higher compute performance on GPUs; on CPUs the two are similar.