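- 🍨 This post is a learning-log blog from the 🔗365天深度学习训练营 (365-Day Deep Learning Training Camp)
- 🍖 Original author: K同学啊 | tutoring and custom projects available
- 🚀 Source: K同学的学习圈子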
I. Preparation
1. Set up the GPU
import os, PIL, pathlib, random, warnings
import numpy as np
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms, datasets

def seed_torch(seed=1029):  # fix every random seed
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seed PyTorch on the CPU
    torch.cuda.manual_seed_all(seed)  # seed PyTorch on every GPU
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # turn off the cuDNN convolution auto-tuner

seed_torch()  # apply the seeds; defining the function alone changes nothing
warnings.filterwarnings("ignore")  # suppress warnings
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
device(type='cuda')
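About turning off the "convolution optimizer": when I first saw this line I was puzzled about why an optimizer would be switched off. The answer I was given claimed that switching it off can improve training speed or model performance in some special cases, but that is not what the flag does. torch.backends.cudnn.benchmark controls the cuDNN auto-tuner, which times several convolution algorithms and picks the fastest one for the current input shapes. Setting it to False, together with cudnn.deterministic = True, gives up some speed in exchange for results that repeat from run to run; it does not make the model more accurate. I decided to try it first anyway. The rest of the function sets the random seeds, which initializes each random number generator so that every run draws the same random numbers.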
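To illustrate what seeding buys, here is a minimal sketch. The helper name seed_everything is hypothetical (it is not part of the original code) and it only covers the CPU generators; it reseeds Python, NumPy and PyTorch and checks that the same values come back.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 1029) -> None:
    """Hypothetical helper: seed the Python, NumPy and PyTorch CPU generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


# Draw one value from each generator after seeding.
seed_everything()
first = (random.random(), np.random.rand(), torch.rand(3))

# Reseed with the same value and draw again.
seed_everything()
second = (random.random(), np.random.rand(), torch.rand(3))

# All three draws repeat exactly because the generators restart from the same state.
same = (first[0] == second[0]
        and first[1] == second[1]
        and torch.equal(first[2], second[2]))
print(same)
```

Without the second call to seed_everything the generators would simply continue their sequences and the two draws would differ.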
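2. Import the data
import os, PIL, random, pathlib
data_dir = './P7/'
data_dir = pathlib.Path(data_dir)
data_paths = list(data_dir.glob('*'))
classeNames = [path.name for path in data_paths]  # folder name = class name, on any OS
classeNames
['Dark', 'Green', 'Light', 'Medium']
Import the required libraries (os, PIL, random and so on), wrap the directory string in a pathlib.Path object, and list everything under it with glob. The name attribute of each entry is its last path component, that is, the class folder name, and these names are stored in classeNames. Taking path.name avoids splitting the path string on "\\", which only works with Windows path separators.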
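The folder-name lookup can be tried without the dataset. The sketch below is an illustration only: it builds four empty class folders in a temporary directory (the folder names are taken from the output above, everything else is made up) and reads the class names back with pathlib.

```python
import pathlib
import tempfile

# Build a throwaway directory tree that mimics ./P7/<class name>/
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for name in ["Dark", "Green", "Light", "Medium"]:
        (root / name).mkdir()

    # Path.name is the last path component, independent of the OS separator.
    # glob order is not guaranteed, so sort for a stable result.
    class_names = sorted(p.name for p in root.glob("*") if p.is_dir())

print(class_names)
```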
train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])
])
test_transform = transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225])
])
total_data = datasets.ImageFolder("./P7/",transform=train_transforms)
total_data
Dataset ImageFolder
Number of datapoints: 1200
Root location: ./P7/
StandardTransform
Transform: Compose(
Resize(size=[224, 224], interpolation=bilinear, max_size=None, antialias=warn)
RandomHorizontalFlip(p=0.5)
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
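Separate preprocessing pipelines are defined for the training set and the test set. Resizing every image to the same size, flipping it at random and normalizing it help the model generalize, and ToTensor converts each image into a tensor that the model can process. Note that ImageFolder is created with train_transforms, so both subsets produced by the later split go through the training pipeline; test_transform is defined but is not used when loading the data.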
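As a rough illustration of what the normalization step computes, the sketch below applies (x - mean) / std per channel to a toy tensor by hand. The 2x2 mid-gray "image" is an assumption made for the example; the mean and std values are the ones used in the pipelines above.

```python
import torch

# Per-channel statistics from the transforms above, shaped to broadcast over H and W.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Toy 3x2x2 image with every pixel at 0.5 (ToTensor scales pixels to [0, 1]).
image = torch.full((3, 2, 2), 0.5)

# Normalize subtracts the channel mean and divides by the channel std.
normalized = (image - mean) / std

# A horizontal flip reverses the width axis; values are unchanged, only their order.
flipped = torch.flip(normalized, dims=[-1])

print(normalized[:, 0, 0])
```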
total_data.class_to_idx  # mapping from class name to integer label
{'Dark': 0, 'Green': 1, 'Light': 2, 'Medium': 3}
3. Split the data
train_size = int(0.8 * len(total_data))
test_size = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])
train_dataset, test_dataset
(<torch.utils.data.dataset.Subset at 0x239316a53a0>,
<torch.utils.data.dataset.Subset at 0x239316a5640>)
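torch.utils.data.random_split() randomly partitions the dataset into a training subset (80%) and a test subset (20%). Grouping the samples into batches is done afterwards by the DataLoader.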
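A small sketch of the 80/20 split on a stand-in dataset. The TensorDataset of 1200 integers replaces the real images, and the seeded generator is an assumption added here so that the split repeats; the original code does not pass one.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in for the 1200 images: each "sample" is just its index.
dataset = TensorDataset(torch.arange(1200))

train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size

# Assumption: a seeded generator makes the random partition repeatable.
generator = torch.Generator().manual_seed(1029)
train_subset, test_subset = random_split(
    dataset, [train_size, test_size], generator=generator)

# The two subsets hold disjoint index lists that together cover the dataset.
overlap = set(train_subset.indices) & set(test_subset.indices)
print(len(train_subset), len(test_subset), len(overlap))
```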
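batch_size = 1  # a batch size of 1 takes longer to train; the hope is that it reduces the risk of overfitting a little
train_dl = torch.utils.data.DataLoader(train_dataset,
                                       batch_size=batch_size,
                                       shuffle=True,
                                       num_workers=1)
test_dl = torch.utils.data.DataLoader(test_dataset,
                                      batch_size=batch_size,
                                      shuffle=True,
                                      num_workers=1)
shuffle=True reshuffles the data at the start of every epoch. num_workers sets how many worker processes load data in the background (1 here); raising it a little can speed up loading.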
for X, y in test_dl:
    print("Shape of X [N, C, H, W]: ", X.shape)
    print("Shape of y: ", y.shape, y.dtype)
    break
Shape of X [N, C, H, W]: torch.Size([1, 3, 224, 224])
Shape of y: torch.Size([1]) torch.int64
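To see how batch_size shapes the batches and the number of iterations per epoch, here is a minimal sketch on synthetic data. The ten random tensors and the batch size of 4 are assumptions for illustration only.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten synthetic "images" with four possible labels, standing in for the real data.
images = torch.randn(10, 3, 224, 224)
labels = torch.randint(0, 4, (10,))
dataset = TensorDataset(images, labels)

for batch_size in (1, 4):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    X, y = next(iter(loader))  # first batch of the epoch
    # Number of batches is the dataset size divided by batch_size, rounded up.
    assert len(loader) == math.ceil(len(dataset) / batch_size)
    print(batch_size, len(loader), tuple(X.shape), tuple(y.shape))
```

With batch_size=1 every epoch takes as many optimizer steps as there are samples, which is why training is slower than with larger batches.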
II. Building the VGG-16 Model by Hand
1. Build the model
import torch.nn.functional as F

class vgg16(nn.Module):
    def __init__(self):
        super(vgg16, self).__init__()  # call the nn.Module constructor to set up the module
        # Convolution block 1
        self.block1 = nn.Sequential(  # nn.Sequential chains the layers of block 1 in order
            nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),  # 3 input channels, 64 output channels
            nn.ReLU(),  # ReLU activation, adds nonlinearity
            nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))  # max pooling halves the resolution and cuts computation
        )
        # Convolution block 2
        self.block2 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # Convolution block 3
        self.block3 = nn.Sequential(
            nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # Convolution block 4
        self.block4 = nn.Sequential(
            nn.Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # Convolution block 5
        self.block5 = nn.Sequential(
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        )
        # Fully connected layers, used for classification
        self.classifier = nn.Sequential(  # maps the convolutional feature maps to class scores (logits)
            nn.Linear(in_features=512*7*7, out_features=4096),  # 512*7*7 = 25088 input features, 4096 output features
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=4096),
            nn.ReLU(),
            nn.Linear(in_features=4096, out_features=4)  # third fully connected layer: one score per class, four classes
        )

    def forward(self, x):  # forward pass
        x = self.block1(x)  # pass the data through the convolution blocks in turn
        x = self.block2(x)
        x = self.block3(x)
        x = self.block4(x)
        x = self.block5(x)
        x = torch.flatten(x, start_dim=1)  # after block 5, flatten each sample into a 1-D vector
        x = self.classifier(x)  # feed the flattened vector to the classifier and return its output
        return x
device = "cuda" if torch.cuda.is_available() else "cpu"
print("Using {} device".format(device))
model = vgg16().to(device)
model
Using cuda device
vgg16(
(block1): Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
)
(block2): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
)
(block3): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU()
(6): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
)
(block4): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU()
(6): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
)
(block5): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU()
(2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU()
(4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU()
(6): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), padding=0, dilation=1, ceil_mode=False)
)
(classifier): Sequential(
(0): Linear(in_features=25088, out_features=4096, bias=True)
(1): ReLU()
(2): Linear(in_features=4096, out_features=4096, bias=True)
(3): ReLU()
(4): Linear(in_features=4096, out_features=4, bias=True)
)
)
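The in_features=25088 of the first Linear layer follows from the layer shapes. The sketch below is a plain-arithmetic illustration (no PyTorch needed) that traces a 224x224 input through the five blocks using the standard output-size formula for convolution and pooling; the helper names conv_out and pool_out are made up for this example.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after max pooling without padding."""
    return (size - kernel) // stride + 1


convs_per_block = [2, 2, 3, 3, 3]               # conv layers in blocks 1..5
channels_per_block = [64, 128, 256, 512, 512]   # output channels of each block

size = 224
trace = []
for n_convs, channels in zip(convs_per_block, channels_per_block):
    for _ in range(n_convs):
        size = conv_out(size)   # 3x3, stride 1, padding 1 keeps the size
    size = pool_out(size)       # 2x2, stride 2 halves the size
    trace.append((channels, size, size))

# Features seen by the first fully connected layer after flattening.
flat_features = trace[-1][0] * trace[-1][1] * trace[-1][2]
print(trace, flat_features)
```

Each block keeps the resolution through its convolutions and halves it in the pooling layer, so 224 becomes 7 after five blocks and 512 * 7 * 7 = 25088.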
2. Inspect the model
import torchsummary as summary  # prints a per-layer summary of output shapes and parameter counts
summary.summary(model, (3, 224, 224))
Layer (type)    Output Shape           Param #
Conv2d-1        [-1, 64, 224, 224]     1,792
ReLU-2          [-1, 64, 224, 224]     0
Conv2d-3        [-1, 64, 224, 224]     36,928
ReLU-4          [-1, 64, 224, 224]     0
MaxPool2d-5     [-1, 64, 112, 112]     0
Conv2d-6        [-1, 128, 112, 112]    73,856
ReLU-7          [-1, 128, 112, 112]    0
Conv2d-8        [-1, 128, 112, 112]    147,584
ReLU-9          [-1, 128, 112, 112]    0
MaxPool2d-10    [-1, 128, 56, 56]      0
Conv2d-11       [-1, 256, 56, 56]      295,168
ReLU-12         [-1, 256, 56, 56]      0
Conv2d-13       [-1, 256, 56, 56]      590,080
ReLU-14         [-1, 256, 56, 56]      0
Conv2d-15       [-1, 256, 56, 56]      590,080
ReLU-16         [-1, 256, 56, 56]      0
MaxPool2d-17    [-1, 256, 28, 28]      0
Conv2d-18       [-1, 512, 28, 28]      1,180,160
ReLU-19         [-1, 512, 28, 28]      0
Conv2d-20       [-1, 512, 28, 28]      2,359,808
ReLU-21         [-1, 512, 28, 28]      0
Conv2d-22       [-1, 512, 28, 28]      2,359,808
ReLU-23         [-1, 512, 28, 28]      0
MaxPool2d-24    [-1, 512, 14, 14]      0
Conv2d-25       [-1, 512, 14, 14]      2,359,808
ReLU-26         [-1, 512, 14, 14]      0
Conv2d-27       [-1, 512, 14, 14]      2,359,808
ReLU-28         [-1, 512, 14, 14]      0
Conv2d-29       [-1, 512, 14, 14]      2,359,808
ReLU-30         [-1, 512, 14, 14]      0
MaxPool2d-31    [-1, 512, 7, 7]        0
Linear-32       [-1, 4096]             102,764,544
ReLU-33         [-1, 4096]             0
Linear-34       [-1, 4096]             16,781,312
ReLU-35         [-1, 4096]             0
Linear-36       [-1, 4]                16,388
Total params: 134,276,932
Trainable params: 134,276,932
Non-trainable params: 0
Input size (MB): 0.57
Forward/backward pass size (MB): 218.52
Params size (MB): 512.23
Estimated Total Size (MB): 731.32
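The parameter counts in the summary can be reproduced by hand. This is a small arithmetic sketch, not part of the original notebook: a convolution has k*k*c_in weights plus one bias per output channel, a linear layer has n_in weights plus one bias per output unit, and each float32 parameter takes 4 bytes. The helper names are hypothetical.

```python
def conv_params(c_in, c_out, k=3):
    """Weights (k*k*c_in per output channel) plus one bias per output channel."""
    return (k * k * c_in + 1) * c_out


def linear_params(n_in, n_out):
    """Weights (n_in per output unit) plus one bias per output unit."""
    return (n_in + 1) * n_out


# (in_channels, out_channels) of the 13 convolution layers, block by block.
conv_channels = [(3, 64), (64, 64),
                 (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]
# (in_features, out_features) of the 3 fully connected layers.
fc_sizes = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 4)]

total = (sum(conv_params(i, o) for i, o in conv_channels)
         + sum(linear_params(i, o) for i, o in fc_sizes))
size_mb = total * 4 / 1024 ** 2  # float32 = 4 bytes per parameter

print(total, round(size_mb, 2))
```

Almost 90% of the parameters sit in the three fully connected layers, the first of which alone holds 102,764,544.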
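III. Training the Model
1. Write the training function
# Training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # number of samples in the training set
    num_batches = len(dataloader)  # number of batches (size / batch_size, rounded up)
    train_loss, train_acc = 0, 0  # running loss and running count of correct predictions
    for X, y in dataloader:  # fetch a batch of images and labels
        X, y = X.to(device), y.to(device)  # move the batch to the chosen device
        # Compute the prediction error
        pred = model(X)  # network output
        loss = loss_fn(pred, y)  # loss between the network output and the true labels y
        # Backpropagation
        optimizer.zero_grad()  # reset the gradients to zero
        loss.backward()  # backpropagate
        optimizer.step()  # update the parameters
        # Record accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()  # count correct predictions
        train_loss += loss.item()  # accumulate the batch loss
    train_acc /= size  # correct predictions divided by dataset size gives the average accuracy
    train_loss /= num_batches  # summed loss divided by the number of batches gives the average loss
    return train_acc, train_loss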
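The function first reads the dataset size and the number of batches, then loops over every batch. For each batch it moves the data to the chosen device, runs the model on it and computes the loss. It then backpropagates to obtain the gradients and lets the optimizer update the model parameters. Along the way it accumulates the number of correct predictions and the loss; dividing them by the dataset size and by the number of batches gives the average accuracy and the average loss, which are returned as the training accuracy and training loss.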
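The accuracy and loss bookkeeping can be checked on a hand-made batch. In this sketch the logits and targets are invented values for three samples and four classes; the expressions are the same ones used inside the training function.

```python
import torch
import torch.nn as nn

# Invented network outputs (logits) for 3 samples and 4 classes.
logits = torch.tensor([[2.0, 1.0, 0.0, 0.0],
                       [0.0, 3.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0, 0.0]])
targets = torch.tensor([0, 1, 2])  # true class indices

# argmax(1) picks the highest-scoring class per sample: [0, 1, 0].
correct = (logits.argmax(1) == targets).type(torch.float).sum().item()
accuracy = correct / len(targets)

# CrossEntropyLoss takes raw logits and averages over the batch by default.
loss = nn.CrossEntropyLoss()(logits, targets).item()

print(correct, accuracy, loss)
```

Two of the three predictions match their targets, so the accuracy is 2/3; the third sample, predicted as class 0 instead of 2, contributes most of the loss.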
2. Write the test function
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)  # number of samples in the test set
    num_batches = len(dataloader)
    test_loss, test_acc = 0, 0
    with torch.no_grad():  # disable autograd during testing: no gradients are computed, which saves time and memory
        for imgs, target in dataloader:  # loop over every batch of the test set
            imgs, target = imgs.to(device), target.to(device)  # move the batch to the chosen device
            # Compute the loss
            target_pred = model(imgs)  # run the model on the batch
            loss = loss_fn(target_pred, target)
            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()
    test_acc /= size
    test_loss /= num_batches
    return test_acc, test_loss
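What torch.no_grad() changes can be seen on a single layer. The sketch below uses a small nn.Linear as a stand-in for the model (an assumption for brevity): the numbers coming out are identical, but inside the context no computation graph is recorded.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)   # stand-in for the full model
x = torch.randn(3, 4)

# Normal forward pass: the output is attached to the autograd graph.
with_grad = layer(x)

# Forward pass under no_grad: same values, but nothing is tracked for backward.
with torch.no_grad():
    without_grad = layer(x)

print(with_grad.requires_grad, without_grad.requires_grad)
```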
3. Run the training
import copy
optimizer = torch.optim.Adam(model.parameters(), lr=0.5e-5)  # Adam optimizer that updates the parameters; model.parameters() is every parameter of the model
loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss, used to measure the prediction error
epochs = 30
train_loss = []  # empty lists that store the loss and accuracy of every epoch
train_acc = []
test_loss = []
test_acc = []
best_acc = 0  # best test accuracy so far (a fraction in [0, 1]), used to pick the best model
for epoch in range(epochs):
    model.train()  # put the model in training mode
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    model.eval()  # put the model in evaluation mode
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    if epoch_test_acc > best_acc:  # keep a copy of the best model seen so far
        best_acc = epoch_test_acc
        best_model = copy.deepcopy(model)
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')  # template for printing the training and test results
    print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss,
                          epoch_test_acc*100, epoch_test_loss, lr))
PATH = './best_model.pth'
torch.save(best_model.state_dict(), PATH)  # save the weights of the best model
print('Done')
Epoch: 1, Train_acc:26.2%, Train_loss:1.382, Test_acc:29.6%, Test_loss:1.193, Lr:5.00E-06
Epoch: 2, Train_acc:60.3%, Train_loss:0.782, Test_acc:67.5%, Test_loss:0.620, Lr:5.00E-06
Epoch: 3, Train_acc:67.3%, Train_loss:0.652, Test_acc:77.5%, Test_loss:0.537, Lr:5.00E-06
Epoch: 4, Train_acc:79.7%, Train_loss:0.487, Test_acc:91.2%, Test_loss:0.269, Lr:5.00E-06
Epoch: 5, Train_acc:90.2%, Train_loss:0.262, Test_acc:94.6%, Test_loss:0.157, Lr:5.00E-06
Epoch: 6, Train_acc:93.8%, Train_loss:0.175, Test_acc:86.2%, Test_loss:0.376, Lr:5.00E-06
Epoch: 7, Train_acc:95.8%, Train_loss:0.125, Test_acc:98.8%, Test_loss:0.038, Lr:5.00E-06
Epoch: 8, Train_acc:94.7%, Train_loss:0.140, Test_acc:92.5%, Test_loss:0.238, Lr:5.00E-06
Epoch: 9, Train_acc:95.6%, Train_loss:0.102, Test_acc:92.9%, Test_loss:0.131, Lr:5.00E-06
Epoch:10, Train_acc:97.1%, Train_loss:0.090, Test_acc:97.9%, Test_loss:0.039, Lr:5.00E-06
Epoch:11, Train_acc:96.0%, Train_loss:0.106, Test_acc:98.8%, Test_loss:0.024, Lr:5.00E-06
Epoch:12, Train_acc:96.0%, Train_loss:0.111, Test_acc:96.2%, Test_loss:0.095, Lr:5.00E-06
Epoch:13, Train_acc:97.1%, Train_loss:0.087, Test_acc:97.9%, Test_loss:0.057, Lr:5.00E-06
Epoch:14, Train_acc:96.5%, Train_loss:0.105, Test_acc:98.3%, Test_loss:0.062, Lr:5.00E-06
Epoch:15, Train_acc:97.8%, Train_loss:0.063, Test_acc:97.5%, Test_loss:0.048, Lr:5.00E-06
Epoch:16, Train_acc:97.1%, Train_loss:0.077, Test_acc:99.2%, Test_loss:0.028, Lr:5.00E-06
Epoch:17, Train_acc:97.2%, Train_loss:0.072, Test_acc:99.2%, Test_loss:0.026, Lr:5.00E-06
Epoch:18, Train_acc:97.6%, Train_loss:0.076, Test_acc:97.9%, Test_loss:0.037, Lr:5.00E-06
Epoch:19, Train_acc:98.1%, Train_loss:0.072, Test_acc:99.2%, Test_loss:0.028, Lr:5.00E-06
Epoch:20, Train_acc:98.2%, Train_loss:0.052, Test_acc:98.3%, Test_loss:0.033, Lr:5.00E-06
Epoch:21, Train_acc:98.4%, Train_loss:0.051, Test_acc:99.2%, Test_loss:0.029, Lr:5.00E-06
Epoch:22, Train_acc:97.8%, Train_loss:0.054, Test_acc:98.8%, Test_loss:0.026, Lr:5.00E-06
Epoch:23, Train_acc:98.0%, Train_loss:0.047, Test_acc:98.8%, Test_loss:0.030, Lr:5.00E-06
Epoch:24, Train_acc:97.7%, Train_loss:0.071, Test_acc:99.6%, Test_loss:0.020, Lr:5.00E-06
Epoch:25, Train_acc:98.9%, Train_loss:0.043, Test_acc:99.6%, Test_loss:0.017, Lr:5.00E-06
Epoch:26, Train_acc:98.5%, Train_loss:0.051, Test_acc:98.3%, Test_loss:0.064, Lr:5.00E-06
Epoch:27, Train_acc:98.4%, Train_loss:0.038, Test_acc:99.2%, Test_loss:0.029, Lr:5.00E-06
Epoch:28, Train_acc:98.6%, Train_loss:0.032, Test_acc:98.3%, Test_loss:0.032, Lr:5.00E-06
Epoch:29, Train_acc:98.0%, Train_loss:0.048, Test_acc:98.3%, Test_loss:0.041, Lr:5.00E-06
Epoch:30, Train_acc:98.5%, Train_loss:0.039, Test_acc:88.8%, Test_loss:0.479, Lr:5.00E-06
Done
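The best-model bookkeeping in the loop can be replayed on the logged numbers. The sketch below copies the Test_acc column (in percent, already rounded to one decimal) from the log above; best_epoch is a hypothetical helper that mirrors the "if accuracy > best" comparison and shows how the starting value of the threshold decides whether any checkpoint is ever kept.

```python
# Test accuracies in percent, copied from the 30-epoch log (rounded values).
test_acc_log = [29.6, 67.5, 77.5, 91.2, 94.6, 86.2, 98.8, 92.5, 92.9, 97.9,
                98.8, 96.2, 97.9, 98.3, 97.5, 99.2, 99.2, 97.9, 99.2, 98.3,
                99.2, 98.8, 98.8, 99.6, 99.6, 98.3, 99.2, 98.3, 98.3, 88.8]


def best_epoch(accs, initial_best):
    """Replay the strict 'acc > best' update; return (best value, 1-based epoch)."""
    best, best_at = initial_best, None
    for epoch, acc in enumerate(accs, start=1):
        if acc > best:          # strict comparison: ties keep the earlier epoch
            best, best_at = acc, epoch
    return best, best_at


# Starting below every possible accuracy: the best epoch is found.
print(best_epoch(test_acc_log, 0))
# Starting above every possible accuracy: the condition never fires.
print(best_epoch(test_acc_log, 100))
```

With a threshold that no accuracy can exceed, no epoch is ever selected, so the initial value has to sit below the range of the metric. Note also that the final epoch (88.8%) is well below the best one, which is why keeping a copy of the best model matters.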
IV. Visualizing the Results
1. Loss and accuracy curves
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['figure.dpi'] = 500
epochs_range = range(epochs)
plt.figure(figsize=(15,5))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower left')
plt.title('Training and Test Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper left')
plt.title('Training and Test Loss')
plt.show()
2. Predict a specified image
from PIL import Image
classes = list(total_data.class_to_idx)
def predict_one_image(image_path, model, transform, classes):
    test_img = Image.open(image_path).convert('RGB')
    plt.imshow(test_img)  # show the image being predicted
    test_img = transform(test_img)
    img = test_img.to(device).unsqueeze(0)  # add the batch dimension
    model.eval()
    with torch.no_grad():  # inference only, no gradients needed
        output = model(img)
    _, pred = torch.max(output, 1)
    pred_class = classes[pred.item()]
    print(f'Prediction: {pred_class}')
# Predict one image from the training set
predict_one_image(image_path='./P7/Green/green (101).png',
                  model=model,
                  transform=train_transforms,  # includes a random flip; test_transform would be deterministic
                  classes=classes)
Prediction: Green
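The model's last layer returns raw scores (logits), not probabilities; taking the argmax of either gives the same class. As a hedged illustration, the sketch below turns one invented logit vector into probabilities with softmax. The numbers are made up; only the class names come from the dataset.

```python
import torch
import torch.nn.functional as F

classes = ['Dark', 'Green', 'Light', 'Medium']

# Invented logits for one image (batch of 1, four classes).
logits = torch.tensor([[0.2, 3.1, -1.0, 0.5]])

# Softmax rescales the logits into non-negative values that sum to 1.
probs = F.softmax(logits, dim=1)

# The predicted class is the same whether taken from logits or probabilities.
pred_idx = probs.argmax(dim=1).item()
print(classes[pred_idx], probs[0, pred_idx].item())
```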
3. Evaluate the model
best_model.eval()
epoch_test_acc, epoch_test_loss = test(test_dl, best_model, loss_fn)
epoch_test_acc, epoch_test_loss
(0.8875, 0.4793169962623239)
epoch_test_acc
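0.8875
These figures coincide with the last row of the training log (Epoch 30: Test_acc 88.8%, Test_loss 0.479), which suggests that the weights evaluated here are those of the final epoch rather than of the best epoch.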
Here I summarize separately a few things I have learned recently.
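1. On why activation functions actually "activate": while studying I only knew that they work, and picturing the process was still difficult. My latest understanding is that, before an activation function is applied, layers such as convolution perform only linear operations on the features, and a linear combination merely scales and shifts the coordinates. The activation function changes the shape of the original distribution (perhaps "filtering" is the better word). For example, after applying Tanh, data that is not linearly separable is mapped into a new space where it can be separated into classes.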
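2. Adding neurons increases the model's capacity for linear transformation, and adding hidden layers (each followed by an activation) increases its capacity for nonlinear transformation. But when the structure becomes too elaborate, the model overfits and its ability to generalize drops sharply.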
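3. An important way to understand convolution is that a convolutional network turns the input numerical signal into class labels or probabilities: the convolution layers extract features and the classifier maps them to class scores. Understanding this role helps us apply convolution better.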
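4. The feature extraction performed by a stack of convolution layers can be understood as starting from edges and then combining features layer by layer until the features of the whole image gradually take shape. Pooling then keeps the dominant features and discards the minor ones. (Image credit: Bilibili creator 梗直哥.)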
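5. An RNN only has short-term "memory". When long-term "memory" is needed we use an LSTM, which compared with an RNN adds a cell state that runs along the time axis and carries global information. In between, a forget gate and an input gate select which features to drop and which to write, so the cell state is updated at every step: the sigmoid function produces gate values between 0 and 1 (0 means discard), and the tanh function squashes the candidate values into (-1, 1). Together they give the network the ability to focus on the important segments.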
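6. Attention can simply be understood as weights. Self-attention sets the sequential order aside: for each input word it looks for the relationship between that word and every word in the sequence, so the sequence finds its own features, and a weighted sum then gives a global view of the context.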
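7. Compared with other architectures, the Transformer adds an encoder and a decoder. Each encoder layer has two core parts, self-attention and a feed-forward network: the weights between the parts of the input are computed first, and each token is then re-encoded with them. In machine translation, the role of the decoder is that, when generating the output, it must look not only at what has already been translated but also at the context information from the encoder.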
8. Multi-head attention is essentially the self-attention mechanism split into several heads.
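9. The concrete data flow of a Transformer in machine translation is as follows. The words are first converted to vectors, position information is added, the sequences are normalized to a uniform length, and the result is sent into the encoder. There, self-attention uses weights to mark the relationships between the word pieces and so embeds the context information. In this step the query vector of each word is multiplied (dot product) with the key vectors of all words, and the results are the attention scores. The scores are scaled and passed through softmax, which turns them into weights and pushes the weights of unrelated words toward zero; the value vectors are then multiplied by these weights and summed to give the output vector. In essence, a series of matrix operations yields the weight relationships between the words.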
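10. In multi-head attention the computation is carried out 8 times with different weight matrices, which reduces the influence of any single uncertain factor on the overall result. The outputs of the heads are then concatenated and passed through a linear projection (a learned weighted combination) to form a single output.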
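The query, key and value steps described in the notes above can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not a full Transformer layer: the sequence length of 5, the model width of 64, and the randomly initialized projection matrices w_q, w_k, w_v and w_o are all made up for the example, and the 8 heads follow the number mentioned in the notes.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v):
    """softmax(q k^T / sqrt(d_k)) v, returning the output and the weights."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # query-key dot products
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ v, weights                        # weighted sum of values


torch.manual_seed(0)
seq_len, d_model, num_heads = 5, 64, 8
d_head = d_model // num_heads

x = torch.randn(seq_len, d_model)  # 5 token vectors (embedding + position)

# Hypothetical projection matrices; in a real model these are learned.
w_q, w_k, w_v, w_o = (torch.randn(d_model, d_model) / math.sqrt(d_model)
                      for _ in range(4))


def split_heads(t):
    """(seq_len, d_model) -> (num_heads, seq_len, d_head)."""
    return t.view(seq_len, num_heads, d_head).transpose(0, 1)


# Each head attends with its own slice of the projected queries, keys, values.
q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
heads, weights = scaled_dot_product_attention(q, k, v)

# Concatenate the heads and mix them with the output projection.
concat = heads.transpose(0, 1).reshape(seq_len, d_model)
out = concat @ w_o

print(out.shape, weights.shape)
```

Each of the 8 heads produces its own 5x5 weight matrix over the tokens, and the output keeps the input shape, so such a layer can be stacked.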