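1. Judging by recent papers and open-source code, the community's interest in PyTorch exceeds its interest in TensorFlow, whereas for inference (deployment) TensorFlow is still used far more than PyTorch.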
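2. Dropout randomly zeroes the outputs of some neurons in a layer during training (in effect removing those neurons' connections for that step), which helps prevent overfitting. It is disabled at inference time.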
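As a rough illustration of the behaviour described above, here is a minimal sketch using `nn.Dropout`. The drop probability `p = 0.5` and the all-ones input are illustrative choices, not values from these notes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random mask is reproducible

p = 0.5                       # drop probability (illustrative value)
dropout = nn.Dropout(p=p)
x = torch.ones(4, 8)

dropout.train()               # training mode: a random subset of units is zeroed
y_train = dropout(x)          # surviving units are scaled by 1 / (1 - p)

dropout.eval()                # evaluation mode: dropout does nothing
y_eval = dropout(x)

print(y_train)
print(torch.equal(y_eval, x))
```

In training mode every entry of `y_train` is either 0 or `1 / (1 - p)`; the rescaling keeps the expected activation the same between training and inference.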
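3. Residual block: plain implementation vs. fused implementation
Reference: 16、PyTorch中进行卷积残差模块算子融合_哔哩哔哩_bilibili (a Bilibili video on operator fusion for convolutional residual blocks in PyTorch)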
import torch
import torch.nn.functional as F
import torch.nn as nn
in_channels = 2
out_channels = 2
kernel_size = 3
w = 9
h = 9
x = torch.ones(1, in_channels, w, h)  # input image
# Method 1: plain (unfused) implementation
conv_2d = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_pointwise = nn.Conv2d(in_channels, out_channels, 1)
result1 = conv_2d(x) + conv_2d_pointwise(x) + x
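# Method 2: operator fusion
# Rewrite both the point-wise convolution and the identity branch (x itself) as 3*3 convolutions,
# then merge the three convolutions into a single convolution.
# 1) Transformation
pointwise_to_conv_weight = F.pad(conv_2d_pointwise.weight, [1,1,1,1,0,0,0,0])  # 2*2*1*1 -> 2*2*3*3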
conv_2d_for_pointwise = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_for_pointwise.weight = nn.Parameter(pointwise_to_conv_weight)
conv_2d_for_pointwise.bias = conv_2d_pointwise.bias
# Identity branch written as a 3*3 convolution; its weight has shape 2*2*3*3
zeros = torch.unsqueeze(torch.zeros(kernel_size, kernel_size), 0)
stars = torch.unsqueeze(F.pad(torch.ones(1,1), [1,1,1,1]), 0)  # 1 at the center, 0 elsewhere
stars_zeros = torch.unsqueeze(torch.cat([stars, zeros], 0), 0)  # output channel 0 copies input channel 0
zeros_stars = torch.unsqueeze(torch.cat([zeros, stars], 0), 0)  # output channel 1 copies input channel 1
identity_to_conv_weight = torch.cat([stars_zeros, zeros_stars], 0)
identity_to_conv_bias = torch.zeros([out_channels])
conv_2d_for_identity = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_for_identity.weight = nn.Parameter(identity_to_conv_weight)
conv_2d_for_identity.bias = nn.Parameter(identity_to_conv_bias)
result2 = conv_2d(x) + conv_2d_for_pointwise(x) + conv_2d_for_identity(x)
# 2) Fusion: convolution is linear, so the three kernels and the three biases can simply be summed
conv_2d_for_fusion = nn.Conv2d(in_channels, out_channels, kernel_size, padding="same")
conv_2d_for_fusion.weight = nn.Parameter(conv_2d.weight.data + conv_2d_for_pointwise.weight.data + conv_2d_for_identity.weight.data)
conv_2d_for_fusion.bias = nn.Parameter(conv_2d.bias.data + conv_2d_for_pointwise.bias.data + conv_2d_for_identity.bias.data)
result3 = conv_2d_for_fusion(x)
print(torch.all(torch.isclose(result2, result3)))
Comparing the computation time of the two versions, the fused version is indeed somewhat faster.
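One way such a timing comparison could be set up is sketched below. It rebuilds a compact version of both variants so that it runs on its own; the helper `average_seconds`, the repeat count, and the warm-up call are assumptions made for illustration, and the measured numbers will vary by machine.

```python
import time
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
channels, kernel_size = 2, 3
x = torch.ones(1, channels, 9, 9)

conv_3x3 = nn.Conv2d(channels, channels, kernel_size, padding="same")
conv_1x1 = nn.Conv2d(channels, channels, 1)

# Identity branch as a 3x3 kernel: a single 1 at the center of each (c, c) slice.
identity = torch.zeros(channels, channels, kernel_size, kernel_size)
for c in range(channels):
    identity[c, c, kernel_size // 2, kernel_size // 2] = 1.0

# Fused kernel = 3x3 weights + zero-padded 1x1 weights + identity kernel.
fused = nn.Conv2d(channels, channels, kernel_size, padding="same")
with torch.no_grad():
    fused.weight.copy_(conv_3x3.weight + F.pad(conv_1x1.weight, [1, 1, 1, 1]) + identity)
    fused.bias.copy_(conv_3x3.bias + conv_1x1.bias)

def unfused(inp):
    # Plain residual block: two convolutions plus the skip connection.
    return conv_3x3(inp) + conv_1x1(inp) + inp

def average_seconds(fn, inp, repeats=100):
    # Hypothetical helper: mean wall-clock time per call after one warm-up call.
    with torch.no_grad():
        fn(inp)
        start = time.perf_counter()
        for _ in range(repeats):
            fn(inp)
    return (time.perf_counter() - start) / repeats

with torch.no_grad():
    outputs_match = torch.allclose(unfused(x), fused(x), atol=1e-6)

print("outputs match:", outputs_match)
print("unfused seconds per call:", average_seconds(unfused, x))
print("fused seconds per call:  ", average_seconds(fused, x))
```

On an input this small the gap may be dominated by per-call overhead, so larger inputs and more repeats would give a more reliable picture.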
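4. The basic building blocks of seq2seq models (encoder + attention + decoder) fall into three main families: CNN, RNN, and Transformer
CNN:
Weight sharing: translation invariance, parallelizable computation
Sliding window: models local correlations; relies on stacking many layers for long-range modeling
Sensitive to relative position, insensitive to absolute position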
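The weight-sharing property can be checked numerically with the small sketch below. It uses a 1-D convolution with circular padding (an assumption made so that the shift wraps around cleanly at the borders) and shows that shifting the input just shifts the output. Strictly speaking this is shift equivariance of the convolution, which is what the translation-invariance point above relies on.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One shared 3-tap kernel slides over every position of the sequence.
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, padding_mode="circular")
x = torch.randn(1, 1, 10)     # (batch, channels, length)
shift = 2

with torch.no_grad():
    shifted_then_conv = conv(torch.roll(x, shifts=shift, dims=-1))
    conv_then_shifted = torch.roll(conv(x), shifts=shift, dims=-1)

# The parameter count depends only on the kernel, not on the sequence length.
num_params = sum(p.numel() for p in conv.parameters())

print(torch.allclose(shifted_then_conv, conv_then_shifted, atol=1e-6))
print(num_params)
```

Because the same kernel is applied at every position, the layer responds to a pattern in the same way wherever it occurs, which is why it tracks relative rather than absolute position.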
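RNN: (models the sequence recursively, one step at a time, in order)
Sensitive to order
Serial computation, which is slow
Weak at long-range modeling
Computational complexity grows linearly with sequence length
Per-step computational cost is constant
Sensitive to both relative and absolute position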
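The serial, order-dependent nature of the recurrence can be sketched with `nn.RNNCell`. The sizes and the helper `run` are illustrative assumptions; the point is only that each step needs the previous hidden state, and that feeding the same tokens in reverse order gives a different final state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len = 4, 8, 6
cell = nn.RNNCell(input_size, hidden_size)
x = torch.randn(seq_len, 1, input_size)   # (time, batch, features)

def run(sequence):
    # Hypothetical helper: unroll the cell over time, one step after another.
    h = torch.zeros(1, hidden_size)
    steps = 0
    for x_t in sequence:       # step t cannot start before step t-1 has finished
        h = cell(x_t, h)       # same cost at every step
        steps += 1
    return h, steps

with torch.no_grad():
    h_forward, steps = run(x)
    h_reversed, _ = run(torch.flip(x, dims=[0]))

print(steps)                                   # one cell call per time step
print(torch.allclose(h_forward, h_reversed))   # order changes the result
```

The loop makes the linear cost explicit: the number of cell calls equals the sequence length, and they cannot be parallelized across time.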
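Transformer:
No locality assumption: parallelizable computation; insensitive to relative position
No ordering assumption: needs positional encoding to reflect how a change in position affects the features; insensitive to absolute position
Any two tokens can be related directly: good at both long- and short-range modeling; self-attention has cost on the order of the square of the sequence length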
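The two Transformer properties above (no built-in notion of order, and quadratic cost) can be seen in a bare single-head self-attention sketch. The random projection matrices `w_q`, `w_k`, `w_v` and the sizes are illustrative assumptions, and no positional encoding is added on purpose.

```python
import math
import torch

torch.manual_seed(0)
seq_len, d_model = 5, 8
x = torch.randn(seq_len, d_model)

# Random projections stand in for learned weights (illustrative only).
w_q, w_k, w_v = (torch.randn(d_model, d_model) / math.sqrt(d_model) for _ in range(3))

def self_attention(tokens):
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / math.sqrt(d_model)     # one score per token pair: (seq_len, seq_len)
    return torch.softmax(scores, dim=-1) @ v, scores

out, scores = self_attention(x)

# Shuffle the tokens: without positional encoding the outputs are shuffled the same way.
perm = torch.randperm(seq_len)
out_perm, _ = self_attention(x[perm])

print(scores.shape)
print(torch.allclose(out_perm, out[perm], atol=1e-5))
```

The score matrix has one entry per pair of tokens, which is where the quadratic cost comes from. Since permuting the input only permutes the output, position information has to be injected separately through positional encoding.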