python-pytorch common API check-in 0.1.084

liwulin0506

Last modified 2024-06-21 08:56:22

First published 2024-03-27 12:00:45

Article link: https://blog.csdn.net/m0_60688978/article/details/137073095

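This post records some commonly used APIs from PyTorch's `torch` and `torch.nn` modules, including `torch.bmm`, `torch.max` and `torch.matmul`, plus average pooling, multi-head attention and the Flatten layer from `nn`. It also covers Tensor operations such as `argmax` and `topk`, and discusses how to switch a model between training and testing modes.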

Changelog

2024-05-07 12:02:42 ---- 0.1.031

2024-05-08 17:03:17 ---- 0.1.032

2024-05-17 10:57:37 ---- 0.1.034

2024-05-19 15:31:26 ---- 0.1.035

2024-05-23 17:44:54 ---- 0.1.045

2024-05-28 14:39:14 ---- 0.1.056

2024-05-29 16:56:27 ---- 0.1.060

2024-05-30 13:53:01 ---- 0.1.070

2024-04-29 09:20:15 ---- 0.1.028

2024-04-29 14:49:31 ---- 0.1.029

torch

torch.bmm

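input and mat2 must both be 3-D tensors.
Output size: if input is [4, 5, 6] and mat2 is [4, 6, 9], the output is [4, 5, 9]; each output matrix keeps the row count of input and takes the column count of mat2.
In other words, input of shape [b, n, m] requires mat2 of shape [b, m, p]: dimensions 0 and 2 of the first argument must equal dimensions 0 and 1 of the second argument. Otherwise you get the error "Expected size for first two dimensions of batch2 tensor to be ...".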
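A minimal sketch of the shape rule above, reusing the [4, 5, 6] x [4, 6, 9] shapes from the text; the variable names are illustrative. It also checks that each batch slice is an ordinary matrix product and that a mismatched inner dimension is rejected.

```python
import torch

inp = torch.rand(4, 5, 6)    # 4 matrices of shape 5x6
mat2 = torch.rand(4, 6, 9)   # 4 matrices of shape 6x9
out = torch.bmm(inp, mat2)   # 4 matrices of shape 5x9
print(out.shape)             # torch.Size([4, 5, 9])

# Each slice of the result is the ordinary product of the matching slices
print(torch.allclose(out[0], inp[0] @ mat2[0]))

# Inner dimensions disagree (6 vs 7), so bmm raises a RuntimeError
raised = False
try:
    torch.bmm(inp, torch.rand(4, 7, 9))
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```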

torch.max

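Take the expression torch.max(out.view(-1, 25), dim=1) as an example:
1. dim=1 reduces across each row, giving one value per row.
2. dim=0 reduces down each column, giving one value per column.

For example, if the input size is [2, 3], torch.max(input, dim=0) returns 3 values and torch.max(input, dim=1) returns 2 values. When dim is given, the result is a (values, indices) pair.

If dim is not specified, as in torch.max(input), only the overall maximum value is returned (a 0-d tensor), with no index.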
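A minimal sketch of the three call forms on a [2, 3] input, plus the out.view(-1, 25) pattern from the text; the tensor values and the batch of 8 rows are illustrative assumptions.

```python
import torch

x = torch.tensor([[1., 5., 3.],
                  [4., 2., 6.]])  # shape [2, 3]

# dim=0: one result per column -> 3 values
col = torch.max(x, dim=0)
print(col.values.tolist(), col.indices.tolist())  # [4.0, 5.0, 6.0] [1, 0, 1]

# dim=1: one result per row -> 2 values
row = torch.max(x, dim=1)
print(row.values.tolist(), row.indices.tolist())  # [5.0, 6.0] [1, 2]

# No dim: a single 0-d tensor holding the global maximum, no indices
print(torch.max(x).item())                        # 6.0

# Typical classification use: index of the largest of 25 scores in each row
logits = torch.rand(8, 25)
_, predicted = torch.max(logits.view(-1, 25), dim=1)
print(predicted.shape)                            # torch.Size([8])
```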

torch.matmul

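input_d = other_d = 1 (both 1-D): dot product, the result shape is [] (a 0-d tensor)
input_d = other_d = 2: ordinary matrix product, e.g. (2, 2) * (2, 2) gives torch.Size([2, 2])
input_d = 1, other_d = 2: input is expanded to (1, 2); (1, 2) * (2, 2) => (1, 2), then the added dimension is removed => (2, )
input_d = 2, other_d = 1: other is expanded to (2, 1); (2, 2) * (2, 1) => (2, 1), then the added dimension is removed => (2, )
input_d = 3 and other_d = 2: the 2-D operand is broadcast over the batch dimension; matrix part: (1, 2) * (2, 1)
input_d = 4 and other_d = 3: broadcast part (· marks the two matrix dimensions): (2, 1, ·, ·) => (2, 2, ·, ·); matrix part: (2, 1) * (1, 2)
input_d = 4 and other_d = 2: matrix part: (2, 1) * (1, 2)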
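A minimal sketch that runs each case listed above and records the result shape. The concrete operand shapes (for example the batch size 3 in the 3-D case, or (3, 4) in the 4-D @ 2-D case) are illustrative assumptions chosen to match the matrix parts described.

```python
import torch

# Illustrative operand shapes for each case
cases = {
    "1-D @ 1-D": (torch.rand(2), torch.rand(2)),                 # dot product
    "2-D @ 2-D": (torch.rand(2, 2), torch.rand(2, 2)),           # matrix product
    "1-D @ 2-D": (torch.rand(2), torch.rand(2, 2)),              # (2,) -> (1, 2), extra dim removed
    "2-D @ 1-D": (torch.rand(2, 2), torch.rand(2)),              # matrix-vector product
    "3-D @ 2-D": (torch.rand(3, 1, 2), torch.rand(2, 1)),        # matrix part (1, 2) @ (2, 1)
    "4-D @ 3-D": (torch.rand(2, 1, 2, 1), torch.rand(2, 1, 2)),  # batch (2, 1) & (2,) -> (2, 2); (2, 1) @ (1, 2)
    "4-D @ 2-D": (torch.rand(3, 4, 2, 1), torch.rand(1, 2)),     # matrix part (2, 1) @ (1, 2)
}

shapes = {name: tuple(torch.matmul(a, b).shape) for name, (a, b) in cases.items()}
for name, shape in shapes.items():
    print(f"{name}: {shape}")
```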

torch.nn

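torch.nn.AvgPool2d()

torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)
The "2d" in the name means pooling runs over two spatial dimensions, so kernel_size is 2-D: e.g. (4, 5) is a window of 4 rows and 5 columns (a single int k means a k×k window).
stride is the step the window moves by; with stride 1 it shifts one position at a time, left/right and up/down. When stride is None it defaults to kernel_size.
divisor_override: when set, each window's sum is divided by this value instead of by the number of elements in the window.
ceil_mode=True rounds the output size up, so a leftover partial window still produces an output; the default ceil_mode=False rounds down, so leftover rows/columns that cannot fill a window are discarded.
Output size calculation (per spatial dimension): H_out = floor((H_in + 2 * padding - kernel_size) / stride + 1), with ceil instead of floor when ceil_mode=True (a last window that would start inside the right padding is dropped).
When stride equals the kernel size and padding is 0, this reduces to dividing the input size by the kernel size and rounding down.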
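A minimal sketch of the default stride, divisor_override and ceil_mode on small inputs; the 4x4 map holding 1 to 16 and the 5x5 random input are illustrative assumptions.

```python
import torch
import torch.nn as nn

x = torch.arange(1., 17.).view(1, 1, 4, 4)   # one 4x4 map holding 1..16

# stride defaults to kernel_size, so each side shrinks from 4 to 4 / 2 = 2
avg = nn.AvgPool2d(kernel_size=2)(x)
print(avg.view(2, 2).tolist())        # [[3.5, 5.5], [11.5, 13.5]]

# divisor_override=2: each window sum is divided by 2 instead of by 4
halved = nn.AvgPool2d(kernel_size=2, divisor_override=2)(x)
print(halved.view(2, 2).tolist())     # [[7.0, 11.0], [23.0, 27.0]]

# 5x5 input, kernel 2, stride 2: (5 - 2) / 2 + 1 = 2.5
y = torch.rand(1, 1, 5, 5)
floor_out = nn.AvgPool2d(2, ceil_mode=False)(y)
ceil_out = nn.AvgPool2d(2, ceil_mode=True)(y)
print(floor_out.shape)                # rounded down: torch.Size([1, 1, 2, 2])
print(ceil_out.shape)                 # rounded up:   torch.Size([1, 1, 3, 3])
```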

torch.nn.MultiheadAttention

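For 3-D data with batch_first=False (tensors laid out as [seq_len, batch_size, embed_dim]):

Constructor arguments: embed_dim = dims, num_heads = heads, dropout = dropout_pro, batch_first=False. Note that dims must equal num_heads × the per-head dimension, i.e. embed_dim must be divisible by num_heads.
For the inputs q, k and v, embed_dim and batch_size must match (by default; the kdim and vdim arguments can relax this for k and v).
The seq_len of k and v (which must equal each other) may differ from that of q: q's length is the target sentence length, while k/v's length is the source sentence length.
Output sizes: attn_output is [target_seq_len, batch_size, embed_dim] (or [batch_size, target_seq_len, embed_dim] when batch_first=True); attn_output_weights is [batch_size, target_seq_len, source_seq_len], averaged over heads by default.
See the example below; reference: https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html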

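import torch

# Choose the hyperparameters first
dims = 256 * 10      # embed_dim: total dimension across all heads (256 per head x 10 heads)
heads = 10           # number of attention heads; dims must be divisible by heads
dropout_pro = 0.0    # dropout probability applied to the attention weights
# Build the multi-head attention layer from these parameters
layer = torch.nn.MultiheadAttention(embed_dim=dims, num_heads=heads, dropout=dropout_pro, batch_first=False)

# batch_first=False, so tensors are laid out as [seq_len, batch_size, embed_dim]
q = torch.rand(9, 2, 2560)   # target sequence length 9, batch size 2
k = torch.rand(6, 2, 2560)   # source sequence length 6
v = torch.rand(6, 2, 2560)   # k and v share the same sequence length
attn_output, attn_output_weights = layer(q, k, v)

# attn_output: [tgt_seq_len, batch, embed_dim]; attn_output_weights: [batch, tgt_seq_len, src_seq_len]
print((attn_output.size(), attn_output_weights.size()))

"""
(torch.Size([9, 2, 2560]), torch.Size([2, 9, 6]))
"""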

torch.nn.Flatten

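flatten() merges a range of a tensor's dimensions into one; torch.flatten(x) / x.flatten() with no arguments flattens the whole tensor to 1-D.

Dimensions are counted from 0.
m = nn.Flatten() merges from dim 1 by default (start_dim=1, end_dim=-1), which keeps the batch dimension.
The size of the merged dimension is the product of the sizes of the dimensions being merged.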
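A minimal sketch contrasting the module nn.Flatten with the function torch.flatten on the same [5, 4, 3, 5] shape used in the examples; the start_dim=1, end_dim=2 call is an illustrative extra.

```python
import torch
import torch.nn as nn

x = torch.rand(5, 4, 3, 5)

# nn.Flatten defaults to start_dim=1, end_dim=-1: the batch dim (5) is kept
a = nn.Flatten()(x)
print(a.shape)   # torch.Size([5, 60])

# torch.flatten defaults to start_dim=0: everything collapses to 1-D
b = torch.flatten(x)
print(b.shape)   # torch.Size([300])

# Only dims 1..2 are merged: 4 * 3 = 12
c = nn.Flatten(start_dim=1, end_dim=2)(x)
print(c.shape)   # torch.Size([5, 12, 5])
```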

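import torch

x = torch.rand(5, 4, 3, 5)
print(x.size())   # torch.Size([5, 4, 3, 5])
x = x.flatten(0)  # merge every dimension, starting from dim 0
print(x.size())
"""
Output: 5x4x3x5 = 300
torch.Size([300])
"""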

x = torch.rand(5, 4, 3, 5)
print(x.size())
x = x.flatten(1)  # keep dim 0, merge dims 1..3
print(x.size())
"""
Output: 5, 4x3x5 = 60
torch.Size([5, 60])
"""

x = torch.rand(5, 4, 3, 5)
print(x.size())
x = x.flatten(2)  # keep dims 0 and 1, merge dims 2..3
print(x.size())
"""
Output: 5, 4, 3x5 = 15
torch.Size([5,