from Torch-cam:
# Perform the weighted combination to get the CAM
cam = torch.nansum(weight * activation, dim=1)
where activation : torch.Size([1, 2048, 7, 7]) and weight : torch.Size([1, 2048, 1, 1]).
from ChatGPT:
This is the sum operation along the specified dimension (dim=1). The dim=1 indicates that the sum is taken along the second dimension (0-indexed) of the tensor, which is the channel dimension. This means that, for each spatial position, the values of all channels are added together, resulting in a tensor with the channel axis removed.
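So the sum runs over the channel dimension: at each spatial position, the values from all 2048 channels are added together. The channel dimension disappears and the remaining dimensions are unchanged, so the result is torch.Size([1, 7, 7]).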
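A quick way to check the shape reasoning above is to run the same reduction on random tensors. This is only a sketch: the tensors are random stand-ins with the shapes from the note, not real activations or weights from Torch-cam, and the position (3, 4) is an arbitrary pick.

```python
import torch

# Random stand-ins with the shapes from the note (values are illustrative only)
activation = torch.randn(1, 2048, 7, 7)
weight = torch.randn(1, 2048, 1, 1)

# Sum over dim=1 (channels); NaNs, if any, are treated as zero by nansum
cam = torch.nansum(weight * activation, dim=1)
cam_shape = tuple(cam.shape)
print(cam_shape)

# One output value is the weighted sum over all 2048 channels at that position
manual = (weight[0, :, 0, 0] * activation[0, :, 3, 4]).sum()
matches = torch.allclose(cam[0, 3, 4], manual, atol=1e-3)

# keepdim=True keeps the reduced axis as a size-1 dimension instead of dropping it
cam_keep_shape = tuple(torch.nansum(weight * activation, dim=1, keepdim=True).shape)
print(cam_keep_shape)
```

If the channel axis should stay in place (for example to upsample the map later), keepdim=True is one option.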
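Another question: why can weight, whose last two dimensions are [1, 1], be multiplied with activation, whose last two dimensions are [7, 7]?
Because torch has a broadcasting mechanism: dimensions of size 1 are virtually stretched to match the other tensor. Note that this is element-wise multiplication, not matrix multiplication.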
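The broadcasting behaviour can be seen in a small experiment. Again a sketch with random stand-in tensors: it compares the broadcast product with an explicitly expanded weight, and shows that torch.matmul rejects these shapes, which confirms that the two operations are different.

```python
import torch

# Random stand-ins with the shapes from the note (values are illustrative only)
activation = torch.randn(1, 2048, 7, 7)
weight = torch.randn(1, 2048, 1, 1)

# Broadcasting: the size-1 dims of weight are virtually stretched to 7 x 7
product = weight * activation
product_shape = tuple(product.shape)

# Same result when the stretch is written out explicitly with expand
expanded = weight.expand(1, 2048, 7, 7)
same = torch.allclose(product, expanded * activation)

# Matrix multiplication is a different operation: (1 x 1) @ (7 x 7) is invalid
try:
    torch.matmul(weight, activation)
    matmul_failed = False
except RuntimeError:
    matmul_failed = True

print(product_shape, same, matmul_failed)
```

Each of the 2048 channels of activation is scaled by its own scalar weight, which is the behaviour the CAM formula needs.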