What does a CNN actually learn?

蜉蝣之翼❉

Article link: https://blog.csdn.net/fuyouzhiyi/article/details/94970354

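Visualization code

torchvision.models
PyTorch - torchvision.models model structure definitions
PyTorch implementation of the AlexNet network
hook
PyTorch study notes (1): inspecting a model's intermediate results
PyTorch: getting layer weights, injecting hooks into specific layers, and extracting intermediate-layer outputs
Deep learning for beginners: visualizing convolutional neural networks (1)

PyTorch: getting intermediate-layer parameters and outputs, and visualizing them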
https://www.zhihu.com/question/68384370/answer/419741762

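Hooks are the recommended approach: they extract the features or gradients you need without changing the network's forward function. Register them on the relevant modules, and the features or gradients are captured whenever the model is called.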

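import torch.nn as nn

inter_feature = {}
inter_gradient = {}

def make_hook(name, flag):
    if flag == 'forward':
        # forward hook signature: (module, input, output); keep the layer's output feature map
        def hook(m, input, output):
            inter_feature[name] = output.detach()
        return hook
    elif flag == 'backward':
        # full backward hook signature: (module, grad_input, grad_output); both are tuples
        def hook(m, grad_input, grad_output):
            inter_gradient[name] = grad_output[0].detach()
        return hook
    else:
        raise ValueError("flag must be 'forward' or 'backward'")

# `model` is the network being inspected; hook every Conv2d in it
handles = []
for name, m in model.named_modules():
    if isinstance(m, nn.Conv2d):
        handles.append(m.register_forward_hook(make_hook(name, 'forward')))
        # register_backward_hook is deprecated; register_full_backward_hook can raise an error
        # if the module's output is later modified in place (e.g. by ReLU(inplace=True))
        handles.append(m.register_full_backward_hook(make_hook(name, 'backward')))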

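The hooks fire during the forward and backward passes, and the intermediate values end up in inter_feature and inter_gradient:

output = model(input)             # collects intermediate features
loss = criterion(output, target)
loss.backward()                   # collects intermediate gradients

Finally, remove the hooks if they are no longer needed by calling remove() on the handle returned by each register_*_hook call, e.g. handle.remove().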
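A minimal, self-contained sketch of the same forward/backward hook pattern. It assumes PyTorch 1.8 or later (for `register_full_backward_hook`) and uses a small hypothetical network, `TinyNet`, made up purely for illustration; its ReLU is deliberately not in-place so the full backward hook is safe.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical toy network used only to demonstrate hooks."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.relu = nn.ReLU()  # not in-place, so full backward hooks are safe
        self.fc = nn.Linear(4 * 8 * 8, 2)

    def forward(self, x):
        x = self.relu(self.conv(x))
        return self.fc(x.flatten(1))

features, grads = {}, {}

def save_output(name):
    # forward hook: (module, inputs, output)
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

def save_grad_output(name):
    # full backward hook: (module, grad_input, grad_output); grad_output is a tuple
    def hook(module, grad_input, grad_output):
        grads[name] = grad_output[0].detach()
    return hook

model = TinyNet()
handles = [
    model.conv.register_forward_hook(save_output("conv")),
    model.conv.register_full_backward_hook(save_grad_output("conv")),
]

# requires_grad=True on the input makes the backward hook fire on every PyTorch version
x = torch.randn(1, 3, 8, 8, requires_grad=True)
target = torch.tensor([1])
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()

print(features["conv"].shape)  # torch.Size([1, 4, 8, 8])
print(grads["conv"].shape)     # torch.Size([1, 4, 8, 8])

for h in handles:
    h.remove()  # detach the hooks once they are no longer needed
```

The gradient stored here is the gradient of the loss with respect to the convolution's output, which has the same shape as the feature map itself.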

Various "special" Tensor operations in PyTorch
Using permute in PyTorch
Converting between Tensors and various image formats in PyTorch
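A short sketch of the conversion the code in this post relies on: PIL/NumPy images are height × width × channels (HWC) uint8 arrays, while PyTorch convolutions expect batch × channels × height × width (NCHW) floats. The random array is a stand-in for a real image.

```python
import numpy as np
import torch

# Stand-in for np.array(Image.open(...).convert('RGB')): shape (H, W, C), dtype uint8
hwc = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)

# HWC uint8 -> NCHW float32 in [0, 1]
nchw = torch.from_numpy(hwc).permute(2, 0, 1).unsqueeze(0).float() / 255.0
print(nchw.shape)  # torch.Size([1, 3, 224, 224])

# NCHW float -> HWC uint8 (e.g. for plt.imshow or Image.fromarray)
back = (nchw[0].permute(1, 2, 0) * 255.0).round().to(torch.uint8).numpy()
print(np.array_equal(back, hwc))  # True
```

`permute` only reorders dimensions (it returns a view); the `/ 255.0` scaling is a separate, optional step.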

Translation of section 5.4

Visualizing what convnets learn
It's often said that deep-learning models are "black boxes": the representations they learn are difficult to extract and to present in a human-readable form. Convolutional layers, however, can be visualized, because what they learn are representations of visual concepts. Three techniques for doing so:
1. Visualizing intermediate convnet outputs (intermediate activations): useful for understanding how successive convnet layers transform their input, and for getting a first idea of the meaning of individual convnet filters.
2. Visualizing convnet filters: useful for understanding precisely what visual pattern or concept each filter in a convnet is receptive to.
3. Visualizing heatmaps of class activation in an image: useful for understanding which parts of an image were identified as belonging to a given class, thus allowing you to localize objects in images.
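Technique 3 can be built on the same hook mechanism described earlier. The sketch below uses Grad-CAM, one common way to compute class-activation heatmaps; it is our choice of method, not something this post implements. It assumes a single-image batch and a `layer` whose output is a 4-D feature map; `grad_cam` is a hypothetical helper name.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, layer, x, class_idx):
    """Grad-CAM sketch: weight `layer`'s feature maps by the spatially averaged
    gradient of the class score, sum over channels, ReLU, upsample to x's size."""
    store = {}

    def fwd_hook(module, inputs, output):
        store["act"] = output

        def save_grad(g):
            store["grad"] = g  # d(class score) / d(layer output)
        output.register_hook(save_grad)

    handle = layer.register_forward_hook(fwd_hook)
    try:
        model.eval()                      # note: switches the model to eval mode
        scores = model(x)                 # (1, num_classes)
        model.zero_grad()
        scores[0, class_idx].backward()
    finally:
        handle.remove()

    act, grad = store["act"].detach(), store["grad"].detach()  # (1, C, h, w)
    weights = grad.mean(dim=(2, 3), keepdim=True)               # (1, C, 1, 1)
    cam = F.relu((weights * act).sum(dim=1, keepdim=True))      # (1, 1, h, w)
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam - cam.min()
    return (cam / (cam.max() + 1e-8))[0, 0]                     # (H, W), values in [0, 1]
```

For the AlexNet in this post, one plausible choice of `layer` is `model.features[11]`, the ReLU after the last convolution. Hooking the ReLU rather than the preceding Conv2d avoids attaching a gradient hook to a tensor that the in-place ReLU then overwrites. The resulting map can be overlaid on the input image with `plt.imshow(..., alpha=...)`.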

Visualizing intermediate activations

Given a specific input image, display the feature maps output by the convolution and pooling layers.

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))  # (1, 64, 55, 55)
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)  # (1, 64, 27, 27)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))  #(1, 192, 27, 27)
    (4): ReLU(inplace)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)  #(1, 192, 13, 13)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) #(1, 384, 13, 13)
    (7): ReLU(inplace)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))    #(1, 256, 13, 13)
    (9): ReLU(inplace)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) #(1, 256, 13, 13)
    (11): ReLU(inplace)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)  # (1, 256, 6, 6)
  )
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace)
    (3): Dropout(p=0.5)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
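The spatial sizes for a 224×224 input follow the standard output-size formula for Conv2d and MaxPool2d (dilation 1, ceil_mode=False): out = floor((in + 2·padding − kernel) / stride) + 1. A small sketch that recomputes them from the (kernel, stride, padding) values in the printout above, including the 6×6 output of layer (12) that feeds the 9216-input Linear layer:

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of Conv2d/MaxPool2d (dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of each spatial layer in AlexNet's `features`, as printed above
layers = [
    ("0 Conv2d",     11, 4, 2),
    ("2 MaxPool2d",   3, 2, 0),
    ("3 Conv2d",      5, 1, 2),
    ("5 MaxPool2d",   3, 2, 0),
    ("6 Conv2d",      3, 1, 1),
    ("8 Conv2d",      3, 1, 1),
    ("10 Conv2d",     3, 1, 1),
    ("12 MaxPool2d",  3, 2, 0),
]

size = 224
sizes = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes.append(size)
    print(f"({name}): {size}x{size}")

print(256 * size * size)  # 9216, the classifier's in_features
```

ReLU layers are omitted because they do not change the shape.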

Code

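# -*- coding: utf-8 -*-
from PIL import Image
import torch
from torchvision import models
import numpy as np
import matplotlib.pyplot as plt

# Load an image
img = Image.open("cat.png").convert('RGB')
img = img.resize((224, 224))

# Convert the image to the network's input format: float, ImageNet-normalized, (1, 3, 224, 224)
img = np.array(img, dtype=np.float32) / 255.0
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet channel means
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)   # ImageNet channel stds
img = (img - mean) / std
img = np.expand_dims(img, axis=0)   # (1, 224, 224, 3)
img = torch.from_numpy(img)
img = img.permute(0, 3, 1, 2)       # (1, 3, 224, 224)

# Load the pretrained network. The original post imported alexnet from a local module
# (alexnetvisualize); torchvision's AlexNet has the same `features` layout.
# Requires torchvision >= 0.13; older versions use models.alexnet(pretrained=True)
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.eval()
# print(model)

# Register a forward hook on layer (12), the last MaxPool2d
activation = None

def hook(module, inputdata, output):
    global activation
    activation = output.detach()

handle = model.features[12].register_forward_hook(hook)
with torch.no_grad():
    y = model(img)
print(activation.shape)  # torch.Size([1, 256, 6, 6])

size = activation.shape[-1]     # height/width of each feature map
number = activation.shape[1]    # number of channels
hang = 8                        # rows in the grid
lie = number // hang            # columns in the grid (integer division)
display_grid = np.zeros((size * hang, size * lie))

for i in range(hang):
    for j in range(lie):
        channel = i * lie + j   # row-major order: each channel appears exactly once
        display_grid[i*size:(i+1)*size, j*size:(j+1)*size] = activation[0, channel].numpy()
plt.figure(figsize=(display_grid.shape[1] / size, display_grid.shape[0] / size))
plt.title("(12): MaxPool2d")
plt.grid(False)
plt.imshow(display_grid, aspect='auto', cmap='viridis')
plt.show()

# Remove the hook once it is no longer needed
handle.remove()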
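Raw activations can differ in scale by orders of magnitude between channels, so a single colormap range may leave most tiles looking flat. One option, our assumption rather than the post's method, is to standardize each channel before tiling it. A sketch of a reusable tiling helper; `make_grid_image` is a hypothetical name, and the `* 64 + 128` scaling is purely a display choice.

```python
import numpy as np

def make_grid_image(maps, cols):
    """Tile feature maps of shape (C, H, W) into one (rows*H, cols*W) uint8 array.
    Each channel is standardized so weak channels stay visible; a blank (constant)
    channel becomes a uniform mid-gray tile. Unused grid cells stay 0."""
    c, h, w = maps.shape
    rows = -(-c // cols)                          # ceiling division
    grid = np.zeros((rows * h, cols * w), dtype=np.uint8)
    for idx in range(c):
        m = maps[idx].astype(np.float64)
        m = (m - m.mean()) / (m.std() + 1e-5)     # zero mean, unit variance
        m = np.clip(m * 64 + 128, 0, 255)         # spread values around mid-gray
        r, col = divmod(idx, cols)                # row-major: channel idx -> (r, col)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = m.astype(np.uint8)
    return grid
```

To use it with the hooked output, pass the maps for one image as a NumPy array, e.g. `make_grid_image(output[0].numpy(), cols=32)`, and show the result with `plt.imshow(..., cmap='viridis')`. Unlike the fixed 8-row layout above, it also handles channel counts that are not a multiple of the column count.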

The first layer acts as a collection of various edge detectors. At that stage, the activations retain almost all of the information present in the initial picture.

As you go higher, the activations become increasingly abstract and less visually interpretable. They begin to encode higher-level concepts such as "cat ear" and "cat eye." Higher representations carry increasingly less information about the visual contents of the image, and increasingly more information related to the class of the image.

The sparsity of the activations increases with the depth of the layer: in the first layer, all filters are activated by the input image, but in the following layers more and more filters are blank. A blank filter means that the pattern it encodes isn't found in the input image.
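The sparsity observation can be checked numerically. A sketch, assuming "blank" means a channel whose post-ReLU feature map is all zeros for the given input: register a forward hook on every `nn.ReLU` and record the fraction of blank channels in each 4-D output. `blank_channel_fractions` is a hypothetical helper name.

```python
import torch
import torch.nn as nn

def blank_channel_fractions(model, x):
    """Return {module_name: fraction of all-zero channels} for every nn.ReLU
    whose output is a 4-D (N, C, H, W) feature map."""
    fractions, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            if output.dim() == 4:
                # post-ReLU values are >= 0, so a channel is blank iff its max is 0
                per_channel_max = output.detach().amax(dim=(0, 2, 3))
                fractions[name] = (per_channel_max == 0).float().mean().item()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.ReLU):
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        model.eval()
        with torch.no_grad():
            model(x)
    finally:
        for h in handles:
            h.remove()
    return fractions
```

With torchvision's AlexNet, the keys would be `features.1`, `features.4`, `features.7`, `features.9` and `features.11`; the classifier's ReLUs produce 2-D outputs and are skipped. If the text's claim holds for a given image, the fractions should tend to grow with depth; no particular numbers are implied here.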
We have just seen evidence of an important, universal characteristic of the representations learned by deep neural networks: the features extracted by a layer become increasingly abstract with the depth of the layer. The activations of higher layers carry less and less information about the specific input being seen, and more and more information about the target (in this case, the class of the image). A deep neural network effectively acts as an information distillation pipeline: raw data goes in (in this case, RGB pictures) and is repeatedly transformed so that irrelevant information is filtered out (for example, the specific visual appearance of the image) while useful information is magnified and refined (for example, the class of the image).

This is analogous to the way humans and animals perceive the world: after observing a scene for a few seconds, a human can remember which abstract objects were present in it (bicycle, tree) but can't remember the specific appearance of these objects. In fact, if you tried to draw a generic bicycle from memory, chances are you couldn't get it even remotely right, even though you've seen thousands of bicycles in your lifetime (see, for example, figure 5.28). Try it right now: this effect is absolutely real. Your brain has learned to completely abstract its visual input, transforming it into high-level visual concepts while filtering out irrelevant visual details, which makes it tremendously difficult to remember how things around you look.