理论部分:CS231n 笔记 神经网络可视化(上)_iwill323的博客-CSDN博客_卷积神经网络可视化
CS231n 2022PPT笔记- 神经网络可视化(下)神经风格迁移_iwill323的博客-CSDN博客
目录
Loading ImageNet Validation Images
Total-variation regularization
We will start from a CNN model which has been pretrained to perform image classification on the ImageNet dataset. We will use this model to define a loss function which quantifies our current unhappiness with our image. Then we will use backpropagation to compute the gradient of this loss with respect to the pixels of the image. We will then keep the model fixed and perform gradient descent on the image to synthesize a new image which minimizes the loss.
We will explore three techniques for image generation.
准备工作
导包
# Setup cell.
import torch
import torchvision
import torch.nn as nn
import numpy as np
import random
import matplotlib.pyplot as plt
from PIL import Image
import torchvision.transforms as transforms
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # Set default size of plots.
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
SQUEEZENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
SQUEEZENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)
device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")
%load_ext autoreload
%autoreload 2
Pretrained Model
For the purposes of this assignment we will use SqueezeNet [1], which achieves accuracies comparable to AlexNet but with a significantly reduced parameter count and computational complexity. Using SqueezeNet rather than AlexNet or VGG or ResNet means that we can easily perform all image generation experiments on CPU.
[1] Iandola et al, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and < 0.5MB model size", arXiv 2016
# Download and load the pretrained SqueezeNet model.
model = torchvision.models.squeezenet1_1(pretrained=True)
# We don't want to train the model, so tell PyTorch not to compute gradients
# with respect to model parameters.
for param in model.parameters():
param.requires_grad = False
Loading ImageNet Validation Images
# http://cs231n.stanford.edu/imagenet_val_25.npz
from cs231n.data_utils import load_imagenet_val
X, y, class_names = load_imagenet_val(num=5)
plt.figure(figsize=(12, 6))
for i in range(5):
plt.subplot(1, 5, i + 1)
plt.imshow(X[i])
plt.title(class_names[y[i]])
plt.axis('off')
plt.gcf().tight_layout()
用到的函数 load_imagenet_val
from __future__ import print_function
from builtins import range
from six.moves import cPickle as pickle
import numpy as np
import os
from imageio import imread
import platform
def load_imagenet_val(num=None):
"""Load a handful of validation images from ImageNet.
Inputs:
- num: Number of images to load (max of 25)
Returns:
- X: numpy array with shape [num, 224, 224, 3]
- y: numpy array of integer image labels, shape [num]
- class_names: dict mapping integer label to class name
"""
imagenet_fn = os.path.join(
os.path.dirname(__file__), "datasets/imagenet_val_25.npz"
)
if not os.path.isfile(imagenet_fn):
print("file %s not found" % imagenet_fn)
print("Run the following:")
print("cd cs231n/datasets")
print("bash get_imagenet_val.sh")
assert False, "Need to download imagenet_val_25.npz"
# modify the default parameters of np.load
# https://stackoverflow.com/questions/55890813/how-to-fix-object-arrays-cannot-be-loaded-when-allow-pickle-false-for-imdb-loa
np_load_old = np.load
np.load = lambda *a,**k: np_load_old(*a, allow_pickle=True, **k)
f = np.load(imagenet_fn)
np.load = np_load_old
X = f["X"]
y = f["y"]
class_names = f["label_map"].item()
if num is not None:
X = X[:num]
y = y[:num]
return X, y, class_names
确定一下X,y的性质
>>print(X.shape) # 5张图片
(5, 224, 224, 3)
>>print(y)
[958 85 244 182 294]
>>for x in X:
print(x.shape)
(224, 224, 3) (224, 224, 3) (224, 224, 3) (224, 224, 3) (224, 224, 3)
图像处理函数
def preprocess(img, size=224):
transform = transforms.Compose([
transforms.Resize(size),
transforms.ToTensor(),
transforms.Normalize(mean=SQUEEZENET_MEAN.tolist(),
std=SQUEEZENET_STD.tolist()),
transforms.Lambda(lambda x: x[None]),
# Add a batch dimension in the first position of the tensor:
# aka, a tensor of shape (H, W, C) will become -> (1, H, W, C).
])
# return transforms(img).unsqueeze(0) 也可以使用这种方式增加第一维
return transform(img)
def deprocess(img, should_rescale=True):
"""
De-processes a Pytorch tensor from the output of the CNN model
to become a PIL JPG Image
"""
transform = transforms.Compose([
transforms.Lambda(lambda x: x[0]), # Remove the batch dimension at the first position. A tensor of dims (1, H, W, C) will become -> (H, W, C).
transforms.Normalize(mean=[0, 0, 0], std=(1.0 / SQUEEZENET_STD).tolist()), # Normalize the standard deviation
transforms.Normalize(mean=(-SQUEEZENET_MEAN).tolist(), std=[1, 1, 1]), # Normalize the mean
# Rescale all the values in the tensor so that they lie in the interval [0, 1] to prepare for transforming it into image pixel values.
transforms.Lambda(rescale) if should_rescale else transforms.Lambda(lambda x: x),
transforms.ToPILImage(),
])
return transform(img)
def rescale(x):
""" A function used internally inside `deprocess`.
Rescale elements of x linearly to be in the interval [0, 1]
with the minimum element(s) mapped to 0, and the maximum element(s)
mapped to 1.
"""
low, high = x.min(), x.max()
x_rescaled = (x - low) / (high - low)
return x_rescaled
也可以使用以下deprocess函数处理。输入img是一个pytorch的tensor(包含batch维)对象,而方差和均值数据是numpy数组,所以在计算前要改变img的通道位置,计算完后要将通道变回去
def deprocess(img):
img = img[0]
img = torch.clamp(img.permute(1, 2, 0) * SQUEEZENET_STD + SQUEEZENET_MEAN, 0, 1)
return torchvision.transforms.ToPILImage()(img.permute(2, 0, 1))
Saliency Maps
We can use saliency maps to tell which part of the image influenced the classification decision made by the network.
A saliency map tells us the degree to which each pixel in the image affects the classification score for that image. To compute it, we compute the gradient of the unnormalized score corresponding to the correct class (which is a scalar) with respect to the pixels of the image. If the image has shape (3, H, W)
then this gradient will also have shape (3, H, W)
; for each pixel in the image, this gradient tells us the amount by which the classification score will change if the pixel changes by a small amount. To compute the saliency map, we take the absolute value of this gradient, then take the maximum value over the 3 input channels; the final saliency map thus has shape (H, W)
and all entries are nonnegative.
import torch
import random
import torchvision.transforms as transforms
import numpy as np
import torch.nn as nn
from scipy.ndimage.filters import gaussian_filter1d
def compute_saliency_maps(X, y, model):
"""
Compute a class saliency map using the model for images X and labels y.
Input:
- X: Input images; Tensor of shape (N, 3, H, W)
- y: Labels for X; LongTensor of shape (N,)
- model: A pretrained CNN that will be used to compute the saliency map.
Returns:
- saliency