In deep learning engineering, we often need to read an image with OpenCV and then feed it into a deep learning model.
import cv2
img = cv2.imread(filepath, 1)  # flag 1 == cv2.IMREAD_COLOR: load as a 3-channel color image
Here `img` is a NumPy `ndarray`, and OpenCV reads it in BGR channel order. To pass it into a model, we need to convert it to RGB and turn it into a tensor:
from torchvision import transforms
tfms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BGR --> RGB
img = tfms(img)  # convert to tensor, then normalize
The official description of `torchvision.transforms.ToTensor()` is:
Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0]
After `ToTensor()`, `img` goes from (H x W x C) to (C x H x W), and each pixel value is rescaled from [0, 255] to [0.0, 1.0].
The official description of `torchvision.transforms.Normalize(mean, std[, inplace])` is:
Given mean: (mean[1],...,mean[n]) and std: (std[1],..,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]. This transform does not support PIL Image.
`transforms.Normalize` only accepts tensors, so it is usually chained after `ToTensor()`. Because RGB data has three channels, the code above sets mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].
This mean and std were computed on the ImageNet dataset. For typical natural-image data (people, buildings, animals, varied lighting/angles/backgrounds, etc.) they work well; alternatively, you can compute a new mean and std from your own dataset.
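Computing your own statistics can be sketched as follows; here a random tensor stands in for a stack of N training images already converted by `ToTensor()` (in practice you would accumulate these over a DataLoader):

```python
import torch

# Stand-in for N un-normalized images of shape (N, 3, H, W), values in [0, 1]
imgs = torch.rand(8, 3, 16, 16)

# Per-channel mean and std over all pixels of all images
mean = imgs.mean(dim=(0, 2, 3))
std = imgs.std(dim=(0, 2, 3))
print(mean.shape, std.shape)  # torch.Size([3]) torch.Size([3])
```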
After `img` has been processed by the model, if we want to save the generated image we need to undo the normalization.
From the formula output[channel] = (input[channel] - mean[channel]) / std[channel],
the inverse is input[channel] = output[channel] * std[channel] + mean[channel]:
x = model(img).detach()  # detach so the tensor can be modified in place and converted to NumPy; assumes (3, H, W) output
x[0] = x[0] * std[0] + mean[0]
x[1] = x[1] * std[1] + mean[1]
x[2] = x[2] * std[2] + mean[2]
img = x.clamp(0, 1).mul(255).byte()  # clamp to [0.0, 1.0], then scale to [0, 255]
img = img.numpy().transpose((1, 2, 0))  # C x H x W --> H x W x C
img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)  # RGB --> BGR for OpenCV
cv2.imwrite('./test.png', img)  # save
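Instead of handling each channel separately, the same inverse can be written as a single broadcasted expression; a minimal sketch, using a random tensor in place of the model output, and checking that denormalizing after normalizing recovers the original:

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Stand-in for a (3, H, W) image tensor
x = torch.rand(3, 4, 4)
x_norm = (x - mean) / std        # forward normalization, all channels at once
x_back = x_norm * std + mean     # inverse normalization via broadcasting

print(torch.allclose(x_back, x, atol=1e-6))  # True
```

The `.view(3, 1, 1)` reshape lets the three per-channel values broadcast across the H and W dimensions.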