Image preprocessing: where mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225] come from and how to use them

AI_潜行者

Article link: https://blog.csdn.net/weixin_40011280/article/details/120908702

Why do some deep learning image preprocessing pipelines normalize with mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225]?

Using the mean and std of ImageNet is a common practice. They are calculated based on millions of images. If you want to train from scratch on your own dataset, you can calculate the new mean and std. Otherwise, using the ImageNet pretrained model with its own mean and std is recommended.

In other words, these numbers are the per-channel (RGB) mean and standard deviation of ImageNet. They show up wherever an ImageNet-pretrained model is used, because the inputs should be normalized the same way they were during pretraining.
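For the train-from-scratch case, the per-channel statistics of your own dataset can be accumulated one image at a time. The following is only a minimal sketch: the function name `channel_mean_std` and the random stand-in images are assumptions for illustration, not part of the original post.

```python
import numpy as np

def channel_mean_std(images):
    """Per-channel mean and std over an iterable of HxWx3 uint8 RGB images."""
    pixel_sum = np.zeros(3, dtype=np.float64)
    pixel_sq_sum = np.zeros(3, dtype=np.float64)
    n_pixels = 0
    for img in images:
        x = img.astype(np.float64) / 255.0   # scale to [0, 1]
        x = x.reshape(-1, 3)                 # flatten to (num_pixels, 3)
        pixel_sum += x.sum(axis=0)
        pixel_sq_sum += (x ** 2).sum(axis=0)
        n_pixels += x.shape[0]
    mean = pixel_sum / n_pixels
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding
    var = np.maximum(pixel_sq_sum / n_pixels - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Hypothetical stand-in data: random images instead of a real dataset
rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8) for _ in range(4)]
mean, std = channel_mean_std(fake_images)
print(mean, std)
```

Accumulating sums instead of stacking all images keeps memory use constant, which matters once the dataset no longer fits in RAM.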

mean = [0.485, 0.456, 0.406]  # RGB, for pixel values scaled to [0, 1]
# mean = [123.680, 116.779, 103.939]  # RGB, for pixel values in the 0-255 range
std = [0.229, 0.224, 0.225]  # used for normalization
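The two scales describe the same operation. As a rough check, the sketch below normalizes a hypothetical random image both ways and compares the results. Note that 255 times the [0, 1] means gives 123.675, 116.28 and 103.53, so the commented 0-255 values quoted above are close to these products but not identical.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

rng = np.random.default_rng(0)
# Hypothetical HxWx3 RGB image with uint8 pixel values
img_u8 = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

# Option A: scale to [0, 1] first, then normalize
a = (img_u8 / 255.0 - mean) / std
# Option B: stay on the 0-255 scale and rescale the statistics instead
b = (img_u8 - mean * 255.0) / (std * 255.0)

print(np.allclose(a, b))
print(mean * 255.0)
```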

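Subtracting the mean from an image before feeding it to the network for training is a form of normalization.
Image statistics can be treated as roughly stationary, so subtracting the statistical mean of each dimension removes the component that all images share.
This makes the differences between individual images, and their features, stand out.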
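A small numeric illustration of this idea, assuming a hypothetical random batch in NCHW layout whose channels have different offsets: after subtracting the per-channel mean and dividing by the per-channel standard deviation, every channel is centered at roughly 0 with a spread of roughly 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch (N, C, H, W) with values in [0, 1]; each channel gets its own offset
offsets = np.array([0.1, 0.3, 0.5]).reshape(1, 3, 1, 1)
batch = rng.random((8, 3, 16, 16)) * 0.5 + offsets

# Statistics per channel, kept as (1, 3, 1, 1) so they broadcast against the batch
mean = batch.mean(axis=(0, 2, 3), keepdims=True)
std = batch.std(axis=(0, 2, 3), keepdims=True)

normalized = (batch - mean) / std

print(normalized.mean(axis=(0, 2, 3)))  # approximately 0 for each channel
print(normalized.std(axis=(0, 2, 3)))   # approximately 1 for each channel
```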

Usage:

import torch.nn as nn
import torch
import cv2
import numpy as np

# Convert a uint8 image (H x W x C) to a 3-D float torch tensor (C x H x W) in [0, 1]
def uint2tensor3(img):
    if img.ndim == 2:
        img = np.expand_dims(img, axis=2)  # add a channel axis: HxW -> HxWx1
    # make memory contiguous, reorder HWC -> CHW, scale to [0, 1]
    return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float().div(255.)

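# Convert a torch tensor back to a uint8 image; values outside [0, 1] are clipped
def tensor2uint(img):
    # clamp() rather than clamp_(): the in-place version would also modify the caller's tensor
    img = img.detach().squeeze().float().clamp(0, 1).cpu().numpy()
    if img.ndim == 3:
        img = np.transpose(img, (1, 2, 0))  # CHW -> HWC
    return np.uint8((img*255.0).round())  # round to the nearest integer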

def imsave(img, img_path):
    if img.ndim == 3:
        img = img[:, :, [2, 1, 0]]  # reorder channels RGB -> BGR, as cv2.imwrite expects
    cv2.imwrite(img_path, img)

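def readimage(path):
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # BGR, BGRA, or grayscale
    if img is None:
        raise FileNotFoundError(f"cannot read image: {path}")
    if img.ndim == 2:
        img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)  # replicate gray into 3 channels
    elif img.shape[2] == 4:
        img = cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)  # drop the alpha channel
    else:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return img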

def test():
    img=readimage('small_girl.jpg')
    print(img.shape)#(400, 600, 3)
    x = uint2tensor3(img)  # result: 3 channels, scaled to [0, 1], CHW layout
    print(x.shape)#torch.Size([3, 400, 600])
    x=torch.unsqueeze(x,dim=0)
    print(x.shape)  #torch.Size([1, 3, 400, 600])

    # per-channel mean
    mean = nn.Parameter(torch.Tensor([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1), requires_grad=False)
    # per-channel standard deviation
    std = nn.Parameter(torch.Tensor([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1), requires_grad=False)

    # repeat() tiles the (1, 3, 1, 1) tensor along the given axes up to x's shape;
    # broadcasting would give the same result with a plain `x - mean`
    # image - mean
    x = x - mean.repeat(x.size(0), 1, x.size(2), x.size(3))  # tiled mean: (1, 3, 400, 600)
    si = tensor2uint(x)  # clip to [0, 1], then scale back to 0-255
    imsave(si, 'mean2.png')  # only pixels above the mean stay non-zero

    # (image - mean) / std
    x = x / std.repeat(x.size(0), 1, x.size(2), x.size(3))
    si = tensor2uint(x)
    imsave(si, 'mean_std2.png')

test()
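The saved images above are clipped to [0, 1], so they only show part of the normalized signal. To view a normalized tensor as an ordinary image again, the usual approach is to invert the normalization first. Below is a minimal sketch using NumPy arrays in NCHW layout; the helper names `normalize` and `denormalize` and the random input are assumptions for illustration. The same arithmetic should carry over to torch tensors through broadcasting.

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1)
STD = np.array([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1)

def normalize(x):
    """x: float array in NCHW layout, RGB, values in [0, 1]."""
    return (x - MEAN) / STD

def denormalize(y):
    """Inverse of normalize: maps back to RGB values in [0, 1]."""
    return y * STD + MEAN

rng = np.random.default_rng(0)
x = rng.random((1, 3, 4, 6))   # hypothetical image batch
y = normalize(x)               # values may now fall outside [0, 1]
x_back = denormalize(y)

print(y.min(), y.max())
print(np.allclose(x, x_back))
```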
