Notes on Some Losses and Metrics, with Python Implementations

MORE_77

Published 2024-04-10 13:15:44

Tags: python, deep learning, machine learning

Article link: https://blog.csdn.net/qq_51219814/article/details/136167943

1 L1_loss

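Definition: take the absolute difference between corresponding elements and average over all elements. Also called the absolute-value loss (mean absolute error).
Formula:
$a=[x_1,x_2,x_3,\dots,x_n],\quad b=[y_1,y_2,y_3,\dots,y_n]$
$L1(a,b)=\frac{1}{n}\sum_{i=1}^n|x_i-y_i|$
Code implementation: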

import torch
import torch.nn as nn

# Loss function (the default reduction is 'mean')
loss = nn.L1Loss()

a = torch.FloatTensor([1, 2, 3])
b = torch.FloatTensor([1.1, 2.2, 2.9])
output = loss(a, b)
print(output)

c = torch.FloatTensor([[1, 3], [2, 3], [1, 1]])
print(f"c_shape:{c.shape}")
d = torch.FloatTensor([[2, 3.2], [2.1, 3.2], [1.1, 1.1]])
print(f"d_shape:{d.shape}")
out = loss(c, d)  # the sum is divided by 6 (all elements), not by 3 (rows)!
print(out)
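
To make the "divide by 6, not 3" remark concrete, here is a minimal plain-Python sketch that recomputes both examples by hand, without PyTorch. The helper name `l1_mean` is made up for this illustration; it simply follows the formula above, with $n$ taken as the total number of elements.

```python
def l1_mean(a, b):
    """Mean absolute difference over ALL elements of two equally shaped nested lists."""
    def flatten(x):
        # Walk nested lists/tuples and yield the scalar entries in order
        for item in x:
            if isinstance(item, (list, tuple)):
                yield from flatten(item)
            else:
                yield item

    flat_a, flat_b = list(flatten(a)), list(flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError("inputs must have the same number of elements")
    # n is the total element count (6 for a 3x2 input), not the number of rows
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)


vec = l1_mean([1, 2, 3], [1.1, 2.2, 2.9])                      # (0.1 + 0.2 + 0.1) / 3
mat = l1_mean([[1, 3], [2, 3], [1, 1]],
              [[2, 3.2], [2.1, 3.2], [1.1, 1.1]])              # 1.7 / 6
print(vec, mat)
```

Up to floating-point rounding, the two values should agree with what `nn.L1Loss()` prints for the tensors above.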

2 Lpips_loss

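A perceptual loss, designed to agree with how the human eye judges similarity.
Both images are fed into a network to extract features, the L2 distance between the features is computed, and the result is averaged.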

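import lpips
import torch
import torch.nn as nn


class LpipsLoss(nn.Module):

    def __init__(self):
        super(LpipsLoss, self).__init__()
        self.lpips_fn = lpips.LPIPS(net='vgg')  # load the pretrained model
        self.lpips_fn.net.requires_grad_(False)  # freeze the feature extractor

    def forward(self, gt, output):
        # Assumes gt and output are in [0, 1]; LPIPS expects inputs in [-1, 1],
        # so both images get the same 2x - 1 mapping
        output = output * 2. - 1
        gt = gt * 2. - 1
        lpips_loss = torch.mean(self.lpips_fn(output, gt))
        return lpips_loss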

3 PSNR

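Full name: Peak Signal-to-Noise Ratio

Definition:
Given a label image $I$ and a model output image $K$, both of size $n \times m$:

Compute the mean squared error $MSE=\frac{1}{m \cdot n}\sum_{i=0}^{n-1}\sum_{j=0}^{m-1}[I(i,j)-K(i,j)]^2$
$PSNR=10\cdot\log_{10}\left(\frac{MAX^2}{MSE}\right)$, where $MAX$ is the largest possible pixel value of the image (for example, $MAX = 255$ when the range is 0~255).

Code for computing PSNR on RGB images: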

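import math

import numpy as np


def calculate_psnr(img1, img2, normalized=False):
    # Inputs in [0, 255] are rescaled to [0, 1], so that MAX = 1
    if not normalized:
        img1 = img1.astype(np.float32) / 255.
        img2 = img2.astype(np.float32) / 255.
    # For RGB, the mean runs over all pixels and all channels
    MSE = np.mean((img1 - img2) ** 2)

    # psnr = 10 * log10(1^2 / MSE) = -10 * log10(MSE)
    if MSE == 0:
        psnr = 100  # identical images: PSNR would be infinite, capped at 100 here
    else:
        psnr = -10 * math.log10(MSE)
    return psnr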
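
As a quick sanity check on the formula, the sketch below computes PSNR for a tiny made-up 4-pixel example in two ways: directly on 0~255 values with $MAX = 255$, and on values rescaled to [0, 1] with $MAX = 1$ (which is what the function above does). The pixel values and the helper name `psnr` are invented for this illustration; the two results should coincide because the factor $255^2$ cancels between $MAX^2$ and $MSE$.

```python
import math


def psnr(pixels_a, pixels_b, max_value):
    """PSNR = 10 * log10(MAX^2 / MSE) for two flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b)) / len(pixels_a)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


label = [52, 55, 61, 59]   # hypothetical label pixels, range 0~255
pred = [50, 55, 60, 62]    # hypothetical model output pixels

psnr_255 = psnr(label, pred, max_value=255)
psnr_unit = psnr([p / 255 for p in label], [p / 255 for p in pred], max_value=1.0)
print(psnr_255, psnr_unit)
```

Here the squared errors are 4, 0, 1 and 9, so $MSE = 3.5$ on the 0~255 scale.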

4 SSIM

Full name: Structural Similarity

5 GAN_loss

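5.1 The softplus function

Formula: $softplus(x)=\log(1+e^x)$
Analysis:
1. When the input $x = 0$, the output is $\log 2$
2. As the input $x$ approaches negative infinity, the output approaches 0
3. As the input $x$ approaches positive infinity, the output approaches positive infinity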
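
The three properties above are easy to check numerically. The following is a small standard-library sketch, not the PyTorch implementation; it uses the rewritten form $\log(1+e^x)=\max(x,0)+\log(1+e^{-|x|})$ so that large inputs do not overflow.

```python
import math


def softplus(x):
    # Numerically stable form of log(1 + e^x):
    # max(x, 0) + log(1 + e^(-|x|)) never exponentiates a large positive number
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


at_zero = softplus(0.0)        # log 2
far_negative = softplus(-50.0)  # tiny positive number, close to 0
far_positive = softplus(50.0)   # close to x itself, grows without bound
print(at_zero, far_negative, far_positive)
```

For large positive $x$ the output is essentially $x$, which is why the function goes to positive infinity in case 3.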

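5.2 Generator loss

Analysis: the generator wants the "fake" images it produces to be judged real by the discriminator. That is, the discriminator's input is fake_img, and we want the discriminator's output to be positive.
Idea: the more positive the discriminator's output, the smaller the loss should be. Negating the output before passing it to softplus gives exactly that: a large positive output yields a small loss.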

import torch.nn.functional as F


# G_loss
def cal_adv_loss(fake_g_pred):
    # fake_g_pred: discriminator output on the generated images
    adv_loss = F.softplus(-fake_g_pred).mean()
    return adv_loss

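5.3 Discriminator loss

Note that the discriminator loss consists of two parts, because there are two inputs: real_img and fake_img.

Analysis: we want the discriminator to recognize real_img as real (positive output) and fake_img as fake (negative output).
Idea:
- The more positive real_img_pred is, the better it matches the expectation, so the loss should be smaller;
- Likewise, the more negative fake_img_pred is, the better it matches the expectation, so the loss should be smaller.
- Since softplus gets smaller as its input approaches negative infinity, real_img_pred takes a minus sign in the code, while fake_img_pred is passed in unchanged.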

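import torch.nn.functional as F


# D_loss
def cal_adv_d_loss(fake_d_pred, real_d_pred):
    real_loss = F.softplus(-real_d_pred).mean()  # small when real images score positive
    fake_loss = F.softplus(fake_d_pred).mean()   # small when fake images score negative
    D_loss = real_loss + fake_loss
    return D_loss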
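
To see the sign conventions of 5.2 and 5.3 at work, here is a plain-Python sketch that mirrors the two loss functions on a few made-up discriminator outputs. The function names `g_loss` and `d_loss` and all the numbers are invented for this illustration. It also checks the identity $softplus(-x) = -\log(\sigma(x))$, where $\sigma$ is the sigmoid, which shows that these are the usual log-likelihood GAN losses written in terms of raw discriminator outputs.

```python
import math


def softplus(x):
    # Stable log(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean(values):
    return sum(values) / len(values)


def g_loss(fake_pred):
    # Generator: wants D(fake) to be positive, so negate before softplus
    return mean([softplus(-p) for p in fake_pred])


def d_loss(fake_pred, real_pred):
    # Discriminator: wants D(real) positive and D(fake) negative
    return mean([softplus(-p) for p in real_pred]) + mean([softplus(p) for p in fake_pred])


# Discriminator that separates the two sets well vs. one that gets them backwards
good_d = d_loss(fake_pred=[-3.0, -2.0], real_pred=[2.0, 3.0])
bad_d = d_loss(fake_pred=[2.0, 3.0], real_pred=[-3.0, -2.0])

# Generator whose fakes are scored as real vs. one whose fakes are caught
fooled = g_loss([2.0, 3.0])
caught = g_loss([-3.0, -2.0])

identity_gap = abs(softplus(-1.5) - (-math.log(sigmoid(1.5))))
print(good_d, bad_d, fooled, caught, identity_gap)
```

The well-behaved discriminator gets the smaller `d_loss`, and the generator that fools the discriminator gets the smaller `g_loss`, matching the reasoning in the text.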

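6 FID metric

FID (Fréchet Inception Distance) measures the diversity and quality of generated images: the smaller the FID, the better the diversity and the quality.
Procedure:

Use an Inception network to extract features from the images of both datasets (the final layer, which outputs the image class, is removed), so each image yields a 2048-dimensional feature.
The feature vectors extracted from all real images follow one distribution, and the high-dimensional feature vectors of the GAN-generated images follow another. If the two distributions are the same, the generated images are highly realistic. How do we compare the two distributions? By computing the distance between the two multivariate distributions.
- The formula is: $FID(x,g)=\|\mu_x-\mu_g\|^2_2+Tr\left(\Sigma_x+\Sigma_g-2(\Sigma_x\Sigma_g)^{0.5}\right)$
- $Tr$ is the trace of a matrix, i.e. the sum of the elements on its diagonal
- $x$ and $g$ denote the real images and the generated images
- $\mu$ and $\Sigma$ denote the mean and the covariance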
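
Before the full script, a toy case can help build intuition for the formula. The sketch below assumes diagonal covariance matrices, which is a simplification made only for this illustration (real Inception feature covariances are not diagonal). In that case $(\Sigma_x\Sigma_g)^{0.5}$ is also diagonal with entries $\sqrt{\sigma^2_{x,i}\,\sigma^2_{g,i}}$, so the trace term reduces to a plain sum and no matrix square root is needed. The function name `fid_diagonal` and the numbers are hypothetical.

```python
import math


def fid_diagonal(mu_x, var_x, mu_g, var_g):
    """Fréchet distance between two Gaussians with DIAGONAL covariances.

    mu_*  : list of per-dimension means
    var_* : list of per-dimension variances (the diagonal of Sigma)
    """
    # ||mu_x - mu_g||^2
    mean_term = sum((mx - mg) ** 2 for mx, mg in zip(mu_x, mu_g))
    # Tr(Sigma_x + Sigma_g - 2 (Sigma_x Sigma_g)^0.5), evaluated entry by entry
    trace_term = sum(vx + vg - 2.0 * math.sqrt(vx * vg)
                     for vx, vg in zip(var_x, var_g))
    return mean_term + trace_term


same = fid_diagonal([0.0, 1.0], [1.0, 4.0], [0.0, 1.0], [1.0, 4.0])      # identical distributions
shifted = fid_diagonal([0.0, 1.0], [1.0, 4.0], [3.0, 1.0], [1.0, 4.0])   # only the mean differs
wider = fid_diagonal([0.0, 1.0], [1.0, 4.0], [0.0, 1.0], [4.0, 9.0])     # only the variances differ
print(same, shifted, wider)
```

Identical distributions give 0, a mean shift of 3 in one dimension contributes $3^2 = 9$, and changing the standard deviations from (1, 2) to (2, 3) contributes $(1-2)^2 + (2-3)^2 = 2$.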

Code:

import os
import pathlib
from argparse import ArgumentDefaultsHelpFormatter, ArgumentParser

import numpy as np
import torch
import torchvision.transforms as TF
from PIL import Image
from scipy import linalg
from torch.nn.functional import adaptive_avg_pool2d

try:
    from tqdm import tqdm
except ImportError:
    # If tqdm is not available, provide a mock version of it
    def tqdm(x):
        return x

from pytorch_fid.inception import InceptionV3

parser = ArgumentParser(formatter_class=ArgumentDefaultsHelpFormatter)
parser.add_argument('--batch-size', type=int, 
                    default=50,
                    help='Batch size to use')
parser.add_argument('--num-workers', type=int, 
                    default=8,
                    help='Number of processes to use for data loading')
parser.add_argument('--device', type=str, 
                    default="cuda:6",
                    help='Device to use. Like cuda, cuda:0 or cpu')

parser.add_argument('--dims', type=int, 
                    default=2048,
                    choices=list(InceptionV3.BLOCK_INDEX_BY_DIM),
                    help=('Dimensionality of Inception features to use. '
                          'By default, uses pool3 features'))
parser.add_argument('--save-stats', action='store_true',
                    help=('Generate an npz archive from a directory of samples. '
                          'The first path is used as input and the second as output.'))
# Paths to the ground-truth dataset and to the GAN-generated images, in that order
parser.add_argument('path', type=str, nargs='*', 
                    default=[
                    '/mnt/Datasets/CelebAMask-HQ/CelebAMask-HQ', #gt dataset
                    '/home/my_GAN_dataset/'],
                    help=('Paths to the generated images or to .npz statistic files'))

IMAGE_EXTENSIONS = {'bmp', 'jpg', 'jpeg', 'pgm', 'png', 'ppm',
                    'tif', 'tiff', 'webp'}



class ImagePathDataset(torch.utils.data.Dataset):
    def __init__(self, files, transforms=None):
        self.files = files
        self.transforms = transforms

    def __len__(self):
        return len(self.files)

    def __getitem__(self, i):
        path = self.files[i]
        img = Image.open(path).convert('RGB')
        if self.transforms is not None:
            img = self.transforms(img)
        return img


def get_activations(files, model, batch_size=50, dims=2048, device='cpu',
                    num_workers=1):
    """Calculates the activations of the pool_3 layer for all images.

    Params:
    -- files       : List of image files paths
    -- model       : Instance of inception model
    -- batch_size  : Batch size of images for the model to process at once.
                     Make sure that the number of samples is a multiple of
                     the batch size, otherwise some samples are ignored. This
                     behavior is retained to match the original FID score
                     implementation.
    -- dims        : Dimensionality of features returned by Inception
    -- device      : Device to run calculations
    -- num_workers : Number of parallel dataloader workers

    Returns:
    -- A numpy array of dimension (num images, dims) that contains the
       activations of the given tensor when feeding inception with the
       query tensor.
    """
    model.eval()

    if batch_size > len(files):
        print(('Warning: batch size is bigger than the data size. '
               'Setting batch size to data size'))
        batch_size = len(files)

    dataset = ImagePathDataset(files, transforms=TF.ToTensor())
    dataloader = torch.utils.data.DataLoader(dataset,
                                             batch_size=batch_size,
                                             shuffle=False,
                                             drop_last=False,
                                             num_workers=num_workers)

    pred_arr = np.empty((len(files), dims))

    start_idx = 0

    # Collect the features of every image in the dataset into pred_arr
    for batch in tqdm(dataloader):
        batch = batch.to(device)

        with torch.no_grad():
            pred = model(batch)[0]

        # If model output is not scalar, apply global spatial average pooling.
        # This happens if you choose a dimensionality not equal 2048.
        if pred.size(2) != 1 or pred.size(3) != 1:
            pred = adaptive_avg_pool2d(pred, output_size=(1, 1))

        pred = pred.squeeze(3).squeeze(2).cpu().numpy()

        pred_arr[start_idx:start_idx + pred.shape[0]] = pred

        start_idx = start_idx + pred.shape[0]

    return pred_arr


def calculate_frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):
    """Numpy implementation of the Frechet Distance.
    The Frechet distance between two multivariate Gaussians X_1 ~ N(mu_1, C_1)
    and X_2 ~ N(mu_2, C_2) is
            d^2 = ||mu_1 - mu_2||^2 + Tr(C_1 + C_2 - 2*sqrt(C_1*C_2)).

    Stable version by Dougal J. Sutherland.

    Params:
    -- mu1   : Numpy array containing the activations of a layer of the
               inception net (like returned by the function 'get_predictions')
               for generated samples.
    -- mu2   : The sample mean over activations, precalculated on an
               representative data set.
    -- sigma1: The covariance matrix over activations for generated samples.
    -- sigma2: The covariance matrix over activations, precalculated on an
               representative data set.

    Returns:
    --   : The Frechet Distance.
    """
    # Make sure the means are at least 1-D arrays
    mu1 = np.atleast_1d(mu1)
    mu2 = np.atleast_1d(mu2)

    # Make sure the covariances are at least 2-D arrays
    sigma1 = np.atleast_2d(sigma1)
    sigma2 = np.atleast_2d(sigma2)

    assert mu1.shape == mu2.shape, \
        'Training and test mean vectors have different lengths'
    assert sigma1.shape == sigma2.shape, \
        'Training and test covariances have different dimensions'

    diff = mu1 - mu2

    # Product might be almost singular
    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)  # matrix square root of the product

    # Check that every entry is finite
    if not np.isfinite(covmean).all():
        msg = ('fid calculation produces singular product; '
               'adding %s to diagonal of cov estimates') % eps
        print(msg)
        offset = np.eye(sigma1.shape[0]) * eps  # eps times the identity matrix
        covmean = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset))

    # Numerical error might give slight imaginary component
    if np.iscomplexobj(covmean):  # is the result complex?
        # Take the imaginary parts on the diagonal and check that they are
        # close to 0 (tolerance 1e-3); raise an error if they are too large
        if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):
            m = np.max(np.abs(covmean.imag))
            raise ValueError('Imaginary component {}'.format(m))
        covmean = covmean.real  # keep the real part

    tr_covmean = np.trace(covmean)  # trace of the matrix

    return (diff.dot(diff) + np.trace(sigma1)
            + np.trace(sigma2) - 2 * tr_covmean)


def calculate_activation_statistics(files, model, batch_size=50, dims=2048,
                                    device='cpu', num_workers=1):
    """Calculation of the statistics used by the FID.
    Params:
    -- files       : List of image files paths
    -- model       : Instance of inception model
    -- batch_size  : The images numpy array is split into batches with
                     batch size batch_size. A reasonable batch size
                     depends on the hardware.
    -- dims        : Dimensionality of features returned by Inception
    -- device      : Device to run calculations
    -- num_workers : Number of parallel dataloader workers

    Returns:
    -- mu    : The mean over samples of the activations of the pool_3 layer of
               the inception model.
    -- sigma : The covariance matrix of the activations of the pool_3 layer of
               the inception model.
    """
    act = get_activations(files, model, batch_size, dims, device, num_workers)
    mu = np.mean(act, axis=0)
    sigma = np.cov(act, rowvar=False)
    return mu, sigma


def compute_statistics_of_path(path, model, batch_size, dims, device,
                               num_workers=1):
    # If the path ends with .npz, load the precomputed statistics
    if path.endswith('.npz'):
        with np.load(path) as f:
            m, s = f['mu'][:], f['sigma'][:]
    else:
        path = pathlib.Path(path)
        files = sorted([file  # sorted for a deterministic order
                        for ext in IMAGE_EXTENSIONS  # keep only image files
                        for file in path.glob('*.{}'.format(ext))])
        m, s = calculate_activation_statistics(files, model, batch_size,
                                               dims, device, num_workers)

    return m, s


def calculate_fid_given_paths(paths, batch_size, device, dims, num_workers=1):
    """Calculates the FID of two paths"""
    # check that both paths exist
    for p in paths:
        if not os.path.exists(p):
            raise RuntimeError('Invalid path: %s' % p)

    block_idx = InceptionV3.BLOCK_INDEX_BY_DIM[dims]

    model = InceptionV3([block_idx]).to(device)

    # get the mean and covariance of each dataset
    m1, s1 = compute_statistics_of_path(paths[0], model, batch_size,
                                        dims, device, num_workers)
    m2, s2 = compute_statistics_of_path(paths[1], model, batch_size,
                                        dims, device, num_workers)
    #compute
    fid_value = calculate_frechet_distance(m1, s1, m2, s2)

    return fid_value


def save_fid_stats(paths, batch_size, device, dims, num_workers=1):
    """Computes the statistics of paths[0] and saves them to paths[1] as an .npz archive"""
    if not os.path.exists(paths[0]):
        raise RuntimeError('Invalid path: %s' % paths[0])

    if os.path.exists(paths[1]):
        raise RuntimeError('Existing output file: %s' % paths[1])

    block_idx = InceptionV3.BLOCK_INDEX_BY_DIM[dims]

    model = InceptionV3([block_idx]).to(device)

    print(f"Saving statistics for {paths[0]}")

    m1, s1 = compute_statistics_of_path(paths[0], model, batch_size,
                                        dims, device, num_workers)

    np.savez_compressed(paths[1], mu=m1, sigma=s1)


def main():
    args = parser.parse_args()
    print(args.path)

    if args.device is None:
        device = torch.device('cuda' if (torch.cuda.is_available()) else 'cpu')
    else:
        device = torch.device(args.device)

    if args.num_workers is None:
        try:
            num_cpus = len(os.sched_getaffinity(0))#get the number of CPUs
        except AttributeError:
            # os.sched_getaffinity is not available under Windows, use
            # os.cpu_count instead (which may not return the *available* number
            # of CPUs).
            num_cpus = os.cpu_count()

        num_workers = min(num_cpus, 8) if num_cpus is not None else 0
    else:
        num_workers = args.num_workers

    if args.save_stats:
        save_fid_stats(args.path, args.batch_size, device, args.dims, num_workers)
        return
    fid_value = calculate_fid_given_paths(paths=args.path,
                                          batch_size=args.batch_size,
                                          device=device,
                                          dims=args.dims,
                                          num_workers=num_workers)
    print('FID: ', fid_value)


if __name__ == '__main__':
    main()
