Paper & Code Analysis | EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies



https://github.com/rximg/EfficientAD

Summary

  • Proposes an efficient network architecture, abbreviated PDN (Patch Description Network).
  • Proposes a hard feature loss that trains the teacher–student network efficiently.
  • Implements effective logical-anomaly detection based on an autoencoder.

Method(s)

Efficient Patch Descriptors

For feature extraction, a patch description network (PDN) is designed as the feature extractor.
[Figure: EfficientAD-S teacher network architecture. The student network has the same architecture, but with 768 kernels in the Conv-4 layer instead of 384. A padding value of 3 means that three rows or columns of zeros are appended to each border of the input feature map.]

  1. Each output neuron has a receptive field of 33×33 pixels, so each output feature vector describes a 33×33 patch (verified numerically in the sketch below).
  2. Whereas previous S–T methods run a deep pretrained network but use features from only a few of its layers, the PDN itself consists of only a few convolutional layers, which improves efficiency.
  3. A feature vector produced by the PDN depends only on the pixels of its own 33×33 patch, ensuring that an anomaly in one part of the image cannot trigger anomalous feature vectors in distant parts.
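As a quick numerical check of these points (a minimal sketch; the layer stack mirrors the repository's get_pdn_small shown later, with padding disabled), a single 33×33 input yields exactly one 384-dimensional descriptor:

import torch
import torch.nn as nn

# EfficientAD-S PDN topology without padding (mirrors get_pdn_small below)
pdn = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=4), nn.ReLU(inplace=True),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(128, 256, kernel_size=4), nn.ReLU(inplace=True),
    nn.AvgPool2d(kernel_size=2, stride=2),
    nn.Conv2d(256, 256, kernel_size=3), nn.ReLU(inplace=True),
    nn.Conv2d(256, 384, kernel_size=4),
)

with torch.no_grad():
    out = pdn(torch.randn(1, 3, 33, 33))
print(out.shape)  # torch.Size([1, 384, 1, 1]): one descriptor per 33x33 patch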

Lightweight Student–Teacher

To detect anomalous feature vectors, a student–teacher approach is used in which both the teacher and the student network are PDNs (inference takes less than a millisecond). Being this lightweight, the S–T pair gives up the techniques previous methods rely on to boost anomaly detection performance, such as ensembles of multiple teachers and students, features from a pyramid of layers, and an architectural asymmetry between the student and teacher networks.

In the standard S–T framework, enlarging the training set improves the student's ability to imitate the teacher even on anomalies, which hurts anomaly detection, while shrinking it may deprive the student of important information about normal images.

A training loss is therefore introduced that, similar to Online Hard Example Mining, restricts the student's loss to the most relevant parts of the image: the hard feature loss, which backpropagates only through the output elements with the largest loss.

The standard S–T loss is computed as:
$D_{c,w,h} = (T(I)_{c,w,h} - S(I)_{c,w,h})^2$, where $T$ denotes the teacher network, $S$ the student network, $I$ the input image, and $c$, $w$, $h$ index the channels, width, and height, respectively.

Given a mining factor $p_{\mathrm{hard}} \in [0,1]$, $d_{\mathrm{hard}}$ is computed as the $p_{\mathrm{hard}}$-quantile of the elements of $D$. Only the entries with $D_{c,w,h} \geq d_{\mathrm{hard}}$ contribute to $L_{\mathrm{hard}}$. Setting $p_{\mathrm{hard}}$ to 0 recovers the original S–T loss. In the experiments, $p_{\mathrm{hard}}$ is set to 0.999, which on average corresponds to using 10% of the values along each of the three dimensions of $D$ for backpropagation ($0.1^3 \approx 0.001$). This loss focuses on the main object region rather than the background.

During inference, the anomaly score map is computed as $M_{w,h} = C^{-1}\sum_{c} D_{c,w,h}$.

Thanks to this hard feature loss, false positives in background regions are suppressed.

In the standard S–T framework, the teacher is pretrained on ImageNet, whereas the student never sees the pretraining dataset and is trained only on the normal images. In each training step, a random image $P$ is additionally sampled from the pretraining dataset, in our case ImageNet, and the student's loss is computed as:
$L_{\mathrm{ST}} = L_{\mathrm{hard}} + (CWH)^{-1}\sum_{c}\|S(P)_{c}\|_{F}^{2}$
The penalty term discourages the student from generalizing its imitation of the teacher to images outside the normal training distribution.
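A minimal sketch of $L_{\mathrm{ST}}$ in PyTorch (the tensor arguments are illustrative; the actual training loop shown later computes them from the teacher, the student, and an ImageNet batch):

import torch

def st_loss(teacher_out, student_out, student_on_imagenet, p_hard=0.999):
    # D: per-element squared difference between (normalized) teacher and student
    distance = (teacher_out - student_out) ** 2
    # d_hard: the p_hard-quantile of D; p_hard = 0 recovers the plain S-T loss
    d_hard = torch.quantile(distance, q=p_hard)
    # hard feature loss: average only over the elements above the quantile
    loss_hard = torch.mean(distance[distance >= d_hard])
    # penalty term: push the student's response on ImageNet images toward zero
    loss_penalty = torch.mean(student_on_imagenet ** 2)
    return loss_hard + loss_penalty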

Logical Anomaly Detection

[Figure: overview of the EfficientAD anomaly detection pipeline.]

The figure above depicts EfficientAD's anomaly detection method. It consists of the teacher–student pair described above and an autoencoder. The autoencoder is trained to predict the teacher's output.

The loss for the autoencoder A:
$L_{\mathrm{AE}} = (CWH)^{-1}\sum_{c}\|T(I)_{c} - A(I)_{c}\|_{F}^{2}$
where $A(I)$ denotes the autoencoder's output.

Unlike the patch-based student, the autoencoder has to encode and decode the whole image through a bottleneck (a 64-dimensional latent code in the implementation below).
On images with logical anomalies, the autoencoder usually fails to produce the correct latent code.
On normal images its reconstructions are also imperfect, because autoencoders generally struggle to reconstruct fine-grained patterns.

Using the difference between the teacher's output and the autoencoder's reconstruction as the anomaly map would therefore cause false positives in these cases. Instead, the number of the student's output channels is doubled, and the student is trained to predict the autoencoder's output in addition to the teacher's output.

With $S'(I)$ denoting the student's additional output channels, the student's extra loss is:
$L_{\mathrm{STAE}} = (CWH)^{-1}\sum_{c}\|A(I)_{c} - S'(I)_{c}\|_{F}^{2}$
The total training loss is $L_{\mathrm{total}} = L_{\mathrm{ST}} + L_{\mathrm{AE}} + L_{\mathrm{STAE}}$.

The student learns the autoencoder's systematic reconstruction errors on normal images, e.g. its blurry reconstructions. At the same time, it does not learn the reconstruction errors on anomalies, since those are not part of the training set. This makes the difference between the autoencoder's output and the student's output well suited for computing an anomaly map: the squared difference between the two outputs, averaged across channels. We call this the global anomaly map, and the anomaly map generated by the student–teacher pair the local anomaly map. The two maps are averaged to obtain the combined anomaly map, whose maximum is used as the image-level anomaly score. The combined anomaly map thus contains the detections of both the student–teacher pair and the autoencoder–student pair. Sharing the student's hidden layers across both computations keeps the computational cost low while allowing both structural and logical anomalies to be detected.

Anomaly Map Normalization

The local and global anomaly maps must first be normalized to a similar scale before being averaged into the combined anomaly map; otherwise, noise in one map could drown out accurate detections in the other.

For each map type, two p-quantiles are computed on the validation set: $q_a$ at $p = a$ and $q_b$ at $p = b$. A linear transformation is then determined that maps $q_a$ to an anomaly score of 0 and $q_b$ to a score of 0.1. At test time, the local and global anomaly maps are normalized with their respective linear transformations.
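Written out, the transformation is $\hat{M} = 0.1\,(M - q_a)/(q_b - q_a)$, which indeed sends $M = q_a$ to 0 and $M = q_b$ to 0.1; this is exactly the expression used in the predict function later on.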

Implementation Details

Number of training images: 105
Number of validation images: 12

Training of EfficientAD-S
  • Require

A pretrained teacher network $T: \mathbb{R}^{3\times256\times256} \to \mathbb{R}^{384\times64\times64}$ with an architecture as given in Table 6

A sequence of training images $\mathcal{I}_{\mathrm{train}}$ with $I_{\mathrm{train}} \in \mathbb{R}^{3\times256\times256}$ for each $I_{\mathrm{train}} \in \mathcal{I}_{\mathrm{train}}$

Randomly initialize a student network $S: \mathbb{R}^{3\times256\times256} \to \mathbb{R}^{768\times64\times64}$ with an architecture as given in Table 6
Note: the code builds the networks without padding, so the feature maps are 56×56 rather than 64×64 (see the shape trace after the code below).

if config.model_size == 'small':
    teacher = get_pdn_small(out_channels)      # out_channels: 384
    student = get_pdn_small(2 * out_channels)  # the student predicts teacher + autoencoder outputs

autoencoder = get_autoencoder(out_channels)

#################################################################
def get_pdn_small(out_channels=384, padding=False):
    pad_mult = 1 if padding else 0
    return nn.Sequential(
        nn.Conv2d(in_channels=3, out_channels=128, kernel_size=4,
                  padding=3 * pad_mult),
        nn.ReLU(inplace=True),
        nn.AvgPool2d(kernel_size=2, stride=2, padding=1 * pad_mult),
        nn.Conv2d(in_channels=128, out_channels=256, kernel_size=4,
                  padding=3 * pad_mult),
        nn.ReLU(inplace=True),
        nn.AvgPool2d(kernel_size=2, stride=2, padding=1 * pad_mult),
        nn.Conv2d(in_channels=256, out_channels=256, kernel_size=3,
                  padding=1 * pad_mult),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=256, out_channels=out_channels, kernel_size=4)
    )

def get_autoencoder(out_channels=384):
    return nn.Sequential(
        # encoder
        nn.Conv2d(in_channels=3, out_channels=32, kernel_size=4, stride=2,
                  padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=32, out_channels=32, kernel_size=4, stride=2,
                  padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4, stride=2,
                  padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=2,
                  padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=2,
                  padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=8),
        # decoder
        nn.Upsample(size=3, mode='bilinear'),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=1,
                  padding=2),
        nn.ReLU(inplace=True),
        nn.Dropout(0.2),
        nn.Upsample(size=8, mode='bilinear'),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=1,
                  padding=2),
        nn.ReLU(inplace=True),
        nn.Dropout(0.2),
        nn.Upsample(size=15, mode='bilinear'),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=1,
                  padding=2),
        nn.ReLU(inplace=True),
        nn.Dropout(0.2),
        nn.Upsample(size=32, mode='bilinear'),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=1,
                  padding=2),
        nn.ReLU(inplace=True),
        nn.Dropout(0.2),
        nn.Upsample(size=63, mode='bilinear'),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=1,
                  padding=2),
        nn.ReLU(inplace=True),
        nn.Dropout(0.2),
        nn.Upsample(size=127, mode='bilinear'),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=4, stride=1,
                  padding=2),
        nn.ReLU(inplace=True),
        nn.Dropout(0.2),
        nn.Upsample(size=56, mode='bilinear'),
        nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1,
                  padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels=64, out_channels=out_channels, kernel_size=3,
                  stride=1, padding=1)
    )
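Using get_pdn_small as defined above, a quick shape trace shows how an unpadded 256×256 input ends up as a 56×56 feature map (a minimal sketch; ReLU layers keep the shape and are included in the printout):

import torch
from common import get_pdn_small  # as defined above / in the repo's common.py

net = get_pdn_small(out_channels=384, padding=False)
x = torch.randn(1, 3, 256, 256)
for layer in net:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Conv2d    (1, 128, 253, 253)   256 -> 253 (kernel 4, no padding)
# AvgPool2d (1, 128, 126, 126)
# Conv2d    (1, 256, 123, 123)
# AvgPool2d (1, 256, 61, 61)
# Conv2d    (1, 256, 59, 59)
# Conv2d    (1, 384, 56, 56)    59 -> 56 (kernel 4, no padding)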

  • Compute the per-channel normalization parameters (mean and standard deviation) of the teacher's output:
#(1,384,1,1),(1,384,1,1)
teacher_mean, teacher_std = teacher_normalization(teacher, train_loader)
def teacher_normalization(teacher, train_loader):
    mean_outputs = []
    for train_image, _ in tqdm(train_loader, desc='Computing mean of features'):
        if on_gpu:
            train_image = train_image.to(device)
        teacher_output = teacher(train_image)  # -> (1,384,56,56)
        mean_output = torch.mean(teacher_output, dim=[0, 2, 3])  # (384,): per-channel mean µ
        mean_outputs.append(mean_output)  # list of 105 tensors of shape (384,)
    channel_mean = torch.mean(torch.stack(mean_outputs), dim=0)  # (105,384) -> (384,)
    channel_mean = channel_mean[None, :, None, None]  # (1,384,1,1)

    mean_distances = []
    for train_image, _ in tqdm(train_loader, desc='Computing std of features'):
        if on_gpu:
            train_image = train_image.to(device)
        teacher_output = teacher(train_image) # (1,384,56,56)
        distance = (teacher_output - channel_mean) ** 2
        mean_distance = torch.mean(distance, dim=[0, 2, 3])
        mean_distances.append(mean_distance)
    channel_var = torch.mean(torch.stack(mean_distances), dim=0)
    channel_var = channel_var[None, :, None, None]
    channel_std = torch.sqrt(channel_var)

    return channel_mean, channel_std
  • For the parameters of S and A, initialize Adam with learning rate $10^{-4}$ and weight decay $10^{-5}$:
optimizer = torch.optim.Adam(itertools.chain(student.parameters(),
                                                 autoencoder.parameters()),
                                 lr=1e-4, weight_decay=1e-5)
# LR scheduler: step_size is the period (95% of all training steps); gamma=0.1 multiplies the learning rate, dropping it to 10% of its value
scheduler = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=int(0.95 * config.train_steps), gamma=0.1)
  • During the training iterations the teacher's parameters are frozen; only the student and the autoencoder are updated
# Training loop; train_loader_infinite and penalty_loader_infinite are endlessly cycling iterators
    for iteration, (image_st, image_ae), image_penalty in zip(
            tqdm_obj, train_loader_infinite, penalty_loader_infinite):
        if on_gpu:
            image_st = image_st.to(device)  # a randomly drawn training image, (1,3,256,256)
            image_ae = image_ae.to(device)
            if image_penalty is not None:
                image_penalty = image_penalty.to(device)
        with torch.no_grad():  # teacher forward pass of the S-T branch (no gradients)
            teacher_output_st = teacher(image_st)  # Y', (1,384,56,56)
            teacher_output_st = (teacher_output_st - teacher_mean) / teacher_std  # normalize the teacher output
        student_output_st = student(image_st)[:, :out_channels]  # first half of the student's output channels (Y^ST) for S-T training
        distance_st = (teacher_output_st - student_output_st) ** 2  # squared difference D^ST
        d_hard = torch.quantile(distance_st, q=0.999)  # 0.999-quantile of the elements of D^ST
        loss_hard = torch.mean(distance_st[distance_st >= d_hard])  # hard feature loss: backpropagate only the largest-loss elements

        if image_penalty is not None:
            student_output_penalty = student(image_penalty)[:, :out_channels]
            loss_penalty = torch.mean(student_output_penalty ** 2)
            loss_st = loss_hard + loss_penalty
        else:
            loss_st = loss_hard  # first term of loss_total; the penalty term is omitted when no ImageNet images are sampled

        ae_output = autoencoder(image_ae)  # Y^A, (1,384,56,56)
        with torch.no_grad():  # teacher forward pass; A is trained to predict the teacher's output
            teacher_output_ae = teacher(image_ae)  # Y'
            teacher_output_ae = (teacher_output_ae - teacher_mean) / teacher_std  # normalized teacher output
        student_output_ae = student(image_ae)[:, out_channels:]  # (1,384,56,56), second half of the student's channels, Y^STAE
        distance_ae = (teacher_output_ae - ae_output) ** 2  # (1,384,56,56), D^AE = (Y^ - Y^A)^2
        distance_stae = (ae_output - student_output_ae) ** 2  # (1,384,56,56), D^STAE = (Y^A - Y^STAE)^2
        loss_ae = torch.mean(distance_ae)  # L_AE
        loss_stae = torch.mean(distance_stae)  # L_STAE
        loss_total = loss_st + loss_ae + loss_stae  # L_total

        optimizer.zero_grad()
        loss_total.backward()
        optimizer.step()
        scheduler.step()
  • On the validation set, obtain the quantiles $q_{a}^{\mathrm{ST}}, q_{b}^{\mathrm{ST}}, q_{a}^{\mathrm{AE}}, q_{b}^{\mathrm{AE}}$ required for anomaly map normalization
def predict(image, teacher, student, autoencoder, teacher_mean, teacher_std,
            q_st_start=None, q_st_end=None, q_ae_start=None, q_ae_end=None):
    teacher_output = teacher(image)
    teacher_output = (teacher_output - teacher_mean) / teacher_std  # Y', normalized teacher output
    student_output = student(image)  # Y_S, (1,768,56,56)
    autoencoder_output = autoencoder(image)  # Y_A, (1,384,56,56)
    map_st = torch.mean((teacher_output - student_output[:, :out_channels]) ** 2,
                        dim=1, keepdim=True)  # (1,1,56,56): S-T map, i.e. the local anomaly map
    map_ae = torch.mean((autoencoder_output -
                         student_output[:, out_channels:]) ** 2,
                        dim=1, keepdim=True)  # (1,1,56,56): A-S map, i.e. the global anomaly map
    if q_st_start is not None:  # enabled at test time
        map_st = 0.1 * (map_st - q_st_start) / (q_st_end - q_st_start)
    if q_ae_start is not None:  # enabled at test time
        map_ae = 0.1 * (map_ae - q_ae_start) / (q_ae_end - q_ae_start)
    map_combined = 0.5 * map_st + 0.5 * map_ae
    return map_combined, map_st, map_ae


@torch.no_grad()
def map_normalization(validation_loader, teacher, student, autoencoder,
                      teacher_mean, teacher_std, desc='Map normalization'):
    maps_st = []
    maps_ae = []
    # ignore augmented ae image
    for image, _ in tqdm(validation_loader, desc=desc):
        if on_gpu:
            image = image.to(device)
        map_combined, map_st, map_ae = predict(
            image=image, teacher=teacher, student=student,
            autoencoder=autoencoder, teacher_mean=teacher_mean,
            teacher_std=teacher_std) #
        maps_st.append(map_st)  # all local anomaly maps of the validation set
        maps_ae.append(map_ae)  # all global anomaly maps of the validation set
    maps_st = torch.cat(maps_st) #(12,1,56,56)
    maps_ae = torch.cat(maps_ae)
    # compute the 0.9- and 0.995-quantiles of each set of maps
    q_st_start = torch.quantile(maps_st, q=0.9)
    q_st_end = torch.quantile(maps_st, q=0.995)
    q_ae_start = torch.quantile(maps_ae, q=0.9)
    q_ae_end = torch.quantile(maps_ae, q=0.995)
    return q_st_start, q_st_end, q_ae_start, q_ae_end
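After training, the function is called once on the validation loader to obtain the four quantiles (a usage sketch; names follow the code above):

q_st_start, q_st_end, q_ae_start, q_ae_end = map_normalization(
    validation_loader, teacher, student, autoencoder,
    teacher_mean, teacher_std, desc='Final map normalization')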
Inference

After obtaining $q_{a}^{\mathrm{ST}}, q_{b}^{\mathrm{ST}}, q_{a}^{\mathrm{AE}}, q_{b}^{\mathrm{AE}}$, the teacher, the student, the autoencoder, teacher_mean, and teacher_std, inference can be performed:

The predict function is the same as the one shown above; called with the quantile arguments, both maps are normalized before being combined.

def test(test_set, teacher, student, autoencoder, teacher_mean, teacher_std,
         q_st_start, q_st_end, q_ae_start, q_ae_end, test_output_dir=None,
         desc='Running inference'):
    y_true = []
    y_score = []
    for image, target, path in tqdm(test_set, desc=desc):
        orig_width = image.width
        orig_height = image.height
        image = default_transform(image)  # resize to (256,256) and normalize
        image = image[None]  # add a batch dimension
        if on_gpu:
            image = image.to(device)
        map_combined, map_st, map_ae = predict(
            image=image, teacher=teacher, student=student,
            autoencoder=autoencoder, teacher_mean=teacher_mean,
            teacher_std=teacher_std, q_st_start=q_st_start, q_st_end=q_st_end,
            q_ae_start=q_ae_start, q_ae_end=q_ae_end)  # (1,1,56,56)
        map_combined = torch.nn.functional.pad(map_combined, (4, 4, 4, 4))  # zero-pad: (1,1,56,56) -> (1,1,64,64)
        map_combined = torch.nn.functional.interpolate(
            map_combined, (orig_height, orig_width), mode='bilinear')  # bilinear resize back to the source image size
        map_combined = map_combined[0, 0].detach().cpu().numpy()  # convert to a NumPy array before np.max
        y_score_image = np.max(map_combined)  # image-level anomaly score for this image
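The snippet stops at the image-level score; a plausible completion (a sketch, not the verbatim repository code) collects labels from the MVTec-style folder names ('good' = normal) and finishes with the image-level AUROC via sklearn's roc_auc_score:

        defect_class = os.path.basename(os.path.dirname(path))  # test subfolder name is the label
        y_true.append(0 if defect_class == 'good' else 1)
        y_score.append(y_score_image)
    auc = roc_auc_score(y_true=y_true, y_score=y_score)
    return auc * 100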

Separating the training and test stages

In the source code both stages run in a single script. To avoid retraining every time, save $q_{a}^{\mathrm{ST}}, q_{b}^{\mathrm{ST}}, q_{a}^{\mathrm{AE}}, q_{b}^{\mathrm{AE}}$, the teacher, the student, the autoencoder, teacher_mean, and teacher_std after training, and then run the test separately.
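A minimal saving sketch for the end of the training script (file names match the loading code below; whole modules are saved, which is what the torch.load calls on the .pth files below expect):

import os
import torch

for name, obj in [('teacher_final.pth', teacher),
                  ('student_final.pth', student),
                  ('autoencoder_final.pth', autoencoder),
                  ('q_st_start.pt', q_st_start), ('q_st_end.pt', q_st_end),
                  ('q_ae_start.pt', q_ae_start), ('q_ae_end.pt', q_ae_end),
                  ('teacher_mean.pt', teacher_mean), ('teacher_std.pt', teacher_std)]:
    torch.save(obj, os.path.join(train_output_dir, name))  # train_output_dir as below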

import numpy as np
import tifffile
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import argparse
import itertools
import os
import random
from tqdm import tqdm
from common import get_autoencoder, get_pdn_small, get_pdn_medium, \
    ImageFolderWithoutTarget, ImageFolderWithPath, InfiniteDataloader
from sklearn.metrics import roc_auc_score
import xml.etree.ElementTree as ET
from efficientad import default_transform, predict
import matplotlib.pyplot as plt

root = "output/1"
train_output_dir = f"{root}/trainings/mvtec_ad/ytmqOne"
test_output_dir = f"{root}/anomaly_maps"
device = "cuda"

q_st_start = torch.load(os.path.join(train_output_dir, 'q_st_start.pt'), map_location=device)
q_st_end = torch.load(os.path.join(train_output_dir, 'q_st_end.pt'), map_location=device)
q_ae_start = torch.load(os.path.join(train_output_dir, 'q_ae_start.pt'), map_location=device)
q_ae_end = torch.load(os.path.join(train_output_dir, 'q_ae_end.pt'), map_location=device)
teacher_mean = torch.load(os.path.join(train_output_dir, 'teacher_mean.pt'), map_location=device)
teacher_std = torch.load(os.path.join(train_output_dir, 'teacher_std.pt'), map_location=device)
teacher = torch.load(os.path.join(train_output_dir, 'teacher_final.pth'), map_location=device)
student = torch.load(os.path.join(train_output_dir, 'student_final.pth'), map_location=device)
autoencoder = torch.load(os.path.join(train_output_dir, 'autoencoder_final.pth'), map_location=device)

teacher.eval()
student.eval()
autoencoder.eval()

test_set = ImageFolderWithPath(
    os.path.join(r"E:\datasets\mvtec_anomaly_detection", "ytmqOne", 'test'))  # raw string avoids backslash-escape issues on Windows

y_true = []
y_score = []
for image, target, path in tqdm(test_set, desc="infer: "):
    orig_width = image.width
    orig_height = image.height
    sourceImage = np.array(image)
    image = default_transform(image)  # resize to (256,256) and normalize
    image = image[None]  # add a batch dimension
    image = image.to(device)

    teacher_output = teacher(image)
    teacher_output = (teacher_output - teacher_mean) / teacher_std  # Y', normalized teacher output
    student_output = student(image)  # Y_S
    autoencoder_output = autoencoder(image)  # Y_A, (1,384,56,56)
    map_st = torch.mean((teacher_output - student_output[:, :384]) ** 2,
                        dim=1, keepdim=True)  # (1,1,56,56): S-T map, i.e. the local anomaly map
    map_ae = torch.mean((autoencoder_output -
                         student_output[:, 384:]) ** 2,
                        dim=1, keepdim=True)  # (1,1,56,56): A-S map, i.e. the global anomaly map
    map_st = 0.1 * (map_st - q_st_start) / (q_st_end - q_st_start)  # normalize the local map
    map_ae = 0.1 * (map_ae - q_ae_start) / (q_ae_end - q_ae_start)  # normalize the global map
    map_combined = 0.5 * map_st + 0.5 * map_ae  # combined anomaly map

    map_combined = torch.nn.functional.pad(map_combined, (4, 4, 4, 4))  # zero-pad: (1,1,56,56) -> (1,1,64,64)
    map_combined = torch.nn.functional.interpolate(
        map_combined, (orig_height, orig_width), mode='bilinear')  # bilinear resize back to the source image size
    map_combined = map_combined[0, 0].detach().cpu().numpy()
    img_nm = os.path.split(path)[1].split('.')[0]

    # if test_output_dir is not None:
    #     file = os.path.join(test_output_dir, img_nm + '.tiff')
    #     tifffile.imwrite(file, map_combined)

    f, axes = plt.subplots(1, 2)
    axes[0].imshow(sourceImage)
    axes[1].imshow(map_combined, vmin=0, vmax=1, cmap=plt.cm.jet)
    f.set_size_inches(3 * 2, 3)
    f.tight_layout()
    f.savefig(os.path.join(test_output_dir, img_nm + '.png'))
    plt.close('all')
    y_score_image = np.max(map_combined)  # image-level anomaly score for this image
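To finish with a quantitative result, the collected lists can be turned into the image-level AUROC (a sketch under the same MVTec 'good'-subfolder assumption, using the roc_auc_score import above):

    # still inside the loop: collect ground truth and score per image
    defect_class = os.path.basename(os.path.dirname(path))
    y_true.append(0 if defect_class == 'good' else 1)
    y_score.append(y_score_image)

# after the loop: image-level AUROC over the whole test set
auc = roc_auc_score(y_true=y_true, y_score=y_score)
print(f"image-level AUROC: {auc * 100:.2f}")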
