Prerequisites
First, the authors do not release checkpoints, so the model has to be trained from scratch.
For now I walk through the pipeline without loading a model.
The paper does not state whether the background should be removed (I will re-read it later), so for the time being I debug with the background-removed data.
3DSR/geo_utils.py:61: RuntimeWarning: invalid value encountered in divide
dist = np.abs(np.sum(points * plane_rs[:,:-1], axis=1) - plane[-1])/ np.sum(plane[:-1]**2)**0.5
The division here can divide by zero, so add an eps to the denominator.
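A minimal sketch of the patched helper (the function name is from geo_utils.py; where exactly the eps goes is my assumption, the repo's actual fix may differ):

```python
import numpy as np

def get_distance_to_plane(points, plane, eps=1e-12):
    # points: (N, 3); plane: (a, b, c, d) for the plane ax + by + cz = d.
    # eps in the denominator guards against a degenerate (all-zero) normal,
    # which is what triggers the RuntimeWarning above.
    normal_norm = np.sum(plane[:-1] ** 2) ** 0.5
    return np.abs(points @ plane[:-1] - plane[-1]) / (normal_norm + eps)
```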
test_dsr
The code style here is quite approachable, nowhere near as convoluted as the method section makes it look.
Initialization
total_pixel_scores = np.zeros((img_dim * img_dim * len(dataset)))
total_gt_pixel_scores = np.zeros((img_dim * img_dim * len(dataset)))
total_pixel_scores_2d = np.zeros((len(dataset),img_dim, img_dim))
total_gt_pixel_scores_2d = np.zeros((len(dataset),img_dim, img_dim))
Loading data
If the current sample is a "good" (normal) one, just call transform_image; otherwise the ground truth is loaded as well.
pc and rgb are still 800,800,3 and get resized to 384,384,3 (384 is a fixed preset); gt goes from 800,800 to 384,384 and is normalized.
pc is copied to image_t, and its last channel is taken out as image 384,384.
zero_mask = np.where(image == 0, np.ones_like(image), np.zeros_like(image))
i.e., positions where image equals 0 are marked 1, and non-zero positions are marked 0.
get_plane_mask
(so this step is effectively background removal?)
Input: image_t -> depth_image
depth_image is reshaped into points of shape 147456,3.
Variables p1, p2, p3, p4:
- Each comes from one of the four 3x3 corner regions of depth_image: the region's pixels are summed, then divided by the number of non-zero elements in the last channel of that corner region (plus a very small number to avoid division by zero).
Function get_plane_from_points(p1, p2, p3):
- Fits a plane through the three points p1, p2, p3 and returns the coefficients (a, b, c, d) of the plane equation ax + by + cz = d.
Function get_distance_to_plane(points, plane):
- Computes each point's distance to the plane, $\frac{|ax + by + cz - d|}{\sqrt{a^2 + b^2 + c^2}}$
- Returns point_distance 147456,. From it a mask points_mask is built: positions with distance greater than 0.005 are set to 1, the rest to 0, so 0 is background and 1 is foreground.
points_mask is reshaped back to plane_mask 384,384,1 and returned, matching the original image's height and width but with a single channel marking whether each pixel lies on the fitted plane. Overall, the function averages the four corner regions of the depth map, fits a plane to them, and produces a mask indicating which points approximately lie on that plane.
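Putting the pieces together, a self-contained sketch of the plane-mask logic (the 3x3 corners and the 0.005 threshold are from the description above; shapes are generalized and all other naming is mine):

```python
import numpy as np

def get_plane_from_points(p1, p2, p3):
    # Plane through three points, returned as (a, b, c, d) with ax + by + cz = d.
    n = np.cross(p2 - p1, p3 - p1)
    return np.array([n[0], n[1], n[2], n @ p1])

def plane_mask_sketch(depth_image, thresh=0.005, eps=1e-12):
    # depth_image: (H, W, 3) organized point cloud; last channel is depth.
    H, W, _ = depth_image.shape
    corners = [depth_image[:3, :3], depth_image[:3, -3:],
               depth_image[-3:, :3], depth_image[-3:, -3:]]
    # Average each 3x3 corner over the count of its non-zero depth entries.
    ps = [c.reshape(-1, 3).sum(0) / (np.count_nonzero(c[:, :, -1]) + eps)
          for c in corners]
    plane = get_plane_from_points(ps[0], ps[1], ps[2])
    points = depth_image.reshape(-1, 3)
    dist = np.abs(points @ plane[:-1] - plane[-1]) / (np.sum(plane[:-1] ** 2) ** 0.5 + eps)
    # 1 = foreground (off the fitted plane), 0 = background (on the plane).
    return (dist > thresh).astype(np.float32).reshape(H, W, 1)
```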
Updating plane_mask:
plane_mask[:, :, 0] = plane_mask[:, :, 0] * (1.0 - zero_mask)
This uses zero_mask to update plane_mask: wherever zero_mask is 1, the corresponding plane_mask position is set to 0, removing the regions flagged by zero_mask from plane_mask.
Plane-mask post-processing:
plane_mask = fill_plane_mask(plane_mask)
This calls fill_plane_mask to post-process plane_mask: a morphological closing smooths the mask, fills small holes and gaps, and removes small noise points; the result is written back into the first channel of plane_mask.
Updating image:
image = image * plane_mask[:,:,0]
This masks image with the first channel of the updated plane_mask (the closed one), essentially keeping the region selected by the mask and removing everything else.
Recomputing zero_mask and adjusting image:
zero_mask is recomputed from the new image; im_min and im_max are then used to normalize image. Computing im_min is a bit special:
- Non-zero elements: multiplying image by (1.0 - zero_mask) zeroes out the positions where zero_mask is 1 (i.e., positions that are 0 in the original image). This ensures the minimum is taken only over the originally non-zero elements of image.
- Zero elements: 1000 * zero_mask sets the positions where zero_mask is 1 (i.e., positions that were 0 after the step above) to 1000, so the originally-zero pixels are completely excluded when the minimum is taken next.
image is then normalized by subtracting im_min and dividing by (im_max - im_min), and finally scaled and shifted into the range [0.1, 0.9]. Even so, there are still missing pixels equal to 0, which the depth-map filling below handles.
The fill_depth_map function:
- Fills the missing values (missing pixels) in the depth map. It applies unfold and fold over 2 iterations, estimating missing pixel values from the surrounding valid pixels.
- A 3x3 window is applied at each pixel; if the center value is 0 (missing), it is replaced by the mean of the non-zero neighbors.
- unfold decomposes the image into patches, here each of size 3x3, allowing local processing of every 3x3 region: 1,1,384,384 -> 1,9,147456
- The per-patch count of non-zero values (dimg_t_nonzero_sum) and per-patch sum (dimg_t_sum) are computed
- dimg_t_filtered = dimg_t_sum / (dimg_t_nonzero_sum + 1e-12)
- fold reassembles the processed patches 1,1,147456 back into the full image 1,1,384,384
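A sketch of the unfold/fold filling described above (generalized to any H, W; restricting the replacement to only the missing pixels via `torch.where` is my reading of the repo, which may differ in detail):

```python
import torch
import torch.nn.functional as F

def fill_depth_map(dimg, iterations=2):
    # dimg: (1, 1, H, W) depth map where 0 marks missing pixels.
    _, _, H, W = dimg.shape
    for _ in range(iterations):
        # unfold: (1, 1, H, W) -> (1, 9, H*W), one 3x3 neighborhood per pixel.
        patches = F.unfold(dimg, kernel_size=3, padding=1)
        nonzero = (patches != 0).float().sum(dim=1, keepdim=True)
        total = patches.sum(dim=1, keepdim=True)
        # Mean over the non-zero neighbors of each pixel.
        filled = total / (nonzero + 1e-12)
        # fold: (1, 1, H*W) -> (1, 1, H, W); kernel 1, so no overlap averaging.
        filled = F.fold(filled, output_size=(H, W), kernel_size=1)
        # Only replace pixels that were missing.
        dimg = torch.where(dimg == 0, filled, dimg)
    return dimg
```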
The final sample dict thus contains: image bs,1,384,384 -> depth_image; rgb bs,3,384,384; mask bs,1,384,384 -> true_mask_cv 384,384,1; plane_mask bs,1,384,384 -> fg_mask; has_anomaly bs,1 (1 anomalous, 0 normal) -> is_normal; idx, the sample index
depth_image and rgb are concatenated into in_image 1,4,384,384 and fed to the model (DiscreteLatentModelGroups)
DiscreteLatentModelGroups(
(_encoder_b): EncoderBot(
(input_conv_depth): Conv2d(1, 64, kernel_size=(1, 1), stride=(1, 1))
the first channel of in_image (the depth layer) goes through this
giving 1,64,384,384
(input_conv_rgb): Conv2d(3, 64, kernel_size=(1, 1), stride=(1, 1))
the remaining three RGB channels of in_image go through this
giving 1,64,384,384
the two outputs are concatenated into 1,128,384,384
(relu): ReLU()
(_conv_1): Conv2d(128, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), groups=2)
(relu): ReLU()
1,128,192,192
(_conv_2): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), groups=2)
(relu): ReLU()
1,256,96,96
(_conv_3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2)
(relu): ReLU()
1,256,96,96
(_residual_stack): ResidualStack(
(_layers): ModuleList(
(0): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), groups=2, bias=False)
)
)
These are two residual blocks,
i.e., each block's input is added to its output
(1): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), groups=2, bias=False)
)
)
)
(relu): ReLU()
)
)
This yields enc_b 1,256,96,96, which feeds the next stage
(_encoder_t): EncoderTop(
(_conv_1): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), groups=2)
(relu): ReLU()
1,256,48,48
(_conv_2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2)
(relu): ReLU()
1,256,48,48
(_residual_stack): ResidualStack(
(_layers): ModuleList(
(0): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), groups=2, bias=False)
)
)
(1): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), groups=2, bias=False)
)
)
)
(relu): ReLU()
)
)
outputs enc_t 1,256,48,48
(_pre_vq_conv_top): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1), groups=2)
outputs zt 1,256,48,48
(_vq_vae_top): VectorQuantizerEMA(
reshaped into inputs 1,48,48,256 and flattened to flat_input 2304,256
(_embedding): Embedding(2048, 256)
Here the Euclidean distances between flat_input and _embedding.weight are computed.
Since _embedding.weight has shape 2048,256, the two shapes cannot be subtracted directly,
so the squared distance is expanded as ||x - e||^2 = ||x||^2 + ||e||^2 - 2x·e,
giving distances 2304,2048
encoding_indices 2304,1 holds the per-row argmin of distances, i.e., the index of the embedding vector nearest to each row of flat_input
a zero-initialized encodings 2304,2048 is created
encodings.scatter_(1, encoding_indices, 1)
This is an in-place op that writes 1 into encodings at the positions given by encoding_indices
Concretely, if an input vector is closest to the i-th embedding vector, column i of that input's encoding row is set to 1
Matrix-multiplying encodings with _embedding.weight and reshaping gives quantized 1,48,48,256
The MSE between quantized and inputs gives e_latent_loss,
which is multiplied by the preset self._commitment_cost=0.25 to get loss
(the so-called commitment cost)
quantized = inputs + (quantized - inputs).detach() (the straight-through estimator), then reshaped to 1,256,48,48
The mean of encodings gives avg_probs 2048,
so perplexity = exp[-∑ avg_probs · log(avg_probs)]
Perplexity measures the diversity of the code assignment:
low perplexity means most inputs map onto just a few codes,
high perplexity means inputs are spread evenly over many codes.
Here it is obtained as the exponential of the entropy of the code-usage probabilities.
)
Returns loss->loss_t, quantized->quantized_t, perplexity->perplexity_t, encodings->encodings_t
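The whole quantizer step above (distance expansion, scatter_ one-hot, commitment loss, straight-through, perplexity) condenses into a short sketch (EMA codebook updates are omitted since this is inference; names are mine):

```python
import torch
import torch.nn.functional as F

def vq_lookup(flat_input, embedding_weight, commitment_cost=0.25):
    # flat_input: (N, D), e.g. (2304, 256); embedding_weight: (K, D), e.g. (2048, 256).
    # Squared distance via ||x - e||^2 = ||x||^2 + ||e||^2 - 2 x·e,
    # which avoids materializing an (N, K, D) tensor.
    distances = (flat_input.pow(2).sum(1, keepdim=True)
                 + embedding_weight.pow(2).sum(1)
                 - 2 * flat_input @ embedding_weight.t())       # (N, K)
    encoding_indices = distances.argmin(1, keepdim=True)        # (N, 1)
    encodings = torch.zeros(flat_input.shape[0], embedding_weight.shape[0])
    encodings.scatter_(1, encoding_indices, 1)                  # one-hot rows
    quantized = encodings @ embedding_weight                    # (N, D)
    # Commitment loss pulls the encoder output toward its codeword.
    loss = commitment_cost * F.mse_loss(quantized.detach(), flat_input)
    # Straight-through estimator: forward uses quantized, backward copies gradients.
    quantized = flat_input + (quantized - flat_input).detach()
    # Perplexity = exp(entropy of the average code usage).
    avg_probs = encodings.mean(0)
    perplexity = torch.exp(-torch.sum(avg_probs * torch.log(avg_probs + 1e-10)))
    return quantized, loss, perplexity, encodings
```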
(upsample_t): ConvTranspose2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), groups=2)
quantized_t is upsampled to up_quantized_t 1,256,96,96
which is concatenated with enc_b to give feat 1,512,96,96
(_pre_vq_conv_bot): PreVQBot(
After feat comes in, the enc_b half is taken as in_enc_b
and the second half as in_up_t
in_enc_b is then halved into in_enc_b_d and in_enc_b_rgb, each 1,128,96,96,
and in_up_t into in_up_t_d and in_up_t_rgb
They are re-concatenated in the order in_enc_b_d, in_up_t_d, in_enc_b_rgb, in_up_t_rgb back to 1,512,96,96 (rather convoluted)
(_pre_vq_conv_bot): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), groups=2)
)
outputs zb 1,256,96,96
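The depth/RGB channel regrouping above is easier to see in isolation (a sketch with names of my own; channel counts scale freely):

```python
import torch

def regroup_for_grouped_conv(feat):
    # feat: (B, 512, H, W) = cat([enc_b (256 ch), up_quantized_t (256 ch)], dim=1).
    enc_b, up_t = feat.chunk(2, dim=1)            # each (B, 256, H, W)
    enc_b_d, enc_b_rgb = enc_b.chunk(2, dim=1)    # each (B, 128, H, W)
    up_t_d, up_t_rgb = up_t.chunk(2, dim=1)
    # Reorder so a groups=2 conv sees all depth channels in group 1
    # and all RGB channels in group 2.
    return torch.cat([enc_b_d, up_t_d, enc_b_rgb, up_t_rgb], dim=1)
```

The point of the shuffle is that the following Conv2d(512, 256, groups=2) then keeps the depth and RGB streams separate.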
(_vq_vae_bot): VectorQuantizerEMA(
(_embedding): Embedding(2048, 256)
the same operations as _vq_vae_top,
the only difference being the input feature-map size
)
outputs loss_b, quantized_b 1,256,96,96, perplexity_b, encodings_b 9216,2048
up_quantized_t and quantized_b are concatenated into quant_join 1,512,96,96
(_decoder_b): DecoderBot(
quant_join comes in and is first split evenly into four parts:
in_t_d, in_t_rgb, in_b_d, in_b_rgb,
which are re-concatenated in the order in_t_d, in_b_d, in_t_rgb, in_b_rgb to give in_joined
(_conv_1): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2)
giving 1,256,96,96
(_residual_stack): ResidualStack(
(_layers): ModuleList(
(0): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), groups=2, bias=False)
)
)
(1): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=2, bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), groups=2, bias=False)
)
)
)
(relu): ReLU()
)
giving 1,256,96,96
(_conv_trans_1): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), groups=2)
(relu): ReLU()
giving 1,128,192,192
(_conv_trans_2): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), groups=2)
giving 1,64,384,384,
which is halved into x_d and x_rgb, each 1,32,384,384,
fed to the two convolutions below
(_conv_out_depth): Conv2d(32, 1, kernel_size=(1, 1), stride=(1, 1))
giving out_d 1,1,384,384
(_conv_out_rgb): Conv2d(32, 3, kernel_size=(1, 1), stride=(1, 1))
giving out_rgb 1,3,384,384
concatenated into the output recon_fin 1,4,384,384
)
)
Outputs: loss_b, loss_t, recon_fin->recon_out->recon_image_general, quantized_t->embeddings_lo, quantized_b->embeddings_hi
sub_res_model_hi
takes embeddings_hi and embedder_hi->quantization as inputs,
where embedder_hi is the _vq_vae_bot above
SubspaceRestrictionModule(
(unet): SubspaceRestrictionNetwork(
(encoder): FeatureEncoder(
embeddings_hi 1,256,96,96 enters here
(block1): Sequential(
(0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
b1 1,128,96,96
(mp1): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
mp1 1,128,48,48
(block2): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
b2 1,256,48,48
(mp2): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
mp2 1,256,24,24
(block3): Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
b3 1,512,24,24
)
outputs b1, b2, b3
(decoder): FeatureDecoder(
only b3 is used
(up2): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(3): ReLU(inplace=True)
)
up2 1,256,48,48
(db2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
db2 1,256,48,48
(up3): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(3): ReLU(inplace=True)
)
up3 1,128,96,96
(db3): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
db3 1,128,96,96
(fin_out): Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
)
)
outputs output 1,256,96,96
(_vq_vae_bot): VectorQuantizerEMA( (i.e., embedder_hi)
(_embedding): Embedding(2048, 256)
)
outputs loss_b, quantized_b 1,256,96,96, perplexity_b, encodings_b 9216,2048
Final outputs: output->_, quantized_b->recon_embeddings_hi, loss_b->_
sub_res_model_lo
likewise takes embeddings_lo and embedder_lo (_vq_vae_top) as inputs
and finally outputs recon_embeddings_lo
(upsample_t): ConvTranspose2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), groups=2)
recon_embeddings_lo is upsampled to up_quantized_recon_t 1,256,96,96,
which is concatenated with recon_embeddings_hi into quant_join 1,512,96,96 and fed to the module below
model_decode
ImageReconstructionNetwork(
(block1): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
1,1024,96,96
(mp1): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
1,1024,48,48
(block2): Sequential(
(0): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(1024, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(1024, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(2048, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
1,2048,48,48
(mp2): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
1,2048,24,24
(pre_vq_conv): Conv2d(2048, 64, kernel_size=(1, 1), stride=(1, 1))
1,64,24,24
(upblock1): ConvTranspose2d(64, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
relu
1,64,48,48
(upblock2): ConvTranspose2d(64, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
relu
1,64,96,96
(_conv_1): Conv2d(64, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
1,256,96,96
(_residual_stack): ResidualStack(
(_layers): ModuleList(
(0): Residual(
(_block): Sequential(
(0): ReLU(inplace=True)
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU(inplace=True)
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
(1): Residual(
(_block): Sequential(
(0): ReLU(inplace=True)
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU(inplace=True)
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
)
)
1,256,96,96
(_conv_trans_1): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
relu
1,128,192,192
(_conv_trans_2): ConvTranspose2d(128, 4, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
)
final output: recon_image_recon 1,4,384,384
decoder_seg
recon_image_recon->image_real and recon_image_general->image_anomaly are detached from the graph and fed in
AnomalyDetectionModule(
concatenated into img_x 1,8,384,384
(unet): UnetModel( (this time a U-Net with skip connections between encoder and decoder)
(encoder): UnetEncoder(
(block1): Sequential(
(0): Conv2d(8, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
b1 1,32,384,384
(mp1): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
mp1 1,32,192,192
(block2): Sequential(
(0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
b2 1,64,192,192
(mp2): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
mp2 1,64,96,96
(block3): Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
b3 1,128,96,96
(mp3): Sequential(
(0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
mp3 1,128,48,48
(block4): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
b4 1,128,48,48
)
outputs b1, b2, b3, b4, which all feed the decoder
(decoder): UnetDecoder(
(up1): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(3): ReLU(inplace=True)
)
b4 goes in, giving up1 1,128,96,96,
which is concatenated with b3 to give cat1 1,256,96,96
(db1): Sequential(
(0): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
cat1 goes in, giving db1 1,128,96,96
(up2): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(3): ReLU(inplace=True)
)
db1 goes in, giving up2 1,64,192,192,
which is concatenated with b2 to give cat2 1,128,192,192
(db2): Sequential(
(0): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
cat2 goes in, giving db2 1,64,192,192
(up3): Sequential(
(0): Upsample(scale_factor=2.0, mode=bilinear)
(1): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(2): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(3): ReLU(inplace=True)
)
db2 goes in, giving up3 1,32,384,384,
which is concatenated with b1 to give cat3 1,64,384,384
(db3): Sequential(
(0): Conv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(2): ReLU(inplace=True)
(3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(4): InstanceNorm2d(32, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
(5): ReLU(inplace=True)
)
cat3 goes in, giving db3 1,32,384,384
(fin_out): Sequential(
(0): Conv2d(32, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
)
)
)
outputs out_mask 1,2,384,384
After a softmax, its second channel (the anomaly channel) is average-pooled to get out_mask_averaged 1,1,384,384,
whose maximum gives image_score.
Taking the first channel and flattening gives flat_out_mask 147456,
flat_true_mask = true_mask_cv.flatten()
147456,
total_pixel_scores[mask_cnt * img_dim * img_dim:(mask_cnt + 1) * img_dim * img_dim] = flat_out_mask
mask_cnt here is incremented once per batch
total_gt_pixel_scores[mask_cnt * img_dim * img_dim:(mask_cnt + 1) * img_dim * img_dim] = flat_true_mask
total_pixel_scores_2d[mask_cnt] = out_mask_averaged[0,0,:,:]
total_gt_pixel_scores_2d[mask_cnt] = true_mask_cv[:,:,0]
mask_cnt += 1
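The scoring step can be sketched as follows (the pooling kernel size and which channel gets flattened are assumptions from my reading; the repo's exact choices may differ):

```python
import torch
import torch.nn.functional as F

def score_from_mask(out_mask, pool_size=21):
    # out_mask: (1, 2, H, W) segmentation logits; channel 1 is the anomaly class.
    sm = torch.softmax(out_mask, dim=1)
    anomaly = sm[:, 1:, :, :]                        # (1, 1, H, W)
    # Smooth the anomaly map before taking the image-level max.
    averaged = F.avg_pool2d(anomaly, pool_size, stride=1, padding=pool_size // 2)
    image_score = averaged.max().item()
    flat_out_mask = averaged[0, 0].flatten()         # per-pixel scores, (H*W,)
    return image_score, flat_out_mask, averaged
```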
In the end the AUC is still computed with sklearn; AP is computed too, at both image and pixel level, also via sklearn.
AUPRO reuses the same routine as CPMF and SG.
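The sklearn calls boil down to the following (toy arrays stand in for the accumulators above; they are not real scores):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Toy stand-ins for total_gt_pixel_scores (binarized) and total_pixel_scores.
gt = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
auc_pixel = roc_auc_score(gt, scores)
ap_pixel = average_precision_score(gt, scores)
```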
(cpmf) pengpeng@pengpeng-X79:/media/pengpeng/新加卷/3DSR$ python test_dsr.py --gpu_id 0 --data_path /media/pengpeng/新加卷/mvtec3d/new/ --out_path checkpoints --run_name 3dsr_depth_MODEL
/media/pengpeng/新加卷/3DSR/geo_utils.py:61: RuntimeWarning: invalid value encountered in divide
dist = np.abs(np.sum(points * plane_rs[:,:-1], axis=1) - plane[-1])/ np.sum(plane[:-1]**2)**0.5
------------------
bagel
AUC Image: 0.48295454545454547
AP Image: 0.8303980866193896
AUC Pixel: 0.8653512617500728
AP Pixel: 0.0027919836111362077
AUPRO: 0.6409741619885146
------------------
cable_gland
AUC Image: 0.38204707170224417
AP Image: 0.7543322533620077
AUC Pixel: 0.3191318201560968
AP Pixel: 0.0012690916945081912
AUPRO: 0.11711088222595595
------------------
carrot
AUC Image: 0.5145903479236813
AP Image: 0.8586393671379533
AUC Pixel: 0.7002368997022665
AP Pixel: 0.0028430123250470962
AUPRO: 0.5335603331938935
------------------
cookie
AUC Image: 0.4618585298196949
AP Image: 0.7708466572657134
AUC Pixel: 0.5833117101238113
AP Pixel: 0.0041975325858495125
AUPRO: 0.2375673997345978
------------------
dowel
AUC Image: 0.6194526627218934
AP Image: 0.8756597485734436
AUC Pixel: 0.8712298011132138
AP Pixel: 0.041571855979198304
AUPRO: 0.6949830659963853
------------------
foam
AUC Image: 0.50875
AP Image: 0.8136079544186484
AUC Pixel: 0.1103978803713763
AP Pixel: 0.0003780426904060454
AUPRO: 0.0
------------------
peach
AUC Image: 0.43287373004354135
AP Image: 0.7901264692416418
AUC Pixel: 0.14096067314659547
AP Pixel: 0.00037021432963991514
AUPRO: 0.0003233022044344373
------------------
potato
AUC Image: 0.5696640316205533
AP Image: 0.8396992944310544
AUC Pixel: 0.24822617017575851
AP Pixel: 0.0002800779041878196
AUPRO: 0.1580423488325828
------------------
rope
AUC Image: 0.5163043478260869
AP Image: 0.7099800925148527
AUC Pixel: 0.8200974765135473
AP Pixel: 0.0029663631237565164
AUPRO: 0.5942897059926104
------------------
tire
AUC Image: 0.550344827586207
AP Image: 0.7836263304448925
AUC Pixel: 0.822602468380847
AP Pixel: 0.012433688865297466
AUPRO: 0.6727794170154127
--------MEAN---------------------------------------
AUC Image: 0.5038840094698447
AP Image: 0.8026916254009597
AUC Pixel: 0.5481546161433586
AP Pixel: 0.006910186310902708
AUPRO: 0.3649630617184388
|             | bagel | cable_gland | carrot | cookie | dowel | foam  | peach | potato | rope  | tire  | mean  |
|-------------|-------|-------------|--------|--------|-------|-------|-------|--------|-------|-------|-------|
| AUC (image) | 48.3  | 38.2        | 51.46  | 46.19  | 61.95 | 50.88 | 43.29 | 56.97  | 51.63 | 55.03 | 50.39 |
| AUC (pixel) | 86.54 | 31.91       | 70.02  | 58.33  | 87.12 | 11.04 | 14.1  | 24.82  | 82.01 | 82.26 | 54.82 |
| AUPRO       | 64.1  | 11.71       | 53.36  | 23.76  | 69.5  | 0.0   | 0.03  | 15.8   | 59.43 | 67.28 | 36.5  |
| AP (image)  | 83.04 | 75.43       | 85.86  | 77.08  | 87.57 | 81.36 | 79.01 | 83.97  | 71.0  | 78.36 | 80.27 |
test_dsr_depth
Here the model is DiscreteLatentModel from discrete_model.py, and the loaded checkpoint is "DADA_D.pckl"
In test_dsr above, the model was DiscreteLatentModelGroups from discrete_model_groups.py, loading "DADA_RGB_D.pckl"
The input/output channel count also drops from 4 to 1, i.e., only the depth channel is used
decoder_seg's input channels change from 8 to 2
model_decode's output channels go from 4 to 1
and the average-pooling size for out_mask_averaged changes as well
DiscreteLatentModel(
input 1,1,384,384,
because the in_image fed to the model is not a concatenation this time:
it only has the depth channel
(_encoder_b): EncoderBot(
(_conv_1): Conv2d(1, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(_conv_2): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(_conv_3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
1,256,96,96
so compared with DiscreteLatentModelGroups, this encoder has one conv layer fewer
(_residual_stack): ResidualStack(
(_layers): ModuleList(
(0): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
(1): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
)
(relu): ReLU()
)
(relu): ReLU()
)
(_encoder_t): EncoderTop(
(_conv_1): Conv2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(_conv_2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(_residual_stack): ResidualStack(
(_layers): ModuleList(
(0): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
(1): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
)
(relu): ReLU()
)
(relu): ReLU()
)
(_pre_vq_conv_top): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(_vq_vae_top): VectorQuantizerEMA(
(_embedding): Embedding(2048, 256)
)
(upsample_t): ConvTranspose2d(256, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(_pre_vq_conv_bot): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(_vq_vae_bot): VectorQuantizerEMA(
(_embedding): Embedding(2048, 256)
)
(_decoder_b): DecoderBot(
(_conv_1): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(_residual_stack): ResidualStack(
(_layers): ModuleList(
(0): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
(1): Residual(
(_block): Sequential(
(0): ReLU()
(1): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(2): ReLU()
(3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
)
)
)
(relu): ReLU()
)
(_conv_trans_1): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(_conv_trans_2): ConvTranspose2d(128, 1, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
(relu): ReLU()
)
)
Outputs: loss_b, loss_t, recon_out->recon_image_general 1,1,384,384, quantized_t->embeddings_lo 1,256,48,48, quantized_b->embeddings_hi 1,256,96,96
The rest is largely unchanged, since the size differences were handled when the models were initialized
The image above visualizes the depth channel of a background-removed tiff
The image above visualizes the depth channel of a tiff without background removal