Skeletonization: A UNet-Based Skeleton Extraction Network


This project is the runner-up solution to the Pixel SkelNetOn challenge at the ICCV 2021 workshop Deep Learning for Geometric Computing.
Paper: U-Net based skeletonization and bag of tricks
Original repository: https://github.com/namdvt/skeletonization

Main contributions

  • Add a multi-head attention mechanism to the UNet
  • Introduce CBAM (Convolutional Block Attention Module) into the decoder
  • Set up auxiliary task learning: intermediate results from the forward pass are included in the loss computation

Overall network architecture

The UNet owes its name to its U-shaped structure. In this project the input images are 256 × 256. The loss is computed from the final 256 × 256 output plus three intermediate maps produced during the forward pass, so in addition to the original label, the label is resized three times to obtain labels for those three auxiliary outputs.

Prediction results

Left: input image; middle: ground truth; right: prediction.

Dataset

The dataset consists of 1725 black-and-white images in 256×256-pixel PNG (Portable Network Graphics) format, split into 1218 training images, 241 validation images, and 266 test images; samples of every class are present in the validation and test sets. There are two kinds of images: shape images, which represent the shapes in the dataset, and skeleton images, which represent the skeletons corresponding to those shapes.
The Pixel SkelNetOn challenge: pixel-level skeleton extraction refines the binary masks of different objects, avoiding the manual parameter tuning required by traditional methods. Accurately extracting skeletons of varied shapes from a small number of samples remains challenging.

1. Unzip the dataset

This dataset is officially provided by CVPR 2022.

!unzip data/data190494/Pixel_SkelNet.zip -d ./Pixel_SkelNet

2. Custom dataset loading

Data loading is customized through Paddle's paddle.io.Dataset interface.
The input images are already binarized, so the pixel values can simply be scaled into [0, 1]; the label is resized three times to produce the auxiliary labels.

import paddle
import cv2
import numpy as np

class Normalize(object):
    def __call__(self, sample):
        image, label = sample
        # scale pixel values to [0, 1] and add a channel dimension
        image = image / 255.
        image = np.expand_dims(image, axis=0)
        label = label / 255.
        # binarize the label
        label[label >= 0.5] = 1
        label[label < 0.5] = 0

        # downsample the label three times for the auxiliary outputs
        label_128 = cv2.resize(label, (128, 128), interpolation=cv2.INTER_AREA)
        label_64 = cv2.resize(label, (64, 64), interpolation=cv2.INTER_AREA)
        label_32 = cv2.resize(label, (32, 32), interpolation=cv2.INTER_AREA)

        return image, label, label_128, label_64, label_32

class MyDataset(paddle.io.Dataset):
    def __init__(self, ann_path):
        super(MyDataset, self).__init__()
        self.normalize = Normalize()
        self.dataset = []
        self.indexes = []
        # each line of the annotation file holds "image_path label_path"
        with open(ann_path, 'r') as f:
            ann_file = f.readlines()
        for i in ann_file:
            i = i.replace('\n', '').split(' ')
            self.indexes.append([i[0], i[1]])
            # cache all images in memory (the dataset is small)
            img = cv2.imread(i[0])
            label = cv2.imread(i[1])

            self.dataset.append([img, label])

    def __getitem__(self, index):
        image, label = self.dataset[index]
        image = cv2.resize(image, (256, 256))
        image = image[:, :, 0]  # keep a single channel (the images are binary)

        label = cv2.resize(label, (256, 256))
        label = label[:, :, 0]
        image, label, label_128, label_64, label_32 = self.normalize((image, label))

        # one input, four supervision targets (full resolution + three auxiliary scales)
        return [image.astype("float32")], [label, label_128, label_64, label_32]

    def __len__(self):
        return len(self.indexes)
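
As a quick sanity check, here is a minimal sketch of how the loader could be exercised (assuming the annotation file lists "image_path label_path" pairs per line, as parsed above):

# minimal sketch: verify the shapes produced by MyDataset
train_dataset = MyDataset('Pixel_SkelNet/train/train_pair.lst')
inputs, targets = train_dataset[0]
print(inputs[0].shape)               # (1, 256, 256)
print([t.shape for t in targets])    # [(256, 256), (128, 128), (64, 64), (32, 32)]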

3. Define the loss

The loss in this scheme is computed from four output maps and four labels, a setup known as auxiliary task learning.
The four losses are combined with the weights

L_{final} = 0.5L_{256} + 0.3L_{128} + 0.2L_{64} + 0.1L_{32}

(note that the weights sum to 1.1, so this is a weighted sum rather than a normalized average).

import paddle.nn as nn
import paddle
import math
import paddle.nn.functional as F

class DiceLoss(nn.Layer):
    def __init__(self, smooth=1.):
        super(DiceLoss, self).__init__()
        self.smooth = smooth

    def forward(self, preds, targets):
        # soft Dice over the whole batch; smooth avoids division by zero
        numerator = 2 * paddle.sum(preds * targets) + self.smooth
        denominator = paddle.sum(preds ** 2) + paddle.sum(targets ** 2) + self.smooth
        soft_dice_loss = 1 - numerator / denominator

        return soft_dice_loss


class WeightedFocalLoss(nn.Layer):
    "Weighted version of Focal Loss"
    def __init__(self, alpha=.01, gamma=2):
        super(WeightedFocalLoss, self).__init__()
        self.alpha = paddle.to_tensor([alpha, 1 - alpha])
        self.gamma = gamma

    def forward(self, preds, targets):
        BCE_loss = F.binary_cross_entropy(paddle.flatten(preds), paddle.flatten(targets).astype("float32"), reduction='none')
        # use the binary targets as indices into the per-class alpha weights
        targets = targets.astype('int64')
        at = self.alpha.gather(paddle.flatten(targets))
        pt = paddle.exp(-BCE_loss)
        F_loss = at * (1 - pt) ** self.gamma * BCE_loss
        F_loss = F_loss.mean()

        # guard against numerical blow-ups on degenerate batches
        if math.isnan(F_loss) or math.isinf(F_loss):
            F_loss = paddle.zeros([1])

        return F_loss

class Loss(nn.Layer):
    def __init__(self):
        super(Loss, self).__init__()
        self.dice_loss = DiceLoss()
        self.focal_loss = WeightedFocalLoss()
        # the focal loss is scaled up to balance its much smaller magnitude
        self.w_dice = 1.
        self.w_focal = 100.
        self.sigmoid = paddle.nn.Sigmoid()

    def forward(self, pred, pred_128, pred_64, pred_32, target, target_128, target_64, target_32):
        # logits -> probabilities at every scale
        pred = self.sigmoid(pred.squeeze())
        pred_128 = self.sigmoid(pred_128.squeeze())
        pred_64 = self.sigmoid(pred_64.squeeze())
        pred_32 = self.sigmoid(pred_32.squeeze())
        
        soft_dice_loss = self.dice_loss(pred, target) * self.w_dice
        bce_loss = self.focal_loss(pred, target) * self.w_focal
        
        soft_dice_loss_128 = self.dice_loss(pred_128, target_128) * self.w_dice
        bce_loss_128  = self.focal_loss(pred_128, target_128) * self.w_focal
        
        soft_dice_loss_64 = self.dice_loss(pred_64, target_64) * self.w_dice
        bce_loss_64 = self.focal_loss(pred_64, target_64) * self.w_focal
        
        soft_dice_loss_32 = self.dice_loss(pred_32, target_32) * self.w_dice
        bce_loss_32 = self.focal_loss(pred_32, target_32) * self.w_focal

        loss = 0.5*(soft_dice_loss+bce_loss) \
                + 0.3*(soft_dice_loss_128+bce_loss_128) \
                + 0.2*(soft_dice_loss_64+bce_loss_64) \
                + 0.1*(soft_dice_loss_32+bce_loss_32)

        return loss
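
A minimal smoke-test sketch of the combined loss, using random logits and labels at the four scales (the shapes are assumptions matching the network outputs defined below):

# minimal sketch: evaluate the combined loss on random logits and labels
criterion = Loss()
pred = paddle.randn([2, 1, 256, 256])
pred_128 = paddle.randn([2, 1, 128, 128])
pred_64 = paddle.randn([2, 1, 64, 64])
pred_32 = paddle.randn([2, 1, 32, 32])
target = paddle.randint(0, 2, [2, 256, 256]).astype('float32')
target_128 = paddle.randint(0, 2, [2, 128, 128]).astype('float32')
target_64 = paddle.randint(0, 2, [2, 64, 64]).astype('float32')
target_32 = paddle.randint(0, 2, [2, 32, 32]).astype('float32')
print(criterion(pred, pred_128, pred_64, pred_32,
                target, target_128, target_64, target_32))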

4. Define the optimizer

The original project uses the CosineAnnealingWarmRestarts learning-rate scheduler, which Paddle does not provide.
Here we use the scheduler re-implemented in this project: linear warmup followed by cosine annealing.

from paddle.optimizer.lr import LinearWarmup
from paddle.optimizer.lr import CosineAnnealingDecay

class Cosine(CosineAnnealingDecay):
    """
    Cosine learning rate decay
    lr = 0.05 * (math.cos(epoch * (math.pi / epochs)) + 1)

    Args:
        lr(float): initial learning rate
        step_each_epoch(int): steps each epoch
        epochs(int): total training epochs
    """

    def __init__(self, lr, step_each_epoch, epochs, **kwargs):
        super(Cosine, self).__init__(
            learning_rate=lr,
            T_max=step_each_epoch * epochs, )

        self.update_specified = False

class CosineWarmup(LinearWarmup):
    """
    Cosine learning rate decay with warmup
    [0, warmup_epoch): linear warmup
    [warmup_epoch, epochs): cosine decay

    Args:
        lr(float): initial learning rate
        step_each_epoch(int): steps each epoch
        epochs(int): total training epochs
        warmup_epoch(int): epoch num of warmup
    """

    def __init__(self, lr, step_each_epoch, epochs, warmup_epoch=5, **kwargs):
        assert epochs > warmup_epoch, "total epoch({}) should be larger than warmup_epoch({}) in CosineWarmup.".format(
            epochs, warmup_epoch)
        warmup_step = warmup_epoch * step_each_epoch
        start_lr = 0.0
        end_lr = lr
        lr_sch = Cosine(lr, step_each_epoch, epochs - warmup_epoch)

        super(CosineWarmup, self).__init__(
            learning_rate=lr_sch,
            warmup_steps=warmup_step,
            start_lr=start_lr,
            end_lr=end_lr)

        self.update_specified = False
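
To see the warmup-then-decay behavior, the schedule can be stepped manually; a minimal sketch:

# minimal sketch: trace the learning rate through warmup and into cosine decay
sched = CosineWarmup(lr=0.02, step_each_epoch=10, epochs=200, warmup_epoch=5)
for step in range(60):
    if step % 10 == 0:
        print(step, sched.last_lr)
    sched.step()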

5. Define the UNet model

The model structure described in the paper is straightforward.

The multi-head attention module

The CBAM architecture

Once these two modules are implemented, they can be inserted into the UNet following the overall architecture diagram, completing the modification of the baseline UNet.

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class Conv2d(nn.Layer):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        super(Conv2d, self).__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, kernel_size, stride=stride, padding=padding)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

class DoubleConv2d(nn.Layer):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True):
        super(DoubleConv2d, self).__init__()
        self.conv1 = Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.conv2 = Conv2d(out_channels, out_channels, kernel_size, stride, padding)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        return x

class AttentionGroup(nn.Layer):
    """Multi-head attention block: three parallel conv "heads" fused by
    pixel-wise softmax weights, with a residual connection."""
    def __init__(self, num_channels):
        super(AttentionGroup, self).__init__()
        self.conv1 = Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        self.conv2 = Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        self.conv3 = Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        self.conv_1x1 = nn.Conv2D(num_channels, 3, kernel_size=1)
        self.softmax = nn.Softmax(axis=1)

    def forward(self, x):
        x1 = self.conv1(x)
        x2 = self.conv2(x)
        x3 = self.conv3(x)
        # pixel-wise softmax weights over the three heads
        s = self.softmax(self.conv_1x1(x))

        att = s[:, 0, :, :].unsqueeze(1) * x1 + s[:, 1, :, :].unsqueeze(1) * x2 \
            + s[:, 2, :, :].unsqueeze(1) * x3

        return x + att

class UNet(nn.Layer):

    def __init__(self,):
        super().__init__()

        self.encode = Encoder()
        self.decode = Decoder()

    def forward(self, x):

        out1, out2, out3, out4, x = self.encode(x)
        x, aux_128, aux_64, aux_32 = self.decode(out1, out2, out3, out4, x)

        return x.squeeze(), aux_128.squeeze(), aux_64.squeeze(), aux_32.squeeze()


class UpConv2d(nn.Layer):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True):
        super(UpConv2d, self).__init__()
        self.conv = nn.Conv2DTranspose(in_channels, out_channels, kernel_size, stride=stride, padding=padding)
        self.bn = nn.BatchNorm2D(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        return x

class ChannelAttention(nn.Layer):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.max_pool = nn.AdaptiveMaxPool2D(1)

        # shared MLP over the pooled descriptors, as in CBAM
        self.fc = nn.Sequential(nn.Conv2D(in_planes, in_planes // ratio, 1),
                                nn.ReLU(),
                                nn.Conv2D(in_planes // ratio, in_planes, 1))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)

class SpatialAttention(nn.Layer):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()

        self.conv1 = nn.Conv2D(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = paddle.mean(x, axis=1, keepdim=True)
        max_out = paddle.max(x, axis=1, keepdim=True)
        x = paddle.concat([avg_out, max_out], axis=1)
        x = self.conv1(x)
        return self.sigmoid(x)


class Encoder(nn.Layer):
    def __init__(self, in_channels=1):
        super().__init__()
        self.conv1 = DoubleConv2d(in_channels, 64, kernel_size=3, padding=1)
        self.conv2 = DoubleConv2d(64, 128, kernel_size=3, padding=1)
        self.conv3 = DoubleConv2d(128, 256, kernel_size=3, padding=1)
        self.conv4 = DoubleConv2d(256, 512, kernel_size=3, padding=1)
        self.conv5 = DoubleConv2d(512, 1024, kernel_size=3, padding=1)
        self.pooling = nn.MaxPool2D(kernel_size=2)

        self.att1 = AttentionGroup(64)
        self.att2 = AttentionGroup(128)
        self.att3 = AttentionGroup(256)
        self.att4 = AttentionGroup(512)
        self.att5 = AttentionGroup(1024)

    def forward(self, x):        
        out1 = self.conv1(x)
        out1 = self.att1(out1)

        out2 = self.conv2(self.pooling(out1))
        out2 = self.att2(out2)

        out3 = self.conv3(self.pooling(out2))
        out3 = self.att3(out3)

        out4 = self.conv4(self.pooling(out3))
        out4 = self.att4(out4)

        out5 = self.conv5(self.pooling(out4))
        out5 = self.att5(out5)
        return out1, out2, out3, out4, out5


class Decoder(nn.Layer):
    def __init__(self):
        super().__init__()
        self.upconv1 = UpConv2d(1024, 512, kernel_size=2, stride=2)
        self.upconv2 = UpConv2d(512, 256, kernel_size=2, stride=2)
        self.upconv3 = UpConv2d(256, 128, kernel_size=2, stride=2)
        self.upconv4 = UpConv2d(128, 64, kernel_size=2, stride=2)

        self.conv1 = DoubleConv2d(1024, 512, kernel_size=3, padding=1)
        self.conv2 = DoubleConv2d(512, 256, kernel_size=3, padding=1)
        self.conv3 = DoubleConv2d(256, 128, kernel_size=3, padding=1)
        self.conv4 = DoubleConv2d(128, 64, kernel_size=3, padding=1)

        # 1x1 output heads: the final map plus three auxiliary maps for deep supervision
        self.conv1x1 = nn.Conv2D(64, 1, kernel_size=1, stride=1, padding=0)
        self.aux_conv_128 = nn.Conv2D(128, 1, kernel_size=1, stride=1, padding=0)
        self.aux_conv_64 = nn.Conv2D(256, 1, kernel_size=1, stride=1, padding=0)
        self.aux_conv_32 = nn.Conv2D(512, 1, kernel_size=1, stride=1, padding=0)

        self.ca1 = ChannelAttention(512)
        self.sa1 = SpatialAttention()

        self.ca2 = ChannelAttention(256)
        self.sa2 = SpatialAttention()

        self.ca3 = ChannelAttention(128)
        self.sa3 = SpatialAttention()

        self.ca4 = ChannelAttention(64)
        self.sa4 = SpatialAttention()

    def forward(self, out1, out2, out3, out4, x):
        x = self.upconv1(x)
        x = paddle.concat([x, out4], axis=1)
        x = self.conv1(x)
        x = self.ca1(x) * x
        x = self.sa1(x) * x
        aux_32 = self.aux_conv_32(x)

        x = self.upconv2(x)
        x = paddle.concat([x, out3], axis=1)
        x = self.conv2(x)
        x = self.ca2(x) * x
        x = self.sa2(x) * x
        aux_64 = self.aux_conv_64(x)

        x = self.upconv3(x)
        x = paddle.concat([x, out2], axis=1)
        x = self.conv3(x)
        x = self.ca3(x) * x
        x = self.sa3(x) * x
        aux_128 = self.aux_conv_128(x)

        x = self.upconv4(x)
        x = paddle.concat([x, out1], axis=1)
        x = self.conv4(x)
        x = self.ca4(x) * x
        x = self.sa4(x) * x
        x = self.conv1x1(x)

        return x, aux_128, aux_64, aux_32
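
A minimal sketch to verify the four output resolutions before wiring up training:

# minimal sketch: check the output shapes of the modified UNet
net = UNet()
x = paddle.randn([2, 1, 256, 256])
out, aux_128, aux_64, aux_32 = net(x)
print(out.shape, aux_128.shape, aux_64.shape, aux_32.shape)
# expected: [2, 256, 256] [2, 128, 128] [2, 64, 64] [2, 32, 32]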

6. Inspect the network structure

Wrap the network with paddle.Model() so that high-level APIs such as model.fit() can be used for training.

model = paddle.Model(UNet())
model.summary((1,1,256,256))

7. Train the network

The code below has been placed in train.py.
Following the paper, the network is trained for 200 epochs.

import paddle

from core.model import UNet
from solver.dateset_ import MyDataset
from solver.loss import Loss
from solver.coswarmup import CosineWarmup


train_dataset = MyDataset('Pixel_SkelNet/train/train_pair.lst')
test_dataset = MyDataset('Pixel_SkelNet/test/test_pair.lst')

model = paddle.Model(UNet())

# model.summary((1, 1, 256, 256))

# CosineWarmup computes warmup_steps / start_lr / end_lr internally; extra
# keyword arguments would be silently swallowed by **kwargs, so only the
# real parameters are passed (warmup_epoch defaults to 5).
scheduler = CosineWarmup(lr=0.02, step_each_epoch=10, epochs=200, warmup_epoch=5)
optim = paddle.optimizer.SGD(learning_rate=scheduler, parameters=model.parameters())

model.prepare(optimizer=optim,
                loss=Loss())
callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir')

model.fit(train_dataset,
          eval_data=test_dataset,
          eval_freq=10,
          epochs=200,
          batch_size=20,
          save_dir='model',
          log_freq=100,
          save_freq=10,
          callbacks=callback,
          verbose=1)

# saving with training=False exports an inference model directly
model.save('inference_model/Model', training=False)
!python train.py
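
As a side note, the exported inference model can later be reloaded without the UNet class definition; a minimal sketch, assuming the export in train.py produced inference_model/Model:

# minimal sketch: reload the exported inference model (path as exported above)
import paddle

loaded = paddle.jit.load('inference_model/Model')
loaded.eval()
outs = loaded(paddle.randn([1, 1, 256, 256]))
print([o.shape for o in outs])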

Training results:

The training loss decreases in a staircase pattern, an effect of the cosine-annealing learning-rate schedule.

Inference test

Run inference with the following code.

import paddle
import cv2
import numpy as np

from core.model import UNet

model = paddle.Model(UNet())
model.load('model/final')
model.prepare()

# preprocess exactly as during training: single channel, scaled to [0, 1]
image = cv2.imread('Pixel_SkelNet/train/im/apple-1.png')
image = cv2.resize(image, (256, 256))
image = image[:, :, 0]
image = image / 255.
image = np.expand_dims(image, axis=0)
image = image.astype("float32")
# a batch of one sample, shape (1, 1, 256, 256)
data = paddle.to_tensor([image])
sigmoid = paddle.nn.Sigmoid()
result = model.predict(data)
# result[0] holds the full-resolution output; apply sigmoid to get probabilities
result = sigmoid(paddle.to_tensor(result[0]))
pred = result[0].numpy()
# binarize the probability map at a fixed threshold
threshold = 0.4
pred[pred >= threshold] = 255
pred[pred < threshold] = 0

cv2.imwrite('test.png', pred.astype(np.uint8))

!python test.py

Inference results:

Left: original image; middle: ground truth; right: inference result.

Outlook:

This project can be applied to recognizing and extracting vector graphics, supporting shape recognition at a one-pixel stroke width.
