Notes on "LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation"

Code repo 1: LEDNet official
Code repo 2: LEDNet unofficial

1. Overview

Summary: This paper proposes LEDNet, a method for real-time semantic segmentation built on an asymmetric encoder-decoder structure. Specifically, it uses a ResNet-style backbone in which each residual block applies channel split and channel shuffle to cut computation (borrowing the ShuffleNet idea), and the decoder uses an Attention Pyramid Network (APN) to keep the network complexity low. The resulting network has fewer than 1M parameters and reaches 71 FPS on a single GPU on the Cityscapes dataset.

The main contributions of this paper:

  • 1) An asymmetric encoder-decoder network, LEDNet, that reduces the parameter count while also speeding up inference;
  • 2) Channel split and shuffle operations in each residual block that trade off network size against strong spatial representation power; the channel shuffle operation is differentiable, so the network can be trained end-to-end;
  • 3) An attention-based Attention Pyramid Network (APN) in the decoder that reduces the complexity of the whole network.

2. Network Design

2.1 Network Architecture

The proposed network architecture is shown in the figure below; it is clearly an asymmetric encoder-decoder structure.
(Figure: overall LEDNet encoder-decoder architecture)
The detailed network design is listed in the table below:
(Table: layer-by-layer LEDNet architecture)
Here, the Downsampling Unit performs downsampling by concatenating the output of a 3×3 convolution with stride 2 with the output of max pooling. In the decoder's APN module, three convolutions with kernel sizes 3×3, 5×5, and 7×7, each with stride 2, are used to produce a feature pyramid; the pyramid features are then fused with the encoder output features by element-wise multiplication, further enhanced by the attention operation, and finally upsampled to obtain the segmentation result.
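The Downsampling Unit described above can be sketched in PyTorch as follows. This is a minimal sketch based on the description here, assuming the common ENet-style convention that the convolution produces `out_channels - in_channels` channels so the concatenation with the pooled input yields `out_channels`; the BatchNorm/ReLU placement is also an assumption.

```python
import torch
import torch.nn as nn

class DownsamplingUnit(nn.Module):
    """Downsample by concatenating a stride-2 3x3 conv with a 2x2 max-pool.

    Hypothetical sketch: conv emits (out_channels - in_channels) channels,
    so concatenation with the pooled input gives out_channels in total.
    """
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels - in_channels,
                              kernel_size=3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Both branches halve the spatial resolution; concat along channels.
        out = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.relu(self.bn(out))

x = torch.randn(1, 3, 512, 1024)          # a Cityscapes-sized input
y = DownsamplingUnit(3, 32)(x)
print(y.shape)                            # spatial size halved, 32 channels
```

Each Downsampling Unit halves the spatial resolution while letting the pooling branch pass the input features through cheaply, which is part of why the encoder stays lightweight.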

2.2 Residual Block with Split and Shuffle

High-accuracy semantic segmentation is generally compute-intensive and requires a large amount of memory. There are roughly two existing remedies: network pruning/compression and convolution factorization. The paper addresses the bottleneck problem of residual blocks and the fact that plain channel shuffle is not well suited to lightweight networks, arriving at the residual block structure shown in (d) below:
(Figure: residual block variants, with (d) the proposed SS-nbt block)
The benefits of the SS-nbt module are:

  • 1) The module is computationally efficient, which allows the number of feature channels to be increased;
  • 2) The module performs channel shuffle only at its output, which can be regarded as a form of feature reuse; this improves the network's representation power without significantly increasing its complexity.
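The SS-nbt block described above can be sketched as follows. This is a simplified, hypothetical rendering of figure (d): channels are split in half, each branch applies factorized 3×1/1×3 convolutions, the branches are concatenated, added to the input residual, and channel-shuffled at the output. Dilation rates and exact normalization placement in the real block are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups=2):
    # Differentiable channel shuffle: reshape, transpose, flatten back.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class SSnbt(nn.Module):
    """Split-Shuffle non-bottleneck block (simplified sketch)."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2

        def branch():
            # Factorized 3x3 convolution: 3x1 followed by 1x3.
            return nn.Sequential(
                nn.Conv2d(half, half, (3, 1), padding=(1, 0)),
                nn.ReLU(inplace=True),
                nn.Conv2d(half, half, (1, 3), padding=(0, 1)),
                nn.BatchNorm2d(half),
                nn.ReLU(inplace=True),
            )

        self.left = branch()
        self.right = branch()

    def forward(self, x):
        a, b = x.chunk(2, dim=1)             # channel split
        out = torch.cat([self.left(a), self.right(b)], dim=1)
        out = F.relu(out + x)                # residual connection
        return channel_shuffle(out, 2)       # shuffle only at the output

x = torch.randn(1, 64, 128, 256)
y = SSnbt(64)(x)
print(y.shape)                               # shape is preserved
```

Because each branch only processes half the channels, the per-block cost is roughly halved compared with running the factorized convolutions over all channels, while the output shuffle mixes information across the two branches for the next block.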

3. Experimental Results

Performance comparison against existing real-time segmentation networks:
(Table: accuracy/speed comparison with other real-time segmentation networks)
Per-class segmentation performance comparison:
(Table: per-class segmentation results)
