READING NOTE: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

TITLE: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

AUTHOR: Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello

ASSOCIATION: University of Warsaw, Purdue University

FROM: arXiv:1606.02147

CONTRIBUTIONS

  1. A novel deep neural network architecture named ENet (efficient neural network) is proposed; it is designed for efficiency, running up to 18× faster with far fewer parameters and FLOPs than prior networks such as SegNet.
  2. A series of design strategies for efficient segmentation networks is discussed.

Design Choices

Network Architecture

Readers may refer to the paper for the full network architecture. The network is inspired by the ResNet structure, which the authors redesigned for the specific task of semantic segmentation, guided by their intuitions. The initial block and the basic building block (the bottleneck module) are shown in the paper's figures. After the initial block, a comparatively large encoder is constructed from bottleneck modules, followed by a much smaller decoder.
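A minimal PyTorch sketch of a regular (non-downsampling) bottleneck module is given below, assuming the 1×1-reduce / 3×3 / 1×1-expand design described in the paper; the internal reduction ratio and dropout probability here are illustrative choices, not the authors' exact code.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of a regular ENet bottleneck (non-downsampling variant)."""

    def __init__(self, channels, internal_ratio=4, dropout_p=0.1):
        super().__init__()
        internal = channels // internal_ratio  # illustrative reduction ratio
        self.branch = nn.Sequential(
            # 1x1 projection reduces the channel count
            nn.Conv2d(channels, internal, kernel_size=1, bias=False),
            nn.BatchNorm2d(internal),
            nn.PReLU(),
            # main 3x3 convolution on the reduced representation
            nn.Conv2d(internal, internal, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(internal),
            nn.PReLU(),
            # 1x1 expansion restores the channel count
            nn.Conv2d(internal, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            # Spatial Dropout regularizes the branch (design strategy 8)
            nn.Dropout2d(p=dropout_p),
        )
        self.act = nn.PReLU()

    def forward(self, x):
        # residual addition with the identity main branch
        return self.act(x + self.branch(x))

x = torch.randn(1, 64, 32, 32)
print(Bottleneck(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```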

Design Strategy

  1. Feature map resolution: Downsampling feature maps has two drawbacks: 1) loss of fine spatial information such as exact edge shape, and 2) the output is smaller than the input image, so full-resolution labels require upsampling. The advantage is that a reduced resolution gives the filters a larger receptive field and hence more context. The first problem is mitigated by fusing feature maps from earlier layers (as in FCN) or by saving and reusing max-pooling indices (as in SegNet).
  2. Early downsampling: Downsampling early is very helpful for boosting the efficiency of the network while preserving accuracy. The intuition is that visual information is highly redundant, and that the initial network layers should not directly contribute to classification but instead act as good feature extractors; see the initial-block sketch after this list.
  3. Decoder size: In most previous work the encoder and decoder have the same size, i.e., the architecture is fully symmetric. In this work the authors use a large encoder and a small decoder. The encoder's responsibility is to operate on lower-resolution data and perform information processing and filtering; the decoder's role is only to upsample the encoder's output, fine-tuning the details.
  4. Nonlinear operations: The paper reports an interesting observation here. The authors investigate the effect of the nonlinearities by training the network with PReLU, whose learned slopes reveal the preferred activation shape. Layers in the main branch behave nearly exactly like regular ReLUs, while the PReLU weights inside the bottleneck modules are negative. This suggests that the typical identity shortcut of ResNet does not work well here because of the limited depth of the network (the bottleneck sketch above uses PReLU accordingly).
  5. Information-preserving dimensionality changes: Pooling is performed in parallel with a convolution of stride 2, and the resulting feature maps are concatenated; this keeps downsampling cheap without discarding information, exactly as in the initial block (sketched after this list).
  6. Factorizing filters: Factorizing a large filter into a sequence of asymmetric convolutions achieves a large effective kernel with fewer computations. In addition, the extra layer inserts another nonlinearity, allowing the network to represent richer functions; see the example after this list.
  7. Dilated convolutions: Dilated convolutions enlarge the receptive field while maintaining feature resolution and computational efficiency; see the example after this list.
  8. Regularization: Spatial Dropout is used to prevent overfitting; see the example after this list.
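For strategies 2 and 5, a sketch of the initial block: a 3×3 convolution with stride 2 runs in parallel with 2×2 max pooling, and the two results are concatenated. With 3 input channels and 13 convolutional filters the block outputs 16 feature maps at half resolution, as in the paper.

```python
import torch
import torch.nn as nn

class InitialBlock(nn.Module):
    """Sketch of the ENet initial block: stride-2 convolution in parallel
    with max pooling, results concatenated along the channel dimension."""

    def __init__(self, in_channels=3, conv_channels=13):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, conv_channels,
                              kernel_size=3, stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        # both branches halve the spatial resolution; channels concatenate
        return torch.cat([self.conv(x), self.pool(x)], dim=1)

x = torch.randn(1, 3, 512, 512)
print(InitialBlock()(x).shape)  # torch.Size([1, 16, 256, 256])
```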
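For strategy 6, a sketch of filter factorization: a 5×5 convolution (25·C² weights per position) is replaced by a 5×1 followed by a 1×5 convolution (10·C² weights), with an extra nonlinearity in between; the channel count here is illustrative.

```python
import torch.nn as nn

channels = 64  # illustrative
factorized = nn.Sequential(
    # 5x1 convolution over the vertical direction
    nn.Conv2d(channels, channels, kernel_size=(5, 1), padding=(2, 0), bias=False),
    nn.PReLU(),  # the extra layer adds another nonlinearity
    # 1x5 convolution over the horizontal direction
    nn.Conv2d(channels, channels, kernel_size=(1, 5), padding=(0, 2), bias=False),
)
```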
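Strategy 7 in code: a 3×3 convolution with dilation 4 has an effective receptive field of 9×9 at the parameter cost of a plain 3×3 kernel, and with matching padding it leaves the spatial resolution unchanged.

```python
import torch
import torch.nn as nn

# dilation * (kernel - 1) + 1 = 4 * 2 + 1 = 9, so the effective field is 9x9
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=4, dilation=4, bias=False)
x = torch.randn(1, 64, 32, 32)
assert dilated(x).shape == x.shape  # resolution preserved
```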
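Strategy 8: Spatial Dropout (nn.Dropout2d in PyTorch) zeroes whole feature maps rather than individual activations, which suits the strong spatial correlation of convolutional features; the paper places it at the end of the convolutional branch, with p = 0.01 before bottleneck 2.0 and p = 0.1 afterwards.

```python
import torch
import torch.nn as nn

# Spatial Dropout: entire channels are dropped at random during training.
drop = nn.Dropout2d(p=0.1)
x = torch.randn(1, 16, 64, 64)
y = drop(x)  # in training mode, some of the 16 feature maps are zeroed
```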

ADVANTAGES

  1. The network is fast enough for real-time semantic segmentation, even on embedded platforms.

DISADVANTAGES

  1. Its segmentation accuracy is somewhat lower than that of larger state-of-the-art networks.