MobileDets

MobileDets: Searching for Object Detection Architectures for Mobile Accelerators

(This post is just a reading note; all content comes from the original paper. Please contact me for removal in case of infringement, or for correction in case of errors.)

Background:

IBN modules (inverted bottlenecks) are the dominant building blocks in state-of-the-art mobile models. The authors observe, however, that although IBNs rely on depthwise-separable convolutions to greatly reduce parameter counts and FLOPs, they are often poorly optimized on modern mobile accelerators such as EdgeTPU accelerators and Qualcomm DSPs.

Regular convolutions can make better use of mobile accelerators

It is observed that for certain tensor shapes and kernel dimensions, a regular convolution can utilize the hardware up to 3× more efficiently than its depthwise variant on an EdgeTPU, despite a much larger theoretical computation cost (7× more FLOPs). In other words, regular convolutions can make better use of mobile accelerators.
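As a rough illustration of where that FLOPs gap comes from, the multiply-accumulate (MAC) counts of a regular K×K convolution and a depthwise-separable one can be compared directly. The shapes below are hypothetical examples, not the exact tensors analyzed in the paper:

```python
# Rough MAC-count comparison between a regular KxK convolution and a
# depthwise-separable convolution (depthwise KxK + pointwise 1x1).
# The tensor shape used here is an illustrative placeholder.

def regular_conv_macs(h, w, c_in, c_out, k):
    # Each output element needs k*k*c_in multiply-accumulates.
    return h * w * c_out * k * k * c_in

def depthwise_separable_macs(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # one KxK filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv that mixes channels
    return depthwise + pointwise

if __name__ == "__main__":
    h = w = 14
    c_in, c_out, k = 96, 96, 3
    full = regular_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"regular: {full:,} MACs, separable: {sep:,} MACs, "
          f"ratio ≈ {full / sep:.1f}x")
```

Even with a FLOPs gap of roughly this order, the paper's point is that the regular convolution can still be the faster choice on an EdgeTPU because its hardware utilization is up to 3× higher.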

A new search space

The authors therefore propose a new search space that includes both IBNs and full-convolution sequences motivated by the structure of tensor decompositions [34, 6], called the TDB (Tensor-Decomposition-Based) search space.
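For intuition, the paper's two full-convolution building blocks can be summarized as short layer sequences. The sketch below is my reading of the paper's block descriptions; the expansion and compression ratios are decisions left to the architecture search, and the values here are placeholders only:

```python
# Illustrative layer sequences for the two full-convolution blocks in the
# TDB search space. Expansion/compression ratios are searched per layer in
# the paper; the defaults below are hypothetical.

def fused_ibn(c_in, c_out, k=3, expansion=4):
    """Fused inverted bottleneck: a KxK regular conv performs the expansion
    (replacing the 1x1 expand + KxK depthwise pair), then a 1x1 conv projects
    back down."""
    c_mid = c_in * expansion
    return [
        ("conv_kxk", c_in, c_mid, k),   # full conv does expansion
        ("conv_1x1", c_mid, c_out, 1),  # linear projection
    ]

def tucker_block(c_in, c_out, k=3, compress_in=0.25, compress_out=0.25):
    """Tucker (generalized bottleneck) layer: compress channels, apply a
    regular KxK conv, then project to the output width."""
    c_s = int(c_in * compress_in)
    c_e = int(c_out * compress_out)
    return [
        ("conv_1x1", c_in, c_s, 1),     # channel compression
        ("conv_kxk", c_s, c_e, k),      # regular KxK convolution
        ("conv_1x1", c_e, c_out, 1),    # projection to output width
    ]

print(fused_ibn(32, 32))
print(tucker_block(32, 32))
```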

Results

By learning to leverage full convolutions at selected positions in the network, the method outperforms IBN-only models by a significant margin. Specifically, MobileDets:

  1. outperform MobileNetV2 by 1.9 mAP on mobile CPUs, 3.7 mAP on EdgeTPUs, and 3.4 mAP on DSPs;
  2. outperform the state-of-the-art MobileNetV3 classification backbone by 1.7 mAP at comparable CPU latency;
  3. achieve performance comparable to the state-of-the-art mobile CPU detector MnasFPN while being roughly 2× faster.

Method:

The architecture search employs TuNAS [1] for its scalability and its reliable improvement over random baselines.

A brief overview of TuNAS:

A controller is trained whose goal is to pick an architecture that optimizes a platform-aware reward function.

The one-shot model and the controller are trained together during search.

  1. In each step, the controller samples a random architecture from a multinomial distribution spanning the available choices;
  2. then the portion of the one-shot model's weights associated with the sampled architecture is updated;
  3. finally, a reward is computed for the sampled architecture and used to update the controller.

The controller update applies the standard REINFORCE algorithm [37] to a cost-aware reward (see the sketch after the list below), where:

  • mAP(M) denotes the detection mAP of an architecture M,
  • c(M) is the inference cost (in this case, latency)
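The reward formula itself is not reproduced in this note, but it follows the TuNAS-style cost-aware form R(M) = mAP(M) + λ·|c(M)/c0 − 1|, where c0 is the target inference cost and λ < 0 penalizes deviations from it. Below is a minimal sketch of one search step, assuming a single categorical decision per searchable position; `estimate_map` and `estimate_cost` are hypothetical placeholders for the mini-batch mAP estimate and the learned cost model described next, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
num_positions, num_choices = 4, 3
logits = np.zeros((num_positions, num_choices))   # controller parameters

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def estimate_map(arch):
    # Placeholder: in practice, the detection mAP of the sampled
    # architecture, estimated on a small mini-batch.
    return float(sum(arch)) / (num_positions * num_choices)

def estimate_cost(arch):
    # Placeholder: in practice, predicted by the learned latency model.
    return 1.0 + 0.05 * float(sum(arch))

def reward(arch, target_cost=1.0, lam=-0.1):
    # R(M) = mAP(M) + lam * |c(M)/c0 - 1|, with lam < 0.
    return estimate_map(arch) + lam * abs(estimate_cost(arch) / target_cost - 1.0)

def controller_step(lr=0.1, baseline=0.0):
    probs = softmax(logits)
    # 1. Sample an architecture: one op choice per searchable position.
    arch = [rng.choice(num_choices, p=p) for p in probs]
    # 2. (In the real search, the shared one-shot weights for `arch`
    #     would be updated on a training batch here.)
    # 3. Compute the reward and apply the REINFORCE gradient:
    #    d log p(arch) / d logits = one_hot(choice) - probs.
    r = reward(arch)
    for pos, choice in enumerate(arch):
        grad_logp = -probs[pos]
        grad_logp[choice] += 1.0
        logits[pos] += lr * (r - baseline) * grad_logp
    return arch, r

for _ in range(100):
    controller_step()
```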

mAP(M):

To speed up the search, mAP(M) is estimated on a small mini-batch for efficiency.

c(M):

Enumerating every network in the search space to measure c(M) directly is infeasible, so a cost model, a linear regression model, is trained to estimate a candidate's inference cost.

The cost model’s features are composed of, for each layer, an indicator of the cross product between input/output channel sizes and layer type.

During the search, the regression model is used as a proxy for on-device latency.

To collect training data for the cost model, several thousand network architectures are randomly sampled from the search space, and each is benchmarked on the target device.
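A minimal sketch of this idea is shown below, assuming a hypothetical featurization (indicator counts over layer type crossed with input/output channel sizes) and a plain least-squares fit in place of whatever regression setup the authors actually used:

```python
import numpy as np

# Sketch of the latency cost model: featurize each sampled architecture as
# counts of (layer type, input channels, output channels) combinations and
# fit a linear model against measured on-device latency.
# The vocabularies below are hypothetical placeholders.

LAYER_TYPES = ["ibn", "fused_ibn", "tucker"]
CHANNELS = [32, 64, 96, 128]

def featurize(arch):
    """arch: list of (layer_type, c_in, c_out) tuples -> indicator counts."""
    dim = len(LAYER_TYPES) * len(CHANNELS) * len(CHANNELS)
    x = np.zeros(dim)
    for layer_type, c_in, c_out in arch:
        idx = (LAYER_TYPES.index(layer_type) * len(CHANNELS) * len(CHANNELS)
               + CHANNELS.index(c_in) * len(CHANNELS)
               + CHANNELS.index(c_out))
        x[idx] += 1.0
    return x

def fit_cost_model(archs, measured_latencies_ms):
    """Least-squares fit: predicted latency = w . featurize(arch)."""
    X = np.stack([featurize(a) for a in archs])
    y = np.asarray(measured_latencies_ms, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_cost(w, arch):
    return float(featurize(arch) @ w)
```

During the search, predict_cost-style estimates stand in for actual device benchmarks, which would otherwise dominate the search time.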

Experimental results:

