The core operations of ERFNet are residual connections and factorized convolutions (convolutions with 1D kernels).
Reported speed: 83 FPS on a single Titan X and 7 FPS on a Jetson TX1 (embedded GPU).
Factorized Residual Layers
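The non-bottleneck and bottleneck designs have a similar number of parameters and similar accuracy. The bottleneck design, however, needs less computation and has more layers and stronger nonlinearity, so it is the one more commonly adopted.
Strictly speaking, though, the non-bottleneck design is more accurate.
1D factorized convolutions cut the parameter count substantially (by 33%) while keeping accuracy close to that of 2D convolutions, and they also provide stronger nonlinearity.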
The non-bottleneck-1D (non-bt-1D) layer gets a direct 33% reduction in both of its convolutions, which greatly accelerates its execution.
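The 33% figure can be checked with simple weight counting. The sketch below is only an illustration under stated assumptions: the non-bottleneck layer is taken to be two 3x3 convolutions with equal input and output channels, each 3x3 convolution is replaced by a 3x1 followed by a 1x3 convolution, biases are ignored, and the channel width of 64 is an arbitrary example value. The function names are hypothetical and not taken from any ERFNet code.

```python
def conv2d_weights(channels_in, channels_out, kernel_h, kernel_w):
    """Number of weights in one 2D convolution (biases ignored)."""
    return channels_in * channels_out * kernel_h * kernel_w


def non_bt_weights(channels):
    """Assumed non-bottleneck layer: two full 3x3 convolutions."""
    return 2 * conv2d_weights(channels, channels, 3, 3)


def non_bt_1d_weights(channels):
    """Assumed non-bt-1D layer: each 3x3 becomes a 3x1 followed by a 1x3."""
    factorized_pair = (conv2d_weights(channels, channels, 3, 1)
                       + conv2d_weights(channels, channels, 1, 3))
    return 2 * factorized_pair


channels = 64  # illustrative width, not a value from the notes
full = non_bt_weights(channels)
factorized = non_bt_1d_weights(channels)
reduction = 1 - factorized / full

print(f"non-bt:    {full} weights")
print(f"non-bt-1D: {factorized} weights")
print(f"reduction: {reduction:.1%}")
```

The ratio does not depend on the channel width: a 3x3 kernel has 9 weights per channel pair, the 3x1 plus 1x3 pair has 6, so the saving is always 3/9.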
Downsampling (reducing the spatial resolution) has the drawback of reducing pixel precision (coarser outputs), but it also has two benefits: it lets the deeper layers gather more context (to improve classification) and it helps to reduce computation.
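Both benefits can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes a hypothetical 512x1024 input, a fixed width of 64 channels, and a stride-1 3x3 convolution applied after the resolution has been halved a given number of times. The "footprint" column is only an approximation of how many input pixels one 3x3 kernel spans along each axis, and it ignores the receptive field of the downsampling layers themselves.

```python
def conv_macs(height, width, channels_in, channels_out, kernel_h=3, kernel_w=3):
    """Multiply-accumulate count of one stride-1, same-padded 2D convolution."""
    return height * width * channels_in * channels_out * kernel_h * kernel_w


def downsampled_size(height, width, times):
    """Spatial size after halving the resolution `times` times."""
    factor = 2 ** times
    return height // factor, width // factor


# Hypothetical input resolution and channel width, chosen only for illustration.
HEIGHT, WIDTH, CHANNELS = 512, 1024, 64

baseline = conv_macs(HEIGHT, WIDTH, CHANNELS, CHANNELS)
rows = []
for times in range(4):
    h, w = downsampled_size(HEIGHT, WIDTH, times)
    macs = conv_macs(h, w, CHANNELS, CHANNELS)
    # Approximate span of a 3x3 kernel measured in input pixels per axis.
    footprint = 3 * 2 ** times
    rows.append((times, h, w, baseline // macs, footprint))
    print(f"halved {times}x: {h}x{w}, {baseline // macs}x fewer MACs, "
          f"kernel spans about {footprint} input pixels")
```

Each halving divides the cost of a same-width convolution by four and doubles the approximate span of the kernel. In practice networks usually widen the channels as they downsample, which offsets part of the saving.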
Architecture design
Result
Cityscapes