Work in progress; this post is updated continuously.

Figure 1. SSD architecture diagram

Overall architecture
The SSD approach is based on a feed-forward convolutional network that produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step to produce the final detections. The early network layers are based on a standard architecture used for high quality image classification (truncated before any classification layers), which we will call the base network.
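
The non-maximum suppression step mentioned above can be illustrated with a small NumPy sketch. This is only a sketch under assumptions: it uses the common greedy variant, boxes are given as `[x1, y1, x2, y2]`, and the function names `iou` and `nms` as well as the threshold of 0.5 are hypothetical choices made for illustration, not values taken from the text.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an array of boxes.

    All boxes are [x1, y1, x2, y2] (an assumed layout for this sketch).
    """
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    # Clip at zero so non-overlapping boxes get an empty intersection.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        # Only boxes that overlap the kept box weakly survive to the next round.
        order = rest[overlaps <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10],
                  [1, 1, 11, 11],
                  [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

In this toy input the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap anything and is kept.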
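The SSD method is based on a feed-forward convolutional network that outputs a fixed-size collection of bounding boxes together with scores for the object instances present in those boxes, followed (translator's note: after the feed-forward convolutional network) by a non-maximum suppression step that produces the final detections (translator's note: the detection results). The early network layers are based on a standard architecture used for high-quality image classification (truncated before the classification layers), which we call the base network.

Multi-scale feature maps for detection. We add convolutional feature layers to the end of the truncated base network. These layers decrease in size progressively and allow predictions of detections at multiple scales. The convolutional model for predicting detections is different for each feature layer (cf. Overfeat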

Multi-scale feature maps for detection
We add convolutional feature layers after the truncated base network. These layers progressively decrease in size and allow detections to be predicted at multiple scales. The convolutional model used to predict detections is different for each feature layer (cf. Overfeat

Convolutional predictors for detection. Each added feature layer (or optionally an existing feature layer from the base network) can produce a fixed set of detection predictions using a set of convolutional filters. These are indicated on top of the SSD network architecture in Fig. 2. For a feature layer of size m × n with p channels, the basic element for predicting parameters of a potential detection is a 3 × 3 × p small kernel that produces either a score for a category, or a shape offset relative to the default box coord
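
To make the shapes concrete, here is a rough NumPy sketch of applying a set of 3 × 3 × p kernels to an m × n feature layer with p channels, so that each kernel yields one predicted value (a score or an offset) at every location. The helper name `conv3x3_same`, the zero padding with stride 1, the random weights, and the sizes `m`, `n`, `p`, and `num_outputs` are all assumptions chosen for illustration; the text does not specify them.

```python
import numpy as np

def conv3x3_same(feature_map, kernels):
    """Apply 3x3xp kernels at every location of an m x n x p feature map.

    feature_map: array of shape (m, n, p)
    kernels:     array of shape (f, 3, 3, p), one kernel per predicted value
    returns:     array of shape (m, n, f)
    """
    m, n, p = feature_map.shape
    f = kernels.shape[0]
    # Zero padding of 1 keeps the output at m x n (an assumption of this sketch).
    padded = np.pad(feature_map, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((m, n, f))
    for i in range(m):
        for j in range(n):
            patch = padded[i:i + 3, j:j + 3, :]  # shape (3, 3, p)
            # Each kernel reduces the 3x3xp patch to a single number.
            out[i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
m, n, p = 5, 4, 8        # hypothetical feature-layer size
num_outputs = 6          # hypothetical number of predicted values per location
features = rng.standard_normal((m, n, p))
kernels = rng.standard_normal((num_outputs, 3, 3, p))
predictions = conv3x3_same(features, kernels)
print(predictions.shape)  # (5, 4, 6)
```

The explicit loops are slow but keep the correspondence visible: one kernel, one location, one output value. A deep learning framework would compute the same thing with a single convolution layer.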