SSD input image size and aspect ratio

Damon_Code

Category: AI. Tags: SSD, input image aspect ratio and size

Article link: https://blog.csdn.net/u014710355/article/details/104198695


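SSD typically uses VGG16 for feature extraction, followed by prediction layers (described here as flatten + fully connected; the original SSD paper uses small convolutional predictors on several feature maps).

The default SSD input sizes are 300*300 or 512*512.

In practice, images usually do not have this aspect ratio or size. What can be done?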

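1. Aspect ratio first

SSD takes a square input. If the image is rectangular, the model resizes it automatically, which distorts the image and hurts accuracy. The options are:

A. Let the model resize the image and accept the loss in accuracy.

B. Split the image into several square images. Some features may end up cut across multiple tiles.

C. Pad the image. For example, pad a 300*900 image with white on both sides to get a 900*900 square. The cost is that objects become smaller relative to the whole image, which hurts detection of small features.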
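Option C can be sketched roughly as follows. This is a minimal sketch, assuming images are NumPy arrays in H x W x C layout and boxes are pixel coordinates in [xmin, ymin, xmax, ymax] order; the function names pad_to_square and unpad_boxes are hypothetical and not part of any SSD library.

```python
import numpy as np


def pad_to_square(image, boxes, fill=255):
    """Pad an image to a square canvas, centering the original content.

    image: NumPy array of shape (H, W) or (H, W, C).
    boxes: iterable of [xmin, ymin, xmax, ymax] in pixel coordinates.
    Returns the padded image, the shifted boxes, and the (dx, dy) offset.
    """
    h, w = image.shape[:2]
    side = max(h, w)
    dx = (side - w) // 2  # horizontal padding on the left
    dy = (side - h) // 2  # vertical padding on the top
    canvas = np.full((side, side) + image.shape[2:], fill, dtype=image.dtype)
    canvas[dy:dy + h, dx:dx + w] = image
    shift = np.array([dx, dy, dx, dy], dtype=np.float32)
    shifted = np.asarray(boxes, dtype=np.float32).reshape(-1, 4) + shift
    return canvas, shifted, (dx, dy)


def unpad_boxes(boxes, offset):
    """Map boxes from the padded image back to the original image."""
    dx, dy = offset
    shift = np.array([dx, dy, dx, dy], dtype=np.float32)
    return np.asarray(boxes, dtype=np.float32).reshape(-1, 4) - shift


# A 300 (wide) x 900 (high) image padded with white to 900 x 900.
image = np.zeros((900, 300, 3), dtype=np.uint8)
padded, new_boxes, offset = pad_to_square(image, [[10, 20, 110, 220]])
print(padded.shape, new_boxes.tolist(), offset)
```

If the padded image is then resized to the network input size, the boxes need to be scaled by the same factor before training, and predictions need the inverse scaling followed by unpad_boxes.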

Reference:

You have several options from then on:

Just letting TF reshape the input to (w, h) with the resizer, without preprocessing. The problem is that the images will be deformed, which may (or not, depending on your data and the objects you're trying to detect) be a problem.
Cropping all the images to have sub-images with the same aspect ratio as (w, h). Problem: you'll lose part of the images or have to do more inferences for each image.
Padding all images (with black pixels or random white noise) to get images with the same aspect ratio as (w, h). You'll have to do some coordinate translations on the output bounding boxes (the coordinates you'll get will be in the augmented image, you'll have to translate to initial coordinates by multiplying them by old_size/new_size on both axes). The problem is that some objects will be downsized (relatively to the full image size) more than some others, which may or may not be a problem depending on your data and what you're trying to detect.

Conclusion: it is best to make the image square, otherwise it will be distorted. Method C is recommended.

Reference: https://stackoverflow.com/questions/48145456/tensorflow-object-detection-api-ssd-model-using-keep-aspect-ratio-resizer?rq=1

In addition,

SSD and faster R-CNN work quite differently one from another, so, even though F-RCNN has no such constraint, for SSD you need input images that always have the same size (actually you need the feature map to always have the same size, but the best way to ensure it is with always the same input size). This is because it ends with fully connected layers, for which you need to know the size of the feature maps; whereas for F-RCNN there are only convolutions (which work on any input size) up to the ROI-pooling layer (which only doesnt need a fixed image size).

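2. Image size

After the processing in step 1 the image is square. Do we still need to preprocess the input images to 300*300 or 512*512?

When reading data, the model automatically resizes images to 300*300 or 512*512, so manual resizing is optional.

See the linked article for details:

"Reason 6: SSD fixes the input image size. It crops images of different sizes to 300x300 or 512x512. Compared with Faster-rcnn, this alone saves a lot of computation at the input, never mind the later stages. No wonder it is fast!"

Alternatively, you can resize the image directly in step 1. Remember to update the coordinates in the label files accordingly.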
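Updating the label coordinates after a manual resize could look like the sketch below. It assumes labels are pixel-coordinate boxes in [xmin, ymin, xmax, ymax] order and sizes are given as (width, height); scale_boxes is a hypothetical helper name, and the real label file format (for example Pascal VOC XML) would need its own parsing.

```python
def scale_boxes(boxes, old_size, new_size):
    """Scale pixel boxes from an image of old_size to one of new_size.

    boxes: list of [xmin, ymin, xmax, ymax].
    old_size, new_size: (width, height) tuples.
    """
    old_w, old_h = old_size
    new_w, new_h = new_size
    return [
        [
            xmin * new_w / old_w,
            ymin * new_h / old_h,
            xmax * new_w / old_w,
            ymax * new_h / old_h,
        ]
        for xmin, ymin, xmax, ymax in boxes
    ]


# A box in a 900 x 900 padded image, after resizing the image to 300 x 300.
print(scale_boxes([[300, 0, 600, 900]], (900, 900), (300, 300)))
```

Labels stored as normalized coordinates (values in [0, 1]) do not change under a plain resize, so this step only applies to pixel-coordinate labels.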

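3. Changing to an SSD N*N model

What if I do not want 300*300 or 512*512?

Change it to, for example, SSD640.

Enlarging the model input is not recommended, because it produces more bounding boxes (BB), as the following quote explains:

As the figure in the original article shows, when Faster-rcnn has an input resolution of 1000x600, it produces 6000 BBs. When SSD300 has an input resolution of 300x300, it produces 8732 BBs. When SSD512 has an input resolution of 512x512, it produces 24564 BBs. Imagine how many BBs SSD would produce at a resolution of 1000x600. The number could be very large. Yet SSD claims to be much faster than Faster-rcnn, YOLO, and similar algorithms, so let us analyze why.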
————————————————
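Copyright notice: this is an original article by CSDN blogger 「技术挖掘者」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/WZZ18191171661/article/details/79444217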
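The box counts quoted above can be reproduced with simple arithmetic. The sketch below assumes the feature map sizes and boxes-per-location settings of the original SSD paper's SSD300 and SSD512 configurations; count_default_boxes is a hypothetical helper name.

```python
def count_default_boxes(feature_map_sizes, boxes_per_location):
    """Total default boxes: sum over layers of (size * size * boxes per cell)."""
    return sum(s * s * b for s, b in zip(feature_map_sizes, boxes_per_location))


# Assumed configurations from the original SSD paper.
SSD300_MAPS = [38, 19, 10, 5, 3, 1]
SSD300_BOXES = [4, 6, 6, 6, 4, 4]
SSD512_MAPS = [64, 32, 16, 8, 4, 2, 1]
SSD512_BOXES = [4, 6, 6, 6, 6, 4, 4]

print(count_default_boxes(SSD300_MAPS, SSD300_BOXES))  # 8732
print(count_default_boxes(SSD512_MAPS, SSD512_BOXES))  # 24564
```

Because the count grows with the square of each feature map size, a larger input such as 640*640 increases the number of boxes quickly, which is the concern raised above.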

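4. Improvements for small objects

Problems with SSD:
SSD's weakness is that it still recognizes small objects relatively poorly and does not reach the level of Faster R-CNN. This is mainly because small objects are mostly trained with anchors from the lower layers (small objects have a larger IoU with lower-layer anchors), and lower-layer features are not nonlinear enough to be trained to sufficient accuracy.

Personal opinion: whether SSD is good depends on your application and requirements. To find the detection algorithm that really fits your scenario, you need to validate performance yourself. For example, if your scene is dense and contains many small objects, I strongly suggest Faster-rcnn, which can also be sped up by optimizing for a specific network. If your application has strict speed requirements, SSD is the first choice. Evaluation results on benchmark test sets still differ considerably from real data, so the algorithm's performance needs further evaluation.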
————————————————
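Copyright notice: this is an original article by CSDN blogger 「技术挖掘者」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/WZZ18191171661/article/details/79444217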
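To illustrate why small objects fall to the lower layers, the sketch below computes default box scales per layer with the linear rule from the original SSD paper, s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1). The values s_min = 0.2 and s_max = 0.9 are the paper's defaults; actual implementations often adjust them, so treat the numbers as an assumption. default_box_scales is a hypothetical helper name.

```python
def default_box_scales(num_layers=6, s_min=0.2, s_max=0.9):
    """Default box scale (relative to the input size) for each feature layer."""
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


scales = default_box_scales()
for layer, scale in enumerate(scales, start=1):
    # Approximate box side in pixels for a 300 x 300 input.
    print(f"layer {layer}: scale {scale:.2f}, about {scale * 300:.0f} px")
```

Under these assumptions only the first (lowest) layer has boxes near 60 px on a 300*300 input, so objects smaller than that match few anchors, and those anchors sit on the shallowest features.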

A. An improved SSD: RSSD