[Deep Learning Paper Notes][Image Classification] Human Performance

Russakovsky, Olga, et al. “Imagenet large scale visual recognition challenge.” International Journal of Computer Vision 115.3 (2015): 211-252. (Citations: 1352).


1 Errors Both CNNs and Humans Are Susceptible to

1.1 Multiple Objects

Both CNNs and humans struggle with images that contain multiple ILSVRC classes (often many more than five), with little indication of which object is the focus of the image.
See the first column of Fig. 3.12.


We attribute 24% of GoogLeNet errors and 16% of human errors to this category. Humans have a slight advantage on this error type, since it is sometimes easy for them to identify the most salient object in the image.
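All of these error percentages are measured against ILSVRC's top-5 evaluation: a prediction counts as correct if the ground-truth class appears anywhere among the classifier's five highest-scoring guesses. A minimal sketch of the metric, with made-up labels and predictions:

```python
def top5_error(predictions, ground_truth):
    """Fraction of images whose true class is missing from the top-5 guesses.

    predictions: list of top-5 class-id lists, one per image.
    ground_truth: list of true class ids, one per image.
    """
    misses = sum(1 for top5, gt in zip(predictions, ground_truth)
                 if gt not in top5)
    return misses / len(ground_truth)

# Toy example: 4 images; the last two predictions miss the true class.
preds = [[1, 2, 3, 4, 5], [7, 1, 9, 3, 2], [0, 8, 6, 5, 4], [2, 3, 4, 5, 6]]
truth = [3, 7, 9, 1]
print(top5_error(preds, truth))  # 0.5
```

The top-5 criterion is exactly why images with many valid objects (column one of Fig. 3.12) hurt both humans and CNNs: when more than five ILSVRC classes are plausibly present, even a sensible top-5 list can miss the one class the annotator picked.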




1.2 Incorrect Annotations
We found that approximately 0.3% of the images were incorrectly annotated in the ground truth. This introduces roughly the same number of errors for both humans and GoogLeNet.

2 Errors CNN Is More Susceptible to Than Humans
2.1 Object Small or Thin
Examples include a standing person wearing sunglasses, or a small ant on the stem of a flower. We found that 21% of GoogLeNet errors fall into this category, while none of the human errors do. See the fourth column of Fig. 3.12.

2.2 Image Filters
Many people enhance their photos with filters that distort the contrast and color distributions of the image. We found that 13% of the images that GoogLeNet incorrectly classified contained a filter. See the third column of Fig. 3.12.

2.3 Abstract Representations
GoogLeNet struggles with images that depict objects of interest in an abstract form, such as 3D-rendered images, paintings, sketches, plush toys, or statues. We attribute approximately 6% of GoogLeNet errors to this type. See the fifth column of Fig. 3.12. 


2.4 Miscellaneous Sources
These include extreme close-ups of parts of an object, unconventional viewpoints, and objects with heavy occlusion. See the second column of Fig. 3.12.


3 Errors Humans Are More Susceptible to Than CNN
3.1 Fine-Grained Recognition
Humans are noticeably worse at fine-grained recognition (e.g., distinguishing among the many dog breeds in ILSVRC), even when the objects are in clear view. We estimate that 37% of human errors fall into this category, while only 7% of GoogLeNet errors do. See the last column of Fig. 3.12.


3.2 Class Unawareness 
The annotator may simply be unaware that the ground-truth class is present as a label option. Approximately 24% of human errors fall into this category.

3.3 Insufficient Training Data
The annotator is shown only 13 example images of each class under its category name, which is often too few to learn an unfamiliar class. Approximately 5% of human errors fall into this category.

4 Conclusions
Human accuracy is not a single point; it lives on a tradeoff curve between effort and error rate. It is clear that humans will soon only be able to outperform state-of-the-art image classification models through significant effort, expertise, and time.
