[Deep Learning Paper Notes][Image Classification] Human Performance

Russakovsky, Olga, et al. “Imagenet large scale visual recognition challenge.” International Journal of Computer Vision 115.3 (2015): 211-252. (Citations: 1352).


1 Errors Both CNNs and Humans Are Susceptible to

1.1 Multiple Objects

Both CNNs and humans struggle with images that contain multiple ILSVRC classes (often many more than five), with little indication of which object is the focus of the image.
See the first column of Fig. 3.12.


We attribute 24% of GoogLeNet errors and 16% of human errors to this category. Humans have a slight advantage on this error type, since it is sometimes easy for them to identify the most salient object in the image.
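All of these error percentages are measured against ILSVRC's top-5 evaluation: a prediction counts as correct if the ground-truth class appears anywhere among the classifier's five highest-scoring guesses. A minimal sketch of the metric, with made-up labels and predictions:

```python
def top5_error(predictions, ground_truth):
    """Fraction of images whose true class is missing from the top-5 guesses.

    predictions: list of top-5 class-id lists, one per image.
    ground_truth: list of true class ids, one per image.
    """
    misses = sum(1 for top5, gt in zip(predictions, ground_truth)
                 if gt not in top5)
    return misses / len(ground_truth)

# Toy example: 4 images; the last two predictions miss the true class.
preds = [[1, 2, 3, 4, 5], [7, 1, 9, 3, 2], [0, 8, 6, 5, 4], [2, 3, 4, 5, 6]]
truth = [3, 7, 9, 1]
print(top5_error(preds, truth))  # 0.5
```

The top-5 criterion is exactly why images with many valid objects (column one of Fig. 3.12) hurt both humans and CNNs: when more than five ILSVRC classes are plausibly present, even a sensible top-5 list can miss the one class the annotator picked.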




1.2 Incorrect Annotations
We found that approximately 0.3% of the images were incorrectly annotated in the ground truth. This introduces roughly the same number of errors for both humans and GoogLeNet.

2 Errors CNN Is More Susceptible to Than Humans
2.1 Object Small or Thin
Examples include a standing person wearing sunglasses, or a small ant on the stem of a flower. We found that 21% of GoogLeNet errors fall into this category, while none of the human errors do. See the fourth column of Fig. 3.12.

2.2 Image Filters
Many people enhance their photos with filters that distort the contrast and color distributions of the image. We found that 13% of the images that GoogLeNet incorrectly classified contained a filter. See the third column of Fig. 3.12.

2.3 Abstract Representations
GoogLeNet struggles with images that depict objects of interest in an abstract form, such as 3D-rendered images, paintings, sketches, plush toys, or statues. We attribute approximately 6% of GoogLeNet errors to this type. See the fifth column of Fig. 3.12. 


2.4 Miscellaneous Sources
These include extreme close-ups of parts of an object, unconventional viewpoints, and objects with heavy occlusion. See the second column of Fig. 3.12.


3 Errors Humans Are More Susceptible to Than CNN
3.1 Fine-Grained Recognition
Humans are noticeably worse at fine-grained recognition (e.g., distinguishing among the many dog breeds in ILSVRC), even when the objects are in clear view. We estimate that 37% of human errors fall into this category, while only 7% of GoogLeNet errors do. See the last column of Fig. 3.12.


3.2 Class Unawareness 
The annotator may simply be unaware that the ground-truth class is present as a label option. Approximately 24% of human errors fall into this category.

3.3 Insufficient Training Data
The annotator is shown only 13 example images of each class under its category name, which is often too few to learn an unfamiliar class. Approximately 5% of human errors fall into this category.

4 Conclusions
Human accuracy is not a single point; it lives on a tradeoff curve between effort and error rate. It is clear that humans will soon only be able to outperform state-of-the-art image classification models through significant effort, expertise, and time.
