We Need to Rethink Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have shown impressive state-of-the-art performance on multiple standard datasets, and no doubt they have been instrumental in the development and research acceleration around the field of image processing.

There’s one problem: they kind of suck.

Researchers often have a problem of getting too wrapped up in the closed world of theory and perfect datasets. Unfortunately, chasing extra fractions of a percentage point of accuracy is actually counterproductive to where image processing really gets used: the real world. When algorithms and methods are designed with the noiseless and perfectly predictable world of a dataset in mind, they may very well perform poorly in the real world.

This has certainly been shown to be the case. Convolutional neural networks are especially prone to 'adversarial' inputs: small changes to the input that, unintentionally or intentionally, confuse the neural network.
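
As a concrete illustration of how little it takes to flip a prediction, below is a minimal sketch of the classic fast gradient sign method (FGSM) in PyTorch. This is a digital perturbation rather than the physical attacks described next, and `model`, `image`, and `label` are assumed placeholders, not anything from the article.

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=0.01):
    """Return `image` nudged by eps in the direction that most increases the loss."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # A perturbation invisible to humans can be enough to change the predicted class.
    return (image + eps * image.grad.sign()).clamp(0, 1).detach()
```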

Recently, in 2020, the cybersecurity firm McAfee showed that Mobileye, the car intelligence system used by Tesla and other auto manufacturers, could be fooled into accelerating 50 MPH over the speed limit just by sticking a two-inch-wide strip of black tape onto a speed limit sign.

Researchers from four universities, including the University of Washington and UC Berkeley, discovered that road sign recognition models were completely fooled by a bit of spray paint or a few stickers on stop signs, alterations that look entirely natural and non-malicious.

Even more importantly, convolutional neural networks are really bad at generalizing across shifts and rotations in images, not to mention different angles of three-dimensional viewing.

[Image] Source: Evtimov et al. Image free to share.

To understand why convolutional neural networks have so much trouble generalizing across perspectives and angles, one must first understand why a convolutional neural network works at all — and what is so special about convolutional and pooling layers.

Since convolutional layers apply the same filter across the entire image (the filter can be thought of as a 'feature detector' that looks for lines or other features), detection is translation invariant: it is not affected by translation. Regardless of whether an object appears in the top left or the bottom right, it will be detected, because the filters sweep across the entire image. Pooling helps 'summarize' the findings within each region for further smoothing. With convolutional and pooling layers, objects can still be detected in different regions, or even with slight tilting and scaling.

[Image] HackerNoon. Image free to share.
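
To make that concrete, here is a minimal NumPy sketch (my own illustration, not code from the article) of a single filter sliding over an image. The same tiny edge pattern produces the same pooled response wherever it appears:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation: slide the kernel over every position of the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny vertical-edge "feature detector"
kernel = np.array([[1., -1.],
                   [1., -1.]])

# The same 2x2 edge pattern, placed top-left versus bottom-right
img_a = np.zeros((6, 6)); img_a[0:2, 0:2] = [[1, 0], [1, 0]]
img_b = np.zeros((6, 6)); img_b[4:6, 4:6] = [[1, 0], [1, 0]]

# Global max pooling over each feature map gives the same detection strength
# for both positions: translation does not hurt detection.
print(conv2d(img_a, kernel).max())  # 2.0
print(conv2d(img_b, kernel).max())  # 2.0
```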

On the other hand, filters wouldn’t be able to capture scaling. The red box represents the filter that produces a high activation when it comes across a bird. In a scaled image, no position of the filter can produce a high activation since the filter’s size is limited.

[Image] Pixabay.

The same applies to rotation. A filter is simply a matrix of weights that produces a high activation when certain pixels take particular values relative to the pixels around them. Because filters are fixed and scan the image from top to bottom and left to right, they do not recognize objects that are not viewed from the same axis orientation.

[Image] Pixabay.
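
Continuing the toy sketch from above (again only an illustration, under the same assumptions), rotating the pattern by 90 degrees makes the very same vertical-edge filter respond noticeably more weakly:

```python
import numpy as np
from scipy.signal import correlate2d

kernel = np.array([[1., -1.],
                   [1., -1.]])          # vertical-edge detector

upright = np.zeros((6, 6)); upright[0:2, 0:2] = [[1, 0], [1, 0]]
rotated = np.zeros((6, 6)); rotated[0:2, 0:2] = np.rot90([[1, 0], [1, 0]])

# The rotated version of the very same pattern no longer matches the filter.
print(correlate2d(upright, kernel, mode='valid').max())  # 2.0
print(correlate2d(rotated, kernel, mode='valid').max())  # 1.0, a much weaker response
```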

The standard way to deal with this is data augmentation, but it is not a great solution either. The convolutional neural network simply memorizes that an object can also appear in that approximate orientation and size, instead of necessarily generalizing to all viewpoints. It is practically infeasible to expose the network to all viewpoints of all objects.
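
For reference, a typical augmentation pipeline looks something like the torchvision sketch below (the specific transforms and ranges are arbitrary illustrative choices). It only teaches the network the particular shifts, flips, and rotations it is shown; it does not make the architecture itself viewpoint-invariant.

```python
from torchvision import transforms

# Randomly rotate, shift, and rescale each training image a little,
# and flip half of them horizontally, before converting to a tensor.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```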

Another method to deal with this is by using higher-dimensional maps, which are of course incredibly inefficient.

Geoff Hinton describes CNNs as trying to model invariance: neural activities are pooled, or smoothed, so that they are unaffected by small changes. 'This is', he writes, 'the wrong goal. It is motivated by the fact that the final label needs to be viewpoint-invariant.' Instead, he proposes aiming for equivariance: neural activities should change along with changes in viewpoint. The weights, not the pooled activations, should code the invariant knowledge of a shape.
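
The distinction shows up even in the toy edge-detector setting used earlier (again my own hand-rolled illustration): when the input shifts, the peak in the feature map shifts with it (equivariance), while the pooled maximum stays exactly the same (invariance).

```python
import numpy as np
from scipy.signal import correlate2d

kernel = np.array([[1., -1.],
                   [1., -1.]])
img = np.zeros((6, 6)); img[1:3, 1:3] = [[1, 0], [1, 0]]
shifted = np.roll(img, shift=2, axis=1)          # move the pattern 2 px to the right

fmap = correlate2d(img, kernel, mode='valid')
fmap_shifted = correlate2d(shifted, kernel, mode='valid')

print(np.argmax(fmap), np.argmax(fmap_shifted))  # peak location moves: equivariant
print(fmap.max(), fmap_shifted.max())            # pooled value is identical: invariant
```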

Additionally, CNNs parse an image as one whole entity, not as a composition of several objects. There is no explicit representation of different entities and their relationships, which means the network is not robust to objects it has not seen before. It also means CNNs take a brute-force approach to image recognition, memorizing richer and more detailed per-pixel representations of an image instead of looking at its parts (e.g. thin tires + frame + handlebars = bike).

Much of this is because convolutional neural networks don’t recognize images like humans do. Yes, under the perfect environments of standard datasets or with simple tasks where rotation and shift are not common or important, CNNs perform well. However, as we move to demand more from image processing, we need an update.

One approach to addressing the invariance problem is the Spatial Transformer, which learns to define the axes and image bounds before prediction. This can help correct scaling (first row) and rotation (second row) distortions, as well as noise (third row), through attention mechanisms.

[Image] Spatial Transformer Networks. Image free to share.
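
A minimal sketch of such a layer in PyTorch might look like the following. The localization network here is an arbitrary small CNN for 1-channel 28x28 inputs, chosen purely for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        # Localization network: predicts 6 affine parameters from the image.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 10, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Start as the identity transform so the warp is learned from scratch.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                   # per-image affine matrix
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)   # warped ("corrected") input

x = torch.randn(4, 1, 28, 28)
print(SpatialTransformer()(x).shape)  # torch.Size([4, 1, 28, 28])
```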

In fact, it can undo complex distortions, which is incredibly valuable given that real three-dimensional viewpoints involve far more than rotation and scaling transformations.

[Image] Spatial Transformer Networks. Image free to share.

Several other architectures, such as the Scale-invariant Convolutional Neural Network (SiCNN), have recently been proposed.

More famously, Geoff Hinton proposes the capsule network, which explicitly builds in the idea of recognizing individual parts through hierarchies, which he argues is the natural, human way of recognizing objects.

Hinton points out that the task of computer vision is really inverse computer graphics. Graphics programs use hierarchical models that compute spatial structure from position-invariant matrices; a change of viewpoint is simply a matrix multiplication.
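
As a toy numerical illustration of that claim (my own example, in 2D homogeneous coordinates, not something from the article), the matrix relating a part to the whole stays fixed no matter which viewpoint matrix the scene is multiplied by:

```python
import numpy as np

def transform(tx, ty, theta):
    """2D rigid transform (rotation + translation) in homogeneous coordinates."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0., 0., 1.]])

# Pose of a whole object in the scene, and the FIXED part-to-whole
# relationship (say, where a wheel sits on a bike).
whole_pose    = transform(5.0, 2.0, 0.3)
part_in_whole = transform(1.0, 0.5, 0.0)   # this matrix is viewpoint-invariant

# The part's pose in the scene is just a matrix product.
part_pose = whole_pose @ part_in_whole

# Changing the camera viewpoint multiplies every pose by the same matrix,
# so the part-to-whole relationship itself never changes.
view = transform(-2.0, 1.0, -0.7)
assert np.allclose(np.linalg.inv(view @ whole_pose) @ (view @ part_pose),
                   part_in_whole)
print("part-to-whole matrix unchanged under a viewpoint change")
```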

Thus, it should be the goal of an image recognition network to find the connection between a viewpoint representation and the ‘intrinsic’ object representation, which is the same regardless of viewpoint.

Each capsule is assigned to one of these intrinsic objects and recognizes it regardless of the angle it is viewed from, because the model is forced to learn how the object's features vary. This leads to better extrapolation, which means that image models begin to truly generalize across all camera viewpoints.

Capsules encode spatial information and only proceed with 'routing by agreement': lower-level features like eyes, noses, and lips are only sent up to a higher-level capsule when their outputs agree with one another.
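
A heavily simplified sketch of that routing step, in the spirit of the dynamic routing from Sabour et al.'s 2017 capsule paper (with made-up sizes and none of the surrounding network), might look like this:

```python
import torch

def squash(v, dim=-1):
    """Shrink vectors so their length lies in [0, 1) while keeping their direction."""
    n2 = (v ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1 + n2)) * v / torch.sqrt(n2 + 1e-8)

def route(u_hat, iterations=3):
    """u_hat: [num_lower, num_upper, dim] predictions from lower-level capsules."""
    b = torch.zeros(u_hat.shape[:2])               # routing logits
    for _ in range(iterations):
        c = b.softmax(dim=1).unsqueeze(-1)         # how much each lower capsule trusts each upper one
        s = (c * u_hat).sum(dim=0)                 # weighted vote per upper capsule
        v = squash(s)                              # [num_upper, dim] upper-capsule outputs
        b = b + (u_hat * v.unsqueeze(0)).sum(-1)   # strengthen routes whose predictions agree
    return v

u_hat = torch.randn(8, 3, 16)  # 8 lower capsules voting for 3 upper capsules
print(route(u_hat).shape)      # torch.Size([3, 16])
```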

Obviously, this is a completely different paradigm than convolutional neural networks. Perhaps, however, it is this shift in approaching image recognition that is necessary to move past the days of designing for datasets and instead towards implementing more intelligent and robust models that perform better on increasingly complex and in-demand real-world tasks.

Translated from: https://towardsdatascience.com/we-need-to-rethink-convolutional-neural-networks-ccad1ba5dc1c
