In 1998, Yann LeCun, Yoshua Bengio and colleagues popularised what is now one of the most widely used models in deep learning, the Convolutional Neural Network (CNN) [1]. Because convolutions operate on data with a simple grid-like topology, CNNs can tackle time-series problems with 1D convolutions or work on image data treated as a 2D grid. Fast forward to 2017, and Geoffrey Hinton presented his own improved take on convolutional networks: Capsule Networks [2]. A natural question arises: why replace something that already works so well? Well, maybe CNNs don't really work as well as they could. Here are a few reasons why:
Reason 1: The basic working of a Convolutional Neural Network is that each convolutional layer extracts features in sequence. The initial layers extract the most obvious, low-level features, and as the model gets deeper it learns to extract progressively more complex ones. A feature map, however, only records that a feature is present, not where it sits relative to other features. As a result, convolutional layers do not encode spatial relations.
Let me illustrate this with an example shown in Figure 1.
A convolutional network would recognise both of these images as faces.
CNNs do not encode pose and angular information.
As a result, CNNs would categorise this as a face as well. CNNs are also incapable of recognising orientation, and max pooling does not provide viewpoint invariance.
Example: smallNORB images (Figure 3) from different viewpoints. The dataset contains images of objects captured from many different angles, and the objective is to identify each object with a model trained on views of the same objects from other viewpoints. Capsule Networks trained on this dataset reduced the test error by roughly 45% compared to traditional CNN models. Because Capsule Networks have the added advantage of viewpoint invariance, they tend to classify these images of the same object in different orientations better.
Reason 2: Highly inefficient Pooling layers
In max pooling (Figure 4), a lot of important information is lost because only the most active neuron in each window is passed on to the next layer. This operation is the reason valuable spatial information gets lost between layers. To solve this issue, Hinton proposed a process called "routing-by-agreement": lower-level features (fingers, eyes, a mouth) only get sent to a higher-level capsule that matches their contents. If the features resemble an eye or a mouth, they get routed to a "face" capsule; if they contain fingers and a palm, they get sent to a "hand" capsule. This complete solution, which encodes spatial information into features while also using dynamic routing (routing by agreement), was presented by Geoffrey Hinton at NIPS 2017 as Capsule Networks.
What are capsules?
Capsule Networks are a new class of networks that rely more heavily on modelling hierarchical relationships within an image, mimicking the way the human brain is thought to learn. This is a very different approach from the one adopted by traditional neural networks.
A traditional neuron in a neural net performs the following scalar operations:
- Weighting of inputs
- Sum of the weighted inputs
- Nonlinearity (activation)
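For comparison, here is a minimal NumPy sketch of that scalar pipeline; the values and the choice of ReLU as the activation are purely illustrative:

```python
import numpy as np

def neuron_forward(x, w, b):
    """Classic artificial neuron: weight the inputs, sum them, apply a scalar nonlinearity."""
    z = np.dot(w, x) + b        # weighted sum of scalar inputs
    return np.maximum(z, 0.0)   # scalar activation (ReLU chosen here for illustration)

x = np.array([0.2, -1.3, 0.7])  # input features
w = np.array([0.5, 0.1, -0.4])  # learned weights
print(neuron_forward(x, w, b=0.1))
```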
A capsule, in contrast, performs the following steps:
- Matrix multiplication of input vectors with weight matrices. This encodes important spatial relationships between low-level features and high-level features within the image.
- Scalar weighting of the input vectors. These weights decide which higher-level capsule the current capsule will send its output to. This is done through a process of dynamic routing.
- Sum of the weighted input vectors.
- Nonlinearity using a "squash" function. This function takes a vector and "squashes" it so that its length falls between 0 and 1 while retaining its direction.
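Below is a minimal NumPy sketch of these four steps. The squash formulation follows the dynamic routing paper [2]; the shapes, variable names, and the uniform coupling coefficients (which dynamic routing, described next, would normally compute) are illustrative assumptions:

```python
import numpy as np

def squash(s, eps=1e-8):
    """Shrink a vector's length into [0, 1) while preserving its direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# 1. Prediction vectors: multiply each lower-level capsule output u_i
#    by a learned weight matrix W_ij for every higher-level capsule j.
num_lower, num_higher, d_in, d_out = 6, 3, 8, 16
u = np.random.randn(num_lower, d_in)                      # lower-level capsule outputs
W = np.random.randn(num_lower, num_higher, d_out, d_in)   # learned transformation matrices
u_hat = np.einsum('ijkl,il->ijk', W, u)                   # prediction vectors u_hat_{j|i}

# 2. Weight the predictions with coupling coefficients c_ij
#    (normally set by dynamic routing; uniform here for illustration).
c = np.full((num_lower, num_higher), 1.0 / num_higher)

# 3. Sum the weighted prediction vectors for each higher-level capsule.
s = np.einsum('ij,ijk->jk', c, u_hat)

# 4. Apply the squash nonlinearity to get the higher-level capsule outputs.
v = squash(s)
print(v.shape, np.linalg.norm(v, axis=-1))                # vector lengths lie in [0, 1)
```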
Dynamic Routing:
In this routing process, lower-level capsules send their output to the higher-level capsules that "agree" with it. For each higher-level capsule it could route to, a lower-level capsule computes a prediction vector by multiplying its own output by a weight matrix. If that prediction vector has a large scalar product with the output of a candidate higher-level capsule, there is top-down feedback that increases the coupling coefficient for that higher-level capsule and decreases it for the others.
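A rough sketch of this routing-by-agreement loop, in the spirit of the procedure described in [2], is given below; the helper names and shapes are illustrative, and the number of iterations is a hyperparameter:

```python
import numpy as np

def squash(s, eps=1e-8):
    n = np.sum(s ** 2, axis=-1, keepdims=True)
    return (n / (1.0 + n)) * s / np.sqrt(n + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement over prediction vectors u_hat of shape
    (num_lower, num_higher, dim_higher)."""
    num_lower, num_higher, _ = u_hat.shape
    b = np.zeros((num_lower, num_higher))          # routing logits, start neutral
    for _ in range(num_iters):
        c = softmax(b, axis=1)                     # coupling coefficients per lower capsule
        s = np.einsum('ij,ijk->jk', c, u_hat)      # weighted sum of predictions
        v = squash(s)                              # candidate higher-level capsule outputs
        # Agreement: a large dot product between a prediction and a higher-level
        # capsule's output strengthens that route (and, after the softmax
        # renormalises, weakens the others).
        b += np.einsum('ijk,jk->ij', u_hat, v)
    return v, c

v, c = dynamic_routing(np.random.randn(6, 3, 16))
print(v.shape, c.sum(axis=1))                      # couplings sum to 1 per lower capsule
```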
This brings us to using Capsules in the real world.
Can capsules effectively replace convolutional layers in other models, such as those for segmentation and object detection?
Let’s look at segmentation using Capsules.
We focus in particular on the SegCaps model [4], which is based on a U-Net, and on an application in the field of ophthalmology.
We can replace the convolutional layers with capsule layers and create a new architecture. The main advantage here is that we can encode spatial relations for semantic segmentation. The architecture is shown below (Figure 6).
Another advantage is the removal of heavy backbones, which gives a large decrease in the number of trainable parameters: a regular U-Net has 31.1M parameters, whereas SegCaps has only 1.3M. However, dynamic routing requires iterative updates, so there is some increase in computational complexity.
We test this model on the OCTAGON dataset [3] of OCT-A scans, segmenting the Foveal Avascular Zone (FAZ) [5].
Results
While the output images look very similar, in our observation Capsule Networks seem to be more sensitive to irregularities than a normal U-Net, while being a considerably lighter model and completely removing the need for a heavy backbone.
We compared our model's performance with a U-Net over a test set of 14 images, and the results are shown in Table 1. We use a common metric for evaluating segmentation models, the Dice coefficient (twice the area of overlap divided by the total number of pixels in both images). As seen in Table 1, the Dice coefficient for the Capsule Network is 0.84 compared to 0.70 for the CNN-based U-Net, indicating better results with Capsule Networks.
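For reference, a simple NumPy version of the Dice coefficient over binary masks might look like the following; the small smoothing term is an implementation detail we add to avoid division by zero:

```python
import numpy as np

def dice_coefficient(pred, target, smooth=1e-6):
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    pred = pred.astype(bool).ravel()
    target = target.astype(bool).ravel()
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + smooth) / (pred.sum() + target.sum() + smooth)
```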
Summary
Although Convolutional Neural Networks (CNNs) work satisfactorily, the architecture has a few limitations. Capsule Networks aim to solve these with a novel approach. Capsules and their ability to encode spatial relations could be the next big thing in computer vision, as they seem particularly well suited to pose-related models.
Further, with ever-growing model sizes and an ever-rising number of trainable parameters, it is necessary to limit this growth and to focus more on practicality and on staying within current hardware limits. Capsule Networks are a step in the right direction towards building truly intelligent deep learning models.
Please do visit our Apps & Demos: https://onestop.ai
For further information, please contact: info@algoanalytics.com
References

[1] Yann LeCun, "Object Recognition with Gradient-Based Learning": http://yann.lecun.com/exdb/publis/pdf/lecun-99.pdf
[2] Sara Sabour, Nicholas Frosst, and Geoffrey Hinton, "Dynamic Routing Between Capsules": https://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf
[3] OCTAGON dataset, VARPA group: http://www.varpa.es/research/ophtalmology.html
[4] Rodney LaLonde, "Capsules for Object Segmentation": https://arxiv.org/abs/1804.04241
[5] Bryan William Jones, "Foveal Avascular Zone": https://webvision.med.utah.edu/2011/08/foveal-avascular-zone/

Code available at: https://github.com/kevins99/SegCaps-Keras