Capsule Networks (Part 1)

_Summer tree

Article link: https://blog.csdn.net/NGUever15/article/details/105441682

Original source: 小样本学习与智能前沿 (Few-Shot Learning and Frontiers of Intelligence)

author: Sargur Srihari srihari@buffalo.edu

This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/CSE676

Limitations of Convolutional Networks

Convolutional Neural Networks

Source: https://hackernoon.com/what-is-a-capsnet-or-capsule-network-2bfbe48769cc

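They minimize computation compared with regular neural networks
Convolution greatly simplifies the computation without losing the essence of the data
They are good at image classification
They use the same knowledge at every image location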
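To make the computational saving concrete, here is a minimal sketch that counts weights for one layer. The 28x28 input and 3x3 kernel are illustrative assumptions, not values from the slides, and biases are ignored.

```python
def dense_weight_count(in_h, in_w, out_h, out_w):
    """Fully connected layer: every output unit has its own weight per input pixel."""
    return in_h * in_w * out_h * out_w

def conv_weight_count(kernel_h, kernel_w):
    """Convolution: one kernel is reused at every image location."""
    return kernel_h * kernel_w

# Assumed sizes: a 28x28 image mapped to a 26x26 output (3x3 kernel, no padding).
dense_weights = dense_weight_count(28, 28, 26, 26)
conv_weights = conv_weight_count(3, 3)
print(dense_weights, conv_weights)
```

The convolutional layer needs 9 weights where the dense layer needs more than half a million, because the same kernel is applied everywhere in the image.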

Processing Steps and Training for ConvNets

Given an input image, a set of kernels or filters scan it and perform the convolution operation.
This creates a feature map inside the network.
These features next pass through activation and pooling layers
• Activation layers, e.g., ReLU, induce nonlinearity
• Pooling (e.g., max pooling) helps in reducing the training time.
(Pooling summarizes each subregion, which provides invariance.)
At the end, the features pass through a sigmoid/softmax classifier.
Training is based on backpropagation of the error measured against labeled data.
(The nonlinearity also helps with the vanishing gradient problem.)
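The forward pass described above can be sketched in plain NumPy. This is only an illustration under assumptions: the image size, the number of filters, the 10 output classes and the random (untrained) weights are all made up here, and training by backpropagation is not shown.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding) to build one feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation layer: introduces the nonlinearity."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Summarize each non-overlapping size x size subregion by its maximum."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    """Turn classifier scores into probabilities."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.random((8, 8))                       # assumed 8x8 grayscale input
kernels = rng.standard_normal((4, 3, 3))         # 4 hypothetical 3x3 filters
feature_maps = np.stack([conv2d(image, k) for k in kernels])
pooled = np.stack([max_pool2d(relu(f)) for f in feature_maps])
weights = rng.standard_normal((10, pooled.size)) # hypothetical 10-class classifier
probs = softmax(weights @ pooled.ravel())
print(feature_maps.shape, pooled.shape, probs.shape)
```

Each stage maps onto one step in the list: convolution produces the feature maps, ReLU and max pooling transform them, and softmax turns the final scores into class probabilities.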

Pooling and Invariance

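Pooling is supposed to provide positional, orientational, proportional, or rotational invariance.
In the slide's example, every input value has changed, but only half of the output values have changed, because max pooling is sensitive only to the maximum value in a neighborhood, not to its exact location.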
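A small 1-D experiment reproduces this effect. The numbers below are made up for illustration, and the sketch assumes the input change is a one-position shift; the pooling window of width 3 with stride 1 is also an assumption.

```python
import numpy as np

def max_pool_1d(x, width=3):
    """Stride-1 max pooling over a 1-D array, without padding."""
    return np.array([x[i:i + width].max() for i in range(len(x) - width + 1)])

x = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.3])
x_shifted = np.roll(x, 1)  # same signal moved one position to the right

pooled = max_pool_1d(x)
pooled_shifted = max_pool_1d(x_shifted)

inputs_changed = int(np.sum(x != x_shifted))
outputs_changed = int(np.sum(pooled != pooled_shifted))
print(inputs_changed, "of", len(x), "inputs changed")
print(outputs_changed, "of", len(pooled), "pooled outputs changed")
```

All six input values differ after the shift, yet only two of the four pooled outputs differ, because the large value 1.0 still dominates most windows.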

Example of CNN Limitation

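A CNN that recognizes faces extracts features from the image

The result does not depend on the arrangement of those features: the CNN still recognizes a face even when the features are in the wrong positions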
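The limitation can be shown with a deliberately extreme sketch in which each feature map is reduced to its single strongest response (global max pooling). The four feature channels and their positions are hypothetical. Real CNNs pool locally over several layers and so keep some coarse position information, but the loss works in the same direction.

```python
import numpy as np

def global_max_pool(feature_maps):
    """Reduce each (H, W) feature map to its single strongest response."""
    return feature_maps.max(axis=(1, 2))

def make_maps(positions, size=8):
    """One feature map per part, with a single activation at the given (row, col)."""
    maps = np.zeros((len(positions), size, size))
    for channel, (row, col) in enumerate(positions):
        maps[channel, row, col] = 1.0
    return maps

# Hypothetical channels: left eye, right eye, nose, mouth.
face = make_maps([(2, 2), (2, 5), (4, 3), (6, 3)])       # plausible layout
scrambled = make_maps([(6, 1), (4, 6), (1, 3), (2, 4)])  # same parts, wrong places

layouts_differ = not np.array_equal(face, scrambled)
same_summary = np.array_equal(global_max_pool(face), global_max_pool(scrambled))
print(layouts_differ, same_summary)
```

Both layouts produce the same pooled vector, so a classifier that sees only that vector cannot tell a face from a scrambled collection of face parts.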

Motivation for CapsNets

CapsNets are an improvement on CNNs

They are the next version of CNNs
They address problems caused by max pooling and deep nets
These problems include the loss of information about feature order and orientation
Hinton: “The pooling operation used in CNNs is a big mistake and the fact that it works so well is a disaster”

Solution offered by CapsNets

Low-level features should also be arranged in a certain order for the object to be classified as a face
Order is determined during training, when the network learns not only what features to look for but also what their relationships to one another should be
Only an image whose features appear in that order is recognized as a face.

Visual Fixation

Human vision uses saccades

Irrelevant details are ignored by using a carefully determined sequence of fixation points
This ensures that only a tiny fraction of the optic array is ever processed at the highest resolution

We assume a single fixation will give us
• Much more than a single identified object and its properties
• We assume our multilayer visual system creates a parse tree on each fixation
• We ignore the coordination of parse trees over multiple fixations

Parse Tree of a Fixation

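For a single fixation,
a parse tree is carved out of a fixed multilayer neural network,
like a sculpture carved from a rock
Each layer is divided into many small groups of neurons called "capsules"
Each node in the parse tree corresponds to an active capsule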

Activation is a likelihood

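The activation level of a neuron can be interpreted as the likelihood of detecting a specific feature
A capsule is a group of neurons that captures not only the likelihood but also the parameters of a specific feature.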
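One way to let a single vector carry both pieces of information is the "squash" nonlinearity from Sabour, Frosst and Hinton's "Dynamic Routing Between Capsules" (2017). It is not described in these slides, so treat the following as an assumed illustration: the length of the output vector is read as the likelihood and its direction as the feature's parameters.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink a vector to length in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

# Two hypothetical 4-D capsule inputs: weak evidence and strong evidence.
weak = squash(np.array([0.1, 0.0, 0.0, 0.0]))
strong = squash(np.array([3.0, 4.0, 0.0, 0.0]))

weak_len = np.linalg.norm(weak)      # small: feature probably absent
strong_len = np.linalg.norm(strong)  # close to 1: feature probably present
strong_dir = strong / strong_len     # direction encodes the feature's parameters
print(weak_len, strong_len, strong_dir)
```

A short input vector is squashed to almost zero length and a long one to a length just below 1, while the direction is left unchanged.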

CNN versus CapsNets

Max pooling layers capture the important features of an image but lose the structural orientation of those features
A CNN only detects whether a feature is present, without regard to its position

CapsNets replace scalar-output feature detectors with vector-output capsules and max-pooling with routing-by-agreement.
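A minimal sketch of routing-by-agreement follows, using the dynamic routing procedure from Sabour, Frosst and Hinton (2017) as an assumed reference, since the slides up to this point do not give the algorithm. The prediction vectors `u_hat`, the capsule counts and the three iterations are illustrative choices, and the learned transformation matrices that would normally produce `u_hat` are omitted.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink vectors to length in [0, 1) while keeping their direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route_by_agreement(u_hat, num_iters=3):
    """u_hat[i, j] is lower capsule i's prediction for upper capsule j."""
    num_lower, num_upper, _ = u_hat.shape
    b = np.zeros((num_lower, num_upper))              # routing logits
    for _ in range(num_iters):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)          # coupling: softmax over upper capsules
        s = np.einsum('ij,ijd->jd', c, u_hat)         # weighted sum of predictions
        v = squash(s)                                 # upper capsule outputs
        b = b + np.einsum('ijd,jd->ij', u_hat, v)     # reward predictions that agree with v
    return v, c

# 4 lower capsules, 2 upper capsules, 2-D outputs (all values hypothetical).
u_hat = np.zeros((4, 2, 2))
u_hat[:, 0, :] = [[1, 0], [1, 0], [1, 0], [1, 0]]     # predictions agree
u_hat[:, 1, :] = [[1, 0], [-1, 0], [0, 1], [0, -1]]   # predictions conflict

v, c = route_by_agreement(u_hat)
print(np.linalg.norm(v, axis=1))  # output length of each upper capsule
print(c[:, 0])                    # share of each lower capsule routed to capsule 0
```

Unlike max pooling, which keeps only the strongest response, routing sends each lower capsule's output toward the upper capsule whose predictions agree. Here the first upper capsule ends up with a long output vector and most of the coupling, while the second stays inactive.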