Neural Networks and Deep Learning: Reading Notes (Chapter 6, Section 3)

EnEn1998

Category: Deep Learning. Tags: neural networks, machine learning, artificial intelligence

Article link: https://blog.csdn.net/weixin_39929275/article/details/108200642

Reading notes

Recent progress in image recognition
Other approaches to deep neural nets
Questions I did not understand

First edited on 2020.8.24 by EnEn
Revised on 2020.8.25 by EnEn

Recent progress in image recognition

This section describes the progress made in image recognition over the past decade, including:

The 2012 LRMD paper
The 2012 KSH paper
The 2014 ILSVRC competition

The 2012 LRMD paper

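LRMD (named after the last names of the first four authors) used a neural network to classify images from ImageNet. The data set contained $16$ million full-color images in $20$ thousand categories. The network reached an accuracy of $15.8\%$ (the previous best accuracy for ImageNet classification was $9.3\%$).
Example images from ImageNet (figure not reproduced here).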

The 2012 KSH paper

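KSH (Krizhevsky, Sutskever and Hinton) used a deep convolutional neural network (DCNN) to classify images from a restricted subset of ImageNet (taken from ILSVRC).
Training set: 1.2 million images, 1000 categories;
Validation set: 50,000 images, 1000 categories;
Test set: 150,000 images, 1000 categories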

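Difficulty 1:
A single image may contain several objects.
Solution:
The network outputs its 5 most likely classifications for each image (the top-5 criterion). If the true label is among those 5, the prediction counts as correct. Under this criterion the accuracy was $84.7\%$ (the next-best contest entry reached $73.8\%$). Under the more restrictive metric of getting the label exactly right, the accuracy was only $63.3\%$.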
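To make the top-5 criterion concrete, here is a minimal NumPy sketch. The function name `top_k_accuracy` and the example scores are my own illustration, not anything from the book or from KSH's code.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=5):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k largest scores in each row (their order does not matter).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 2 images over 6 classes.
scores = np.array([
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # true class 3: second-best guess
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 0: best guess
])
labels = np.array([3, 0])

top1 = top_k_accuracy(scores, labels, k=1)  # the stricter "exactly right" metric
top2 = top_k_accuracy(scores, labels, k=2)  # a looser metric, like top-5 on 1000 classes
```

With `k=1` only the second image counts as correct, while `k=2` accepts both, which is why the top-5 accuracy is always at least as high as the exact-match accuracy.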

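Difficulty 2:
The network is too large to fit in the memory of a single GPU.
Solution:
KSH trained the DCNN on two GPUs, splitting the network into two parts.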

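Difficulty 3:
ImageNet images come in different resolutions, while the input layer of a neural network has a fixed size.
Solution:
KSH fed only part of each image to the network. They rescaled each image so that its shorter side had length $256$, cropped a $256\times 256$ region from the center (which still contains the main object), and then extracted random $224\times 224$ subimages, together with their horizontal reflections, from that region.
This procedure also expands the training data and so reduces overfitting.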
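The random-crop-and-reflect step can be sketched as follows. This is an illustration only: the function name, the 50% flip probability, and the `(H, W, C)` array layout are my assumptions, not details taken from the book.

```python
import numpy as np

def random_crop_and_flip(image, crop=224, rng=None):
    """Take a random crop x crop patch from an (H, W, C) image, maybe mirrored."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)    # upper bound is exclusive
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:                 # assumed flip probability
        patch = patch[:, ::-1]             # horizontal reflection
    return patch

# A stand-in for a 256 x 256 RGB image that has already been center-cropped.
image = np.zeros((256, 256, 3), dtype=np.uint8)
patch = random_crop_and_flip(image, rng=np.random.default_rng(0))
```

Every call returns a slightly different $224\times 224$ view of the same image, so the network rarely sees exactly the same input twice.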

KSH's Network
Figure 1: KSH's network (figure not reproduced here).

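Input layer: $3\times 224\times 224$ neurons, where $3$ stands for the three RGB color channels. The original image is cropped so that $224\times 224$ of its pixels serve as the input to the network.

7 hidden layers: the first 5 hidden layers are convolutional layers (some with max-pooling), and the last two are fully-connected layers.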

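Hidden layer $1$ (convolutional layer with max-pooling)
- Convolutional layer
  $11\times 11$ local receptive fields
  stride length $= 4$
  $96$ feature maps (split into two groups stored on the two GPUs, $48$ feature maps per GPU)
- Max-pooling layer (pooling regions are allowed to overlap)
  $3\times 3$ pooling windows
Hidden layer $2$ (convolutional layer with max-pooling)
- Convolutional layer
  $5\times 5$ local receptive fields
  $256$ feature maps (split into two groups stored on the two GPUs, $128$ feature maps per GPU)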
  Note that the feature maps only use 48 input channels, not the full 96 output from the previous layer (as would usually be the case). This is because any single feature map only uses inputs from the same GPU. In this sense the network departs from the convolutional architecture we described earlier in the chapter, though obviously the basic idea is still the same.
- Max-pooling layer (pooling regions are allowed to overlap)
  $3\times 3$ pooling windows
Hidden layers $3, 4, 5$ (no max-pooling)
- Convolutional layer 3 (some inter-GPU communication)
  $3\times 3$ local receptive fields
  $256$ input channels
  $384$ feature maps
- Convolutional layer 4
  $3\times 3$ local receptive fields
  $192$ input channels
  $384$ feature maps
- Convolutional layer 5
  $3\times 3$ local receptive fields
  $192$ input channels
  stride length $= ?$
  $256$ feature maps
Hidden layers $6, 7$ (fully-connected layers)
Each layer has $4096$ neurons
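To see what the input-channel numbers mean in practice, the sketch below counts the weights in each convolutional layer from the figures listed above. It ignores biases, and the layer names `conv1` to `conv5` are labels I made up for this illustration.

```python
# (name, receptive field size, input channels seen by each feature map, feature maps)
layers = [
    ("conv1", 11, 3, 96),     # sees the 3 RGB channels
    ("conv2", 5, 48, 256),    # sees only the 48 maps stored on the same GPU
    ("conv3", 3, 256, 384),   # inter-GPU communication: sees all 256 maps
    ("conv4", 3, 192, 384),   # sees only the 192 maps stored on the same GPU
    ("conv5", 3, 192, 256),
]

def conv_weights(field, in_channels, feature_maps):
    """Weights in a conv layer: one field x field filter per input channel per map."""
    return field * field * in_channels * feature_maps

weight_counts = {name: conv_weights(f, c, m) for name, f, c, m in layers}
for name, count in weight_counts.items():
    print(f"{name}: {count:,} weights")
```

If the second layer had used all $96$ feature maps of the first layer instead of the $48$ stored on its own GPU, its weight count would double from 307,200 to 614,400.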

Output layer: a softmax layer with $1000$ neurons, one for each of the $1000$ image classes.
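As a reminder of what a softmax output layer computes, here is a minimal sketch. The input vector `z` is made up for illustration; it stands in for the weighted inputs of the $1000$ output neurons.

```python
import numpy as np

def softmax(z):
    """Turn a vector of weighted inputs into a probability distribution."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()          # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical weighted inputs: class 7 gets a larger value than the rest.
z = np.zeros(1000)
z[7] = 5.0
p = softmax(z)
```

The outputs are all positive and sum to $1$, so `p[i]` can be read as the network's estimated probability that the image belongs to class `i`.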

Techniques used

ReLU (speed up)
L2 Regularization (over-fitting)
Drop-out (over-fitting)
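The three techniques can each be written in a few lines. This is a sketch under my own assumptions, not KSH's implementation: the L2 term uses the common $\frac{\lambda}{2n}\sum w^2$ form, and the dropout shown is the "inverted" variant that rescales the surviving activations.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z) applied elementwise."""
    return np.maximum(0.0, z)

def l2_penalty(weights, lmbda, n):
    """L2 regularization term (lambda / 2n) * sum(w^2) added to the cost."""
    return lmbda / (2.0 * n) * float(np.sum(weights ** 2))

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each activation with probability p_drop, rescale the rest."""
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

activated = relu(np.array([-1.0, 2.0]))
penalty = l2_penalty(np.array([1.0, 2.0]), lmbda=0.1, n=10)
dropped = dropout(np.ones(8), p_drop=0.5, rng=np.random.default_rng(0))
```

Dropout is applied only during training. At test time all neurons are kept.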

The 2014 ILSVRC competition

Training set: $1.2$ million images, $1000$ categories
Evaluation criterion: top-5 (5 guesses out of 1000 categories)
Accuracy: $93.33\%$ (GoogLeNet)

GoogLeNet classification error: $6.8\%$
Human expert's error (quoted in the book as "my own error"): $5.1\%$
In this comparison, made on a smaller sample of images, a trained human expert's classification error was lower than the network's.

Other activity

Recognizing street numbers in Google’s Street View imagery (DCNN)
Blind Spots of Deep Networks (adversarial)
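In the figure (not reproduced here), the images on the left are classified correctly, while the images on the right are the same images after noise has been added (slightly perturbed by the middle image) and are classified incorrectly. The latter are called adversarial images.
The authors of the paper used continuous activation functions, yet the function the network computes appears to be nearly discontinuous. An open problem is to work out what causes this: the loss function, the activation functions, the network architecture, or something else.
The existence of adversarial images seems to contradict the high generalization performance of neural networks. My first reading was that adversarial images are misclassified because they occur extremely rarely and are never observed in the test set, so the network makes mistakes on them. The original text: The explanation is that the set of adversarial negatives is of extremely low probability, and thus is never (or rarely) observed in the test set, yet it is dense (much like the rational numbers), and so it is found near virtually every test case.
There have been many follow-ups to the paper above. The paper Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images shows that, given a trained network, it is possible to generate images that look like white noise to a human but that the network classifies with high confidence as belonging to a known category.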
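The idea that a tiny change to every pixel can flip the output is easy to reproduce on a toy linear classifier. To be clear, this is not the method of the papers above and not a deep network. The weights, the input, and the step size `eps` are all made up for illustration.

```python
import numpy as np

n = 1000
# Hypothetical weights of a linear classifier: +1, -1, +1, -1, ...
w = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
# A hypothetical "image" whose score w @ x is about +1.0.
x = 0.5 + 0.001 * w

def predict(v):
    """Class 1 if the weighted sum is positive, else class 0."""
    return int(w @ v > 0)

eps = 0.002
# Move every pixel by only eps, each in the direction that lowers the score.
x_adv = x - eps * np.sign(w)

original_class = predict(x)        # score about +1.0
adversarial_class = predict(x_adv) # score about +1.0 - eps * n = -1.0
largest_change = float(np.max(np.abs(x_adv - x)))
```

No pixel changes by more than `0.002`, but the $1000$ small changes all push the score the same way, so together they shift it by about $2.0$ and flip the predicted class.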

Other approaches to deep neural nets

This part briefly discusses some topics the book does not cover in detail, including:

Recurrent neural networks (RNNs)
Long short-term memory units (LSTMs)
Deep belief nets, generative models, and Boltzmann machines

Recurrent neural networks (RNNs)

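**Description**
Neural networks with time-varying behavior are called RNNs.
In an RNN, the behavior of a hidden neuron is determined not only by the activations in the previous hidden layers but also by the activations at earlier times. Likewise, the activations of hidden and output neurons are determined not only by the current input to the network but also by earlier inputs.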
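The dependence on earlier times can be sketched with the simplest recurrent update, $h_t = \tanh(W_x x_t + W_h h_{t-1})$. This particular formula and the weight values are my assumptions for illustration. The book describes RNNs only in general terms.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, h0):
    """Run a simple recurrent layer over a sequence, returning every hidden state."""
    h = h0
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state.
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return states

# Hypothetical weights: 1 input neuron, 2 hidden neurons.
W_x = np.array([[0.5], [-0.3]])
W_h = np.array([[0.1, 0.2], [0.0, 0.4]])
h0 = np.zeros(2)

# Two sequences with the same second input but different first inputs.
a = rnn_forward([np.array([1.0]), np.array([0.5])], W_x, W_h, h0)
b = rnn_forward([np.array([-1.0]), np.array([0.5])], W_x, W_h, h0)
```

The second inputs are identical, yet `a[1]` and `b[1]` differ, because each hidden state still carries information about the first input.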

**Applications**
- Speech recognition
- Language models (which help resolve ambiguity)

Long short-term memory units (LSTMs)

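The difficulty with RNNs: unstable gradients
In an ordinary deep network, the gradient gets smaller and smaller as it is propagated back through the layers, so the early layers learn very slowly. In an RNN the problem is worse, because the gradient is propagated backward not only through layers but also through time.
The remedy for RNNs: LSTM
The goal of LSTM units: to help address the unstable gradient problem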
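A back-of-the-envelope sketch of why propagating through time makes things worse. I assume here that every backward step, through a layer or through a time step, multiplies the gradient by the same factor of $0.25$, which is the maximum of the sigmoid's derivative with a weight of magnitude $1$. Real networks have a different factor at every step, so this only shows the trend.

```python
def gradient_scale(factor, steps):
    """Size of a gradient after being multiplied by `factor` at each backward step."""
    return factor ** steps

# Assumed per-step factor: max of sigmoid'(z) is 0.25, with weight magnitude 1.
factor = 0.25

feedforward = gradient_scale(factor, steps=5)       # back through 5 layers
recurrent = gradient_scale(factor, steps=5 + 20)    # 5 layers plus 20 time steps

print(f"feedforward: {feedforward:.3e}")
print(f"recurrent:   {recurrent:.3e}")
```

Under these assumptions the gradient is already below $10^{-3}$ after $5$ layers, and the extra time steps shrink it by many more orders of magnitude.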

Deep belief nets, generative models, and Boltzmann machines

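DBN:

Why they are interesting (advantages)
1. They are generative models
  In a feedforward network, we specify the input activations, and these determine the activations of the neurons in the later layers. In a generative model such as a DBN, we can also specify the values of some neurons in the later layers and then run the network backward to generate values for the input activations. (For example, a DBN trained to recognize images of handwritten digits can also generate images of handwritten digits.)
2. They can do unsupervised and semi-supervised learning
  A DBN can learn features from images that are useful for understanding other images, even when the training images are unlabelled.
Why they are no longer popular (disadvantages)
1. Feedforward and recurrent nets
  Feedforward networks and recurrent networks have achieved many breakthroughs, in image recognition and speech recognition respectively.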

Questions I did not understand

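What does "more restrictive metric" refer to in the following sentence? Using the more restrictive metric of getting the label exactly right, KSH’s network achieved an accuracy of $63.3\%$.
A: One of top-1 through top-4. The original criterion is top-5, and "metric" means an evaluation criterion, so "more restrictive metric" means a stricter criterion. Since the sentence says "getting the label exactly right", the stricter criterion here is top-1.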
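How is the fact that ImageNet contains images of different resolutions related to the size of the network's input layer? Can "the size of the input layer" be understood as the number of neurons? And is it the case that one neural network decides the class of one image, and the same network also decides the class of every other image? If so, the number of neurons in the input layer is fixed, while images of different resolutions would need different numbers of input neurons. Is this the right way to understand it? The original text: ImageNet contains images of varying resolution. This poses a problem, since a neural network’s input layer is usually of a fixed size.
A: The input to a neural network is a vector of fixed length (each element of the vector is the input to one neuron). Images with different resolutions have different numbers of pixels, so their inputs would have different lengths. To keep the number of input neurons the same, the input images must be rescaled so that their resolution matches the length of the network's input vector.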
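What does the following passage mean? The previous layer has 96 feature maps (split evenly across the two GPUs), so why does this layer have 256 feature maps (again split evenly across the two GPUs)? And why does the explanation say that these feature maps (the 128 on one GPU) have only 48 channels? The original text: The second hidden layer is also a convolutional layer, with a max-pooling step. It uses 5×5 local receptive fields, and there’s a total of 256 feature maps, split into 128 on each GPU. Note that the feature maps only use 48 input channels, not the full 96 output from the previous layer (as would usually be the case). This is because any single feature map only uses inputs from the same GPU. In this sense the network departs from the convolutional architecture we described earlier in the chapter, though obviously the basic idea is still the same.
A: There are 256 feature maps because this layer has 256 convolution kernels. The number of feature maps in a layer is determined by the number of kernels in that layer and has nothing to do with the number of feature maps in the previous layer. As for "the feature maps have only 48 channels": they do not have only 48, they use only 48. The code running on each GPU can access only the data stored on that GPU. The previous layer has 96 feature maps in total, and half of them, that is 48, are stored on each GPU. So every kernel of this layer on a given GPU can take only those 48 feature maps of the previous layer as its input.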
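Is my understanding of the following sentence correct? My reading: adversarial images are misclassified because they occur extremely rarely and are never observed in the test set, so the network makes mistakes on them. The original text: The explanation is that the set of adversarial negatives is of extremely low probability, and thus is never (or rarely) observed in the test set, yet it is dense (much like the rational numbers), and so it is found near virtually every test case. Also, why would a low occurrence rate make errors more likely? The images in the training set and the test set of a neural network are different, yet a well-trained network still reaches high accuracy on the test set. Or is there some essential difference between adversarial negatives and ordinary images? Adversarial images are said to be obtained by slightly perturbing the original images. Does "slightly perturbed" mean adding some noise?
A: This passage is about adversarial (malicious) samples, which I am currently studying. They are samples that a neural network cannot handle correctly because of flaws in the network.
"They are never observed in the test set, so the network makes mistakes on them": this is not right. The cause is more complicated and is related to both the network architecture and the activation functions. The view most researchers currently share is that the function computed by the neural network is not smooth enough, so feeding in certain adversarial samples (which look very similar to normal samples) changes the output drastically, and the output is not what we expect.
"Why would a low occurrence rate make errors more likely": I do not understand this question.
"Is there some essential difference between adversarial negatives and ordinary images": their network outputs are very different, even though the inputs are very close.
"Adversarial images are said to be obtained by slightly perturbing the original images. Does slightly perturbed mean adding some noise?": it means adding a special kind of perturbation, one that makes the network's output change drastically.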
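Q: My sentence "they are never observed in the test set, so the network makes mistakes on them" came from "The explanation is that the set of adversarial negatives is of extremely low probability, and thus is never (or rarely) observed in the test set", where I took "the explanation" to be the explanation of why adversarial negatives are classified wrongly. Rereading the original text, the explanation may instead be about the difference between adversarial negatives and the original images.
My question "why would a low occurrence rate make errors more likely" came from "adversarial negatives is of extremely low probability" together with the fact that adversarial negatives cause wrong network outputs. So I would like to know the relationship between the two (the extremely low probability, and the wrong results on adversarial negatives).
In addition, after rereading the sentence: don't "low probability" and "it is dense (much like the rational numbers), and so it is found near virtually every test case" contradict each other?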
