Deep Learning Study: Toolbox Notes (1) - Understanding the CNN Example


      Read together with "Notes on Convolutional Neural Networks".

I. The Convolution Layer

0. The relationship between the CNN kernel k and the ordinary neural-network weights W

In an ordinary neural network, the product of an input a and the weights W can be written as a*W': corresponding elements of the vectors a and W are multiplied and then summed. In a CNN, by contrast, both the input a and the weights W are square matrices; in the Toolbox they are 5x5. To multiply corresponding elements of a and W and sum, you can either unroll both into vectors and take their inner product, or use a convolution. Because convolution flips its second operand by 180°, the weights W must first be flipped to build the kernel k; convolving a with k then yields exactly the element-wise product-sum of a and W. The two methods look equally expensive for a single patch, but a CNN's input image is far larger than the receptive field (which has the same size as the kernel): the inner-product approach becomes cumbersome (every patch of that size has to be extracted and reshaped into a vector), whereas convolution is not limited by the image size and slides across it automatically, which is a great simplification.

Weight sharing in a CNN means that, over a single image, the network uses the same kernel everywhere as its weights: one kernel extracts one kind of feature from the whole image.

Understanding: to implement weight sharing and receptive fields, the CNN has to multiply two square matrices (the weight matrix and an image patch) element by element, and the multiplier (the weight matrix) must also be able to shift automatically and multiply the other parts of the image; the cleanest way to achieve this is convolution. Simply rotate the weight matrix by 180° beforehand to obtain the kernel, e.g. [1,2;3,4] becomes [4,3;2,1] after rot180, and then convolve it with the image, which amounts to "scanning" the kernel across it. rot180 just rearranges the matrix elements, much as transposition rearranges a vector.
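A minimal MATLAB sketch of this equivalence (the matrices are made up for illustration): convolving with the rot180 of the weights reproduces the element-wise product-sum, and the same kernel then slides over a larger image on its own.

```matlab
% Element-wise product-sum via conv2 (illustrative values, not Toolbox code).
a = magic(3);                        % a 3x3 image patch
W = [1 2 3; 4 5 6; 7 8 9];           % 3x3 weight matrix
k = rot90(W, 2);                     % rot180: build the kernel from the weights

s1 = sum(sum(a .* W));               % direct element-wise multiply and sum
s2 = conv2(a, k, 'valid');           % 'valid' convolution of two same-size matrices -> 1x1
isequal(s1, s2)                      % the two results agree

% On a full image the same kernel slides automatically:
img = rand(5, 5);
featureMap = conv2(img, k, 'valid'); % 5x5 conv 3x3 -> 3x3 feature map
```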

1. The "convolution" in the CNN is not the full convolution of mathematics; it is MATLAB's conv2 in 'valid' mode

Convolving a 5x5 input with a 3x3 kernel yields a 3x3 output (output size = input size - kernel size + 1).

2. For the same input map, the kernels connecting it to different output maps are different.

This is easiest to see in the Toolbox code.

Original text: "however for a particular output map, the input maps will be convolved with distinct kernels. That is to say, if output map j and map k both sum over input map i, then the kernels applied to map i are different for output maps j and k."
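A sketch of what this means, in the Toolbox's style (shapes and names assumed here, not the Toolbox code itself): each (input map i, output map j) pair has its own kernel, and an output map sums the 'valid' convolutions over all input maps before the bias and sigmoid are applied.

```matlab
% One kernel per (input map i, output map j) connection, as the quoted passage describes.
nIn  = 2;  nOut = 3;
inMaps = cell(1, nIn);
for i = 1:nIn, inMaps{i} = rand(8, 8); end        % two 8x8 input maps
k = cell(nIn, nOut);
for i = 1:nIn
    for j = 1:nOut
        k{i, j} = rand(3, 3);                     % distinct kernel for every (i, j) pair
    end
end
b = zeros(1, nOut);                               % one additive bias per output map

outMaps = cell(1, nOut);
for j = 1:nOut
    z = zeros(6, 6);                              % 8 - 3 + 1 = 6
    for i = 1:nIn
        z = z + conv2(inMaps{i}, k{i, j}, 'valid');
    end
    outMaps{j} = 1 ./ (1 + exp(-(z + b(j))));     % sigmoid(sum of convolutions + b_j)
end
```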

3. Backpropagation: computing the convolution layer's delta (sensitivity), and from it the gradients

The layer above convolution layer C1 is the subsampling layer S2. How is C1's delta d(1) computed from S2's delta d(2)?

The key difficulty is that the convolution-layer feature maps (28x28) and the subsampling-layer feature maps (14x14) have different sizes (in the Toolbox example the sizes are 24x24 and 12x12, but the problem is the same).

The Note introduces an "upsampling" step: the S2 sensitivity map is expanded until it has the same size as the C1 feature map. Original text:

"To compute the sensitivities at layer ℓ efficiently, we can upsample the downsampling layer's sensitivity map to make it the same size as the convolutional layer's map."
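In practice the expansion is a simple block replication; a minimal MATLAB sketch using kron (sizes assumed to match the Toolbox example):

```matlab
% up() from the Note, realised as block replication with kron.
d_S2  = rand(12, 12);              % delta map of the subsampling layer
scale = 2;                         % pooling scale in the Toolbox example
up_d  = kron(d_S2, ones(scale));   % 12x12 -> 24x24, each value copied into a 2x2 block
```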

From here, computing the delta δ is much the same as in an ordinary neural network:

① Definition of δ
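In the Note's notation, with $u^{\ell} = W^{\ell} x^{\ell-1} + b^{\ell}$ the pre-activation of layer $\ell$:

$$\delta \equiv \frac{\partial E}{\partial u} = \frac{\partial E}{\partial b}, \quad \text{since } \frac{\partial u}{\partial b} = 1.$$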


② δ at the output layer
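From the Note, with $y^{n}$ the network output, $t^{n}$ the target label and $\circ$ element-wise multiplication:

$$\delta^{L} = f'(u^{L}) \circ (y^{n} - t^{n}).$$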


③ Backpropagating δ to the other layers
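From the Note:

$$\delta^{\ell} = \left(W^{\ell+1}\right)^{T} \delta^{\ell+1} \circ f'(u^{\ell}).$$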


④ δ of the CNN convolution layer

Because each C1 unit connects to S2 through only a single unit and a single link, the weight W in ③ reduces to a one-dimensional (scalar) constant.

Here β denotes that weight and up() denotes upsampling; in the Toolbox β = 1.
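In the Note's notation, for convolution-layer map j:

$$\delta_j^{\ell} = \beta_j^{\ell+1}\left( f'(u_j^{\ell}) \circ \mathrm{up}\!\left(\delta_j^{\ell+1}\right) \right), \qquad \mathrm{up}(x) = x \otimes \mathbf{1}_{n \times n}.$$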

⑤ Computing the corrections Δk and Δb for the convolution layer C1's kernel k and bias b, where the weight matrix W = rot180(k) and x is the previous layer's output.

As stated in item 0, convolution flips one operand by 180° before the element-wise multiplication; here the corrections Δk and Δb must be computed from the C1 delta δ.

First, rotate the δ matrix by 180° and convolve it with x; this gives the element-wise products of δ and x, i.e. the weight correction ΔW. To obtain Δk, flip ΔW by 180° once more. The actual computation is shown in the sketch after the quotation below.

Original text: "Here we rotate the δ image in order to perform cross-correlation rather than convolution, and rotate the output back so that when we perform convolution in the feed-forward pass, the kernel will have the expected orientation."
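A minimal MATLAB sketch of these two gradients (variable names and sizes assumed, following the per-map formulas in the Note rather than the Toolbox's batched code):

```matlab
% dE/dk and dE/db for one convolution-layer map.
x     = rand(28, 28);                               % output of the previous layer (the input map)
delta = rand(24, 24);                               % delta of this convolution-layer map (28-5+1 = 24)
dk = rot90(conv2(x, rot90(delta, 2), 'valid'), 2);  % rotate delta, convolve, rotate back -> 5x5 kernel gradient
db = sum(delta(:));                                 % bias gradient: sum over the delta map
```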

II. The Subsampling Layer

1. Delta of the subsampling layer
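Per the Note, the delta of a subsampling-layer map is obtained from the next convolution layer's delta by a 'full' convolution with the rotated kernel; a minimal sketch with assumed sizes (contributions from all connected output maps would be summed):

```matlab
% Delta of one subsampling-layer map (sizes chosen to match the Toolbox example).
d_next = rand(8, 8);                               % delta of a map in the next convolution layer (12-5+1 = 8)
k_next = rand(5, 5);                               % kernel connecting this map to that output map
d_S    = conv2(d_next, rot90(k_next, 2), 'full');  % 'full' convolution restores the 12x12 size
```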


2. Weight correction for the subsampling layer



Because the sizes differ, the output of the convolution layer C must first be downsampled before this quantity is computed.
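In the Note's notation, with $d_j^{\ell} = \mathrm{down}(x_j^{\ell-1})$ the downsampled output of the previous convolution layer:

$$\frac{\partial E}{\partial b_j} = \sum_{u,v}\left(\delta_j^{\ell}\right)_{uv}, \qquad \frac{\partial E}{\partial \beta_j} = \sum_{u,v}\left(\delta_j^{\ell} \circ d_j^{\ell}\right)_{uv}.$$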

The Toolbox does not update any weights here; this follows from its model setup: its subsampling layer simply sums each block of 4 pixels and averages, without introducing a trainable network, so there are no parameters to update.

III. The Output Layer of the Whole Network

0. First, note that the CNN's last layer consists of 12 maps of 4x4 units, while the whole network outputs a vector of length 10 (the same shape as the label y; its 10 elements stand for the digits 0-9).

1. Feature vector

In the Toolbox, the last CNN layer outputs 12 feature maps of size 4x4, i.e. the CNN reduces the original 28x28 = 784 image to 12 x 4 x 4 = 192 dimensions. Since 192 dimensions are still too many, a simple neural network is attached on top, with a 192-unit input layer and a 10-unit output layer (because the label is a vector of length 10). This produces the feature vector of the original image, and comparing it with the label y gives the delta od used in item 2 below.

2. Delta computation

The final output is a vector of length 10; subtracting the label vector y gives the error e, from which the output delta is computed as net.od = net.e .* (net.o .* (1 - net.o)), exactly as in an ordinary neural network. Likewise, following ordinary backpropagation, the delta of this small network's input layer (the vectorised feature layer) is net.fvd = (net.ffW' * net.od) .* f'(·), where f'(·) is the derivative of the sigmoid activation, f'(·) = net.fv .* (1 - net.fv).

The key point is that the delta fvd must then be propagated back into the CNN layers, so fvd has to be reshaped from a vector into 12 units of 4x4 maps, which gives the delta d(n) of the CNN's output layer.
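A minimal sketch of that reshape (names assumed; the real Toolbox code also carries a batch dimension):

```matlab
% Reshape the 192-dimensional delta vector back into 12 maps of 4x4.
fvd   = rand(192, 1);                  % delta of the vectorised feature layer
mapSz = [4 4];                         % size of each CNN output map
nMaps = 12;                            % number of CNN output maps
d = cell(1, nMaps);
for j = 1:nMaps
    idx  = (j - 1) * prod(mapSz) + (1:prod(mapSz));
    d{j} = reshape(fvd(idx), mapSz);   % delta of the j-th output map
end
```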

With that, the deltas d(i) of every CNN layer follow from the C-layer and S-layer delta computations of Parts I and II.

3. Weight corrections

Same method as in an ordinary neural network.

IV. Classifier Design

The Toolbox simply takes the position of the largest value in the CNN output feature vector; that position corresponds to one of the ten digits 0-9. For example, for the vector L1 = [0 0.9 0.1 0.2 0 0 0 0 0 0], the maximum 0.9 sits in the second position, so the sample is assigned to the class of digit 1.
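A sketch of this max-position rule (a single sample, names assumed):

```matlab
o = [0 0.9 0.1 0.2 0 0 0 0 0 0];   % network output for one sample
[~, pos] = max(o);                  % pos = 2
digit = pos - 1;                    % positions 1..10 correspond to digits 0..9, so digit = 1
```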

Other classifiers can of course be designed instead; the CNN's main job is feature extraction, and it imposes no particular requirement on the classifier.


