2016--MatConvNet Convolutional Neural Networks for MATLAB

Abstract
MatConvNet is an implementation of Convolutional Neural Networks (CNNs) for MATLAB. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing linear convolutions with filter banks, feature pooling, and many more. In this manner, MatConvNet allows fast prototyping of new CNN architectures; at the same time, it supports efficient computation on CPU and GPU, making it possible to train complex models on large datasets such as ImageNet ILSVRC. This document provides an overview of CNNs and how they are implemented in MatConvNet, and gives the technical details of each computational block in the toolbox.

Introduction

MatConvNet is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications. Since the breakthrough work of [7], CNNs have had a major impact in computer vision, and image understanding in particular, essentially replacing traditional image representations such as the ones implemented in our own VLFeat [11] open source library.

While most CNNs are obtained by composing simple linear and non-linear filtering operations such as convolution and rectification, their implementation is far from trivial. The reason is that CNNs need to be learned from vast amounts of data, often millions of images, which requires very efficient implementations. Like most CNN libraries, MatConvNet achieves this by using a variety of optimizations and, chiefly, by supporting computations on GPUs.

Numerous other machine learning, deep learning, and CNN open source libraries exist. To cite some of the most popular ones: CudaConvNet (https://code.google.com/p/cuda-convnet/), Torch (http://cilvr.nyu.edu/doku.php?id=code:start), Theano (http://deeplearning.net/software/theano/), and Caffe (http://caffe.berkeleyvision.org). Many of these libraries are well supported, with dozens of active contributors and large user bases. So why create yet another library?

The key motivation for developing MatConvNet was to provide an environment particularly friendly and efficient for researchers to use in their investigations (footnote 5). MatConvNet achieves this by its deep integration in the MATLAB environment, which is one of the most popular development environments in computer vision research as well as in many other areas. In particular, MatConvNet exposes CNN building blocks such as convolution, normalisation and pooling as simple MATLAB commands (chapter 4); these can then be combined and extended with ease to create CNN architectures. While many such blocks use optimised CPU and GPU implementations written in C++ and CUDA (section 1.4), MATLAB's native support for GPU computation means that it is often possible to write new blocks in MATLAB directly while maintaining computational efficiency. Compared to writing new CNN components in lower-level languages, this is an important simplification that can significantly accelerate testing new ideas. Using MATLAB also provides a bridge towards other areas; for instance, MatConvNet was recently used by the University of Arizona in planetary science, as summarised in an NVIDIA blog post.

5 While from a user perspective MatConvNet currently relies on MATLAB, the library is being developed with a clean separation between MATLAB code and the C++ and CUDA core; therefore, in the future the library may be extended to allow processing convolutional networks independently of MATLAB.

MatConvNet can learn large CNN models such as AlexNet [7] and the very deep networks of [9] from millions of images. Pre-trained versions of several of these powerful models can be downloaded from the MatConvNet home page. While powerful, MatConvNet remains simple to use and install. The implementation is fully self-contained, requiring only MATLAB and a compatible C++ compiler (using the GPU code requires the freely-available CUDA DevKit and a suitable NVIDIA GPU). As demonstrated in fig. 1.1 and section 1.1, it is possible to download, compile, and install MatConvNet using three MATLAB commands. Several fully-functional examples demonstrating how small and large networks can be learned are included. Importantly, several standard pre-trained networks can be immediately downloaded and used in applications. A manual with a complete technical description of the toolbox is maintained along with the toolbox. These features make MatConvNet useful in an educational context too.
MatConvNet is open source and released under a BSD-like license. It can be downloaded from http://www.vlfeat.org/matconvnet as well as from GitHub.

1.1 Getting started

MatConvNet is simple to install and use. Fig. 1.1 provides a complete example that classifies an image using a latest-generation deep convolutional neural network. The example includes downloading MatConvNet, compiling the package, downloading a pre-trained CNN model, and evaluating the latter on one of MATLAB's stock images.

The key command in this example is vl_simplenn, a wrapper that takes as input the CNN net and the pre-processed image im_ and produces as output a structure res of results. This particular wrapper can be used to model networks that have a simple structure, namely a chain of operations. Examining the code of vl_simplenn (edit vl_simplenn in MatConvNet) we note that the wrapper transforms the data sequentially, applying a number of MATLAB functions as specified by the network configuration. These functions, discussed in detail in chapter 4, are called "building blocks" and constitute the backbone of MatConvNet.
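The wrapper call described above might look roughly as follows. This is only a sketch under stated assumptions: MatConvNet is compiled and on the path (vl_setupnn has been run), a pre-trained model has already been saved locally under the hypothetical file name imagenet-vgg-f.mat, and the normalisation data lives in net.meta.normalization. The field layout differs between toolbox releases (some store it in net.normalization), so inspect the loaded structure first. imresize requires the Image Processing Toolbox.

```matlab
% Assumption: a pre-trained model was downloaded beforehand and saved
% under this (hypothetical) local file name.
net = load('imagenet-vgg-f.mat') ;

% Pre-process: convert to single, resize to the network input size,
% and subtract the average image (bsxfun also handles a 1x1x3 average).
im  = imread('peppers.png') ;
im_ = imresize(single(im), net.meta.normalization.imageSize(1:2)) ;
im_ = bsxfun(@minus, im_, net.meta.normalization.averageImage) ;

% Run the chain of building blocks. res(1).x is the input and
% res(end).x is the output of the last layer (the class scores).
res = vl_simplenn(net, im_) ;

scores = squeeze(gather(res(end).x)) ;
[bestScore, best] = max(scores) ;
fprintf('best class index: %d (score %.3f)\n', best, bestScore) ;
```

The gather call is a no-op for CPU arrays, so the same lines work whether or not the network was evaluated on the GPU.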

While most blocks implement simple operations, what makes them non-trivial is their efficiency (section 1.4) as well as support for backpropagation (section 2.3) to allow learning CNNs. Next, we demonstrate how to use one such building block directly. For the sake of the example, consider convolving an image with a bank of linear filters. Start by reading an image in MATLAB, say using im = single(imread('peppers.png')), obtaining an H × W × D array im, where D = 3 is the number of colour channels in the image. Then create a bank of K = 16 random filters of size 3 × 3 using f = randn(3,3,3,16,'single'). Finally, convolve the image with the filters by using the command y = vl_nnconv(im,f,[]). This results in an array y with K channels, one for each of the K filters in the bank.
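Put together, the three commands above form the short script below. It is a sketch that assumes MatConvNet is compiled and on the MATLAB path. With the default settings (no padding, unit stride) the spatial size of the output should shrink by two pixels in each direction, because a 3 × 3 filter only fits at (H − 2) × (W − 2) positions.

```matlab
im = single(imread('peppers.png')) ;   % H x W x D array, D = 3 colour channels
f  = randn(3, 3, 3, 16, 'single') ;    % bank of K = 16 filters, each 3 x 3 x 3
y  = vl_nnconv(im, f, []) ;            % empty third argument: no biases

% One output channel per filter; spatial size reduced by the filter support.
disp(size(y)) ;
```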

While users are encouraged to make use of the blocks directly to create new architectures, MatConvNet provides wrappers such as vl_simplenn for standard CNN architectures such as AlexNet [7] or Network-in-Network [8]. Furthermore, the library provides numerous examples (in the examples/ subdirectory), including code to learn a variety of models on the MNIST, CIFAR, and ImageNet datasets. All these examples use the examples/cnn_train training code, which is an implementation of stochastic gradient descent (section 3.3). While this training code is perfectly serviceable and quite flexible, it remains in the examples/ subdirectory as it is somewhat problem-specific. Users are welcome to implement their own optimisers.

1.2 MatConvNet at a glance

MatConvNet has a simple design philosophy. Rather than wrapping CNNs around complex layers of software, it exposes simple functions to compute CNN building blocks, such as linear convolution and ReLU operators, directly as MATLAB commands. These building blocks are easy to combine into complete CNNs and can be used to implement sophisticated learning algorithms. While several real-world examples of small and large CNN architectures and training routines are provided, it is always possible to go back to the basics and build your own, using the efficiency of MATLAB in prototyping. Often no C coding is required at all to try new architectures. As such, MatConvNet is an ideal playground for research in computer vision and CNNs.

MatConvNet contains the following elements:

• CNN computational blocks. A set of optimized routines computing fundamental building blocks of a CNN. For example, a convolution block is implemented by y=vl_nnconv(x,f,b) where x is an image, f a filter bank, and b a vector of biases (section 4.1). The derivatives are computed as [dzdx,dzdf,dzdb] = vl_nnconv(x,f,b,dzdy) where dzdy is the derivative of the CNN output w.r.t. y (section 4.1). Chapter 4 describes all the blocks in detail.

• CNN wrappers. MatConvNet provides a simple wrapper, invoked by vl_simplenn, that implements a CNN with a linear topology (a chain of blocks). It also provides a much more flexible wrapper supporting networks with arbitrary topologies, encapsulated in the dagnn.DagNN MATLAB class.

• Example applications. MatConvNet provides several examples of learning CNNs with stochastic gradient descent, on CPU or GPU, using MNIST, CIFAR10, and ImageNet data.

• Pre-trained models. MatConvNet provides several state-of-the-art pre-trained CNN models that can be used off-the-shelf, either to classify images or to produce image encodings in the spirit of Caffe or DeCAF.
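The forward and backward calling conventions of the convolution block listed above can be illustrated with a small sketch. It assumes MatConvNet is compiled and on the path; the array sizes are arbitrary, and the choice of network output z = sum(y(:)) is made up purely so that dzdy is easy to write down.

```matlab
x = randn(32, 32, 3, 'single') ;      % toy input "image"
f = randn(5, 5, 3, 8, 'single') ;     % bank of 8 filters of size 5 x 5 x 3
b = zeros(1, 8, 'single') ;           % one bias per filter

y = vl_nnconv(x, f, b) ;              % forward pass

% Illustrative assumption: the network output is z = sum(y(:)),
% so the derivative of z with respect to y is an array of ones.
dzdy = ones(size(y), 'single') ;
[dzdx, dzdf, dzdb] = vl_nnconv(x, f, b, dzdy) ;   % backward pass

% Each derivative has the same shape as the variable it refers to.
assert(isequal(size(dzdx), size(x))) ;
assert(isequal(size(dzdf), size(f))) ;
assert(numel(dzdb) == numel(b)) ;
```

Passing the extra dzdy argument is what switches the block from forward to backward mode; the same pattern is used by the other building blocks described in chapter 4.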

1.3 Documentation and examples

There are three main sources of information about MatConvNet. First, the website contains descriptions of all the functions and several examples and tutorials (footnote 11). Second, there is a PDF manual containing a great deal of technical detail about the toolbox, including detailed mathematical descriptions of the building blocks. Third, MatConvNet ships with several examples (section 1.1).
Most examples are fully self-contained. For example, in order to run the MNIST example, it suffices to point MATLAB to the MatConvNet root directory and type addpath examples followed by cnn_mnist. Due to the problem size, the ImageNet ILSVRC example requires some more preparation, including downloading and preprocessing the images (using the bundled script utils/preprocess-imagenet.sh). Several advanced examples are included as well. For example, fig. 1.2 illustrates the top-1 and top-5 validation errors as a model similar to AlexNet [7] is trained using either standard dropout regularisation or the recent batch normalisation technique of [3]. The latter is shown to converge in about one third of the epochs (passes through the training data) required by the former.
The MatConvNet website also contains numerous pre-trained models, i.e. large CNNs trained on ImageNet ILSVRC that can be downloaded and used as a starting point for many other problems [1]. These include: AlexNet [7], VGG-F, VGG-M, VGG-S [1], and VGG-VD-16 and VGG-VD-19 [10]. The example code of fig. 1.1 shows how one such model can be used in a few lines of MATLAB code.

11 See also http://www.robots.ox.ac.uk/~vgg/practicals/cnn/index.html.

1.4 Speed

Efficiency is very important for working with CNNs. MatConvNet supports using NVIDIA GPUs as it includes CUDA implementations of all algorithms (or relies on MATLAB CUDA support).
To use the GPU (provided that suitable hardware is available and the toolbox has been compiled with GPU support), one simply converts the arguments to gpuArrays in MATLAB, as in y = vl_nnconv(gpuArray(x), gpuArray(w), []). In this manner, switching between CPU and GPU is fully transparent. Note that MatConvNet can also make use of the NVIDIA CuDNN library with significant speed and space benefits.
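As a hedged illustration of the CPU/GPU switch described above, the sketch below runs the same convolution on both devices and compares the results. It assumes MatConvNet was compiled with GPU support and that the Parallel Computing Toolbox (which provides gpuArray, gather and gpuDeviceCount) is installed; the array sizes are arbitrary.

```matlab
x = randn(64, 64, 3, 'single') ;
w = randn(3, 3, 3, 16, 'single') ;

y_cpu = vl_nnconv(x, w, []) ;                        % CPU path

if gpuDeviceCount > 0                                % run only if a GPU is present
  y_gpu = vl_nnconv(gpuArray(x), gpuArray(w), []) ;  % same call, GPU path
  assert(isa(y_gpu, 'gpuArray')) ;                   % the result stays on the GPU
  y_back = gather(y_gpu) ;                           % copy back to host memory
  fprintf('max CPU/GPU difference: %g\n', ...
          max(abs(y_back(:) - y_cpu(:)))) ;
end
```

Small numerical differences between the two results are expected, since the CPU and GPU code paths accumulate single-precision sums in different orders.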
Next we evaluate the performance of MatConvNet when training large architectures on the ImageNet ILSVRC 2012 challenge data [2]. The test machine is a Dell server with two Intel Xeon E5-2667 v2 CPUs clocked at 3.30 GHz (each CPU has eight cores), 256 GB of RAM, and four NVIDIA Titan Black GPUs (only one of which is used unless otherwise noted). Experiments use MatConvNet beta12, CuDNN v2, and MATLAB R2015a. The data is preprocessed to avoid rescaling images on the fly in MATLAB and stored in a RAM disk for faster access. The code uses the vl_imreadjpeg command to read large batches of JPEG images from disk in a number of separate threads. The driver examples/cnn_imagenet.m is used in all experiments.
We train the models discussed in section 1.3 on ImageNet ILSVRC. Table 1.1 reports the training speed as the number of images per second processed by stochastic gradient descent. AlexNet trains at about 264 images/s with CuDNN, which is about 40% faster than the vanilla GPU implementation (using CuBLAS) and more than 10 times faster than using the CPUs. Furthermore, we note that, despite MATLAB overhead, the implementation speed is comparable to Caffe (they report 253 images/s with CuDNN and a Titan, a slightly slower GPU than the Titan Black used here). Note also that, as the model grows in size, the size of an SGD batch must be decreased (to fit in the GPU memory), increasing the overhead impact somewhat.
Table 1.2 reports the speed on VGG-VD-16, a very large model, using multiple GPUs. In this case, the batch size is set to 264 images. These are further divided into sub-batches of 22 images each to fit in the GPU memory; the latter are then distributed among one to four GPUs on the same machine. While there is a substantial communication overhead, training speed increases from 20 images/s to 45. Addressing this overhead is one of the medium-term goals of the library.
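The figures quoted above can be put in perspective with a little arithmetic. The sketch below uses only the numbers from the text and assumes, since the text does not say so explicitly, that 20 images/s is the single-GPU speed and 45 images/s the four-GPU speed.

```matlab
batchSize     = 264 ;   % images per SGD batch (from the text)
subBatchSize  = 22 ;    % images per sub-batch (from the text)
numSubBatches = batchSize / subBatchSize ;   % sub-batches per SGD batch

speedOneGpu  = 20 ;     % images/s, assumed to be the one-GPU figure
speedFourGpu = 45 ;     % images/s, assumed to be the four-GPU figure
speedup      = speedFourGpu / speedOneGpu ;  % measured speed-up
efficiency   = speedup / 4 ;                 % fraction of ideal 4x scaling

fprintf('%d sub-batches, %.2fx speed-up, %.0f%% scaling efficiency\n', ...
        numSubBatches, speedup, 100 * efficiency) ;
```

Under these assumptions each batch is processed as 12 sub-batches, and four GPUs give a 2.25× speed-up, a bit more than half of ideal linear scaling, which is consistent with the substantial communication overhead the text mentions.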

1.5 Acknowledgments

MatConvNet is a community project, and as such acknowledgements go to all contributors. We thank NVIDIA for supporting this project by providing us with top-of-the-line GPUs, and MathWorks for ongoing discussions on how to improve the library.
The implementation of several CNN computations in this library is inspired by the Caffe library [5] (however, Caffe is not a dependency). Several of the example networks have been trained by Karen Simonyan as part of [1] and [10].
