Resource | PhD student open-sources DLL, a C++ deep learning library for quickly building convolutional restricted Boltzmann machines...


Article link: https://blog.csdn.net/weixin_39561179/article/details/111512327


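Source: baptiste-wicht

Compiled by Jiqizhixin (机器之心)

Contributors: Liu Xiaokun, Jiang Siyuan

Baptiste Wicht has released version 1.0 of DLL, a deep learning library he wrote himself that is used through a C++ interface. The article walks through a few examples that build fully connected networks and DNNs with DLL, and benchmarks it against popular frameworks such as TensorFlow, Keras, Torch, and Caffe.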

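I am happy to announce version 1.0 of Deep Learning Library (DLL), its first release. DLL is a neural network library that aims to be fast and easy to use.

I started building this library four years ago for my Ph.D. thesis. I needed a good library to train and use restricted Boltzmann machines (RBMs), and nothing suitable existed at the time, so I decided to write my own. It now has very complete support for RBMs and convolutional RBMs (CRBMs). Stacks of RBMs (or deep belief networks, DBNs) can be pretrained with Contrastive Divergence and then either fine-tuned with mini-batch gradient descent or conjugate gradient, or used directly as feature extractors. Over the years, the library has been extended to handle artificial neural networks (ANNs) and convolutional neural networks (CNNs). These networks can also be used to train regular auto-encoders. Several advanced layers, such as Dropout and Batch Normalization, are available, as are adaptive learning rate techniques such as Adadelta and Adam. The library also has built-in support for three datasets: MNIST, CIFAR-10, and ImageNet.

The library is used through a C++ interface and is fully header-only. It requires a C++14 compiler, which means at least clang 3.9 or GCC 6.3.

Usage examples

Let's start with an example of using the library: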

#include <memory>  // std::make_unique

#include "dll/neural/dense_layer.hpp"
#include "dll/network.hpp"
#include "dll/datasets.hpp"

int main(int /*argc*/, char* /*argv*/ []) {
    // Load the dataset
    auto dataset = dll::make_mnist_dataset(dll::batch_size<100>{}, dll::normalize_pre{});

    // Build the network
    using network_t = dll::dyn_network_desc<
        dll::network_layers<
            dll::dense_layer<28 * 28, 500>,
            dll::dense_layer<500, 250>,
            dll::dense_layer<250, 10, dll::softmax>
        >
        , dll::updater<dll::updater_type::NADAM>  // Nesterov Adam (NADAM)
        , dll::batch_size<100>                    // The mini-batch size
        , dll::shuffle                            // Shuffle before each epoch
    >::network_t;

    auto net = std::make_unique<network_t>();

    // Train the network for 50 epochs
    net->fine_tune(dataset.train(), 50);

    // Test the network on the test set
    net->evaluate(dataset.test());

    return 0;
}

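This example trains and tests a simple three-layer fully connected neural network on the MNIST dataset.

First, for the includes, you need to include the layers you will use, which here is only the dense layer. Then you need to include network.hpp, the basic header of every network. The last header provides dataset support.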

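In the main function, we first load the full MNIST dataset, passing two options (the function accepts an optional list of parameters). Here we set the batch size and request that every sample be normalized to zero mean and unit variance.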
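To make "zero mean and unit variance" concrete, here is a minimal sketch of per-sample normalization in plain C++. It is not DLL's implementation: the helper name normalize_sample is hypothetical, and the handling of constant samples is an assumption made only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of DLL): rescale one sample in place so that
// its values have mean 0 and variance 1.
inline void normalize_sample(std::vector<float>& sample) {
    assert(!sample.empty());

    const float n = static_cast<float>(sample.size());

    // Mean of the sample
    float mean = 0.0f;
    for (float v : sample) mean += v;
    mean /= n;

    // Population variance of the sample
    float variance = 0.0f;
    for (float v : sample) variance += (v - mean) * (v - mean);
    variance /= n;

    // Assumption: a constant sample (variance 0) is only centered, not scaled.
    const float stddev = variance > 0.0f ? std::sqrt(variance) : 1.0f;

    for (float& v : sample) v = (v - mean) / stddev;
}
```

For instance, a sample holding the pixel values 0, 255, 0, 255 has mean 127.5 and standard deviation 127.5, so it becomes -1, 1, -1, 1.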

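Next comes the important part, the declaration of the network. In DLL, a network is a type. This type has two parts: the layers (wrapped in dll::network_layers) and the options (a variadic list of options) that follow the layers. In this example we declare three layers: the first has 500 hidden units, the second 250, and the last 10. Each layer can also take its own list of options. The last layer uses a softmax activation function instead of the default sigmoid. The network itself has three options: we use the Nesterov Adam (NAdam) updater, a batch size of 100 (which must match the batch size declared earlier for the dataset), and the dataset is shuffled before each epoch.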
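The difference between the default sigmoid and the softmax used on the last layer can be sketched in a few lines of standard C++. This is an illustration of the two functions only, not code taken from DLL; the function names and the use of std::vector<double> are choices made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise logistic sigmoid: every output lies in (0, 1), independently
// of the other outputs.
inline std::vector<double> sigmoid(const std::vector<double>& x) {
    std::vector<double> y(x.size());
    std::transform(x.begin(), x.end(), y.begin(),
                   [](double v) { return 1.0 / (1.0 + std::exp(-v)); });
    return y;
}

// Softmax: outputs are positive and sum to 1, so the 10 outputs of the last
// layer can be read as class probabilities.
inline std::vector<double> softmax(const std::vector<double>& x) {
    assert(!x.empty());

    // Subtracting the maximum avoids overflow in exp and does not change the result.
    const double max = *std::max_element(x.begin(), x.end());

    std::vector<double> y(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - max);
        sum += y[i];
    }
    for (double& v : y) v /= sum;
    return y;
}
```

Because the softmax outputs sum to 1, they pair naturally with the categorical cross-entropy loss that appears in the training log.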

Then we simply create the network with std::make_unique, train it for 50 epochs on the training set, and test it on the test set.

Here is the output of the program:

Network with 3 layers
    Dense(dyn): 784 -> SIGMOID -> 500
    Dense(dyn): 500 -> SIGMOID -> 250
    Dense(dyn): 250 -> SOFTMAX -> 10
Total parameters: 519500
Dataset
    Training: In-Memory Data Generator
        Size: 60000
        Batches: 600
        Augmented Size: 60000
    Testing: In-Memory Data Generator
        Size: 10000
        Batches: 100
        Augmented Size: 10000
Train the network with "Stochastic Gradient Descent"
    Updater: NADAM
    Loss: CATEGORICAL_CROSS_ENTROPY
    Early Stop: Goal(error)
With parameters:
    epochs=50
    batch_size=100
    learning_rate=0.002
    beta1=0.9
    beta2=0.999
Epoch 0/50 - Classification error: 0.03248 Loss: 0.11162 Time 3187ms
Epoch 1/50 - Classification error: 0.02737 Loss: 0.08670 Time 3063ms
Epoch 2/50 - Classification error: 0.01517 Loss: 0.04954 Time 3540ms
Epoch 3/50 - Classification error: 0.01022 Loss: 0.03284 Time 2954ms
Epoch 4/50 - Classification error: 0.00625 Loss: 0.02122 Time 2936ms
Epoch 5/50 - Classification error: 0.00797 Loss: 0.02463 Time 2729ms
Epoch 6/50 - Classification error: 0.00668 Loss: 0.02066 Time 2921ms
Epoch 7/50 - Classification error: 0.00953 Loss: 0.02710 Time 2894ms
Epoch 8/50 - Classification error: 0.00565 Loss: 0.01666 Time 2703ms
Epoch 9/50 - Classification error: 0.00562 Loss: 0.01644 Time 2759ms
Epoch 10/50 - Classification error: 0.00595 Loss: 0.01789 Time 2572ms
Epoch 11/50 - Classification error: 0.00555 Loss: 0.01734 Time 2586ms
Epoch 12/50 - Classification error: 0.00505 Loss: 0.01446 Time 2575ms
Epoch 13/50 - Classification error: 0.00600 Loss: 0.01727 Time 2644ms
Epoch 14/50 - Classification error: 0.00327 Loss: 0.00898 Time 2636ms
Epoch 15/50 - Classification error: 0.00392 Loss: 0.01180 Time 2660ms
Epoch 16/50 - Classification error: 0.00403 Loss: 0.01231 Time 2587ms
Epoch 17/50 - Classification error: 0.00445 Loss: 0.01307 Time 2566ms
Epoch 18/50 - Classification error: 0.00297 Loss: 0.00831 Time 2857ms
Epoch 19/50 - Classification error: 0.00335 Loss: 0.01001 Time 2931ms
Epoch 20/50 - Classification error: 0.00378 Loss: 0.01081 Time 2772ms
Epoch 21/50 - Classification error: 0.00332 Loss: 0.00950 Time 2964ms
Epoch 22/50 - Classification error: 0.00400 Loss: 0.01210 Time 2773ms
Epoch 23/50 - Classification error: 0.00393 Loss: 0.01081 Time 2721ms
Epoch 24/50 - Classification error: 0.00415 Loss: 0.01218 Time 2595ms
Epoch 25/50 - Classification error: 0.00347 Loss: 0.00947 Time 2604ms
Epoch 26/50 - Classification error: 0.00535 Loss: 0.01544 Time 3005ms
Epoch 27/50 - Classification error: 0.00272 Loss: 0.00828 Time 2716ms
Epoch 28/50 - Classification error: 0.00422 Loss: 0.01211 Time 2614ms
Epoch 29/50 - Classification error: 0.00417 Loss: 0.01148 Time 2701ms
Epoch 30/50 - Classification error: 0.00498 Loss: 0.01439 Time 2561ms
Epoch 31/50 - Classification error: 0.00385 Loss: 0.01085 Time 2704ms
Epoch 32/50 - Classification error: 0.00305 Loss: 0.00879 Time 2618ms
Epoch 33/50 - Classification error: 0.00343 Loss: 0.00889 Time 2843ms
Epoch 34/50 - Classification error: 0.00292 Loss: 0.00833 Time 2887ms
Epoch 35/50 - Classification error: 0.00327 Loss: 0.00895 Time 2644ms
Epoch 36/50 - Classification error: 0.00203 Loss: 0.00623 Time 2658ms
Epoch 37/50 - Classification error: 0.00233 Loss: 0.00676 Time 2685ms
Epoch 38/50 - Classification error: 0.00298 Loss: 0.00818 Time 2948ms
Epoch 39/50 - Classification error: 0.00410 Loss: 0.01195 Time 2778ms
Epoch 40/50 - Classification error: 0.00173 Loss: 0.00495 Time 2843ms
Epoch 41/50 - Classification error: 0.00232 Loss: 0.00709 Time 2743ms
Epoch 42/50 - Classification error: 0.00292 Loss: 0.00861 Time 2873ms
Epoch 43/50 - Classification error: 0.00483 Loss: 0.01365 Time 2887ms
Epoch 44/50 - Classification error: 0.00240 Loss: 0.00694 Time 2918ms
Epoch 45/50 - Classification error: 0.00247 Loss: 0.00734 Time 2885ms
Epoch 46/50 - Classification error: 0.00278 Loss: 0.00725 Time 2785ms
Epoch 47/50 - Classification error: 0.00262 Loss: 0.00687 Time 2842ms
Epoch 48/50 - Classification error: 0.00352 Loss: 0.01002 Time 2665ms
Epoch 49/50 - Classification error: 0.00232 Loss: 0.00668 Time 2747ms
Restore the best (error) weights from epoch 40
Training took 142s
error: 0.02040
loss: 0.08889

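As the output shows, the program first displays the network and the dataset, then prints information for each training epoch, and finally reports the evaluation results. In about two and a half minutes we have trained a network that recognizes MNIST digits with a 2.04% error rate, which is not bad but can still be improved.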
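Two figures in the log can be checked by hand. The sketch below recomputes them at compile time; the struct and function names are hypothetical and exist only for this illustration. Note that 519500 is exactly the number of weights in the three dense layers, which suggests, although the article does not say so, that the reported total leaves out bias terms.

```cpp
#include <cstddef>

// Shape of one dense layer as declared in the example:
// dll::dense_layer<Inputs, Outputs>.
struct dense_shape {
    std::size_t inputs;
    std::size_t outputs;
};

// Number of weights in a dense layer: one per input/output pair, biases excluded.
constexpr std::size_t weight_count(dense_shape layer) {
    return layer.inputs * layer.outputs;
}

constexpr dense_shape layers[] = {{28 * 28, 500}, {500, 250}, {250, 10}};

// 392000 + 125000 + 2500
constexpr std::size_t total_weights =
    weight_count(layers[0]) + weight_count(layers[1]) + weight_count(layers[2]);

static_assert(total_weights == 519500, "matches 'Total parameters' in the log");

// MNIST training set: 60000 samples split into mini-batches of 100.
constexpr std::size_t batches_per_epoch = 60000 / 100;

static_assert(batches_per_epoch == 600, "matches 'Batches' in the log");
```

The same division explains the 100 test batches: 10000 test samples in mini-batches of 100.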

A few words on how to compile. You can install the dll library on your computer by running sudo make install_headers in a checked-out dll folder, and then simply compile your file with:

clang++ -std=c++14 file.cpp

Alternatively, if you cloned dll into a local dll directory, you need to specify the include folders:

clang++ -std=c++14 -Idll/include -Idll/etl/lib/include -Idll/etl/include/ -Idll/mnist/include/ -Idll/cifar-10/include/ file.cpp

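Several compilation options can help you improve performance:

-DETL_PARALLEL: enables parallel computation

-DETL_VECTORIZE_FULL: enables full vectorization of the algorithms

-DETL_BLAS_MODE: tells the library that a BLAS library (such as MKL) is available; you must then also add the include and linking options for that BLAS library

-DETL_CUBLAS_MODE: tells the library that NVIDIA cuBLAS is available; you must add the appropriate options (include directory and library to link)

-DETL_CUDNN_MODE: tells the library that NVIDIA cuDNN is available; you must add the appropriate options (include directory and library to link)

-DETL_EGBLAS_MODE: tells the library that you have installed etl-gpu-blas; you must add the appropriate options (include directory and library to link)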

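For the best CPU performance, use the first three options. For the best GPU performance, use the last three. Since some algorithms do not run entirely on the GPU, it is best to use all of the options.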
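Each of these options is an ordinary preprocessor definition passed with -D, so a build can check which ones actually reached the compiler. The following sketch is not part of DLL or ETL: the function name enabled_etl_options is hypothetical, and only the macro names are taken from the list above.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the names of the ETL_* macros from the list
// above that were defined on the compiler command line, for example with
// -DETL_PARALLEL.
inline std::vector<std::string> enabled_etl_options() {
    std::vector<std::string> enabled;
#ifdef ETL_PARALLEL
    enabled.push_back("ETL_PARALLEL");
#endif
#ifdef ETL_VECTORIZE_FULL
    enabled.push_back("ETL_VECTORIZE_FULL");
#endif
#ifdef ETL_BLAS_MODE
    enabled.push_back("ETL_BLAS_MODE");
#endif
#ifdef ETL_CUBLAS_MODE
    enabled.push_back("ETL_CUBLAS_MODE");
#endif
#ifdef ETL_CUDNN_MODE
    enabled.push_back("ETL_CUDNN_MODE");
#endif
#ifdef ETL_EGBLAS_MODE
    enabled.push_back("ETL_EGBLAS_MODE");
#endif
    return enabled;
}
```

Compiled without any of the flags, the function returns an empty list. Printing its result at program start is a cheap way to confirm that a build script passes the options it is supposed to.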

Let's now repeat the same experiment, this time with a convolutional neural network made of two convolutional layers and two pooling layers:

#include <memory>  // std::make_unique

#include "dll/neural/conv_layer.hpp"
#include "dll/neural/dense_layer.hpp"
#include "dll/pooling/mp_layer.hpp"
#include "dll/network.hpp"
#include "dll/datasets.hpp"

#include "mnist/mnist_reader.hpp"
#include "mnist/mnist_utils.hpp"

int main(int /*argc*/, char* /*argv*/ []) {
    // Load the dataset
    auto dataset = dll::make_mnist_dataset(dll::batch_size<100>{}, dll::scale_pre<255>{});

    // Build the network
    using network_t = dll::dyn_network_desc<
        dll::network_layers<
            dll::conv_layer<1, 28, 28, 8, 5, 5>,
            dll::mp_2d_layer<8, 24, 24, 2, 2>,
            dll::conv_layer<8, 12, 12, 8, 5, 5>,
            dll::mp_2d_layer<8, 8, 8, 2, 2>,
            dll::dense_layer<8 * 4 * 4, 150>,
            dll::dense_layer<150, 10, dll::softmax>
        >
        , dll::updater<dll::updater_type::NADAM>  // Nesterov Adam (NADAM)
        , dll::batch_size<100>                    // The mini-batch size
        , dll::shuffle                            // Shuffle the dataset before each epoch
    >::network_t;

    auto net = std::make_unique<network_t>();

    // Display the network and dataset
    net->display();
    dataset.display();

    // Train the network for 25 epochs
    net->fine_tune(dataset.train(), 25);

    // Test the network on the test set
    net->evaluate(dataset.test());

    return 0;
}

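Not much changes compared with the previous example. The network starts with a convolutional layer followed by a pooling layer, then another convolutional layer and pooling layer, and ends with two fully connected layers. Another difference is that we scale the inputs by dividing them by 255 (dll::scale_pre<255>{}) instead of normalizing them. Finally, we train for only 25 epochs.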
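The template arguments of each layer have to line up with the output of the layer before it. The sketch below rederives the sizes used in the declaration, assuming a convolution with stride 1 and no padding and non-overlapping pooling, which is what the 28 to 24 to 12 to 8 to 4 progression implies. The helper names are hypothetical and not part of DLL.

```cpp
#include <cstddef>

// Output size along one dimension of a convolution with stride 1 and no padding.
constexpr std::size_t conv_out(std::size_t input, std::size_t kernel) {
    return input - kernel + 1;
}

// Output size along one dimension of non-overlapping pooling.
constexpr std::size_t pool_out(std::size_t input, std::size_t ratio) {
    return input / ratio;
}

constexpr std::size_t c1 = conv_out(28, 5);  // conv_layer<1, 28, 28, 8, 5, 5>  -> 8x24x24
constexpr std::size_t p1 = pool_out(c1, 2);  // mp_2d_layer<8, 24, 24, 2, 2>    -> 8x12x12
constexpr std::size_t c2 = conv_out(p1, 5);  // conv_layer<8, 12, 12, 8, 5, 5>  -> 8x8x8
constexpr std::size_t p2 = pool_out(c2, 2);  // mp_2d_layer<8, 8, 8, 2, 2>      -> 8x4x4

static_assert(c1 == 24 && p1 == 12 && c2 == 8 && p2 == 4,
              "spatial sizes used in the layer declarations");

// The 8 feature maps of 4x4 are flattened into the input of the first dense layer.
static_assert(8 * p2 * p2 == 128, "input size of dense_layer<8 * 4 * 4, 150>");
```

If one of the kernel or pooling sizes changes, the arguments of every later layer have to be recomputed in the same way.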

Once compiled and run, the output looks like this:

Network with 6 layers
    Conv(dyn): 1x28x28 -> (8x5x5) -> SIGMOID -> 8x24x24
    MP(2d): 8x24x24 -> (2x2) -> 8x12x12
    Conv(dyn): 8x12x12 -> (8x5x5) -> SIGMOID -> 8x8x8
    MP(2d): 8x8x8 -> (2x2) -> 8x4x4
    Dense(dyn): 128 -> SIGMOID -> 150
    Dense(dyn): 150 -> SOFTMAX -> 10
Total parameters: 21100
Dataset
    Training: In-Memory Data Generator
        Size: 60000
        Batches: 600
        Augmented Size: 60000
    Testing: In-Memory Data Generator
        Size: 10000
        Batches: 100
        Augmented Size: 10000
Train the network with "Stochastic Gradient Descent"
    Updater: NADAM
    Loss: CATEGORICAL_CROSS_ENTROPY
    Early Stop: Goal(error)
With parameters:
    epochs=25
    batch_size=100
    learning_rate=0.002
    beta1=0.9
    beta2=0.999
Epoch 0/25 - Classification error: 0.09392 Loss: 0.31740 Time 7298ms
Epoch 1/25 - Classification error: 0.07005 Loss: 0.23473 Time 7298ms
Epoch 2/25 - Classification error: 0.06915 Loss: 0.22532 Time 7364ms
Epoch 3/25 - Classification error: 0.04750 Loss: 0.15286 Time 7787ms
Epoch 4/25 - Classification error: 0.04082 Loss: 0.13191 Time 7377ms
Epoch 5/25 - Classification error: 0.03258 Loss: 0.10283 Time 7334ms
Epoch 6/25 - Classification error: 0.03032 Loss: 0.09791 Time 7304ms
Epoch 7/25 - Classification error: 0.02727 Loss: 0.08453 Time 7345ms
Epoch 8/25 - Classification error: 0.02410 Loss: 0.07641 Time 7443ms
Epoch 9/25 - Classification error: 0.02448 Loss: 0.07612 Time 7747ms
Epoch 10/25 - Classification error: 0.02023 Loss: 0.06370 Time 7626ms
Epoch 11/25 - Classification error: 0.01920 Loss: 0.06194 Time 7364ms
Epoch 12/25 - Classification error: 0.01810 Loss: 0.05851 Time 7391ms
Epoch 13/25 - Classification error: 0.01575 Loss: 0.05074 Time 7316ms
Epoch 14/25 - Classification error: 0.01542 Loss: 0.04826 Time 7365ms
Epoch 15/25 - Classification error: 0.01392 Loss: 0.04574 Time 7634ms
Epoch 16/25 - Classification error: 0.01287 Loss: 0.04061 Time 7367ms
Epoch 17/25 - Classification error: 0.01167 Loss: 0.03779 Time 7381ms
Epoch 18/25 - Classification error: 0.01202 Loss: 0.03715 Time 7495ms
Epoch 19/25 - Classification error: 0.00967 Loss: 0.03268 Time 7359ms
Epoch 20/25 - Classification error: 0.00955 Loss: 0.03012 Time 7344ms
Epoch 21/25 - Classification error: 0.00853 Loss: 0.02809 Time 7314ms
Epoch 22/25 - Classification error: 0.00832 Loss: 0.02834 Time 7329ms
Epoch 23/25 - Classification error: 0.00807 Loss: 0.02603 Time 7336ms
Epoch 24/25 - Classification error: 0.00682 Loss: 0.02327 Time 7335ms
Training took 186s
error: 0.01520
loss: 0.05183

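This network does somewhat better than the previous one, reaching a 1.52% error rate in about three minutes of training. If you are interested, you can find more examples on GitHub.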

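Performance

If you have read my recent blog posts, you may have already seen some of this information, but I still want to emphasize it here. I have put a lot of work into the performance of DLL, and I decided to compare it against popular frameworks: TensorFlow, Keras, Torch, and Caffe. I also tried DeepLearning4J, but ran into so many problems that I had to drop it for now. If anyone is interested in those results, I can publish them somewhere. All frameworks were installed with their default options, and all of them can use MKL.

The first experiment trains a three-layer network on the MNIST dataset:

On CPU, DLL is the fastest at training this network: about 35% faster than TensorFlow and Keras, 4 times faster than Torch, and 5 times faster than Caffe. On GPU, Caffe is the fastest, followed by Keras, TensorFlow, and DLL, with Torch the slowest.

Here are the results for training a CNN on the same task:

Again, DLL is clearly the fastest on CPU: 4 times faster than TensorFlow and Keras, and 5 times faster than Torch and Caffe. On GPU, DLL is on par with TensorFlow and Keras, 3 times faster than Caffe, and 5 times faster than Torch.

Here are the results for training a larger CNN on CIFAR-10:

With the larger CNN the differences are less pronounced than before. Even so, DLL is still the fastest on CPU: twice as fast as TensorFlow, Keras, and Torch, and 3 times faster than Caffe. On GPU, DLL is slightly faster than TensorFlow and Keras, 2.7 times faster than Caffe, and 5 times faster than Torch.

The last experiment trains a 12-layer CNN on ImageNet, with a mini-batch size of 128.

DLL is faster than all the other frameworks on both CPU and GPU. The large gap between DLL and TensorFlow or Keras comes mainly from the poor performance of reading the ImageNet images from Python code, whereas the corresponding code in DLL has been optimized.

Overall, DLL has the fastest CPU training in every experiment. On GPU, except for the very small fully connected network, DLL is also consistently the fastest, level with TensorFlow and Keras.

If you are interested, the code for these experiments is available here: https://github.com/wichtounet/frameworks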

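Next steps

I do not know yet exactly which features the next version of DLL will include, but I do know the direction it will take.

I would really like to be able to use DLL for text classification. As a first step, I plan to add support for learning embeddings from text and for running CNNs on top of those embeddings. I also plan to add the ability to merge CNN layers, so that filters of various sizes can be used together; I hope this first step will not take too long. As a second step, I want to bring recurrent neural networks (RNNs) into the framework. At first only simple RNN cells will be supported, with LSTM and GRU cells added later. This part will certainly take a long time, but I really hope it will help me understand how these recurrent networks actually work.

The next thing I want to focus on is documentation for the library. The examples are currently a good way to learn how the various features are used, but the available layers and all of their options still need to be listed. I would also like to provide more examples, especially once embeddings and RNNs are supported.

In addition, although performance is generally good, some areas need improvement. For example, several operations (such as batch normalization and Dropout) are currently inefficient on GPU, and I would like every operation to run efficiently there. Some operations, such as batch normalization and the SGD optimizer, are also inefficient on CPU, so I need to address those as well. Ideally, DLL should perform well even without a performance library.

Finally, I would also like to improve compilation time. Recent changes have already made DLL programs compile faster, but I hope to speed this up further.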

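Download DLL

You can download DLL from GitHub. If you are interested in version 1.0, you can go directly to the Releases page or clone the 1.0 tag. There are also several branches:

master is always the development branch and may not be very stable

stable always points to the latest tag and is not updated often

For future releases, there will always be a tag pointing to the corresponding commit. You can access previous releases through GitHub or through the release tags.

As for documentation, the best documentation currently available is the set of examples. You can also look at the source code of the test cases, which exercise every feature of the library. If this library attracts enough attention, the next priority will be writing documentation.

Original article: https://baptiste-wicht.com/posts/2017/10/deep-learning-library-10-fast-neural-network-library.html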
