[Reading Notes] AutoEncoder by Forest + Deep Forest + Ladder Networks + PU Learning

This post discusses the Deep Forest method gcForest, which borrows characteristics of deep learning and builds a multi-layer model out of random forests. It also covers the forest-based autoencoder (AutoEncoder by Forest), showing its advantages in reconstruction error and training speed, and how Ladder Networks combine supervised and unsupervised learning. Finally, it briefly describes Positive and Unlabeled (PU) learning for classification problems with only positive and unlabeled data.

The papers I read this week are all quite interesting, but their practical value is limited, so I'm just briefly archiving them here.

The first two papers are the first two installments of Zhou Zhihua's three-part Deep Forest series; the third installment was introduced here before. I read them following the hype, since Zhou Zhihua is quite well known. The ideas are interesting, but they are still some distance from practical use. Overall, the first paper borrows the multi-layer idea from neural networks and makes random forests multi-layered as well, with each layer performing representation learning. The second is an autoencoder that reconstructs the original image using the Maximal-Compatible Rule (MCR). The third borrows from backpropagation, constructing inverse functions to propagate errors back for more accurate learning.

Ladder Networks and PU learning (Positive and Unlabeled Learning) both belong to semi-supervised learning, where labeled data make up only a small fraction of the whole.

Deep Forest

Authors:
Zhi-Hua Zhou, Ji Feng
National Key Laboratory for Novel Software Technology,
Nanjing University, Nanjing 210023, China
{zhouzh, fengj}@lamda.nju.edu.cn

Abstract

Explores how to construct deep models from non-differentiable modules.
The success of deep learning is attributed to three characteristics: layer-by-layer processing, in-model feature transformation, and sufficient model complexity.
The proposed gcForest builds a deep model while retaining these three characteristics.
Moreover, tree-based models have far fewer hyperparameters than deep neural networks, which greatly reduces the tuning effort.

1 Introduction

2 Inspiration

2.1 Inspiration from DNNs

representation learning (layer-by-layer processing)

2.2 Inspiration from Ensemble Learning

It is well known that an ensemble can usually achieve better generalization performance than single learners. To construct a good ensemble, the individual learners should be accurate and diverse.

3 The gcForest Approach

3.1 Cascade Forest Structure

We include different types of forests to encourage diversity, because diversity is crucial for ensemble construction.
Random forests + completely-random tree forests → per-class probability vectors, concatenated with the original input as the features for the next level. The class vectors of the final level are averaged for the prediction.
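
As a concrete picture of one cascade level, here is a minimal sketch using scikit-learn; it is my own illustration, not the authors' gcForest code. `ExtraTreesClassifier` stands in for the completely-random tree forests, and `cross_val_predict` produces the out-of-fold class vectors (cross-validated class vector generation is described in the configuration section below).

```python
# One cascade level, sketched with scikit-learn (illustrative, not gcForest).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import cross_val_predict

def cascade_level(X, y, n_pairs=4, n_trees=500):
    """Fit n_pairs random forests and n_pairs extremely randomized forests
    (a stand-in for completely-random tree forests), and return each forest's
    class-probability vector concatenated with the original input."""
    class_vecs = []
    for seed in range(n_pairs):
        for Forest in (RandomForestClassifier, ExtraTreesClassifier):
            clf = Forest(n_estimators=n_trees, random_state=seed, n_jobs=-1)
            # out-of-fold probabilities reduce the risk of overfitting
            proba = cross_val_predict(clf, X, y, cv=3, method="predict_proba")
            class_vecs.append(proba)
    return np.hstack([X] + class_vecs)  # features for the next level
```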

3.2 Multi-Grained Scanning

Multiple windows scan over the raw features as inputs; the resulting class vectors are concatenated together to form that level's output.
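
Below is a hedged sketch of the scanning step on 1-D feature vectors; the function names, window size, and stride are illustrative, not from the paper.

```python
# Multi-grained scanning on 1-D features (illustrative sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sliding_windows(X, window, stride=1):
    """Split each d-dimensional sample into overlapping window instances."""
    d = X.shape[1]
    starts = range(0, d - window + 1, stride)
    return np.stack([X[:, s:s + window] for s in starts], axis=1)

def multi_grained_scan(X, forest, window, stride=1):
    """Transform samples into concatenated per-window class vectors."""
    wins = sliding_windows(X, window, stride)   # (n, n_win, window)
    n, n_win, w = wins.shape
    proba = forest.predict_proba(wins.reshape(n * n_win, w))
    return proba.reshape(n, -1)                 # (n, n_win * n_classes)

# Usage (illustrative): the scanning forest is trained on window instances
# that inherit the label of their source sample, as in the paper.
# wins = sliding_windows(X_train, window=16)
# n, n_win, w = wins.shape
# forest = RandomForestClassifier(n_estimators=100).fit(
#     wins.reshape(n * n_win, w), np.repeat(y_train, n_win))
# X_scan = multi_grained_scan(X_train, forest, window=16)
```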

4 Experiments

4.1 Configuration

In all experiments, gcForest uses the same cascade structure: each level consists of 4 completely-random tree forests and 4 random forests, each containing 500 trees.
Three-fold cross validation is used for class vector generation.
The number of cascade levels is automatically determined.
The data are split into an 80% growing set and a 20% estimating set; when accuracy on the estimating set no longer improves, the cascade stops growing.
For d raw features, we use feature windows with sizes of ⌊d/16⌋, ⌊d/8⌋, and ⌊d/4⌋.
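
The automatic level-growing rule above can be sketched as a simple early-stopping loop; the `fit_level` callback interface here is hypothetical, just to show the stopping logic.

```python
# Grow cascade levels until estimating-set accuracy stops improving (sketch).
from sklearn.model_selection import train_test_split

def grow_cascade(X, y, fit_level, max_levels=20):
    """fit_level(X_grow, y_grow, X_est, y_est) fits one cascade level and
    returns (augmented X_grow, augmented X_est, estimating-set accuracy);
    this interface is hypothetical."""
    X_grow, X_est, y_grow, y_est = train_test_split(
        X, y, test_size=0.2, random_state=0)  # 80% growing / 20% estimating
    best_acc = 0.0
    for level in range(1, max_levels + 1):
        X_grow, X_est, acc = fit_level(X_grow, y_grow, X_est, y_est)
        if acc <= best_acc:   # accuracy no longer improves: stop growing
            return level - 1
        best_acc = acc
    return max_levels
```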

4.2 Results

Image Categorization

Comparison of test accuracy on MNIST

gcForest 99.26%
LeNet-5 99.05%
Deep Belief Net 98.75%
SVM (rbf kernel) 98.60%
Random Forest 96.80%

Comparison of test accuracy on CIFAR-10

ResNet 93.57%
AlexNet 83.00%
gcForest(gbdt) 69.00%
gcForest(5grains) 63.37%
Deep Belief Net 62.20%
gcForest(default) 61.78%
Random Forest 50.17%
MLP 42.20%
Logistic Regression 37.32%
SVM (linear kernel) 16.32%

AutoEncoder by Forest

Abstract

Experiments show that, compared with DNN autoencoders, eForest is able to obtain lower reconstruction error with fast training speed, while the model itself is reusable and damage-tolerable.

1. Introduction

In this paper, we present the EncoderForest (abbrev. eForest), which enables a tree ensemble to perform forward encoding and backward decoding operations and can be trained in either a supervised or unsupervised fashion. Experiments show that the eForest approach has the following advantages:

  • Accurate: Its experimental reconstruction error is lower than that of MLP- or CNN-based autoencoders.
  • Efficient: Training eForest on a single KNL (many-core CPU) runs even faster than training a CNN autoencoder on a Titan-X GPU.
  • Damage-tolerable: The trained model works well even when it is partially damaged.
  • Reusable: A model trained on one dataset can be directly applied to other datasets in the same domain.

3. The Proposed Method

An autoencoder has two basic functions: encoding and decoding. For a random forest, encoding poses no difficulty, since the information in the leaf nodes can be regarded as a kind of encoding; what's more, subsets of nodes or even branches along the paths can provide richer encoding information.

eForest encoding: given a trained forest, feed each input into every tree of the forest and take the set of indices of the resulting leaf nodes as the encoded features. This encoding procedure is independent of the particular learning rule used to split the tree nodes.
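
The encoding step maps directly onto scikit-learn's `forest.apply`, which returns per-tree leaf indices; the decoding sketch below follows the MCR idea mentioned at the top (intersect the path rules of all trees into per-feature intervals, then take a point inside the resulting region). The helper names are my own, not the authors' implementation.

```python
# eForest-style encoding/decoding, sketched with scikit-learn (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def eforest_encode(forest, X):
    """Encode each sample as its per-tree leaf indices."""
    return forest.apply(X)  # shape: (n_samples, n_trees)

def path_to_leaf(tree, leaf):
    """DFS from the root; return the [(node, went_left), ...] path to `leaf`."""
    stack = [(0, [])]
    while stack:
        node, path = stack.pop()
        if node == leaf:
            return path
        left, right = tree.children_left[node], tree.children_right[node]
        if left != -1:  # internal node: try both children
            stack.append((left, path + [(node, True)]))
            stack.append((right, path + [(node, False)]))
    return []

def eforest_decode(forest, codes, n_features, lo, hi):
    """Intersect every tree's path rule into per-feature intervals (the
    Maximal-Compatible Rule), then reconstruct with interval midpoints."""
    X_rec = np.empty((codes.shape[0], n_features))
    for i, leaves in enumerate(codes):
        lower = np.full(n_features, lo, dtype=float)
        upper = np.full(n_features, hi, dtype=float)
        for est, leaf in zip(forest.estimators_, leaves):
            t = est.tree_
            for node, went_left in path_to_leaf(t, leaf):
                f, thr = t.feature[node], t.threshold[node]
                if went_left:                    # rule: x[f] <= thr
                    upper[f] = min(upper[f], thr)
                else:                            # rule: x[f] > thr
                    lower[f] = max(lower[f], thr)
        X_rec[i] = (lower + upper) / 2.0
    return X_rec

# Usage (illustrative): encode and reconstruct a few digits.
# from sklearn.datasets import load_digits
# X, y = load_digits(return_X_y=True)
# forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# codes = eforest_encode(forest, X[:5])
# X_rec = eforest_decode(forest, codes, X.shape[1], lo=0.0, hi=16.0)
```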
